About

I am a seasoned Data Scientist with a strong work ethic and 6+ years of professional analytic experience. I am skilled in statistical methods, analytical techniques, and software. I have a deep understanding of several statistical models and a proven track record of organizing and managing multiple difficult projects simultaneously and delivering them complete and on time.

I enjoy process optimization. I like automating and simplifying procedures, making them more efficient and less error-prone. I absolutely love data visualization! It combines my information summarization knowledge and art background. Anyone who knows me knows that organization is my strongest skill (both professionally and personally).

I have built data pipelines that extracted, transformed, cleaned, scored, harmonized, and stored data from 80+ different sources. These sources included data from countries all over the world, translated in multiple spoken languages.

Made many dashboards in Tableau, R, and PowerBI.

Co-authored multiple academic publications.

Built websites, including the user interfaces (PHP/HTML) and the tables and models behind them (SQL). I enjoy teaching others and have made interactive R tutorials and YouTube tutorials. I volunteer to train onboarding colleagues. I created my own internships, and my first project led to a job offer.

Designed experiments from a statistical perspective and conducted analysis of experimental data. Experienced with survey data, categorical data, and quantitative research.

Made many predictive models in a variety of fields, including biology, finance, real estate, baseball, and public health. Most of them used health and medical data (my personal favorite).

I have worked on projects in marine biology, human behavior, social media data, cancer research, genetic psychiatry research, and cancer diagnostics.

Now, I focus on process improvement solutions: building tools that assess the quality of medical devices, providing leadership with data-driven solutions, and helping make people's goals easier to complete.

Technical: R, SQL, SAS, MS Access, PowerBI (DAX & M), Tableau, Advanced MS Excel, Visual Basic, SPSS, Java, HTML, PHP

Education

MS Statistics, May 2020

San Diego State University

Relevant Coursework: Statistical Communication, Predictive Analytics, Computational Database Fundamentals, Statistical Consulting, Machine Learning, Multivariate Statistics, Advanced Mathematical Statistics, Advanced Biostatistical Methods

BS Statistics, May 2016

San Diego State University

Relevant Coursework: Applied Regression Analysis, Data Analysis and Statistical Inference, Intermediate Computer Programming, Actuarial Modeling, Applied Multivariate Analysis, Probability and Statistics, Programming Languages, SAS programming, Data Management, Spatial Data Analysis, and Statistical Methods

Experience

Senior Business Systems Analyst (Data Analytics)

BD Jun 2020 – Present

  • Streamlined the method for monitoring recall effectiveness by creating a PowerBI dashboard that aggregates data counts, calculates normalized rates, and provides trend charts. The intricate dashboard completely automated the process for active monitoring of recall-related complaints and failures. It reduced the number of hours spent working on meeting presentation slides and gave leadership the ability to obtain summarized and raw data at any time. Previously, this process took two workdays to compile and was done only once a month.
  • Created a tool that enhanced existing data querying process. Analysts in my team are now able to utilize Power BI Dashboard extracts in order to produce datasets to specification. They can use point-and-click filters and slicers to select desired data. Additionally, they can write code into a DAX template that follows similar coding structure to SQL queries in order to export more advanced ad-hoc searches. This reduced the number of hours spent working on data requests and facilitated the data request process for the team.
  • Developed dashboards that provided daily and weekly reporting of numerous complaint metrics. These dashboards were instrumental in eliminating a backlog of over 6,000 open complaints. The dashboards provided a broad view of current complaint status and age, as well as daily worklists for a number of teams.
  • Used data mining methods to analyze unstructured data sets for a text categorization project.
  • Provided Marketing and other customer-facing employees with data in preparation for customer meetings. Built sophisticated queries to extract specified information related to infusion complaint and servicing records. Structured reports for ease of understanding.

Data Scientist

UC San Diego Health Sep 2017 – Jun 2020

  • Pioneered a data pipeline that integrates a multitude of individual R scripts and implements algorithms that assess quality, manipulate, and re-format raw data, resulting in clean, scored data for over 30 psychiatric measures.
  • Used data visualization and summary reports in R to brief staff and partners on key data findings and suggest actionable recommendations during bimonthly presentations.
  • Spearheaded project coordination, including independently obtaining data from over 90 international cohorts (including over 400,000 individual subjects) in order to provide scientists around the world with a large database of PTSD cases.
  • Restructured data collection and storage methods in order to improve laboratory efficacy.
  • Responsible for database management and retrieval using SQL.

Data Analyst

UC San Diego Health Jun 2017 – Aug 2017

  • Provided the program director with appropriate statistical and data analyses in a timely manner and communicated key insights to research staff and physicians.
  • Drafted plans for analysis procedures. Performed data exploration and data modeling using SAS.
  • Produced conclusive reports that summarized key insights.

Biostatistics Analyst

Trovagene Aug 2016 – Mar 2017

  • Developed a user-friendly data collection tool using VBA, Excel, and MS Access that improved data acquisition and storage methods. This saved the company time and money, as it no longer needed to hire outside consultants for website development.
  • Communicated results, analysis plans, and methods to a nontechnical, nonstatistical audience in the form of written reports, presentations, graphs, slides, and verbal communication.
  • Supported Product Development, Clinical Affairs, and R&D departments with statistical consultation and analysis plans.
  • Created Standard Operating Procedures (SOP) for tools implemented, data storage methods, and secure sharing procedures. Trained departments in the use of such procedures.
  • Evaluated performance for all assays, conducted analysis and wrote assay validation reports.

Research Assistant

Center for Human Dynamics in the Mobile Age (HDMA) Oct 2015 – Aug 2016

  • Developed data dashboards using Tableau to help partners gain deeper insight into data through visualization.
  • Data mining and management using SQL.
  • Coauthored the Web-based Geo Visualization Application for mapping Cancer Mortality Rates project and had a poster displayed at the Esri User Conference.
  • Coauthored abstract on Preventable Cancer Systematic Literature Review.
  • Worked on all statistical, data analysis, and reporting tasks for the San Diego Cancer Research Project, in collaboration with San Diego County, using R and SAS.
  • Extracted and cleaned raw data, then summarized, reported, and presented quantitative data analysis results using SAS and SQL.

Statistics Intern

Marine Conservation Ecology Lab, SDSU Aug 2015 – Dec 2015

  • Semester-long internship
  • Tested generalized linear and mixed-effects models to determine the effects on habitat structure and faunal abundance.
  • Collected, entered, and managed all experiment data.
  • Created reports that summarized data analysis using SAS.

Projects

Machine Learning: Clustering Analysis of Twitter Users

Identified the most influential local Twitter users. Executed data extraction and manipulation. Conducted clustering analysis and visualization. Presented results to the research center and partners.

Machine Learning: Principal Component Analysis

Conducted Factor Analysis in census tract data in order to better understand socioeconomic patterns in San Diego County Cancer Patient dataset. Used knowledge gained from analysis to reduce the set of variables in the dataset.

SQL, HTML, and PHP: Created an interactive website front end, and stored inputs into database

Created a website using HTML, PHP, and MySQL. Designed an ER model to store website data in the database. Maintained a connection to insert and update data from the website’s front page to the MySQL backend. The website would take in a friend's birthday information (i.e. date, favorite restaurant, store).

Modeling: YouTube Video Series (Multiple Linear Regression with Interaction)

Recorded a series of YouTube videos in collaboration with fellow MS cohort members. Gave a complete tutorial for Multiple Linear Regression with interactions that touched on the theoretical piece, exploratory data analysis, R code, diagnostics, and applications.

R packages: Dashboard using flexdashboard and R Markdown

Used R Markdown and the flexdashboard R package to create a dashboard summarizing psychological data from a local psychiatric study.

R packages: learnr tutorial of ggplot

Interactive tutorial for learning ggplot, made using the learnr R package

Predictive Modeling: Biopsy Result Classification

The goal of this analysis was to build a logistic regression model that was able to determine what variables are able to predict a prostate biopsy result. Being able to predict the results of a biopsy is of high importance as it reduces patient burden, patient surgery recovery time, and the probability of the cancer spreading upon biopsy retrieval.

Predictive Modeling: Hospital Patient Survival

The goal of this analysis was to build a model that was able to predict whether a patient will recover or die based on a set of physiological measurements taken upon patient admission. The analysis helped determine which physiological measurements were essential to patient survival. The hope is that this information will help physicians establish the most appropriate treatment for a patient and in turn increase their chance for survival.

Predictive Modeling: Prostatic Capsule Penetration Classification

Patients with prostate cancer have a great chance of survival when the tumor is located within the bounds of the prostatic capsule. Relative survival rates drop considerably when the tumor has penetrated the prostatic capsule and has spread through the body. In order to establish the most appropriate treatment for a patient, it is important for physicians to know whether the prostatic capsule has been penetrated. Using logistic regression, we built a model that is able to predict whether a patient’s tumor has penetrated the prostatic capsule with an accuracy of 0.84 and high sensitivity (0.842).
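For readers less familiar with the two metrics quoted above, the following is a minimal sketch of how accuracy and sensitivity are derived from a classifier's confusion matrix. It is not the code used in this project: the function name `classification_metrics` and the counts passed to it are hypothetical placeholders chosen only for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, sensitivity) from confusion-matrix counts.

    tp / fn: actual positives predicted positive / negative
    tn / fp: actual negatives predicted negative / positive
    """
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0:
        raise ValueError("need at least one observation and one actual positive")
    accuracy = (tp + tn) / total      # share of all cases classified correctly
    sensitivity = tp / (tp + fn)      # share of actual positives that were caught
    return accuracy, sensitivity


# Hypothetical counts, NOT the study's data.
acc, sens = classification_metrics(tp=40, fp=10, tn=40, fn=10)
print(f"accuracy={acc:.3f}, sensitivity={sens:.3f}")
```

Sensitivity matters most in a setting like this one because a false negative (a missed capsule penetration) is typically the costlier error.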

Predictive Modeling: New York Housing Price Prediction

Built a regression model for predicting the house prices in various New York state counties using various economic and socioeconomic variables.

Exploratory Data Analysis: Cancer in San Diego County

I created a Tableau dashboard and conducted Exploratory Data Analysis to explore Cancer patients in San Diego County. Presented the dashboard and results to the lab's Director and other university professors. This simple project is very special to me as it led to me being offered a paid position at the center (I was volunteering to get professional experience).

Non-Parametric Data Analysis: California Active Duty Military Resilience Data

Used a variety of Non-Parametric tests (Wilcoxon rank sum test, Kruskal-Wallis rank sum test, Wilcoxon Pairwise Comparison test) to analyze data from a Trauma and PTSD Study.

Data Analysis: The Effects of Irradiation on Breast Cancer Recurrence in Women

We conducted a retrospective study that analyzed data from a cohort of 286 women ages 20-79 who had previously been diagnosed with breast cancer and treated either with or without radiation therapy at the time of their first diagnosis. Chi-squared tests were performed in order to determine if prior radiation treatment impacted the recurrence of cancer.
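As a rough illustration of the kind of test described above, here is a minimal sketch of Pearson's chi-squared statistic for a contingency table, written in plain Python. It is an assumption-laden example rather than the study's analysis: the function name `chi_squared_statistic` is hypothetical, the 2x2 counts are made up for illustration, and no continuity correction is applied.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical counts, NOT the study's data.
# Rows: treated with / without radiation; columns: recurrence / no recurrence.
observed = [[30, 70],
            [20, 80]]
stat = chi_squared_statistic(observed)

# For a 2x2 table there is 1 degree of freedom; the 5% critical value is 3.841.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841
print(f"chi-squared = {stat:.4f}")
print("reject independence at 5%:", stat > CRITICAL_VALUE_DF1_ALPHA_05)
```

A statistic above the critical value would suggest that recurrence and radiation treatment are not independent; it says nothing by itself about the direction or size of the effect.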

Data Analysis: Transportation Providers and cell phone use (submitted for Publication)

We studied Distracted Driving Behavior Related to Cell Phone Use Among Uber/Lyft/Taxi drivers. The goal of this study was to address a gap in the literature by characterizing attitudes and behaviors toward distracted driving among Transportation Providers. In the process, we hoped to gain a better understanding of the demography involved and determine the level of exposure risk to Distracted Driving and possible causes that are unique to this population.

Data Analysis: Local Marine Ecology Laboratory

Worked in collaboration with the Marine Conservation Ecology Lab, SDSU. Our research interest was to measure the effects of seagrass structural complexity on mesograzer diversity and community composition in San Diego Bay.

Big Data Hackathon App Development Contest

This project promoted the development of data science and information technology solutions for San Diego on important civic issues related to water drought and conservation, disaster response, and crime monitoring.

Theoretical Presentation

Theoretical presentation of Logistic Regression Modeling: its applications, logistic regression with interactions, and extensions.

Hobbies

Ceramics

I enjoy making functional ceramic pieces.

Painting

Painting with watercolor. This specific watercolor is an Aquarius constellation. It was made as one of a series of personalized holiday gifts for my closest friends.

Embroidery

I made this piece to celebrate the success of my dear friend’s local business, Nance Jewelry. It is her original business logo. See Nancy's pieces here

Weaving Loom

I enjoy making bookmarks on a weaving loom.

Restoring Furniture

Restoring used furniture. This coffee table was picked up free from the curb; after some care, it was in use and loved by many.

Environmental Activism

Photo volunteering with a local environmental organization, San Diego 350.

Contact

You can see more of my work and/or reach me at the following:

LinkedIn

GitHub

Twitter
