R Markdown Projects
Modeling, Cross Validation, & Machine Learning
Linear & Logistic Regression Modeling Using Machine Learning Approaches
A Blind Taste Test |
[Report]
Predicting sommelier quality rankings of wine and wine type (red/white)
utilizing chemical composition alone
- Real world data; medium data set (approx. 6,500 observations)
- Regularized linear & logistic regression prediction models,
support vector machine (SVM) regression and classification models
- Coefficient path analysis, hyperparameter tuning, and optimal cutoff
probability analysis for optimized accuracy and specificity
- Analysis goals:
- Identifying the best possible prediction model for sommelier wine
quality ranking based on chemical composition
- Creating a white vs red prediction model for blended wines that do not
neatly fall into either category
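The optimal cutoff probability analysis listed above can be illustrated with a minimal Python sketch. This is not the report's code: the function names, the toy probabilities, and the tie-breaking rule (highest accuracy, then highest specificity) are assumptions made for illustration only.

```python
from typing import List, Tuple


def confusion_counts(probs: List[float], labels: List[int], cutoff: float) -> Tuple[int, int, int, int]:
    """Count TP, FP, TN, FN when predicting class 1 for probabilities >= cutoff."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= cutoff else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sweep_cutoffs(probs: List[float], labels: List[int], cutoffs: List[float]) -> List[Tuple[float, float, float]]:
    """Return (cutoff, accuracy, specificity) for each candidate cutoff."""
    rows = []
    for c in cutoffs:
        tp, fp, tn, fn = confusion_counts(probs, labels, c)
        accuracy = (tp + tn) / len(labels)
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        rows.append((c, accuracy, specificity))
    return rows


def best_cutoff(rows: List[Tuple[float, float, float]]) -> float:
    """Pick the cutoff with the highest accuracy, breaking ties by specificity (an assumed rule)."""
    return max(rows, key=lambda r: (r[1], r[2]))[0]


# Toy, made-up predicted probabilities and true labels (1 = positive class).
probs = [0.10, 0.35, 0.40, 0.65, 0.80, 0.90]
labels = [0, 0, 1, 0, 1, 1]
rows = sweep_cutoffs(probs, labels, [0.3, 0.5, 0.7])
chosen = best_cutoff(rows)
```

In practice the sweep would run over a fine grid of cutoffs on held-out predictions, and the selection rule would reflect whichever metric matters most for the classification task.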
Hold That Plane! |
[Report]
An analysis of global flight delay data and prediction model creation to
reduce delays and increase live prediction capabilities
- Synthetic data with forced missing values. Medium data set (over
3,500 observations)
- Multiple Imputation by Chained Equations, Rubin’s rules, recursive
feature elimination, Principal Component Analysis, k-means
clustering
- Forward elimination model creation, cross validation, Receiver
Operating Characteristic (ROC) comparison
- Analysis goals:
- Identifying the variables with greatest impact on overall flight
delay for subsequent mitigation and preventative efforts
- Live flight delay prediction based on available flight data prior to
landing (predicting total delay time before flights depart)
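For readers unfamiliar with Rubin's rules, the pooling step after multiple imputation can be sketched as follows. This is a hedged Python illustration, not the report's implementation: `pool_rubin` is a hypothetical helper, and the per-imputation estimates and variances are made-up numbers.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Tuple


def pool_rubin(estimates: List[float], variances: List[float]) -> Tuple[float, float]:
    """Pool one coefficient across m imputed data sets using Rubin's rules.

    estimates: the coefficient estimated on each imputed data set
    variances: the squared standard error of that coefficient on each data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)        # average of the m estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Toy values for a single regression coefficient from m = 3 imputations.
pooled, total = pool_rubin([2.0, 2.4, 2.2], [0.04, 0.05, 0.03])
pooled_se = sqrt(total)
```

The total variance adds the average sampling variance to an inflated between-imputation variance, so the pooled standard error reflects the uncertainty introduced by the missing values.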
Unsupervised Learning Algorithms and Predictive Modeling
Big Back Behavior |
[Report]
Predicting obesity rankings based on behavioral survey results
- Real world data; medium data set (approx. 2,200 observations)
- CART regression and classification prediction models, and bagging
(bootstrap aggregating) regression and classification prediction
models
- Variable importance analysis, hyperparameter tuning, model pruning,
ROC curve analysis & optimal cutoff identification
- Analysis goals:
- Develop overweight/obesity prediction model utilizing patient
lifestyle and behavioral information, excluding weight
- Develop weight independent prediction models for overweight/obesity
to allow for prediction of those at risk based on behavioral &
lifestyle choices
Can you pay me back? |
[Report]
&
[Presentation]
Loan Default Borrower Data Clustering & Analysis
- Report & supporting presentation
- Synthetic data. Medium data set (approx. 1,000 observations)
- k-means clustering, agglomerative hierarchical data clustering,
principal component analysis, local outlier factor analysis
Blind Breast Biopsies |
[Report]
Predicting Breast Tissue Biopsy Diagnosis In the Event of Incomplete
Biopsy
- Synthetic data. Small data set (approx. 600 observations)
- Logistic predictive modeling, single-layer neural network model
(perceptron) prediction, decision tree algorithms, ROC analysis, and
bagging
Regression Analysis & Model Creation
Can We Predict the Future in Health? |
[Report]
Framingham Heart Study Model Fitting and Cross Validation for BMI
Predictions
- Real world data. Medium data set (approx. 4,200 observations)
- Linear regression model creation, Box-Cox transformation, mean
square error (MSE) cross validation, logistic regression model creation,
receiver operating characteristic (ROC) and area under the curve (AUC)
model comparison
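As a brief illustration of the Box-Cox transformation named above, the following Python sketch applies the standard one-parameter form to a positive response value and inverts it. The function names are hypothetical and the lambda value is arbitrary; the report itself would estimate lambda from the data.

```python
from math import exp, log


def box_cox(y: float, lam: float) -> float:
    """One-parameter Box-Cox transform; defined only for y > 0."""
    if y <= 0:
        raise ValueError("Box-Cox requires a strictly positive value")
    if lam == 0:
        return log(y)              # limiting case as lambda approaches 0
    return (y ** lam - 1) / lam


def inverse_box_cox(z: float, lam: float) -> float:
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return exp(z)
    return (lam * z + 1) ** (1 / lam)


# Arbitrary example: lambda = 0.5 acts like a shifted, scaled square root.
transformed = box_cox(4.0, 0.5)
restored = inverse_box_cox(transformed, 0.5)
```

Predictions made on the transformed scale need to be passed through the inverse before computing an error metric such as MSE in the original units.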
Data Visualization and Exploratory Data Analysis
Exploratory Data Analysis & Variable Relationship
Exploration
I’m Going to Miss my Flight! |
[Report]
An analysis of flight delay times across airports
- Synthetic data with forced missing values. Medium data set (over
3,500 observations)
- Analysis of variables impacting flight delays & solution
implementation
- ggplot2 & plotly interactive visualizations
Exploratory Data Analysis
Presidential Fitness Test |
[Report]
The Impact of Local Education, Poverty, and Unemployment Rates on County
Presidential Election Results
- Four real world data sets utilized. Large data sets (over 72,000
observations)
- Relational data sets and data aggregation
- GeoJSON & Leaflet interactive map, ggplot2
Time Series Data
The Price of Longevity |
[Report]
An Analysis of the Effect of Annual Income on Life Expectancy
- Real world data. Large data set (over 35,000 observations)
- An analysis of variables impacting average life expectancy
- ggplot2 & plotly interactive visualizations
SAS Projects
Linear Regression Analysis
Business Analytics Proposal
Pizzaz.com Speed Dating Compatibility Prediction |
[Proposal]
&
[Presentation]
&
[SAS Code]
Speed Dating Analytics & Online Dating Optimization
- Synthetic data. Small data set (276 observations)
- An analysis of variables impacting overall dater interest and linear
regression prediction model creation
- Proposal for data utilization and business improvement
Shiny/FlexDashboard Projects
Linear Regression Dashboard
Melbourne Housing Market |
[R Shiny FlexDashboard]
- Real world data. Large data set (over 34,000 observations)
- Housing market price regression analysis
Tableau Projects
Geospatial Data
Interactive Mapping
Presidential Election Results Per County Per Year |
[Map]
- Real world data. Large data set (over 72,000 observations)
- An analysis of presidential election data from 2000 through 2020
Public Health Data
Tableau Storypoint & Dashboard
National Health and Nutrition Examination Survey (NHANES) |
[Storypoint]
&
[Dashboard]
- Real world data. Medium data set (approx. 7,900 observations)
- An analysis of NHANES data on smoking, blood pressure, and serum
cholesterol levels
