Project 2 approach

---
title: "Project 2 – Data Transformation Approach"
author: "Joshua Henry"
date: "2026-03-05"
output: html_document
---

Overview

For this project, I will work with three different datasets that are currently in a wide format. Wide datasets usually spread the values of a single variable across many columns, which makes them harder to analyze directly. My goal is to convert these datasets into a tidy format so they can be analyzed more easily in R.

To complete this project, I will first create a raw .csv file for each dataset that keeps the original wide structure exactly as it appears in the source data. This will act as the starting point for my work and will not be modified.
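A minimal sketch of this step, assuming a hypothetical file name and made-up values (the actual datasets are not shown here). The raw wide file is written once and then read back without altering its layout.

```r
# Hypothetical wide dataset; the names and values are illustrative only.
raw_wide <- data.frame(
  region = c("North", "South"),
  `2021` = c(10, 20),
  `2022` = c(15, 25),
  check.names = FALSE
)

# Save the raw file once, keeping the original wide structure.
# "dataset1_raw.csv" is an assumed file name.
raw_path <- file.path(tempdir(), "dataset1_raw.csv")
write.csv(raw_wide, raw_path, row.names = FALSE)

# Read it back. check.names = FALSE keeps headers such as "2021" unchanged
# instead of converting them to syntactic names like "X2021".
wide_data <- read.csv(raw_path, check.names = FALSE, stringsAsFactors = FALSE)
names(wide_data)
```

Keeping the raw file separate from every later object means each transformation can be rerun from the same starting point.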

After importing the datasets into R, I will use the tidyr and dplyr packages to transform the data into a tidy structure. This will likely involve reshaping the datasets from wide format into long format so that each variable has its own column and each observation has its own row. During this process, I will also rename variables to make them more consistent and easier to understand.
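The reshaping described above might look like the following sketch. The column names (`Region`, `2021`, `2022`) and the values are assumptions made for illustration, not taken from the project datasets.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide table: one column per year.
wide_data <- tibble(
  Region = c("North", "South"),
  `2021` = c(10, 20),
  `2022` = c(15, 25)
)

tidy_data <- wide_data %>%
  # Move every column except Region into name/value pairs.
  pivot_longer(cols = -Region, names_to = "year", values_to = "value") %>%
  # Use consistent lower-case variable names.
  rename(region = Region) %>%
  # Column headers arrive as text, so convert the year to an integer.
  mutate(year = as.integer(year))

tidy_data
```

After this step each row holds one region-year observation, and `year` and `value` are ordinary columns that can be filtered, grouped, and plotted.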

Once the data is tidy, I will check for missing or inconsistent values and decide how to handle them. This could include removing incomplete rows, filling values where appropriate, or documenting why the values are missing.
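As a rough sketch of these checks, assuming a small made-up tidy table with one missing value, the missing entries can be counted per column and then either dropped or filled. Which option is appropriate depends on the dataset, so both are shown only as possibilities.

```r
library(dplyr)
library(tidyr)

# Hypothetical tidy data with one missing value.
tidy_data <- tibble(
  region = c("North", "North", "South", "South"),
  year   = c(2021L, 2022L, 2021L, 2022L),
  value  = c(10, NA, 20, 25)
)

# Count missing values in every column.
na_counts <- tidy_data %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Option 1: remove rows where the measured value is missing.
complete_rows <- tidy_data %>%
  drop_na(value)

# Option 2: carry the last observed value forward within each region.
filled <- tidy_data %>%
  group_by(region) %>%
  fill(value, .direction = "down") %>%
  ungroup()
```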

After the transformation process is complete, I will perform the analysis that was originally requested in the Discussion 5A assignment for each dataset. The analysis will only use the tidy datasets created from the transformation process. I will include summary tables, visualizations, and short explanations describing what the results mean.
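The specific analyses come from the Discussion 5A assignment, which is not reproduced here. As a generic illustration only, a summary table built from a tidy dataset could be sketched like this, using made-up values and assumed column names.

```r
library(dplyr)

# Hypothetical tidy data; the real analysis variables will differ.
tidy_data <- tibble(
  region = c("North", "North", "South", "South"),
  year   = c(2021L, 2022L, 2021L, 2022L),
  value  = c(10, 15, 20, 25)
)

# One row per region with the mean value and the number of observations.
summary_table <- tidy_data %>%
  group_by(region) %>%
  summarise(
    mean_value = mean(value),
    n_obs      = n(),
    .groups    = "drop"
  )

summary_table
```

Because the data is already long, the same grouped table can be passed directly to a plotting function for the visualizations.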

Anticipated Data Challenges

One challenge I expect is dealing with datasets that have many columns representing repeated measurements or categories. These types of structures usually require reshaping before analysis can be done.
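When column headers combine a measure and a category (for example `sales_2021`), one possible approach is the `.value` sentinel in `pivot_longer()`. The sketch below assumes hypothetical column names separated by an underscore; real headers may need a different separator or pattern.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide table: each header encodes a measure and a year.
wide_data <- tibble(
  store        = c("A", "B"),
  sales_2021   = c(100, 200),
  sales_2022   = c(110, 210),
  returns_2021 = c(5, 8),
  returns_2022 = c(6, 9)
)

long_data <- wide_data %>%
  pivot_longer(
    cols      = -store,
    # ".value" keeps the first part (sales, returns) as separate columns;
    # the second part of each header becomes the "year" column.
    names_to  = c(".value", "year"),
    names_sep = "_"
  ) %>%
  mutate(year = as.integer(year))

long_data
```

The result has one row per store and year, with `sales` and `returns` as their own columns.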

Another possible issue is inconsistent column names or formatting. Some datasets may include spaces, abbreviations, or unclear labels that will need to be renamed to create a consistent structure.
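A small sketch of one way to standardize names, assuming made-up messy headers. The helper `clean_name()` is hypothetical and written here for illustration; it is not part of dplyr.

```r
library(dplyr)

# Hypothetical messy headers with spaces, punctuation, and mixed case.
messy <- tibble(
  `Store Name`  = "A",
  `Qty.`        = 1,
  ` Avg Price ` = 2.5
)

# Hypothetical helper: lower-case, trim, and use underscores.
clean_name <- function(x) {
  x <- tolower(trimws(x))
  x <- gsub("[^a-z0-9]+", "_", x)  # spaces and punctuation become "_"
  gsub("^_+|_+$", "", x)           # drop leading or trailing "_"
}

clean <- messy %>%
  rename_with(clean_name) %>%
  # Expand an unclear abbreviation by hand.
  rename(quantity = qty)

names(clean)
```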

Missing values may also appear in the data. When this happens, I will decide whether to remove or keep those observations, depending on how they affect the analysis. All decisions will be documented in the project.
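One way to see how missing values affect a result, sketched here with made-up data, is to compute the same summary with and without them and report how many values were missing.

```r
library(dplyr)

# Hypothetical tidy data with one missing value in the North group.
tidy_data <- tibble(
  region = c("North", "North", "South", "South"),
  value  = c(10, NA, 20, 25)
)

na_impact <- tidy_data %>%
  group_by(region) %>%
  summarise(
    n_total   = n(),
    n_missing = sum(is.na(value)),
    mean_keep = mean(value),                # NA if any value is missing
    mean_drop = mean(value, na.rm = TRUE),  # ignores missing values
    .groups   = "drop"
  )

na_impact
```

A table like this can be included in the report to document why observations were removed or kept.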

Finally, some datasets may include values stored as text instead of numbers, which would require converting them into the correct data type before analysis can be performed.
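A minimal sketch of such a conversion, assuming hypothetical text values that contain commas, a currency symbol, and a placeholder for missing data. The patterns to strip would need to match whatever the real data contains.

```r
library(dplyr)

# Hypothetical column of numbers stored as text.
raw <- tibble(
  item   = c("a", "b", "c"),
  amount = c("1,200", "$35.50", "n/a")
)

converted <- raw %>%
  mutate(
    # Remove "$" and "," before converting; text that is still not a
    # number (such as "n/a") becomes NA, and the warning is suppressed.
    amount_num = suppressWarnings(as.numeric(gsub("[$,]", "", amount)))
  )

converted
```

Keeping the original text column next to the converted one makes it easy to check which entries turned into `NA`.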
