Introduction

I plan to use the Wine dataset from the UCI Machine Learning Repository, which contains chemical measurements of wines derived from three different cultivars grown in the same region of Italy. I chose this dataset because it has a clear target variable (the cultivar, coded 1, 2, or 3) and 13 quantitative features, such as alcohol content and color intensity, that are well suited for practicing data transformation and cleaning tasks.

Planned Workflow

The workflow begins with a tabular dataset that contains a clear outcome or target variable along with several additional features; the Wine dataset meets both requirements. I will access the data through a public URL so that the analysis is fully reproducible. After loading the data into R, I will inspect its structure and select a subset of relevant variables to include in the final transformed data frame.
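A minimal R sketch of the loading and selection steps follows. The URL is the long-standing UCI path for `wine.data` and is an assumption here, since the plan does not name a specific link; the function names `load_wine()` and `select_subset()` and the particular columns kept are hypothetical choices for illustration only.

```r
# Assumption: the classic UCI file path below is still served.
wine_url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data"

# Hypothetical helper: read.csv() accepts a URL, a local path, or a connection.
# wine.data has no header row, so columns arrive as V1, V2, ..., V14.
load_wine <- function(file = wine_url) {
  read.csv(file, header = FALSE)
}

# Hypothetical helper: keep the class label (V1) plus an illustrative
# choice of features: V2 = alcohol, V3 = malic acid,
# V11 = color intensity, V14 = proline.
select_subset <- function(df, cols = c("V1", "V2", "V3", "V11", "V14")) {
  df[, cols, drop = FALSE]
}
```

In an interactive session, `wine <- load_wine()` followed by `str(wine)` shows the column types before any subset is chosen; `select_subset(wine)` then returns only the kept columns. Wrapping each step in a function keeps the download separate from the transformation, so the same code can be rerun against a local copy if the URL changes.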

Anticipated Data Challenges

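Possible challenges include missing values, inconsistent data types, and abbreviated or coded variable values that are not immediately interpretable. For example, the raw Wine data file has no header row, and the cultivar is stored as a numeric code rather than a descriptive label. I anticipate needing to recode certain variables and rename columns to make the data clearer and easier to use in later analysis.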
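The renaming, recoding, and missing-value checks could be sketched in base R as below. This assumes a data frame read from `wine.data` with `header = FALSE` (14 columns in the order given by the UCI attribute description); the snake_case column names, the factor labels such as `"cultivar_1"`, and the helper names `clean_wine()` and `count_missing()` are hypothetical choices, not part of the original plan.

```r
# Hypothetical descriptive names, in the UCI column order
# (class label first, then the 13 chemical measurements).
wine_names <- c(
  "cultivar", "alcohol", "malic_acid", "ash", "alcalinity_of_ash",
  "magnesium", "total_phenols", "flavanoids", "nonflavanoid_phenols",
  "proanthocyanins", "color_intensity", "hue", "od280_od315",
  "proline"
)

clean_wine <- function(df) {
  # Guard against a file with an unexpected number of columns
  stopifnot(ncol(df) == length(wine_names))
  # Replace the default V1..V14 names with descriptive names
  names(df) <- wine_names
  # Recode the numeric class codes 1-3 as a factor; the labels are
  # hypothetical because the source only numbers the cultivars
  df$cultivar <- factor(df$cultivar, levels = 1:3,
                        labels = c("cultivar_1", "cultivar_2", "cultivar_3"))
  df
}

# Count NA values per column; the UCI description reports no missing
# values, but confirming this after each download is cheap
count_missing <- function(df) colSums(is.na(df))
```

Keeping the names in a single vector makes the mapping from raw positions to meaningful labels easy to audit, and converting the class to a factor prevents R from treating the cultivar codes as a continuous measurement.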

Data 607 - Week 1 Approach

Kristoff Oliphant

2026-01-29
