This data set contains information on popular baby names in NYC, including birth year, ethnicity, gender, and popularity rank. It is useful for exploring trends in baby names among New Yorkers. The original data set was sourced from Data.gov: https://catalog.data.gov/dataset/popular-baby-names
Approach
First, the data set will be saved in my GitHub repository (DATA607) to ensure reproducibility. Then, the raw data will be loaded into R and reviewed. At this stage, one variable (“Ethnicity”) will be removed, and another will be renamed from “Rank” to “Popularity_Rank” to make its meaning clearer. These steps are intended to produce a more focused data frame.
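The transformation steps described above could be sketched roughly as follows in base R. This is only an illustration: the small `raw` data frame stands in for the CSV that would be read from the GitHub repository, and its column names (`Year_of_Birth`, `First_Name`) and all of its values are made-up assumptions rather than the real data.

```r
# Toy stand-in for the raw file. Column names and values are assumptions,
# not the actual contents of the Data.gov data set.
raw <- data.frame(
  Year_of_Birth = c(2011, 2011, 2012, 2012),
  Gender        = c("FEMALE", "MALE", "FEMALE", "MALE"),
  Ethnicity     = c("GROUP A", "GROUP A", "GROUP B", "GROUP C"),
  First_Name    = c("Isabella", "Jayden", "Chloe", "Joseph"),
  Count         = c(30, 40, 25, 35),
  Rank          = c(1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# With the real file, this step would instead be a call such as
# read.csv() pointed at the raw CSV link in the repository.

# Remove the Ethnicity column by keeping every other column
clean <- raw[, setdiff(names(raw), "Ethnicity")]

# Rename Rank to Popularity_Rank
names(clean)[names(clean) == "Rank"] <- "Popularity_Rank"

# Review the structure of the result
str(clean)
```

Using `setdiff()` on the column names keeps the removal step working even if the column order in the file changes.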
As my first assignment using the R programming language and the Quarto format, this project provided an opportunity to practice loading a data set and performing basic data transformations in a reproducible environment. Through this process, a subset of relevant variables was selected and one column was renamed to improve clarity. This assignment established a foundation for working with data in R. As a next step, the cleaned data set could be extended by filtering to a specific year and summarizing name popularity using the Count and Popularity_Rank variables.
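The suggested next step might look something like the sketch below. It assumes a cleaned data frame with the columns `Year_of_Birth`, `Gender`, `First_Name`, `Count`, and `Popularity_Rank`; the first three names and all of the values are hypothetical and used only for illustration.

```r
# Toy cleaned data frame. Values and most column names are assumptions.
clean <- data.frame(
  Year_of_Birth   = c(2011, 2011, 2011, 2012),
  Gender          = c("FEMALE", "FEMALE", "MALE", "FEMALE"),
  First_Name      = c("Isabella", "Sophia", "Jayden", "Chloe"),
  Count           = c(30, 20, 40, 25),
  Popularity_Rank = c(1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Filter to a single birth year
one_year <- clean[clean$Year_of_Birth == 2011, ]

# Total Count per name within that year, sorted from most to least common
totals <- aggregate(Count ~ First_Name, data = one_year, FUN = sum)
totals <- totals[order(-totals$Count), ]

# Names holding the top Popularity_Rank in that year
top_ranked <- one_year[one_year$Popularity_Rank == 1,
                       c("Gender", "First_Name", "Count")]

print(totals)
print(top_ranked)
```

Summing `Count` per name matters when the same name appears in several rows for one year, which would be the case if the rows were originally split by group.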