Assignment1 Data607

Author

Madina Kudanova

Introduction

This data set contains information on popular baby names in NYC, including birth year, ethnicity, gender and popularity rank. The data set is useful for exploring trends in baby names among New Yorkers. The original data set was sourced from Data.gov: https://catalog.data.gov/dataset/popular-baby-names

Approach

First, the data set will be saved to my GitHub repository (DATA607) to ensure reproducibility. Then, the raw data will be loaded into R and reviewed. One variable (“Ethnicity”) will be removed, and the remaining columns will be renamed to improve clarity; for example, “Rank” becomes “Popularity_Rank” and “Child.s.First.Name” becomes “First_Name”. These steps are intended to produce a more focused data frame.

Code Base

1) Loading and displaying data set

URL <- "https://raw.githubusercontent.com/MKudanova/Data607/refs/heads/main/Assignment1/Popular_Baby_Names.csv"

babies_raw <- read.csv(URL)
head(babies_raw)

  Year.of.Birth Gender Ethnicity Child.s.First.Name Count Rank
1          2011 FEMALE  HISPANIC          GERALDINE    13   75
2          2011 FEMALE  HISPANIC                GIA    21   67
3          2011 FEMALE  HISPANIC             GIANNA    49   42
4          2011 FEMALE  HISPANIC            GISELLE    38   51
5          2011 FEMALE  HISPANIC              GRACE    36   53
6          2011 FEMALE  HISPANIC          GUADALUPE    26   62

print(head(babies_raw))

  Year.of.Birth Gender Ethnicity Child.s.First.Name Count Rank
1          2011 FEMALE  HISPANIC          GERALDINE    13   75
2          2011 FEMALE  HISPANIC                GIA    21   67
3          2011 FEMALE  HISPANIC             GIANNA    49   42
4          2011 FEMALE  HISPANIC            GISELLE    38   51
5          2011 FEMALE  HISPANIC              GRACE    36   53
6          2011 FEMALE  HISPANIC          GUADALUPE    26   62
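Beyond printing the first rows, the review step could also check column types and missing values. The following is a minimal sketch, not part of the original analysis. To stay self-contained it rebuilds the six rows shown above instead of downloading the file; `na_counts` and `col_types` are hypothetical names introduced for illustration.

```r
# Six rows copied from the head(babies_raw) output above.
# In practice, the full data frame returned by read.csv(URL) would be used.
babies_raw <- data.frame(
  Year.of.Birth      = rep(2011L, 6),
  Gender             = rep("FEMALE", 6),
  Ethnicity          = rep("HISPANIC", 6),
  Child.s.First.Name = c("GERALDINE", "GIA", "GIANNA",
                         "GISELLE", "GRACE", "GUADALUPE"),
  Count              = c(13L, 21L, 49L, 38L, 36L, 26L),
  Rank               = c(75L, 67L, 42L, 51L, 53L, 62L)
)

# Compact overview: number of rows, column names, types, first values
str(babies_raw)

# Class of each column, useful for spotting numbers read in as text
col_types <- sapply(babies_raw, class)
print(col_types)

# Number of missing values per column
na_counts <- colSums(is.na(babies_raw))
print(na_counts)

# Distinct category labels, useful for spotting inconsistent spellings
print(unique(babies_raw$Gender))
print(unique(babies_raw$Ethnicity))
```

Running the same checks on the full data set would show whether any columns need cleaning before the transformation step.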

2) Transforming data set

babies_new <- babies_raw[, c("Year.of.Birth", "Gender", "Child.s.First.Name", "Count", "Rank")]
names(babies_new) <- c("Year_of_Birth", "Gender", "First_Name", "Count", "Popularity_Rank")
head(babies_new)

  Year_of_Birth Gender First_Name Count Popularity_Rank
1          2011 FEMALE  GERALDINE    13              75
2          2011 FEMALE        GIA    21              67
3          2011 FEMALE     GIANNA    49              42
4          2011 FEMALE    GISELLE    38              51
5          2011 FEMALE      GRACE    36              53
6          2011 FEMALE  GUADALUPE    26              62
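The renaming above assigns the new names by position, so it relies on the column order of the subset. As a hedged alternative, the sketch below renames columns by matching on their old names. It assumes the raw column names shown in the output above; `rename_map`, `keep` and `idx` are hypothetical names, and the six sample rows are copied from the displayed output so the example runs on its own.

```r
# Six rows copied from the head(babies_raw) output above.
babies_raw <- data.frame(
  Year.of.Birth      = rep(2011L, 6),
  Gender             = rep("FEMALE", 6),
  Ethnicity          = rep("HISPANIC", 6),
  Child.s.First.Name = c("GERALDINE", "GIA", "GIANNA",
                         "GISELLE", "GRACE", "GUADALUPE"),
  Count              = c(13L, 21L, 49L, 38L, 36L, 26L),
  Rank               = c(75L, 67L, 42L, 51L, 53L, 62L)
)

# Drop Ethnicity by name and keep every other column
keep <- setdiff(names(babies_raw), "Ethnicity")
babies_new <- babies_raw[, keep]

# Old name -> new name; columns not listed here keep their names
rename_map <- c(
  Year.of.Birth      = "Year_of_Birth",
  Child.s.First.Name = "First_Name",
  Rank               = "Popularity_Rank"
)

# Locate each old name in the data frame and replace it with the new one
idx <- match(names(rename_map), names(babies_new))
names(babies_new)[idx] <- unname(rename_map)

head(babies_new)
```

Both approaches give the same column names here; the name-based version only differs in that it does not depend on the order in which the columns were selected.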

Conclusion

As my first assignment using the R programming language and the Quarto format, this project provided an opportunity to practice loading a data set and performing basic data transformations in a reproducible environment. Through this process, a subset of relevant variables was selected and the columns were renamed to improve clarity. This assignment established a foundation for working with data in R. As a next step, this cleaned data set could be extended by filtering to a specific year and summarizing name popularity using the Count and Popularity_Rank variables.
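The next step described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a finished analysis: the six rows are copied from the head(babies_new) output so the example runs on its own, `target_year` is a hypothetical choice, and summing Count per name assumes that a name may appear in more than one row once Ethnicity has been dropped.

```r
# Six rows copied from the head(babies_new) output in step 2.
# In practice, the full babies_new data frame would be used.
babies_new <- data.frame(
  Year_of_Birth   = rep(2011L, 6),
  Gender          = rep("FEMALE", 6),
  First_Name      = c("GERALDINE", "GIA", "GIANNA",
                      "GISELLE", "GRACE", "GUADALUPE"),
  Count           = c(13L, 21L, 49L, 38L, 36L, 26L),
  Popularity_Rank = c(75L, 67L, 42L, 51L, 53L, 62L)
)

# Hypothetical year of interest
target_year <- 2011

# Keep only the rows for that year
babies_year <- subset(babies_new, Year_of_Birth == target_year)

# Total Count per name and gender within the selected year
name_totals <- aggregate(Count ~ First_Name + Gender,
                         data = babies_year, FUN = sum)

# Sort from most to least common
name_totals <- name_totals[order(-name_totals$Count), ]

# Best (lowest) Popularity_Rank recorded for each name
best_rank <- aggregate(Popularity_Rank ~ First_Name + Gender,
                       data = babies_year, FUN = min)

head(name_totals, 3)
```

On the six sample rows every name appears once, so the totals simply repeat the Count column; on the full data set the aggregation would combine rows that share a name, gender and year.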