For this project, I plan to work with a housing-related dataset from NYC Open Data that contains information on housing programs, population counts, and rental costs. I will begin by identifying a dataset that is relevant to understanding housing conditions in New York City and storing it in a public GitHub repository to ensure reproducibility.
My approach will focus on loading the data directly from a public URL into R and then selecting a meaningful subset of variables that are most useful for downstream analysis. I will clean and rename columns to make them easier to interpret and consistent with standard naming conventions. This transformed dataset will serve as a clean foundation for future exploratory analysis.
One anticipated challenge is that NYC Open Data files often contain many columns, inconsistent naming conventions, or missing values. I may need to inspect the data structure carefully and decide which variables are relevant and which can be safely removed. Another challenge may be ensuring that all data types are correctly interpreted when importing the CSV file.
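To make the missing-value check concrete, here is a minimal base-R sketch that counts NA values per column and flags columns with no usable data. It runs on a small hypothetical data frame, example_df, built from a few values in the NYCHA summary file. In practice the same calls would be applied to the full dataset.

```r
# Hypothetical stand-in for the real data (values taken from the NYCHA file)
example_df <- data.frame(
  PROGRAM = c("FEDERAL", "FORMER NEW YORK STATE", "FORMER NEW YORK CITY"),
  Total.Families = c("131,286", "7,668", "4,020"),
  Residents.62.Plus = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

# Number of NA values in each column
na_counts <- colSums(is.na(example_df))
print(na_counts)

# Columns with no non-missing values are candidates for removal
all_na_cols <- names(example_df)[colSums(!is.na(example_df)) == 0]
print(all_na_cols)

# Drop them (drop = FALSE keeps a data frame even if only one column remains)
example_clean <- example_df[, setdiff(names(example_df), all_na_cols), drop = FALSE]
str(example_clean)
```

By default, read.csv() only treats the string "NA" as missing in character columns, so blank cells there are kept as empty strings. Passing na.strings = c("", "NA") makes blanks count as missing too.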
This project uses housing-related data obtained from NYC Open Data, specifically the NYCHA Resident Data Book Summary. The dataset includes information on housing programs, population counts, and rental costs, and is used here to demonstrate data loading and transformation in R. Source: https://opendata.cityofnewyork.us/
The original NYC Open Data file contains 43 demographic and housing-related variables across 26 rows. To simplify downstream analysis, five columns were selected: program type, household category (total, public housing, or Section 8 transition households), total families, total population, and average gross rent. Column names were also shortened and converted to an underscore-separated form to improve readability and consistency.
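# Load the CSV directly from the project's public GitHub repository.
# stringsAsFactors = FALSE keeps text columns as character vectors
# (this has been the default since R 4.0.0).
dataset <- read.csv(
  "https://raw.githubusercontent.com/japhet125/r-workflow-assignment/main/NYCHA_Resident_Data_Book_Summary_20260125.csv",
  stringsAsFactors = FALSE
)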
head(dataset)
## PROGRAM STATECITY_SECTION8_FLAG Total.Families
## 1 FEDERAL TOTAL HOUSEHOLDS 131,286
## 2 FORMER NEW YORK STATE TOTAL HOUSEHOLDS 7,668
## 3 FORMER NEW YORK STATE PUBLIC HOUSING HOUSEHOLDS 6,319
## 4 FORMER NEW YORK STATE SECTION 8 TRANSITION HOUSEHOLDS 1,349
## 5 FORMER NEW YORK CITY TOTAL HOUSEHOLDS 4,020
## 6 FORMER NEW YORK CITY PUBLIC HOUSING HOUSEHOLDS 3,305
## Total.Female.Headed.Families Total.Male.Headed.Families Total.Population
## 1 101,586 29,610 272,972
## 2 5,754 1,908 17,237
## 3 4,730 1,584 13,928
## 4 1,024 324 3,309
## 5 3,224 790 7,997
## 6 2,653 646 6,448
## Average.Family.Size Total.Minors.Under.18 Average.Minors.per.Family
## 1 2.1 62,730 0.5
## 2 2.2 4,012 0.5
## 3 2.2 3,099 0.5
## 4 2.5 913 0.7
## 5 2.0 1,964 0.5
## 6 2.0 1,555 0.5
## Total.Minors.as.Percent.of.Population All.Average.Total.Gross.Income
## 1 22.98% $26,105
## 2 23.28% $26,667
## 3 22.25% $27,295
## 4 27.59% $23,650
## 5 24.56% $25,913
## 6 24.12% $26,459
## All.Average.Gross.Rent Total.HOH.62.Years.and.Over
## 1 $622 59,366
## 2 $614 3,399
## 3 $624 3,071
## 4 $569 328
## 5 $617 1,529
## 6 $627 1,376
## Total.HOH.62.Years.and.Over.as.Percent.of.Families
## 1 45.22%
## 2 44.33%
## 3 48.6%
## 4 24.31%
## 5 38.03%
## 6 41.63%
## Total.Female.Headed.HOH.62.Years.and.Over
## 1 43,526
## 2 2,358
## 3 2,201
## 4 157
## 5 1,152
## 6 1,075
## Total.Male.Headed.HOH.62.Years.and.Over Total.Elderly.Single.Person.Families
## 1 15,831 35,343
## 2 1,039 1,848
## 3 869 1,616
## 4 170 232
## 5 376 996
## 6 300 892
## Total.Elderly.Population Total.62.Years.and.Over.as.Percent.of.Population
## 1 70,229 25.73%
## 2 4,078 23.66%
## 3 3,723 26.73%
## 4 355 10.73%
## 5 1,735 21.7%
## 6 1,565 24.27%
## Total.Families.on.Welfare Total.Families.on.Welfare.and.HOH.Elderly
## 1 20,174 2,202
## 2 1,178 129
## 3 860 108
## 4 318 21
## 5 655 50
## 6 523 41
## Total.Families.on.Full.Welfare
## 1 11,346
## 2 627
## 3 469
## 4 158
## 5 373
## 6 300
## Total.Families.on.Welfare.as.Percent.of.Families
## 1 15.37%
## 2 15.36%
## 3 13.61%
## 4 23.57%
## 5 16.29%
## 6 15.82%
## Total.Single.Parent.Grandparent.Families.with.Minors
## 1 26,428
## 2 1,367
## 3 1,126
## 4 241
## 5 939
## 6 775
## Total.Female.Headed.Single.Parent.Grandparent.with.Minors
## 1 25,086
## 2 1,294
## 3 1,063
## 4 231
## 5 894
## 6 736
## Total.Male.Headed.Single.Parent.Grandparent.with.Minors
## 1 1,320
## 2 73
## 3 63
## 4 10
## 5 43
## 6 37
## Total.Single.Parent.Grandparent.Families.on.Welfare
## 1 10,001
## 2 508
## 3 419
## 4 89
## 5 351
## 6 310
## Total.Single.Parent.Grandparent.with.Minors.as...of.Families
## 1 20.13%
## 2 17.83%
## 3 17.82%
## 4 17.87%
## 5 23.36%
## 6 23.45%
## Total.Families...1.or.More.Employed
## 1 5,053,500%
## 2 285,300%
## 3 239,400%
## 4 45,900%
## 5 149,700%
## 6 123,200%
## Total.Families...1.or.More.Employed.as.Percent.of.Families
## 1 38.49%
## 2 37.21%
## 3 37.89%
## 4 34.03%
## 5 37.24%
## 6 37.28%
## Total.Families...2nd.Adult.Employed
## 1 7,008
## 2 482
## 3 402
## 4 80
## 5 140
## 6 112
## All.Families.Average.Years.in.Public.Housing Residents.Under.4
## 1 27.1 6,494
## 2 25.8 338
## 3 28.7 281
## 4 11.7 57
## 5 24.4 211
## 6 27.0 187
## Residents.4.to.5 Residents.6.to.9 Residents.10.to.13 Residents.14.to.17
## 1 5,492 14,363 16,880 19,501
## 2 358 890 1,075 1,351
## 3 293 709 825 991
## 4 65 181 250 360
## 5 164 459 493 637
## 6 139 386 389 454
## Residents.18.to.20 Residents.21.to.49 Residents.50.to.61 Residents.62.Plus
## 1 14,427 89,029 36,556 NA
## 2 1,057 5,828 2,262 NA
## 3 723 4,536 1,847 NA
## 4 334 1,292 415 NA
## 5 428 2,807 1,063 NA
## 6 295 2,174 859 NA
## Total.Fixed.Income.Families
## 1 60,097
## 2 3,413
## 3 2,888
## 4 525
## 5 1,777
## 6 1,503
## Total.Fixed.Income.Families.as.Percent.of.Families
## 1 0.4578
## 2 0.4451
## 3 0.4570
## 4 0.3892
## 5 0.4420
## 6 0.4548
str(dataset)
## 'data.frame': 26 obs. of 43 variables:
## $ PROGRAM : chr "FEDERAL" "FORMER NEW YORK STATE" "FORMER NEW YORK STATE" "FORMER NEW YORK STATE" ...
## $ STATECITY_SECTION8_FLAG : chr "TOTAL HOUSEHOLDS" "TOTAL HOUSEHOLDS" "PUBLIC HOUSING HOUSEHOLDS" "SECTION 8 TRANSITION HOUSEHOLDS" ...
## $ Total.Families : chr "131,286" "7,668" "6,319" "1,349" ...
## $ Total.Female.Headed.Families : chr "101,586" "5,754" "4,730" "1,024" ...
## $ Total.Male.Headed.Families : chr "29,610" "1,908" "1,584" "324" ...
## $ Total.Population : chr "272,972" "17,237" "13,928" "3,309" ...
## $ Average.Family.Size : num 2.1 2.2 2.2 2.5 2 2 2.2 2.2 2.1 2.4 ...
## $ Total.Minors.Under.18 : chr "62,730" "4,012" "3,099" "913" ...
## $ Average.Minors.per.Family : num 0.5 0.5 0.5 0.7 0.5 0.5 0.6 0.5 0.5 0.6 ...
## $ Total.Minors.as.Percent.of.Population : chr "22.98%" "23.28%" "22.25%" "27.59%" ...
## $ All.Average.Total.Gross.Income : chr "$26,105" "$26,667" "$27,295" "$23,650" ...
## $ All.Average.Gross.Rent : chr "$622" "$614" "$624" "$569" ...
## $ Total.HOH.62.Years.and.Over : chr "59,366" "3,399" "3,071" "328" ...
## $ Total.HOH.62.Years.and.Over.as.Percent.of.Families : chr "45.22%" "44.33%" "48.6%" "24.31%" ...
## $ Total.Female.Headed.HOH.62.Years.and.Over : chr "43,526" "2,358" "2,201" "157" ...
## $ Total.Male.Headed.HOH.62.Years.and.Over : chr "15,831" "1,039" "869" "170" ...
## $ Total.Elderly.Single.Person.Families : chr "35,343" "1,848" "1,616" "232" ...
## $ Total.Elderly.Population : chr "70,229" "4,078" "3,723" "355" ...
## $ Total.62.Years.and.Over.as.Percent.of.Population : chr "25.73%" "23.66%" "26.73%" "10.73%" ...
## $ Total.Families.on.Welfare : chr "20,174" "1,178" "860" "318" ...
## $ Total.Families.on.Welfare.and.HOH.Elderly : chr "2,202" "129" "108" "21" ...
## $ Total.Families.on.Full.Welfare : chr "11,346" "627" "469" "158" ...
## $ Total.Families.on.Welfare.as.Percent.of.Families : chr "15.37%" "15.36%" "13.61%" "23.57%" ...
## $ Total.Single.Parent.Grandparent.Families.with.Minors : chr "26,428" "1,367" "1,126" "241" ...
## $ Total.Female.Headed.Single.Parent.Grandparent.with.Minors : chr "25,086" "1,294" "1,063" "231" ...
## $ Total.Male.Headed.Single.Parent.Grandparent.with.Minors : chr "1,320" "73" "63" "10" ...
## $ Total.Single.Parent.Grandparent.Families.on.Welfare : chr "10,001" "508" "419" "89" ...
## $ Total.Single.Parent.Grandparent.with.Minors.as...of.Families: chr "20.13%" "17.83%" "17.82%" "17.87%" ...
## $ Total.Families...1.or.More.Employed : chr "5,053,500%" "285,300%" "239,400%" "45,900%" ...
## $ Total.Families...1.or.More.Employed.as.Percent.of.Families : chr "38.49%" "37.21%" "37.89%" "34.03%" ...
## $ Total.Families...2nd.Adult.Employed : chr "7,008" "482" "402" "80" ...
## $ All.Families.Average.Years.in.Public.Housing : num 27.1 25.8 28.7 11.7 24.4 27 12.1 25.3 28.2 11.9 ...
## $ Residents.Under.4 : chr "6,494" "338" "281" "57" ...
## $ Residents.4.to.5 : chr "5,492" "358" "293" "65" ...
## $ Residents.6.to.9 : chr "14,363" "890" "709" "181" ...
## $ Residents.10.to.13 : chr "16,880" "1,075" "825" "250" ...
## $ Residents.14.to.17 : chr "19,501" "1,351" "991" "360" ...
## $ Residents.18.to.20 : chr "14,427" "1,057" "723" "334" ...
## $ Residents.21.to.49 : chr "89,029" "5,828" "4,536" "1,292" ...
## $ Residents.50.to.61 : chr "36,556" "2,262" "1,847" "415" ...
## $ Residents.62.Plus : logi NA NA NA NA NA NA ...
## $ Total.Fixed.Income.Families : chr "60,097" "3,413" "2,888" "525" ...
## $ Total.Fixed.Income.Families.as.Percent.of.Families : num 0.458 0.445 0.457 0.389 0.442 ...
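The str() output shows that nearly every count, rent, and percentage column was read as character (chr), because the values contain thousands separators, dollar signs, or percent signs. Below is a minimal base-R sketch for converting such strings. It assumes those three symbols are the only non-numeric characters present, and the helper name to_number is hypothetical. Two quirks visible above deserve a check before the results are used. First, Total.Fixed.Income.Families.as.Percent.of.Families is already a 0-1 fraction, while the other percentage columns are 0-100 strings. Second, Total.Families...1.or.More.Employed carries a percent sign on values such as "5,053,500%", which appear to be family counts multiplied by 100 rather than true percentages.

```r
# Hypothetical helper: remove "$", "," and "%" and convert to numeric.
# Strings that are still not numbers afterwards become NA (with a warning).
to_number <- function(x) {
  as.numeric(gsub("[$,%]", "", x))
}

# Values copied from the str() output above
families <- c("131,286", "7,668", "6,319", "1,349")
rent     <- c("$622", "$614", "$624", "$569")
pct      <- c("45.22%", "44.33%", "48.6%", "24.31%")

print(to_number(families))  # 131286 7668 6319 1349
print(to_number(rent))      # 622 614 624 569

# "45.22%" parses to 45.22; divide by 100 to put it on the same 0-1 scale
# as Total.Fixed.Income.Families.as.Percent.of.Families
print(to_number(pct) / 100)

# Converting several columns of a data frame in one step
example_df <- data.frame(
  Total.Families = families,
  All.Average.Gross.Rent = rent,
  stringsAsFactors = FALSE
)
num_cols <- c("Total.Families", "All.Average.Gross.Rent")
example_df[num_cols] <- lapply(example_df[num_cols], to_number)
str(example_df)
```

readr's parse_number() is a packaged alternative that drops non-numeric characters around the number and ignores grouping marks.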
colnames(dataset)
## [1] "PROGRAM"
## [2] "STATECITY_SECTION8_FLAG"
## [3] "Total.Families"
## [4] "Total.Female.Headed.Families"
## [5] "Total.Male.Headed.Families"
## [6] "Total.Population"
## [7] "Average.Family.Size"
## [8] "Total.Minors.Under.18"
## [9] "Average.Minors.per.Family"
## [10] "Total.Minors.as.Percent.of.Population"
## [11] "All.Average.Total.Gross.Income"
## [12] "All.Average.Gross.Rent"
## [13] "Total.HOH.62.Years.and.Over"
## [14] "Total.HOH.62.Years.and.Over.as.Percent.of.Families"
## [15] "Total.Female.Headed.HOH.62.Years.and.Over"
## [16] "Total.Male.Headed.HOH.62.Years.and.Over"
## [17] "Total.Elderly.Single.Person.Families"
## [18] "Total.Elderly.Population"
## [19] "Total.62.Years.and.Over.as.Percent.of.Population"
## [20] "Total.Families.on.Welfare"
## [21] "Total.Families.on.Welfare.and.HOH.Elderly"
## [22] "Total.Families.on.Full.Welfare"
## [23] "Total.Families.on.Welfare.as.Percent.of.Families"
## [24] "Total.Single.Parent.Grandparent.Families.with.Minors"
## [25] "Total.Female.Headed.Single.Parent.Grandparent.with.Minors"
## [26] "Total.Male.Headed.Single.Parent.Grandparent.with.Minors"
## [27] "Total.Single.Parent.Grandparent.Families.on.Welfare"
## [28] "Total.Single.Parent.Grandparent.with.Minors.as...of.Families"
## [29] "Total.Families...1.or.More.Employed"
## [30] "Total.Families...1.or.More.Employed.as.Percent.of.Families"
## [31] "Total.Families...2nd.Adult.Employed"
## [32] "All.Families.Average.Years.in.Public.Housing"
## [33] "Residents.Under.4"
## [34] "Residents.4.to.5"
## [35] "Residents.6.to.9"
## [36] "Residents.10.to.13"
## [37] "Residents.14.to.17"
## [38] "Residents.18.to.20"
## [39] "Residents.21.to.49"
## [40] "Residents.50.to.61"
## [41] "Residents.62.Plus"
## [42] "Total.Fixed.Income.Families"
## [43] "Total.Fixed.Income.Families.as.Percent.of.Families"
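The dotted column names above come from read.csv(), which by default (check.names = TRUE) passes every header through make.names(). That function replaces any character that is not a letter, digit, "." or "_" with ".". This is most likely why names such as Total.Single.Parent.Grandparent.with.Minors.as...of.Families contain runs of dots: the original header probably read "as % of". The sketch below uses hypothetical header strings and a tiny inline CSV to show the default behaviour, the raw headers returned by check.names = FALSE, and one possible clean-up into lower snake_case.

```r
# make.names() is applied to headers when check.names = TRUE (the default).
# These header strings are hypothetical examples.
raw_headers <- c("Total Families", "Average Gross Rent ($)", "Minors as % of Population")
print(make.names(raw_headers))
# "Total.Families" "Average.Gross.Rent...." "Minors.as...of.Population"

# Tiny inline CSV: hypothetical header, values taken from this report
csv_text <- "Program,Total Families,Average Gross Rent ($)\nFEDERAL,\"131,286\",$622"

default_names <- names(read.csv(text = csv_text))
raw_names     <- names(read.csv(text = csv_text, check.names = FALSE))

# One possible clean-up: collapse runs of dots to "_", lowercase,
# then trim leading/trailing underscores
clean_names <- gsub("^_|_$", "", tolower(gsub("\\.+", "_", default_names)))

print(default_names)
print(raw_names)
print(clean_names)
```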
# Select the five columns used for downstream analysis
subset_dataset <- dataset[, c(
  "PROGRAM",
  "STATECITY_SECTION8_FLAG",
  "Total.Families",
  "Total.Population",
  "All.Average.Gross.Rent"
)]
head(subset_dataset)
##                 PROGRAM         STATECITY_SECTION8_FLAG Total.Families
## 1               FEDERAL                TOTAL HOUSEHOLDS        131,286
## 2 FORMER NEW YORK STATE                TOTAL HOUSEHOLDS          7,668
## 3 FORMER NEW YORK STATE       PUBLIC HOUSING HOUSEHOLDS          6,319
## 4 FORMER NEW YORK STATE SECTION 8 TRANSITION HOUSEHOLDS          1,349
## 5  FORMER NEW YORK CITY                TOTAL HOUSEHOLDS          4,020
## 6  FORMER NEW YORK CITY       PUBLIC HOUSING HOUSEHOLDS          3,305
##   Total.Population All.Average.Gross.Rent
## 1          272,972                   $622
## 2           17,237                   $614
## 3           13,928                   $624
## 4            3,309                   $569
## 5            7,997                   $617
## 6            6,448                   $627
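Selecting columns by name with dataset[, c(...)] stops with R's generic "undefined columns selected" error if any name is misspelled or missing, for example if a future version of the file renames a column. The small guard sketched below reports exactly which names are missing instead. It uses a hypothetical helper, select_columns(), and a toy data frame built from values in this report.

```r
# Hypothetical helper: fail with a readable message if any column is absent
select_columns <- function(df, cols) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  df[, cols, drop = FALSE]
}

example_df <- data.frame(
  PROGRAM = c("FEDERAL", "FORMER NEW YORK STATE"),
  Total.Families = c("131,286", "7,668"),
  All.Average.Gross.Rent = c("$622", "$614"),
  stringsAsFactors = FALSE
)

kept <- select_columns(example_df, c("PROGRAM", "All.Average.Gross.Rent"))
str(kept)

# A typo in a column name is caught and its message returned as a string
result <- tryCatch(
  select_columns(example_df, c("PROGRAM", "Total.Familes")),
  error = function(e) conditionMessage(e)
)
print(result)  # "Missing column(s): Total.Familes"
```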
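# Rename the selected columns to shorter, underscore-separated names.
# Note: CITY_PROGRAM holds federal as well as former state/city program labels,
# and SECTION_8 holds the household category (total, public housing, or
# Section 8 transition), not a yes/no Section 8 flag.
colnames(subset_dataset) <- c(
  "CITY_PROGRAM",
  "SECTION_8",
  "Total_Families",
  "Total_Population",
  "All_Average_Gross_Rent"
)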
head(subset_dataset)
##            CITY_PROGRAM                       SECTION_8 Total_Families
## 1               FEDERAL                TOTAL HOUSEHOLDS        131,286
## 2 FORMER NEW YORK STATE                TOTAL HOUSEHOLDS          7,668
## 3 FORMER NEW YORK STATE       PUBLIC HOUSING HOUSEHOLDS          6,319
## 4 FORMER NEW YORK STATE SECTION 8 TRANSITION HOUSEHOLDS          1,349
## 5  FORMER NEW YORK CITY                TOTAL HOUSEHOLDS          4,020
## 6  FORMER NEW YORK CITY       PUBLIC HOUSING HOUSEHOLDS          3,305
##   Total_Population All_Average_Gross_Rent
## 1          272,972                   $622
## 2           17,237                   $614
## 3           13,928                   $624
## 4            3,309                   $569
## 5            7,997                   $617
## 6            6,448                   $627
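Renaming by position with colnames(x) <- c(...) silently mislabels columns if their order ever changes. A minimal alternative is to rename by name with a lookup vector that maps old names to new ones. In the sketch below, rename_lookup and rename_columns() are hypothetical names, and the new names match the ones used above.

```r
# Hypothetical lookup: original column name -> new column name
rename_lookup <- c(
  PROGRAM                 = "CITY_PROGRAM",
  STATECITY_SECTION8_FLAG = "SECTION_8",
  Total.Families          = "Total_Families",
  Total.Population        = "Total_Population",
  All.Average.Gross.Rent  = "All_Average_Gross_Rent"
)

# Rename only the columns that appear in the lookup; leave the rest alone
rename_columns <- function(df, lookup) {
  hits <- names(df) %in% names(lookup)
  names(df)[hits] <- unname(lookup[names(df)[hits]])
  df
}

# Columns deliberately in a different order than the lookup
example_df <- data.frame(
  All.Average.Gross.Rent = c("$622", "$614"),
  PROGRAM = c("FEDERAL", "FORMER NEW YORK STATE"),
  stringsAsFactors = FALSE
)

renamed <- rename_columns(example_df, rename_lookup)
print(names(renamed))  # "All_Average_Gross_Rent" "CITY_PROGRAM"
```

dplyr offers the same name-based approach through rename(), which takes new_name = old_name pairs.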
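This analysis prepared a reproducible, five-column subset of NYC housing data for future exploration. The counts, rents, and percentages are still stored as character strings (for example "131,286" and "$622"), so converting them to numeric values is a necessary next step before any calculation. Future work could then examine relationships between housing programs, population size, and rental costs, or incorporate additional years of data for trend analysis.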
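One caution for that future work: rows labeled TOTAL HOUSEHOLDS appear to summarize the public housing and Section 8 transition rows listed beneath them. Summing or averaging across all rows would therefore double-count. The sketch below checks this for the FORMER NEW YORK STATE rows shown in the head() output, with values already converted to numbers. Weighting the component rents by Total_Families is an assumption on my part, and because the published rents are rounded, the weighted mean can only be expected to match the total approximately.

```r
# FORMER NEW YORK STATE rows from the head() output, converted to numbers
fnys <- data.frame(
  CITY_PROGRAM = "FORMER NEW YORK STATE",
  SECTION_8 = c("TOTAL HOUSEHOLDS", "PUBLIC HOUSING HOUSEHOLDS",
                "SECTION 8 TRANSITION HOUSEHOLDS"),
  Total_Families = c(7668, 6319, 1349),
  Total_Population = c(17237, 13928, 3309),
  All_Average_Gross_Rent = c(614, 624, 569),
  stringsAsFactors = FALSE
)

total <- fnys[fnys$SECTION_8 == "TOTAL HOUSEHOLDS", ]
parts <- fnys[fnys$SECTION_8 != "TOTAL HOUSEHOLDS", ]

# Do the component rows add up to the TOTAL row?
families_match   <- sum(parts$Total_Families) == total$Total_Families
population_match <- sum(parts$Total_Population) == total$Total_Population

# Family-weighted mean of the component rents, compared with the TOTAL row
weighted_rent <- weighted.mean(parts$All_Average_Gross_Rent, parts$Total_Families)

print(c(families_match, population_match))  # TRUE TRUE
print(round(weighted_rent))                 # 614
```

If this pattern holds for every program, filtering to either the TOTAL HOUSEHOLDS rows or the component rows (never both) keeps comparisons across programs free of double counting.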