Project 2

First Data Set

This data set records a company's shipping expenses in wide format. After the order date, the columns are grouped in threes by shipping method, and most of the cells are empty.

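library(reshape2)  # melt()
library(tidyr)     # unite()

# skip = 2 skips the two lines above the header row
shipment <- read.csv("https://raw.githubusercontent.com/yli1048/yli1048/refs/heads/607/Shipment.csv", header = TRUE, skip = 2)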
head(shipment)

##   Order.Date  X X.1 X.2 X.3 X.4 X.5    X.6    X.7 X.8    X.9   X.10   X.11
## 1  2013/3/14 NA  NA  NA  NA  NA  NA     NA     NA  NA     NA     NA 91.056
## 2 2013/12/16 NA  NA  NA  NA  NA  NA 129.44     NA  NA     NA     NA     NA
## 3   2013/6/2 NA  NA  NA  NA  NA  NA     NA     NA  NA 605.47     NA     NA
## 4 2013/10/21 NA  NA  NA  NA  NA  NA     NA     NA  NA     NA 788.86     NA
## 5  2013/8/27 NA  NA  NA  NA  NA  NA  13.36     NA  NA     NA     NA     NA
## 6 2013/11/28 NA  NA  NA  NA  NA  NA     NA 542.34  NA     NA     NA     NA

# Columns 2-4 hold the first-class amounts, one column per customer segment
shipment_1 <- shipment[c(1, 2, 3, 4)]
names(shipment_1) <- c("Order_Date", "Consumer", "Corporate", "Home_Office")

shipment_1 <- melt(shipment_1, id.vars = c("Order_Date"), variable.name = "Segment", value.name = "Amount")

first_class <- shipment_1[!is.na(shipment_1$Amount),]

print(first_class)

##    Order_Date   Segment  Amount
## 12  2013/1/15  Consumer 149.950
## 21  2013/8/15  Consumer 243.600
## 38   2013/7/5 Corporate 242.546
## 41  2013/3/19 Corporate 590.762
## 43   2013/1/6 Corporate  12.780
## 54   2013/5/5 Corporate  47.320

The same-day shipping columns (columns 5 to 7) contain no records, so I have skipped that step.
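
One way to confirm that a block of columns is empty before skipping it is to test every cell for NA. The sketch below uses a small made-up data frame in place of shipment, and it assumes that the same-day block sits in columns 5 to 7, which is inferred from the layout of the other shipping methods rather than stated in the data.

```r
# Toy stand-in for the wide shipment table (values are illustrative only).
toy <- data.frame(
  Order.Date = c("2013/3/14", "2013/12/16"),
  X   = c(NA, 149.95),  # columns 2-4: a block that has data
  X.1 = c(NA, NA),
  X.2 = c(NA, NA),
  X.3 = c(NA, NA),      # columns 5-7: assumed same-day block
  X.4 = c(NA, NA),
  X.5 = c(NA, NA)
)

# TRUE when every cell in the block is NA, so there is nothing to reshape.
same_day_empty <- all(is.na(toy[5:7]))
first_block_empty <- all(is.na(toy[2:4]))

print(same_day_empty)
print(first_block_empty)
```

Running the same check on the real table would be all(is.na(shipment[5:7])), under the column-position assumption above.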

# Columns 8-10 hold the second-class amounts, one column per customer segment
shipment_3 <- shipment[c(1, 8, 9, 10)]
names(shipment_3) <- c("Order_Date", "Consumer", "Corporate", "Home_Office")

shipment_3 <- melt(shipment_3, id.vars = c("Order_Date"), variable.name = "Segment", value.name = "Amount")

second_class <- shipment_3[!is.na(shipment_3$Amount),]

print(second_class)

##    Order_Date   Segment   Amount
## 2  2013/12/16  Consumer  129.440
## 5   2013/8/27  Consumer   13.360
## 22  2013/1/13  Consumer  545.940
## 25  2013/1/21  Consumer   25.248
## 33 2013/11/28 Corporate  542.340
## 37   2013/4/5 Corporate 4251.920
## 51 2013/10/18 Corporate 2216.800

# Columns 11-13 hold the standard-class amounts, one column per customer segment
shipment_4 <- shipment[c(1, 11, 12, 13)]
names(shipment_4) <- c("Order_Date", "Consumer", "Corporate", "Home_Office")

shipment_4 <- melt(shipment_4, id.vars = c("Order_Date"), variable.name = "Segment", value.name = "Amount")

standard_class <- shipment_4[!is.na(shipment_4$Amount),]

print(standard_class)

##    Order_Date     Segment   Amount
## 3    2013/6/2    Consumer  605.470
## 15  2013/6/27    Consumer  616.140
## 18 2013/12/12    Consumer   23.472
## 23  2013/4/25    Consumer  302.376
## 31 2013/10/21   Corporate  788.860
## 34  2013/3/31   Corporate    1.869
## 35 2013/11/21   Corporate  865.500
## 36  2013/11/1   Corporate 1044.440
## 40  2013/12/2   Corporate   21.190
## 44  2013/5/14   Corporate  310.880
## 46  2013/4/29   Corporate  661.504
## 53  2013/8/17   Corporate  484.790
## 55  2013/3/14 Home_Office   91.056
## 74 2013/10/24 Home_Office   10.368

Analysis

Splitting the wide shipment data set by shipping method and melting each piece into long format makes it easier to read. Combining the resulting data frames, with an extra column that records the shipping method, would make the data clearer still. Many rows and columns were empty, and the same-day shipping columns held no records at all.
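
The combined table described above could be built roughly as follows. This is a sketch under assumptions, not part of the original analysis: the data frames hold only a few rows copied from the printed results, Ship_Mode and all_shipments are hypothetical names, and the shipping-method labels are taken from the variable names.

```r
# A few rows copied from the printed results above.
first_class <- data.frame(
  Order_Date = c("2013/1/15", "2013/7/5"),
  Segment = c("Consumer", "Corporate"),
  Amount = c(149.950, 242.546)
)
second_class <- data.frame(
  Order_Date = c("2013/12/16", "2013/11/28"),
  Segment = c("Consumer", "Corporate"),
  Amount = c(129.440, 542.340)
)
standard_class <- data.frame(
  Order_Date = c("2013/6/2", "2013/10/21"),
  Segment = c("Consumer", "Corporate"),
  Amount = c(605.470, 788.860)
)

# Tag each table with its shipping method, then stack them.
# Ship_Mode is a hypothetical column name.
first_class$Ship_Mode <- "First Class"
second_class$Ship_Mode <- "Second Class"
standard_class$Ship_Mode <- "Standard Class"
all_shipments <- rbind(first_class, second_class, standard_class)

# Parse the dates so the combined table sorts chronologically.
all_shipments$Order_Date <- as.Date(all_shipments$Order_Date, format = "%Y/%m/%d")
all_shipments <- all_shipments[order(all_shipments$Order_Date), ]
print(all_shipments)
```

With the shipping method stored as a column, a single long table can be filtered or summarized by method and segment without tracking three separate data frames.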

Second Data Set

The second data set comes from the DOHMH New York City Restaurant Inspection Results. Because it is large, only selected and reorganized segments are shown.

inspection <- read.csv("https://raw.githubusercontent.com/yli1048/yli1048/refs/heads/607/NYC_Restaurant_Inspection_Results.csv", header = TRUE)

store_info <- inspection[c(1, 2, 7, 8)]
names(store_info)[2] <- "NAME"
head(store_info)

##      CAMIS               NAME      PHONE      CUISINE.DESCRIPTION
## 1 50139400       ANDIE'S EATS 9143648113 Bakery Products/Desserts
## 2 41243535   EMPIRE CORNER II 2124105756                  Chinese
## 3 41470527 HEAVEN'S HOT BAGEL 2124207566          Bagels/Pretzels
## 4 40373462      VILLA BERULIA 2126891970                  Italian
## 5 50128067    GEORGIAN CORNER 3472400940         Eastern European
## 6 50104255   WO HOP NEXT DOOR 9175510233                  Chinese

location <- inspection[c(1, 4, 5, 6, 3, 19, 20)]
# unite() joins the listed columns with "_" and drops the originals
location_1 <- unite(location, address, c(BUILDING, STREET, ZIPCODE, BORO))
location_new <- unite(location_1, coordinates, c(Latitude, Longitude))
head(location_new)

##      CAMIS                                 address
## 1 50139400     185_BLEECKER STREET_10012_Manhattan
## 2 41243535           1415_5 AVENUE_10029_Manhattan
## 3 41470527 283_EAST HOUSTON STREET_10002_Manhattan
## 4 40373462    107_EAST   34 STREET_10016_Manhattan
## 5 50128067  626_SHEEPSHEAD BAY ROAD_11224_Brooklyn
## 6 50104255          15_MOTT STREET_10013_Manhattan
##                        coordinates
## 1 40.729078531416_-74.001006642818
## 2 40.800487432208_-73.946572413814
## 3 40.721573256914_-73.984205791876
## 4 40.746865746167_-73.980821742973
## 5 40.578972142427_-73.975261496585
## 6 40.714179932664_-73.998755505932
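
The underscore-joined addresses above are compact but hard to read, and some street names carry runs of spaces. The sketch below shows one way to build a more readable address in base R. The two rows are copied from the output above; the "BUILDING STREET, BORO ZIPCODE" layout is an assumption rather than a format the data set prescribes.

```r
# Two rows copied from the inspection data shown above.
location <- data.frame(
  CAMIS = c(50139400, 40373462),
  BUILDING = c("185", "107"),
  STREET = c("BLEECKER STREET", "EAST   34 STREET"),
  ZIPCODE = c("10012", "10016"),
  BORO = c("Manhattan", "Manhattan")
)

# Collapse runs of spaces inside street names to a single space.
street <- gsub(" +", " ", location$STREET)

# Build "BUILDING STREET, BORO ZIPCODE" (an assumed layout).
location$address <- paste0(
  location$BUILDING, " ", street, ", ",
  location$BORO, " ", location$ZIPCODE
)
print(location[c("CAMIS", "address")])
```

With tidyr, a similar effect is available through the sep argument of unite(), which defaults to "_".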

violations <- inspection[c(1, 9, 10, 11, 12, 13)]
violations <- unite(violations, inspection, c(INSPECTION.DATE, ACTION))
violations <- unite(violations, violations, c(CRITICAL.FLAG, VIOLATION.CODE, VIOLATION.DESCRIPTION))
head(violations)

##      CAMIS
## 1 50139400
## 2 41243535
## 3 41470527
## 4 40373462
## 5 50128067
## 6 50104255
##                                                                                                                                      inspection
## 1                                                                                    01/11/2024_Violations were cited in the following area(s).
## 2                                                                                    04/02/2024_Violations were cited in the following area(s).
## 3                                                                                    01/21/2022_Violations were cited in the following area(s).
## 4                                                                                    04/13/2022_Violations were cited in the following area(s).
## 5                                                                                    08/14/2024_Violations were cited in the following area(s).
## 6 08/11/2021_Establishment Closed by DOHMH. Violations were cited in the following area(s) and those requiring immediate action were addressed.
##                                                                                                                                                                                                                                                                                         violations
## 1                                       Not Critical_10B_Anti-siphonage or back-flow prevention device not provided where required; equipment or floor not properly drained; sewage disposal system in disrepair or not functioning properly. Condensation or liquid waste improperly disposed of.
## 2                                                         Not Critical_10F_Non-food contact surface or equipment made of unacceptable material, not kept clean, or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.
## 3 Not Critical_10F_Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.
## 4                                                                                                                                      Critical_06D_Food contact surface not properly washed, rinsed and sanitized after each use and following any activity when contamination may have occurred.
## 5                                                         Not Critical_10F_Non-food contact surface or equipment made of unacceptable material, not kept clean, or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.
## 6                                                                                                                                                                                                                      Critical_04M_Live roaches present in facility's food and/or non-food areas.

inspection_details <- inspection[c(1, 14, 15, 16, 17, 18)]
inspection_details <- unite(inspection_details, score, c(RECORD.DATE, SCORE))
inspection_details <- unite(inspection_details, grade, c(GRADE.DATE, GRADE))
inspection_details <- melt(inspection_details, id.vars = c("CAMIS"), variable.name = "record", value.name = "Date_and_value")
head(inspection_details)

##      CAMIS record Date_and_value
## 1 50139400  score  10/09/2024_70
## 2 41243535  score   10/09/2024_5
## 3 41470527  score  10/09/2024_41
## 4 40373462  score  10/09/2024_11
## 5 50128067  score  10/09/2024_17
## 6 50104255  score  10/09/2024_44
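
Because the date and the value share one string, any numeric work on the scores needs them split apart again. The sketch below shows one way to do that for the score rows, using values copied from the output above. The names record_date and score are hypothetical, and the grade rows would need the value kept as text, since a grade is not a number.

```r
# Values copied from the printed output above (score rows only).
details <- data.frame(
  CAMIS = c(50139400, 41243535, 41470527),
  Date_and_value = c("10/09/2024_70", "10/09/2024_5", "10/09/2024_41")
)

# Everything before the underscore is the date; everything after is the score.
details$record_date <- as.Date(
  sub("_.*$", "", details$Date_and_value),
  format = "%m/%d/%Y"
)
details$score <- as.integer(sub("^.*_", "", details$Date_and_value))

print(details[c("CAMIS", "record_date", "score")])
```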

community <- inspection[c(1, 21, 22, 23, 24, 25, 26)]
community <- melt(community, id.vars = c("CAMIS"), variable.name = "community", value.name = "info")
head(community)

##      CAMIS       community info
## 1 50139400 Community.Board  102
## 2 41243535 Community.Board  111
## 3 41470527 Community.Board  103
## 4 40373462 Community.Board  106
## 5 50128067 Community.Board  313
## 6 50104255 Community.Board  103

Analysis

The wide data set has 27 columns, which can be grouped into smaller data frames by topic. CAMIS uniquely identifies each restaurant, so it serves as the key that ties these data frames together. Splitting the wide data set makes retrieval easier and keeps the data organized.
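
To illustrate CAMIS as the linking key, the sketch below joins two small tables with base R merge(). The rows are copied from the outputs above; the scores table and the joined name are hypothetical stand-ins for the data frames built earlier.

```r
# Rows copied from the printed outputs above.
store_info <- data.frame(
  CAMIS = c(50139400, 41243535, 41470527),
  NAME = c("ANDIE'S EATS", "EMPIRE CORNER II", "HEAVEN'S HOT BAGEL"),
  CUISINE.DESCRIPTION = c("Bakery Products/Desserts", "Chinese", "Bagels/Pretzels")
)
scores <- data.frame(
  CAMIS = c(41243535, 50139400, 41470527),  # deliberately in a different order
  score = c(5, 70, 41)
)

# Join on the shared CAMIS key; row order in the inputs does not matter.
joined <- merge(store_info, scores, by = "CAMIS")
print(joined)
```

If a table holds several rows for the same CAMIS, merge() returns one row for every matching pair, so the result can be longer than either input.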

Third Data Set

This data set covers marriage for both sexes across a wide range of conditions, including age group, education, race, region, and family income. It can be split into separate data frames by category for a more systematic presentation.

marriages <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/refs/heads/master/marriage/both_sexes.csv", header = TRUE)
head(marriages)

##   X year       date  all_2534   HS_2534   SC_2534  BAp_2534  BAo_2534   GD_2534
## 1 1 1960 1960-01-01 0.1233145 0.1095332 0.1522818 0.2389952 0.2389952        NA
## 2 2 1970 1970-01-01 0.1269715 0.1094000 0.1495096 0.2187031 0.2187031        NA
## 3 3 1980 1980-01-01 0.1991767 0.1617313 0.2236916 0.2881646 0.2881646        NA
## 4 4 1990 1990-01-01 0.2968306 0.2777491 0.2780912 0.3612968 0.3656655 0.3474505
## 5 5 2000 2000-01-01 0.3450087 0.3316545 0.3249205 0.3874906 0.3939579 0.3691740
## 6 6 2001 2001-01-01 0.3527767 0.3446069 0.3341101 0.3835686 0.3925148 0.3590304
##   White_2534 Black_2534 Hisp_2534   NE_2534   MA_2534 Midwest_2534 South_2534
## 1  0.1164848  0.1621855 0.1393736 0.1504184 0.1628934    0.1121467  0.1090562
## 2  0.1179043  0.1855163 0.1298769 0.1517231 0.1640680    0.1153741  0.1126220
## 3  0.1824126  0.3137500 0.1885440 0.2414327 0.2505925    0.1828339  0.1688435
## 4  0.2639256  0.4838556 0.2962372 0.3500384 0.3623321    0.2755046  0.2639794
## 5  0.3127149  0.5144994 0.3180681 0.4091852 0.4175565    0.3308022  0.3099712
## 6  0.3183506  0.5437985 0.3321214 0.4200581 0.4294281    0.3344332  0.3182688
##   Mountain_2534 Pacific_2534 poor_2534   mid_2534 rich_2534   all_3544
## 1    0.09152117    0.1198758 0.1371597 0.07514929 0.2066776 0.07058157
## 2    0.10293602    0.1374964 0.1717202 0.08159207 0.1724093 0.06732520
## 3    0.17434230    0.2334279 0.3100591 0.14825303 0.1851082 0.06883378
## 4    0.25264326    0.3319579 0.4199108 0.24320008 0.2783226 0.11191800
## 5    0.30621032    0.3753061 0.5033676 0.30202036 0.2717386 0.15605881
## 6    0.30980779    0.3844799 0.5178771 0.31716118 0.2532041 0.15642529
##      HS_3544    SC_3544  BAp_3544  BAo_3544   GD_3544 White_3544 Black_3544
## 1 0.06860309 0.06663695 0.1326265 0.1326265        NA 0.06825586 0.08836728
## 2 0.06511964 0.06271724 0.1116899 0.1116899        NA 0.06250372 0.10290904
## 3 0.06429102 0.06531333 0.1056102 0.1056102        NA 0.05966739 0.13140081
## 4 0.11210043 0.09699372 0.1285172 0.1258567 0.1328018 0.09611312 0.22010298
## 5 0.16993703 0.13800404 0.1541238 0.1536299 0.1550970 0.13207032 0.30239381
## 6 0.16870156 0.13986044 0.1548151 0.1524923 0.1595169 0.13287455 0.30857796
##    Hisp_3544    NE_3544    MA_3544 Midwest_3544 South_3544 Mountain_3544
## 1 0.07307651 0.09194322 0.09347468   0.06863360 0.06026353    0.04739747
## 2 0.07070500 0.08570110 0.09040725   0.06156272 0.05966057    0.04651163
## 3 0.08110790 0.07997323 0.09744428   0.06070641 0.05914089    0.04880077
## 4 0.12194206 0.12785915 0.14354989   0.10157576 0.09637035    0.09189904
## 5 0.15469520 0.17327422 0.18819256   0.14539201 0.14230600    0.13584194
## 6 0.14953050 0.16653497 0.18315109   0.14794407 0.14312592    0.13943820
##   Pacific_3544 poor_3544   mid_3544  rich_3544   all_4554    HS_4554    SC_4554
## 1   0.05822486 0.1019749 0.04717272 0.08553870 0.07254649 0.06840792 0.07903755
## 2   0.06347796 0.1117548 0.04566838 0.06499159 0.05968794 0.05833439 0.05443478
## 3   0.07552538 0.1291426 0.05050321 0.04445951 0.05250871 0.05036563 0.04816180
## 4   0.13134638 0.2012208 0.09024739 0.06573916 0.05947824 0.05988244 0.04654087
## 5   0.17480047 0.2813137 0.12815751 0.08622046 0.08804394 0.09442809 0.07558786
## 6   0.17694864 0.2919112 0.13267625 0.06803283 0.08823342 0.09189007 0.07795481
##     BAp_4554   BAo_4554    GD_4554 White_4554 Black_4554  Hisp_4554    NE_4554
## 1 0.15360889 0.15360889         NA 0.07246692 0.06913249 0.06636058 0.10236412
## 2 0.10466047 0.10466047         NA 0.05754799 0.07899168 0.05810740 0.08028082
## 3 0.08623774 0.08623774         NA 0.04765354 0.08624602 0.06522951 0.06930253
## 4 0.07301884 0.06416529 0.08394886 0.05092552 0.11617699 0.07613556 0.07047502
## 5 0.09208417 0.09097472 0.09362802 0.07578174 0.17587334 0.09418009 0.10232170
## 6 0.09333365 0.09313480 0.09362876 0.07516912 0.18154531 0.09409896 0.09868408
##      MA_4554 Midwest_4554 South_4554 Mountain_4554 Pacific_4554 poor_4554
## 1 0.09264788   0.07285321 0.05977295    0.04754183   0.05996993 0.1030055
## 2 0.07860635   0.05791163 0.05174462    0.03970134   0.04826312 0.1016489
## 3 0.07508466   0.04807290 0.04485348    0.03374438   0.04958992 0.1003011
## 4 0.08373134   0.05398391 0.05043636    0.04459411   0.06461875 0.1148335
## 5 0.11269659   0.08302437 0.07631858    0.07637774   0.09896832 0.1718976
## 6 0.10953635   0.08207629 0.07886513    0.07405971   0.10119511 0.1759369
##     mid_4554  rich_4554 nokids_all_2534 kids_all_2534 nokids_HS_2534
## 1 0.05364421 0.07908591       0.4640564   0.002820625      0.4430148
## 2 0.04221637 0.05142867       0.4309043   0.009868596      0.4246779
## 3 0.03830266 0.03311296       0.4464304   0.025285667      0.4319342
## 4 0.04562332 0.03136386       0.5425242   0.060277451      0.5464881
## 5 0.07055672 0.03897342       0.5714531   0.099472713      0.5711395
## 6 0.07407508 0.02857320       0.5852213   0.110178467      0.6045475
##   nokids_SC_2534 nokids_BAp_2534 nokids_BAo_2534 nokids_GD_2534 kids_HS_2534
## 1      0.5000402       0.5619099       0.5619099             NA  0.003318886
## 2      0.4333479       0.4554766       0.4554766             NA  0.012465915
## 3      0.4505900       0.4719700       0.4719700             NA  0.031930752
## 4      0.5238446       0.5560765       0.5633301      0.5332628  0.078470444
## 5      0.5700042       0.5729677       0.5862213      0.5367160  0.127193577
## 6      0.5810912       0.5698644       0.5864967      0.5258800  0.141395652
##   kids_SC_2534 kids_BAp_2534 kids_BAo_2534 kids_GD_2534 nokids_poor_2534
## 1  0.001150824  0.0005751073  0.0005751073           NA        0.4933061
## 2  0.003699982  0.0014683425  0.0014683425           NA        0.5097742
## 3  0.018135401  0.0062544364  0.0062544364           NA        0.5740402
## 4  0.052032702  0.0171241042  0.0181766027   0.01374234        0.6546908
## 5  0.097625310  0.0370024452  0.0401009875   0.02761467        0.7055451
## 6  0.110030662  0.0399801447  0.0445838012   0.02645041        0.7147334
##   nokids_mid_2534 nokids_rich_2534 kids_poor_2534 kids_mid_2534 kids_rich_2534
## 1       0.4100080        0.4921184    0.008722711  0.0007532065   0.0008027331
## 2       0.3764538        0.4288948    0.029974945  0.0033771145   0.0030435661
## 3       0.3998250        0.3848089    0.077926214  0.0102368871   0.0068317224
## 4       0.5186604        0.4750156    0.170763774  0.0274655254   0.0182329127
## 5       0.5690228        0.4458023    0.256281918  0.0597845173   0.0295644698
## 6       0.5864741        0.4461111    0.280146488  0.0677954572   0.0336540502

#By age of 25-34
age2534 <- marriages[c(2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21)]

#By education
edu_2534 <- age2534[c(1, 3, 4, 5, 6, 7)]
names(edu_2534)[2] <- "High_School"
names(edu_2534)[3] <- "Some_College"
names(edu_2534)[4] <- "Bachelor's"
names(edu_2534)[5] <- "No_Graduate"
names(edu_2534)[6] <- "Graduate"

edu_2534 <- melt(edu_2534, id.vars = c("year"), variable.name = "education", value.name = "rate")

edu_2534 <- edu_2534[!is.na(edu_2534$rate),]

head(edu_2534)

##   year   education      rate
## 1 1960 High_School 0.1095332
## 2 1970 High_School 0.1094000
## 3 1980 High_School 0.1617313
## 4 1990 High_School 0.2777491
## 5 2000 High_School 0.3316545
## 6 2001 High_School 0.3446069

#By race
race_2534 <- age2534[c(1, 8, 9, 10)]
names(race_2534)[2] <- "White"
names(race_2534)[3] <- "Black"
names(race_2534)[4] <- "Hispanic"

race_2534 <- melt(race_2534, id.vars = c("year"), variable.name = "race", value.name = "rate")

head(race_2534)

##   year  race      rate
## 1 1960 White 0.1164848
## 2 1970 White 0.1179043
## 3 1980 White 0.1824126
## 4 1990 White 0.2639256
## 5 2000 White 0.3127149
## 6 2001 White 0.3183506

#By region
region_2534 <- age2534[c(1, 11, 12, 13, 14, 15, 16)]
names(region_2534)[2] <- "New_England"
names(region_2534)[3] <- "Mid_Atlantic"
names(region_2534)[4] <- "Midwest"
names(region_2534)[5] <- "South"
names(region_2534)[6] <- "Mountain_West"
names(region_2534)[7] <- "Pacific"

region_2534 <- melt(region_2534, id.vars = c("year"), variable.name = "region", value.name = "rate")

head(region_2534)

##   year      region      rate
## 1 1960 New_England 0.1504184
## 2 1970 New_England 0.1517231
## 3 1980 New_England 0.2414327
## 4 1990 New_England 0.3500384
## 5 2000 New_England 0.4091852
## 6 2001 New_England 0.4200581

#By family income
income_2534 <- age2534[c(1, 17, 18, 19)]
names(income_2534)[2] <- "Low_25%"
names(income_2534)[3] <- "Middle_50%"
names(income_2534)[4] <- "Top_25%"

income_2534 <- melt(income_2534, id.vars = c("year"), variable.name = "income", value.name = "rate")

head(income_2534)

##   year  income      rate
## 1 1960 Low_25% 0.1371597
## 2 1970 Low_25% 0.1717202
## 3 1980 Low_25% 0.3100591
## 4 1990 Low_25% 0.4199108
## 5 2000 Low_25% 0.5033676
## 6 2001 Low_25% 0.5178771

#By age of 35-44
# Columns 22-39 hold the 35-44 measures; each column is listed exactly once
age3544 <- marriages[c(2, 22:39)]

#By education
edu_3544 <- age3544[c(1, 3, 4, 5, 6, 7)]
names(edu_3544)[2] <- "High_School"
names(edu_3544)[3] <- "Some_College"
names(edu_3544)[4] <- "Bachelor's"
names(edu_3544)[5] <- "No_Graduate"
names(edu_3544)[6] <- "Graduate"

edu_3544 <- melt(edu_3544, id.vars = c("year"), variable.name = "education", value.name = "rate")

edu_3544 <- edu_3544[!is.na(edu_3544$rate),]

head(edu_3544)

##   year   education       rate
## 1 1960 High_School 0.06860309
## 2 1970 High_School 0.06511964
## 3 1980 High_School 0.06429102
## 4 1990 High_School 0.11210043
## 5 2000 High_School 0.16993703
## 6 2001 High_School 0.16870156

#By race
race_3544 <- age3544[c(1, 8, 9, 10)]
names(race_3544)[2] <- "White"
names(race_3544)[3] <- "Black"
names(race_3544)[4] <- "Hispanic"

race_3544 <- melt(race_3544, id.vars = c("year"), variable.name = "race", value.name = "rate")

head(race_3544)

##   year  race       rate
## 1 1960 White 0.06825586
## 2 1970 White 0.06250372
## 3 1980 White 0.05966739
## 4 1990 White 0.09611312
## 5 2000 White 0.13207032
## 6 2001 White 0.13287455

#By region
region_3544 <- age3544[c(1, 11, 12, 13, 14, 15, 16)]
names(region_3544)[2] <- "New_England"
names(region_3544)[3] <- "Mid_Atlantic"
names(region_3544)[4] <- "Midwest"
names(region_3544)[5] <- "South"
names(region_3544)[6] <- "Mountain_West"
names(region_3544)[7] <- "Pacific"

region_3544 <- melt(region_3544, id.vars = c("year"), variable.name = "region", value.name = "rate")

head(region_3544)

##   year      region       rate
## 1 1960 New_England 0.09194322
## 2 1970 New_England 0.08570110
## 3 1980 New_England 0.07997323
## 4 1990 New_England 0.12785915
## 5 2000 New_England 0.17327422
## 6 2001 New_England 0.16653497

#By family income
# Select the income columns by name so each label matches its data
income_3544 <- age3544[c("year", "poor_3544", "mid_3544", "rich_3544")]
names(income_3544)[2] <- "Low_25%"
names(income_3544)[3] <- "Middle_50%"
names(income_3544)[4] <- "Top_25%"

income_3544 <- melt(income_3544, id.vars = c("year"), variable.name = "income", value.name = "rate")

head(income_3544)

##   year  income      rate
## 1 1960 Low_25% 0.1019749
## 2 1970 Low_25% 0.1117548
## 3 1980 Low_25% 0.1291426
## 4 1990 Low_25% 0.2012208
## 5 2000 Low_25% 0.2813137
## 6 2001 Low_25% 0.2919112

#By age of 45-54
# Columns 40-57 hold the 45-54 measures (education, race, region, income)
age4554 <- marriages[c(2, 40:57)]

#By education
edu_4554 <- age4554[c(1, 3, 4, 5, 6, 7)]
names(edu_4554)[2] <- "High_School"
names(edu_4554)[3] <- "Some_College"
names(edu_4554)[4] <- "Bachelor's"
names(edu_4554)[5] <- "No_Graduate"
names(edu_4554)[6] <- "Graduate"

edu_4554 <- melt(edu_4554, id.vars = c("year"), variable.name = "education", value.name = "rate")

edu_4554 <- edu_4554[!is.na(edu_4554$rate),]

head(edu_4554)

##   year   education       rate
## 1 1960 High_School 0.06840792
## 2 1970 High_School 0.05833439
## 3 1980 High_School 0.05036563
## 4 1990 High_School 0.05988244
## 5 2000 High_School 0.09442809
## 6 2001 High_School 0.09189007

#By race
race_4554 <- age4554[c(1, 8, 9, 10)]
names(race_4554)[2] <- "White"
names(race_4554)[3] <- "Black"
names(race_4554)[4] <- "Hispanic"

race_4554 <- melt(race_4554, id.vars = c("year"), variable.name = "race", value.name = "rate")

head(race_4554)

##   year  race       rate
## 1 1960 White 0.07246692
## 2 1970 White 0.05754799
## 3 1980 White 0.04765354
## 4 1990 White 0.05092552
## 5 2000 White 0.07578174
## 6 2001 White 0.07516912

#By region
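# Select all six region columns by name, as for the other age groups
region_4554 <- marriages[c("year", "NE_4554", "MA_4554", "Midwest_4554", "South_4554", "Mountain_4554", "Pacific_4554")]
names(region_4554)[2] <- "New_England"
names(region_4554)[3] <- "Mid_Atlantic"
names(region_4554)[4] <- "Midwest"
names(region_4554)[5] <- "South"
names(region_4554)[6] <- "Mountain_West"
names(region_4554)[7] <- "Pacific"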

region_4554 <- melt(region_4554, id.vars = c("year"), variable.name = "region", value.name = "rate")

head(region_4554)

##   year      region       rate
## 1 1960 New_England 0.10236412
## 2 1970 New_England 0.08028082
## 3 1980 New_England 0.06930253
## 4 1990 New_England 0.07047502
## 5 2000 New_England 0.10232170
## 6 2001 New_England 0.09868408
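
The 25-34 and 35-44 groups each get a family-income breakdown, and the wide table also carries poor_4554, mid_4554, and rich_4554 columns. The sketch below shows what the matching step for the 45-54 group might look like. It uses the first three rows copied from the head(marriages) output and base R stack() in place of melt(), so it runs without extra packages. The names marriages_sample and income_4554 are hypothetical.

```r
# First three rows of the 45-54 income columns, copied from head(marriages).
marriages_sample <- data.frame(
  year = c(1960, 1970, 1980),
  poor_4554 = c(0.1030055, 0.1016489, 0.1003011),
  mid_4554 = c(0.05364421, 0.04221637, 0.03830266),
  rich_4554 = c(0.07908591, 0.05142867, 0.03311296)
)

# Relabel the columns the same way as for the other age groups.
income_wide <- marriages_sample[c("poor_4554", "mid_4554", "rich_4554")]
names(income_wide) <- c("Low_25%", "Middle_50%", "Top_25%")

# stack() returns columns `values` and `ind`; rename them to match the
# other long tables and repeat the years once per income band.
stacked <- stack(income_wide)
income_4554 <- data.frame(
  year = rep(marriages_sample$year, times = ncol(income_wide)),
  income = stacked$ind,
  rate = stacked$values
)
print(income_4554)
```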

Analysis

The original data set spans several decades and many conditions in a single wide table. Reshaping it into long-format data frames, one per category, makes it easier to manage and read.
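
Since every measure column follows the pattern group_age (for example HS_2534), the whole table could also be reshaped in one pass, with the group and the age range split into their own columns. The sketch below illustrates the idea on a small slice copied from head(marriages), using base R only. The names wide, long, group, and age are hypothetical.

```r
# A small slice of the wide table, copied from head(marriages).
wide <- data.frame(
  year = c(1960, 1970),
  HS_2534 = c(0.1095332, 0.1094000),
  HS_3544 = c(0.06860309, 0.06511964),
  White_2534 = c(0.1164848, 0.1179043)
)

# Stack every measure column; `ind` keeps the original column name.
stacked <- stack(wide[-1])
col_names <- as.character(stacked$ind)

long <- data.frame(
  year = rep(wide$year, times = ncol(wide) - 1),
  group = sub("_[0-9]+$", "", col_names),  # text before the age suffix, e.g. "HS"
  age = sub("^.*_", "", col_names),        # trailing age range, e.g. "2534"
  rate = stacked$values
)
print(long)
```

A single long table like this can then be filtered by group or age instead of maintaining one data frame per combination.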

Yanyi Li

2024-10-10
