Assignment

Theme for this Project is Data Transformation.

For this project: 1. Create a .CSV file (or optionally, a MySQL database!) that includes all of the information included in the dataset. You’re encouraged to use a “wide” structure similar to how the information appears in the discussion item, so that you can practice tidying and transformations as described below. 2. Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data. [Most of your grade will be based on this step!] 3. Perform the analysis requested in the discussion item.

This dataset is from the NYC MTA of annual ridership from 2013-2018.

Collaborators:

Read Dataset

The NYC MTA Annual Ridership dataset, containing column names, is read from GitHub.

# 
theURL <- c("https://raw.githubusercontent.com/CUNYSPS-RickRN/DATA607/master/subway_riders.csv")
subway_df <- tbl_df(read_csv(theURL))
## Warning: `tbl_df()` is deprecated as of dplyr 1.0.0.
## Please use `tibble::as_tibble()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## Warning: Missing column names filled in: 'X8' [8]
## Parsed with column specification:
## cols(
##   Borough = col_character(),
##   `2013` = col_number(),
##   `2014` = col_number(),
##   `2015` = col_number(),
##   `2016` = col_number(),
##   `2017` = col_number(),
##   `2018` = col_number(),
##   X8 = col_logical()
## )
# glimpse(subway_df)

Tidy & Transform

This phase will tidy and transform the MTA Ridership dataset. The column names will be renamed to meaningful names, removal of extraneous column and row containing NA values. Pivot_longer used to tidy data to represent observations comprised of Borough, Year, and Ridership values.

# Change the column names
names(subway_df) <- c("Borough", "Yr2013","Yr2014","Yr2015","Yr2016","Yr2017","Yr2018","IgnoreThisCol")

subway_df <- select(subway_df, -(IgnoreThisCol))

subway_df <- filter(subway_df,!(is.na(Borough)))  # remove rows with NA values

List_of_Boroughs <- subway_df$Borough

subway_change_df <- mutate(subway_df, Diff_2013_to_2018 = Yr2018 - Yr2013,
                    Pct_Change = (Diff_2013_to_2018 / Yr2013) * 100)

subway_pivot_df <- pivot_longer(subway_df, starts_with("Yr"), names_to = "Year", values_to = "Ridership")

Analysis

The first chart shows the year to year ridership by Borough. The second chart shows the percent change between the riderships in 2013 to 2018. Ridership in The Bronx declined by more than 6% followed by Queens with a decline of more than 4%. Ridership in Manhattan and Brooklyn dropped by less than 1 %.

ggplot(data = subway_pivot_df, mapping = aes(x=Year, y=Ridership)) +
  geom_point(mapping = aes(shape = Borough))

ggplot(data=subway_change_df, mapping = aes(x=Borough, y=Pct_Change)) +
  geom_point(mapping = aes(shape = Borough))

Summary

Further exploration is needed to understand the reasons for substantial decline in ridership in The Bronx and Queens.

LS0tDQp0aXRsZTogIkQ2MDdfUDAyX0RTMSINCmF1dGhvcjogIlJpY2tSTiINCmRhdGU6ICJgciBTeXMuRGF0ZSgpYCINCm91dHB1dDogDQogIG9wZW5pbnRybzo6bGFiX3JlcG9ydDogZGVmYXVsdA0KICBodG1sX2RvY3VtZW50Og0KICAgIG51bWJlcl9zZWN0aW9uczogeWVzDQotLS0NCg0KYGBge3Igc3RlcF9zZXR1cCwgaW5jbHVkZT1GQUxTRX0NCmtuaXRyOjpvcHRzX2NodW5rJHNldChlY2hvID0gVFJVRSkNCmxpYnJhcnkodGlkeXZlcnNlKQ0KYGBgDQoNCiMgQXNzaWdubWVudA0KDQo8c3R5bGU+DQpkaXYuYmx1ZSB7IGJhY2tncm91bmQtY29sb3I6I2U2ZjBmZjsgYm9yZGVyLXJhZGl1czogNXB4OyBwYWRkaW5nOiAyMHB4O30NCjwvc3R5bGU+DQo8ZGl2IGNsYXNzID0gImJsdWUiPg0KDQpUaGVtZSBmb3IgdGhpcyBQcm9qZWN0IGlzIERhdGEgVHJhbnNmb3JtYXRpb24uICANCg0KRm9yIHRoaXMgcHJvamVjdDoNCjEuIENyZWF0ZSBhIC5DU1YgZmlsZSAob3Igb3B0aW9uYWxseSwgYSBNeVNRTCBkYXRhYmFzZSEpIHRoYXQgaW5jbHVkZXMgYWxsIG9mIHRoZSBpbmZvcm1hdGlvbiBpbmNsdWRlZCBpbiB0aGUgZGF0YXNldC4gWW914oCZcmUgZW5jb3VyYWdlZCB0byB1c2UgYSDigJx3aWRl4oCdIHN0cnVjdHVyZSBzaW1pbGFyIHRvIGhvdyB0aGUgaW5mb3JtYXRpb24gYXBwZWFycyBpbiB0aGUgZGlzY3Vzc2lvbiBpdGVtLCBzbyB0aGF0IHlvdSBjYW4gcHJhY3RpY2UgdGlkeWluZyBhbmQgdHJhbnNmb3JtYXRpb25zIGFzIGRlc2NyaWJlZCBiZWxvdy4NCjIuIFJlYWQgdGhlIGluZm9ybWF0aW9uIGZyb20geW91ciAuQ1NWIGZpbGUgaW50byBSLCBhbmQgdXNlIHRpZHlyIGFuZCBkcGx5ciBhcyBuZWVkZWQgdG8gdGlkeSBhbmQgdHJhbnNmb3JtIHlvdXIgZGF0YS4gW01vc3Qgb2YgeW91ciBncmFkZSB3aWxsIGJlIGJhc2VkIG9uIHRoaXMgc3RlcCFdDQozLiBQZXJmb3JtIHRoZSBhbmFseXNpcyByZXF1ZXN0ZWQgaW4gdGhlIGRpc2N1c3Npb24gaXRlbS4NCg0KVGhpcyBkYXRhc2V0IGlzIGZyb20gdGhlIE5ZQyBNVEEgb2YgYW5udWFsIHJpZGVyc2hpcCBmcm9tIDIwMTMtMjAxOC4NCg0KQ29sbGFib3JhdG9yczoNCg0KPC9kaXY+IFxoZmlsbFxicmVhaw0KDQoNCiMgUmVhZCBEYXRhc2V0DQoNCjxzdHlsZT4NCmRpdi5ibHVlIHsgYmFja2dyb3VuZC1jb2xvcjojZTZmMGZmOyBib3JkZXItcmFkaXVzOiA1cHg7IHBhZGRpbmc6IDIwcHg7fQ0KPC9zdHlsZT4NCjxkaXYgY2xhc3MgPSAiYmx1ZSI+DQoNClRoZSBOWUMgTVRBIEFubnVhbCBSaWRlcnNoaXAgZGF0YXNldCwgY29udGFpbmluZyBjb2x1bW4gbmFtZXMsIGlzIHJlYWQgZnJvbSBHaXRIdWIuICANCg0KDQo8L2Rpdj4gXGhmaWxsXGJyZWFrDQoNCg0KDQpgYGB7ciBzdGVwX3JlYWRfZHMsIGVjaG89VFJVRSB9DQojIA0KdGhlVVJMIDwtIGMoImh0dHBzOi8vcmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbS9DVU5ZU1BTLVJpY2tSTi9EQVRBNjA3L21hc3Rlci9zdWJ3YXlfcmlkZXJzLmNzdiIpDQpzdWJ3YXlfZGYgPC0gdGJsX2RmKHJlYWRfY3N2KHRoZVVSTCkpDQoNCiMgZ2xpbXBzZShzdWJ3YXlfZGYpDQoNCmBgYA0KDQojIFRpZHkgJiBUcmFuc2Zvcm0NCg0KPHN0eWxlPg0KZGl2LmJsdWUgeyBiYWNrZ3JvdW5kLWNvbG9yOiNlNmYwZmY7IGJvcmRlci1yYWRpdXM6IDVweDsgcGFkZGluZzogMjBweDt9DQo8L3N0eWxlPg0KPGRpdiBjbGFzcyA9ICJibHVlIj4NCg0KVGhpcyBwaGFzZSB3aWxsIHRpZHkgYW5kIHRyYW5zZm9ybSB0aGUgTVRBIFJpZGVyc2hpcCBkYXRhc2V0LiBUaGUgY29sdW1uIG5hbWVzIHdpbGwgYmUgcmVuYW1lZCB0byBtZWFuaW5nZnVsIG5hbWVzLCByZW1vdmFsIG9mIGV4dHJhbmVvdXMgY29sdW1uIGFuZCByb3cgY29udGFpbmluZyBOQSB2YWx1ZXMuICBQaXZvdF9sb25nZXIgdXNlZCB0byB0aWR5IGRhdGEgdG8gcmVwcmVzZW50IG9ic2VydmF0aW9ucyBjb21wcmlzZWQgb2YgQm9yb3VnaCwgWWVhciwgYW5kIFJpZGVyc2hpcCB2YWx1ZXMuDQoNCjwvZGl2PiBcaGZpbGxcYnJlYWsNCmBgYHtyIHN0ZXBfdGlkeSwgZWNobz1UUlVFfQ0KIyBDaGFuZ2UgdGhlIGNvbHVtbiBuYW1lcw0KbmFtZXMoc3Vid2F5X2RmKSA8LSBjKCJCb3JvdWdoIiwgIllyMjAxMyIsIllyMjAxNCIsIllyMjAxNSIsIllyMjAxNiIsIllyMjAxNyIsIllyMjAxOCIsIklnbm9yZVRoaXNDb2wiKQ0KDQpzdWJ3YXlfZGYgPC0gc2VsZWN0KHN1YndheV9kZiwgLShJZ25vcmVUaGlzQ29sKSkNCg0Kc3Vid2F5X2RmIDwtIGZpbHRlcihzdWJ3YXlfZGYsIShpcy5uYShCb3JvdWdoKSkpICAjIHJlbW92ZSByb3dzIHdpdGggTkEgdmFsdWVzDQoNCkxpc3Rfb2ZfQm9yb3VnaHMgPC0gc3Vid2F5X2RmJEJvcm91Z2gNCg0Kc3Vid2F5X2NoYW5nZV9kZiA8LSBtdXRhdGUoc3Vid2F5X2RmLCBEaWZmXzIwMTNfdG9fMjAxOCA9IFlyMjAxOCAtIFlyMjAxMywNCiAgICAgICAgICAgICAgICAgICAgUGN0X0NoYW5nZSA9IChEaWZmXzIwMTNfdG9fMjAxOCAvIFlyMjAxMykgKiAxMDApDQoNCnN1YndheV9waXZvdF9kZiA8LSBwaXZvdF9sb25nZXIoc3Vid2F5X2RmLCBzdGFydHNfd2l0aCgiWXIiKSwgbmFtZXNfdG8gPSAiWWVhciIsIHZhbHVlc190byA9ICJSaWRlcnNoaXAiKQ0KDQoNCmBgYA0KDQojIEFuYWx5c2lzDQoNCjxzdHlsZT4NCmRpdi5ibHVlIHsgYmFja2dyb3VuZC1jb2xvcjojZTZmMGZmOyBib3JkZXItcmFkaXVzOiA1cHg7IHBhZGRpbmc6IDIwcHg7fQ0KPC9zdHlsZT4NCjxkaXYgY2xhc3MgPSAiYmx1ZSI+DQoNClRoZSBmaXJzdCBjaGFydCBzaG93cyB0aGUgeWVhciB0byB5ZWFyIHJpZGVyc2hpcCBieSBCb3JvdWdoLiBUaGUgc2Vjb25kIGNoYXJ0IHNob3dzIHRoZSBwZXJjZW50IGNoYW5nZSBiZXR3ZWVuIHRoZSByaWRlcnNoaXBzIGluIDIwMTMgdG8gMjAxOC4gIFJpZGVyc2hpcCBpbiBUaGUgQnJvbnggZGVjbGluZWQgYnkgbW9yZSB0aGFuIDYlIGZvbGxvd2VkIGJ5IFF1ZWVucyB3aXRoIGEgZGVjbGluZSBvZiBtb3JlIHRoYW4gNCUuICBSaWRlcnNoaXAgaW4gTWFuaGF0dGFuIGFuZCBCcm9va2x5biBkcm9wcGVkIGJ5IGxlc3MgdGhhbiAxICUuDQoNCjwvZGl2PiBcaGZpbGxcYnJlYWsNCg0KDQpgYGB7ciBzdGVwX2FuYWx5c2lzfQ0KZ2dwbG90KGRhdGEgPSBzdWJ3YXlfcGl2b3RfZGYsIG1hcHBpbmcgPSBhZXMoeD1ZZWFyLCB5PVJpZGVyc2hpcCkpICsNCiAgZ2VvbV9wb2ludChtYXBwaW5nID0gYWVzKHNoYXBlID0gQm9yb3VnaCkpDQoNCmdncGxvdChkYXRhPXN1YndheV9jaGFuZ2VfZGYsIG1hcHBpbmcgPSBhZXMoeD1Cb3JvdWdoLCB5PVBjdF9DaGFuZ2UpKSArDQogIGdlb21fcG9pbnQobWFwcGluZyA9IGFlcyhzaGFwZSA9IEJvcm91Z2gpKQ0KDQoNCmBgYA0KDQojIFN1bW1hcnkNCg0KPHN0eWxlPg0KZGl2LmJsdWUgeyBiYWNrZ3JvdW5kLWNvbG9yOiNlNmYwZmY7IGJvcmRlci1yYWRpdXM6IDVweDsgcGFkZGluZzogMjBweDt9DQo8L3N0eWxlPg0KPGRpdiBjbGFzcyA9ICJibHVlIj4NCg0KRnVydGhlciBleHBsb3JhdGlvbiBpcyBuZWVkZWQgdG8gdW5kZXJzdGFuZCB0aGUgcmVhc29ucyBmb3Igc3Vic3RhbnRpYWwgZGVjbGluZSBpbiByaWRlcnNoaXAgaW4gVGhlIEJyb254IGFuZCBRdWVlbnMuDQoNCjwvZGl2PiBcaGZpbGxcYnJlYWsNCg==