The goal of this assignment is to give you practice in preparing different datasets for downstream analysis work.

I will be using following datasets from week 6 discussion forums.

Income limits data - Khyati Naik (me)
Zillow Home Values - Sanielle Worrell
Gun violence data - Keeno Glanville

I downloaded the data from above websites and have saved it in my github repository. I will be reading in daatsets from my github repository to perform data manipulation and analysis.

Income limts data - Khyati Naik

Research questions

  1. What is the trend of very low income limit for households of different sizes in metro vs non metro areas?
  2. By what proportion is the very low income limit higher in metro areas vs non metro areas. Does this proportion vary by different household size?

Load packages

library(tidyverse)

Read the data file from github

#provide the github data path
dt_inc_lim <- "https://raw.githubusercontent.com/Naik-Khyati/data_prep_proj2/main/data/Section8-FY22.csv"

raw_dt_inc_lim <- read.csv(dt_inc_lim, sep=",", stringsAsFactors=FALSE)

glimpse(raw_dt_inc_lim)
## Rows: 4,765
## Columns: 35
## $ State_Alpha      <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL",…
## $ fips2010         <dbl> 100199999, 100399999, 100599999, 100799999, 100999999…
## $ cbsasub          <chr> "METRO33860M33860", "METRO19300M19300", "NCNTY01005N0…
## $ median2022       <int> 75500, 85500, 46400, 84800, 84800, 48200, 56100, 6880…
## $ Metro_Area_Name  <chr> "Montgomery, AL MSA", "Daphne-Fairhope-Foley, AL MSA"…
## $ county_town_name <chr> "Autauga County", "Baldwin County", "Barbour County",…
## $ l50_1            <int> 26450, 29950, 21600, 29700, 29700, 21600, 21600, 2410…
## $ l50_2            <int> 30200, 34200, 24650, 33950, 33950, 24650, 24650, 2755…
## $ l50_3            <int> 34000, 38500, 27750, 38200, 38200, 27750, 27750, 3100…
## $ l50_4            <int> 37750, 42750, 30800, 42400, 42400, 30800, 30800, 3440…
## $ l50_5            <int> 40800, 46200, 33300, 45800, 45800, 33300, 33300, 3720…
## $ l50_6            <int> 43800, 49600, 35750, 49200, 49200, 35750, 35750, 3995…
## $ l50_7            <int> 46850, 53050, 38200, 52600, 52600, 38200, 38200, 4270…
## $ l50_8            <int> 49850, 56450, 40700, 56000, 56000, 40700, 40700, 4545…
## $ ELI_1            <int> 15900, 18000, 13590, 17850, 17850, 13590, 13590, 1450…
## $ ELI_2            <int> 18310, 20550, 18310, 20400, 20400, 18310, 18310, 1831…
## $ ELI_3            <int> 23030, 23100, 23030, 23030, 23030, 23030, 23030, 2303…
## $ ELI_4            <int> 27750, 27750, 27750, 27750, 27750, 27750, 27750, 2775…
## $ ELI_5            <int> 32470, 32470, 32470, 32470, 32470, 32470, 32470, 3247…
## $ ELI_6            <int> 37190, 37190, 35750, 37190, 37190, 35750, 35750, 3719…
## $ ELI_7            <int> 41910, 41910, 38200, 41910, 41910, 38200, 38200, 4191…
## $ ELI_8            <int> 46630, 46630, 40700, 46630, 46630, 40700, 40700, 4545…
## $ l80_1            <int> 42300, 47900, 34550, 47500, 47500, 34550, 34550, 3855…
## $ l80_2            <int> 48350, 54750, 39450, 54300, 54300, 39450, 39450, 4405…
## $ l80_3            <int> 54400, 61600, 44400, 61100, 61100, 44400, 44400, 4955…
## $ l80_4            <int> 60400, 68400, 49300, 67850, 67850, 49300, 49300, 5505…
## $ l80_5            <int> 65250, 73900, 53250, 73300, 73300, 53250, 53250, 5950…
## $ l80_6            <int> 70100, 79350, 57200, 78750, 78750, 57200, 57200, 6390…
## $ l80_7            <int> 74900, 84850, 61150, 84150, 84150, 61150, 61150, 6830…
## $ l80_8            <int> 79750, 90300, 65100, 89600, 89600, 65100, 65100, 7270…
## $ state            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ county           <int> 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29…
## $ County_Name      <chr> "Autauga County", "Baldwin County", "Barbour County",…
## $ state_name       <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alabama"…
## $ metro            <int> 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0,…

The dataset has 4765 rows and 35 columns.

Select metro column and only columns that start with l50_ which represent very low income limits

vli_wide <- raw_dt_inc_lim %>%  select(metro, starts_with('l50_')) %>% 
  group_by (metro) %>% 
  summarise(across(everything(), list(mean)))

Convert the data from wide to long format to plot the chart

vli_long <- vli_wide %>% gather("vli_cat","mean_vli", 2:9)
head(vli_long)
## # A tibble: 6 × 3
##   metro vli_cat mean_vli
##   <int> <chr>      <dbl>
## 1     0 l50_1_1   26747.
## 2     1 l50_1_1   32725.
## 3     0 l50_2_1   30560.
## 4     1 l50_2_1   37393.
## 5     0 l50_3_1   34384.
## 6     1 l50_3_1   42067.

Convert metro variable to character

Since the fill variable has to be of character to create a chart.

# converting integer to character to be used in fill for the chart
vli_long$metro = as.character(vli_long$metro)

Plot the chart to answer first research question

ggplot(vli_long, aes(x = vli_cat, y = mean_vli, fill = metro )) + 
geom_bar(position = "dodge",stat = "identity") +
xlab("# Person in Family") +
ylab("Mean Very Low Income") +
theme_minimal() 

Income limit for household of 1 member is the lowest and income limit for household of 8 members is the highest. For all sizes of families (1 to 8), metro income limit is higher than non metro income limit.

Prepare data to answer second research question

vli_metro_wide <- vli_long %>% spread(metro,mean_vli)
vli_metro_wide <- vli_metro_wide %>% rename('non_metro'='0', 'metro'='1') %>% mutate(pct_met_high = (metro/non_metro-1)*100)
head(vli_metro_wide)
## # A tibble: 6 × 4
##   vli_cat non_metro  metro pct_met_high
##   <chr>       <dbl>  <dbl>        <dbl>
## 1 l50_1_1    26747. 32725.         22.3
## 2 l50_2_1    30560. 37393.         22.4
## 3 l50_3_1    34384. 42067.         22.3
## 4 l50_4_1    38176. 46716.         22.4
## 5 l50_5_1    41258. 50482.         22.4
## 6 l50_6_1    44311. 54215.         22.4

Mean income in metro areas is about 22% higher than non metro areas for very low income households. Across all family size (1 to 8), mean income is consistently higher by the same proportion of 22%.

Zillow home values data - Sanielle Worrell

Research questions

  1. What is the trend in home prices in Maryland, North Carolina and Virginia states?
  2. Which state has highest average home prices across USA for 2022 ?
#provide the github data path
dt_zhv <- "https://raw.githubusercontent.com/Naik-Khyati/data_prep_proj2/main/data/State_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_month.csv"

raw_dt_zhv <- read.csv(dt_zhv, sep=",", stringsAsFactors=FALSE)

glimpse(raw_dt_zhv)
## Rows: 51
## Columns: 277
## $ RegionID    <int> 9, 54, 14, 43, 47, 21, 44, 16, 36, 30, 40, 56, 59, 8, 26, …
## $ SizeRank    <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, …
## $ RegionName  <chr> "California", "Texas", "Florida", "New York", "Pennsylvani…
## $ RegionType  <chr> "state", "state", "state", "state", "state", "state", "sta…
## $ StateName   <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ X2000.01.31 <dbl> 199037, 114095, 115670, 150013, 106281, 147480, 110729, 13…
## $ X2000.02.29 <dbl> 199883, 114219, 116011, 150780, 106641, 147803, 110893, 13…
## $ X2000.03.31 <dbl> 200947, 114259, 116372, 151421, 106931, 148243, 111066, 13…
## $ X2000.04.30 <dbl> 203187, 114465, 117073, 152775, 107498, 149108, 111517, 13…
## $ X2000.05.31 <dbl> 205587, 114515, 117740, 153998, 107988, 149999, 112063, 13…
## $ X2000.06.30 <dbl> 208086, 114586, 118403, 155256, 108434, 150866, 112649, 13…
## $ X2000.07.31 <dbl> 210682, 114591, 119056, 156279, 108885, 151822, 113145, 13…
## $ X2000.08.31 <dbl> 213433, 114808, 119728, 157127, 109278, 152932, 113607, 13…
## $ X2000.09.30 <dbl> 216199, 115059, 120412, 157870, 109694, 154078, 114066, 13…
## $ X2000.10.31 <dbl> 218822, 115321, 121110, 158720, 110090, 155213, 114518, 13…
## $ X2000.11.30 <dbl> 221448, 115653, 121846, 159850, 110576, 156245, 114975, 13…
## $ X2000.12.31 <dbl> 224027, 116033, 122614, 161062, 111082, 157223, 115427, 14…
## $ X2001.01.31 <dbl> 226097, 116268, 123381, 162223, 111587, 158137, 115863, 14…
## $ X2001.02.28 <dbl> 228240, 116225, 124189, 163187, 112075, 158957, 116234, 14…
## $ X2001.03.31 <dbl> 230690, 116019, 125044, 164142, 112606, 159916, 116622, 14…
## $ X2001.04.30 <dbl> 233777, 115939, 125974, 165161, 113184, 160842, 117011, 14…
## $ X2001.05.31 <dbl> 236629, 116032, 126940, 166252, 113671, 161964, 117389, 14…
## $ X2001.06.30 <dbl> 239108, 116169, 127940, 167407, 114143, 163005, 117707, 14…
## $ X2001.07.31 <dbl> 241455, 116292, 129027, 168589, 114640, 164099, 118057, 14…
## $ X2001.08.31 <dbl> 243800, 116382, 130178, 169807, 115188, 165209, 118439, 14…
## $ X2001.09.30 <dbl> 246136, 116489, 131345, 170951, 115723, 166385, 118820, 14…
## $ X2001.10.31 <dbl> 248340, 116613, 132503, 172175, 116198, 167582, 119183, 14…
## $ X2001.11.30 <dbl> 250418, 116685, 133592, 173467, 116718, 168618, 119508, 14…
## $ X2001.12.31 <dbl> 252377, 116754, 134681, 174863, 117206, 169487, 119818, 14…
## $ X2002.01.31 <dbl> 253884, 116821, 135720, 176260, 117716, 170285, 120087, 14…
## $ X2002.02.28 <dbl> 255546, 117019, 136713, 177542, 118259, 171174, 120350, 14…
## $ X2002.03.31 <dbl> 257505, 117381, 137707, 178639, 118855, 172299, 120633, 15…
## $ X2002.04.30 <dbl> 260207, 117846, 138778, 179766, 119516, 173241, 120920, 15…
## $ X2002.05.31 <dbl> 263230, 118304, 139918, 180954, 120190, 174268, 121221, 15…
## $ X2002.06.30 <dbl> 266535, 118801, 141098, 182332, 120900, 175171, 121516, 15…
## $ X2002.07.31 <dbl> 270418, 119281, 142284, 183764, 121681, 176369, 121846, 15…
## $ X2002.08.31 <dbl> 274548, 119798, 143583, 185368, 122562, 177633, 122165, 15…
## $ X2002.09.30 <dbl> 278669, 120211, 144811, 186961, 123448, 178886, 122485, 15…
## $ X2002.10.31 <dbl> 282537, 120562, 146047, 188505, 124304, 180052, 122816, 15…
## $ X2002.11.30 <dbl> 286382, 120866, 147195, 189992, 124997, 180905, 123178, 15…
## $ X2002.12.31 <dbl> 290199, 121131, 148525, 191592, 125704, 181883, 123548, 15…
## $ X2003.01.31 <dbl> 293431, 121386, 149876, 193171, 126396, 182822, 123889, 15…
## $ X2003.02.28 <dbl> 296475, 121582, 151303, 194764, 127143, 183953, 124242, 15…
## $ X2003.03.31 <dbl> 299798, 121837, 152759, 196566, 127860, 184979, 124575, 15…
## $ X2003.04.30 <dbl> 303735, 122058, 154330, 198257, 128644, 186146, 124932, 15…
## $ X2003.05.31 <dbl> 308044, 122274, 155933, 199967, 129603, 187159, 125273, 15…
## $ X2003.06.30 <dbl> 311736, 122473, 157490, 201597, 130722, 188098, 125667, 15…
## $ X2003.07.31 <dbl> 315864, 122661, 159027, 203342, 131893, 189001, 126092, 15…
## $ X2003.08.31 <dbl> 320270, 122903, 160625, 205368, 132950, 190201, 126594, 15…
## $ X2003.09.30 <dbl> 325320, 123094, 162281, 207520, 133997, 191567, 127084, 15…
## $ X2003.10.31 <dbl> 330049, 123310, 163946, 209668, 134976, 192871, 127559, 15…
## $ X2003.11.30 <dbl> 334587, 123436, 165559, 211399, 135851, 193942, 127927, 15…
## $ X2003.12.31 <dbl> 338999, 123562, 167092, 212976, 136633, 194888, 128278, 15…
## $ X2004.01.31 <dbl> 343702, 124102, 168652, 214629, 137334, 196004, 128578, 15…
## $ X2004.02.29 <dbl> 349090, NA, 170235, 216234, 138275, 197117, 128951, 159140…
## $ X2004.03.31 <dbl> 355174, NA, 172026, 217543, 139189, 198283, 129339, 159506…
## $ X2004.04.30 <dbl> 362112, 128271, 173996, 218922, 140237, 199581, 129712, 16…
## $ X2004.05.31 <dbl> 370122, 129224, 176301, 220597, 141455, 201016, 130073, 16…
## $ X2004.06.30 <dbl> 378974, 129689, 178900, 222653, 142860, 202604, 130419, 16…
## $ X2004.07.31 <dbl> 387711, 130251, 181765, 224889, 144487, 204055, 130803, 16…
## $ X2004.08.31 <dbl> 395866, 130750, 184917, 227088, 145947, 205495, 131123, 16…
## $ X2004.09.30 <dbl> 403633, 131386, 188001, 229422, 147422, 206719, 131438, 16…
## $ X2004.10.31 <dbl> 411215, 131932, 190895, 231627, 148626, 207982, 131738, 16…
## $ X2004.11.30 <dbl> 417616, 132414, 193581, 233900, 149949, 209256, 132064, 16…
## $ X2004.12.31 <dbl> 424103, 132835, 196434, 236024, 151233, 210700, 132372, 16…
## $ X2005.01.31 <dbl> 429685, 133415, 199469, 238097, 152601, 212005, 132735, 16…
## $ X2005.02.28 <dbl> 436252, 133918, 202768, 240152, 153592, 213348, 133130, 16…
## $ X2005.03.31 <dbl> 443040, 134418, 206517, 242003, 154631, 214670, 133533, 16…
## $ X2005.04.30 <dbl> 450862, 134801, 210908, 244040, 155825, 216182, 133906, 16…
## $ X2005.05.31 <dbl> 458029, 135198, 215708, 246208, 157322, 217755, 134215, 16…
## $ X2005.06.30 <dbl> 464500, 135603, 220890, 248426, 159024, 219302, 134545, 16…
## $ X2005.07.31 <dbl> 470776, 135919, 226389, 250799, 160787, 220776, 134837, 16…
## $ X2005.08.31 <dbl> 477297, 136218, 231998, 253271, 162606, 222231, 135167, 16…
## $ X2005.09.30 <dbl> 483827, 136521, 237338, 255866, 164233, 223864, 135461, 16…
## $ X2005.10.31 <dbl> 490183, 136730, 242107, 258447, 165693, 225429, 135765, 16…
## $ X2005.11.30 <dbl> 494866, 137031, 246517, 260647, 167032, 226786, 136057, 16…
## $ X2005.12.31 <dbl> 499214, 137225, 250674, 262940, 168300, 228087, 136311, 17…
## $ X2006.01.31 <dbl> 502729, 137451, 254595, 264861, 169390, 229411, 136503, 17…
## $ X2006.02.28 <dbl> 505986, 137746, 258243, 266662, 170327, 230696, 136651, 17…
## $ X2006.03.31 <dbl> 509194, 138135, 262211, 268068, 171238, 231975, 136793, 17…
## $ X2006.04.30 <dbl> 512001, 138658, 266624, 269276, 172255, 233055, 136959, 17…
## $ X2006.05.31 <dbl> 514920, 139287, 270598, 270461, 173432, 234231, 137130, 17…
## $ X2006.06.30 <dbl> 516585, 139913, 273738, 271738, 174709, 234944, 137273, 17…
## $ X2006.07.31 <dbl> 516854, 140506, 275637, 272906, 175808, 235550, 137395, 17…
## $ X2006.08.31 <dbl> 516593, 140999, 276843, 273924, 176623, 236018, 137446, 17…
## $ X2006.09.30 <dbl> 515027, 141570, 277072, 274443, 177128, 236388, 137427, 17…
## $ X2006.10.31 <dbl> 513686, 142331, 276707, 274828, 177547, 236572, 137358, 17…
## $ X2006.11.30 <dbl> 512291, 142986, 276074, 274976, 177881, 236679, 137213, 17…
## $ X2006.12.31 <dbl> 511194, 143645, 275006, 275242, 178167, 236924, 137017, 17…
## $ X2007.01.31 <dbl> 510336, 143870, 273819, 275431, 178484, 237449, 136833, 17…
## $ X2007.02.28 <dbl> 509175, 143971, 272625, 275583, 178898, 237891, 136738, 17…
## $ X2007.03.31 <dbl> 507418, 143968, 271247, 275867, 179433, 238173, 136561, 17…
## $ X2007.04.30 <dbl> 505218, 144348, 269654, 275879, 179870, 238164, 136383, 17…
## $ X2007.05.31 <dbl> 501555, 144839, 267502, 276118, 180309, 238003, 136183, 17…
## $ X2007.06.30 <dbl> 497687, 145398, 264886, 276041, 180602, 237705, 136025, 17…
## $ X2007.07.31 <dbl> 492554, 145819, 261773, 276104, 180980, 237240, 135776, 17…
## $ X2007.08.31 <dbl> 487542, 146306, 258425, 276187, 181183, 236744, 135493, 17…
## $ X2007.09.30 <dbl> 481698, 146766, 255060, 276157, 181237, 235967, 135312, 17…
## $ X2007.10.31 <dbl> 475676, 147174, 251604, 276400, 181279, 235195, 135056, 17…
## $ X2007.11.30 <dbl> 469729, 147489, 247948, 276559, 181375, 234413, 134584, 17…
## $ X2007.12.31 <dbl> 463195, 147793, 244275, 276579, 181423, 233422, 134031, 17…
## $ X2008.01.31 <dbl> 456917, 147392, 240154, 276578, 181285, 232374, 133478, 17…
## $ X2008.02.29 <dbl> 448998, 146871, 236247, 276306, 181226, NA, 132975, 175487…
## $ X2008.03.31 <dbl> 439505, 146226, 231954, 276097, 181109, NA, 132590, 174484…
## $ X2008.04.30 <dbl> 428469, 146095, 227960, 275618, 181044, 227152, 132255, 17…
## $ X2008.05.31 <dbl> 417161, 145928, 223498, 274887, 180742, NA, 132014, 173036…
## $ X2008.06.30 <dbl> 406642, 145727, 219351, 273960, 180331, 224434, 131698, 17…
## $ X2008.07.31 <dbl> 395775, 145564, 215030, 272761, 179590, 223532, 131399, 17…
## $ X2008.08.31 <dbl> 385543, 145094, 210500, 271554, 178685, 220748, 130996, 17…
## $ X2008.09.30 <dbl> 376244, 144477, 205921, 270191, 177792, 218955, 130519, 16…
## $ X2008.10.31 <dbl> 368453, 143791, 201287, 268509, 177073, 215937, 129861, 16…
## $ X2008.11.30 <dbl> 360429, 143171, 196772, 266723, 176364, 213516, 129238, 16…
## $ X2008.12.31 <dbl> 352774, 142612, 192500, 265094, 175555, 210352, 128438, 16…
## $ X2009.01.31 <dbl> 345731, 142092, 188494, 263605, 174747, 207031, 127535, 16…
## $ X2009.02.28 <dbl> 341313, 141949, 185038, 262363, 174324, 204660, 126578, 15…
## $ X2009.03.31 <dbl> 337318, 141956, 181702, 261243, 174156, 202170, 125802, 15…
## $ X2009.04.30 <dbl> 333307, 142042, 178317, 260308, 174025, 200952, 125360, 15…
## $ X2009.05.31 <dbl> 329379, 142072, 175089, 259418, 173707, 199057, 125086, 15…
## $ X2009.06.30 <dbl> 326150, 142031, 172052, 258391, 173301, 197780, 124815, 15…
## $ X2009.07.31 <dbl> 323966, 142051, 169596, 257350, 172890, 196491, 124421, 15…
## $ X2009.08.31 <dbl> 322020, 141841, 167242, 256209, 172383, 195283, 123992, 15…
## $ X2009.09.30 <dbl> 320559, 141632, 165085, 255343, 171938, 193341, 123623, 14…
## $ X2009.10.31 <dbl> 320052, 141365, 163326, 254693, 171676, 191576, 123380, 14…
## $ X2009.11.30 <dbl> 321760, 141497, 162320, 254317, 171691, 189894, 123066, 14…
## $ X2009.12.31 <dbl> 324346, 141640, 161474, 254041, 171822, 189345, 122783, 14…
## $ X2010.01.31 <dbl> 326486, 141638, 160800, 253817, 172134, 188434, 122486, 14…
## $ X2010.02.28 <dbl> 326863, 141609, 160175, 253848, 172522, 188087, 122439, 14…
## $ X2010.03.31 <dbl> 328049, 141697, 159803, 253929, 173073, 186836, 122377, 14…
## $ X2010.04.30 <dbl> 329519, 142002, 159434, 254408, 173541, 187275, 122438, 14…
## $ X2010.05.31 <dbl> 331090, 142240, 158880, 254794, 173783, 186710, 122438, 14…
## $ X2010.06.30 <dbl> 330380, 142212, 158026, 255368, 173716, 187339, 122386, 14…
## $ X2010.07.31 <dbl> 328690, 141817, 156873, 255604, 173306, 185326, 122060, 14…
## $ X2010.08.31 <dbl> 326277, 141187, 155473, 255304, 172683, 184175, 121627, 14…
## $ X2010.09.30 <dbl> 324169, 140531, 154005, 254555, 171807, 181990, 121065, 14…
## $ X2010.10.31 <dbl> 321870, 139909, 152488, 253384, 170839, 180569, 120449, 13…
## $ X2010.11.30 <dbl> 319983, 139274, 150966, 252329, 169800, 178247, 119691, 13…
## $ X2010.12.31 <dbl> 318820, 138762, 149550, 251270, 168874, 176209, 118862, 13…
## $ X2011.01.31 <dbl> 317873, 138307, 148091, 250644, 168096, 174411, 118115, 13…
## $ X2011.02.28 <dbl> 316849, 137842, 146745, 250128, 167484, 173017, 117415, 13…
## $ X2011.03.31 <dbl> 315038, 137426, 145325, 249844, 167029, 172052, 116913, 13…
## $ X2011.04.30 <dbl> 313299, 137044, 143996, 249305, 166465, 170666, 116366, 13…
## $ X2011.05.31 <dbl> 310414, 136752, 142652, 248773, 165812, 169876, 115873, 13…
## $ X2011.06.30 <dbl> 308628, 136383, 141502, 247920, 164959, 168582, 115403, 12…
## $ X2011.07.31 <dbl> 306980, 136237, 140438, 247192, 164179, 168081, 115126, 12…
## $ X2011.08.31 <dbl> 304998, 136112, 139672, 246482, 163574, 166903, 114903, 12…
## $ X2011.09.30 <dbl> 302559, 136061, 139107, 245848, 163170, 165894, 114627, 12…
## $ X2011.10.31 <dbl> 300074, 135876, 138846, 245189, 162802, 163915, 114257, 12…
## $ X2011.11.30 <dbl> 299424, 135594, 138726, 244448, 162397, 162685, 113864, 12…
## $ X2011.12.31 <dbl> 298844, 135303, 138676, 243879, 162006, 160863, 113511, 12…
## $ X2012.01.31 <dbl> 298635, 135217, 138739, 243364, 161678, 159961, 113055, 12…
## $ X2012.02.29 <dbl> 298383, 135466, 138845, 242832, 161440, 158290, 112617, 12…
## $ X2012.03.31 <dbl> 298524, 135913, 139159, 242459, 161342, 158087, 112366, 12…
## $ X2012.04.30 <dbl> 299218, 136317, 139549, 242355, 161371, 157766, 112325, 12…
## $ X2012.05.31 <dbl> 300456, 136742, 140021, 242570, 161452, 158072, 112307, 12…
## $ X2012.06.30 <dbl> 302221, 137154, 140518, 242845, 161402, 157979, 112251, 12…
## $ X2012.07.31 <dbl> 304240, 137655, 141176, 243020, 161224, 157382, 112234, 12…
## $ X2012.08.31 <dbl> 306538, 138014, 141971, 243114, 161018, 156749, 112277, 12…
## $ X2012.09.30 <dbl> 309123, 138223, 142944, 243122, 160828, 155964, 112324, 12…
## $ X2012.10.31 <dbl> 312608, 138385, 144081, 242997, 160708, 156209, 112328, 12…
## $ X2012.11.30 <dbl> 316463, 138748, 145269, 243013, 160629, 156519, 112283, 12…
## $ X2012.12.31 <dbl> 321847, 139261, 146640, 243167, 160671, 156777, 112260, 12…
## $ X2013.01.31 <dbl> 327341, 139947, 148205, 243430, 160833, 157302, 112286, 12…
## $ X2013.02.28 <dbl> 333693, 140683, 149931, 243937, 161103, 158096, 112416, 12…
## $ X2013.03.31 <dbl> 340160, 141591, 151616, 244450, 161455, 158858, 112564, 13…
## $ X2013.04.30 <dbl> 347140, 142537, 153188, 245359, 161920, 159829, 112868, 13…
## $ X2013.05.31 <dbl> 354166, 143476, 154949, 246275, 162394, 160689, 113325, 13…
## $ X2013.06.30 <dbl> 361540, 144411, 156651, 247303, 162886, 162145, 113933, 13…
## $ X2013.07.31 <dbl> 368559, 145362, 158611, 248166, 163321, 163305, 114495, 13…
## $ X2013.08.31 <dbl> 375058, 146253, 160633, 248802, 163606, 164770, 114942, 13…
## $ X2013.09.30 <dbl> 380799, 147173, 162866, 249363, 163846, 165943, 115214, 13…
## $ X2013.10.31 <dbl> 386201, 148166, 164990, 249993, 163969, 167508, 115534, 13…
## $ X2013.11.30 <dbl> 391681, 149220, 167055, 250670, 164189, 168227, 115840, 14…
## $ X2013.12.31 <dbl> 396876, 150178, 169015, 251266, 164359, 169197, 116098, 14…
## $ X2014.01.31 <dbl> 402202, 151109, 170811, 251873, 164608, 169778, 116061, 14…
## $ X2014.02.28 <dbl> 406778, 152114, 172352, 252503, 164891, 170852, 115966, 14…
## $ X2014.03.31 <dbl> 409219, 153042, 173656, 253287, 165189, 171439, 115890, 14…
## $ X2014.04.30 <dbl> 410132, 153884, 174600, 254079, 165402, 171798, 116011, 14…
## $ X2014.05.31 <dbl> 410463, 154733, 175512, 254815, 165569, 172381, 116259, 14…
## $ X2014.06.30 <dbl> 411002, 155717, 176246, 255450, 165814, 173246, 116606, 14…
## $ X2014.07.31 <dbl> 411459, 156604, 177091, 255830, 166052, 173953, 116959, 14…
## $ X2014.08.31 <dbl> 411259, 157442, 177684, 256154, 166173, 174188, 117090, 14…
## $ X2014.09.30 <dbl> 411942, 158222, 178502, 256431, 166171, 174478, 117207, 14…
## $ X2014.10.31 <dbl> 413546, 159109, 179642, 256729, 166157, 174710, 117222, 15…
## $ X2014.11.30 <dbl> 416257, 160116, 180835, 257032, 166210, 175119, 117461, 15…
## $ X2014.12.31 <dbl> 418698, 161219, 182104, 257456, 166390, 175549, 117791, 15…
## $ X2015.01.31 <dbl> 421284, 162373, 183201, 257830, 166551, 175948, 118313, 15…
## $ X2015.02.28 <dbl> 424047, 163540, 184381, 258123, 166718, 176436, 118878, 15…
## $ X2015.03.31 <dbl> 427057, 164731, 185318, 258301, 167022, 177153, 119434, 15…
## $ X2015.04.30 <dbl> 429170, 165902, 186199, 258778, 167529, 178089, 119959, 15…
## $ X2015.05.31 <dbl> 431433, 166994, 187127, 259403, 168121, 178856, 120532, 15…
## $ X2015.06.30 <dbl> 433060, 168090, 188161, 260218, 168454, 178977, 121063, 15…
## $ X2015.07.31 <dbl> 433725, 169090, 189144, 260980, 168581, 179096, 121545, 15…
## $ X2015.08.31 <dbl> 435015, 169909, 190301, 261646, 168696, 179013, 121913, 15…
## $ X2015.09.30 <dbl> 437202, 170631, 191587, 262237, 168905, 179084, 122226, 15…
## $ X2015.10.31 <dbl> 440934, 171521, 192975, 262657, 169183, 179100, 122497, 15…
## $ X2015.11.30 <dbl> 443916, 172476, 194264, 263094, 169437, 179275, 122816, 16…
## $ X2015.12.31 <dbl> 448004, 173548, 195767, 263391, 169702, 179867, 123125, 16…
## $ X2016.01.31 <dbl> 451830, 174619, 197263, 263687, 170106, 180680, 123549, 16…
## $ X2016.02.29 <dbl> 455184, 175657, 198805, 264226, 170682, 182138, 123977, 16…
## $ X2016.03.31 <dbl> 456935, 176568, 199943, 264996, 171085, 183298, 124573, 16…
## $ X2016.04.30 <dbl> 459213, 177473, 201174, 265966, 171424, 184183, 125173, 16…
## $ X2016.05.31 <dbl> 461417, 178528, 202293, 266932, 171604, 184617, 125716, 16…
## $ X2016.06.30 <dbl> 463755, 179554, 203611, 267791, 172057, 184963, 126166, 16…
## $ X2016.07.31 <dbl> 464367, 180549, 204917, 268573, 172460, 185436, 126586, 16…
## $ X2016.08.31 <dbl> 465952, 181315, 206218, 269380, 172946, 185778, 127035, 16…
## $ X2016.09.30 <dbl> 467769, 182139, 207386, 270146, 173284, 186153, 127509, 16…
## $ X2016.10.31 <dbl> 471333, 183004, 208543, 271027, 173683, 186628, 128000, 16…
## $ X2016.11.30 <dbl> 474161, 184214, 209895, 271983, 174145, 187472, 128587, 16…
## $ X2016.12.31 <dbl> 476690, 185579, 211332, 273123, 174887, 188622, 129260, 17…
## $ X2017.01.31 <dbl> 479565, 186786, 212657, 274208, 175399, 189989, 129998, 17…
## $ X2017.02.28 <dbl> 482525, 187946, 213801, 274968, 175862, 191224, 130744, 17…
## $ X2017.03.31 <dbl> 485476, 188842, 214862, 275701, 176331, 192504, 131512, 17…
## $ X2017.04.30 <dbl> 487933, 189810, 216231, 276655, 177125, 193470, 132342, 17…
## $ X2017.05.31 <dbl> 490856, 190607, 217567, 278031, 177968, 194224, 133129, 17…
## $ X2017.06.30 <dbl> 494155, 191469, 218929, 279665, 178650, 194472, 133824, 17…
## $ X2017.07.31 <dbl> 495683, 192150, 220066, 281143, 179249, 194716, 134381, 17…
## $ X2017.08.31 <dbl> 497374, 192849, 221368, 282479, 179863, 195000, 135028, 17…
## $ X2017.09.30 <dbl> 499704, 193615, 222629, 283680, 180577, 195592, 135731, 17…
## $ X2017.10.31 <dbl> 503836, 194599, 223951, 284933, 181387, 196253, 136544, 18…
## $ X2017.11.30 <dbl> 508085, 195643, 225206, 286257, 182132, 197073, 137269, 18…
## $ X2017.12.31 <dbl> 512309, 196669, 226614, 287434, 182946, 198083, 137976, 18…
## $ X2018.01.31 <dbl> 516966, 197668, 227673, 288779, 183856, 199237, 138497, 18…
## $ X2018.02.28 <dbl> 522015, 198672, 228826, 290180, 184797, 200275, 139205, 18…
## $ X2018.03.31 <dbl> 526349, 199711, 230087, 291730, 186081, 200934, 140047, 18…
## $ X2018.04.30 <dbl> 529734, 200784, 231601, 293149, 187192, 201321, 141136, 18…
## $ X2018.05.31 <dbl> 532445, 201784, 233108, 294442, 188275, 201667, 142153, 19…
## $ X2018.06.30 <dbl> 535166, 202772, 234423, 295673, 188854, 201946, 142969, 19…
## $ X2018.07.31 <dbl> 536884, 203620, 235658, 296976, 189406, 202250, 143712, 19…
## $ X2018.08.31 <dbl> 538273, 204589, 236780, 298371, 189928, 202438, 144243, 19…
## $ X2018.09.30 <dbl> 539162, 205437, 237889, 299844, 190562, 202911, 144758, 19…
## $ X2018.10.31 <dbl> 540899, 206373, 239006, 301169, 191198, 203320, 145270, 19…
## $ X2018.11.30 <dbl> 541954, 207201, 240276, 302428, 191971, 203902, 145941, 19…
## $ X2018.12.31 <dbl> 541811, 208184, 241623, 303552, 192768, 204382, 146680, 19…
## $ X2019.01.31 <dbl> 541188, 209334, 242873, 304688, 193626, 204779, 147558, 19…
## $ X2019.02.28 <dbl> 540315, 210413, 243687, 305771, 194381, 205382, 148431, 20…
## $ X2019.03.31 <dbl> 540144, 211145, 244248, 306996, 194934, 205815, 149324, 20…
## $ X2019.04.30 <dbl> 540867, 211585, 244742, 307966, 195361, 206281, 149950, 20…
## $ X2019.05.31 <dbl> 541905, 212004, 245166, 308807, 195900, 206393, 150640, 20…
## $ X2019.06.30 <dbl> 542789, 212483, 245660, 309296, 196502, 206419, 151246, 20…
## $ X2019.07.31 <dbl> 542752, 212991, 246195, 309607, 197149, 206441, 151905, 20…
## $ X2019.08.31 <dbl> 543167, 213624, 246573, 309782, 197737, 206472, 152409, 20…
## $ X2019.09.30 <dbl> 544231, 214399, 246875, 309873, 198286, 206487, 153014, 20…
## $ X2019.10.31 <dbl> 546162, 214848, 247319, 310027, 198913, 206429, 153592, 20…
## $ X2019.11.30 <dbl> 547591, 215234, 248029, 310436, 199555, 206433, 154048, 20…
## $ X2019.12.31 <dbl> 549605, 215443, 248747, 311140, 200279, 206638, 154561, 20…
## $ X2020.01.31 <dbl> 552164, 216044, 249783, 312089, 200984, 207044, 155295, 20…
## $ X2020.02.29 <dbl> 556088, 216585, 251093, 313174, 201639, 207487, 156152, 21…
## $ X2020.03.31 <dbl> 560612, 217364, 252651, 314156, 202412, 207902, 157054, 21…
## $ X2020.04.30 <dbl> 563997, 218041, 254097, 315084, 203184, 208066, 157914, 21…
## $ X2020.05.31 <dbl> 563896, 218617, 255283, 315544, 203693, 207972, 158832, 21…
## $ X2020.06.30 <dbl> 562830, 219184, 256184, 316079, 204349, 207970, 159725, 21…
## $ X2020.07.31 <dbl> 565069, 219991, 256968, 317182, 205482, 208708, 160706, 21…
## $ X2020.08.31 <dbl> 571713, 221383, 258515, 319023, NA, 210007, 162051, 216806…
## $ X2020.09.30 <dbl> 580209, 223303, 260617, 321702, 209589, 212198, 163563, 21…
## $ X2020.10.31 <dbl> 588419, 225557, 263149, 325073, NA, 214376, 165321, 220879…
## $ X2020.11.30 <dbl> 595980, 227822, 265746, 328783, 214821, 216628, 167383, 22…
## $ X2020.12.31 <dbl> 602825, 229996, 268449, 332430, 217570, 218495, 169344, 22…
## $ X2021.01.31 <dbl> 608991, 232265, 271597, 335746, 220349, 219969, 171623, 22…
## $ X2021.02.28 <dbl> 616012, 235043, 274730, 339063, 222808, 222026, 173711, 23…
## $ X2021.03.31 <dbl> 623779, 237996, 278433, 342309, 225323, 223924, 176194, 23…
## $ X2021.04.30 <dbl> 633956, 241646, 282437, 345529, 228015, 226447, 178546, 23…
## $ X2021.05.31 <dbl> 648122, 245980, 287865, 349618, 231597, 228768, 181084, 24…
## $ X2021.06.30 <dbl> 665178, 251417, 294710, 354568, 235469, 231931, 183719, 24…
## $ X2021.07.31 <dbl> 683587, 257072, 302651, 360213, 238942, 235397, 186336, 25…
## $ X2021.08.31 <dbl> 697149, 262742, 310696, 365058, 241732, 238801, 188660, 25…
## $ X2021.09.30 <dbl> 706869, 267595, 318591, 369103, 243898, 241341, 190723, 26…
## $ X2021.10.31 <dbl> 711808, 271867, 325971, 372165, 245890, 243579, 192448, 27…
## $ X2021.11.30 <dbl> 717220, 275502, 333282, 375055, 247762, 245842, 194167, 27…
## $ X2021.12.31 <dbl> 723926, 279765, 340088, 377914, 250004, 248349, 196257, 28…
## $ X2022.01.31 <dbl> 733988, 284428, 347640, 381216, 252544, 250702, 198574, 28…
## $ X2022.02.28 <dbl> 745945, 289868, 355566, 385005, 255326, 253070, 201174, 29…
## $ X2022.03.31 <dbl> 760729, 295282, 365267, 389766, 257958, 255747, 203466, 29…
## $ X2022.04.30 <dbl> 775012, 301326, 375948, 394721, 260578, 258600, 206047, 30…
## $ X2022.05.31 <dbl> 785789, 306785, 386674, 399829, 262976, 261789, 208315, 31…
## $ X2022.06.30 <dbl> 788081, 311477, 395718, 403950, 265107, 264469, 210562, 31…
## $ X2022.07.31 <dbl> 784926, 313941, 401934, 407002, 266601, 266341, 212040, 31…
## $ X2022.08.31 <dbl> 775876, 314837, 405489, 408833, 267594, 267079, 213335, 32…

Zillow homw value dataset has 51 rows and 277 columns.

Convert data from wide to long

zhv_long <- raw_dt_zhv %>% 
  select(-RegionID, -SizeRank,  -RegionType,    -StateName) %>% 
  gather('period','home_val',2:ncol(.))

glimpse(zhv_long)
## Rows: 13,872
## Columns: 3
## $ RegionName <chr> "California", "Texas", "Florida", "New York", "Pennsylvania…
## $ period     <chr> "X2000.01.31", "X2000.01.31", "X2000.01.31", "X2000.01.31",…
## $ home_val   <dbl> 199037, 114095, 115670, 150013, 106281, 147480, 110729, 131…

Create a date variable to plot time series trend

zhv_long_dt <- zhv_long %>% 
  separate(period, c("yr", "mo" , "day"),"\\.")
  
zhv_long_dt$yr <-  as.numeric(gsub('X', '', zhv_long_dt$yr))
zhv_long_dt$mo <-  as.numeric(zhv_long_dt$mo)
zhv_long_dt$date <- as.Date(with(zhv_long_dt, paste(yr, mo, 1,sep="-")), "%Y-%m-%d")

# keep only numeric values in yr column

head(zhv_long_dt)
##     RegionName   yr mo day home_val       date
## 1   California 2000  1  31   199037 2000-01-01
## 2        Texas 2000  1  31   114095 2000-01-01
## 3      Florida 2000  1  31   115670 2000-01-01
## 4     New York 2000  1  31   150013 2000-01-01
## 5 Pennsylvania 2000  1  31   106281 2000-01-01
## 6     Illinois 2000  1  31   147480 2000-01-01

Plot trend chart to answer first research question

zhv_long_ts <- zhv_long_dt %>% filter (RegionName %in% c('Virginia', 'Maryland', 'North Carolina'))

ggplot(data = zhv_long_ts, aes(x=date, y=home_val)) + geom_line(aes(colour=RegionName)) + theme_minimal() 

It is an interesting trend chart, as we can see that average home prices in North Carolina and Virginia were nearly the same in 2000. But, by 2022, Virginia home prices are much higher than North Carolina home prices. Maryland has the highest average home prices, among the 3 states.

Prepare data for answering second research question

zhv_long_reg  <- zhv_long_dt %>% 
  filter (yr==max(zhv_long_dt$yr)) %>% 
  group_by (RegionName) %>% 
  summarise(mean_hv = mean(home_val)) %>% 
  arrange(mean_hv)

glimpse(zhv_long_reg)
## Rows: 51
## Columns: 2
## $ RegionName <chr> "West Virginia", "Mississippi", "Arkansas", "Oklahoma", "Io…
## $ mean_hv    <dbl> 137052.5, 163117.6, 174626.4, 177118.8, 188413.0, 193936.8,…

Plot the chart to answer second research question

ggplot(zhv_long_reg, aes(x = reorder(RegionName,-mean_hv), y = mean_hv )) + 
geom_bar(position = "dodge",stat = "identity", fill="steelblue") +
xlab("State") +
ylab("Mean Home Value") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))

In 2022, Hawaii has the highest home prices. It is followed by California and DC.
West Virginia has the lowest home prices followed by Mississippi.

Gun Violence Data - Keeno Glanville

Research questions

  1. What is the yearly trend in gun violence incidents?
  2. What is the trend in share of person injured vs killed in gun violence incidents?
  3. Which state has the highest gun violence incidents?

Read the data file

#provide the github data path
dt_gv <- "https://raw.githubusercontent.com/Naik-Khyati/data_prep_proj2/main/data/gun-violence-data_01-2013_03-2018.csv"

raw_dt_gv <- read.csv(dt_gv, sep=",", stringsAsFactors=FALSE)

glimpse(raw_dt_gv)
## Rows: 239,677
## Columns: 23
## $ incident_id                 <int> 461105, 460726, 478855, 478925, 478959, 47…
## $ date                        <chr> "1/1/2013", "1/1/2013", "1/1/2013", "1/5/2…
## $ state                       <chr> "Pennsylvania", "California", "Ohio", "Col…
## $ city_or_county              <chr> "Mckeesport", "Hawthorne", "Lorain", "Auro…
## $ address                     <chr> "1506 Versailles Avenue and Coursin Street…
## $ n_killed                    <int> 0, 1, 1, 4, 2, 4, 5, 0, 0, 1, 1, 1, 2, 0, …
## $ n_injured                   <int> 4, 3, 3, 0, 2, 0, 0, 5, 4, 6, 3, 3, 3, 5, …
## $ incident_url_fields_missing <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
## $ congressional_district      <int> 14, 43, 9, 6, 6, 1, 1, 2, 9, 7, 3, 1, 3, 1…
## $ gun_stolen                  <chr> "", "", "0::Unknown||1::Unknown", "", "0::…
## $ gun_type                    <chr> "", "", "0::Unknown||1::Unknown", "", "0::…
## $ incident_characteristics    <chr> "Shot - Wounded/Injured||Mass Shooting (4+…
## $ n_guns_involved             <int> NA, NA, 2, NA, 2, NA, 2, NA, NA, NA, 1, 1,…
## $ notes                       <chr> "Julian Sims under investigation: Four Sho…
## $ participant_age             <chr> "0::20", "0::20", "0::25||1::31||2::33||3:…
## $ participant_age_group       <chr> "0::Adult 18+||1::Adult 18+||2::Adult 18+|…
## $ participant_gender          <chr> "0::Male||1::Male||3::Male||4::Female", "0…
## $ participant_name            <chr> "0::Julian Sims", "0::Bernard Gillis", "0:…
## $ participant_relationship    <chr> "", "", "", "", "3::Family", "", "5::Famil…
## $ participant_status          <chr> "0::Arrested||1::Injured||2::Injured||3::I…
## $ participant_type            <chr> "0::Victim||1::Victim||2::Victim||3::Victi…
## $ state_house_district        <int> NA, 62, 56, 40, 62, 72, 10, 93, 11, NA, 28…
## $ state_senate_district       <int> NA, 35, 13, 28, 27, 11, 14, 5, 7, 44, 10, …

This data set has 239,677 rows and 23 columns.

Convert the character type date into date format

raw_dt_gv <- raw_dt_gv %>% mutate(date_fmt = as.Date(date, format= "%m/%d/%Y"))

Extract year from date

raw_dt_gv$year <- as.numeric(format(raw_dt_gv$date_fmt,'%Y'))

Prepare dataset to create a plot

yr_inci_cnt <- raw_dt_gv %>% group_by(year) %>% summarise(incid_count=n())

Plot chart to answer first research question

ggplot(yr_inci_cnt, aes(x = year, y = incid_count )) + 
geom_bar(position = "dodge",stat = "identity", fill="darkgreen") +
xlab("year") +
ylab("# Incident") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))

There were fewest incidents in 2013. There was a spike in number of incidents in 2014 where incidents rose to above 50,000. Since, 2014, incidents have been on an increasing trend. However, in 2018, we see a drop in number of incidents.

Convert data from wide to long to plot year wise number of person impacted

gv_long_dt <- raw_dt_gv %>% select (year,n_injured,n_killed) %>% 
  group_by(year) %>%  summarise_each(list(sum)) %>%
  gather('impact_type','nbr_impact',2:3)
## Warning: `summarise_each_()` was deprecated in dplyr 0.7.0.
## Please use `across()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.

Add impact type share by year to have labels on plot

gv_long_dt <- gv_long_dt %>% group_by (year) %>% 
  mutate(impact_share = nbr_impact / sum(nbr_impact))

Plot to answer second research question

ggplot(gv_long_dt, aes(x = year, y = nbr_impact, fill=impact_type )) +
geom_bar(stat = "identity") +
geom_text(aes(label = paste0(round(impact_share * 100), '%')),
            position = position_stack(vjust = 0.5)) +
xlab("year") +
ylab("# of impacted person") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))

Find unique values in state column

unique(raw_dt_gv$state)
##  [1] "Pennsylvania"         "California"           "Ohio"                
##  [4] "Colorado"             "North Carolina"       "Oklahoma"            
##  [7] "New Mexico"           "Louisiana"            "Maryland"            
## [10] "Tennessee"            "Missouri"             "District of Columbia"
## [13] "Illinois"             "Delaware"             "Utah"                
## [16] "Michigan"             "Georgia"              "Indiana"             
## [19] "Mississippi"          "New York"             "Florida"             
## [22] "Washington"           "South Carolina"       "Arizona"             
## [25] "Kentucky"             "New Jersey"           "Virginia"            
## [28] "Wisconsin"            "Rhode Island"         "Texas"               
## [31] "Alabama"              "Kansas"               "Connecticut"         
## [34] "West Virginia"        "Minnesota"            "Nevada"              
## [37] "Nebraska"             "Massachusetts"        "Hawaii"              
## [40] "New Hampshire"        "Iowa"                 "Alaska"              
## [43] "Arkansas"             "Idaho"                "Oregon"              
## [46] "Wyoming"              "Maine"                "North Dakota"        
## [49] "Montana"              "Vermont"              "South Dakota"

Prepare data for incidents by state

state_gv <- raw_dt_gv %>% group_by(state) %>% summarise(incid_count = n())

Plot state wise number of persons impacted by incidents to answer third research question

ggplot(state_gv, aes(x = reorder(state,-incid_count), y = incid_count )) + 
geom_bar(position = "dodge",stat = "identity", fill="steelblue") +
xlab("State") +
ylab("Gun Violence Incident Count") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))

Illinois has the most number of incidents followed by California, Florida and Texas.
Hawaii, Vermont and Wyoming has fewest number of incidents.

Conclusion

We prepared 3 different wide datasets to answer corresponding research questions. Gun Violence dataset had nearly 240,000 rows, whereas Zillow home value data has 277 columns. Thus, I got the opportunity to work with relatively large dataset, which were wide too.

In this project, I performed various data manipulation tasks such as string split, string substitution ,conversion from one data type to another, creating a date variable from multiple columns, selecting only few columns based on starting characters, converting data from wide to long and vice versa, aggregating and summarizing the data, adding new columns to the data frame, subsetting the data, sorting the data etc. Moreover, I also created stack and line charts to answer to answer research questions.