BONUS: Place the original .csv file in a GitHub repository and have R read it from the link. This will be a very useful skill as you progress in your data science education and career.
library(tidyverse)
## -- Attaching packages -------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0 v purrr 0.2.5
## v tibble 1.4.2 v dplyr 0.7.6
## v tidyr 0.8.1 v stringr 1.3.1
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts ----------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
# read_csv() (from readr) downloads and parses the CSV straight from the raw GitHub URL
df <- read_csv('https://raw.githubusercontent.com/mandiemannz/MSDS_Bridge-2018/master/terrorism.csv')
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## .default = col_double(),
## X1 = col_integer(),
## year = col_integer(),
## methodology = col_character(),
## method = col_character(),
## incidents = col_integer(),
## incidents.us = col_integer(),
## suicide = col_integer(),
## suicide.us = col_integer(),
## nkill = col_integer(),
## nkill.us = col_integer(),
## nwound = col_integer(),
## nwound.us = col_integer()
## )
## See spec(...) for full column specifications.
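read_csv() needs the raw file URL, not the github.com page that displays the file. A minimal sketch of converting a browser URL of the form https://github.com/<user>/<repo>/blob/<branch>/<path> into its raw.githubusercontent.com equivalent; the helper name github_raw_url() is hypothetical, and it assumes that standard /blob/ URL layout.

```r
# Hypothetical helper: turn a GitHub "blob" page URL into the raw-file URL
# that read_csv() / read.csv() can download directly.
# Assumes the layout https://github.com/<user>/<repo>/blob/<branch>/<path>.
github_raw_url <- function(url) {
  # Swap the host name
  url <- sub("^https://github\\.com/", "https://raw.githubusercontent.com/", url)
  # Drop the first "/blob/" path segment
  sub("/blob/", "/", url, fixed = TRUE)
}

page <- "https://github.com/mandiemannz/MSDS_Bridge-2018/blob/master/terrorism.csv"
raw_url <- github_raw_url(page)
print(raw_url)
# df <- readr::read_csv(raw_url)   # needs network access
```

A URL that is already raw passes through unchanged, so the helper is safe to apply to either form.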
head(df)
## # A tibble: 6 x 26
## X1 year methodology method incidents incidents.us suicide suicide.us
## <int> <int> <chr> <chr> <int> <int> <int> <int>
## 1 1 1970 PGIS p 651 468 0 0
## 2 2 1971 PGIS p 470 247 0 0
## 3 3 1972 PGIS p 494 64 0 0
## 4 4 1973 PGIS p 473 58 0 0
## 5 5 1974 PGIS p 580 94 0 0
## 6 6 1975 PGIS p 740 149 0 0
## # ... with 18 more variables: nkill <int>, nkill.us <int>, nwound <int>,
## # nwound.us <int>, pNA.nkill <dbl>, pNA.nkill.us <dbl>,
## # pNA.nwound <dbl>, pNA.nwound.us <dbl>, worldPopulation <dbl>,
## # USpopulation <dbl>, worldDeathRate <dbl>, USdeathRate <dbl>,
## # worldDeaths <dbl>, USdeaths <dbl>, kill.pmp <dbl>, kill.pmp.us <dbl>,
## # pkill <dbl>, pkill.us <dbl>
summary(df)
## X1 year methodology method
## Min. : 1.00 Min. :1970 Length:46 Length:46
## 1st Qu.:12.25 1st Qu.:1981 Class :character Class :character
## Median :23.50 Median :1992 Mode :character Mode :character
## Mean :23.50 Mean :1992
## 3rd Qu.:34.75 3rd Qu.:2004
## Max. :46.00 Max. :2015
##
## incidents incidents.us suicide suicide.us
## Min. : 470 Min. : 6.00 Min. : 0 Min. :0.0
## 1st Qu.: 1348 1st Qu.: 27.00 1st Qu.: 1 1st Qu.:0.0
## Median : 2865 Median : 40.00 Median : 11 Median :0.0
## Mean : 3516 Mean : 59.84 Mean :106 Mean :0.2
## 3rd Qu.: 4213 3rd Qu.: 64.00 3rd Qu.:119 3rd Qu.:0.0
## Max. :16840 Max. :468.00 Max. :906 Max. :4.0
## NA's :1 NA's :1 NA's :1
## nkill nkill.us nwound nwound.us
## Min. : 171 Min. : 6.00 Min. : 82 Min. : 3.00
## 1st Qu.: 3646 1st Qu.: 16.25 1st Qu.: 3423 1st Qu.: 28.25
## Median : 6715 Median : 25.50 Median : 6620 Median : 47.00
## Mean : 7803 Mean : 113.91 Mean : 9842 Mean : 117.63
## 3rd Qu.: 9226 3rd Qu.: 59.75 3rd Qu.:12630 3rd Qu.: 81.75
## Max. :43550 Max. :2910.00 Max. :43495 Max. :1214.00
##
## pNA.nkill pNA.nkill.us pNA.nwound
## Min. :0.001544 Min. :0.000000 Min. :0.005145
## 1st Qu.:0.012920 1st Qu.:0.009448 1st Qu.:0.036557
## Median :0.030534 Median :0.898408 Median :0.079877
## Mean :0.072987 Mean :0.552738 Mean :0.141271
## 3rd Qu.:0.103594 3rd Qu.:0.958357 3rd Qu.:0.173619
## Max. :0.315914 Max. :0.990177 Max. :0.668016
## NA's :1 NA's :1 NA's :1
## pNA.nwound.us worldPopulation USpopulation worldDeathRate
## Min. :0.00000 Min. :3682488 Min. :209486 Min. : 7.748
## 1st Qu.:0.01759 1st Qu.:4538702 1st Qu.:232313 1st Qu.: 8.322
## Median :0.90068 Median :5527580 Median :259218 Median : 9.090
## Mean :0.55578 Mean :5498924 Mean :262592 Mean : 9.321
## 3rd Qu.:0.95807 3rd Qu.:6420073 3rd Qu.:292900 3rd Qu.:10.087
## Max. :0.98986 Max. :7349472 Max. :321774 Max. :11.966
## NA's :1
## USdeathRate worldDeaths USdeaths kill.pmp
## Min. :7.900 Min. :44064648 Min. :1892879 Min. :0.04604
## 1st Qu.:8.325 1st Qu.:45780625 1st Qu.:1980946 1st Qu.:0.60046
## Median :8.600 Median :50246087 Median :2222083 Median :1.14272
## Mean :8.580 Mean :49978598 Mean :2224823 Mean :1.28422
## 3rd Qu.:8.800 3rd Qu.:53356899 3rd Qu.:2425626 3rd Qu.:1.53252
## Max. :9.500 Max. :57421426 Max. :2665996 Max. :5.99385
##
## kill.pmp.us pkill pkill.us
## Min. : 0.02715 Min. :3.881e-06 Min. :3.142e-06
## 1st Qu.: 0.06778 1st Qu.:6.988e-05 1st Qu.:8.137e-06
## Median : 0.09195 Median :1.353e-04 Median :1.138e-05
## Mean : 0.41003 Mean :1.488e-04 Mean :4.842e-05
## 3rd Qu.: 0.23054 3rd Qu.:1.701e-04 3rd Qu.:2.692e-05
## Max. :10.18208 Max. :7.736e-04 Max. :1.204e-03
##
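summary() adds an NA's row only for columns that contain missing values, and several columns above have one. A quick way to list the missing-value count for every column at once is colSums(is.na(...)). The small data frame below is toy data standing in for df: the non-missing values come from the first rows of the data, and the NAs are made up for illustration.

```r
# Toy stand-in for df (NAs inserted for illustration only)
toy <- data.frame(
  year      = 1970:1973,
  incidents = c(651, 470, NA, 473),
  suicide   = c(0, 0, NA, NA)
)

# is.na() on a data frame returns a logical matrix;
# colSums() counts the TRUE values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)
# Same idea on the real data: colSums(is.na(df))
```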
2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.
## [1] 113.913
## [1] 25.5
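The two values printed above match the Mean (113.91) and Median (25.50) that summary() reports for nkill.us. The mean sits far above the median because a few years with very large death tolls (the maximum is 2910) pull it upward, while the median is unaffected by how extreme the largest values are. A sketch of computing both for any column, with na.rm = TRUE so that columns containing NA still return a number; the helper center_stats() is hypothetical and the example vector is made up.

```r
# Hypothetical helper: mean and median side by side, ignoring NAs
center_stats <- function(x) {
  c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE))
}

# Made-up, right-skewed example: one extreme value plus one NA
x <- c(6, 15, 22, 25, 30, 2910, NA)
stats <- center_stats(x)
print(stats)
# On the real data: center_stats(df$nkill.us)
```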
3. Create new column names for the new data frame.
# Keep five columns, then give them shorter, capitalized names
data_terror <- df %>%
  select(year, nkill.us, nkill, USpopulation, USdeathRate) %>%
  rename(Year = year, NumkilledUS = nkill.us, Numkilled = nkill,
         USpop = USpopulation, USDeath = USdeathRate)
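Step 2 asks for a subset of rows as well as columns, but the pipeline above keeps all 46 years. In dplyr, adding a filter() step to the pipeline would drop rows. The sketch below shows the same select, filter, and rename idea in base R with subset(), so it runs without extra packages. The cutoff year >= 1973 is an arbitrary example, and the toy data frame reuses values from the first rows of the data.

```r
# Small stand-in for df, using values from the first rows of the data
toy <- data.frame(
  year     = 1970:1975,
  nkill.us = c(28, 15, 12, 73, 17, 22),
  nkill    = c(171, 173, 566, 370, 542, 617)
)

# Keep rows from 1973 on (arbitrary cutoff) and two of the columns
toy_sub <- subset(toy, year >= 1973, select = c(year, nkill.us))

# Rename the kept columns, mirroring the rename() step
names(toy_sub) <- c("Year", "NumkilledUS")
toy_sub
```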
5. Rename at least 3 values in a column so that every occurrence of each value in that column is renamed. For example, suppose one column contains the letter “e” 20 times. Rename those values so that all 20 show as “excellent”.
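# Spell out three specific years. The loop walks over the original integer
# values of Year (the sequence is evaluated once, before the loop starts).
# Assigning a string into Year converts the whole column to character.
i <- 1
for (x in data_terror$Year) {
  if (x == 1970) {
    data_terror$Year[i] <- "Nineteen seventy"
  } else if (x == 1971) {
    data_terror$Year[i] <- "Nineteen seventy one"
  } else if (x == 1977) {
    data_terror$Year[i] <- "Nineteen seventy seven"
  }
  i <- i + 1
}
data_terror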
## # A tibble: 46 x 5
## Year NumkilledUS Numkilled USpop USDeath
## <chr> <int> <int> <dbl> <dbl>
## 1 Nineteen seventy 28 171 209486. 9.5
## 2 Nineteen seventy one 15 173 211358. 9.3
## 3 1972 12 566 213220. 9.4
## 4 1973 73 370 215093. 9.3
## 5 1974 17 542 217002. 9.1
## 6 1975 22 617 218964. 8.8
## 7 1976 6 672 220993. 8.8
## 8 Nineteen seventy seven 7 456 223091. 8.6
## 9 1978 11 1459 225239. 8.7
## 10 1979 16 2100 227412. 8.5
## # ... with 36 more rows
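The loop works, but R usually does this kind of value replacement in one vectorized step. A minimal sketch using a named lookup vector, where the names are the old values and the elements are the new labels; the column is converted to character first so the untouched years keep their original text. The helper relabel() is hypothetical and is a swapped-in alternative, not the method used above.

```r
# Labels for the three years to rename (names = old values)
year_labels <- c(
  "1970" = "Nineteen seventy",
  "1971" = "Nineteen seventy one",
  "1977" = "Nineteen seventy seven"
)

# Hypothetical helper: replace every value that has a label, keep the rest
relabel <- function(x, labels) {
  x <- as.character(x)
  hit <- x %in% names(labels)   # which entries have a new label
  x[hit] <- labels[x[hit]]      # look the labels up by name
  x
}

years <- relabel(1970:1979, year_labels)
years
# With the real data: data_terror$Year <- relabel(data_terror$Year, year_labels)
```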
6. Display enough rows to see examples of all of steps 1-5 above.
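As the output above shows, printing data_terror displays only its first 10 rows. A minimal sketch of two ways to control what is shown: tibble's print method accepts an n argument (for example print(data_terror, n = 20)), and in base R you can pick out just the rows of interest, such as the renamed years, with a logical index. The toy data frame below is a stand-in for data_terror, using values from its printed rows.

```r
# Stand-in for data_terror after the renaming step
toy <- data.frame(
  Year        = c("Nineteen seventy", "Nineteen seventy one", "1972",
                  "1973", "1974", "1975", "1976", "Nineteen seventy seven"),
  NumkilledUS = c(28, 15, 12, 73, 17, 22, 6, 7)
)

# Logical index: TRUE for rows whose Year was spelled out
renamed <- grepl("^Nineteen", toy$Year)
toy[renamed, ]

# With a tibble, print(data_terror, n = 20) shows 20 rows instead of 10
```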