CUNY MSDS 2022 Bridge: R HW 2

R Markdown

Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.

#Question 1
install.packages("tidyverse", repos = "http://cran.us.r-project.org")

## Installing package into 'C:/Users/maric/AppData/Local/R/win-library/4.2'
## (as 'lib' is unspecified)

## package 'tidyverse' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\maric\AppData\Local\Temp\RtmpSc1GEE\downloaded_packages

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──

## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(readr)

#Check the current working directory, then set it to the folder that contains the dataset
getwd()

## [1] "C:/Users/maric/Documents/CUNY Data Science/Bridge/RHW2"

setwd("C:/Users/maric/Documents/CUNY Data Science/Bridge/RHW2")
dataset<-read.csv(file="Fatalities.csv", header=TRUE,sep=",")
View(dataset)

#Summary
summary(dataset)

##        X             state                year         spirits     
##  Min.   :  1.00   Length:336         Min.   :1982   Min.   :0.790  
##  1st Qu.: 84.75   Class :character   1st Qu.:1983   1st Qu.:1.300  
##  Median :168.50   Mode  :character   Median :1985   Median :1.670  
##  Mean   :168.50                      Mean   :1985   Mean   :1.754  
##  3rd Qu.:252.25                      3rd Qu.:1987   3rd Qu.:2.013  
##  Max.   :336.00                      Max.   :1988   Max.   :4.900  
##      unemp            income          emppop         beertax       
##  Min.   : 2.400   Min.   : 9514   Min.   :42.99   Min.   :0.04331  
##  1st Qu.: 5.475   1st Qu.:12086   1st Qu.:57.69   1st Qu.:0.20885  
##  Median : 7.000   Median :13763   Median :61.36   Median :0.35259  
##  Mean   : 7.347   Mean   :13880   Mean   :60.81   Mean   :0.51326  
##  3rd Qu.: 8.900   3rd Qu.:15175   3rd Qu.:64.41   3rd Qu.:0.65157  
##  Max.   :18.000   Max.   :22193   Max.   :71.27   Max.   :2.72076  
##     baptist            mormon           drinkage          dry          
##  Min.   : 0.0000   Min.   : 0.1000   Min.   :18.00   Min.   : 0.00000  
##  1st Qu.: 0.6268   1st Qu.: 0.2722   1st Qu.:20.00   1st Qu.: 0.00000  
##  Median : 1.7492   Median : 0.3931   Median :21.00   Median : 0.08681  
##  Mean   : 7.1569   Mean   : 2.8019   Mean   :20.46   Mean   : 4.26707  
##  3rd Qu.:13.1271   3rd Qu.: 0.6293   3rd Qu.:21.00   3rd Qu.: 2.42481  
##  Max.   :30.3557   Max.   :65.9165   Max.   :21.00   Max.   :45.79210  
##   youngdrivers         miles          breath              jail          
##  Min.   :0.07314   Min.   : 4576   Length:336         Length:336        
##  1st Qu.:0.17037   1st Qu.: 7183   Class :character   Class :character  
##  Median :0.18539   Median : 7796   Mode  :character   Mode  :character  
##  Mean   :0.18593   Mean   : 7891                                        
##  3rd Qu.:0.20219   3rd Qu.: 8504                                        
##  Max.   :0.28163   Max.   :26148                                        
##    service              fatal            nfatal            sfatal     
##  Length:336         Min.   :  79.0   Min.   :  13.00   Min.   :  8.0  
##  Class :character   1st Qu.: 293.8   1st Qu.:  53.75   1st Qu.: 35.0  
##  Mode  :character   Median : 701.0   Median : 135.00   Median : 81.0  
##                     Mean   : 928.7   Mean   : 182.58   Mean   :109.9  
##                     3rd Qu.:1063.5   3rd Qu.: 212.00   3rd Qu.:131.0  
##                     Max.   :5504.0   Max.   :1049.00   Max.   :603.0  
##    fatal1517        nfatal1517      fatal1820       nfatal1820    
##  Min.   :  3.00   Min.   : 0.00   Min.   :  7.0   Min.   :  0.00  
##  1st Qu.: 25.75   1st Qu.: 4.00   1st Qu.: 38.0   1st Qu.: 11.00  
##  Median : 49.00   Median :10.00   Median : 82.0   Median : 24.00  
##  Mean   : 62.61   Mean   :12.26   Mean   :106.7   Mean   : 33.53  
##  3rd Qu.: 77.00   3rd Qu.:15.25   3rd Qu.:130.2   3rd Qu.: 44.00  
##  Max.   :318.00   Max.   :76.00   Max.   :601.0   Max.   :196.00  
##    fatal2124       nfatal2124         afatal            pop          
##  Min.   : 12.0   Min.   :  1.00   Min.   :  24.6   Min.   :  479000  
##  1st Qu.: 42.0   1st Qu.: 13.00   1st Qu.:  90.5   1st Qu.: 1545251  
##  Median : 97.5   Median : 30.00   Median : 211.6   Median : 3310503  
##  Mean   :126.9   Mean   : 41.38   Mean   : 293.3   Mean   : 4930272  
##  3rd Qu.:150.5   3rd Qu.: 49.00   3rd Qu.: 364.0   3rd Qu.: 5751735  
##  Max.   :770.0   Max.   :249.00   Max.   :2094.9   Max.   :28314028  
##     pop1517           pop1820           pop2124           milestot     
##  Min.   :  21000   Min.   :  21000   Min.   :  30000   Min.   :  3993  
##  1st Qu.:  71750   1st Qu.:  76962   1st Qu.: 103500   1st Qu.: 11692  
##  Median : 163000   Median : 170982   Median : 241000   Median : 28484  
##  Mean   : 230816   Mean   : 249090   Mean   : 336390   Mean   : 37101  
##  3rd Qu.: 270500   3rd Qu.: 308311   3rd Qu.: 413000   3rd Qu.: 44140  
##  Max.   :1172000   Max.   :1321004   Max.   :1892998   Max.   :241575  
##     unempus         emppopus          gsp           
##  Min.   :5.500   Min.   :57.80   Min.   :-0.123641  
##  1st Qu.:6.200   1st Qu.:57.90   1st Qu.: 0.001182  
##  Median :7.200   Median :60.10   Median : 0.032413  
##  Mean   :7.529   Mean   :59.97   Mean   : 0.025313  
##  3rd Qu.:9.600   3rd Qu.:61.50   3rd Qu.: 0.056501  
##  Max.   :9.700   Max.   :62.30   Max.   : 0.142361

#Mean of fatalities (column 18, 'fatal') and income (column 6, 'income') for all years in the dataset (1982-1988)
mean(dataset[,18])  #mean fatalities in years 1982-1988 is 928.66

## [1] 928.6637

mean(dataset[,6])   #mean income in years 1982-1988 is $13,880.18

## [1] 13880.18

#Median of fatalities and income for all years in the dataset (1982-1988)
median(dataset[,18]) #median fatalities in years 1982-1988 is 701

## [1] 701

median(dataset[,6])  #median income in years 1982-1988 is $13,763.13

## [1] 13763.13
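Selecting the columns by name rather than by position (18 and 6) gives the same figures and keeps working if the column order of the CSV ever changes. A minimal sketch that computes both statistics for several attributes in one call; the helper name `center_stats` is hypothetical, not part of the assignment:

```r
# Hypothetical helper: mean and median for each named numeric column of a data frame.
center_stats <- function(df, cols) {
  # sapply() returns a 2 x length(cols) matrix:
  # one row per statistic, one column per attribute
  sapply(df[cols], function(x) c(mean   = mean(x, na.rm = TRUE),
                                 median = median(x, na.rm = TRUE)))
}

# Usage with the data loaded above:
# center_stats(dataset, c("fatal", "income"))
```

Called on `dataset` with `c("fatal", "income")`, this should reproduce the four values printed above, arranged with one row per statistic.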

Create a new data frame with a subset of the columns and rows. Make sure to rename it.

#Question 2
#This is a subset containing data from one year (1984) and only four columns: 'state', 'year', 'fatal' and 'income'
#This subset provides data on fatalities in US states in the year 1984
class(dataset)

## [1] "data.frame"

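dataset <- read.csv(file = "Fatalities.csv", header = TRUE, sep = ",")  #re-read the CSV; the file name must match the capitalisation used above on case-sensitive file systems
Fatalities_in_1984_by_State <- subset(dataset, year == 1984, select = c(state, year, fatal, income))  #create data frame Fatalities_in_1984_by_State (subset() already returns a data frame)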
View(Fatalities_in_1984_by_State)
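The help page for `subset()` describes it as a convenience function for interactive use and recommends the standard `[` operator when programming. A sketch of the same row and column selection with bracket indexing, wrapped in a hypothetical helper `year_subset()`:

```r
# Hypothetical helper: keep the rows for one year and only the requested columns.
year_subset <- function(df, target_year, cols) {
  keep <- which(df$year == target_year)  # which() drops NA comparisons, as subset() does
  df[keep, cols, drop = FALSE]           # drop = FALSE keeps a data frame even for one column
}

# Usage with the data loaded above:
# Fatalities_in_1984_by_State <- year_subset(dataset, 1984,
#                                            c("state", "year", "fatal", "income"))
```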

Create new column names for the new data frame.

#Question 3
#Column names are changed as follows: 'state' to 'US_State', 'year' to 'Calendar_Year', 'fatal' to 'Number_of_Fatalities' and 'income' to 'Annual_Income'
colnames(Fatalities_in_1984_by_State)<-c("US_State","Calendar_Year","Number_of_Fatalities","Annual_Income")
View(Fatalities_in_1984_by_State)
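Assigning a full vector to `colnames()` relies on the columns staying in the order `state`, `year`, `fatal`, `income`. A sketch that renames by old name instead, using a named lookup vector (old name = new name); `rename_columns()` is a hypothetical helper:

```r
# Hypothetical helper: rename columns through an old_name = new_name lookup,
# so the result does not depend on column order.
rename_columns <- function(df, lookup) {
  hit <- names(df) %in% names(lookup)                  # columns that appear in the lookup
  names(df)[hit] <- unname(lookup[names(df)[hit]])     # replace each matched old name
  df
}

# Usage:
# Fatalities_in_1984_by_State <- rename_columns(Fatalities_in_1984_by_State,
#   c(state = "US_State", year = "Calendar_Year",
#     fatal = "Number_of_Fatalities", income = "Annual_Income"))
```

Columns missing from the lookup keep their original names, so a partial lookup is also fine.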

Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.

#Question 4
summary(Fatalities_in_1984_by_State)

##    US_State         Calendar_Year  Number_of_Fatalities Annual_Income  
##  Length:48          Min.   :1984   Min.   :  79.0       Min.   : 9792  
##  Class :character   1st Qu.:1984   1st Qu.: 307.5       1st Qu.:11990  
##  Mode  :character   Median :1984   Median : 672.5       Median :13498  
##                     Mean   :1984   Mean   : 915.0       Mean   :13583  
##                     3rd Qu.:1984   3rd Qu.: 978.5       3rd Qu.:14795  
##                     Max.   :1984   Max.   :5020.0       Max.   :18760


#compare means between data set and subset
#mean income in 1984 is less than mean income for years 1982-1988
mean(dataset[,6]) #mean of income for 48 states in years 1982-1988

## [1] 13880.18

mean(Fatalities_in_1984_by_State[,4]) #mean of income for 48 states in 1984

## [1] 13582.51

#mean fatalities in 1984 is less than mean fatalities for years 1982-1988
mean(dataset[,18]) #mean of fatalities for 48 states in years 1982-1988

## [1] 928.6637

mean(Fatalities_in_1984_by_State[,3]) #mean of fatalities for 48 states in 1984

## [1] 915.0208

#compare medians between data set and subset
#median income in 1984 is less than median income for years 1982-1988
median(dataset[,6]) #median of income for 48 states in years 1982-1988

## [1] 13763.13

median(Fatalities_in_1984_by_State[,4]) #median of income for 48 states in 1984

## [1] 13498.35

#median fatalities in 1984 is less than median fatalities for years 1982-1988
median(dataset[,18]) #median of fatalities for 48 states in years 1982-1988

## [1] 701

median(Fatalities_in_1984_by_State[,3]) #median of fatalities for 48 states in 1984

## [1] 672.5
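The eight comparisons above can be gathered into one table so the full-period and 1984 values sit side by side. A sketch assuming the column names used earlier (`fatal` and `income` in `dataset`, `Number_of_Fatalities` and `Annual_Income` in the renamed subset); `compare_centers()` is a hypothetical helper:

```r
# Hypothetical helper: side-by-side mean and median for matching columns of two data frames.
compare_centers <- function(full, sub, full_cols, sub_cols) {
  stats <- function(x) c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE))
  data.frame(
    attribute   = rep(full_cols, each = 2),                      # e.g. fatal, fatal, income, income
    statistic   = rep(c("mean", "median"), times = length(full_cols)),
    full_period = unlist(lapply(full[full_cols], stats), use.names = FALSE),
    subset      = unlist(lapply(sub[sub_cols], stats), use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# Usage:
# compare_centers(dataset, Fatalities_in_1984_by_State,
#                 c("fatal", "income"), c("Number_of_Fatalities", "Annual_Income"))
```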

For at least 3 values in a column, rename them so that every occurrence of each value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.

#Question 5
#Replace 3 state abbreviations with the full state names; only the US_State column is modified
Fatalities_in_1984_by_State$US_State[Fatalities_in_1984_by_State$US_State == 'al'] <- 'alabama'
Fatalities_in_1984_by_State$US_State[Fatalities_in_1984_by_State$US_State == 'az'] <- 'arizona'
Fatalities_in_1984_by_State$US_State[Fatalities_in_1984_by_State$US_State == 'ar'] <- 'arkansas'
View(Fatalities_in_1984_by_State)
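With more than three codes, one assignment per value gets repetitive. A sketch of the same replacement driven by a named lookup vector (code = label) and applied to a single column; `recode_values()` is a hypothetical helper, and values not in the lookup are left unchanged:

```r
# Hypothetical helper: replace codes in a character vector using a code = label lookup.
recode_values <- function(x, lookup) {
  matched <- x %in% names(lookup)               # positions holding a known code
  x[matched] <- unname(lookup[x[matched]])      # swap each code for its label
  x
}

# Usage:
# Fatalities_in_1984_by_State$US_State <- recode_values(
#   Fatalities_in_1984_by_State$US_State,
#   c(al = "alabama", az = "arizona", ar = "arkansas"))
```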

Display enough rows to see examples of all of steps 1-5 above.

#Question 6
head(Fatalities_in_1984_by_State,3)

##    US_State Calendar_Year Number_of_Fatalities Annual_Income
## 3   alabama          1984                  932      11108.79
## 10  arizona          1984                  869      13265.93
## 17 arkansas          1984                  525      10916.48

BONUS – place the original .csv file in a GitHub repository and have R read it from the link. This will be a very useful skill as you progress in your data science education and career.

#Question 7
install.packages("tidyverse", repos = "http://cran.us.r-project.org")

## Warning: package 'tidyverse' is in use and will not be installed

library(tidyverse)
library(readr)
install.packages("RCurl", repos = "http://cran.us.r-project.org")

## Installing package into 'C:/Users/maric/AppData/Local/R/win-library/4.2'
## (as 'lib' is unspecified)

## package 'RCurl' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\maric\AppData\Local\Temp\RtmpSc1GEE\downloaded_packages

library(RCurl)

## 
## Attaching package: 'RCurl'

## The following object is masked from 'package:tidyr':
## 
##     complete

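temp <- getURL("https://raw.githubusercontent.com/goygoyummm/2022_CUNY_DS_Bridge_R/main/Fatalities.csv")  #download the raw CSV text from GitHub
y <- read.csv(text = temp)  #parse the downloaded text into a data frame
View(y)                     #view the data frame (temp holds only the raw text)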
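On current versions of R, base `read.csv()` can open an `https://` link itself, so the raw GitHub URL can also be read without RCurl. A minimal sketch; the helper name `read_csv_url` is hypothetical, and reading the real link requires network access:

```r
# Hypothetical helper: read a CSV straight from a URL with base R only.
read_csv_url <- function(csv_url) {
  # read.csv() opens http(s):// and file:// URLs through R's connection layer,
  # so no separate download step (e.g. RCurl::getURL()) is needed here.
  read.csv(file = csv_url, header = TRUE)
}

# Usage (requires network access):
# fatalities_github <- read_csv_url(
#   "https://raw.githubusercontent.com/goygoyummm/2022_CUNY_DS_Bridge_R/main/Fatalities.csv")
```

The RCurl route above is still useful when the download needs extra options, such as custom headers; for a public raw file the direct read is shorter.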

Gregg Maloy

2022-07-19
