library(RCurl)
## Loading required package: bitops
arrests <- read.csv('C:\\Users\\aferg\\Documents\\Data_Science\\201907_Bridge_Program\\R\\vincentarelbundock-Rdatasets\\csv\\datasets\\USArrests.csv', header = TRUE, sep = ",")
Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.
summary(arrests)
## X Murder Assault UrbanPop
## Alabama : 1 Min. : 0.800 Min. : 45.0 Min. :32.00
## Alaska : 1 1st Qu.: 4.075 1st Qu.:109.0 1st Qu.:54.50
## Arizona : 1 Median : 7.250 Median :159.0 Median :66.00
## Arkansas : 1 Mean : 7.788 Mean :170.8 Mean :65.54
## California: 1 3rd Qu.:11.250 3rd Qu.:249.0 3rd Qu.:77.75
## Colorado : 1 Max. :17.400 Max. :337.0 Max. :91.00
## (Other) :44
## Rape
## Min. : 7.30
## 1st Qu.:15.07
## Median :20.10
## Mean :21.23
## 3rd Qu.:26.18
## Max. :46.00
##
# cat("Mean Number of Murders: ", mean(arrests$Murder))
# cat("Mean Urban Population: ", mean(arrests$UrbanPop))
meansarrests <- sapply(arrests[,2:3],mean)
mediansarrests <- sapply(arrests[,2:3], median)
meansarrests
## Murder Assault
## 7.788 170.760
mediansarrests
## Murder Assault
## 7.25 159.00
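The two sapply calls above can also be folded into a single pass. This is a minimal sketch, assuming R's built-in USArrests data set stands in for the CSV (it holds the same numbers but keeps the state names as row names instead of an X column). The name center_stats is made up for illustration.

```r
# Built-in copy of the data; state names are row names here, not a column.
data(USArrests)

# Compute mean and median together for the same two attributes.
# Each column yields a named vector, so sapply returns a 2 x 2 matrix.
center_stats <- sapply(
  USArrests[, c("Murder", "Assault")],
  function(v) c(mean = mean(v), median = median(v))
)

print(center_stats)
```

Selecting the columns by name rather than by position (2:3) keeps the call correct even if the column order changes.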
Create a new data frame with a subset of the columns and rows. Make sure to rename it.
arrests2 <- subset(arrests, UrbanPop > 70)
nrow(arrests)
## [1] 50
nrow(arrests2)
## [1] 19
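The prompt asks for a subset of both columns and rows, while the call above filters rows only. A possible sketch of doing both at once, again assuming the built-in USArrests data set as a stand-in for the CSV; urban_small is a hypothetical name.

```r
# Keep only states with UrbanPop above 70 (rows) and three of the
# four numeric attributes (columns) in a single subset() call.
urban_small <- subset(
  USArrests,
  UrbanPop > 70,
  select = c(Murder, Assault, UrbanPop)
)

dim(urban_small)
```

The row count reported by dim() should agree with nrow(arrests2), since the row filter is the same.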
Create new column names for the new data frame.
arrests2Names <- c("State", "Murder2", "Assault2", "UrbanPop2", "Rape2")
colnames(arrests2) <- arrests2Names
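Typing each new name by hand invites typos such as stray whitespace inside a string. One alternative, sketched here on the built-in USArrests data set (an assumption, since it has no state column), is to derive the new names from the old ones. The name urban is hypothetical.

```r
# Same row filter as above, applied to the built-in data.
urban <- subset(USArrests, UrbanPop > 70)

# Append "2" to every existing column name instead of retyping them.
names(urban) <- paste0(names(urban), "2")

names(urban)
```

For the CSV version, where the first column should become "State", the same idea could be applied to the remaining columns only, for example via names(arrests2)[-1].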
Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.
summary(arrests2)
## State Murder2 Assault2 UrbanPop2
## Arizona : 1 Min. : 3.200 Min. : 46.0 Min. :72.00
## California : 1 1st Qu.: 4.850 1st Qu.:132.5 1st Qu.:76.00
## Colorado : 1 Median : 7.400 Median :201.0 Median :80.00
## Connecticut: 1 Mean : 7.863 Mean :194.1 Mean :80.32
## Delaware : 1 3rd Qu.:10.750 3rd Qu.:253.0 3rd Qu.:84.00
## Florida : 1 Max. :15.400 Max. :335.0 Max. :91.00
## (Other) :13
## Rape2
## Min. : 8.30
## 1st Qu.:17.55
## Median :24.00
## Mean :24.99
## 3rd Qu.:31.45
## Max. :46.00
##
meansarrests2 <- sapply(arrests2[,2:3], mean)
mediansarrests2 <- sapply(arrests2[,2:3], median)
meansarrests
## Murder Assault
## 7.788 170.760
meansarrests2
## Murder2 Assault2
## 7.863158 194.052632
mediansarrests
## Murder Assault
## 7.25 159.00
mediansarrests2
## Murder2 Assault2
## 7.4 201.0
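In the output above, the urban subset's assault mean and median (194.1 and 201.0) sit well above the full-data figures (170.8 and 159.0), while the murder figures barely move. Stacking the four vectors into one table makes that comparison easier to read. A sketch under the assumption that the built-in USArrests data set mirrors the CSV; vars, urban, and comparison are illustrative names.

```r
vars  <- c("Murder", "Assault")
urban <- subset(USArrests, UrbanPop > 70)

# One row per statistic and data set, one column per attribute.
comparison <- rbind(
  mean_all     = sapply(USArrests[vars], mean),
  mean_urban   = sapply(urban[vars], mean),
  median_all   = sapply(USArrests[vars], median),
  median_urban = sapply(urban[vars], median)
)

print(round(comparison, 2))
```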
Pick at least three distinct values in a column and rename them so that every occurrence of each value is replaced. For example, suppose a column contains the letter “e” 20 times. Rename those values so that all 20 show as “excellent”.
arrests2$State <- gsub("Arizona", "OTHER", arrests2$State)
arrests2$State <- gsub("California", "OTHER", arrests2$State)
arrests2$State <- gsub("Ohio", "OTHER", arrests2$State)
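The three gsub calls work here, but gsub matches substrings, so a pattern like "Virginia" would also alter "West Virginia". A hedged alternative is to match whole values with %in%. The sketch below uses a small made-up character vector rather than the real State column, and to_recode is a hypothetical name.

```r
# Illustrative stand-in for a character column of state names.
states <- c("Arizona", "California", "Colorado", "Ohio", "Texas")

# Values to collapse into a single label.
to_recode <- c("Arizona", "California", "Ohio")

# Replace only entries that match one of the listed values exactly.
states[states %in% to_recode] <- "OTHER"

states
```

If the column is a factor, it would need converting with as.character() first, because assigning a label that is not an existing level produces NA.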
Display enough rows to see examples of steps 1 - 5 above.
head(arrests2)
## State Murder2 Assault2 UrbanPop2 Rape2
## 3 OTHER 8.1 294 80 31.0
## 5 OTHER 9.0 276 91 40.6
## 6 Colorado 7.9 204 78 38.7
## 7 Connecticut 3.3 110 77 11.1
## 8 Delaware 5.9 238 72 15.8
## 9 Florida 15.4 335 80 31.9
BONUS - place the original .csv in a GitHub repository and have R read it from the link. This will be a very useful skill as you progress in your data science education and career.
x <- getURL("https://raw.githubusercontent.com/amberferger/201907_RBridge_Week2/master/USArrests.csv")
df <- read.csv(text = x)
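In recent versions of R, read.csv can usually open an https URL directly, which would make the RCurl step optional. The sketch below wraps the call in a hypothetical helper, read_arrests, and demonstrates it offline on a temporary copy of the built-in USArrests data set. The commented line shows how it might be pointed at the raw GitHub link, assuming network access.

```r
# Hypothetical helper: read the arrests CSV from any source that
# read.csv() accepts, such as a local path or an http(s) URL.
read_arrests <- function(src) {
  read.csv(src, stringsAsFactors = FALSE)
}

# Offline demonstration: write the built-in data to a temporary CSV
# (row names become an unnamed first column) and read it back.
tmp <- tempfile(fileext = ".csv")
write.csv(USArrests, tmp)
arrests_local <- read_arrests(tmp)
str(arrests_local)

# With network access, the same helper should work on the raw link:
# arrests_web <- read_arrests("https://raw.githubusercontent.com/amberferger/201907_RBridge_Week2/master/USArrests.csv")
```

Passing stringsAsFactors = FALSE keeps the state names as plain character strings, which makes later renaming of values simpler.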