CUNY SPS Data Science Bridge Program: R

Libraries and Dataset

library(RCurl)

## Loading required package: bitops

arrests <- read.csv('C:\\Users\\aferg\\Documents\\Data_Science\\201907_Bridge_Program\\R\\vincentarelbundock-Rdatasets\\csv\\datasets\\USArrests.csv', header = TRUE, sep = ",")

Question 1

Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.

summary(arrests)

##           X          Murder          Assault         UrbanPop    
##  Alabama   : 1   Min.   : 0.800   Min.   : 45.0   Min.   :32.00  
##  Alaska    : 1   1st Qu.: 4.075   1st Qu.:109.0   1st Qu.:54.50  
##  Arizona   : 1   Median : 7.250   Median :159.0   Median :66.00  
##  Arkansas  : 1   Mean   : 7.788   Mean   :170.8   Mean   :65.54  
##  California: 1   3rd Qu.:11.250   3rd Qu.:249.0   3rd Qu.:77.75  
##  Colorado  : 1   Max.   :17.400   Max.   :337.0   Max.   :91.00  
##  (Other)   :44                                                   
##       Rape      
##  Min.   : 7.30  
##  1st Qu.:15.07  
##  Median :20.10  
##  Mean   :21.23  
##  3rd Qu.:26.18  
##  Max.   :46.00  
##

# cat("Mean Number of Murders: ", mean(arrests$Murder))
# cat("Mean Urban Population: ", mean(arrests$UrbanPop))

# Columns 2 and 3 are Murder and Assault
meansarrests <- sapply(arrests[, 2:3], mean)
mediansarrests <- sapply(arrests[, 2:3], median)

meansarrests

##  Murder Assault 
##   7.788 170.760

mediansarrests

##  Murder Assault 
##    7.25  159.00
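The same two statistics can also be gathered in one call by selecting the columns by name rather than by position. The following is a minimal sketch, assuming R's built-in `USArrests` data frame (which holds the same values but keeps state names as row names instead of an `X` column); the names `cols` and `stats` are illustrative only.

```r
# Built-in copy of the data set; state names are row names here.
data(USArrests)

# Select the two attributes by name instead of by position (2:3).
cols <- c("Murder", "Assault")

# One row per statistic, one column per attribute.
stats <- sapply(USArrests[, cols], function(v) c(mean = mean(v), median = median(v)))
print(stats)
```

Selecting by name keeps the code correct even if the column order of the CSV changes.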

Question 2

Create a new data frame with a subset of the columns and rows. Make sure to rename it.

# Keep only the states where more than 70% of the population is urban
arrests2 <- subset(arrests, UrbanPop > 70)

nrow(arrests)

## [1] 50

nrow(arrests2)

## [1] 19
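The filter above restricts rows only. Since the question asks for a subset of both columns and rows, one possible variant is sketched here using the `select` argument of `subset`. It assumes the built-in `USArrests` data frame rather than the CSV, and the name `urban` and the choice of three columns are illustrative, not part of the original answer.

```r
# Built-in copy of the data set.
data(USArrests)

# Rows: states with UrbanPop above 70. Columns: three of the four attributes.
urban <- subset(USArrests, UrbanPop > 70, select = c(Murder, Assault, UrbanPop))

# Number of rows and columns that survive the filter.
print(dim(urban))
```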

Question 3

Create new column names for the new data frame.

arrests2Names <- c("State", "Murder2", "Assault2", "UrbanPop2", "Rape2")
colnames(arrests2) <- arrests2Names
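Assigning a full vector of names depends on the column order being known. As an alternative sketch, columns can be renamed through an old-to-new lookup so that each new name is tied to a specific original column. The data frame `toy` and its values are made up for illustration, and `rename_map` is a hypothetical helper name.

```r
# Toy data frame standing in for the subset; values are made up.
toy <- data.frame(X = c("a", "b"), Murder = c(1.5, 2.5), Assault = c(10, 20))

# Old name -> new name lookup; only the listed columns are renamed.
rename_map <- c(X = "State", Murder = "Murder2", Assault = "Assault2")
hits <- names(toy) %in% names(rename_map)
names(toy)[hits] <- unname(rename_map[names(toy)[hits]])

# Strip any accidental leading or trailing spaces from the names.
names(toy) <- trimws(names(toy))
print(names(toy))
```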

Question 4

Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.

summary(arrests2)

##          State       Murder2          Assault2       UrbanPop2    
##  Arizona    : 1   Min.   : 3.200   Min.   : 46.0   Min.   :72.00  
##  California : 1   1st Qu.: 4.850   1st Qu.:132.5   1st Qu.:76.00  
##  Colorado   : 1   Median : 7.400   Median :201.0   Median :80.00  
##  Connecticut: 1   Mean   : 7.863   Mean   :194.1   Mean   :80.32  
##  Delaware   : 1   3rd Qu.:10.750   3rd Qu.:253.0   3rd Qu.:84.00  
##  Florida    : 1   Max.   :15.400   Max.   :335.0   Max.   :91.00  
##  (Other)    :13                                                   
##      Rape2      
##  Min.   : 8.30  
##  1st Qu.:17.55  
##  Median :24.00  
##  Mean   :24.99  
##  3rd Qu.:31.45  
##  Max.   :46.00  
##

meansarrests2 <- sapply(arrests2[, 2:3], mean)
mediansarrests2 <- sapply(arrests2[, 2:3], median)


meansarrests

##  Murder Assault 
##   7.788 170.760

meansarrests2

##    Murder2   Assault2 
##   7.863158 194.052632

mediansarrests

##  Murder Assault 
##    7.25  159.00

mediansarrests2

##   Murder2  Assault2 
##       7.4     201.0
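Reading the four outputs above together: the murder rate barely moves between the full data set and the urban subset (mean 7.788 vs. about 7.863, median 7.25 vs. 7.4), while the assault rate is noticeably higher in the urban subset (mean 170.76 vs. about 194.05, median 159 vs. 201). A side-by-side table makes that comparison easier to see. This is a minimal sketch, assuming the built-in `USArrests` data frame in place of the CSV; `urban`, `cols`, and `comparison` are illustrative names.

```r
# Built-in copy of the data set and the same UrbanPop > 70 filter.
data(USArrests)
urban <- subset(USArrests, UrbanPop > 70)
cols <- c("Murder", "Assault")

# Stack the statistics so each row is one statistic for one data frame.
comparison <- rbind(
  mean_all     = sapply(USArrests[, cols], mean),
  mean_urban   = sapply(urban[, cols], mean),
  median_all   = sapply(USArrests[, cols], median),
  median_urban = sapply(urban[, cols], median)
)
print(round(comparison, 2))
```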

Question 5

For at least 3 values in a column, please rename so that every value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 words would show as “excellent”.

arrests2$State <- gsub("Arizona", "OTHER", arrests2$State)
arrests2$State <- gsub("California", "OTHER", arrests2$State)
arrests2$State <- gsub("Ohio", "OTHER", arrests2$State)
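Because `gsub` replaces matching substrings, a pattern such as "Virginia" would also alter "West Virginia". When whole values are meant to be recoded, an exact match with `%in%` is one alternative. The sketch below uses a short illustrative vector rather than the real column; `states`, `to_replace`, and `recoded` are assumed names.

```r
# Illustrative vector of state names; as.character() guards against factors.
states <- as.character(c("Arizona", "California", "Colorado", "Ohio", "West Virginia"))

# Values that should all collapse to the same label.
to_replace <- c("Arizona", "California", "Ohio")

# Exact matching: only whole values listed in to_replace are changed.
recoded <- ifelse(states %in% to_replace, "OTHER", states)
print(recoded)
```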

Question 6

Display enough rows to see examples of steps 1 - 5 above.

head(arrests2)

##         State Murder2  Assault2 UrbanPop2 Rape2
## 3       OTHER     8.1       294        80  31.0
## 5       OTHER     9.0       276        91  40.6
## 6    Colorado     7.9       204        78  38.7
## 7 Connecticut     3.3       110        77  11.1
## 8    Delaware     5.9       238        72  15.8
## 9     Florida    15.4       335        80  31.9

Question 7

BONUS - Place the original .csv in a GitHub file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.

x <- getURL("https://raw.githubusercontent.com/amberferger/201907_RBridge_Week2/master/USArrests.csv")

df <- read.csv(text = x)
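`getURL` returns the whole file as a single character string, and the `text` argument of `read.csv` parses such a string as if it were a file. The sketch below shows that step in isolation with a small inline string, so it runs without network access; the contents of `csv_text` are toy values, not rows from the real file. In recent versions of R, `read.csv` can usually also be given the raw URL directly, though that may depend on the platform's HTTPS support.

```r
# Toy CSV content standing in for the string returned by getURL().
csv_text <- "State,Murder,Assault\nA,1.5,10\nB,2.5,20\n"

# Parse the string exactly as a downloaded file would be parsed.
demo <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(demo)
```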

CUNY SPS Data Science Bridge Program: R - Assignment 2

Amber Ferger