R Bridge Course Week 2 Assignment

One of the challenges in working with data is wrangling. In this assignment we will use R to perform this task. Here is a list of data sets: http://vincentarelbundock.github.io/Rdatasets/ (click on the csv index for a list). Please select one, download it, and perform the following tasks:
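Titanic <- read.csv(file.choose(), header = TRUE, stringsAsFactors = TRUE) # pick the downloaded CSV in a file dialog; stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0), which the summary() and levels() steps rely on
ls(Titanic) # list all column names in the imported data (ls() returns them sorted alphabetically)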
## [1] "Age" "Name" "PClass" "Sex" "SexCode" "Survived"
## [7] "X"
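Note that ls() sorts the names alphabetically, which is why X appears last in the output above. As a minimal sketch on a made-up data frame (toy and its values are illustrative assumptions, not part of the Titanic data), names() returns the columns in their original order instead:

```r
# Toy data frame standing in for the imported data (values are illustrative only)
toy <- data.frame(X = 1:2, Name = c("A", "B"), Age = c(29, 2))

cols_file_order <- names(toy)  # column names in their original order
cols_sorted <- ls(toy)         # the same names, sorted alphabetically

print(cols_file_order)
print(cols_sorted)
```

Either call works for checking that a column exists; names() is usually the easier one to read against the CSV header.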
summary(Titanic)
##        X                                  Name      PClass
##  Min.   :   1   Carlsson, Mr Frans Olof     :   2   *  :  1
##  1st Qu.: 329   Connolly, Miss Kate         :   2   1st:322
##  Median : 657   Kelly, Mr James             :   2   2nd:279
##  Mean   : 657   Abbing, Mr Anthony          :   1   3rd:711
##  3rd Qu.: 985   Abbott, Master Eugene Joseph:   1
##  Max.   :1313   Abbott, Mr Rossmore Edward  :   1
##                 (Other)                     :1304
##       Age            Sex         Survived         SexCode
##  Min.   : 0.17   female:462   Min.   :0.0000   Min.   :0.0000
##  1st Qu.:21.00   male  :851   1st Qu.:0.0000   1st Qu.:0.0000
##  Median :28.00                Median :0.0000   Median :0.0000
##  Mean   :30.40                Mean   :0.3427   Mean   :0.3519
##  3rd Qu.:39.00                3rd Qu.:1.0000   3rd Qu.:1.0000
##  Max.   :71.00                Max.   :1.0000   Max.   :1.0000
##  NA's   :557
Titanic[1:3, ]
##   X                                Name PClass Age    Sex Survived SexCode
## 1 1        Allen, Miss Elisabeth Walton    1st  29 female        1       1
## 2 2         Allison, Miss Helen Loraine    1st   2 female        0       1
## 3 3 Allison, Mr Hudson Joshua Creighton    1st  30   male        0       0
Titanic1 <- data.frame(Titanic$Name, Titanic$Age, Titanic$Sex, Titanic$Survived, Titanic$SexCode) # new data frame built from five columns; X and PClass are left out
print(Titanic1[1:3,]) #for Q6
##                          Titanic.Name Titanic.Age Titanic.Sex
## 1        Allen, Miss Elisabeth Walton          29      female
## 2         Allison, Miss Helen Loraine           2      female
## 3 Allison, Mr Hudson Joshua Creighton          30        male
##   Titanic.Survived Titanic.SexCode
## 1                1               1
## 2                0               1
## 3                0               0
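The Titanic. prefix in the column names above comes from passing Titanic$Name and the other vectors directly to data.frame(). One possible alternative, sketched here on a made-up data frame (toy, keep, and all values are assumptions for illustration), is to subset by column name, which keeps the original names:

```r
# Toy data frame with the same column layout as the Titanic CSV (values are made up)
toy <- data.frame(
  X = 1:3,
  Name = c("Person A", "Person B", "Person C"),
  PClass = c("1st", "1st", "2nd"),
  Age = c(29, 2, 30),
  Sex = c("female", "female", "male"),
  Survived = c(1, 0, 0),
  SexCode = c(1, 1, 0)
)

# Columns to keep; X and PClass are dropped simply by not listing them
keep <- c("Name", "Age", "Sex", "Survived", "SexCode")
toy1 <- toy[, keep]

print(names(toy1))
```

With this form the later steps would refer to Name, Age, and so on rather than Titanic.Name and Titanic.Age.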
Titanic1$NewCol <- as.numeric(Titanic1$Titanic.Survived) + as.numeric(Titanic1$Titanic.SexCode) # new column NewCol: the sum of Survived and SexCode
ls(Titanic1) #find NewCol
## [1] "NewCol" "Titanic.Age" "Titanic.Name"
## [4] "Titanic.Sex" "Titanic.SexCode" "Titanic.Survived"
print(Titanic1[1:3, ]) #for Q6
##                          Titanic.Name Titanic.Age Titanic.Sex
## 1        Allen, Miss Elisabeth Walton          29      female
## 2         Allison, Miss Helen Loraine           2      female
## 3 Allison, Mr Hudson Joshua Creighton          30        male
##   Titanic.Survived Titanic.SexCode NewCol
## 1                1               1      2
## 2                0               1      1
## 3                0               0      0
summary(Titanic1) # the means and medians match those in the previous summary of the original table
##                        Titanic.Name   Titanic.Age     Titanic.Sex
##  Carlsson, Mr Frans Olof     :   2   Min.   : 0.17   female:462
##  Connolly, Miss Kate         :   2   1st Qu.:21.00   male  :851
##  Kelly, Mr James             :   2   Median :28.00
##  Abbing, Mr Anthony          :   1   Mean   :30.40
##  Abbott, Master Eugene Joseph:   1   3rd Qu.:39.00
##  Abbott, Mr Rossmore Edward  :   1   Max.   :71.00
##  (Other)                     :1304   NA's   :557
##  Titanic.Survived Titanic.SexCode      NewCol
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000
##  Median :0.0000   Median :0.0000   Median :0.0000
##  Mean   :0.3427   Mean   :0.3519   Mean   :0.6946
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000
##  Max.   :1.0000   Max.   :1.0000   Max.   :2.0000
##
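Because Age contains missing values (557 NA's in the summary above), calling mean() or median() on it directly returns NA unless na.rm = TRUE is set. A small sketch with a hypothetical age vector (the values are made up for illustration):

```r
# Hypothetical ages with one missing value
age <- c(29, 2, NA, 30, 25)

mean_with_na <- mean(age)             # NA, because one value is missing
age_mean <- mean(age, na.rm = TRUE)   # 21.5, computed from the four known ages
age_median <- median(age, na.rm = TRUE)  # 27, the middle of 2, 25, 29, 30

print(c(mean = age_mean, median = age_median))
```

summary() drops the missing values on its own and reports their count separately, which is why its mean and median are not NA.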
levels(Titanic1$Titanic.Sex)[levels(Titanic1$Titanic.Sex) == "female"] <- "F" # rename the factor level "female" to "F", which changes every matching value in the column at once
print(Titanic1[1:5,]) #for Q6
##                                    Titanic.Name Titanic.Age Titanic.Sex
## 1                  Allen, Miss Elisabeth Walton       29.00           F
## 2                   Allison, Miss Helen Loraine        2.00           F
## 3           Allison, Mr Hudson Joshua Creighton       30.00        male
## 4 Allison, Mrs Hudson JC (Bessie Waldo Daniels)       25.00           F
## 5                 Allison, Master Hudson Trevor        0.92        male
##   Titanic.Survived Titanic.SexCode NewCol
## 1                1               1      2
## 2                0               1      1
## 3                0               0      0
## 4                0               1      1
## 5                1               0      1
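The level replacement used above can be tried in isolation. The following sketch uses a hypothetical vector sex rather than the real column; the second half shows what might be done if the column had been read as character instead of factor, which is the default from R 4.0 onward:

```r
# Factor case: renaming a level changes every element that carries it
sex <- factor(c("female", "female", "male", "female", "male"))
levels(sex)[levels(sex) == "female"] <- "F"
print(sex)

# Character case: there are no levels, so replace the matching elements directly
sex_chr <- c("female", "male", "female")
sex_chr[sex_chr == "female"] <- "F"
print(sex_chr)
```

Renaming a level touches only the factor's level table, so it does not depend on how many rows hold that value.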
# the output above confirms that "female" now reads "F"
Titanic2 <- read.csv(url("https://raw.githubusercontent.com/czhu505/R_W2_Assignment/master/Titanic.csv"))
# reads the raw CSV directly from the GitHub repository R_W2_Assignment
print(Titanic2[1:3,])
##   X                                Name PClass Age    Sex Survived SexCode
## 1 1        Allen, Miss Elisabeth Walton    1st  29 female        1       1
## 2 2         Allison, Miss Helen Loraine    1st   2 female        0       1
## 3 3 Allison, Mr Hudson Joshua Creighton    1st  30   male        0       0
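One way to confirm that a re-imported CSV matches the data already in memory is all.equal(). The sketch below avoids the network by round-tripping a made-up data frame through a temporary file; toy, tmp, and the values are assumptions for illustration, not the Titanic data:

```r
# Made-up data frame standing in for the original import
toy <- data.frame(
  X = 1:3,
  Name = c("Person A", "Person B", "Person C"),
  Age = c(29, 2, 0.92),
  stringsAsFactors = FALSE
)

# Write it out and read it back, as one would with a file hosted elsewhere
tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp, row.names = FALSE)
toy2 <- read.csv(tmp, stringsAsFactors = FALSE)
unlink(tmp)

# TRUE when names, types, and values all agree
same <- isTRUE(all.equal(toy, toy2))
print(same)
```

The same comparison could in principle be applied to Titanic and Titanic2, provided both were read with the same stringsAsFactors setting.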
Please submit your .rmd file and the .csv file as well as a link to your RPubs.