R Bridge Course Week 2 Assignment

One of the challenges in working with data is wrangling. In this assignment we will use R to perform this task. Here is a list of data sets: http://vincentarelbundock.github.io/Rdatasets/ (click on the csv index for a list). Please select one, download it, and perform the following tasks:

  1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes.
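Titanic <- read.csv(file.choose(), header = TRUE, stringsAsFactors = TRUE) # pick the downloaded CSV interactively; stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0)
ls(Titanic) # list the column names of the imported data (returned in alphabetical order)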
## [1] "Age"      "Name"     "PClass"   "Sex"      "SexCode"  "Survived"
## [7] "X"
summary(Titanic)
##        X                                  Name      PClass   
##  Min.   :   1   Carlsson, Mr Frans Olof     :   2   *  :  1  
##  1st Qu.: 329   Connolly, Miss Kate         :   2   1st:322  
##  Median : 657   Kelly, Mr James             :   2   2nd:279  
##  Mean   : 657   Abbing, Mr Anthony          :   1   3rd:711  
##  3rd Qu.: 985   Abbott, Master Eugene Joseph:   1            
##  Max.   :1313   Abbott, Mr Rossmore Edward  :   1            
##                 (Other)                     :1304            
##       Age            Sex         Survived         SexCode      
##  Min.   : 0.17   female:462   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:21.00   male  :851   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :28.00                Median :0.0000   Median :0.0000  
##  Mean   :30.40                Mean   :0.3427   Mean   :0.3519  
##  3rd Qu.:39.00                3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :71.00                Max.   :1.0000   Max.   :1.0000  
##  NA's   :557
Titanic[1:3, ]
##   X                                Name PClass Age    Sex Survived SexCode
## 1 1        Allen, Miss Elisabeth Walton    1st  29 female        1       1
## 2 2         Allison, Miss Helen Loraine    1st   2 female        0       1
## 3 3 Allison, Mr Hudson Joshua Creighton    1st  30   male        0       0
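The task also asks for the mean and median of two attributes to be displayed explicitly. A minimal sketch, assuming a small stand-in data frame (`sample_rows`, a hypothetical name) built from the first five rows printed in this write-up rather than the full CSV. `na.rm = TRUE` is included because the full `Age` column contains missing values:

```r
# Stand-in for the Titanic data: Age and Survived from the first five
# rows printed in this write-up (not the full 1313-row data set).
sample_rows <- data.frame(
  Age      = c(29, 2, 30, 25, 0.92),
  Survived = c(1, 0, 0, 0, 1)
)

# na.rm = TRUE skips missing values; without it a single NA makes the result NA.
age_mean    <- mean(sample_rows$Age, na.rm = TRUE)
age_median  <- median(sample_rows$Age, na.rm = TRUE)
surv_mean   <- mean(sample_rows$Survived, na.rm = TRUE)
surv_median <- median(sample_rows$Survived, na.rm = TRUE)

print(c(age_mean = age_mean, age_median = age_median))
print(c(surv_mean = surv_mean, surv_median = surv_median))
```

On the full data the same calls would be applied to `Titanic$Age` and `Titanic$Survived`.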
  2. Create a new data frame with a subset of the columns and rows. Make sure to rename it.
Titanic1 <- data.frame(Titanic$Name, Titanic$Age, Titanic$Sex, Titanic$Survived, Titanic$SexCode) # new data frame that omits columns X and PClass
print(Titanic1[1:3,]) #for Q6
##                          Titanic.Name Titanic.Age Titanic.Sex
## 1        Allen, Miss Elisabeth Walton          29      female
## 2         Allison, Miss Helen Loraine           2      female
## 3 Allison, Mr Hudson Joshua Creighton          30        male
##   Titanic.Survived Titanic.SexCode
## 1                1               1
## 2                0               1
## 3                0               0
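The data frame above keeps every row and drops only columns. A subset of both rows and columns could be sketched as follows, assuming a stand-in data frame (`sample_rows`, a hypothetical name) built from the first five rows printed in this write-up, and an illustrative filter of `Age >= 18` that is not part of the original solution:

```r
# Stand-in data: first five rows printed in this write-up.
sample_rows <- data.frame(
  Name = c("Allen, Miss Elisabeth Walton",
           "Allison, Miss Helen Loraine",
           "Allison, Mr Hudson Joshua Creighton",
           "Allison, Mrs Hudson JC (Bessie Waldo Daniels)",
           "Allison, Master Hudson Trevor"),
  Age      = c(29, 2, 30, 25, 0.92),
  Sex      = c("female", "female", "male", "female", "male"),
  Survived = c(1, 0, 0, 0, 1)
)

# subset() filters rows by a condition and picks columns with select;
# rows where the condition is NA are dropped.
adults <- subset(sample_rows, Age >= 18, select = c(Name, Age, Survived))

print(adults)
```

The result is stored under a new name (`adults`), which covers the "rename it" part of the task.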
  3. Create new column names for the new data frame.
Titanic1$NewCol <- as.numeric(Titanic1$Titanic.Survived) + as.numeric(Titanic1$Titanic.SexCode) # new column NewCol: the sum of Survived and SexCode
ls(Titanic1) # confirm that NewCol was added
## [1] "NewCol"           "Titanic.Age"      "Titanic.Name"    
## [4] "Titanic.Sex"      "Titanic.SexCode"  "Titanic.Survived"
print(Titanic1[1:3, ]) #for Q6
##                          Titanic.Name Titanic.Age Titanic.Sex
## 1        Allen, Miss Elisabeth Walton          29      female
## 2         Allison, Miss Helen Loraine           2      female
## 3 Allison, Mr Hudson Joshua Creighton          30        male
##   Titanic.Survived Titanic.SexCode NewCol
## 1                1               1      2
## 2                0               1      1
## 3                0               0      0
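The solution above adds a new column; renaming the existing columns is a separate operation. A minimal sketch, assuming a three-row stand-in (`subset_df`, a hypothetical name) with the same `Titanic.`-prefixed column names, and `Gender` as an illustrative replacement name:

```r
# Stand-in with the prefixed column names that data.frame() generated above.
subset_df <- data.frame(
  Titanic.Name = c("Allen, Miss Elisabeth Walton",
                   "Allison, Miss Helen Loraine",
                   "Allison, Mr Hudson Joshua Creighton"),
  Titanic.Age  = c(29, 2, 30),
  Titanic.Sex  = c("female", "female", "male")
)

# Strip the "Titanic." prefix from every column name in one step.
names(subset_df) <- sub("^Titanic\\.", "", names(subset_df))

# Rename a single column by matching its current name.
names(subset_df)[names(subset_df) == "Sex"] <- "Gender"

print(names(subset_df))
```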
  4. Use the summary function to create an overview of your new data frame. Then print the mean and median for the same two attributes. Please compare.
summary(Titanic1) # the mean and median of each attribute match the earlier summary of the original table, because no rows were dropped
##                        Titanic.Name   Titanic.Age    Titanic.Sex 
##  Carlsson, Mr Frans Olof     :   2   Min.   : 0.17   female:462  
##  Connolly, Miss Kate         :   2   1st Qu.:21.00   male  :851  
##  Kelly, Mr James             :   2   Median :28.00               
##  Abbing, Mr Anthony          :   1   Mean   :30.40               
##  Abbott, Master Eugene Joseph:   1   3rd Qu.:39.00               
##  Abbott, Mr Rossmore Edward  :   1   Max.   :71.00               
##  (Other)                     :1304   NA's   :557                 
##  Titanic.Survived Titanic.SexCode      NewCol      
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.3427   Mean   :0.3519   Mean   :0.6946  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :2.0000  
## 
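The statistics match here because the new data frame has the same rows as the original. When rows are filtered, the comparison becomes informative. A sketch under assumptions: `full` holds the first five rows printed in this write-up, `adults` is a hypothetical subset with `Age >= 18`, and `stats()` is a helper defined only for this example:

```r
# Stand-in data: Age and Survived from the first five rows in this write-up.
full <- data.frame(
  Age      = c(29, 2, 30, 25, 0.92),
  Survived = c(1, 0, 0, 0, 1)
)

# Hypothetical row subset used only for this comparison.
adults <- full[full$Age >= 18, ]

# Helper: mean and median of both attributes, returned as a 2 x 2 matrix.
stats <- function(df) {
  sapply(df[c("Age", "Survived")], function(x) {
    c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE))
  })
}

comparison <- list(full = stats(full), adults = stats(adults))
print(comparison)
```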
  5. For at least 3 values in a column please rename so that every value in that column is renamed. For example, suppose I have 20 values of the letter “e” in one column. Rename those values so that all 20 would show as “excellent”.
levels(Titanic1$Titanic.Sex)[levels(Titanic1$Titanic.Sex) == "female"] <- "F" # rename the factor level "female" to "F"; every matching value in the column changes at once
print(Titanic1[1:5,]) #for Q6
##                                    Titanic.Name Titanic.Age Titanic.Sex
## 1                  Allen, Miss Elisabeth Walton       29.00           F
## 2                   Allison, Miss Helen Loraine        2.00           F
## 3           Allison, Mr Hudson Joshua Creighton       30.00        male
## 4 Allison, Mrs Hudson JC (Bessie Waldo Daniels)       25.00           F
## 5                 Allison, Master Hudson Trevor        0.92        male
##   Titanic.Survived Titanic.SexCode NewCol
## 1                1               1      2
## 2                0               1      1
## 3                0               0      0
## 4                0               1      1
## 5                1               0      1
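Only one level is renamed above. To rename three or more values in one step, a factor's levels can be assigned from a named list (new name on the left, old level on the right). A minimal sketch, assuming an illustrative `pclass` vector that uses the `1st`, `2nd`, and `3rd` levels seen in the `PClass` summary; the replacement labels are hypothetical:

```r
# Illustrative factor using the PClass levels shown in the summary output.
pclass <- factor(c("1st", "1st", "1st", "2nd", "3rd"))

# Named list: each name is the new level, each value the old level it replaces.
levels(pclass) <- list(First = "1st", Second = "2nd", Third = "3rd")

print(pclass)
print(table(pclass))
```

Because the mapping is applied to the levels rather than to individual cells, every occurrence in the column is renamed regardless of how many rows there are.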
  6. Display enough rows to see examples of all of steps 1-5 above.
# See the rows printed after each step above (the lines marked "for Q6").
  7. BONUS - place the original .csv in a github file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.
Titanic2 <- read.csv(url("https://raw.githubusercontent.com/czhu505/R_W2_Assignment/master/Titanic.csv"))
# reads the raw CSV directly from the GitHub repository R_W2_Assignment
print(Titanic2[1:3,])
##   X                                Name PClass Age    Sex Survived SexCode
## 1 1        Allen, Miss Elisabeth Walton    1st  29 female        1       1
## 2 2         Allison, Miss Helen Loraine    1st   2 female        0       1
## 3 3 Allison, Mr Hudson Joshua Creighton    1st  30   male        0       0

Please submit your .rmd file and the .csv file as well as a link to your RPubs.