R Markdown
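# Preliminary. Read the data from the CSV file
# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0.0), which the factor counts in the summary() output below rely on
tableInput <- read.table(file = "Titanic.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)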
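As a side note, read.csv() is read.table() with header = TRUE and sep = "," already set, so either call can read a file like this one. The sketch below checks that on a tiny hypothetical CSV string (not the real Titanic.csv) and shows what stringsAsFactors = TRUE does to a text column.

```r
# Tiny hypothetical CSV text standing in for Titanic.csv
csvText <- "X,Name,Age\n1,Allen,29\n2,Allison,2"

# read.table() with explicit CSV settings vs. read.csv() with its defaults
viaTable <- read.table(text = csvText, header = TRUE, sep = ",")
viaCsv <- read.csv(text = csvText)
identical(viaTable, viaCsv)  # TRUE for this simple input

# stringsAsFactors = TRUE stores text columns as factors
withFactors <- read.csv(text = csvText, stringsAsFactors = TRUE)
is.factor(withFactors$Name)  # TRUE
levels(withFactors$Name)     # "Allen" "Allison"
```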
#1a. Run summary() on the full data set to get per-column statistics (quartiles and mean for numeric columns, counts for factor columns, and the number of NAs)
summary(tableInput)
## X Name PClass
## Min. : 1 Carlsson, Mr Frans Olof : 2 * : 1
## 1st Qu.: 329 Connolly, Miss Kate : 2 1st:322
## Median : 657 Kelly, Mr James : 2 2nd:279
## Mean : 657 Abbing, Mr Anthony : 1 3rd:711
## 3rd Qu.: 985 Abbott, Master Eugene Joseph: 1
## Max. :1313 Abbott, Mr Rossmore Edward : 1
## (Other) :1304
## Age Sex Survived SexCode
## Min. : 0.17 female:462 Min. :0.0000 Min. :0.0000
## 1st Qu.:21.00 male :851 1st Qu.:0.0000 1st Qu.:0.0000
## Median :28.00 Median :0.0000 Median :0.0000
## Mean :30.40 Mean :0.3427 Mean :0.3519
## 3rd Qu.:39.00 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :71.00 Max. :1.0000 Max. :1.0000
## NA's :557
#1b. Find the mean and median of the column Age
#I found a good reference on how to compute the mean of a column at this link: https://stackoverflow.com/questions/37908949/how-to-find-the-mean-of-a-column-in-r
# This was especially helpful because the Age column contains NA values (557 of them, per the summary above). Setting na.rm = TRUE removes the NAs before the mean or median is calculated.
mean(tableInput$Age, na.rm=TRUE)
## [1] 30.39799
median(tableInput$Age, na.rm=TRUE)
## [1] 28
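To see why na.rm = TRUE matters, here is a small sketch on a hypothetical vector with missing values (not the real Age column): without it, a single NA makes the whole result NA.

```r
# Hypothetical toy vector standing in for an Age column with missing values
ages <- c(29, 2, NA, 30, NA, 47)

mean(ages)                  # NA: any NA in the input propagates to the result
mean(ages, na.rm = TRUE)    # 27: mean of the four non-missing values
median(ages, na.rm = TRUE)  # 29.5: midpoint of 29 and 30
sum(is.na(ages))            # 2 values were dropped
```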
#1c. Find the mean and median of the column Survived
mean(tableInput$Survived, na.rm=TRUE)
## [1] 0.3427266
median(tableInput$Survived, na.rm=TRUE)
## [1] 0
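Because Survived is coded as 0/1, its mean is simply the proportion of passengers who survived, and its median is 0 whenever fewer than half survived. A quick check on a hypothetical 0/1 vector:

```r
survived <- c(1, 0, 0, 1, 0)  # hypothetical 0/1 values: 2 of 5 survived

mean(survived)                         # 0.4, the share of 1s
sum(survived == 1) / length(survived)  # 0.4, the same proportion computed directly
median(survived)                       # 0, because fewer than half of the values are 1
prop.table(table(survived))            # shares of 0 and 1: 0.6 and 0.4
```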
#2. Create a new data frame that keeps only the Name, Age, Sex, and Survived columns. For this exercise, I limited it to the first 20 rows by re-reading Titanic.csv with nrows = 20.
#ref: https://dzone.com/articles/learn-r-how-create-data-frames
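tableInput2 <- read.table(file = "Titanic.csv", header = TRUE, sep = ",", nrows = 20, stringsAsFactors = TRUE)
# read.table() already returns a data frame, so no as.data.frame() conversion is needed
# Keep only Name, Age, Sex, and Survived (columns 2, 4, 5, and 6)
dfSubSetTableInput <- tableInput2[, c("Name", "Age", "Sex", "Survived")]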
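Columns can be picked by position or by name. The sketch below uses a hypothetical 3-row data frame laid out like Titanic.csv to show that c(2, 4, 5, 6) and the matching column names select the same columns.

```r
# Hypothetical 3-row data frame with the same column layout as Titanic.csv
df <- data.frame(
  X = 1:3,
  Name = c("Allen", "Allison", "Anderson"),
  PClass = "1st",
  Age = c(29, 2, NA),
  Sex = c("female", "female", "male"),
  Survived = c(1, 0, 1),
  SexCode = c(1, 1, 0)
)

byIndex <- df[, c(2, 4, 5, 6)]                       # select by position
byName <- df[, c("Name", "Age", "Sex", "Survived")]  # select by column name
identical(byIndex, byName)  # TRUE
names(byName)               # "Name" "Age" "Sex" "Survived"
```

Selecting by name keeps working even if the columns in the file are later reordered, while positional indices would silently pick the wrong columns.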
#3. Rename the columns of dfSubSetTableInput to FullName, Age, Gender, and Alive
names(dfSubSetTableInput) <- c("FullName", "Age", "Gender", "Alive")
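Assigning a full vector to names() renames every column by position. To rename just one column without depending on its position, you can match on the old name instead; a sketch on a hypothetical two-column data frame:

```r
# Hypothetical two-column data frame
df <- data.frame(Name = c("Allen", "Allison"), Sex = c("female", "female"))

# Rename only the column currently called "Sex"
names(df)[names(df) == "Sex"] <- "Gender"
names(df)  # "Name" "Gender"
```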
dfSubSetTableInput
## FullName Age Gender Alive
## 1 Allen, Miss Elisabeth Walton 29.00 female 1
## 2 Allison, Miss Helen Loraine 2.00 female 0
## 3 Allison, Mr Hudson Joshua Creighton 30.00 male 0
## 4 Allison, Mrs Hudson JC (Bessie Waldo Daniels) 25.00 female 0
## 5 Allison, Master Hudson Trevor 0.92 male 1
## 6 Anderson, Mr Harry 47.00 male 1
## 7 Andrews, Miss Kornelia Theodosia 63.00 female 1
## 8 Andrews, Mr Thomas, jr 39.00 male 0
## 9 Appleton, Mrs Edward Dale (Charlotte Lamson) 58.00 female 1
## 10 Artagaveytia, Mr Ramon 71.00 male 0
## 11 Astor, Colonel John Jacob 47.00 male 0
## 12 Astor, Mrs John Jacob (Madeleine Talmadge Force) 19.00 female 1
## 13 Aubert, Mrs Leontine Pauline NA female 1
## 14 Barkworth, Mr Algernon H NA male 1
## 15 Baumann, Mr John D NA male 0
## 16 Baxter, Mrs James (Helene DeLaudeniere Chaput) 50.00 female 1
## 17 Baxter, Mr Quigg Edmond 24.00 male 0
## 18 Beattie, Mr Thomson 36.00 male 0
## 19 Beckwith, Mr Richard Leonard 37.00 male 1
## 20 Beckwith, Mrs Richard Leonard (Sallie Monypeny) 47.00 female 1
#4a. Run summary() on dfSubSetTableInput and compare the results with the full data set
# Some notable comparisons:
# Age - in the original data set, the median age was 28. In the smaller data set, it is 37.
# Gender - in the original data set, the ratio of men to women was about 1.8:1 (851 men to 462 women). In the smaller data set, it is 11:9.
# Alive/Survived - in the original data set, the mean survival rate was 34.3%. In the smaller data set, it rises to 55%.
# Note that all 20 rows in the subset are 1st-class passengers, so the subset is not a representative sample of the full data set.
summary(dfSubSetTableInput)
## FullName Age
## Allen, Miss Elisabeth Walton : 1 Min. : 0.92
## Allison, Master Hudson Trevor : 1 1st Qu.:25.00
## Allison, Miss Helen Loraine : 1 Median :37.00
## Allison, Mr Hudson Joshua Creighton : 1 Mean :36.76
## Allison, Mrs Hudson JC (Bessie Waldo Daniels): 1 3rd Qu.:47.00
## Anderson, Mr Harry : 1 Max. :71.00
## (Other) :14 NA's :3
## Gender Alive
## female: 9 Min. :0.00
## male :11 1st Qu.:0.00
## Median :1.00
## Mean :0.55
## 3rd Qu.:1.00
## Max. :1.00
##
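The ratios and percentages in the comparison above can be recomputed directly from the counts and means reported by the two summary() outputs; a short sketch of that arithmetic:

```r
# Counts and means copied from the two summary() outputs above
fullMen <- 851; fullWomen <- 462
subMen <- 11; subWomen <- 9
fullSurvival <- 0.3427266
subSurvival <- 0.55

round(fullMen / fullWomen, 2)  # 1.84 men per woman in the full data set
round(subMen / subWomen, 2)    # 1.22 men per woman in the 20-row subset
round(100 * fullSurvival, 1)   # 34.3 (% survived, full data set)
round(100 * subSurvival, 1)    # 55 (% survived, subset)
```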
#4b. Find the mean and median of the column Age
mean(dfSubSetTableInput$Age, na.rm=TRUE)
## [1] 36.76
median(dfSubSetTableInput$Age, na.rm=TRUE)
## [1] 37
#4c. Find the mean and median of the column Alive
mean(dfSubSetTableInput$Alive, na.rm=TRUE)
## [1] 0.55
median(dfSubSetTableInput$Alive, na.rm=TRUE)
## [1] 1
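#5. Change every "male" value in the Gender column to "Dude", storing the result in a new data frame, dfSubSetTableInputA
# Gender is a factor, so renaming its "male" level relabels every matching row at once
dfSubSetTableInputA <- within(dfSubSetTableInput, levels(Gender)[levels(Gender) == "male"] <- "Dude")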
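Renaming a factor level only works when the column is actually a factor. The sketch below shows the level rename on a hypothetical factor, plus the equivalent direct replacement for a plain character vector (what read.table() and read.csv() return for text columns by default since R 4.0.0).

```r
# Factor version: rename the level, and every matching value changes with it
gender <- factor(c("female", "male", "male", "female"))
levels(gender)[levels(gender) == "male"] <- "Dude"
as.character(gender)  # "female" "Dude" "Dude" "female"

# Character version: replace the matching values directly
genderChr <- c("female", "male", "male", "female")
genderChr[genderChr == "male"] <- "Dude"
genderChr             # "female" "Dude" "Dude" "female"
```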
#6. Show sample output tables for each problem where applicable.
#ref: https://www.statmethods.net/input/contents.html
#6. - 1. Head output for tableInput (first 10 rows)
head(tableInput, n=10)
## X Name PClass Age Sex
## 1 1 Allen, Miss Elisabeth Walton 1st 29.00 female
## 2 2 Allison, Miss Helen Loraine 1st 2.00 female
## 3 3 Allison, Mr Hudson Joshua Creighton 1st 30.00 male
## 4 4 Allison, Mrs Hudson JC (Bessie Waldo Daniels) 1st 25.00 female
## 5 5 Allison, Master Hudson Trevor 1st 0.92 male
## 6 6 Anderson, Mr Harry 1st 47.00 male
## 7 7 Andrews, Miss Kornelia Theodosia 1st 63.00 female
## 8 8 Andrews, Mr Thomas, jr 1st 39.00 male
## 9 9 Appleton, Mrs Edward Dale (Charlotte Lamson) 1st 58.00 female
## 10 10 Artagaveytia, Mr Ramon 1st 71.00 male
## Survived SexCode
## 1 1 1
## 2 0 1
## 3 0 0
## 4 0 1
## 5 1 0
## 6 1 0
## 7 1 1
## 8 0 0
## 9 1 1
## 10 0 0
#6. - 2. Tail output for tableInput2 for last 10 rows
tail(tableInput2, n=10)
## X Name PClass Age Sex
## 11 11 Astor, Colonel John Jacob 1st 47 male
## 12 12 Astor, Mrs John Jacob (Madeleine Talmadge Force) 1st 19 female
## 13 13 Aubert, Mrs Leontine Pauline 1st NA female
## 14 14 Barkworth, Mr Algernon H 1st NA male
## 15 15 Baumann, Mr John D 1st NA male
## 16 16 Baxter, Mrs James (Helene DeLaudeniere Chaput) 1st 50 female
## 17 17 Baxter, Mr Quigg Edmond 1st 24 male
## 18 18 Beattie, Mr Thomson 1st 36 male
## 19 19 Beckwith, Mr Richard Leonard 1st 37 male
## 20 20 Beckwith, Mrs Richard Leonard (Sallie Monypeny) 1st 47 female
## Survived SexCode
## 11 0 0
## 12 1 1
## 13 1 1
## 14 1 0
## 15 0 0
## 16 1 1
## 17 0 0
## 18 0 0
## 19 1 0
## 20 1 1
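tail() keeps the original row names, which is why the tail outputs here are numbered 11 to 20 rather than 1 to 10. A small sketch on a hypothetical 20-row data frame (the same size as tableInput2), also showing that a negative n in head() drops rows from the end:

```r
df <- data.frame(id = 1:20)  # hypothetical 20-row data frame

last10 <- tail(df, n = 10)
rownames(last10)              # "11" "12" ... "20": original row names are kept

first10 <- head(df, n = -10)  # negative n drops the last 10 rows
nrow(first10)                 # 10
```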
#6. - 3. and 4. Head output for dfSubSetTableInput for first 10 rows
head(dfSubSetTableInput, n=10)
## FullName Age Gender Alive
## 1 Allen, Miss Elisabeth Walton 29.00 female 1
## 2 Allison, Miss Helen Loraine 2.00 female 0
## 3 Allison, Mr Hudson Joshua Creighton 30.00 male 0
## 4 Allison, Mrs Hudson JC (Bessie Waldo Daniels) 25.00 female 0
## 5 Allison, Master Hudson Trevor 0.92 male 1
## 6 Anderson, Mr Harry 47.00 male 1
## 7 Andrews, Miss Kornelia Theodosia 63.00 female 1
## 8 Andrews, Mr Thomas, jr 39.00 male 0
## 9 Appleton, Mrs Edward Dale (Charlotte Lamson) 58.00 female 1
## 10 Artagaveytia, Mr Ramon 71.00 male 0
#6. - 5. Tail output for dfSubSetTableInputA for last 10 rows
tail(dfSubSetTableInputA, n=10)
## FullName Age Gender Alive
## 11 Astor, Colonel John Jacob 47 Dude 0
## 12 Astor, Mrs John Jacob (Madeleine Talmadge Force) 19 female 1
## 13 Aubert, Mrs Leontine Pauline NA female 1
## 14 Barkworth, Mr Algernon H NA Dude 1
## 15 Baumann, Mr John D NA Dude 0
## 16 Baxter, Mrs James (Helene DeLaudeniere Chaput) 50 female 1
## 17 Baxter, Mr Quigg Edmond 24 Dude 0
## 18 Beattie, Mr Thomson 36 Dude 0
## 19 Beckwith, Mr Richard Leonard 37 Dude 1
## 20 Beckwith, Mrs Richard Leonard (Sallie Monypeny) 47 female 1
#7. BONUS - Place the .csv file in my GitHub repository and read it from there. Use head() on rawgithuboutput to confirm that the program read the file.
#ref: https://stackoverflow.com/questions/14441729/read-a-csv-from-github-into-r
urlfile <- 'https://raw.githubusercontent.com/RommyGraphs/MSDA/master/2018Workshop/Titanic.csv'
rawgithuboutput <- read.csv(url(urlfile))
head(rawgithuboutput, n=10)
## X Name PClass Age Sex
## 1 1 Allen, Miss Elisabeth Walton 1st 29.00 female
## 2 2 Allison, Miss Helen Loraine 1st 2.00 female
## 3 3 Allison, Mr Hudson Joshua Creighton 1st 30.00 male
## 4 4 Allison, Mrs Hudson JC (Bessie Waldo Daniels) 1st 25.00 female
## 5 5 Allison, Master Hudson Trevor 1st 0.92 male
## 6 6 Anderson, Mr Harry 1st 47.00 male
## 7 7 Andrews, Miss Kornelia Theodosia 1st 63.00 female
## 8 8 Andrews, Mr Thomas, jr 1st 39.00 male
## 9 9 Appleton, Mrs Edward Dale (Charlotte Lamson) 1st 58.00 female
## 10 10 Artagaveytia, Mr Ramon 1st 71.00 male
## Survived SexCode
## 1 1 1
## 2 0 1
## 3 0 0
## 4 0 1
## 5 1 0
## 6 1 0
## 7 1 1
## 8 0 0
## 9 1 1
## 10 0 0