library("plyr")
library("dplyr")
library("tidyr")
library("readxl")
library("knitr")
Using R, you’ll be handling missing values in this data set (titanic3), and creating a new data set.
titanic_original <- read_excel("C:/Users/xena0/Downloads/titanic3.xls")
## Warning in read_fun(path = path, sheet_i = sheet, limits = limits, shim =
## shim, : Coercing text to numeric in M1306 / R1306C13: '328'
View(titanic_original)
# Fill missing embarkation ports with "S" (Southampton).
titanic_clean <- titanic_original %>%
  # Treat values that contain a space as missing.
  mutate(embarked = ifelse(grepl(" ", embarked), "S", embarked)) %>%
  replace_na(list(embarked = "S"))
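One way to confirm that a fill like this worked is to count the missing values before and after. The following is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, not taken from titanic3):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for titanic3; the values are made up.
toy <- tibble(
  name = c("a", "b", "c", "d"),
  embarked = c("S", NA, "C", NA)
)

# Count missing embarkation ports before the fill.
sum(is.na(toy$embarked))

toy_clean <- toy %>%
  replace_na(list(embarked = "S"))

# Count again after the fill, then tabulate the ports.
sum(is.na(toy_clean$embarked))
table(toy_clean$embarked)
```

The same two `sum(is.na(...))` checks can be run on the real column to see how many rows the fill actually changed.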
Answer: With a data set this large, I think the mean is the best way to estimate the age of passengers with missing age information. If the passenger list were smaller, you may want to look at other options.
titanic_clean %>%
summarise(avg_age = mean(age, na.rm = TRUE))
## # A tibble: 1 x 1
## avg_age
## <dbl>
## 1 29.9
# Fill missing ages with the mean age (29.9), rounded to 30.
titanic_clean <- titanic_clean %>%
  replace_na(list(age = 30))
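The fill value can also be computed from the data instead of being typed in by hand, which makes it easy to compare the mean against another option such as the median. This is only a sketch on made-up ages (`toy`, `age_mean_fill`, and `age_median_fill` are hypothetical names, and the values are not from titanic3):

```r
library(dplyr)

# Hypothetical toy ages; not taken from titanic3.
toy <- tibble(age = c(2, 30, NA, 40, NA, 72))

# Compute the candidate fill values once from the observed ages.
age_mean <- mean(toy$age, na.rm = TRUE)
age_median <- median(toy$age, na.rm = TRUE)

# Keep both fills side by side so they can be compared.
toy_filled <- toy %>%
  mutate(
    age_mean_fill = ifelse(is.na(age), age_mean, age),
    age_median_fill = ifelse(is.na(age), age_median, age)
  )

toy_filled
```

The median is less sensitive to a few very young or very old passengers, so the two fills can differ when the age distribution is skewed.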
titanic_clean <- titanic_clean %>%
replace_na(list(boat = "NONE"))
Answer: Yes, because an explicit value makes those rows easier to filter and count.
Answer: It could mean a lot of things. It could just be a mistake. It could indicate that the passenger was lower class and didn’t have a cabin. It could indicate a stowaway who was found after the ship departed. It could correlate with survival. Some of these answers may be found by analyzing this column against the known data points.
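# Flag whether a cabin number is present (1) or missing (0).
# Missing cabins are true NA values, so test with is.na() rather than matching the text "NA".
titanic_clean <- titanic_clean %>%
  mutate(has_cabin_number = ifelse(is.na(cabin), 0, 1))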
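When building an indicator like `has_cabin_number`, it helps to remember that empty spreadsheet cells are read in as true `NA` values, not as the text "NA". A minimal sketch on a made-up `cabin` vector (hypothetical values, not the real data) shows how a text pattern and `is.na()` treat them differently:

```r
# Hypothetical cabin values; NA here is a true missing value,
# which is what read_excel() produces for empty cells by default.
cabin <- c("B5", NA, "C22 C26", NA)

# Pattern matching on a missing value returns FALSE, so the
# missing entries are not detected this way.
grepl("NA", cabin)

# is.na() detects the missing entries directly.
has_cabin_number <- ifelse(is.na(cabin), 0, 1)
has_cabin_number
```

A quick `table(has_cabin_number)` on the real column is a cheap sanity check that both 0 and 1 actually occur.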
kable(titanic_clean[1:5, ], caption = "titanic clean")
pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest | has_cabin_number |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.0000 | 0 | 0 | 24160 | 211.3375 | B5 | S | 2 | NA | St Louis, MO | 1 |
1 | 1 | Allison, Master. Hudson Trevor | male | 0.9167 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | 11 | NA | Montreal, PQ / Chesterville, ON | 1 |
1 | 0 | Allison, Miss. Helen Loraine | female | 2.0000 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | NONE | NA | Montreal, PQ / Chesterville, ON | 1 |
1 | 0 | Allison, Mr. Hudson Joshua Creighton | male | 30.0000 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | NONE | 135 | Montreal, PQ / Chesterville, ON | 1 |
1 | 0 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25.0000 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | NONE | NA | Montreal, PQ / Chesterville, ON | 1 |