#clear the workspace
rm(list = ls()) #remove all objects from the environment
cat("\f") #clear the console
gc() #clear unused memory
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 520363 27.8 1156558 61.8 660385 35.3
## Vcells 946423 7.3 8388608 64.0 1769490 13.6
Independence: Two events are independent when the occurrence of one does not affect the probability that the other occurs. For independent events, the probability of the intersection (both events happening) is P(A) × P(B). In other words, if event A happens, the probability of event B happening does not change.
Mutually Exclusive: Two events are mutually exclusive when the occurrence of one prevents the other from happening. The two events have no intersection. In other words, if event A happens, there is a 0% chance of event B happening.
EXPERIMENT: Rolling a die.
Outcome/sample space: {1, 2, 3, 4, 5, 6}
dice <- c(1,2,3,4,5,6)
expand.grid(dice) #shows the possible outcomes; if we roll 2 dice the sample space becomes much larger: 36 outcomes
## Var1
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
## 6 6
The sample space contains 6 outcomes in total.
MUTUALLY EXCLUSIVE EVENTS: Rolling a 5 (event 1) and a 6 (event 2); rolling an even number (event 1) and a 5 (event 2); rolling a 4 (event 1) and an odd number (event 2).
NON-MUTUALLY EXCLUSIVE EVENTS: Rolling an even number (event 1) and rolling a 6 (event 2). These can happen together, because a roll of 6 is also an even number.
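The distinction above can be checked by writing each event as a set of outcomes and testing whether the sets overlap. This is a minimal sketch; the helper name `is_mutually_exclusive` is made up for illustration and is not part of the original analysis.

```r
# Events on a single roll, written as sets of outcomes
dice <- 1:6
even <- dice[dice %% 2 == 0]   # the even outcomes: 2, 4, 6
five <- 5
six  <- 6

# Hypothetical helper: TRUE when two events share no outcomes
is_mutually_exclusive <- function(a, b) length(intersect(a, b)) == 0

is_mutually_exclusive(even, five)  # even number vs. a 5: no shared outcome
is_mutually_exclusive(even, six)   # even number vs. a 6: they share the outcome 6
```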
Change the experiment: We will now roll 2 dice separately.
Outcome/sample space of the second die: {1, 2, 3, 4, 5, 6}
expand.grid(dice, dice) #possible outcomes of rolling 2 dice: 6 X 6 = 36 in total
## Var1 Var2
## 1 1 1
## 2 2 1
## 3 3 1
## 4 4 1
## 5 5 1
## 6 6 1
## 7 1 2
## 8 2 2
## 9 3 2
## 10 4 2
## 11 5 2
## 12 6 2
## 13 1 3
## 14 2 3
## 15 3 3
## 16 4 3
## 17 5 3
## 18 6 3
## 19 1 4
## 20 2 4
## 21 3 4
## 22 4 4
## 23 5 4
## 24 6 4
## 25 1 5
## 26 2 5
## 27 3 5
## 28 4 5
## 29 5 5
## 30 6 5
## 31 1 6
## 32 2 6
## 33 3 6
## 34 4 6
## 35 5 6
## 36 6 6
INDEPENDENT EVENTS: Rolling an even number on roll 1 and rolling an odd number on roll 2 are independent events. Rolling an even number on roll 1 has no influence on what will be rolled on the second die.
If the experiment were to flip a coin and roll one die, these events would be independent as well. If I flip a head (event 1), it has no influence on what number (1-6) will be rolled on the die.
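As a rough check of the two-dice example, the 36 equally likely outcomes can be enumerated and the product rule P(A and B) = P(A) × P(B) verified directly. The column names `roll1` and `roll2` are chosen here for readability and are not in the original output.

```r
dice  <- 1:6
rolls <- expand.grid(roll1 = dice, roll2 = dice)  # 36 equally likely outcomes

even1 <- rolls$roll1 %% 2 == 0   # event A: even number on roll 1
odd2  <- rolls$roll2 %% 2 == 1   # event B: odd number on roll 2

p_even1 <- mean(even1)           # P(A): share of outcomes where A occurs
p_odd2  <- mean(odd2)            # P(B)
p_both  <- mean(even1 & odd2)    # P(A and B)

c(p_even1 = p_even1, p_odd2 = p_odd2, p_both = p_both)

# Independence holds when the joint probability equals the product
isTRUE(all.equal(p_both, p_even1 * p_odd2))
```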
NOT INDEPENDENT EVENTS:
New experiment: We have a box of 5 red, 5 green, and 5 yellow balls. We pick 2 balls from the box, one after the other without replacement (the balls are not put back into the box when they are picked).
Not independent event: picking a red ball (event 1), and picking another red ball (event 2). These are not independent because once a red ball is picked (5/15 chance) and not put back, the chance of picking a red ball for event 2 is 4/14. Picking a red ball in event 1 changed the probability of picking a red ball in event 2.
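The arithmetic behind the box example can be sketched as follows. The variable names are illustrative only; the unconditional chance of red on the second pick is worked out with the law of total probability, a step the text above does not spell out.

```r
red   <- 5    # red balls in the box
total <- 15   # all balls in the box

p_red_first            <- red / total              # 5/15
p_red_second_given_red <- (red - 1) / (total - 1)  # 4/14 after one red is removed
p_red_second_given_not <- red / (total - 1)        # 5/14 if the first ball was not red

# Joint probability of two reds in a row
p_both_red <- p_red_first * p_red_second_given_red

# Unconditional probability that the second ball is red
p_red_second <- p_red_first * p_red_second_given_red +
  (1 - p_red_first) * p_red_second_given_not

# For independent events these two numbers would match; here they differ
c(joint = p_both_red, product = p_red_first * p_red_second)
```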
library(readr)
train <- read_csv("/Users/Ryan/OneDrive/Documents/Data Analysis- Sharma/Discussions/train.csv")
## Rows: 891 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Name, Sex, Ticket, Cabin, Embarked
## dbl (7): PassengerId, Survived, Pclass, Age, SibSp, Parch, Fare
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#na.strings = c('') # not run: na.strings is an argument of read.csv(); read_csv() uses na instead, e.g. na = c("", "NA")
head(train[,c(2,3,4)], n = 2)
## # A tibble: 2 × 3
## Survived Pclass Name
## <dbl> <dbl> <chr>
## 1 0 3 Braund, Mr. Owen Harris
## 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer)
tail(train[,c(2,3,4)], n = 2)
## # A tibble: 2 × 3
## Survived Pclass Name
## <dbl> <dbl> <chr>
## 1 1 1 Behr, Mr. Karl Howell
## 2 0 3 Dooley, Mr. Patrick
1. Create a 2-way cross tabulation
Survival_Sex_Table <- table(train$Survived,
train$Sex
)
colnames(Survival_Sex_Table) = c("Female", "Male")
rownames(Survival_Sex_Table) = c("Died", "Survived")
print(Survival_Sex_Table)
##
## Female Male
## Died 81 468
## Survived 233 109
2. Create a 3-way cross tabulation
table(train$Survived,
train$Sex,
train$Pclass
)
## , , = 1
##
##
## female male
## 0 3 77
## 1 91 45
##
## , , = 2
##
##
## female male
## 0 6 91
## 1 70 17
##
## , , = 3
##
##
## female male
## 0 72 300
## 1 72 47
This three-way table shows the number of passengers who died and who survived, broken down by passenger class and, within each class, by sex. In class 1, 3 females died and 91 survived, while 77 males died and 45 survived. In total, 80 people died and 136 survived.
In class 2, 6 females died and 70 survived, while 91 males died and 17 survived. In total, 97 people died and 87 survived.
In class 3, 72 females died and 72 survived, while 300 males died and 47 survived. In total, 372 people died and 119 survived.
As the table shows, the higher classes had higher survival rates for women, for men, and overall. The survival rates for women in classes 1 and 2 are similar, however. Far more women survived than men (233 versus 109), which makes sense because women were given priority for the lifeboats. The survival rate for men drops sharply from class 1 to classes 2 and 3.
Survival Rates By Class:
Class 1:
Male: 36.9%
Female: 96.8%
Total: 63%
Class 2:
Male: 15.7%
Female: 92.1%
Total: 47.3%
Class 3:
Male: 13.5%
Female: 50%
Total: 24.2%
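The survival rates can be recomputed from the counts printed in the three-way table rather than by hand. The sketch below rebuilds those counts as an array so that it runs without train.csv; the dimension labels and object names (`counts`, `by_sex`, `overall`) are chosen here for illustration. With the data loaded, `table(train$Survived, train$Sex, train$Pclass)` could be passed to `prop.table()` in the same way.

```r
# Counts copied from the three-way table output; array() fills the first
# dimension fastest, so each row below is one class in the order:
# female died, female survived, male died, male survived
counts <- array(
  c( 3, 91,  77, 45,    # class 1
     6, 70,  91, 17,    # class 2
    72, 72, 300, 47),   # class 3
  dim = c(2, 2, 3),
  dimnames = list(Survived = c("Died", "Survived"),
                  Sex      = c("Female", "Male"),
                  Pclass   = c("1", "2", "3"))
)

# Survival rate within each sex-by-class group, in percent
by_sex <- round(100 * prop.table(counts, margin = c(2, 3))["Survived", , ], 1)

# Survival rate per class with both sexes combined, in percent
class_totals <- apply(counts, c(1, 3), sum)
overall <- round(100 * prop.table(class_totals, margin = 2)["Survived", ], 1)

by_sex
overall
```

Rounding is applied only at the end, so the printed percentages are rounded rather than truncated.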