This is to explain some very simple probability & using R scripts to demonstrate the probability concept. Most of you already know these things. We can use this to revise some basic probability concepts.
The probababilty of perticuar event can be described as:
\[ \text{Probabillity} = \frac{\text{Number of favourable outcome}}{\text{Number of possible eaqually like outcomes}} \]
Lets assume simple coin. It has a two faces and if you toss the coin it will give result either HEAD or TAIL
The probability of the coin landing H or HEAD is \(P(H)\)
And the probability of the coin landing T or TAIL is \(P(T)\)
Given that all outcomes are equally likely, we can compute the probability of a landing HEAD using the formula:
\[P(H) = \frac{\text{Number of favourable outcome}}{\text{Number of possible eaqually like outcomes}} = \frac{1}{2} \]
Similarly probability of landing TAIL is \(\frac{1}{2}\).
Probability is Just a Guide.
Probability does not confirm us exactly what will happen, it is just a guide.
Example: toss a coin 10 times, how many Heads will come up? According to the Probability of the coin landing Head is ½ chance, so we can expect 5 Heads. But when we actually try it we might get 3 heads, or 4 heads … or anything really.
Lets simulate this using some R script
coin <- c("H","T")
result <- sample(coin,10, replace = TRUE)
Lets see the result
result
## [1] "T" "T" "T" "T" "T" "T" "H" "T" "H" "H"
We can show as a table
table(result)
## result
## H T
## 3 7
Lets convert the table to data frame.
df_total <- data.frame(table(result))
Lets see the df_total
df_total
with above results we can see HEAD 3 times and TAIL df_total$Freq[df_total$result == "T"] times.
Using prop.table() we can add additional columns to show probability of each “H” & “T”
df_prob <- cbind(df_total, prop.table(df_total$Freq))
names(df_prob) <- c("result","Freq", "probability")
df_prob
To show the results graphically we can generate bar plot using barplot in base R
barplot(df_prob$Freq, names.arg = df_total$result, main = "Head & Tail count of Tossing a coin 10 times")
barplot(df_prob$probability, names.arg = df_total$result, main = "Probability of Tossing a coin 10 times")
What is the conclusion?
seems number of HEAD or TAIL frequencies are equal/ not equal?
Probability of HEAD \(P(H)\)
Probability of TAIL \(P(T)\)
In here \(P(H) = P(T) = \frac{1}{2}\) may be/may not be observed.
Probability is Just a Guide Probability does not confirm us exactly what will happen
To demonstrate that we can create tossCoin function & plotCoin function to plot the results nicely.
We can use ggplot2 package to create plots and gridExtra package to arrange the plots.
library(ggplot2)
library(gridExtra)
Lets write the two functions.
tossCoin <- function(numToss = 10){
coin <- c("H","T")
result <- sample(coin,numToss, replace = TRUE)
df_total <- data.frame(table(result))
}
plotCoin <- function(df){
number <- sum(df$Freq)
plot1 <- ggplot(df, aes(result,Freq, fill = result)) + geom_bar(stat = "identity") + labs(title = paste0("Tossing a coin ", number," times")) +theme(legend.position = "none")
plot2 <- ggplot(df, aes(result, prop.table(df$Freq), fill = result)) + geom_bar(stat = "identity") + labs(title = paste0("Measured probability of Tossing a coin ", number," times"), y = "Probability") + theme(legend.position = "none")
grid.arrange(plot1, plot2)
}
Lets see the results by tossing coin 50 times
df50 <- tossCoin(50)
df50
plotCoin(df50)
May be 1000 times
df1000 <- tossCoin(1000)
df1000
plotCoin(df1000)
So if we toss the coin large number of times, the probability of each occurance is approximately/equal to the theoritical probability.
Lets try some big number…
df1million <- tossCoin(1000000)
df1million
plotCoin(df1million)
***
In here instead of two outcomes we have 6 outcomes.
To demonstrate that we can create tossDice function & plotDice function similar to toss Coin.
tossDice <- function(numToss = 10){
dice <- c(1:6)
result <- sample(dice, numToss, replace = TRUE)
df_total <- data.frame(table(result))
}
plotDice <- function(df){
number <- sum(df$Freq)
plot1 <- ggplot(df, aes(result,Freq, fill = result)) + geom_bar(stat = "identity") + labs(title = paste0("Tossing a dice ", number," times")) +theme(legend.position = "none")
plot2 <- ggplot(df, aes(result, prop.table(df$Freq), fill = result)) + geom_bar(stat = "identity") + labs(title = paste0("Probability of Tossing a dice ", number," times"), y = "Probability") + theme(legend.position = "none")
grid.arrange(plot1, plot2)
}
Lets see the results by tossing coin 50 times
df50 <- tossDice(50)
df50
plotDice(df50)
May be 1000 times
df1000 <- tossDice(1000)
df1000
plotDice(df1000)
Dice can be tossed 1 million times
df1million <- tossDice(1000000)
df1million
plotDice(df1million)
as a proportional table
cbind(df1million, prop.table(df1million$Freq))
We can see if the number of times tossing dice increases the probablity of each occurance is approximating to \(\frac{1}{6} = 0.1666667\)