Introduction

This is to explain some very simple probability & using R scripts to demonstrate the probability concept. Most of you already know these things. We can use this to revise some basic probability concepts.

Basic probability

The probababilty of perticuar event can be described as:

\[ \text{Probabillity} = \frac{\text{Number of favourable outcome}}{\text{Number of possible eaqually like outcomes}} \]

Lets assume simple coin. It has a two faces and if you toss the coin it will give result either HEAD or TAIL

The probability of the coin landing H or HEAD is \(P(H)\)

And the probability of the coin landing T or TAIL is \(P(T)\)

Given that all outcomes are equally likely, we can compute the probability of a landing HEAD using the formula:

\[P(H) = \frac{\text{Number of favourable outcome}}{\text{Number of possible eaqually like outcomes}} = \frac{1}{2} \]

Similarly probability of landing TAIL is \(\frac{1}{2}\).

Probability is Just a Guide.

Probability does not confirm us exactly what will happen, it is just a guide.

Example: toss a coin 10 times, how many Heads will come up? According to the Probability of the coin landing Head is ½ chance, so we can expect 5 Heads. But when we actually try it we might get 3 heads, or 4 heads … or anything really.

Lets simulate this using some R script

  1. we create the coin & assign “H” & “T” (for HEAD & TAIL)
coin <- c("H","T")
  1. toss the coin 10 times and store the results in result
result <- sample(coin,10, replace = TRUE)

Lets see the result

result
##  [1] "T" "T" "T" "T" "T" "T" "H" "T" "H" "H"

We can show as a table

table(result)
## result
## H T 
## 3 7

Lets convert the table to data frame.

df_total <- data.frame(table(result))

Lets see the df_total

df_total

with above results we can see HEAD 3 times and TAIL df_total$Freq[df_total$result == "T"] times.

Using prop.table() we can add additional columns to show probability of each “H” & “T”

df_prob <- cbind(df_total, prop.table(df_total$Freq))
names(df_prob) <- c("result","Freq", "probability")
df_prob

To show the results graphically we can generate bar plot using barplot in base R

barplot(df_prob$Freq, names.arg = df_total$result, main = "Head & Tail count of Tossing a coin 10 times")

barplot(df_prob$probability, names.arg = df_total$result, main = "Probability of Tossing a coin 10 times")

What is the conclusion?

seems number of HEAD or TAIL frequencies are equal/ not equal?

Probability of HEAD \(P(H)\)

Probability of TAIL \(P(T)\)

In here \(P(H) = P(T) = \frac{1}{2}\) may be/may not be observed.

Probability is Just a Guide Probability does not confirm us exactly what will happen

toss Coin function

To demonstrate that we can create tossCoin function & plotCoin function to plot the results nicely.

We can use ggplot2 package to create plots and gridExtra package to arrange the plots.

library(ggplot2)
library(gridExtra)

Lets write the two functions.

tossCoin <- function(numToss = 10){
  coin <- c("H","T")
  result <- sample(coin,numToss, replace = TRUE)
  df_total <- data.frame(table(result))
  

}

plotCoin <- function(df){
  number <- sum(df$Freq)
  
  
  plot1 <- ggplot(df, aes(result,Freq, fill = result)) + geom_bar(stat = "identity") + labs(title = paste0("Tossing a coin ", number," times")) +theme(legend.position = "none")
  plot2 <- ggplot(df, aes(result, prop.table(df$Freq), fill = result)) + geom_bar(stat = "identity") + labs(title = paste0("Measured probability of Tossing a coin ", number," times"), y = "Probability") + theme(legend.position = "none")
  
  grid.arrange(plot1, plot2)
  
}

Lets see the results by tossing coin 50 times

df50 <- tossCoin(50)
df50
plotCoin(df50)

May be 1000 times

df1000 <- tossCoin(1000)
df1000
plotCoin(df1000)

So if we toss the coin large number of times, the probability of each occurance is approximately/equal to the theoritical probability.

Lets try some big number…

df1million <- tossCoin(1000000)
df1million
plotCoin(df1million)

***

Can you think of similar procedure for toss a Dice.

In here instead of two outcomes we have 6 outcomes.

toss Dice function

To demonstrate that we can create tossDice function & plotDice function similar to toss Coin.

tossDice <- function(numToss = 10){
  dice <- c(1:6)
  result <- sample(dice, numToss, replace = TRUE)
  df_total <- data.frame(table(result))
  
}

plotDice <- function(df){
  number <- sum(df$Freq)
  
 
  plot1 <- ggplot(df, aes(result,Freq, fill = result)) + geom_bar(stat = "identity") + labs(title = paste0("Tossing a dice ", number," times")) +theme(legend.position = "none")


  plot2 <- ggplot(df, aes(result, prop.table(df$Freq), fill = result)) + geom_bar(stat = "identity") + labs(title = paste0("Probability of Tossing a dice ", number," times"), y = "Probability") + theme(legend.position = "none")
  
  grid.arrange(plot1, plot2)

}

Lets see the results by tossing coin 50 times

df50 <- tossDice(50)
df50
plotDice(df50)

May be 1000 times

df1000 <- tossDice(1000)
df1000
plotDice(df1000)

Dice can be tossed 1 million times

df1million <- tossDice(1000000)
df1million
plotDice(df1million)

as a proportional table

cbind(df1million, prop.table(df1million$Freq))

We can see if the number of times tossing dice increases the probablity of each occurance is approximating to \(\frac{1}{6} = 0.1666667\)