Statistical Computing

~ Assignment 1 ~

Contact	:
Email	diyasaryanugroho@gmail.com
Instagram	https://www.instagram.com/diasary_nm/
RPubs	https://rpubs.com/diyasarya/

Exercise 1

Suppose there are twenty multiple choice questions in a Statistics class quiz. Each question has five possible answers, and only one of them is correct. Find the probability of having four or fewer correct answers if a student attempts to answer every question at random.

Answer 1

First, consider the case of exactly four correct answers. This is an exact binomial probability, P(X = 4), which dbinom computes directly.

p <- 1/5        # prob that a single question is answered correctly
q <- 4/5        # prob that a single question is answered wrongly (1 - p)
n <- 20         # number of questions
k <- 4          # number of correct answers

dbinom(k, size = n, prob = p)

## [1] 0.2181994

The probability of exactly four correct answers is 0.2181994, or about 0.22.
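As a cross-check, the same value follows from the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below is only an illustration; the name `by_formula` is introduced here and is not part of the original answer.

```r
# Binomial PMF written out by hand: choose(n, k) * p^k * (1 - p)^(n - k)
p <- 1/5   # probability that a single guess is correct
n <- 20    # number of questions
k <- 4     # number of correct answers

by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)    # hypothetical name, for illustration
all.equal(by_formula, dbinom(k, size = n, prob = p))  # TRUE when both values agree
```

Writing the formula out makes it clear that dbinom returns the probability of one specific count, not a cumulative value.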

We can also visualize the distribution, highlighting the bar for exactly four correct answers.

library(dplyr)
library(ggplot2)
data.frame(quest = 0:n, 
           pmf = dbinom(x = 0:n, size = n, prob = p)) %>%
  mutate(Quest = ifelse(quest == k, "4", "other"))%>%
  ggplot(aes(x = factor(quest), y = pmf, fill = Quest)) +
  geom_col() +
  theme_minimal()+
  geom_text(
    aes(label = round(pmf,2), y = pmf + 0.01),
    position = position_dodge(0.9),
    size = 3,
    vjust = 0) +
  labs(title = "Probability of X = x Right.",
       subtitle = "b(20, .2)",
       x = "Right Answer (x)",
       y = "probability")

For four or fewer correct answers, we need the cumulative probability P(X <= 4), the sum of the probabilities of 0, 1, 2, 3, and 4 correct answers.

p <- 1/5        # prob that a single question is answered correctly
q <- 4/5        # prob that a single question is answered wrongly (1 - p)
n <- 20         # number of questions
k <- 0:4        # numbers of correct answers to accumulate

prob <- dbinom(k, size=n, prob=p)   # calculate the prob of k correct ans
sum(prob)                           # sum up the prob of 0 to 4 correct ans

## [1] 0.6296483

Alternatively, we can add the five terms one by one, or use the cumulative distribution function pbinom.

manual <- dbinom(0, size=n, prob=p) +
  dbinom(1, size=n, prob=p) +
  dbinom(2, size=n, prob=p) +
  dbinom(3, size=n, prob=p) +
  dbinom(4, size=n, prob=p)
alternative <- pbinom(4, size=n, prob=p)

manual

## [1] 0.6296483

alternative

## [1] 0.6296483

The probability of four or fewer correct answers is 0.6296483, or about 0.63.
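The cumulative probability can also be sanity-checked by simulation. The sketch below is an illustration under assumptions that are not in the original answer: the seed, the number of simulated quizzes `n_sim`, and the names `scores` and `estimate` are all arbitrary choices.

```r
# Monte Carlo check of P(X <= 4) for X ~ Binomial(20, 1/5)
set.seed(1)                                     # arbitrary seed, for reproducibility only
n_sim <- 100000                                 # number of simulated quizzes (assumed)
scores <- rbinom(n_sim, size = 20, prob = 1/5)  # correct answers in each simulated quiz
estimate <- mean(scores <= 4)                   # share of quizzes with at most 4 correct
c(simulated = estimate, exact = pbinom(4, size = 20, prob = 1/5))
```

The simulated share should land close to the exact value, with a small difference that comes from sampling noise.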

We can also visualize the cumulative probabilities, highlighting the bars for four or fewer correct answers.

data.frame(quest = 0:n, 
           pmf = dbinom(x = 0:n, size = n, prob = p),
           cdf = pbinom(q = 0:n, size = n, prob = p, 
                        lower.tail = TRUE)) %>%
  mutate(Quest = ifelse(quest <= 4, "<=4", "other")) %>%
  ggplot(aes(x = factor(quest), y = cdf, fill = Quest)) +
  geom_col() +
  theme_minimal()+
  geom_text(
    aes(label = round(cdf,2), y = cdf + 0.01),
    position = position_dodge(0.9),
    size = 3,
    vjust = 0) +
  labs(title = "Cumulative Probability of X <= x Right",
       subtitle = "b(20, .2)",
       x = "Right (x)",
       y = "cumulative probability")

So, if we want exactly four correct answers, we take the single probability value P(X = 4). However, if we want four or fewer correct answers, the probabilities are accumulated from zero correct answers up to four correct answers.

Exercise 2

If twenty cars are crossing a bridge per minute on average, visualize and find the probability of having thirteen or more cars crossing the bridge in a particular minute.

Answer 2

"Thirteen or more cars" means P(X >= 13). A Poisson count has no upper limit, so the sum cannot stop at 20. Instead we take the upper tail of the Poisson distribution with mean 20:

n <- 20         # avg num of cars crossing the bridge per minute (lambda)

ppois(12, lambda = n, lower.tail = FALSE)   # P(X > 12) = P(X >= 13)

## [1] 0.960988

Note : with lower.tail = FALSE, ppois(q, ...) returns P(X > q). For "a or more" we therefore pass q = a - 1, here 13 - 1 = 12.

Or we can use the complement of the lower tail. For comparison, summing only k equal to 13, 14, ..., 20 gives P(13 <= X <= 20), which leaves out every count above 20.

alternative <- 1 - ppois(12, lambda = n)     # 1 - P(X <= 12)
truncated <- sum(dpois(13:20, lambda = n))   # only P(13 <= X <= 20)

alternative

## [1] 0.960988

truncated

## [1] 0.5200806

So the probability of thirteen or more cars crossing the bridge in a particular minute is about 0.96, or 96%.
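Because a Poisson count has no upper limit, "thirteen or more" includes counts above 20. The sketch below illustrates this by letting the partial sum run to larger and larger cutoffs and comparing it with the upper tail from ppois. The cutoffs and the names `partial` and `upper_tail` are chosen here for illustration only.

```r
# How the partial sum of Poisson(20) probabilities from 13 upward approaches P(X >= 13)
lambda <- 20
cutoffs <- c(20, 30, 40, 60)   # upper limits of the partial sums (illustrative choice)
partial <- sapply(cutoffs, function(m) sum(dpois(13:m, lambda = lambda)))
upper_tail <- ppois(12, lambda = lambda, lower.tail = FALSE)   # P(X > 12) = P(X >= 13)
data.frame(cutoff = cutoffs, partial_sum = partial, gap = upper_tail - partial)
```

The gap column shrinks toward zero as the cutoff grows, which is why the upper tail is computed with ppois rather than with a sum that stops at the mean.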

We can also visualize the distribution, highlighting the region of thirteen or more cars crossing the bridge in a particular minute.

options(scipen = 999, digits = 2)   # no scientific notation, 2 significant digits
cars <- 0:40                        # support is unbounded; 0:40 covers almost all of the mass
density <- dpois(cars, lambda = n)
prob <- ppois(q = cars, lambda = n, lower.tail = TRUE)
df <- data.frame(cars, density, prob)
df$Cars <- ifelse(df$cars >= 13, ">=13", "other")   # flag the region of interest
ggplot(df, aes(x = cars, y = density)) +
  theme_minimal()+
  geom_col(aes(fill = Cars)) +
  geom_text(
    aes(label = round(density,2), y = density + 0.01),
    size = 2,
    vjust = 0) +
  labs(title = "PMF and CDF of Poisson Distribution",
       subtitle = "Poisson(20), X >= 13 highlighted",
       x = "Cars per minute (x)",
       y = "Density") +
  geom_line(aes(y = prob))          # CDF drawn on the same numeric x axis

Exercise 3

Suppose the probability that a drug produces a certain side effect is p = 0.1% and n = 1,000 patients in a clinical trial receive the drug. What is the probability that 0 people experience the side effect? Use visualization techniques.

Answer 3

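We want the probability that 0 of the 1,000 patients who receive the drug experience the side effect. The number of affected patients is binomial with n = 1,000 and p = 0.001. Because n is large and p is small, it is well approximated by a Poisson distribution with mean n*p = 1, which we use here.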

p <- 0.001
n <- 1000
x <- 0

ppois(x, n*p)

## [1] 0.37

The probability that none of the 1,000 patients experiences the side effect is about 0.37, or 37%.
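Since the Poisson distribution is used here as an approximation to the binomial, it can be useful to compare the two directly. The sketch below is an illustration only; the names `exact` and `poisson_approx` are introduced here and are not part of the original answer.

```r
# Exact binomial probability of zero side effects versus the Poisson approximation
p <- 0.001   # probability of the side effect for one patient
n <- 1000    # number of patients
exact <- dbinom(0, size = n, prob = p)       # equals (1 - p)^n
poisson_approx <- dpois(0, lambda = n * p)   # equals exp(-n * p)
c(exact = exact, poisson = poisson_approx, difference = poisson_approx - exact)
```

The two values differ by well under 0.001, so both round to the same 37%.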

We can also use a visualization to see the probability of each number of patients experiencing the side effect.

options(scipen = 999, digits = 2)             # sig digits
patients <- 0:10
density <- dpois(x = patients, lambda = n*p)
prob <- ppois(q = patients, lambda = n*p, lower.tail = TRUE)
df <- data.frame(patients, density, prob)
ggplot(df, aes(x = patients, y = density)) +
  geom_col() +
  geom_text(
    aes(label = round(density,2), y = density + 0.01),
    position = position_dodge(0.9),
    size = 3,
    vjust = 0
  )+
  theme_minimal()+
  labs(title = "Poisson(1)",
       subtitle = "PMF and CDF of the Poisson(1) distribution.",
       x = "Patients with side effect (x)",
       y = "Density") +
  geom_line(data = df, aes(x = patients, y = prob))

According to the visualization, the bar at x = 0 has a height of about 0.37, which is the probability that no patient experiences the side effect. Almost all of the probability lies on counts from 0 to 4, and the chance of five or more of the 1,000 patients experiencing the side effect is very small.

Exercise 4

Suppose the mean checkout time of a supermarket cashier is three minutes. Find the probability of a customer checkout being completed by the cashier in less than two minutes.

Answer 4

To find the probability that a customer checkout is completed by the cashier in less than two minutes, we can use the exponential distribution, whose rate is the reciprocal of the mean checkout time.

lambda <- 1/3             # rate = 1 / mean checkout time (3 minutes)
x <- 2                    # cutoff time (minutes)

pexp(x, rate = lambda, lower.tail = TRUE)

## [1] 0.49

So, the probability of a customer checkout being completed by the cashier in less than two minutes is about 0.49, or 49%.
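The same value can be obtained from the closed form of the exponential distribution function, P(T < x) = 1 - exp(-rate * x). The sketch below is an illustration only; the names `mean_time`, `rate`, and `by_formula` are introduced here and are not part of the original answer.

```r
# Exponential CDF by hand: P(T < x) = 1 - exp(-rate * x)
mean_time <- 3           # mean checkout time in minutes (from the exercise)
rate <- 1 / mean_time    # rate parameter of the exponential distribution
x <- 2                   # cutoff time in minutes
by_formula <- 1 - exp(-rate * x)
all.equal(by_formula, pexp(x, rate = rate))   # TRUE when both values agree
```

This makes explicit that pexp expects the rate, 1/3 per minute, and not the mean of 3 minutes.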