library(ggplot2)
student_df <- data.frame("SN" = 1:20,"score" = c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94), stringsAsFactors = FALSE)
(student_df)
## SN score
## 1 1 57
## 2 2 66
## 3 3 69
## 4 4 71
## 5 5 72
## 6 6 73
## 7 7 74
## 8 8 77
## 9 9 78
## 10 10 78
## 11 11 79
## 12 12 79
## 13 13 81
## 14 14 81
## 15 15 82
## 16 16 83
## 17 17 83
## 18 18 88
## 19 19 89
## 20 20 94
b <- boxplot(student_df$score,
             main = "Scores of students in probability class",
             xlab = "student's score", ylab = "student",
             col = "blue", border = "red",
             horizontal = TRUE, notch = TRUE)
b
## $stats
## [,1]
## [1,] 66.0
## [2,] 72.5
## [3,] 78.5
## [4,] 82.5
## [5,] 94.0
##
## $n
## [1] 20
##
## $conf
## [,1]
## [1,] 74.96701
## [2,] 82.03299
##
## $out
## [1] 57
##
## $group
## [1] 1
##
## $names
## [1] ""
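From the boxplot statistics (`b$stats`) we can conclude that:

1. Q1 (lower hinge) = 72.5
2. Q2 (median) = 78.5
3. Q3 (upper hinge) = 82.5
4. Max = 94
5. Min (end of the lower whisker) = 66; the actual minimum, 57, is plotted separately as an outlier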
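The numbers above come from Tukey's hinges rather than the default `quantile()` quartiles, which is why Q1 and Q3 may not match a hand calculation. The sketch below repeats the scores so it runs on its own and compares `fivenum()`, `boxplot.stats()` (the helper `boxplot()` uses to compute what it draws) and `quantile()` with its default type 7.

```r
# Scores from student_df, repeated here so the snippet runs on its own
score <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78,
           79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(score)
print(five)       # 57.0 72.5 78.5 82.5 94.0

# boxplot.stats() gives the values drawn by boxplot(); the whisker ends skip
# points flagged as outliers, so the lower whisker is 66 instead of 57
box_stats <- boxplot.stats(score)$stats
print(box_stats)  # 66.0 72.5 78.5 82.5 94.0

# quantile() defaults to type 7, which gives slightly different quartiles
q <- quantile(score, c(0.25, 0.5, 0.75))
print(q)          # 72.75 78.50 82.25
```

So the box edges (72.5 and 82.5) are hinges, while the type-7 quartiles would be 72.75 and 82.25; the median is 78.5 either way.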
The outlier is the score 57, in row 1 of the data frame (SN = 1).
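Why 57 is flagged can be checked with the 1.5 × IQR rule that `boxplot()` applies by default, using the spread between the hinges (72.5 and 82.5) as the IQR. A minimal sketch, with the scores repeated so it is self-contained:

```r
score <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78,
           79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# Lower and upper hinges, as used by boxplot.stats()
hinges  <- fivenum(score)[c(2, 4)]   # 72.5, 82.5
box_iqr <- diff(hinges)              # 10

# Fences at 1.5 box-lengths beyond the hinges
lower_fence <- hinges[1] - 1.5 * box_iqr   # 57.5
upper_fence <- hinges[2] + 1.5 * box_iqr   # 97.5

is_out   <- score < lower_fence | score > upper_fence
outliers <- score[is_out]
print(outliers)        # 57
print(which(is_out))   # 1, i.e. the first row of student_df
```

Because 57 lies just below the lower fence of 57.5 and no score exceeds 97.5, 57 is the only outlier, which agrees with `b$out`.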
This histogram has a symmetric, single-peaked (unimodal) distribution: it forms an approximate mirror image about the center of the distribution.
This histogram matches boxplot 2.
[Figure: boxplot matching_2]
This histogram has a symmetric distribution that is roughly uniform, with two modes (bimodal).
[Figure: boxplot matching_3]
This histogram has a right-skewed distribution.
[Figure: boxplot matching_1]
Right-skewed distribution, so mean > median. The distribution of house prices is likely right skewed: there is a natural boundary at 0, and a meaningful number of houses cost more than $6M, which stretches the upper tail. Therefore the center is best described by the median, and the variability by the IQR.
Right-skewed distribution. The distribution of house prices is likely right skewed, as there is a natural boundary at 0 and only a few houses cost more than $1.2M. Therefore the center is best described by the median, and the variability by the IQR.
The distribution of the number of alcoholic drinks consumed is likely right skewed, as there is a natural boundary at 0 and only a few drinks are allowed. Therefore the center is best described by the median, and the variability by the IQR.
The distribution of annual salaries is likely right skewed, as there is a natural boundary at 0 and only a few people earn much higher salaries. Therefore the center is best described by the median, and the variability by the IQR. The IQR is a much better measure of the variability in the amounts earned by nearly all employees: the standard deviation is strongly affected by the few very high salaries, whereas the IQR is robust to these extreme observations.
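The robustness argument can be illustrated numerically. The salaries below are hypothetical values invented only for this sketch (they are not data from the exercise): eight typical employees plus two executives with very high pay. The helper `center_spread()` is also just an illustrative name.

```r
# Hypothetical salaries in $1,000s, invented only for illustration:
# eight typical employees plus two executives with very high pay
salaries <- c(45, 48, 50, 52, 55, 57, 60, 62, 900, 1200)
typical_only <- salaries[1:8]

# Center and spread, computed both with and without the executives
center_spread <- function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), IQR = IQR(x))
}

comparison <- rbind(all_employees = center_spread(salaries),
                    typical_only  = center_spread(typical_only))
round(comparison, 1)
```

Adding the two executives moves the mean from 53.625 to 252.9, while the median only moves from 53.5 to 56 and the IQR from 8.25 to 11; the standard deviation grows by far more than the IQR.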
Survival and transplant status do not appear to be independent. The difference in survival between patients who did and did not receive a transplant looks too large to be explained by chance alone, which suggests that the heart transplant affected survival.
This suggests that the median survival time for patients who received a transplant is much higher than the median for the control group. In addition, the maximum survival time for patients in the treatment group is over 1500 days, compared with the control group, whose patients lived around 100 days without treatment.
# proportion of patients who died in the treatment group
prop_treat = 45/69
prop_treat
## [1] 0.6521739
# proportion of patients who died in the control group
prop_cont = 30/34
prop_cont
## [1] 0.8823529
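The two proportions are usually compared through their difference, which is the statistic a simulation under the independence model is built around. A short sketch reusing the counts above (45 of 69 treatment patients and 30 of 34 control patients died):

```r
# Death proportions from the counts above
prop_treat <- 45 / 69   # treatment group
prop_cont  <- 30 / 34   # control group

# Observed difference (treatment minus control); a negative value means
# a smaller share of deaths among transplant patients
obs_diff <- prop_treat - prop_cont
round(obs_diff, 3)      # -0.23
```

About 23 percentage points fewer deaths occurred in the treatment group.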
i. What are the claims being tested?
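H0 (independence model): transplant status and survival are independent, so the observed difference in death rates is due to chance. HA (alternative model): the heart transplant increases lifespan, i.e. it lowers the death rate.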
iii. What do the simulation results shown below suggest about the effectiveness of the transplant program?
We can conclude that the data provide strong evidence that the transplant provides a longer lifespan in this clinical setting.
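Simulation results of this kind come from a randomization (permutation) test: if H0 holds, the treatment/control labels are arbitrary, so shuffling them shows which differences chance alone produces. The sketch below rebuilds patient-level data from the counts above and is only an illustration of the technique; the seed, the 10,000 repetitions, the one-sided direction and the helper name `diff_in_props()` are arbitrary choices made here, and the resulting p-value will not exactly match the simulation referred to above.

```r
set.seed(2024)  # arbitrary seed so the sketch is reproducible

# Patient-level data rebuilt from the counts above:
# treatment: 69 patients, 45 died; control: 34 patients, 30 died
group <- c(rep("treatment", 69), rep("control", 34))
died  <- c(rep(TRUE, 45), rep(FALSE, 24), rep(TRUE, 30), rep(FALSE, 4))

# Difference in death proportions, treatment minus control
diff_in_props <- function(g, d) {
  mean(d[g == "treatment"]) - mean(d[g == "control"])
}

obs_diff <- diff_in_props(group, died)

# Under H0 the group labels carry no information, so shuffle them
n_sims    <- 10000
sim_diffs <- replicate(n_sims, diff_in_props(sample(group), died))

# One-sided p-value: share of shuffles giving a difference at least as
# favorable to the transplant (as negative) as the observed one
p_value <- mean(sim_diffs <= obs_diff)
p_value
```

A small `p_value` means that shuffled labels rarely produce a difference as large as the observed one, which is the sense in which the data provide evidence against H0.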