Graded: 1.8, 1.10, 1.28, 1.36, 1.48, 1.50, 1.56, 1.70 (use library(openintro); data(heartTr) to load the data)
#1.8 Smoking habits of UK residents.
#a) Each row of the data matrix represents one case: a single UK resident in the survey.
#b) 1691
#c) age: Numerical discrete
# grossIncome: Categorical ordinal (recorded as income ranges, not exact amounts)
# amtWeekends: Numerical discrete
# amtWeekdays: Numerical discrete
# sex: Categorical nominal
# marital: Categorical nominal
# smoke: Categorical nominal
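# A minimal sketch of how the nominal/ordinal distinction can be represented in R, using factor() for nominal variables and factor(..., ordered = TRUE) for ordinal ones. The data frame `toy` and all of its values are hypothetical and are not taken from the survey.

```r
# Hypothetical toy data, for illustration only (not the survey data)
toy <- data.frame(
  sex    = factor(c("Female", "Male", "Female")),          # nominal: no ordering
  income = factor(c("low", "high", "mid"),
                  levels = c("low", "mid", "high"),
                  ordered = TRUE),                         # ordinal: levels are ranked
  age    = c(38L, 42L, 40L)                                # numerical discrete
)

is.ordered(toy$sex)             # FALSE: categories have no order
is.ordered(toy$income)          # TRUE: categories have an order
toy$income[1] < toy$income[2]   # comparisons are only defined for ordered factors
str(toy)
```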
#1.10 Cheaters, scope of inference.
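#a) Population of interest: all children between the ages of 5 and 15.
# Sample: the 160 children between the ages of 5 and 15 who took part in the study.
#b) If the 160 children were selected at random, the results can be generalized to the population of children aged 5 to 15. Because the study is an experiment rather than an observational study, its findings can support a causal relationship, provided the children were randomly assigned to the treatment conditions.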
#1.28 Reading the paper.
#a) The article reports a strong association between smoking and dementia: people who smoked more were more likely to develop dementia, Alzheimer's disease, or vascular dementia. Because this is an observational study, it cannot establish that smoking causes dementia; the association may be explained by confounding variables.
#b) This is also an observational study, so a causal relationship between sleep disorders and bullying cannot be concluded. The study shows only that the two variables are associated.
#1.36 Exercise and mental health.
#a) This is an experiment.
#b) Treatment group: people who exercise twice a week.
# Control group: people who do not exercise.
#c) The blocking variable is age, the variable used to stratify the sample. Exercise is the treatment, not a block.
#d) No
#e) Yes. Because this is a randomized experiment, the study can establish a causal relationship between exercise and mental health. Because the subjects came from a stratified random sample, the conclusions can be generalized to the population.
#f) The study could also have blocked on gender and used more treatment levels, for example a) exercise five times a week, b) exercise 2-3 times a week, and c) no exercise, to give clearer and more generalizable outcomes.
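# A rough sketch of random assignment within strata (blocking), as used in a stratified experiment like the one in 1.36. The subjects, the stratum labels "A", "B", "C", and the group sizes are all hypothetical; the seed is arbitrary and only makes the example reproducible.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical subjects: 12 people in 3 strata of 4
subjects <- data.frame(
  id      = 1:12,
  stratum = rep(c("A", "B", "C"), each = 4)
)

# Within each stratum, shuffle an equal number of 1s and 2s
code <- ave(subjects$id, subjects$stratum,
            FUN = function(ids) sample(rep(c(1, 2), length.out = length(ids))))
subjects$group <- c("treatment", "control")[code]

# Each stratum should contribute 2 subjects to each group
table(subjects$stratum, subjects$group)
```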
#1.48
statScores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
statScores
## [1] 57 66 69 71 72 73 74 77 78 78 79 79 81 81 82 83 83 88 89 94
boxplot(statScores)

summary(statScores)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 57.00 72.75 78.50 77.70 82.25 94.00
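# As a hedged check on the boxplot, the 1.5*IQR rule can be applied by hand to the same scores. This sketch uses quantile() with R's default method, which matches the quartiles printed by summary(); boxplot() itself uses hinges, so its whisker limits can differ slightly.

```r
statScores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78,
                79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# Quartiles as reported by summary(): Q1 = 72.75, Q3 = 82.25
q   <- unname(quantile(statScores, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Outlier limits under the 1.5*IQR rule
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Scores falling outside the limits
outliers <- statScores[statScores < lower | statScores > upper]
c(lower = lower, upper = upper)
outliers
```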
#1.50 Mix-and-match.
#a) The data in the first histogram are symmetric and bimodal, and match boxplot (1).
#b) The data in the second histogram are right skewed and multimodal, and match boxplot (2).
#c) The data in the third histogram are right skewed and unimodal, and match boxplot (3).
#1.56 Distributions and appropriate statistics, Part II
#a) The data are right skewed, so the median is the better measure of center and the IQR the better measure of variability.
#b) The data are roughly symmetric. 25% of houses cost less than 300,000, and with Q3 = 900,000 the IQR is 600,000, so 1.5*IQR = 900,000. The upper outlier limit is Q3 + 1.5*IQR = 1,800,000 and the lower outlier limit is Q1 - 1.5*IQR = -600,000. Very few houses cost more than 1,200,000, so nearly all of the data lie within these limits and there is no long tail on either side. For a symmetric distribution, the mean describes the center and the SD describes the variability.
#c) The data are right skewed, so the median should be used for the center and the IQR for the variability; the mean and SD would be pulled toward the long right tail.
#d) The data are right skewed, since salaries cannot be negative but have no upper limit. The median should be used for the center and the IQR for the variability.
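# The outlier limits quoted in (b) can be reproduced with a few lines of arithmetic. This is only a sketch: Q1 = 300,000 is stated in the answer, while Q3 = 900,000 is an assumption inferred from the limits 1,800,000 and -600,000 given there.

```r
q1 <- 300000   # first quartile, from the answer to (b)
q3 <- 900000   # third quartile, inferred from the quoted limits (assumption)

iqr   <- q3 - q1          # interquartile range
lower <- q1 - 1.5 * iqr   # lower outlier limit
upper <- q3 + 1.5 * iqr   # upper outlier limit

c(IQR = iqr, lower = lower, upper = upper)
```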
#1.70 Heart transplants.
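#a) Survival does not appear to be independent of treatment: a larger proportion of patients in the treatment group survived than in the control group.
#b) The box plots suggest that patients who received a transplant tended to survive longer than patients in the control group.
#c) Proportion who died: treatment 45/69 = 0.652, control 30/34 = 0.882 (103 - 69 = 34 control patients).
#d) i) Whether the transplant treatment improves survival, or whether survival is independent of treatment.
# ii) 28, 75, 69, 34, 0, at most -0.230 (the observed difference, 45/69 - 30/34).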
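# iii) The observed difference of about -0.23 (45/69 - 30/34) lies in the left tail of the simulated distribution, so a difference this large is unlikely to arise from chance alone. This suggests that the treatment is effective.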
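# The card-shuffling procedure in (d) can be sketched in base R. The counts come from the heartTr output in this document (103 patients, 30 + 45 = 75 deaths, 69 in the treatment group); the seed, the variable names, and the choice of 100 repetitions are assumptions made for illustration.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# One card per patient: 28 alive, 75 dead (103 in total)
cards   <- c(rep("alive", 28), rep("dead", 75))
n_treat <- 69                       # size of the simulated treatment group

# Observed difference in proportion dead (treatment - control)
observed <- 45 / 69 - 30 / 34

# Shuffle the cards, split into groups of 69 and 34, record the difference
sim_diff <- replicate(100, {
  shuffled <- sample(cards)
  treat    <- shuffled[1:n_treat]
  control  <- shuffled[(n_treat + 1):length(shuffled)]
  mean(treat == "dead") - mean(control == "dead")
})

# Fraction of simulations at least as extreme as the observed difference
frac <- mean(sim_diff <= observed)
observed
frac
```

# A small value of frac would indicate that the observed difference is unlikely under the independence model.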
library(openintro)
## Warning: package 'openintro' was built under R version 3.5.2
## Please visit openintro.org for free statistics materials
##
## Attaching package: 'openintro'
## The following objects are masked from 'package:datasets':
##
## cars, trees
data(heartTr)
NROW(heartTr)
## [1] 103
controlDied <- subset(heartTr, heartTr$transplant =='control' & heartTr$survived =='dead' )
dim(controlDied)
## [1] 30 8
NROW(controlDied)
## [1] 30
treatmentDied <- subset(heartTr, heartTr$transplant =='treatment' & heartTr$survived =='dead' )
dim(treatmentDied)
## [1] 45 8
nrow(treatmentDied)
## [1] 45
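#proportion of the control group who died
controlgrp <- subset(heartTr, transplant == 'control')
NROW(controlDied) / nrow(controlgrp)
## [1] 0.8823529
#proportion of the treatment group who died
NROW(treatmentDied) / nrow(subset(heartTr, transplant == 'treatment'))
## [1] 0.6521739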
treatmentgrp <- subset(heartTr, heartTr$transplant=='treatment')
nrow(treatmentgrp)
## [1] 69
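# As a compact alternative to the subsets above, the same counts can be arranged in a two-way table and converted to row proportions. This sketch hard-codes the counts reported in this document (30 of 34 control and 45 of 69 treatment patients died) rather than reading heartTr, so it runs without the openintro package; the object names are illustrative.

```r
# Counts taken from the output above: rows are groups, columns are outcomes
counts <- matrix(c( 4, 30,
                   24, 45),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(transplant = c("control", "treatment"),
                                 survived   = c("alive", "dead")))

# Proportion alive/dead within each group (each row sums to 1)
row_props <- prop.table(counts, margin = 1)
round(row_props, 3)
```

# With the data loaded, prop.table(table(heartTr$transplant, heartTr$survived), margin = 1) should give the same proportions.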