Graded: 1.8, 1.10, 1.28, 1.36, 1.48, 1.50, 1.56, 1.70 (use library(openintro); data(heartTr) to load the data)

#1.8 Smoking habits of UK residents.
#a) Each row in the data matrix represents a case (one survey participant).
#b) 1,691 participants were included.
#c) age: Numerical discrete
#   grossIncome: Categorical ordinal (recorded in income brackets)
#   amtWeekends: Numerical discrete
#   amtWeekdays: Numerical discrete
#   sex: Categorical nominal
#   marital: Categorical nominal (marital status has no natural order)
#   smoke: Categorical nominal (yes/no)
#1.10 Cheaters, scope of inference.
#a) Population of interest: children between the ages of 5 and 15.
#   Sample: 160 children between the ages of 5 and 15.
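#b) The results cannot be generalized to the population, because the 160 children were not a random sample of children aged 5 to 15. A causal relationship can only be established if the children were randomly assigned to the instruction groups; without random assignment, the study shows an association only.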
#1.28 Reading the paper.
#a) The article reports a strong association between smoking and dementia: people who smoked more were more likely to develop dementia (Alzheimer's disease or vascular dementia). However, this is an observational study, so it does not prove that smoking causes dementia. It shows only an association, and other (confounding) variables could explain the relationship.
#b) The claim that sleep disorders lead to bullying is not justified. This is an observational study, so a causal relationship between sleep disorders and bullying cannot be established; the study shows only an association between the two variables.
#1.36 Exercise and mental health.

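#a) This is an experiment.
#b) Treatment group: people assigned to exercise twice a week.
#   Control group: people instructed not to exercise.
#c) Yes. Exercise is the treatment (explanatory) variable, not a blocking variable. The blocking variable is the one used to form the strata, age group: half of each age group is randomly assigned to exercise.
#d) No. The study does not use blinding; subjects know whether or not they are exercising.
#e) Yes. Because subjects are randomly assigned to exercise or not, the study can establish a causal relationship between exercise and mental health. Because the subjects are selected by stratified random sampling, the results can be generalized to the population.
#f) The study could also be stratified by gender, and the exercise groups could be further divided into (1) exercise five times a week, (2) exercise 2-3 times a week and (3) no exercise, to make the results more general and the outcomes clearer.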
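Random assignment within strata, as used in this design, can be sketched in a few lines of R. This is a minimal illustration only: the 12 subjects and the strata labelled "A", "B", "C" are hypothetical placeholders, not data from the exercise.

```r
# Sketch of random assignment within strata (blocks).
# The 12 subjects and the strata "A", "B", "C" are hypothetical placeholders.
set.seed(1)
subjects <- data.frame(
  id      = 1:12,
  stratum = rep(c("A", "B", "C"), each = 4)
)
subjects$group <- NA_character_

# Within each stratum, randomly give half the subjects the treatment
for (s in unique(subjects$stratum)) {
  rows   <- which(subjects$stratum == s)
  labels <- rep(c("exercise", "control"), length.out = length(rows))
  subjects$group[rows] <- sample(labels)  # random permutation of the labels
}

# Each stratum ends up with equal-sized exercise and control groups
table(subjects$stratum, subjects$group)
```

Randomizing separately inside each stratum guarantees that every stratum contributes equally to both groups, which a single shuffle of all 12 subjects would not.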

#1.48
# Scores of the 20 students
statScores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
statScores
##  [1] 57 66 69 71 72 73 74 77 78 78 79 79 81 81 82 83 83 88 89 94
# Box plot of the scores
boxplot(statScores)

# Five-number summary (plus the mean)
summary(statScores)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   57.00   72.75   78.50   77.70   82.25   94.00
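The box plot does not use the quartiles printed by summary(). boxplot() calls boxplot.stats(), which takes Tukey's hinges from fivenum() and flags points more than 1.5 times the hinge spread beyond the hinges. The sketch below reproduces that rule by hand for statScores and compares it with boxplot.stats().

```r
statScores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# boxplot() draws its box from Tukey's hinges (fivenum), not from quantile()
hinges <- fivenum(statScores)    # min, lower hinge, median, upper hinge, max
spread <- hinges[4] - hinges[2]  # hinge spread (10 here)

# Points beyond 1.5 * spread from the hinges are drawn individually as outliers
lower_fence <- hinges[2] - 1.5 * spread  # 57.5
upper_fence <- hinges[4] + 1.5 * spread  # 97.5
statScores[statScores < lower_fence | statScores > upper_fence]  # 57

# boxplot.stats() applies the same rule (coef = 1.5 by default);
# the lower whisker then stops at the smallest non-outlier, 66
boxplot.stats(statScores)$out
boxplot.stats(statScores)$stats
```

summary() reports 72.75 and 82.25 because quantile() interpolates differently (type 7 by default); the hinges used by the box plot are 72.5 and 82.5. With either pair of values, the score of 57 falls below the lower fence.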
#1.50 Mix-and-match.
#a) The data in the first histogram are symmetric and bimodal, and match box plot (1).
#b) The data in the second histogram are right skewed and multimodal, and match box plot (2).
#c) The data in the third histogram are right skewed and unimodal, and match box plot (3).
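One way to check a histogram/box plot match like these is to compare the two halves of the box: in a right-skewed sample, the distance from the median to Q3 is typically longer than the distance from Q1 to the median. The samples below are simulated and hypothetical, not the data behind the exercise's histograms.

```r
# Simulated (hypothetical) samples, not the data behind the exercise's histograms
set.seed(42)
right_skewed <- rexp(1000, rate = 1)  # long right tail
symmetric    <- rnorm(1000)           # roughly symmetric

# Length of the lower and upper halves of the box (Q1 to median, median to Q3)
box_halves <- function(x) {
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  c(lower = q[2] - q[1], upper = q[3] - q[2])
}

box_halves(right_skewed)  # upper half is noticeably longer
box_halves(symmetric)     # the two halves are about equal
```

Passing either vector to hist() and boxplot() shows the same contrast visually: the long right tail of the histogram appears as a stretched upper half of the box and a long upper whisker.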


#1.56 Distributions and appropriate statistics, Part II
#a) The data are right skewed, so the median is the best measure of center and the IQR the best measure of variability.
#b) The data are symmetric: Q1 = 300,000, the median is 600,000 and Q3 = 900,000, so the quartiles are equally spaced around the median. The outlier limits confirm there are few extreme values: IQR = 900,000 - 300,000 = 600,000, so the upper limit is Q3 + 1.5*IQR = 1,800,000 and the lower limit is Q1 - 1.5*IQR = -600,000. Very few houses cost more than 1,200,000, so almost all of the data lie well within these limits. For a symmetric distribution, the mean and standard deviation describe center and variability well (the median and IQR give similar information).
#c) The data are right skewed, so the median should be used for center and the IQR for variability.
#d) The data are right skewed: salaries cannot be negative, and a few very high salaries stretch the right tail. The median and IQR should be used, because the mean and SD are pulled up by the few very high values.
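The outlier limits in part (b) are simple arithmetic on the quartiles. The sketch below uses Q1 = 300,000 as stated in the answer and assumes Q3 = 900,000, the value implied by the stated upper limit of 1,800,000.

```r
# Quartiles for part (b): Q1 is stated in the answer; Q3 = 900,000 is the
# value implied by the stated upper limit of 1,800,000 (an assumption here)
q1  <- 300000
q3  <- 900000
iqr <- q3 - q1                  # 600,000

upper_limit <- q3 + 1.5 * iqr   # 900,000 + 900,000
lower_limit <- q1 - 1.5 * iqr   # 300,000 - 900,000

format(c(lower = lower_limit, upper = upper_limit), big.mark = ",", scientific = FALSE)
```

Since house prices cannot be negative, the lower limit of -600,000 flags nothing; only prices above 1,800,000 would count as outliers.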

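Why right skew favors the median and IQR can be seen with a tiny made-up example. The ten salaries below are hypothetical (nine ordinary employees and one executive), chosen only to show how a single large value moves the mean but barely moves the median.

```r
# Hypothetical salaries: nine ordinary employees and one executive
salaries <- c(40000, 42000, 45000, 47000, 50000, 52000, 55000, 58000, 60000, 900000)

mean(salaries)    # 134900, pulled up by the single large value
median(salaries)  # 51000, stays near the typical salary
sd(salaries)      # also inflated by the executive's salary
IQR(salaries)     # depends only on the middle half of the data

# Dropping the executive moves the mean a lot and the median very little
without_exec <- salaries[-length(salaries)]
mean(without_exec)
median(without_exec)  # 50000
```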

#1.70 Heart transplants.
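#a) Patients who received a transplant (treatment group) survived at a higher rate than patients in the control group, so survival does not appear to be independent of treatment.
#b) The box plots suggest that the treatment is effective: patients who got a transplant had better chances of survival.
#c) Treatment group: 45/69 ≈ 0.652 died. Control group: 30/34 ≈ 0.882 died.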

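#d) i) H0: the transplant has no effect on survival; survival is independent of treatment, and any observed difference is due to chance.
#      HA: the transplant is effective; treatment patients are less likely to die than control patients.
#   ii) "alive" cards: 28; "dead" cards: 75; treatment group size: 69; control group size: 34; the simulated distribution is centered at 0; count the simulated differences (treatment - control) at or below the observed difference of 0.652 - 0.882 ≈ -0.23.
#   iii) The simulated differences are centered at 0 and show what chance alone would produce. The shape (skew) of this distribution does not measure effectiveness; what matters is how rare the observed difference of about -0.23 is. If only a small fraction of simulations are at or below -0.23, the result is unlikely to be due to chance, and the data provide evidence that the transplant is effective.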
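The card-shuffling procedure in (d) ii can be run directly in base R. This sketch uses the counts from the heartTr data (30 of 34 control patients and 45 of 69 treatment patients died, so 75 dead and 28 alive out of 103); the seed and the 1,000 repetitions are arbitrary choices.

```r
# Counts from heartTr: 30 of 34 control and 45 of 69 treatment patients died
set.seed(2024)
n_treat <- 69
n_ctrl  <- 34
cards   <- c(rep("dead", 75), rep("alive", 28))

# Observed difference in death proportions (treatment - control)
obs_diff <- 45 / n_treat - 30 / n_ctrl

# One shuffle: deal the cards into a treatment pile and a control pile
simulate_diff <- function() {
  shuffled <- sample(cards)
  treat <- shuffled[1:n_treat]
  ctrl  <- shuffled[(n_treat + 1):(n_treat + n_ctrl)]
  mean(treat == "dead") - mean(ctrl == "dead")
}

sim_diffs <- replicate(1000, simulate_diff())

# Fraction of shuffles at least as low as the observed difference;
# the small tolerance guards against floating-point ties
p_value <- mean(sim_diffs <= obs_diff + 1e-9)
obs_diff
mean(sim_diffs)  # close to 0, as H0 predicts
p_value
```

hist(sim_diffs) draws the simulated distribution; the observed difference of about -0.23 can be marked on it with abline(v = obs_diff).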



library(openintro) 
data(heartTr)

# Total number of patients in the study
nrow(heartTr)
## [1] 103

# Split the patients into the treatment and control groups
treatmentgrp <- subset(heartTr, transplant == "treatment")
controlgrp   <- subset(heartTr, transplant == "control")
nrow(treatmentgrp)
## [1] 69
nrow(controlgrp)
## [1] 34

# Patients in each group who died
controlDied   <- subset(controlgrp, survived == "dead")
treatmentDied <- subset(treatmentgrp, survived == "dead")
nrow(controlDied)
## [1] 30
nrow(treatmentDied)
## [1] 45

# Proportion of each group that died: divide by the group size, not by all 103 patients
# Control group
nrow(controlDied) / nrow(controlgrp)
## [1] 0.8823529
# Treatment group
nrow(treatmentDied) / nrow(treatmentgrp)
## [1] 0.6521739