This is Homework #1 for the DATA 606 course. It solves the following exercises from the OpenIntro Statistics book using R: 1.8, 1.10, 1.28, 1.36, 1.48, 1.50, 1.56, and 1.70.
The openintro library is loaded, and the heartTr data is used in this R Markdown document.
Each row of the smoking data set represents one UK resident and records gender, age, marital status, income, whether or not they smoke, and how much they smoke on weekends and weekdays.
df <- read.csv(url('https://raw.githubusercontent.com/vepark/Datasets/mushroom/SmokingData.csv'))
nrow(df)
## [1] 1693
#Tried a few things below to come to conclusion about the variable type
names(df)
## [1] "Sex" "Age"
## [3] "Marital.Status" "Highest.Qualification"
## [5] "Nationality" "Ethnicity"
## [7] "Gross.Income" "Region"
## [9] "Smoke." "Amount.Weekends"
## [11] "Amount.Weekdays" "Type"
## [13] "X" "X.1"
typeof(df)
## [1] "list"
eapply(.GlobalEnv,typeof)
## $df
## [1] "list"
sapply(df, class)
## Sex Age Marital.Status
## "factor" "integer" "factor"
## Highest.Qualification Nationality Ethnicity
## "factor" "factor" "factor"
## Gross.Income Region Smoke.
## "factor" "factor" "factor"
## Amount.Weekends Amount.Weekdays Type
## "factor" "factor" "factor"
## X X.1
## "logical" "logical"
Categorical variables: 1) Sex 2) Marital.Status 3) Highest.Qualification (Ordinal) 4) Nationality 5) Ethnicity 6) Region 7) Smoke. 8) Type
Numerical variables: 1) Age (Discrete) 2) Gross.Income (Continuous) 3) Amount.Weekends (Discrete) 4) Amount.Weekdays (Discrete)
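The class check above can be turned into a quick split of columns by storage type. Here is a minimal sketch on a small hypothetical data frame (`toy` and its values are made up for illustration and are not rows from the smoking file). The storage class is only a hint, so the final classification still needs judgment about what each column measures.

```r
# Hypothetical stand-in for the smoking data (values invented for illustration)
toy <- data.frame(
  Sex = factor(c("Male", "Female", "Female")),
  Age = c(38L, 42L, 40L),
  Smoke. = factor(c("No", "Yes", "No"))
)

# Flag columns stored as numbers; everything else is treated as categorical
is_num <- sapply(toy, is.numeric)
numerical_vars <- names(toy)[is_num]
categorical_vars <- names(toy)[!is_num]

print(numerical_vars)
print(categorical_vars)
```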
Population of interest = children between 5 and 15 years old. Sample size = 160 children.
No, the results of this study can’t be generalized because of the small sample size. Association does not always mean causation.
No, the study is observational, and all participants volunteered, which introduces bias. An experimental study is needed to conclude that smoking causes dementia.
No, the statement is not justified. Sleep disorders might have some negative effect on children, but they are not the only cause of bullying behavior in children.
This is a designed experimental study.
Treatment group = people who exercise twice a week. Control group = people who do not exercise.
Yes, this study uses blocking. Age group is the blocking variable.
No, this study is not a blind study since the test subjects know what kind of treatment they are receiving.
Yes, this experimental study can be used to establish a causal relationship since the design of the experiment follows the sample-population concepts.
I would be reluctant to fund this project as it stands, since several other factors will influence both the control and treatment groups, which might call for a larger sample and many replications.
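The blocking described above, with age group as the blocking variable, amounts to randomizing treatment separately inside each block. A minimal sketch, assuming twelve hypothetical subjects and made-up age-group labels (neither comes from the exercise data):

```r
set.seed(1)

# Hypothetical subjects: four per age block (labels are illustrative only)
subjects <- data.frame(
  id = 1:12,
  age_group = rep(c("18-30", "31-40", "41-55"), each = 4),
  stringsAsFactors = FALSE
)

# Within each block, shuffle an equal number of treatment and control labels
subjects$group <- NA_character_
for (block in unique(subjects$age_group)) {
  rows <- which(subjects$age_group == block)
  labels <- rep(c("treatment", "control"), length.out = length(rows))
  subjects$group[rows] <- sample(labels)
}

# Each age group ends up with the same number of treated and control subjects
assignment <- table(subjects$age_group, subjects$group)
print(assignment)
```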
statScores <- c(57,66,69,71,72,73,74,77,78,78,79,79,81,81,82,83,83,88,89,94)
boxplot(statScores)
This is right skewed since many houses are priced over 6 million. Median and IQR would be the appropriate statistics.
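For the statScores box plot above, the values the plot draws can also be checked numerically. A small sketch using base R's fivenum() and boxplot.stats(); the vector is repeated so the snippet runs on its own:

```r
statScores <- c(57,66,69,71,72,73,74,77,78,78,79,79,81,81,82,83,83,88,89,94)

# Five-number summary used by boxplot(): min, lower hinge, median, upper hinge, max
five <- fivenum(statScores)
names(five) <- c("min", "Q1", "median", "Q3", "max")
print(five)   # 57, 72.5, 78.5, 82.5, 94

# Points farther than 1.5 times the hinge spread from the box are drawn as outliers
outliers <- boxplot.stats(statScores)$out
print(outliers)
```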
This must be a symmetrical distribution. Mean and standard deviation would provide a good measurement of the center and spread.
This would be right skewed because most students don’t drink, with the exception of a few. Median and IQR would be good measurements.
This might be symmetric with slight right skew. Mean and median could be used.
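The reasoning behind choosing median and IQR for skewed data can be illustrated with a few numbers. A minimal sketch, assuming a hypothetical set of house prices (in thousands) with one very expensive home; the values are invented and are not from the exercise:

```r
# Hypothetical prices in thousands of dollars; the last value is an extreme high price
prices <- c(300, 350, 400, 450, 700, 1000, 6500)

# The mean is pulled toward the extreme value, the median is not
center <- c(mean = mean(prices), median = median(prices))

# The standard deviation is inflated by the extreme value, the IQR is not
spread <- c(sd = sd(prices), IQR = IQR(prices))

print(center)
print(spread)
```

With a symmetric distribution the two measures of center would roughly agree, which is why mean and standard deviation are reasonable there.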
hTr <- read.csv(url("https://raw.githubusercontent.com/vepark/Datasets/mushroom/heart_transplant.csv"))
mosaicplot(table(hTr$transplant,hTr$survived))
boxplot(hTr$survtime ~ hTr$transplant)
(b) The boxplot shows that treated subjects survived longer than the control group, so the treatment seems effective.
# percent control group died
(nrow(subset(hTr,transplant=="control" & survived=="dead")) / nrow(subset(hTr,transplant=="control")) ) * 100
## [1] 88.23529
# percent treatment group died
(nrow(subset(hTr,transplant=="treatment" & survived=="dead")) / nrow(subset(hTr,transplant=="treatment")) ) * 100
## [1] 65.21739
The proportion of patients who died was about 23 percentage points higher in the control group than in the treatment group.
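The two percentages can also be obtained in one step with a row-wise proportion table. A minimal sketch that avoids the download by rebuilding the data from counts; the counts (30 of 34 control and 45 of 69 treatment patients dead) are an assumption inferred from the percentages printed above, not read from the file:

```r
# Assumed counts, inferred from 88.23529% and 65.21739% (not read from the CSV)
heart <- data.frame(
  transplant = rep(c("control", "treatment"), times = c(34, 69)),
  survived = c(rep("dead", 30), rep("alive", 4),
               rep("dead", 45), rep("alive", 24))
)

# Row-wise proportions: share of each group that is alive or dead
props <- prop.table(table(heart$transplant, heart$survived), margin = 1)
print(round(100 * props, 2))

# Difference in death rates, in percentage points
diff_pct <- 100 * (props["control", "dead"] - props["treatment", "dead"])
print(diff_pct)
```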
(d)-i The experiment tests the hypothesis that transplant treatment helps patients survive longer than no transplant. (d)-ii (d)-iii
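One common way to examine a hypothesis like the one in (d)-i is a randomization simulation: shuffle the outcomes, deal them back into groups of the original sizes, and see how often chance alone produces a difference as large as the one observed. A rough sketch under stated assumptions: the group sizes and death counts (34 control with 30 dead, 69 treatment with 45 dead) are inferred from the percentages computed earlier, and the seed and the 1000 repetitions are arbitrary choices.

```r
set.seed(606)

# Assumed counts, inferred from the percentages above: 75 dead and 28 alive in total
outcomes <- rep(c("dead", "alive"), times = c(75, 28))
observed_diff <- 30 / 34 - 45 / 69

# Shuffle outcomes, split into 34 "control" and 69 "treatment", record the difference
sim_diffs <- replicate(1000, {
  shuffled <- sample(outcomes)
  control <- shuffled[1:34]
  treatment <- shuffled[35:103]
  mean(control == "dead") - mean(treatment == "dead")
})

# Share of shuffles with a difference at least as large as observed
# (small tolerance guards against floating-point ties)
p_value <- mean(sim_diffs >= observed_diff - 1e-12)
print(p_value)
```

A small share suggests the observed difference would be unusual if the transplant had no effect on survival.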