A survey was conducted to study the smoking habits of UK residents. Below is a data matrix displaying a portion of the data collected in this survey. Note that “£” stands for British pounds sterling, “cig” stands for cigarettes, and “N/A” refers to a missing component of the data.
if (!require("openintro")) { install.packages("openintro"); library(openintro) }
## Loading required package: openintro
## Please visit openintro.org for free statistics materials
##
## Attaching package: 'openintro'
## The following objects are masked from 'package:datasets':
##
## cars, trees
#data(smoking)
#str(smoking)
#head(smoking)
#summary(smoking)
#dim(smoking)
#summary(smoking[,"age"])
#smoking
summary(smoking)
## gender age maritalStatus highestQualification
## Female:965 Min. :16.00 Divorced :161 No Qualification :586
## Male :726 1st Qu.:34.00 Married :812 GCSE/O Level :308
## Median :48.00 Separated: 68 Degree :262
## Mean :49.84 Single :427 Other/Sub Degree :127
## 3rd Qu.:65.50 Widowed :223 Higher/Sub Degree:125
## Max. :97.00 A Levels :105
## (Other) :178
## nationality ethnicity grossIncome
## English :833 Asian : 41 5,200 to 10,400 :396
## British :538 Black : 34 10,400 to 15,600:268
## Scottish:142 Chinese: 27 2,600 to 5,200 :257
## Other : 71 Mixed : 14 15,600 to 20,800:188
## Welsh : 66 Refused: 13 20,800 to 28,600:155
## Irish : 23 Unknown: 2 Under 2,600 :133
## (Other) : 18 White :1560 (Other) :294
## region smoke amtWeekends amtWeekdays
## London :182 No :1270 Min. : 0.00 Min. : 0.00
## Midlands & East Anglia:443 Yes: 421 1st Qu.:10.00 1st Qu.: 7.00
## Scotland :148 Median :15.00 Median :12.00
## South East :252 Mean :16.41 Mean :13.75
## South West :157 3rd Qu.:20.00 3rd Qu.:20.00
## The North :426 Max. :60.00 Max. :55.00
## Wales : 83 NA's :1270 NA's :1270
## type
## :1270
## Both/Mainly Hand-Rolled: 10
## Both/Mainly Packets : 42
## Hand-Rolled : 72
## Packets : 297
##
##
Each row represents one UK resident who took part in the survey. The data set's dimensions and the variables recorded for each participant are given below.
dim(smoking)
## [1] 1691 12
gender is a nominal categorical variable
age is a discrete numerical variable
maritalStatus is a nominal categorical variable
highestQualification is an ordinal categorical variable
nationality is a nominal categorical variable
ethnicity is a nominal categorical variable
grossIncome is an ordinal categorical variable
region is a nominal categorical variable
smoke is a nominal categorical variable
amtWeekends is a discrete numerical variable (number of cigarettes smoked per day on weekends)
amtWeekdays is a discrete numerical variable (number of cigarettes smoked per day on weekdays)
type is a nominal categorical variable
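One way to cross-check a classification like the one above is to look at how R stores each column. The sketch below uses a small made-up data frame (`toy` is hypothetical and only mimics a few columns of the survey, it is not the real `smoking` data) and a helper rule that is an assumption of this example, not part of the exercise.

```r
# Hypothetical mini data frame; values are invented for illustration only.
toy <- data.frame(
  gender = factor(c("Female", "Male", "Female")),
  age = c(38L, 42L, 40L),
  grossIncome = factor(
    c("2,600 to 5,200", "Under 2,600", "5,200 to 10,400"),
    levels = c("Under 2,600", "2,600 to 5,200", "5,200 to 10,400"),
    ordered = TRUE
  ),
  amtWeekends = c(NA, 12, NA)
)

# Map R's storage type to a rough statistical variable type.
col_kind <- sapply(toy, function(col) {
  if (is.ordered(col)) {
    "ordinal categorical"
  } else if (is.factor(col)) {
    "nominal categorical"
  } else if (is.numeric(col)) {
    "numerical"
  } else {
    "other"
  }
})
print(col_kind)
```

R can only report a factor as ordered if it was created that way, so deciding between nominal and ordinal still depends on what the categories mean.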
Exercise 1.5 introduces a study where researchers studying the relationship between honesty, age, and self-control conducted an experiment on 160 children between the ages of 5 and 15. The researchers asked each child to toss a fair coin in private and to record the outcome (white or black) on a paper sheet, and said they would only reward children who report white. Half the students were explicitly told not to cheat and the others were not given any explicit instructions. Differences were observed in the cheating rates in the instruction and no instruction groups, as well as some differences across children’s characteristics within each group.
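Sample: the 160 children in the study. Population: all children between the ages of 5 and 15.
The outcome of the experiment is hard to generalize to the population for the following reasons.
1- The sample size is small,
2- The sample may not be spread across the population in a representative way, since nothing indicates the children were selected at random,
3- Not all explanatory variables were monitored or identified, so other factors may have influenced the response variable.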
FinScor <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
summary(FinScor)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 57.00 72.75 78.50 77.70 82.25 94.00
boxplot(FinScor)
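The box plot can be checked against the summary by working out the interquartile range and the 1.5 × IQR fences by hand. This is a minimal sketch using R's default `quantile()` definition; `boxplot()` itself uses hinges, which can differ slightly from these quartiles.

```r
FinScor <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78,
             79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# Quartiles using R's default method (type 7), matching summary(FinScor).
q <- quantile(FinScor, c(0.25, 0.75))
iqr <- unname(q[2] - q[1])

# Points beyond 1.5 * IQR from the quartiles are flagged as potential outliers.
lower_fence <- unname(q[1]) - 1.5 * iqr
upper_fence <- unname(q[2]) + 1.5 * iqr
outliers <- FinScor[FinScor < lower_fence | FinScor > upper_fence]

print(c(IQR = iqr, lower = lower_fence, upper = upper_fence))
print(outliers)
```

With these scores the IQR is 9.5 and the fences are 58.5 and 96.5, so the lowest score, 57, is the only value flagged as a potential outlier.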
a. The matching box plot is number 2. The distribution is unimodal (one peak), and the histogram is symmetric.
b. The matching box plot is number 3. The distribution is multimodal (several peaks), and the histogram is symmetric.
c. The matching box plot is number 1. The distribution is unimodal (one peak), and the histogram is right skewed.
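# stringsAsFactors = TRUE keeps text columns as factors (the default changed in R 4.0), so summary() shows category counts
heatTP <- read.csv("https://raw.githubusercontent.com/jbryer/DATA606Fall2016/master/Data/Data%20from%20openintro.org/Ch%201%20Exercise%20Data/heartTr.csv", stringsAsFactors = TRUE)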
head(heatTP)
## id acceptyear age survived survtime prior transplant wait
## 1 15 68 53 dead 1 no control NA
## 2 43 70 43 dead 2 no control NA
## 3 61 71 52 dead 2 no control NA
## 4 75 72 52 dead 2 no control NA
## 5 6 68 54 dead 3 no control NA
## 6 42 70 36 dead 3 no control NA
summary(heatTP)
## id acceptyear age survived
## Min. : 1.0 Min. :67.00 Min. : 8.00 alive:28
## 1st Qu.: 26.5 1st Qu.:69.00 1st Qu.:41.00 dead :75
## Median : 49.0 Median :71.00 Median :47.00
## Mean : 51.4 Mean :70.62 Mean :44.64
## 3rd Qu.: 77.5 3rd Qu.:72.00 3rd Qu.:52.00
## Max. :103.0 Max. :74.00 Max. :64.00
##
## survtime prior transplant wait
## Min. : 1.0 no :91 control :34 Min. : 1.00
## 1st Qu.: 33.5 yes:12 treatment:69 1st Qu.: 10.00
## Median : 90.0 Median : 26.00
## Mean : 310.2 Mean : 38.42
## 3rd Qu.: 412.0 3rd Qu.: 46.00
## Max. :1799.0 Max. :310.00
## NA's :34
mosaicplot(table(heatTP$transplant,heatTP$survived))
#### (b) What do the box plots suggest about the efficacy (effectiveness) of the heart transplant treatment?
# Format a proportion as a percentage string with a fixed number of decimals
percent <- function(x, digits = 2, format = "f", ...) {
  paste0(formatC(100 * x, format = format, digits = digits, ...), "%")
}
#-----------------------------
ded_por <- (75/(34+69))
percent(ded_por)
## [1] "72.82%"
Tret_ded_por <- (45/(69))
percent(Tret_ded_por)
## [1] "65.22%"
Cont_ded_por <- (30/(34))
percent(Cont_ded_por)
## [1] "88.24%"
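The same proportions can be obtained from a two-way table instead of typing each fraction by hand, which makes it harder to mistype a group size. This is a sketch that rebuilds the counts from the figures used above (45 of 69 dead in treatment, 30 of 34 dead in control); the alive counts, 24 and 4, are derived by subtraction, and the name `counts` is introduced here for illustration.

```r
# Two-way table of counts: rows are groups, columns are outcomes.
counts <- matrix(
  c(4, 30,    # control:   alive, dead
    24, 45),  # treatment: alive, dead
  nrow = 2, byrow = TRUE,
  dimnames = list(transplant = c("control", "treatment"),
                  survived = c("alive", "dead"))
)

# margin = 1 divides each row by its row total, giving within-group proportions.
row_prop <- prop.table(counts, margin = 1)
print(round(100 * row_prop, 2))

# Overall proportion of patients who died, across both groups.
overall_dead <- sum(counts[, "dead"]) / sum(counts)
print(round(100 * overall_dead, 2))
```

With the real data loaded, `prop.table(table(heatTP$transplant, heatTP$survived), margin = 1)` should give the same row proportions, assuming the file matches the summary shown earlier.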
The claim being tested is whether the transplant is successful or not, that is, whether it improves patients' survival.
Fill in the blanks with a number or phrase, whichever is appropriate. We write alive on 28 cards representing patients who were alive at the end of the study, and dead on 75 cards representing patients who were not. Then, we shuffle these cards and split them into two groups: one group of size 69 representing treatment, and another group of size 34 representing control. We calculate the difference between the proportion of dead cards in the treatment and control groups (treatment - control) and record this value. We repeat this 100 times to build a distribution centered at 0. Lastly, we calculate the fraction of simulations where the simulated differences in proportions are at least as extreme as the observed difference (45/69 - 30/34, about -0.23). If this fraction is low, we conclude that it is unlikely to have observed such an outcome by chance and that the null hypothesis should be rejected in favor of the alternative.
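The card-shuffling procedure described above can be sketched in R as follows. The seed and the variable names are arbitrary choices for this example, and the counts (28 alive, 75 dead, groups of 69 and 34, 45 and 30 deaths) are the ones used earlier in this write-up.

```r
set.seed(606)  # arbitrary seed so the shuffles are reproducible

cards <- c(rep("alive", 28), rep("dead", 75))  # one card per patient
n_treat <- 69                                  # size of the treatment group
n_sims <- 100                                  # number of shuffles

# Observed difference in proportion dead (treatment - control).
observed_diff <- 45 / 69 - 30 / 34

sim_diffs <- replicate(n_sims, {
  shuffled <- sample(cards)                    # shuffle the cards
  treat <- shuffled[1:n_treat]                 # first 69 cards: treatment
  control <- shuffled[(n_treat + 1):length(shuffled)]  # remaining 34: control
  mean(treat == "dead") - mean(control == "dead")
})

# Fraction of shuffles at least as extreme as the observed difference
# (the observed value is negative, so "extreme" means at or below it).
frac_extreme <- mean(sim_diffs <= observed_diff)

print(mean(sim_diffs))
print(frac_extreme)
```

A small `frac_extreme` means a difference as large as the observed one rarely arises from shuffling alone. Using more than 100 shuffles would give a smoother estimate of this fraction.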
iii. What do the simulation results shown below suggest about the effectiveness of the transplant program?