Load data
data("iris")
1.8 a) Each row of the data matrix represents one resident and records their sex, age, marital status, gross income, whether they smoke, and how many cigarettes they smoke on weekends and on weekdays.
b) 1691 people participated in this survey.
c) The sex variable is categorical but not ordinal. Age is a numerical variable. Marital is categorical but not ordinal. grossIncome is categorical and ordinal. Smoke is categorical but not ordinal. amtWeekends and amtWeekdays are both numerical (discrete), since they are counts of cigarettes.
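As a small illustration of how these variable types can be represented in R, the sketch below builds a tiny data frame. The rows and the income labels ("low", "middle", "high") are hypothetical and only stand in for the real survey values.

```r
# Hypothetical rows, invented only to illustrate the variable types
survey <- data.frame(
  # Categorical, not ordinal: a plain factor
  sex = factor(c("Female", "Male", "Female")),
  # Numerical
  age = c(42, 25, 61),
  # Categorical and ordinal: an ordered factor with explicit level order
  grossIncome = factor(c("low", "high", "middle"),
                       levels = c("low", "middle", "high"),
                       ordered = TRUE)
)

# str() shows the storage type of each column
str(survey)
```

An ordered factor supports comparisons such as `survey$grossIncome < "high"`, which a plain factor does not.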
1.10 a) The population of interest is all children between the ages of 5 and 15. The sample in this study is the 160 children within this age range who took part.
1.28 a) Based on this study, I think we can conclude that smoking causes dementia later in life. I believe there are other factors that are very important in this study, such as family history of this type of disease. If we only look at the 25% group with dementia 23 years later, I would want to know the family history of this group versus the family history of the other 75%. However, the additional findings compare the change in risk based on the number of packs smoked per day, and they show a 7% jump between those who smoked a pack a day and those who smoked up to 2 packs a day. We also know that the risk doubled when the person exceeded 2 packs a day. This increase based on the amount smoked per day is why I feel we can conclude that smoking causes dementia.
1.36 a) This study is an experiment because the researchers assign the participants to exercise or not to exercise, rather than only observing them.
b) The control group in this study is the group that has been instructed not to exercise, and the treatment group is the group that has been instructed to exercise.
c) Yes, this study uses blocking. Age range is the blocking variable.
d) I do not think this study makes use of blinding, because it might be easy for the participants to figure out what the researchers are looking for. Even if not much information is given to the participant, if I were told to exercise or not to exercise, I would probably attempt to guess what the researchers are looking for.
e) I think it is possible to use this study to establish a relationship between exercise and mental health because there is a control group and a treatment group, and both groups will have a mental health exam before and after the study. However, I do not think the results can be generalized to the population at large until more data points are added to this study. I would want to track anything that might be a stressor for the participant before, during and after the experiment. Someone might begin the study experiencing positive effects, but then something completely unrelated to the study might happen and distort the outcome.
f) To piggyback off my response to the previous question, I would want to add more data points to this study. I also think that this type of study has been done so many times that I would question the point of the study. While the study of mental health is important, what sets this study apart from similar studies in the past?
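The blocking by age range described above can be sketched in R as follows. This is only a minimal illustration: the participant IDs, the age-group labels, and the group sizes are assumptions, not values taken from the exercise.

```r
set.seed(42)  # make the random assignment reproducible

# Hypothetical participants: 4 people in each assumed age block
participants <- data.frame(
  id = 1:12,
  age_group = rep(c("18-30", "31-40", "41-55"), each = 4)
)

# Randomize separately inside each block so every age group
# is split evenly between treatment and control
participants$group <- NA_character_
for (block in unique(participants$age_group)) {
  rows <- which(participants$age_group == block)
  labels <- rep(c("treatment", "control"), length.out = length(rows))
  participants$group[rows] <- sample(labels)
}

# Each row (age block) should show 2 control and 2 treatment
assignment_table <- table(participants$age_group, participants$group)
assignment_table
```

Randomizing within each block keeps age balanced across the two groups, so a difference in mental health scores is less likely to be explained by age.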
1.48
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)
summary(scores)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 57.00 72.75 78.50 77.70 82.25 94.00
boxplot(scores)
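As a quick check on what the boxplot shows, the sketch below applies the usual 1.5 x IQR rule to the same scores, using the quartiles reported by `summary()`. The variable names `lower_fence`, `upper_fence`, and `outliers` are introduced here only for illustration.

```r
scores <- c(57, 66, 69, 71, 72, 73, 74, 77, 78, 78, 79, 79, 81, 81, 82, 83, 83, 88, 89, 94)

# First and third quartiles (same values as summary(scores))
q <- quantile(scores, c(0.25, 0.75))

# Fences sit 1.5 * IQR beyond the quartiles
fence_width <- 1.5 * IQR(scores)
lower_fence <- q[[1]] - fence_width
upper_fence <- q[[2]] + fence_width

# Scores outside the fences are flagged as potential outliers
outliers <- scores[scores < lower_fence | scores > upper_fence]
outliers
```

Note that `boxplot()` computes its box from hinges (`fivenum()`), which can differ slightly from the `quantile()` values; for these scores both approaches flag the same point.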
1.50 Histogram A matches boxplot 2. Histogram B matches boxplot 3. Histogram C matches boxplot 1.
1.56 a) This data is right skewed, since most of the houses are below $1,000,000 and only a few are $6,000,000. The median would be more meaningful than the mean. The homes that are $6,000,000 pull the mean upward, even though 75% of the data is below $1,000,000.
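# Part a): cumulative percentage of homes priced below each value
percentage <- c(25, 50, 75)
home_Price <- c(350000, 450000, 1000000)
homes <- data.frame(percentage, home_Price)
# Label each bar with its price so the x-axis is readable
barplot(homes$percentage, names.arg = format(homes$home_Price, big.mark = ",", scientific = FALSE, trim = TRUE), ylab = "Percentage", xlab = "Home Price")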
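To see why a handful of very expensive homes makes the median the safer summary, the sketch below compares the mean and median of a small set of prices. The individual prices are hypothetical, chosen only so that most fall below $1,000,000 and one sits at $6,000,000.

```r
# Hypothetical home prices: most are below $1,000,000, one is $6,000,000
prices <- c(300000, 350000, 400000, 450000, 700000, 1000000, 6000000)

mean_price <- mean(prices)      # pulled toward the single expensive home
median_price <- median(prices)  # middle value, unaffected by the extreme

c(mean = mean_price, median = median_price)
```

Here the mean is well above $1,000,000 even though only one of the seven homes costs that much or more, while the median stays at $450,000.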
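# Part b): cumulative percentage of homes priced below each value
b.percentage <- c(25, 50, 75)
b.home_Price <- c(300000, 600000, 900000)
b.homes <- data.frame(b.percentage, b.home_Price)
# Label each bar with its price so the x-axis is readable
barplot(b.homes$b.percentage, names.arg = format(b.homes$b.home_Price, big.mark = ",", scientific = FALSE, trim = TRUE), ylab = "Percentage", xlab = "Home Price")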
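One simple way to compare the shapes of the two price distributions is to look at the distance from the median to each quartile. The helper `quartile_gaps()` below is a hypothetical name introduced for this sketch; it uses only the quartile values already entered above.

```r
# Distance from the median down to Q1 and up to Q3
quartile_gaps <- function(q1, med, q3) {
  c(lower = med - q1, upper = q3 - med)
}

# First set of prices: 350,000 / 450,000 / 1,000,000
gaps_a <- quartile_gaps(350000, 450000, 1000000)

# Second set of prices: 300,000 / 600,000 / 900,000
gaps_b <- quartile_gaps(300000, 600000, 900000)

rbind(gaps_a, gaps_b)
```

In the first set the upper gap (550,000) is much larger than the lower gap (100,000), which points to a longer tail toward high prices. In the second set both gaps are 300,000, which is what a symmetric distribution would produce.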
c) This data will probably be right skewed, since the data will come mainly from students who are 21 or older. I would break the population of students into two groups: one below the legal drinking age and one above. I would then take the average of the second group.
d) This data might be symmetric if most people earn around the same amount of money. I would also break up this data by position and find the average of each group so that each number better represents its group.
1.70 a) Based on the mosaic plot, I believe that survival is somewhat dependent on whether or not a patient got a transplant. Medical data is tricky because there are so many other factors that have to be considered, and you cannot guarantee that there will not be any complications after a major surgery, such as a transplant. However, I feel there is enough of a difference between the living control and the living treatment group to say that a transplant does play a role in the participant’s survival.
b) The boxplot shows a big difference between the survival times of those in the control group and those in the treatment group. This data may show the effectiveness of the transplant better than the mosaic plot does.
c) It looks like around 70% of the treatment group died and around 85-90% of the control group died.
d-i) The claim being tested is that the heart transplant increases lifespan.
d-ii) patient, patient, 100, 100, ?, ?
d-iii) The simulation results suggest that there is some difference between the proportions and that the treatment does have an effect on the patients receiving the treatment.
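The kind of randomization simulation referred to above can be sketched in R as follows. The group sizes and death counts are assumptions made for this illustration and should be replaced with the counts given in the exercise; 100 repetitions is also just an illustrative choice.

```r
set.seed(1)  # make the shuffles reproducible

# Assumed counts (not taken from this write-up): replace with the exercise's table
n_treat <- 69
n_ctrl <- 34
dead_treat <- 45
dead_ctrl <- 30

# One "card" per patient, labeled with the observed outcome
n_dead <- dead_treat + dead_ctrl
n_alive <- n_treat + n_ctrl - n_dead
outcomes <- c(rep("dead", n_dead), rep("alive", n_alive))

# Observed difference in proportion dead (treatment - control)
observed_diff <- dead_treat / n_treat - dead_ctrl / n_ctrl

# Shuffle the cards, deal them into two groups, record the difference
sim_diffs <- replicate(100, {
  shuffled <- sample(outcomes)
  treat_cards <- shuffled[1:n_treat]
  ctrl_cards <- shuffled[(n_treat + 1):length(shuffled)]
  mean(treat_cards == "dead") - mean(ctrl_cards == "dead")
})

# Fraction of shuffles at least as extreme as the observed difference
p_value <- mean(sim_diffs <= observed_diff)
p_value
```

If the transplant had no effect, the shuffled differences would center near 0, so a small `p_value` suggests the observed difference is unlikely to be due to chance alone.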