textbooks <- read.csv("https://raw.githubusercontent.com/livelyjing/data/main/textbooks.csv")
The data are paired and stored in wide format: each textbook has its UCLA bookstore price and its Amazon price in separate columns.
hist(textbooks$diff)
t.test(textbooks$ucla_new,textbooks$amaz_new,paired = TRUE)
##
## Paired t-test
##
## data: textbooks$ucla_new and textbooks$amaz_new
## t = 7.6488, df = 72, p-value = 6.928e-11
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 9.435636 16.087652
## sample estimates:
## mean difference
## 12.76164
t-statistic = 7.6488, df = 72, p-value = 6.928e-11
The p-value (6.928e-11) is less than our significance level of 0.05, so we reject the null hypothesis of no mean difference and conclude that the mean prices of new books at the UCLA bookstore and on Amazon differ.
We are 95% confident that the true mean price difference (UCLA minus Amazon) is between $9.44 and $16.09.
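As a check on the interval above, the confidence limits can be rebuilt from the numbers in the printed output alone. This is a minimal sketch that assumes only the reported mean difference, t statistic, and degrees of freedom; the variable names are illustrative.

```r
# Values copied from the paired t-test output above
mean_diff <- 12.76164   # sample mean of the paired differences
t_stat    <- 7.6488     # reported t statistic
df        <- 72         # degrees of freedom (n - 1)

# Since t = mean_diff / se, the standard error can be recovered
se <- mean_diff / t_stat

# 95% interval: mean_diff +/- critical t value * se
t_crit <- qt(0.975, df)
ci <- c(lower = mean_diff - t_crit * se,
        upper = mean_diff + t_crit * se)
print(round(ci, 2))
```

The result should agree with the interval printed by `t.test`, up to rounding of the reported t statistic.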
Broadway.Show <- read.csv("C:/Users/maaudley-hinds/Downloads/Broadway Show.csv")
Broadway.Show$Type <-c("Play", "Musical")
Categorical explanatory variable: Type
Numerical response variable: Gross
boxplot(Gross ~ Type , data= Broadway.Show)
Musical - Mean: 562860.3, Median: 492718.5, St. Dev: 331235.2
Play - Mean: 533735.8, Median: 460753.0, St. Dev: 320264.7
Musicals have a higher mean and median weekly gross than plays, with a similar spread. In both groups the mean exceeds the median, which suggests that weekly earnings are right-skewed.
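Group summaries like these can be computed in one step. The sketch below uses a small hypothetical data frame (`shows`, with made-up values, not the Broadway data) that has the same `Type` and `Gross` column names.

```r
# Hypothetical weekly grosses; these are NOT the Broadway data
shows <- data.frame(
  Type  = c("Musical", "Musical", "Musical", "Play", "Play", "Play"),
  Gross = c(300000, 500000, 1000000, 200000, 450000, 700000)
)

# Split Gross by Type, then compute mean, median and sd for each group
stats <- sapply(
  split(shows$Gross, shows$Type),
  function(x) c(mean = mean(x), median = median(x), sd = sd(x))
)
print(stats)
```

The result is a matrix with one column per show type and one row per statistic.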
t.test(Broadway.Show$Gross~Broadway.Show$Type)
##
## Welch Two Sample t-test
##
## data: Broadway.Show$Gross by Broadway.Show$Type
## t = 1.9989, df = 1995.7, p-value = 0.04575
## alternative hypothesis: true difference in means between group Musical and group Play is not equal to 0
## 95 percent confidence interval:
## 550.3418 57698.5362
## sample estimates:
## mean in group Musical mean in group Play
## 562860.3 533735.8
Null Hypothesis: There is no difference in mean gross weekly earnings between musicals and plays.
Alternative Hypothesis: There is a difference in mean gross weekly earnings between musicals and plays.
Since the p-value (0.04575) is lower than the significance level of 0.05, we reject the null hypothesis and conclude that the mean earnings of musicals and plays are significantly different.
We are 95% confident that the true difference in mean gross weekly earnings (musicals minus plays) is between 550.34 and 57698.54.
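The interval reported by `t.test` for two groups describes the difference in means, not the group means themselves. The sketch below illustrates this on a small hypothetical data set (`shows`, made-up values) by pulling the estimates and the interval out of the test object.

```r
# Hypothetical weekly grosses (in thousands); NOT the Broadway data
shows <- data.frame(
  Type  = rep(c("Musical", "Play"), each = 5),
  Gross = c(520, 610, 480, 700, 560, 450, 530, 400, 620, 510)
)

# Welch two-sample t-test (the default when var.equal is not set)
res <- t.test(Gross ~ Type, data = shows)

# Difference in sample means: first group (Musical) minus second (Play)
diff_means <- unname(res$estimate[1] - res$estimate[2])

# The confidence interval is centred on that difference
ci <- as.numeric(res$conf.int)
print(diff_means)
print(ci)
print(mean(ci))
```

The same check applies to the output above: the midpoint of 550.3418 and 57698.5362 matches 562860.3 minus 533735.8, up to rounding.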
The sample data called tomato can be loaded using R code.
boxplot(tomato$weight~tomato$field)
anova_result <- aov(weight~field, data = tomato)
summary(anova_result)
## Df Sum Sq Mean Sq F value Pr(>F)
## field 2 2092 1046.2 5.792 0.00807 **
## Residuals 27 4877 180.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value (0.00807) is less than the significance level of 0.05, so we reject the null hypothesis that all three field means are equal. At least one field has a mean weight that differs from the others; the ANOVA alone does not say which.
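The F value and p-value in the ANOVA table follow directly from the mean squares. A minimal sketch, using only the numbers printed in the table above (variable names are illustrative):

```r
# Values copied from the ANOVA table above
ms_field <- 1046.2   # mean square for field
ms_resid <- 180.6    # residual mean square
df_field <- 2
df_resid <- 27

# F statistic: between-group variation relative to within-group variation
f_value <- ms_field / ms_resid

# p-value: upper tail of the F distribution with (2, 27) degrees of freedom
p_value <- pf(f_value, df_field, df_resid, lower.tail = FALSE)

print(f_value)
print(p_value)
```

Both values should agree with the table up to rounding of the printed mean squares.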
tukey_result <- TukeyHSD(anova_result)
print(tukey_result)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = weight ~ field, data = tomato)
##
## $field
## diff lwr upr p adj
## NoFertilize-ChemicalFertilizer -20.2 -35.102349 -5.297651 0.0063840
## OrganicFertilizer-ChemicalFertilizer -7.3 -22.202349 7.602349 0.4550443
## OrganicFertilizer-NoFertilize 12.9 -2.002349 27.802349 0.0993314
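Only the No Fertilizer and Chemical Fertilizer fields show a significant difference in mean weight (adjusted p = 0.006). The other two comparisons are not significant at the 0.05 level (adjusted p = 0.455 and 0.099), and their confidence intervals include 0.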
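The significant pairs can also be picked out of the Tukey result programmatically. Because the tomato loading code is not shown here, this sketch uses R's built-in `PlantGrowth` data as a stand-in (an assumption: it has the same shape, one numeric `weight` and one three-level grouping factor).

```r
# PlantGrowth ships with R: weight by group (ctrl, trt1, trt2)
fit <- aov(weight ~ group, data = PlantGrowth)
tk  <- TukeyHSD(fit)

# tk$group is a matrix with columns diff, lwr, upr and "p adj"
pairs <- tk$group

# Keep only the comparisons with an adjusted p-value below 0.05
sig <- pairs[pairs[, "p adj"] < 0.05, , drop = FALSE]
print(sig)
```

For the tomato analysis the same indexing would apply to `tukey_result$field`.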
In the spring of 1846, a group of American pioneers set out for California. However, they suffered a series of setbacks and did not arrive at the Sierra Nevada mountains until October. While crossing the mountains, they became trapped by an early snowfall and had to spend the winter there.
Please load the dataset donner using the following R code. Our variables of interest are:
- survived: 0 = died, 1 = survived
- sex: Male/Female
The aim is to assess whether there is an association between survival outcome and gender.
donner <- read.csv("https://raw.githubusercontent.com/livelyjing/data/main/donner.csv")
df = data.frame( "Survived" = donner$survived, "Sex" = donner$sex)
conTable = table(df)
print(conTable)
## Sex
## Survived Female Male
## 0 10 32
## 1 25 23
lm_sex <- lm(donner$survived~donner$sex)
summary(lm_sex)
##
## Call:
## lm(formula = donner$survived ~ donner$sex)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.7143 -0.4182 0.2857 0.5078 0.5818
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.71429 0.08163 8.750 1.34e-13 ***
## donner$sexMale -0.29610 0.10442 -2.836 0.00567 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4829 on 88 degrees of freedom
## Multiple R-squared: 0.08372, Adjusted R-squared: 0.07331
## F-statistic: 8.04 on 1 and 88 DF, p-value: 0.005675
The p-value for sex (0.00567) is less than 0.05, so there is a significant association between gender and survival.
anova_result <- aov(survived~sex, data = donner)
tukey_result <- TukeyHSD(anova_result)
print(tukey_result)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = survived ~ sex, data = donner)
##
## $sex
## diff lwr upr p adj
## Male-Female -0.2961039 -0.5036258 -0.088582 0.0056746
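Men were less likely to survive than women: the estimated survival proportion is about 0.42 for men versus 0.71 for women, a difference of -0.296 (95% CI: -0.504 to -0.089).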
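Since both variables are categorical, the association can also be examined straight from the contingency table. The sketch below rebuilds the table from the counts printed above and applies a chi-square test; using `chisq.test` here is an alternative to the linear model shown above, not something the original analysis ran.

```r
# Counts copied from the contingency table above
tab <- matrix(
  c(10, 25,    # Female: died, survived
    32, 23),   # Male:   died, survived
  nrow = 2,
  dimnames = list(Survived = c("0", "1"), Sex = c("Female", "Male"))
)

# Proportion surviving within each sex (column proportions, row "1")
surv_prop <- prop.table(tab, margin = 2)["1", ]
print(round(surv_prop, 3))

# Difference in survival proportion, Male minus Female
prop_diff <- unname(surv_prop["Male"] - surv_prop["Female"])
print(prop_diff)

# Chi-square test of independence (continuity correction by default for 2x2)
res <- chisq.test(tab)
print(res)
```

The difference in proportions equals the `donner$sexMale` coefficient from the linear model, because regressing a 0/1 outcome on a two-level factor compares the two group proportions.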
You may load the dataset from the following R code:
The research question is how Metabolism relates to two predictor variables: gastric activity (Gastric) and gender (Sex).
plot(alcohol$Metabol,alcohol$Gastric)
lm_mlr_gastric <- lm(Gastric~Metabol, data = alcohol)
summary(lm_mlr_gastric)
##
## Call:
## lm(formula = Gastric ~ Metabol, data = alcohol)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.87592 -0.44090 -0.07588 0.26909 1.21381
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.13585 0.13260 8.566 1.48e-09 ***
## Metabol 0.30004 0.03719 8.067 5.27e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5505 on 30 degrees of freedom
## Multiple R-squared: 0.6845, Adjusted R-squared: 0.674
## F-statistic: 65.08 on 1 and 30 DF, p-value: 5.266e-09
lm_lmr_gender <- lm(Metabol~Sex, data = alcohol)
summary(lm_lmr_gender)
##
## Call:
## lm(formula = Metabol ~ Sex, data = alcohol)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8214 -1.0304 -0.4607 0.4250 8.1786
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.1000 0.5221 2.107 0.043601 *
## SexMale 3.0214 0.7894 3.828 0.000612 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.215 on 30 degrees of freedom
## Multiple R-squared: 0.3281, Adjusted R-squared: 0.3057
## F-statistic: 14.65 on 1 and 30 DF, p-value: 0.0006117
Write your predicted linear regression model. SST=SSE+SSReg
Interpret the model in terms of the coefficient estimates by filling in the blanks below.
We can conclude that every one unit increase of gastric alcohol dehydrogenase activity in the stomach (measured by \(\mu\)mol/min/g of tissue) is significantly associated with increased first-pass metabolism of alcohol in the stomach by -1.8271 mmol/liter-hour, after adjusting for gender (p is less (greater/less) than 0.001).
The results from linear regression show that males have increased (increased/decreased) first-pass metabolism of alcohol in the stomach than females by 3.0214 mmol/liter-hour, after adjusting for gastric alcohol dehydrogenase activity in the stomach (p value = 0.0006117).
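An "after adjusting for" interpretation refers to a single model that contains both predictors at once. Because the alcohol loading code is not shown here, the sketch below fits such a model to simulated data; the data frame `sim` and every number used to generate it are hypothetical, and only the column names `Metabol`, `Gastric` and `Sex` are taken from the output above.

```r
# Simulated stand-in for the alcohol data; all values are hypothetical
set.seed(1)
n <- 32
sim <- data.frame(
  Gastric = runif(n, 0.8, 3),
  Sex     = factor(rep(c("Female", "Male"), length.out = n))
)
sim$Metabol <- -1 + 1.5 * sim$Gastric + 1.2 * (sim$Sex == "Male") +
  rnorm(n, sd = 0.5)

# Additive multiple regression: Metabol on Gastric and Sex together
fit_add <- lm(Metabol ~ Gastric + Sex, data = sim)

# Gastric row: change in Metabol per unit of Gastric, holding Sex fixed
# SexMale row: male minus female difference, holding Gastric fixed
print(coef(summary(fit_add)))
```

With the real data, the `Gastric` and `SexMale` estimates and p-values from this kind of fit are the values the two sentences above ask for.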
Using the same dataset as problem 5, the research question was to study whether the relationship between Metabolism and gastric differs by gender.
lm_lmr_gender <- lm(Gastric~Sex, data = alcohol)
summary(lm_lmr_gender)
##
## Call:
## lm(formula = Gastric ~ Sex, data = alcohol)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1643 -0.6750 -0.0500 0.4393 2.9357
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.5500 0.2143 7.233 4.73e-08 ***
## SexMale 0.7143 0.3240 2.205 0.0353 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9092 on 30 degrees of freedom
## Multiple R-squared: 0.1394, Adjusted R-squared: 0.1108
## F-statistic: 4.861 on 1 and 30 DF, p-value: 0.03528
Interpret your fitted model: does the relationship between Metabolism and gastric differ by gender, and if so, which gender has the higher slope?
Predict a person’s Metabolism of alcohol given that this person is a female and Gastric alcohol dehydrogenase activity in the stomach is 1.4 \(\mu\)mol/min/g of tissue using your fitted model from previous question.
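Whether a slope differs by gender is usually assessed with an interaction term, and `predict` then gives the fitted value for a chosen person. This is a minimal sketch on simulated data; the data frame `sim` and its generating values are hypothetical, and only the column names come from the output above.

```r
# Simulated stand-in for the alcohol data; all values are hypothetical
set.seed(2)
n <- 32
sim <- data.frame(
  Gastric = runif(n, 0.8, 3),
  Sex     = factor(rep(c("Female", "Male"), length.out = n))
)
sim$Metabol <- 0.2 + 0.8 * sim$Gastric +
  1.0 * sim$Gastric * (sim$Sex == "Male") + rnorm(n, sd = 0.5)

# Interaction model: Gastric * Sex expands to Gastric + Sex + Gastric:Sex
fit_int <- lm(Metabol ~ Gastric * Sex, data = sim)
b <- coef(fit_int)

# "Gastric" is the female slope; "Gastric:SexMale" is how much the
# male slope differs from it
print(b)

# Predicted Metabol for a female with Gastric = 1.4
new_person <- data.frame(
  Gastric = 1.4,
  Sex     = factor("Female", levels = levels(sim$Sex))
)
pred <- predict(fit_int, newdata = new_person)
print(pred)
```

For a female (the reference level) the prediction reduces to the intercept plus 1.4 times the `Gastric` coefficient, since both male terms drop out.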