Hello

library(readr)
data <- read_csv("data.csv")
library(foreign) 
library(dplyr)
library(lubridate)
library(ggplot2)
library(knitr)
library(magrittr)
library(psych)
library(lsr)
library(vcd)
library(sjPlot)
library(snakecase)
library(corrplot)
library(RColorBrewer)
t1 <- data %>% select(gndr, weight, marsts)
t1 <- na.omit(t1)

task 1.

What are the mean weights of men and women in the sample? Is the mean weight of females different from their median weight?

describeBy(data$weight, data$gndr, mat=T)

The mean weight of men in the sample is 84.46 and the mean weight of women is 68.9. The women's mean weight is very close to their median, which is 67.
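As a cross-check on the describeBy() output, the group means and medians can also be computed with base R. The sketch below uses a small made-up data frame (toy is hypothetical; only the column names gndr and weight come from this report), so its numbers are not the survey results.

```r
# Hypothetical toy data standing in for the survey file (values are made up)
toy <- data.frame(
  gndr   = c("Male", "Male", "Male", "Female", "Female", "Female"),
  weight = c(80, 90, 85, 60, 65, 85)
)

# Mean and median weight per gender; na.rm = TRUE skips missing weights
group_mean   <- tapply(toy$weight, toy$gndr, mean,   na.rm = TRUE)
group_median <- tapply(toy$weight, toy$gndr, median, na.rm = TRUE)

# A mean well above the median hints at a right-skewed distribution
round(group_mean - group_median, 2)
```

On the real data the same two tapply() calls would take data$weight and data$gndr.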

task 2.

Draw a boxplot showing the weight of men and women (as separate groups) by their marital status. Do they all have the same average weight?

ggplot(t1, aes(x = gndr, y = weight)) +
  geom_boxplot() +
  facet_wrap(~marsts) +
  theme_bw()

task 3.

In which marital group do men have the highest weight? Is this a statistically significant difference?

In the plots above, the box for legally divorced men sits higher than the others. Let's check this with a test.

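# Split the sample by gender
Male <- data %>% filter(gndr == "Male")
Female <- data %>% filter(gndr == "Female")
# Descriptive statistics of men's weight by marital status
describeBy(Male$weight, Male$marsts, mat = TRUE)
# Welch one-way test: does men's mean weight differ across marital groups?
oneway.test(Male$weight ~ Male$marsts)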
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  Male$weight and Male$marsts
## F = 6.6885, num df = 3.000, denom df = 85.321, p-value = 0.0004146
aov.out <- aov(Male$weight ~ Male$marsts)
pairwise.t.test(x = Male$weight, g = Male$marsts)
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  Male$weight and Male$marsts 
## 
##                                                                    Legally divorced/civil union dissolved
## Legally married                                                    0.87                                  
## None of these (NEVER married or in legally registered civil union) 5.4e-05                               
## Widowed/civil partner died                                         0.91                                  
##                                                                    Legally married
## Legally married                                                    -              
## None of these (NEVER married or in legally registered civil union) 0.91           
## Widowed/civil partner died                                         0.91           
##                                                                    None of these (NEVER married or in legally registered civil union)
## Legally married                                                    -                                                                 
## None of these (NEVER married or in legally registered civil union) -                                                                 
## Widowed/civil partner died                                         0.18                                                              
## 
## P value adjustment method: holm

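The p-value of the one-way test is small (0.0004), so the mean weight of men differs significantly across marital groups. In the pairwise comparisons, only the difference between legally divorced and never-married men is statistically significant (p = 5.4e-05).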
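The pairwise p-values above are Holm-adjusted. As a rough illustration of what that adjustment does, the sketch below applies base R's p.adjust() to three made-up raw p-values (they are hypothetical and not taken from this analysis).

```r
# Hypothetical raw p-values from three pairwise comparisons (made up)
raw_p <- c(0.01, 0.04, 0.03)

# Holm: sort the p-values, multiply the smallest by 3, the next by 2,
# the largest by 1, then make sure adjusted values never decrease
# along the sorted order (and never exceed 1)
holm_p <- p.adjust(raw_p, method = "holm")
holm_p
```

The adjusted values are returned in the same order as the input, so they can be compared with the raw ones position by position.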

par(mar = c(2, 25, 5, 5))
plot(TukeyHSD(aov.out), las = 2)

The highest weight is found among men who were never married or in a legally registered civil union.

task 4.

What is the difference between the average weight of men and women? How much is it in kg? Is this a statistically significant difference?

t.test(t1$weight ~ t1$gndr, var.equal = T)
## 
##  Two Sample t-test
## 
## data:  t1$weight by t1$gndr
## t = -18.654, df = 1335, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -16.27564 -13.17818
## sample estimates:
## mean in group Female   mean in group Male 
##             66.96598             81.69289

The p-value is below 0.05, so the difference between the mean weights of men and women is statistically significant. The difference in kg is equal to:

dif = 81.69289 - 66.9659
dif
## [1] 14.72699
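The difference can also be read directly from the test object instead of retyping the means. This is a minimal sketch on made-up data (toy is hypothetical; only the column names come from this report).

```r
# Hypothetical toy data (values are made up)
toy <- data.frame(
  gndr   = rep(c("Female", "Male"), each = 4),
  weight = c(60, 65, 70, 73, 78, 82, 85, 91)
)

# Same kind of test as above: two-sample t-test with equal variances
tt <- t.test(weight ~ gndr, data = toy, var.equal = TRUE)

# tt$estimate holds the two group means in factor-level order (Female, Male)
diff_kg <- unname(tt$estimate[2] - tt$estimate[1])
diff_kg
```

Computing the difference from tt$estimate avoids rounding or copy errors when the means are typed by hand.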

task 5.

What is the correlation between the age and weight of women? Is it large, medium, or small? Positive or negative?

cor.test(Female$agea, Female$weight)
## 
##  Pearson's product-moment correlation
## 
## data:  Female$agea and Female$weight
## t = 5.9213, df = 1423, p-value = 3.997e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1039803 0.2053468
## sample estimates:
##       cor 
## 0.1550717

The p-value is below 0.05, so the correlation is statistically significant. The correlation is positive and small (r ≈ 0.155): weight tends to increase slightly with age.
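To make the "large, medium, or small" judgement explicit, the sketch below labels a correlation using Cohen's commonly cited rule-of-thumb cutoffs (0.1, 0.3, 0.5). These cutoffs are a convention assumed here, and classify_r is a hypothetical helper, not part of any package used in this report.

```r
# Hypothetical helper: label the size and sign of a correlation,
# assuming the rule-of-thumb cutoffs 0.1 / 0.3 / 0.5
classify_r <- function(r) {
  a <- abs(r)
  size <- if (a >= 0.5) {
    "large"
  } else if (a >= 0.3) {
    "medium"
  } else if (a >= 0.1) {
    "small"
  } else {
    "negligible"
  }
  direction <- if (r > 0) "positive" else if (r < 0) "negative" else "zero"
  c(size = size, direction = direction)
}

# Correlation between age and weight of women reported above
res <- classify_r(0.1550717)
res
```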

task 6.

Linear regression. Report the overall model fit (R-squared), interpret the coefficients, and draw a plot for the interaction effect.

Can the weight of respondents be predicted with their age, gender, and years of education?

model_ad = lm(weight ~ agea + gndr + eduyrs, data = data)
summary(model_ad)
## 
## Call:
## lm(formula = weight ~ agea + gndr + eduyrs, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -43.907  -9.671  -1.834   7.425  66.853 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 61.62976    1.41910  43.429   <2e-16 ***
## agea         0.12553    0.01405   8.935   <2e-16 ***
## gndrMale    15.56171    0.51625  30.144   <2e-16 ***
## eduyrs       0.06996    0.07989   0.876    0.381    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.94 on 2933 degrees of freedom
##   (8 observations deleted due to missingness)
## Multiple R-squared:  0.2535, Adjusted R-squared:  0.2527 
## F-statistic:   332 on 3 and 2933 DF,  p-value: < 2.2e-16

In this model, the p-value of the F-test is below 0.05, meaning that the model fits better than a model with no predictors. The adjusted R-squared is 0.25, meaning that the model explains about 25% of the variance in weight.
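The adjusted R-squared can be reproduced from the multiple R-squared, the number of observations, and the number of predictors. The sketch below uses the standard adjustment formula with the values printed in the summary (R-squared 0.2535, 2937 observations, 3 predictors); adj_r2 is a hypothetical helper name.

```r
# Hypothetical helper: adjusted R-squared from R-squared,
# sample size n, and number of predictors p
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Values taken from the model summary above
adj <- adj_r2(r2 = 0.2535, n = 2937, p = 3)
round(adj, 4)
```

With this many observations and only three predictors, the adjustment is tiny, which is why the two R-squared values in the summary are nearly identical.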

The equation of the linear model is the following:

\[ Weight = 61.63 + 0.13 * Age + 15.56 * IfMale + 0.07 * EducationYears \]

This means that when age and years of education are equal to 0 and the person is female, the predicted weight is 61.63. Holding the other predictors constant, each additional year of age adds 0.13 kg, each additional year of education adds 0.07 kg, and being male adds 15.56 kg. Note that the education coefficient is not statistically significant (p = 0.381).
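To show how the equation turns into a prediction, the sketch below plugs a hypothetical respondent (a 40-year-old man with 12 years of education, chosen only for illustration) into the coefficients from the summary above. predict_weight is a hypothetical helper; with the fitted object available, predict(model_ad, newdata = ...) would do the same job.

```r
# Coefficients copied from summary(model_ad) above
coefs <- c(intercept = 61.62976, agea = 0.12553,
           gndrMale = 15.56171, eduyrs = 0.06996)

# Hypothetical helper: predicted weight; male is 1 for men, 0 for women
predict_weight <- function(agea, male, eduyrs) {
  unname(coefs["intercept"] +
           coefs["agea"] * agea +
           coefs["gndrMale"] * male +
           coefs["eduyrs"] * eduyrs)
}

# Illustrative respondent: 40 years old, male, 12 years of education
pred <- predict_weight(agea = 40, male = 1, eduyrs = 12)
round(pred, 2)
```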

Is education related to weight in the same way or in different ways for men and women?

model_int = lm(weight ~ eduyrs * gndr, data = data)
summary(model_int)
## 
## Call:
## lm(formula = weight ~ eduyrs * gndr, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -40.699  -9.699  -1.699   7.352  66.991 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      74.5674     1.6209  46.002  < 2e-16 ***
## eduyrs           -0.4099     0.1141  -3.593 0.000333 ***
## gndrMale          4.3994     2.3127   1.902 0.057237 .  
## eduyrs:gndrMale   0.7921     0.1599   4.952 7.74e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.05 on 2941 degrees of freedom
## Multiple R-squared:  0.2395, Adjusted R-squared:  0.2387 
## F-statistic: 308.8 on 3 and 2941 DF,  p-value: < 2.2e-16

In this model, the p-value of the F-test is below 0.05, meaning that the model fits better than a model with no predictors. The adjusted R-squared is 0.24, meaning that the model explains about 24% of the variance in weight.

Here are the linear model equations for men and women. For men, IfMale = 1:

\[ Weight(Male) = 74.57 - 0.41 * EducationYears + 4.40 * IfMale + 0.79 * EducationYears * IfMale = 78.97 + 0.38 * EducationYears \]

For women, IfMale = 0:

\[ Weight(Female) = 74.57 - 0.41 * EducationYears \]

The slope of education is therefore negative for women and positive for men, and the interaction term is statistically significant. In other words, if we substitute a value into the equations, for example 12 years of education (an arbitrary value), the predicted weight for a man will be equal to:

\[ 74.57 - 0.41 * 12 + 4.40 + 0.79 * 12 = 83.53 \]

… and for a woman it will be equal to:

\[ 74.57 - 0.41 * 12 = 69.65 \]
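The interaction model implies a separate education slope for each gender and a gap between men and women that changes with education. A minimal sketch using the coefficients from summary(model_int); the education values 8, 12, and 16 are arbitrary illustrations, and gap is a hypothetical helper.

```r
# Coefficients copied from summary(model_int) above
b <- c(intercept = 74.5674, eduyrs = -0.4099,
       gndrMale = 4.3994, interaction = 0.7921)

# Education slope for women (reference group) and for men
slope_female <- unname(b["eduyrs"])
slope_male   <- unname(b["eduyrs"] + b["interaction"])

# Predicted male minus female weight at a given number of education years
gap <- function(eduyrs) {
  unname(b["gndrMale"] + b["interaction"] * eduyrs)
}

c(female = slope_female, male = slope_male)
gap(c(8, 12, 16))
```

The widening gap is the same pattern that an interaction plot shows as two lines with different slopes.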

tab_model(model_ad, model_int)
                   weight (model_ad)                    weight (model_int)
Predictors         Estimates  CI              p         Estimates  CI              p
(Intercept)        61.63      58.85 – 64.41   <0.001    74.57      71.39 – 77.74   <0.001
agea               0.13       0.10 – 0.15     <0.001
gndrMale           15.56      14.55 – 16.57   <0.001    4.40       -0.13 – 8.93    0.057
eduyrs             0.07       -0.09 – 0.23    0.381     -0.41      -0.63 – -0.19   <0.001
eduyrs:gndrMale                                         0.79       0.48 – 1.11     <0.001
Observations       2937                                 2945
R2 / R2 adjusted   0.254 / 0.253                        0.240 / 0.239
plot_model(model_int, type="int")

task 7.

What will this code do? Name the output.

ggplot(data = possum, aes(y = totalL, x = tailL)) +
  geom_point(alpha = 0.5) +
  scale_x_continuous("Length of Possum Tail (cm)") +
  scale_y_continuous("Length of Possum Body (cm)")