You’ll see the first thing I did was make sure there was nothing in R’s memory (rm command). Then I put in all the libraries I would need. That’s just a stylistic preference.
# clear the decks
rm(list = ls())
# Libraries
library("ggplot2")
library("ggfortify") # required for autoplot
library("skimr")
library("dplyr") # needed for group_by
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
dioecious <- read.csv("~/Dropbox/Teaching/second_year_stats/r_sessions/1.ANOVAS_regressions/dioecious.csv")
dioecious$SEX<-as.factor(dioecious$SEX) #Needs to be a factor for later analysis.
skim(dioecious)
Name | dioecious |
Number of rows | 50 |
Number of columns | 3 |
_______________________ | |
Column type frequency: | |
factor | 1 |
numeric | 2 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
SEX | 0 | 1 | FALSE | 2 | 2: 30, 1: 20 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
DBH | 0 | 1 | 191.6 | 71.21 | 68 | 140.50 | 188 | 242 | 316 | ▅▇▇▅▇ |
FLOWERS | 0 | 1 | 383.5 | 383.04 | 1 | 115.75 | 253 | 533 | 1482 | ▇▅▁▂▁ |
ggplot(dioecious, aes(x = SEX, y = FLOWERS)) +
geom_boxplot() +
theme_bw()
You can answer the MCQ by just looking at the boxplot plots, but I wanted to get exact figures (to put into MCQs). So I used the following code using pipes. Might come in handy.
dioecious %>%
group_by(SEX) %>%
summarize(median_flower= median(FLOWERS, na.rm = TRUE))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 2
## SEX median_flower
## <fct> <dbl>
## 1 1 292
## 2 2 230.
model_flowers<-lm(FLOWERS~SEX, data = dioecious)
sex_anova<-aov(model_flowers)
summary(sex_anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## SEX 1 171841 171841 1.175 0.284
## Residuals 48 7017255 146193
So we can say sex has no significant effect on the number of flowers (One way ANOVA F1,48 = 1.175, p = 0.284)
autoplot(model_flowers, colour = 'SEX')
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
# clear the decks
rm(list = ls())
# Libraries
library("ggplot2")
library("openintro") #Where I'm getting the data from
## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata
library("ggpubr")
library("ggfortify") # required for autoplot
summary(gpa_iq)
## obs gpa iq gender
## Min. : 1.00 Min. : 0.530 Min. : 72.0 Min. :1.000
## 1st Qu.:20.25 1st Qu.: 6.278 1st Qu.:103.0 1st Qu.:1.000
## Median :42.00 Median : 7.829 Median :110.0 Median :2.000
## Mean :42.97 Mean : 7.447 Mean :108.9 Mean :1.603
## 3rd Qu.:62.75 3rd Qu.: 8.983 3rd Qu.:117.5 3rd Qu.:2.000
## Max. :89.00 Max. :10.760 Max. :136.0 Max. :2.000
## concept
## Min. :20.00
## 1st Qu.:51.00
## Median :59.50
## Mean :56.96
## 3rd Qu.:66.00
## Max. :80.00
ggplot(gpa_iq, aes(x = iq, y = gpa)) +
geom_point() +
theme_bw()
It looks like the higher the student’s IQ the better they do on their GPA
model_iq <- lm(gpa_iq$gpa~gpa_iq$iq)
summary(model_iq)
##
## Call:
## lm(formula = gpa_iq$gpa ~ gpa_iq$iq)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.3182 -0.5377 0.2178 1.0268 3.5785
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.55706 1.55176 -2.292 0.0247 *
## gpa_iq$iq 0.10102 0.01414 7.142 4.74e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.635 on 76 degrees of freedom
## Multiple R-squared: 0.4016, Adjusted R-squared: 0.3937
## F-statistic: 51.01 on 1 and 76 DF, p-value: 4.737e-10
The equation of the line would be y = -3.56 + 0.1x IQ scores affect GPA scores (R squared = 0.39, F1,76, = 51.01, p = 4.7 x 10^-10)
autoplot(model_iq)
ggscatter(gpa_iq, x = "iq", y = "gpa", add = "reg.line") +
stat_cor(label.y = 10,
aes(label = paste(..rr.label.., ..p.label.., sep = "~`,`~"))) +
stat_regline_equation(label.y = 12)
## `geom_smooth()` using formula 'y ~ x'