dioecious.R

Handout for week 1 practical

ANOVA script

You’ll see the first thing I did was make sure there was nothing in R’s memory (rm command). Then I put in all the libraries I would need. That’s just a stylistic preference.

# clear the decks
rm(list = ls())

# Libraries
library("ggplot2")
library("ggfortify") # required for autoplot
library("skimr")
library("dplyr") # needed for group_by

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

dioecious <- read.csv("~/Dropbox/Teaching/second_year_stats/r_sessions/1.ANOVAS_regressions/dioecious.csv")
dioecious$SEX<-as.factor(dioecious$SEX) #Needs to be a factor for later analysis.


skim(dioecious)

Data summary
Name	dioecious
Number of rows	50
Number of columns	3
_______________________
Column type frequency:
factor	1
numeric	2
________________________
Group variables	None

Variable type: factor

skim_variable	n_missing	complete_rate	ordered	n_unique	top_counts
SEX	0	1	FALSE	2	2: 30, 1: 20

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
DBH	0	1	191.6	71.21	68	140.50	188	242	316	▅▇▇▅▇
FLOWERS	0	1	383.5	383.04	1	115.75	253	533	1482	▇▅▁▂▁

ggplot(dioecious, aes(x = SEX, y = FLOWERS)) +
  geom_boxplot() +
  theme_bw()

You can answer the MCQ by just looking at the boxplot plots, but I wanted to get exact figures (to put into MCQs). So I used the following code using pipes. Might come in handy.

dioecious %>%
  group_by(SEX) %>%
  summarize(median_flower= median(FLOWERS, na.rm = TRUE))

## `summarise()` ungrouping output (override with `.groups` argument)

## # A tibble: 2 x 2
##   SEX   median_flower
##   <fct>         <dbl>
## 1 1              292 
## 2 2              230.

Carrying out the ANOVA

model_flowers<-lm(FLOWERS~SEX, data = dioecious)
sex_anova<-aov(model_flowers)
summary(sex_anova)

##             Df  Sum Sq Mean Sq F value Pr(>F)
## SEX          1  171841  171841   1.175  0.284
## Residuals   48 7017255  146193

So we can say sex has no significant effect on the number of flowers (One way ANOVA F1,48 = 1.175, p = 0.284)

autoplot(model_flowers, colour = 'SEX')

## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

Regression script

# clear the decks
rm(list = ls())

# Libraries
library("ggplot2")
library("openintro") #Where I'm getting the data from

## Loading required package: airports

## Loading required package: cherryblossom

## Loading required package: usdata

library("ggpubr")
library("ggfortify") # required for autoplot

summary(gpa_iq)

##       obs             gpa               iq            gender     
##  Min.   : 1.00   Min.   : 0.530   Min.   : 72.0   Min.   :1.000  
##  1st Qu.:20.25   1st Qu.: 6.278   1st Qu.:103.0   1st Qu.:1.000  
##  Median :42.00   Median : 7.829   Median :110.0   Median :2.000  
##  Mean   :42.97   Mean   : 7.447   Mean   :108.9   Mean   :1.603  
##  3rd Qu.:62.75   3rd Qu.: 8.983   3rd Qu.:117.5   3rd Qu.:2.000  
##  Max.   :89.00   Max.   :10.760   Max.   :136.0   Max.   :2.000  
##     concept     
##  Min.   :20.00  
##  1st Qu.:51.00  
##  Median :59.50  
##  Mean   :56.96  
##  3rd Qu.:66.00  
##  Max.   :80.00

ggplot(gpa_iq, aes(x = iq, y = gpa)) +
  geom_point() +
  theme_bw()

It looks like the higher the student’s IQ the better they do on their GPA

model_iq <- lm(gpa_iq$gpa~gpa_iq$iq)
summary(model_iq)

## 
## Call:
## lm(formula = gpa_iq$gpa ~ gpa_iq$iq)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.3182 -0.5377  0.2178  1.0268  3.5785 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.55706    1.55176  -2.292   0.0247 *  
## gpa_iq$iq    0.10102    0.01414   7.142 4.74e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.635 on 76 degrees of freedom
## Multiple R-squared:  0.4016, Adjusted R-squared:  0.3937 
## F-statistic: 51.01 on 1 and 76 DF,  p-value: 4.737e-10

The equation of the line would be y = -3.56 + 0.1x IQ scores affect GPA scores (R squared = 0.39, F1,76, = 51.01, p = 4.7 x 10^-10)

autoplot(model_iq)

ggscatter(gpa_iq, x = "iq", y = "gpa", add = "reg.line") +
  stat_cor(label.y = 10, 
           aes(label = paste(..rr.label.., ..p.label.., sep = "~`,`~"))) +
  stat_regline_equation(label.y = 12)

## `geom_smooth()` using formula 'y ~ x'

dioecious.R

emb3

2021-03-05

Handout for week 1 practical

ANOVA script

Carrying out the ANOVA

Regression script