Replace the births data set with gifted. The definitions of the variables in the gifted data set can be found in the openintro package manual.
# Load packages
library(openintro) # for the gifted data set
library(dplyr)     # for dplyr functions such as mutate() and filter()
library(ggplot2)   # for ggplot2 functions such as ggplot()
# Load data
data(gifted)
# View its structure
str(gifted)
## 'data.frame': 36 obs. of 8 variables:
## $ score : int 159 164 154 157 156 150 155 161 163 162 ...
## $ fatheriq: int 115 117 115 113 110 113 118 117 111 122 ...
## $ motheriq: int 117 113 118 131 109 109 119 120 128 120 ...
## $ speak : int 18 20 20 12 17 13 19 18 22 18 ...
## $ count : int 26 37 32 24 34 28 24 32 28 27 ...
## $ read : num 1.9 2.5 2.2 1.7 2.2 1.9 1.8 2.3 2.1 2.1 ...
## $ edutv : num 3 1.75 2.75 2.75 2.25 1.25 2 2.25 1 2.25 ...
## $ cartoons: num 2 3.25 2.5 2.25 2.5 3.75 3 2.5 4 2.75 ...
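Before plotting, it can help to see the range of each variable. The sketch below uses base R's summary() and range() on the same gifted data; the minimum score it reports is also worth keeping in mind for the score > 100 cutoff used later.

```r
library(openintro)  # provides the gifted data set

data(gifted)

# Min, quartiles, median and mean for every column
summary(gifted)

# Lowest and highest analytical skills score
score_range <- range(gifted$score)
score_range
```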
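Create a scatterplot to investigate the relationship between the child's score on the test of analytical skills (score) and how much the child's parents read to the child (read).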
Describe the relationship between the two variables.
According to the scatterplot, there is a positive relationship: the more a parent reads to the child, the higher the child tends to score on the test of analytical skills.
# Scatterplot of analytical skills score (x) against reading (y)
ggplot(data = gifted, aes(x = score, y = read)) +
  geom_point()
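The description above is read off the plot; a correlation coefficient puts a number on it. This is a minimal sketch using base R's cor() (Pearson by default), plus the same scatterplot with a least-squares line from geom_smooth() to make the trend easier to see. A positive r would be consistent with the "more reading, higher score" reading of the plot.

```r
library(openintro)  # gifted data
library(ggplot2)

data(gifted)

# Pearson correlation between reading and analytical skills score
r <- cor(gifted$read, gifted$score)
r

# Same axes as the scatterplot above, with a straight-line fit added
ggplot(data = gifted, aes(x = score, y = read)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```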
Suppose that you want to investigate whether the relationship found above varies by father's IQ. Create a new categorical variable, fatheriq_cat, and assign "below average" or "at or above average" depending on whether the father's IQ is below the mean father's IQ.
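# Mean father's IQ, used as the cutoff (named differently from the new column)
fatheriq_mean <- mean(gifted$fatheriq, na.rm = TRUE)
# Classify each father's IQ relative to the mean
gifted <- gifted %>%
  mutate(fatheriq_cat = ifelse(fatheriq < fatheriq_mean, "below average", "at or above average"))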
head(gifted)
## score fatheriq motheriq speak count read edutv cartoons
## 1 159 115 117 18 26 1.9 3.00 2.00
## 2 164 117 113 20 37 2.5 1.75 3.25
## 3 154 115 118 20 32 2.2 2.75 2.50
## 4 157 113 131 12 24 1.7 2.75 2.25
## 5 156 110 109 17 34 2.2 2.25 2.50
## 6 150 113 109 13 28 1.9 1.25 3.75
## fatheriq_cat
## 1 at or above average
## 2 at or above average
## 3 at or above average
## 4 below average
## 5 below average
## 6 below average
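head() only shows the first six rows, so it does not tell you how the 36 children are split between the two categories. A quick check, sketched with dplyr's count(); this version uses if_else(), dplyr's type-strict variant of ifelse(), which gives the same result here because both branches are character strings.

```r
library(openintro)  # gifted data
library(dplyr)

data(gifted)

# Cutoff: mean father's IQ, stored under a name distinct from the new column
fatheriq_mean <- mean(gifted$fatheriq, na.rm = TRUE)

gifted <- gifted %>%
  mutate(fatheriq_cat = if_else(fatheriq < fatheriq_mean,
                                "below average", "at or above average"))

# Number of children in each father's-IQ category
group_sizes <- gifted %>% count(fatheriq_cat)
group_sizes
```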
Add the third variable, fatheriq_cat, to the scatterplot you created in Q1. Does the relationship you found in Q1 vary by father's IQ?
The relationship in Q1 does not seem to vary by father's IQ.
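# Color and facet by father's IQ category; the legend is hidden because the facet labels identify the groups
ggplot(data = gifted, aes(x = score, y = read, color = fatheriq_cat)) +
  geom_point(show.legend = FALSE) +
  facet_wrap(~ fatheriq_cat)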
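Comparing facets by eye is subjective. One way to back up the answer is to compute the reading/score correlation separately within each group, sketched below with group_by() and summarise(). Similar correlations in the two groups would be consistent with the relationship not varying by father's IQ; note that each group is small, so the estimates are noisy.

```r
library(openintro)  # gifted data
library(dplyr)

data(gifted)

fatheriq_mean <- mean(gifted$fatheriq, na.rm = TRUE)

# Group size and Pearson correlation of read vs. score within each category
group_cor <- gifted %>%
  mutate(fatheriq_cat = if_else(fatheriq < fatheriq_mean,
                                "below average", "at or above average")) %>%
  group_by(fatheriq_cat) %>%
  summarise(n = n(), r = cor(read, score))

group_cor
```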
You are only interested in gifted children whose score on the test of analytical skills is greater than 100. Filter the data. How many such children are there?
There are 36 children with a score greater than 100, that is, every child in the data set.
# Keep only children scoring above 100 on the analytical skills test
gifted_above <- gifted %>%
  filter(score > 100)
str(gifted_above)
## 'data.frame': 36 obs. of 9 variables:
## $ score : int 159 164 154 157 156 150 155 161 163 162 ...
## $ fatheriq : int 115 117 115 113 110 113 118 117 111 122 ...
## $ motheriq : int 117 113 118 131 109 109 119 120 128 120 ...
## $ speak : int 18 20 20 12 17 13 19 18 22 18 ...
## $ count : int 26 37 32 24 34 28 24 32 28 27 ...
## $ read : num 1.9 2.5 2.2 1.7 2.2 1.9 1.8 2.3 2.1 2.1 ...
## $ edutv : num 3 1.75 2.75 2.75 2.25 1.25 2 2.25 1 2.25 ...
## $ cartoons : num 2 3.25 2.5 2.25 2.5 3.75 3 2.5 4 2.75 ...
## $ fatheriq_cat: chr "at or above average" "at or above average" "at or above average" "below average" ...
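Reading the row count off the str() header works, but the count can also be computed directly. A minimal sketch: nrow() on the filtered data, and the base R equivalent of summing a logical vector (each TRUE counts as 1).

```r
library(openintro)  # gifted data
library(dplyr)

data(gifted)

# Number of rows left after filtering
n_above <- gifted %>%
  filter(score > 100) %>%
  nrow()
n_above

# Base R equivalent: count the TRUE values of the condition
n_above_base <- sum(gifted$score > 100, na.rm = TRUE)
n_above_base
```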