Set up

Replace the births data set with gifted. Definitions of the variables in the gifted data set can be found in the openintro package manual.

# Load packages
library(openintro) # for the gifted data set
library(dplyr)     # for dplyr functions such as mutate() and filter()
library(ggplot2)   # for ggplot2 functions such as ggplot()

# Load data
data(gifted)

# View its structure
str(gifted)
## 'data.frame':    36 obs. of  8 variables:
##  $ score   : int  159 164 154 157 156 150 155 161 163 162 ...
##  $ fatheriq: int  115 117 115 113 110 113 118 117 111 122 ...
##  $ motheriq: int  117 113 118 131 109 109 119 120 128 120 ...
##  $ speak   : int  18 20 20 12 17 13 19 18 22 18 ...
##  $ count   : int  26 37 32 24 34 28 24 32 28 27 ...
##  $ read    : num  1.9 2.5 2.2 1.7 2.2 1.9 1.8 2.3 2.1 2.1 ...
##  $ edutv   : num  3 1.75 2.75 2.75 2.25 1.25 2 2.25 1 2.25 ...
##  $ cartoons: num  2 3.25 2.5 2.25 2.5 3.75 3 2.5 4 2.75 ...
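Q2 below calls mean() with na.rm = TRUE. As a minimal sketch that is not part of the original exercise, one way to check whether that argument matters for this data set is to count the missing values in each column with base R's is.na() and colSums(). The object name na_counts is only illustrative. A zero for every column would mean na.rm = TRUE has no effect here.

```r
library(openintro) # provides the gifted data set
data(gifted)

# Illustrative object name: number of NA values in each column of gifted
na_counts <- colSums(is.na(gifted))
na_counts
```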

Q1. Visualize with two variables

Create a scatterplot to investigate the relationship between:

  1. the analytical skills of young gifted children and
  2. the average number of hours per week the child’s mother or father reads to the child.

Describe the relationship between the two variables.

According to the scatterplot, children whose parents read to them more tend to score higher on the test of analytical skills.

ggplot(data = gifted, aes(x = score, y = read)) +
  geom_point()
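The description above is read off the plot. To put a number on it, here is a minimal sketch, added here rather than taken from the original answer, that computes Pearson's correlation between read and score with base R's cor() and runs cor.test() for a confidence interval and p-value. The object names read_score_cor and read_score_test are illustrative. A positive coefficient would be consistent with the pattern described above; with only 36 children, the confidence interval is worth reading alongside the point estimate.

```r
library(openintro) # provides the gifted data set
data(gifted)

# Illustrative name: Pearson correlation between hours read per week and score
read_score_cor <- cor(gifted$read, gifted$score)
read_score_cor

# Same correlation with a 95% confidence interval and a test of zero correlation
read_score_test <- cor.test(gifted$read, gifted$score)
read_score_test
```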

Q2. Discretize

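Suppose that you want to investigate whether the relationship found above varies by father’s IQ. Create a new categorical variable, fatheriq_cat, that takes the value “below average” or “at or above average”.

Here, “average” is taken to be the mean father’s IQ in the data set: a child is labeled “below average” when fatheriq is below that mean, and “at or above average” otherwise.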

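# Mean father's IQ, used as the cutoff for "average"
fatheriq_mean <- mean(gifted$fatheriq, na.rm = TRUE)

# Label each child by whether the father's IQ falls below the mean
gifted <- gifted %>%
  mutate(fatheriq_cat = ifelse(fatheriq < fatheriq_mean, "below average", "at or above average"))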
head(gifted)
##   score fatheriq motheriq speak count read edutv cartoons
## 1   159      115      117    18    26  1.9  3.00     2.00
## 2   164      117      113    20    37  2.5  1.75     3.25
## 3   154      115      118    20    32  2.2  2.75     2.50
## 4   157      113      131    12    24  1.7  2.75     2.25
## 5   156      110      109    17    34  2.2  2.25     2.50
## 6   150      113      109    13    28  1.9  1.25     3.75
##          fatheriq_cat
## 1 at or above average
## 2 at or above average
## 3 at or above average
## 4       below average
## 5       below average
## 6       below average
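To check how many children land in each category, a sketch using dplyr's count() follows. Because the cutoff is the mean rather than the median, the two groups need not be the same size. The block rebuilds fatheriq_cat the same way as above so that it runs on its own; fatheriq_mean and fatheriq_counts are illustrative names.

```r
library(openintro) # provides the gifted data set
library(dplyr)     # for mutate() and count()
data(gifted)

# Rebuild fatheriq_cat with the mean father's IQ as the cutoff (illustrative names)
fatheriq_mean <- mean(gifted$fatheriq, na.rm = TRUE)
gifted <- gifted %>%
  mutate(fatheriq_cat = ifelse(fatheriq < fatheriq_mean, "below average", "at or above average"))

# Number of children in each father's IQ category
fatheriq_counts <- gifted %>% count(fatheriq_cat)
fatheriq_counts
```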

Q3. Visualize with three variables

Add the third variable, fatheriq_cat, to the scatterplot you created in Q1. Does the relationship you found in Q1 vary by father’s IQ?

The relationship found in Q1 does not appear to vary by father’s IQ.

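# Color and facet by father's IQ category; the legend is hidden because
# the facet labels already identify each group
ggplot(data = gifted, aes(x = score, y = read, color = fatheriq_cat)) +
  geom_point(show.legend = FALSE) +
  facet_wrap(~ fatheriq_cat)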
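To back up the visual comparison with numbers, one option is to compute the correlation between read and score separately within each fatheriq_cat group, using dplyr's group_by() and summarise(). This is a sketch added for illustration, not part of the original answer, and cor_by_group is an illustrative name. With 36 children split into two groups, each group-level correlation rests on few points, so broadly similar values, rather than identical ones, would be consistent with the conclusion above.

```r
library(openintro) # provides the gifted data set
library(dplyr)     # for mutate(), group_by(), summarise()
data(gifted)

# Rebuild fatheriq_cat with the mean father's IQ as the cutoff
fatheriq_mean <- mean(gifted$fatheriq, na.rm = TRUE)
gifted <- gifted %>%
  mutate(fatheriq_cat = ifelse(fatheriq < fatheriq_mean, "below average", "at or above average"))

# Group size and read/score correlation within each father's IQ group
cor_by_group <- gifted %>%
  group_by(fatheriq_cat) %>%
  summarise(n = n(), read_score_cor = cor(read, score))
cor_by_group
```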

Q4. Filter

You are only interested in gifted children with analytical skills scores greater than 100. Filter the data. How many such children are there?

There are 36 children with analytical skills scores greater than 100. That is every child in the data set, so this filter removes no one.

gifted_above <- gifted %>%
  filter(score > 100)

str(gifted_above)
## 'data.frame':    36 obs. of  9 variables:
##  $ score       : int  159 164 154 157 156 150 155 161 163 162 ...
##  $ fatheriq    : int  115 117 115 113 110 113 118 117 111 122 ...
##  $ motheriq    : int  117 113 118 131 109 109 119 120 128 120 ...
##  $ speak       : int  18 20 20 12 17 13 19 18 22 18 ...
##  $ count       : int  26 37 32 24 34 28 24 32 28 27 ...
##  $ read        : num  1.9 2.5 2.2 1.7 2.2 1.9 1.8 2.3 2.1 2.1 ...
##  $ edutv       : num  3 1.75 2.75 2.75 2.25 1.25 2 2.25 1 2.25 ...
##  $ cartoons    : num  2 3.25 2.5 2.25 2.5 3.75 3 2.5 4 2.75 ...
##  $ fatheriq_cat: chr  "at or above average" "at or above average" "at or above average" "below average" ...
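Since gifted_above has the same 36 rows as gifted, the cutoff of 100 excludes no one. A short base R sketch can confirm this directly: sum() over the logical vector score > 100 counts the qualifying children, and range() shows the smallest and largest scores. The name n_above_100 is illustrative.

```r
library(openintro) # provides the gifted data set
data(gifted)

# TRUE counts as 1 and FALSE as 0, so sum() gives the number of children above 100
n_above_100 <- sum(gifted$score > 100)
n_above_100

# Smallest and largest analytical skills scores in the data
range(gifted$score)
```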