Set up

This analysis uses the gifted data set from the openintro package.

# Load packages
library(openintro) # for the gifted data
library(dplyr)     # for dplyr functions such as mutate() and filter()
library(ggplot2)   # for ggplot2 functions such as ggplot()

# Load data
data(gifted)

# View its structure
str(gifted)
## 'data.frame':    36 obs. of  8 variables:
##  $ score   : int  159 164 154 157 156 150 155 161 163 162 ...
##  $ fatheriq: int  115 117 115 113 110 113 118 117 111 122 ...
##  $ motheriq: int  117 113 118 131 109 109 119 120 128 120 ...
##  $ speak   : int  18 20 20 12 17 13 19 18 22 18 ...
##  $ count   : int  26 37 32 24 34 28 24 32 28 27 ...
##  $ read    : num  1.9 2.5 2.2 1.7 2.2 1.9 1.8 2.3 2.1 2.1 ...
##  $ edutv   : num  3 1.75 2.75 2.75 2.25 1.25 2 2.25 1 2.25 ...
##  $ cartoons: num  2 3.25 2.5 2.25 2.5 3.75 3 2.5 4 2.75 ...

Q1. Visualize with two variables

Create a scatterplot to investigate the relationship between:

  1. the analytical skills of young gifted children and
  2. the average number of hours per week the child’s mother or father reads to the child.

Describe the relationship between the two variables. The relationship appears positive: the more hours per week a child is read to, the higher the score the child tends to receive.

ggplot(data = gifted, aes(x = score, y = read)) +
  geom_point()
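
To put a rough number on the direction of the relationship, a correlation coefficient can complement the scatterplot. The sketch below is only illustrative. It uses base R and rebuilds the first six rows of gifted as printed by head(gifted) in Q2 (the name gifted_head is hypothetical), so its value will differ from what the full 36-row data would give.

```r
# First six rows of gifted, copied from the head(gifted) output in Q2.
# gifted_head is a hypothetical name used only for this sketch.
gifted_head <- data.frame(
  score = c(159, 164, 154, 157, 156, 150),
  read  = c(1.9, 2.5, 2.2, 1.7, 2.2, 1.9)
)

# Pearson correlation between analytical score and weekly reading hours
r_head <- cor(gifted_head$score, gifted_head$read)
r_head
```

On the full data, the analogous call would be cor(gifted$score, gifted$read).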

Q2. Discretize

Suppose that you want to investigate whether the relationship found above varies by mother’s IQ. Create a new categorical variable, motheriq_cat, and assign it “below average” or “at or above average”.

avg_motheriq <- mean(gifted$motheriq, na.rm = TRUE)
avg_motheriq
## [1] 118.1667

gifted <- gifted %>%
  mutate(motheriq_cat = ifelse(motheriq < avg_motheriq, "below average", "at or above average"))
head(gifted)
##   score fatheriq motheriq speak count read edutv cartoons
## 1   159      115      117    18    26  1.9  3.00     2.00
## 2   164      117      113    20    37  2.5  1.75     3.25
## 3   154      115      118    20    32  2.2  2.75     2.50
## 4   157      113      131    12    24  1.7  2.75     2.25
## 5   156      110      109    17    34  2.2  2.25     2.50
## 6   150      113      109    13    28  1.9  1.25     3.75
##          motheriq_cat
## 1       below average
## 2       below average
## 3       below average
## 4 at or above average
## 5       below average
## 6       below average
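
A quick way to check the new variable is to tabulate it. The sketch below assumes only base R and reuses the six motheriq values shown in the head(gifted) output, together with the rounded mean printed above. The name gifted_head is hypothetical, and the counts describe these six rows only, not the full data.

```r
# motheriq values for the first six rows, copied from the output above.
# gifted_head is a hypothetical name used only for this sketch.
gifted_head <- data.frame(motheriq = c(117, 113, 118, 131, 109, 109))

# Full-data mean as printed above (rounded to four decimals)
avg_motheriq <- 118.1667

# Same rule as in the mutate() call: strictly below the mean is "below average"
gifted_head$motheriq_cat <- ifelse(gifted_head$motheriq < avg_motheriq,
                                   "below average",
                                   "at or above average")

# Count how many rows fall in each category
cat_counts <- table(gifted_head$motheriq_cat)
cat_counts
```

On the full data, table(gifted$motheriq_cat) would give the group sizes in the same way.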

Q3. Visualize with three variables

Add the third variable, motheriq_cat, to the scatterplot you created in Q1. Does the relationship you found in Q1 vary by mother’s IQ? Yes, it appears to: children whose mothers have an at or above average IQ tend to be read to more.

ggplot(data = gifted, aes(x = score, y = read, color = motheriq_cat)) +
  geom_point(show.legend = FALSE) +
  facet_wrap(~ motheriq_cat)

Q4. Filter

You are only interested in gifted children with analytical skills greater than 150. Filter the data. How many such children are there? There are 35.

gifted_above <- gifted %>%
  filter(score > 150)

str(gifted_above)
## 'data.frame':    35 obs. of  9 variables:
##  $ score       : int  159 164 154 157 156 155 161 163 162 154 ...
##  $ fatheriq    : int  115 117 115 113 110 118 117 111 122 111 ...
##  $ motheriq    : int  117 113 118 131 109 119 120 128 120 117 ...
##  $ speak       : int  18 20 20 12 17 19 18 22 18 19 ...
##  $ count       : int  26 37 32 24 34 24 32 28 27 32 ...
##  $ read        : num  1.9 2.5 2.2 1.7 2.2 1.8 2.3 2.1 2.1 2.2 ...
##  $ edutv       : num  3 1.75 2.75 2.75 2.25 2 2.25 1 2.25 1.75 ...
##  $ cartoons    : num  2 3.25 2.5 2.25 2.5 3 2.5 4 2.75 3.75 ...
##  $ motheriq_cat: chr  "below average" "below average" "below average" "at or above average" ...
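
The count can also be read off directly instead of from the str() header. The sketch below assumes only base R and uses the six score values shown in the head(gifted) output of Q2. The names gifted_head and above_head are hypothetical. Because the comparison is strict, the child with a score of exactly 150 is excluded.

```r
# score values for the first six rows, copied from the head(gifted) output.
# gifted_head and above_head are hypothetical names used only for this sketch.
gifted_head <- data.frame(score = c(159, 164, 154, 157, 156, 150))

# Keep rows with score strictly greater than 150 (base R analogue of filter())
above_head <- gifted_head[gifted_head$score > 150, , drop = FALSE]

# Number of rows that pass the filter
n_above <- nrow(above_head)
n_above

# The same count without building the subset
sum(gifted_head$score > 150)
```

On the full data, nrow(gifted_above) would return the number of children directly.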