Homework 1: Split the plot region to draw histograms in the margins of a scatter diagram, using the Galton data set from the HistData package.

Load data file

Create a 2 × 2 layout matrix and pass it to the layout() function to set the column widths and row heights.
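A minimal base-R sketch of such a layout, assuming the scatter plot sits bottom-left with one histogram above it and one to its right. The 3:1 width and height ratios are an illustrative choice, not taken from the original code. `layout.show()` outlines and numbers each panel, which is a quick way to check the arrangement before drawing anything.

```r
# Layout matrix: each number is the order in which a panel is drawn; 0 = empty cell.
#   top-left    (2): histogram of parent | top-right    (0): unused
#   bottom-left (1): sunflower plot      | bottom-right (3): histogram of child
m <- matrix(c(2, 0,
              1, 3), nrow = 2, byrow = TRUE)

# Wider left column and taller bottom row give the main plot most of the space
# (the 3:1 ratios are an assumption for illustration).
n_panels <- layout(m, widths = c(3, 1), heights = c(1, 3))

# Draw the panel outlines with their numbers to verify the design.
layout.show(n_panels)
```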

Prepare the histograms of parent and child, respectively.

Set up the plot margins first, then draw a sunflower plot of child height against parent height.

Plot the histograms of parent (along the x axis) and child (along the y axis) in the margins.
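Putting these steps together, here is a sketch in base R graphics that closely follows the marginal-histogram example on R's own `?layout` help page. The helper name `scatter_hist()` and the margin values are hypothetical. The sketch assumes the HistData package is installed for the Galton data and skips that call otherwise. Setting the scatter plot's limits to the histogram break range keeps the bars lined up with the bins on the plot axes.

```r
# Hypothetical helper: sunflower plot with marginal histograms (base graphics).
scatter_hist <- function(x, y, xlab = "parent", ylab = "child") {
  # Compute the histograms without drawing them.
  xh <- hist(x, plot = FALSE)
  yh <- hist(y, plot = FALSE)
  top <- max(c(xh$counts, yh$counts))    # common count scale for both margins

  old_par <- par(no.readonly = TRUE)
  on.exit(par(old_par))                  # restore graphics settings afterwards

  layout(matrix(c(2, 0, 1, 3), nrow = 2, byrow = TRUE),
         widths = c(3, 1), heights = c(1, 3))

  # Panel 1: main plot; sunflowerplot() draws extra "petals" for tied points.
  par(mar = c(4, 4, 1, 1))
  sunflowerplot(x, y, xlim = range(xh$breaks), ylim = range(yh$breaks),
                xlab = xlab, ylab = ylab)

  # Panel 2: histogram of x above the main plot (same left/right margins).
  par(mar = c(0, 4, 1, 1))
  barplot(xh$counts, axes = FALSE, ylim = c(0, top), space = 0)

  # Panel 3: histogram of y to the right, drawn horizontally (same bottom/top margins).
  par(mar = c(4, 0, 1, 1))
  barplot(yh$counts, axes = FALSE, xlim = c(0, top), space = 0, horiz = TRUE)

  invisible(list(x = xh, y = yh))
}

# Assumption: HistData is installed; otherwise this call is skipped.
if (requireNamespace("HistData", quietly = TRUE)) {
  data("Galton", package = "HistData")
  scatter_hist(Galton$parent, Galton$child)
}
```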

Homework 2: Age and Suicide by Country

Load data file

##         A25.34 A35.44 A45.54 A55.64 A65.74
## Canada      22     27     31     34     24
## Israel       9     19     10     14     27
## Japan       22     19     21     31     49
## Austria     29     40     52     53     69
## France      16     25     36     47     56
## Germany     28     35     41     49     52

Transform the data frame from wide to long format
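The homework does this step with the tidyverse. As a package-free alternative, here is a base-R sketch of the same reshaping. It assumes the countries are stored as row names (as the printed table suggests) and uses only the six rows shown above, so the full file may contain more countries. Sorting by country and then by age group gives the same first six rows as the long table printed below.

```r
# The six rows printed above, with countries as row names (the full file may have more).
dta <- data.frame(
  A25.34 = c(22,  9, 22, 29, 16, 28),
  A35.44 = c(27, 19, 19, 40, 25, 35),
  A45.54 = c(31, 10, 21, 52, 36, 41),
  A55.64 = c(34, 14, 31, 53, 47, 49),
  A65.74 = c(24, 27, 49, 69, 56, 52),
  row.names = c("Canada", "Israel", "Japan", "Austria", "France", "Germany")
)

# Wide to long: unlist() stacks the columns one after another, so the country
# labels repeat once per column and each age label repeats once per country.
dta_long <- data.frame(
  Country = rep(rownames(dta), times = ncol(dta)),
  Age     = rep(names(dta), each = nrow(dta)),
  Suicide = unlist(dta, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Sort by country, then by age group, and renumber the rows.
dta_long <- dta_long[order(dta_long$Country, dta_long$Age), ]
rownames(dta_long) <- NULL
head(dta_long)
```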

##   Country    Age Suicide
## 1 Austria A25.34      29
## 2 Austria A35.44      40
## 3 Austria A45.54      52
## 4 Austria A55.64      53
## 5 Austria A65.74      69
## 6  Canada A25.34      22

Relabel the age groups (e.g., A25.34 becomes 25 to 34).

Note: I did not fully understand how this works; the code chunk is borrowed from my classmate Zhe Sun. Thanks to him!
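Here is one way to do the relabelling in base R with a regular expression. It is a sketch and not necessarily the approach used in the borrowed code, and the helper name `relabel_age()` is hypothetical. The pattern captures the two numbers on either side of the dot and rebuilds the label from them. If Age is stored as a factor, convert it with `as.character()` first.

```r
# Hypothetical helper: "A25.34" -> "25 to 34".
relabel_age <- function(x) {
  # ^A          the leading letter A
  # ([0-9]+)    first number, captured as \\1
  # \\.         a literal dot
  # ([0-9]+)$   second number, captured as \\2
  sub("^A([0-9]+)\\.([0-9]+)$", "\\1 to \\2", x)
}

relabel_age(c("A25.34", "A65.74"))
```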

##   Country      Age Suicide
## 1 Austria 25 to 34      29
## 2 Austria 35 to 44      40
## 3 Austria 45 to 54      52
## 4 Austria 55 to 64      53
## 5 Austria 65 to 74      69
## 6  Canada 25 to 34      22

Draw a boxplot of suicide by age group.
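Here is a minimal base-R sketch of the boxplot. It rebuilds the long data from the six countries printed earlier so it runs on its own, although the full data may contain more countries. The formula `Suicide ~ Age` draws one box per age group.

```r
# Suicide values for the six printed countries, listed age band by age band
# (Canada, Israel, Japan, Austria, France, Germany within each band).
suicide <- c(22,  9, 22, 29, 16, 28,   # 25 to 34
             27, 19, 19, 40, 25, 35,   # 35 to 44
             31, 10, 21, 52, 36, 41,   # 45 to 54
             34, 14, 31, 53, 47, 49,   # 55 to 64
             24, 27, 49, 69, 56, 52)   # 65 to 74

dta_long <- data.frame(
  Age = rep(c("25 to 34", "35 to 44", "45 to 54", "55 to 64", "65 to 74"), each = 6),
  Suicide = suicide,
  stringsAsFactors = FALSE
)

# One box per age group; boxplot() also returns the computed statistics invisibly.
b <- boxplot(Suicide ~ Age, data = dta_long, xlab = "Age group", ylab = "Suicide")
```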

Homework 3: Histograms of IQ for the 5 Classes with over 30 Pupils

Load data file

## 'data.frame':    2287 obs. of  6 variables:
##  $ lang : int  46 45 33 46 20 30 30 57 36 36 ...
##  $ IQ   : num  15 14.5 9.5 11 8 9.5 9.5 13 9.5 11 ...
##  $ class: Factor w/ 133 levels "180","280","1082",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ GS   : int  29 29 29 29 29 29 29 29 29 29 ...
##  $ SES  : int  23 10 15 23 10 10 23 10 13 15 ...
##  $ COMB : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...

Find the classes with more than 30 pupils.
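The output below is a tibble, so the homework presumably counts with a tidyverse pipeline. An equivalent base-R sketch uses `table()`, which counts rows per class; since each row is one pupil, this gives pupils per class. The helper `big_classes()` and the toy data frame are hypothetical illustrations, not the homework data.

```r
# Hypothetical helper: names of the classes with more than `min_n` pupils.
big_classes <- function(df, min_n = 30) {
  counts <- table(df$class)        # one row per pupil, so this counts pupils per class
  names(counts)[counts > min_n]
}

# Toy data (not the homework data): classes of 31, 5 and 40 pupils.
toy <- data.frame(class = factor(rep(c("a", "b", "c"), times = c(31, 5, 40))))
big_classes(toy)
```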

## # A tibble: 5 x 2
##   class count
##   <fct> <int>
## 1 5480     31
## 2 15580    33
## 3 15980    31
## 4 16180    31
## 5 18380    31

Select the classes with more than 30 pupils and split them into a list of 5 data frames, one per class, to prepare for plotting.
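Here is a base-R sketch of the selection and split, again with a hypothetical helper and toy data. `droplevels()` matters here. After subsetting, class still carries all 133 factor levels, which is why the summary below lists 180 and (Other) with 0 pupils. Without it, `split()` would also return an empty data frame for every unused level.

```r
# Hypothetical helper: keep only the big classes, then one data frame per class.
split_big_classes <- function(df, min_n = 30) {
  counts <- table(df$class)
  keep <- names(counts)[counts > min_n]
  big <- droplevels(df[df$class %in% keep, , drop = FALSE])  # drop now-empty levels
  split(big, big$class)                                       # named list of data frames
}

# Toy data (not the homework data): classes of 31, 5 and 40 pupils.
toy <- data.frame(class = factor(rep(c("a", "b", "c"), times = c(31, 5, 40))),
                  IQ = rep(c(10, 12, 14), length.out = 76))
groups <- split_big_classes(toy)
sapply(groups, nrow)
```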

##       lang             IQ            class          GS             SES    
##  Min.   :17.00   Min.   : 6.00   15580  :33   Min.   :32.00   Min.   :10  
##  1st Qu.:40.00   1st Qu.:11.00   5480   :31   1st Qu.:32.00   1st Qu.:23  
##  Median :45.00   Median :12.00   15980  :31   Median :33.00   Median :33  
##  Mean   :43.89   Mean   :12.16   16180  :31   Mean   :33.41   Mean   :32  
##  3rd Qu.:50.00   3rd Qu.:13.00   18380  :31   3rd Qu.:34.00   3rd Qu.:40  
##  Max.   :58.00   Max.   :18.00   180    : 0   Max.   :36.00   Max.   :50  
##                                  (Other): 0                               
##  COMB   
##  0:157  
##  1:  0  
##         
##         
##         
##         
## 
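Once the classes are split into a list, you can draw one IQ histogram per class in a grid. This base-R sketch uses a hypothetical helper, `plot_iq_hists()`, and the same breaks in every panel so the classes are directly comparable. `n2mfrow()` picks a grid size for the number of panels (3 × 2 for five classes).

```r
# Hypothetical helper: one IQ histogram per list element, on shared bins.
plot_iq_hists <- function(groups) {
  all_iq <- unlist(lapply(groups, function(g) g$IQ))
  breaks <- pretty(range(all_iq), n = 10)          # common bins for every panel
  old_par <- par(mfrow = n2mfrow(length(groups)))  # grid sized to the number of classes
  on.exit(par(old_par))
  for (nm in names(groups)) {
    hist(groups[[nm]]$IQ, breaks = breaks, main = paste("Class", nm), xlab = "IQ")
  }
  invisible(length(groups))
}

# Toy data (not the homework data): two classes of four pupils each.
toy <- data.frame(class = rep(c("a", "b"), each = 4),
                  IQ = c(9, 11, 12, 13, 10, 12, 12, 15),
                  stringsAsFactors = FALSE)
n_drawn <- plot_iq_hists(split(toy, toy$class))
```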

Homework 4: SAT and GPA scores in colleges

##           College SAT_No GPA_No SAT_Yes GPA_Yes
## 1         Barnard   1210   3.08    1317    3.30
## 2    Northwestern   1243   3.10    1333    3.24
## 3         Bowdoin   1200   2.85    1312    3.12
## 4           Colby   1220   2.90    1280    3.04
## 5 Carnegie Mellon   1237   2.70    1308    2.94
## 6    Georgia Tech   1233   2.62    1287    2.80

Draw a scatter plot of the SAT and GPA scores.
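The homework does not say which variable goes on which axis, so this is only one possible design, sketched in base R with the six colleges printed above. It puts SAT on the x axis and GPA on the y axis. Open circles show the _No columns, filled circles show the _Yes columns, and a grey segment links each college's two points.

```r
# The six colleges printed above (the full file may contain more rows).
dta <- data.frame(
  College = c("Barnard", "Northwestern", "Bowdoin", "Colby",
              "Carnegie Mellon", "Georgia Tech"),
  SAT_No  = c(1210, 1243, 1200, 1220, 1237, 1233),
  GPA_No  = c(3.08, 3.10, 2.85, 2.90, 2.70, 2.62),
  SAT_Yes = c(1317, 1333, 1312, 1280, 1308, 1287),
  GPA_Yes = c(3.30, 3.24, 3.12, 3.04, 2.94, 2.80),
  stringsAsFactors = FALSE
)

# Axis limits that cover both groups so no point is clipped.
xlim <- range(dta$SAT_No, dta$SAT_Yes)
ylim <- range(dta$GPA_No, dta$GPA_Yes)

plot(dta$SAT_No, dta$GPA_No, xlim = xlim, ylim = ylim, pch = 1,
     xlab = "SAT", ylab = "GPA")
points(dta$SAT_Yes, dta$GPA_Yes, pch = 16)
# One segment per college, from its "No" point to its "Yes" point.
segments(dta$SAT_No, dta$GPA_No, dta$SAT_Yes, dta$GPA_Yes, col = "grey50")
legend("topleft", legend = c("No", "Yes"), pch = c(1, 16), bty = "n")
```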

Homework 5: Free recall

Solution 1:

I don’t know why the lines (segments) extend to the wrong positions (e.g., from 10-2 to 15-2), even though I also tried a classmate’s code.
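A common cause of this symptom is drawing a single line through every row of a long data frame. The last point of one condition then gets joined to the first point of the next. The fix is to split the data by condition and call `lines()` once per group, so no segment crosses between conditions. This minimal base-R sketch assumes hypothetical columns `cond`, `pos` and `recall` and uses made-up placeholder values, not the free recall data. In ggplot2, the analogous fix is mapping the condition to the `group` aesthetic.

```r
# Hypothetical long-format data with placeholder values: one row per
# (condition, serial position).
recall_df <- data.frame(
  cond   = rep(c("10-2", "15-2"), times = c(10, 15)),
  pos    = c(1:10, 1:15),
  recall = c(seq(0.6, 0.9, length.out = 10), seq(0.5, 0.85, length.out = 15)),
  stringsAsFactors = FALSE
)

# Empty plot with axes that cover all the data.
plot(recall_df$pos, recall_df$recall, type = "n",
     xlab = "Serial position", ylab = "Proportion recalled")

# Draw each condition separately so lines never connect across conditions.
groups <- split(recall_df, recall_df$cond)
for (i in seq_along(groups)) {
  g <- groups[[i]]
  g <- g[order(g$pos), ]                  # connect points in serial-position order
  lines(g$pos, g$recall, type = "b", pch = i, lty = i)
}
legend("bottomright", legend = names(groups),
       pch = seq_along(groups), lty = seq_along(groups), bty = "n")
```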