Exercise 1: Math score and reading score

Load data file and set random seed

set.seed(123)
library(Ecdat)

## Loading required package: Ecfun

## 
## Attaching package: 'Ecfun'

## The following object is masked from 'package:base':
## 
##     sign

## 
## Attaching package: 'Ecdat'

## The following object is masked from 'package:datasets':
## 
##     Orange

dta <- Ecdat::Caschool
head(dta)

##   distcod  county                        district grspan enrltot teachers
## 1   75119 Alameda              Sunol Glen Unified  KK-08     195    10.90
## 2   61499   Butte            Manzanita Elementary  KK-08     240    11.15
## 3   61549   Butte     Thermalito Union Elementary  KK-08    1550    82.90
## 4   61457   Butte Golden Feather Union Elementary  KK-08     243    14.00
## 5   61523   Butte        Palermo Union Elementary  KK-08    1335    71.50
## 6   62042  Fresno         Burrel Union Elementary  KK-08     137     6.40
##   calwpct mealpct computer testscr   compstu  expnstu      str    avginc
## 1  0.5102  2.0408       67  690.80 0.3435898 6384.911 17.88991 22.690001
## 2 15.4167 47.9167      101  661.20 0.4208333 5099.381 21.52466  9.824000
## 3 55.0323 76.3226      169  643.60 0.1090323 5501.955 18.69723  8.978000
## 4 36.4754 77.0492       85  647.70 0.3497942 7101.831 17.35714  8.978000
## 5 33.1086 78.4270      171  640.85 0.1280899 5235.988 18.67133  9.080333
## 6 12.3188 86.9565       25  605.55 0.1824818 5580.147 21.40625 10.415000
##       elpct readscr mathscr
## 1  0.000000   691.6   690.0
## 2  4.583333   660.5   661.9
## 3 30.000002   636.3   650.9
## 4  0.000000   651.9   643.5
## 5 13.857677   641.8   639.9
## 6 12.408759   605.7   605.4

library(tidyverse)

## ─ Attaching packages ────────────────────────── tidyverse 1.3.0 ─

## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.4
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0

## ─ Conflicts ─────────────────────────── tidyverse_conflicts() ─
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

dta %>% group_by(county) %>% 
  sample_n(1)

## # A tibble: 45 x 17
## # Groups:   county [45]
##    distcod county district grspan enrltot teachers calwpct mealpct computer
##      <int> <fct>  <fct>    <fct>    <int>    <dbl>   <dbl>   <dbl>    <int>
##  1   75119 Alame… Sunol G… KK-08      195    10.9    0.510    2.04       67
##  2   61457 Butte  Golden … KK-08      243    14     36.5     77.0        85
##  3   61572 Calav… Mark Tw… KK-08      777    36.8   13.0     39.8       148
##  4   61762 Contr… Oakley … KK-08     4153   200.     7.53    23.4       241
##  5   61911 El Do… Latrobe… KK-08      145     9.30   6.21     0          27
##  6   62331 Fresno Orange … KK-08      379    19     32.2     93.1        35
##  7   62596 Glenn  Lake El… KK-08      129     5      9.30    50.4        10
##  8   62745 Humbo… Cutten … KK-06      515    27.4   16.1     26.2        58
##  9   63172 Imper… Magnoli… KK-08      108     5      3.67    32.1        25
## 10   63255 Inyo   Bishop … KK-08     1510    73.3   10.3     37.1       141
## # … with 35 more rows, and 8 more variables: testscr <dbl>, compstu <dbl>,
## #   expnstu <dbl>, str <dbl>, avginc <dbl>, elpct <dbl>, readscr <dbl>,
## #   mathscr <dbl>
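
The group_by() and sample_n(1) call above keeps one randomly chosen district per county. The same idea can be sketched in base R. The toy data frame and its values below are hypothetical and only keep the example small; they are not taken from Caschool.

```r
set.seed(123)

# Hypothetical toy data: three counties with a few districts each
toy <- data.frame(
  county   = c("A", "A", "A", "B", "B", "C"),
  district = c("a1", "a2", "a3", "b1", "b2", "c1"),
  stringsAsFactors = FALSE
)

# Split the row indices by county, then pick one index from each group.
# Indexing idx[...] avoids sample()'s special case for a length-one numeric vector.
rows_by_county <- split(seq_len(nrow(toy)), toy$county)
picked <- vapply(rows_by_county,
                 function(idx) idx[sample.int(length(idx), 1)],
                 integer(1))

# One row per county
one_per_county <- toy[picked, ]
one_per_county
```

Which district is picked within a county depends on the seed, but the result always has exactly one row per county.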

Draw a scatter plot of readscr against mathscr with a regression line

library(lattice)
lattice::xyplot

## function (x, data, ...) 
## UseMethod("xyplot")
## <bytecode: 0x7fca97adba30>
## <environment: namespace:lattice>

xyplot(readscr ~ mathscr, type=c("p","g","r"), data = dta)

Exercise 2: 133 class-level 95%-confidence intervals for language test score

Load data file

library(MASS)

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

## The following object is masked from 'package:Ecdat':
## 
##     SP500

dta <- MASS::nlschools
head(dta)

##   lang   IQ class GS SES COMB
## 1   46 15.0   180 29  23    0
## 2   45 14.5   180 29  10    0
## 3   33  9.5   180 29  15    0
## 4   46 11.0   180 29  23    0
## 5   20  8.0   180 29  10    0
## 6   30  9.5   180 29  10    0

Compute the 95% CIs for language and create columns

dta_n <- dta %>%
  # relabel the class codes as consecutive IDs 1, 2, ..., 133
  mutate(classID = factor(class, levels = levels(class), labels = seq_along(levels(class)))) %>%
  group_by(classID) %>% 
  # bounds are the class mean plus or minus 1.96 class standard deviations
  summarize(language_mean = mean(lang), 
            language_lb = language_mean - 1.96*sd(lang), 
            language_ub = language_mean + 1.96*sd(lang)) %>%
  as.data.frame()

tail(dta_n,3)

##     classID language_mean language_lb language_ub
## 131     131      38.09091   26.953191    49.22863
## 132     132      29.30000    3.264031    55.33597
## 133     133      28.42857   14.762009    42.09513
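
Note that the bounds above use the class standard deviation, so they describe the spread of individual scores within a class. A confidence interval for the class mean would normally use the standard error, sd / sqrt(n). A minimal base R sketch of that version follows; the toy scores and the helper name ci_mean are hypothetical, and 1.96 is the normal approximation (for small classes a t quantile from qt() may be more appropriate).

```r
# Hypothetical toy scores for two classes (not the nlschools data)
toy <- data.frame(
  classID = c(1, 1, 1, 1, 2, 2, 2, 2),
  lang    = c(30, 34, 38, 42, 20, 30, 40, 50)
)

# Hypothetical helper: mean with bounds based on the standard error
ci_mean <- function(x, z = 1.96) {
  se <- sd(x) / sqrt(length(x))   # standard error of the mean
  c(mean = mean(x), lb = mean(x) - z * se, ub = mean(x) + z * se)
}

# One row per class: mean, lower bound, upper bound
ci_by_class <- t(sapply(split(toy$lang, toy$classID), ci_mean))
ci_by_class
```

Because the standard error shrinks with class size, these intervals are narrower than the mean plus or minus 1.96 standard deviations by a factor of sqrt(n).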

Exercise 3: Income and education by prestige level and type of occupation

Load data file

library(car)

## Loading required package: carData

## 
## Attaching package: 'carData'

## The following object is masked from 'package:Ecdat':
## 
##     Mroz

## 
## Attaching package: 'car'

## The following object is masked from 'package:dplyr':
## 
##     recode

## The following object is masked from 'package:purrr':
## 
##     some

data('Prestige')
head(Prestige)

##                     education income women prestige census type
## gov.administrators      13.11  12351 11.16     68.8   1113 prof
## general.managers        12.26  25879  4.02     69.1   1130 prof
## accountants             12.77   9271 15.70     63.4   1171 prof
## purchasing.officers     11.42   8865  9.11     56.8   1175 prof
## chemists                14.62   8403 11.68     73.5   2111 prof
## physicists              15.64  11030  5.13     77.6   2113 prof

First, find the median prestige score for each of the three types of occupation.

Then, use the median of each type to split its occupations into two prestige levels, High and Low. Summarize the relationship between income and education for each category obtained by crossing prestige level with type of occupation.

library(lattice)
lattice::xyplot

## function (x, data, ...) 
## UseMethod("xyplot")
## <bytecode: 0x7fca97adba30>
## <environment: namespace:lattice>

Prestige %>% 
  group_by(type) %>% 
  # median prestige within each type; scores equal to the median count as High
  mutate(pt_med = median(prestige),
         pt_type = case_when(prestige >= pt_med ~ "High",
                             prestige < pt_med ~ "Low")) %>%
  xyplot(income ~ education | type, groups = pt_type, data = ., type = c("g","p","r"))

## Warning: Factor `type` contains implicit NA, consider using
## `forcats::fct_explicit_na`

## Warning: Factor `type` contains implicit NA, consider using
## `forcats::fct_explicit_na`
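
The two steps described for this exercise (a median per type, then a High/Low split relative to that median) can also be checked numerically. The sketch below uses base R on a hypothetical toy data frame with made-up values, not the Prestige data. Scores equal to the median are put in High here, which is one possible convention.

```r
# Hypothetical toy data standing in for Prestige (values are made up)
toy <- data.frame(
  type     = c("bc", "bc", "bc", "prof", "prof", "prof", "wc", "wc", "wc"),
  prestige = c(20, 30, 40, 60, 70, 80, 35, 45, 55)
)

# Step 1: median prestige score within each type
med_by_type <- tapply(toy$prestige, toy$type, median)
med_by_type

# Step 2: compare each score with the median of its own type
toy$pt_med  <- ave(toy$prestige, toy$type, FUN = median)
toy$pt_type <- ifelse(toy$prestige >= toy$pt_med, "High", "Low")

# Cross-tabulate type of occupation with prestige level
table(toy$type, toy$pt_type)
```

With an odd number of rows per type, the median is itself an observed score, so the tie rule decides which level that occupation falls into.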

Exercise 4: Nobel Laureates

dta_1 <- read.table("/Users/haolunfu/Documents/資料管理/week5/nobel_countries.txt", header = TRUE)
head(dta_1)

##   Country Year
## 1  France 2014
## 2      UK 1950
## 3      UK 2017
## 4      US 2016
## 5  Canada 2013
## 6   China 2012

dta_2 <- read.table("/Users/haolunfu/Documents/資料管理/week5/nobel_winners.txt", header = TRUE)
head(dta_2)

##                Name Gender Year
## 1   Patrick Modiano   Male 2014
## 2 Bertrand  Russell   Male 1950
## 3    Kazuo Ishiguro   Male 2017
## 4        Bob  Dylan   Male 2016
## 5      Alice  Munro Female 2013
## 6            Mo Yan   Male 2012

full_join(dta_1, dta_2) %>% arrange(desc(Year))

## Joining, by = "Year"

##   Country Year              Name Gender
## 1      UK 2017    Kazuo Ishiguro   Male
## 2      US 2016        Bob  Dylan   Male
## 3  Russia 2015              <NA>   <NA>
## 4  France 2014   Patrick Modiano   Male
## 5  Canada 2013      Alice  Munro Female
## 6   China 2012            Mo Yan   Male
## 7  Sweden 2011              <NA>   <NA>
## 8      UK 1950 Bertrand  Russell   Male
## 9    <NA> 1938        Pearl Buck Female

First, merge the nobel_countries and nobel_winners data sets by Year.

Then, sort the rows by Year in descending order.
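
The same full join can be sketched in base R with merge(). The two data frames below are rebuilt by hand from the rows printed above rather than read from the files, so the row order within each file is an assumption.

```r
# Rows copied from the printed output above (row order within each file is assumed)
countries <- data.frame(
  Country = c("France", "UK", "UK", "US", "Canada", "China", "Russia", "Sweden"),
  Year    = c(2014, 1950, 2017, 2016, 2013, 2012, 2015, 2011)
)
winners <- data.frame(
  Name   = c("Patrick Modiano", "Bertrand  Russell", "Kazuo Ishiguro",
             "Bob  Dylan", "Alice  Munro", "Mo Yan", "Pearl Buck"),
  Gender = c("Male", "Male", "Male", "Male", "Female", "Male", "Female"),
  Year   = c(2014, 1950, 2017, 2016, 2013, 2012, 1938)
)

# all = TRUE keeps years that appear in only one table (a full join)
merged <- merge(countries, winners, by = "Year", all = TRUE)

# Sort by Year, most recent first
merged <- merged[order(merged$Year, decreasing = TRUE), ]
merged
```

Naming the key with by = "Year" makes the join column explicit instead of relying on the shared column name. Unlike full_join(), merge() places the key column first.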

Exercise 5

Load data file

fL <- "http://www.amstat.org/publications/jse/datasets/sat.dat.txt"
dta <- read.table(fL, row.names=1)
head(dta)

##               V2   V3     V4 V5  V6  V7   V8
## Alabama    4.405 17.2 31.144  8 491 538 1029
## Alaska     8.963 17.6 47.951 47 445 489  934
## Arizona    4.778 19.3 32.175 27 448 496  944
## Arkansas   4.459 17.1 28.934  6 482 523 1005
## California 4.992 24.0 41.078 45 417 485  902
## Colorado   5.443 18.4 34.571 29 462 518  980

Rename the variables and add a Region column

names(dta) <- c("Spending", "PTR", "Salary", "PE", "Verbal", "Math", "SAT")
dta$Region <- state.division
head(dta)

##            Spending  PTR Salary PE Verbal Math  SAT             Region
## Alabama       4.405 17.2 31.144  8    491  538 1029 East South Central
## Alaska        8.963 17.6 47.951 47    445  489  934            Pacific
## Arizona       4.778 19.3 32.175 27    448  496  944           Mountain
## Arkansas      4.459 17.1 28.934  6    482  523 1005 West South Central
## California    4.992 24.0 41.078 45    417  485  902            Pacific
## Colorado      5.443 18.4 34.571 29    462  518  980           Mountain

Draw a scatter plot of SAT against Salary for each Region to see how the relationship between SAT scores and Salary varies by Region.

lattice::xyplot

## function (x, data, ...) 
## UseMethod("xyplot")
## <bytecode: 0x7fca97adba30>
## <environment: namespace:lattice>

dta %>% xyplot(SAT ~ Salary | Region, type = c("g","r","p"), data = .)

# The fitted slope of SAT on Salary is negative in West North Central, Mountain, New England, East South Central, and West South Central.
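
Reading slope signs off the panels can be cross-checked by fitting the regression within each region. The sketch below shows the pattern on a hypothetical toy data frame with two made-up regions; it does not reproduce the SAT data or the result stated above.

```r
# Hypothetical toy data: two made-up regions with three states each
toy <- data.frame(
  Region = c("A", "A", "A", "B", "B", "B"),
  Salary = c(30, 35, 40, 30, 35, 40),
  SAT    = c(1000, 980, 960, 900, 920, 940)
)

# Fit SAT ~ Salary separately within each region and keep the slope
slopes <- sapply(split(toy, toy$Region), function(d) {
  coef(lm(SAT ~ Salary, data = d))[["Salary"]]
})
slopes

# Regions whose fitted slope is negative
names(slopes)[slopes < 0]
```

Applied to the real data, the same call would use dta and its Region column; regions with only one or two states give slopes that should be read with caution.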

Week 5 Homework

Hao-Lun Fu

2020-04-13

Exercise 6

See the following link: https://rpubs.com/haolunfu/598374