Homework 1:

Load data file

##    id female  race    ses schtyp     prog read write math science socst
## 1  70   male white    low public  general   57    52   41      47    57
## 2 121 female white middle public vocation   68    59   53      63    61
## 3  86   male white   high public  general   44    33   54      58    31
## 4 141   male white   high public vocation   63    44   47      53    56
## 5 172   male white middle public academic   47    52   57      53    61
## 6 113   male white middle public academic   44    52   51      63    61

The science scores contain NA values, so the plain column mean for science is NA and another method was adopted

##    read   write    math science   socst 
##  52.230  52.775  52.645      NA  52.405
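A minimal sketch of why one NA turns the whole column mean into NA, and one common way around it. The original chunk is not shown, so this is not necessarily the method used here. The values are the math and science scores of the six rows printed above, with one science value replaced by NA purely for illustration.

```r
# Scores copied from the six rows shown above; the NA in position 3 is
# inserted here for illustration only and is not taken from the data file.
scores <- data.frame(
  math    = c(41, 53, 54, 47, 57, 51),
  science = c(47, 63, NA, 53, 53, 63)
)

# A single NA makes the mean of that column NA.
colMeans(scores)

# na.rm = TRUE drops missing values before averaging.
means <- colMeans(scores, na.rm = TRUE)
means
```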

Convert the data from wide to long format
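A sketch of the wide-to-long step using base R reshape(), so it runs without extra packages. The report itself loads the tidyverse, and its own call is not shown. The three rows are copied from the printed data. The column name Academic matches the later tables; the value column name score is an assumption.

```r
# First three rows of the printed data, test-score columns only.
wide <- data.frame(
  id      = c(70, 121, 86),
  read    = c(57, 68, 44),
  write   = c(52, 59, 33),
  math    = c(41, 53, 54),
  science = c(47, 63, 58),
  socst   = c(57, 61, 31)
)

subjects <- c("read", "write", "math", "science", "socst")

# One row per student-subject pair: "Academic" holds the subject name,
# "score" (an assumed name) holds the value.
long <- reshape(wide, direction = "long",
                varying = subjects, v.names = "score",
                timevar = "Academic", times = subjects, idvar = "id")
rownames(long) <- NULL
long
```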

Show the descriptive statistics (mean and standard error) of each score

## # A tibble: 5 x 3
##   Academic a_mean  a_se
##   <chr>     <dbl> <dbl>
## 1 math       52.6 0.672
## 2 read       52.1 0.733
## 3 science    51.9 0.702
## 4 socst      52.3 0.771
## 5 write      52.7 0.678
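A sketch of how a mean and standard error per subject can be computed from long-format data. The column names a_mean and a_se follow the table above. The helper se() and the column name score are assumptions, and only the math and read scores of the six printed rows are used, so the numbers will not match the full-data table.

```r
# Math and read scores of the six rows printed at the top of the report.
long <- data.frame(
  Academic = rep(c("math", "read"), each = 6),
  score    = c(41, 53, 54, 47, 57, 51,
               57, 68, 44, 63, 47, 44)
)

# Standard error of the mean, ignoring missing values (hypothetical helper).
se <- function(x) sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))

by_subject <- split(long$score, long$Academic)
desc <- data.frame(
  Academic  = names(by_subject),
  a_mean    = sapply(by_subject, mean, na.rm = TRUE),
  a_se      = sapply(by_subject, se),
  row.names = NULL
)
desc
```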

Conduct a one-way ANOVA

## 
## Error: id
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  1   4023    4023               
## 
## Error: id:Academic
##          Df Sum Sq Mean Sq
## Academic  4   60.5   15.12
## 
## Error: Within
##            Df Sum Sq Mean Sq F value Pr(>F)
## Academic    4    321   80.33   0.847  0.495
## Residuals 965  91524   94.84

The results revealed no significant difference among the five academic scores (F = 0.847, p = 0.495).
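The single residual degree of freedom in the "Error: id" stratum is what one would expect if id entered the model as a number rather than a factor. That is a guess, since the original call is not shown. Under that assumption, a repeated-measures layout could be sketched as follows, using only the six printed rows; id_f and score are assumed names.

```r
# The six rows printed at the top of the report, test-score columns only.
wide <- data.frame(
  id      = c(70, 121, 86, 141, 172, 113),
  read    = c(57, 68, 44, 63, 47, 44),
  write   = c(52, 59, 33, 44, 52, 52),
  math    = c(41, 53, 54, 47, 57, 51),
  science = c(47, 63, 58, 53, 53, 63),
  socst   = c(57, 61, 31, 56, 61, 61)
)
subjects <- c("read", "write", "math", "science", "socst")
long <- reshape(wide, direction = "long",
                varying = subjects, v.names = "score",
                timevar = "Academic", times = subjects, idvar = "id")

# Treat the student id as a factor so each student forms one error stratum.
long$id_f     <- factor(long$id)
long$Academic <- factor(long$Academic)

# Subject is a within-student factor; students are the blocking unit.
fit <- aov(score ~ Academic + Error(id_f), data = long)
summary(fit)
```

With id as a factor, the student stratum has one degree of freedom per student minus one, and the Academic effect is tested within students.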

Show the descriptive statistics (mean and standard error) of each score by race

## # A tibble: 20 x 4
## # Groups:   race [4]
##    race         Academic a_mean  a_se
##    <fct>        <chr>     <dbl> <dbl>
##  1 african-amer math       46.5 1.50 
##  2 african-amer read       46.3 1.58 
##  3 african-amer science    42.4 2.19 
##  4 african-amer socst      49.4 2.56 
##  5 african-amer write      47.8 2.16 
##  6 asian        math       57.3 3.05 
##  7 asian        read       51.9 2.31 
##  8 asian        science    51.5 2.86 
##  9 asian        socst      51   2.94 
## 10 asian        write      58   2.38 
## 11 hispanic     math       47.6 1.48 
## 12 hispanic     read       47   2.16 
## 13 hispanic     science    46.2 1.52 
## 14 hispanic     socst      48.0 1.95 
## 15 hispanic     write      46.8 1.73 
## 16 white        math       53.8 0.787
## 17 white        read       53.7 0.862
## 18 white        science    54.1 0.766
## 19 white        socst      53.5 0.910
## 20 white        write      53.9 0.770

Test if the 4 different ethnic groups have the same mean scores for each of the 5 variables (individually): read, write, math, science, and socst.

## [[1]]
## Analysis of Variance Table
## 
## Response: read
##            Df  Sum Sq Mean Sq F value   Pr(>F)   
## race        3  1627.6  542.52  5.5496 0.001132 **
## Residuals 191 18672.0   97.76                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[2]]
## Analysis of Variance Table
## 
## Response: write
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## race        3  1758.1  586.04  7.1651 0.0001388 ***
## Residuals 191 15622.2   81.79                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[3]]
## Analysis of Variance Table
## 
## Response: math
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## race        3  1746.3  582.09  7.2581 0.0001231 ***
## Residuals 191 15317.8   80.20                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[4]]
## Analysis of Variance Table
## 
## Response: science
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## race        3  3169.5 1056.51  13.063 8.505e-08 ***
## Residuals 191 15447.2   80.88                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## [[5]]
## Analysis of Variance Table
## 
## Response: socst
##            Df  Sum Sq Mean Sq F value  Pr(>F)  
## race        3   800.7  266.89  2.3501 0.07379 .
## Residuals 191 21690.9  113.56                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
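The list-style output above ([[1]] to [[5]]) is what one gets from looping a one-way ANOVA over the five score columns. A sketch of that pattern follows. The data are simulated, since only rows for one race are printed in this report, so the numbers are not the report's results.

```r
# Simulated stand-in data: 10 students per group, scores drawn at random.
# These values are hypothetical and only illustrate the looping pattern.
set.seed(1)
races <- c("african-amer", "asian", "hispanic", "white")
toy   <- data.frame(race = factor(rep(races, each = 10)))

subjects <- c("read", "write", "math", "science", "socst")
for (v in subjects) {
  toy[[v]] <- rnorm(nrow(toy), mean = 52, sd = 10)
}

# One ANOVA table per score; reformulate() builds e.g. read ~ race.
tests <- lapply(subjects, function(v) {
  anova(lm(reformulate("race", response = v), data = toy))
})

# Collect the p-value of the race effect from each table.
p_values <- sapply(tests, function(a) a[["Pr(>F)"]][1])
names(p_values) <- subjects
p_values
```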

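A: The mean scores differ significantly across the four ethnic groups for read, write, math, and science (all p < 0.01), but not for socst at the 0.05 level (p = 0.074).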

I do not know why the first approach did not work, so I tried another method to solve it.

Homework 2: Loan and Payment

Generate the payment P for every combination of loan L, rate r, and term M

##          L    r   M       P
## 1  5.0e+06 0.02 120  110240
## 2  5.0e+06 0.02 180  102914
## 3  5.0e+06 0.02 240  100870
## 4  5.0e+06 0.02 300  100264
## 5  5.0e+06 0.02 360  100080
## 6  5.0e+06 0.05 120  250719
## 7  5.0e+06 0.05 180  250038
## 8  5.0e+06 0.05 240  250002
## 9  5.0e+06 0.05 300  250000
## 10 5.0e+06 0.05 360  250000
## 11 5.0e+06 0.07 120  350104
## 12 5.0e+06 0.07 180  350002
## 13 5.0e+06 0.07 240  350000
## 14 5.0e+06 0.07 300  350000
## 15 5.0e+06 0.07 360  350000
## 16 1.0e+07 0.02 120  220481
## 17 1.0e+07 0.02 180  205827
## 18 1.0e+07 0.02 240  201741
## 19 1.0e+07 0.02 300  200527
## 20 1.0e+07 0.02 360  200160
## 21 1.0e+07 0.05 120  501437
## 22 1.0e+07 0.05 180  500077
## 23 1.0e+07 0.05 240  500004
## 24 1.0e+07 0.05 300  500000
## 25 1.0e+07 0.05 360  500000
## 26 1.0e+07 0.07 120  700209
## 27 1.0e+07 0.07 180  700004
## 28 1.0e+07 0.07 240  700000
## 29 1.0e+07 0.07 300  700000
## 30 1.0e+07 0.07 360  700000
## 31 1.5e+07 0.02 120  330721
## 32 1.5e+07 0.02 180  308741
## 33 1.5e+07 0.02 240  302611
## 34 1.5e+07 0.02 300  300791
## 35 1.5e+07 0.02 360  300241
## 36 1.5e+07 0.05 120  752156
## 37 1.5e+07 0.05 180  750115
## 38 1.5e+07 0.05 240  750006
## 39 1.5e+07 0.05 300  750000
## 40 1.5e+07 0.05 360  750000
## 41 1.5e+07 0.07 120 1050313
## 42 1.5e+07 0.07 180 1050005
## 43 1.5e+07 0.07 240 1050000
## 44 1.5e+07 0.07 300 1050000
## 45 1.5e+07 0.07 360 1050000
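The printed values are consistent with the level-payment formula P = L r / (1 - (1 + r)^(-M)), with r applied per payment period and M counted in payments. The assignment's own formula is not shown, so this is inferred from the numbers. A sketch that builds the same grid follows; annuity_payment is a hypothetical name.

```r
# Level payment for loan L, per-period rate r, and M payments.
# Inferred from the printed table, not taken from the assignment text.
annuity_payment <- function(L, r, M) {
  L * r / (1 - (1 + r)^(-M))
}

# expand.grid varies its first argument fastest, so M goes first to get
# the same row order as the table (M fastest, then r, then L).
grid <- expand.grid(
  M = seq(120, 360, by = 60),
  r = c(0.02, 0.05, 0.07),
  L = c(5e6, 1e7, 1.5e7)
)
grid$P <- round(annuity_payment(grid$L, grid$r, grid$M))
grid   <- grid[, c("L", "r", "M", "P")]
head(grid)
```

Because r is applied per payment here, the payment is already very close to L * r (interest only) at the longer terms, which is why many rows show round numbers such as 250000.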

Homework 3: Least Squares Method and Maximum Likelihood Method for Simple Linear Regression

## 
## Call:
## lm(formula = weight ~ height, data = women)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14
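For reference, the least squares estimates in the table can be reproduced from the closed-form expressions for simple linear regression. With normally distributed errors, the maximum likelihood estimates of the intercept and slope coincide with these. The sketch uses the built-in women data named in the Call line above.

```r
x <- women$height
y <- women$weight

# Closed-form least squares estimates for y = b0 + b1 * x.
slope     <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept <- mean(y) - slope * mean(x)

c(intercept = intercept, slope = slope)

# Same numbers from lm(), for comparison.
coef(lm(weight ~ height, data = women))
```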

Construct a function from the script so that any deviance value for pairs of parameter estimates can be found.

The code chunk was adapted from Jay-Liao

## [1] "95% CI of the intercept: [-99.1530770025164, -75.8802563308165]"
## [1] "95% CI of the slope: [3.27137246888624, 3.62862753111375]"
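The original chunk is not shown, so the following is only a sketch of one way to write such a function. It takes the deviance of the normal linear model to be the residual sum of squares at a given intercept and slope. The name deviance_at is hypothetical. The interval at the end uses estimate +/- 1.96 standard errors, which reproduces the printed bounds; confint() would use a t quantile and give slightly wider intervals.

```r
# Deviance (residual sum of squares) at an arbitrary (b0, b1) pair.
deviance_at <- function(b0, b1, data = women) {
  sum((data$weight - (b0 + b1 * data$height))^2)
}

fit <- lm(weight ~ height, data = women)
est <- coef(fit)

# At the fitted coefficients this equals deviance(fit) ...
deviance_at(est[1], est[2])
# ... and any other pair gives a larger value.
deviance_at(-87, 3.45)

# Normal-approximation 95% intervals: estimate +/- 1.96 * standard error.
se <- sqrt(diag(vcov(fit)))
ci <- cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)
ci
```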

Homework 4: c-stat example

Read in the data

## 'data.frame':    42 obs. of  1 variable:
##  $ nc: int  28 46 39 45 24 20 35 37 36 40 ...
##   nc
## 1 28
## 2 46
## 3 39
## 4 45
## 5 24
## 6 20

Calculate the c-stat for the first baseline phase

## [1] 0.2866238

Calculate the c-stat for the first baseline plus group tokens phases

## $z
## [1] 3.879054
## 
## $pvalue
## [1] 5.243167e-05
## [1] 0.1601208
## [1] 0.2842676
## [1] 0.2866238
## $cden
## [1] 0.6642762
## 
## $sc
## [1] 0.1712469
## 
## $z
## [1] 3.879054
## 
## $pvalue
## [1] 5.243167e-05
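A sketch of a C statistic function that returns the same four components as the output above. The function used in the report is not shown, so the name c_stat and the exact form are assumptions. As a check on the form, the standard error formula below gives 0.1712469 for a series of 32 observations, which matches the printed $sc, and the printed z is the printed $cden divided by $sc. The usage line runs on the six nc values shown above, not on the full series.

```r
# C statistic for a time series x (assumed form; see the note above).
c_stat <- function(x) {
  n <- length(x)
  # 1 minus the ratio of squared successive differences
  # to twice the sum of squared deviations from the mean.
  cden <- 1 - sum(diff(x)^2) / (2 * (n - 1) * var(x))
  # Standard error of C under the no-trend hypothesis.
  sc <- sqrt((n - 2) / ((n - 1) * (n + 1)))
  z  <- cden / sc
  list(cden = cden, sc = sc, z = z, pvalue = 1 - pnorm(z))
}

# First six values of nc as printed above, for illustration only.
nc_head <- c(28, 46, 39, 45, 24, 20)
c_stat(nc_head)
```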

Homework 5: Plot the likelihood function to estimate the probability of graduate admission by gender

Load data file

## [1] 512 313  89  19 353 207
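The six counts above match the first six cells of the built-in UCBAdmissions table, so the sketch below assumes that is the data set. It pools departments, treats the number admitted for each gender as binomial, and plots the likelihood scaled to a maximum of 1. The grid step and the plotting choices are arbitrary.

```r
# Admit x Gender counts, summed over departments (assumes UCBAdmissions).
tab      <- margin.table(UCBAdmissions, c(1, 2))
admitted <- tab["Admitted", ]
total    <- colSums(tab)

# Binomial log-likelihood of the admission probability p for each gender.
p_grid <- seq(0.01, 0.99, by = 0.001)
loglik <- sapply(names(total), function(g) {
  dbinom(admitted[[g]], total[[g]], p_grid, log = TRUE)
})

# Scale each curve so its maximum is 1 (relative likelihood).
rel_lik <- exp(sweep(loglik, 2, apply(loglik, 2, max)))

# The maximum likelihood estimate is the observed proportion admitted.
p_hat <- admitted / total

matplot(p_grid, rel_lik, type = "l", lty = 1:2, col = 1,
        xlab = "Probability of admission", ylab = "Relative likelihood")
abline(v = p_hat, lty = 3)
legend("topright", legend = colnames(rel_lik), lty = 1:2)
```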