Exercise

Exercise 1 (GIF plot)

What does the R script do? Revise the code so that it does the same job without a nested ‘for’ loop.

Create function

## [1] ""

Exercise 2 (needs explanation)

Use the read and math variables from the high school data example for this problem. First figure out what this R script does, then eliminate the for loop in the code segment.

## 'data.frame':    200 obs. of  11 variables:
##  $ id     : int  70 121 86 141 172 113 50 11 84 48 ...
##  $ female : Factor w/ 2 levels "female","male": 2 1 2 2 2 2 2 2 2 2 ...
##  $ race   : Factor w/ 4 levels "african-amer",..: 4 4 4 4 4 4 1 3 4 1 ...
##  $ ses    : Factor w/ 3 levels "high","low","middle": 2 3 1 1 3 3 3 3 3 3 ...
##  $ schtyp : Factor w/ 2 levels "private","public": 2 2 2 2 2 2 2 2 2 2 ...
##  $ prog   : Factor w/ 3 levels "academic","general",..: 2 3 2 3 1 1 2 1 2 1 ...
##  $ read   : int  57 68 44 63 47 44 50 34 63 57 ...
##  $ write  : int  52 59 33 44 52 52 59 46 57 55 ...
##  $ math   : int  41 53 54 47 57 51 42 45 54 52 ...
##  $ science: int  47 63 58 53 53 63 53 39 58 NA ...
##  $ socst  : int  57 61 31 56 61 61 61 36 51 51 ...
##    id female  race    ses schtyp     prog read write math science socst
## 1  70   male white    low public  general   57    52   41      47    57
## 2 121 female white middle public vocation   68    59   53      63    61
## 3  86   male white   high public  general   44    33   54      58    31
## 4 141   male white   high public vocation   63    44   47      53    56
## 5 172   male white middle public academic   47    52   57      53    61
## 6 113   male white middle public academic   44    52   51      63    61
## [1] 0.03596404
## [[1]]
## [1] -0.1557154
## 
## [[2]]
## [1] 0.0158295
## 
## [[3]]
## [1] 0.07903023
## 
## [[4]]
## [1] 0.2402566
## 
## [[5]]
## [1] 0.1757661
## 
## [[6]]
## [1] -0.2846964
## 
##  Pearson's product-moment correlation
## 
## data:  dta[dta$race == "asian", "math"] and dta[dta$race == "asian", "socst"]
## t = 1.9887, df = 9, p-value = 0.07796
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.07083501  0.86552255
## sample estimates:
##       cor 
## 0.5525177
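
The script itself is not reproduced above, so the following is only a sketch of one way to drop a for loop of this kind. It assumes the loop repeatedly shuffles one variable, recomputes the correlation, and counts how often the shuffled value reaches the observed one (a permutation test). The vectors `read` and `math` below are simulated stand-ins, not the high school data, and `n_iter` is an assumed number of shuffles.

```r
set.seed(123)

# Simulated stand-ins for the read and math columns (NOT the real data)
n <- 200
read <- rnorm(n, mean = 52, sd = 10)
math <- 0.6 * read + rnorm(n, mean = 21, sd = 8)

# Observed correlation
r_obs <- cor(read, math)

# Assumed number of random shuffles
n_iter <- 1001

# replicate() takes the place of the explicit for loop and counter:
# each call shuffles read and recomputes the correlation with math
r_perm <- replicate(n_iter, cor(sample(read), math))

# Proportion of shuffled correlations at least as large as the observed one
p_value <- mean(r_perm >= r_obs)
p_value
```

Taking `mean()` of a logical vector replaces the `if` plus counter pattern, which is usually what makes the loop unnecessary.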

Exercise 3 (unsolved)

An example of obtaining (nonparametric) bootstrap estimates of the coefficients of a multiple linear regression model for the anorexia{MASS} data set is presented in the following script. Figure out how it works and improve the code.

##   Treat        Prewt           Postwt      
##  CBT :29   Min.   :70.00   Min.   : 71.30  
##  Cont:26   1st Qu.:79.60   1st Qu.: 79.33  
##  FT  :17   Median :82.30   Median : 84.05  
##            Mean   :82.41   Mean   : 85.17  
##            3rd Qu.:86.00   3rd Qu.: 91.55  
##            Max.   :94.90   Max.   :103.60
## [1] 72  3
##  [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15"
## [16] "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30"
## [31] "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42" "43" "44" "45"
## [46] "46" "47" "48" "49" "50" "51" "52" "53" "54" "55" "56" "57" "58" "59" "60"
## [61] "61" "62" "63" "64" "65" "66" "67" "68" "69" "70" "71" "72"
## 
## Call:
## lm(formula = Postwt ~ Prewt + Treat, data = dta)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.1083  -4.2773  -0.5484   5.4838  15.2922 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  49.7711    13.3910   3.717  0.00041 ***
## Prewt         0.4345     0.1612   2.695  0.00885 ** 
## TreatCont    -4.0971     1.8935  -2.164  0.03400 *  
## TreatFT       4.5631     2.1333   2.139  0.03604 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.978 on 68 degrees of freedom
## Multiple R-squared:  0.2777, Adjusted R-squared:  0.2458 
## F-statistic: 8.713 on 3 and 68 DF,  p-value: 5.719e-05
## (Intercept)       Prewt   TreatCont     TreatFT 
##  49.7711090   0.4344612  -4.0970655   4.5630627
## (Intercept)       Prewt   TreatCont     TreatFT 
##  48.5911158   0.4626028  -3.1025295   3.0240144
##             [,1]       [,2]
## [1,] 16.57250533 76.8546219
## [2,]  0.10622647  0.8242523
## [3,] -7.49270920 -0.6607540
## [4,] -0.09770998  9.0778937
##                  2.5 %     97.5 %
## (Intercept) 23.0498681 76.4923499
## Prewt        0.1128268  0.7560955
## TreatCont   -7.8754712 -0.3186599
## TreatFT      0.3060571  8.8200682
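
As a possible starting point for the improvement, here is a minimal sketch of a case-resampling bootstrap written with `replicate()` instead of an explicit loop. The model formula is the one shown in the output above; the number of resamples `n_boot` and the seed are assumptions, so the numbers will not match the output exactly.

```r
library(MASS)  # provides the anorexia data set

set.seed(2024)
dta <- anorexia

# Model fitted to the original data, as in the output above
fit <- lm(Postwt ~ Prewt + Treat, data = dta)

# Assumed number of bootstrap resamples
n_boot <- 1000
n <- nrow(dta)

# Each replicate resamples rows with replacement and refits the model;
# the result is a 4 x n_boot matrix with one column per resample
boot_coefs <- replicate(n_boot, {
  idx <- sample(n, replace = TRUE)
  coef(lm(Postwt ~ Prewt + Treat, data = dta[idx, ]))
})

# Bootstrap means and percentile intervals for each coefficient
boot_mean <- rowMeans(boot_coefs, na.rm = TRUE)
boot_ci <- t(apply(boot_coefs, 1, quantile,
                   probs = c(0.025, 0.975), na.rm = TRUE))
boot_mean
boot_ci
```

The percentile intervals in `boot_ci` can then be compared with `confint(fit)`, which relies on the normal-theory standard errors.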

Exercise 5

Here is an example of simulation using R. Figure out how it works and improve the code.

##    Gender   Height
## 1       M 162.1184
## 2       M 158.0372
## 3       M 169.4136
## 4       M 174.9727
## 5       M 174.9417
## 6       M 173.7682
## 7       M 174.7152
## 8       M 185.7050
## 9       M 170.5542
## 10      M 172.2646
## 11      F 160.2013
## 12      F 159.1887
## 13      F 155.0295
## 14      F 150.3822
## 15      F 159.1699
## 16      F 154.1548
## 17      F 155.1310
## 18      F 160.2239
## 19      F 168.4234
## 20      F 166.4881
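
The simulation script is not shown, so this is only a sketch of how a data frame of this shape could be generated in a vectorized way. The group means, the common standard deviation, and the names `mu`, `sigma`, and `n_per_group` are all assumptions chosen for illustration.

```r
set.seed(42)

n_per_group <- 10
mu <- c(M = 170, F = 160)  # assumed mean heights in cm
sigma <- 7                 # assumed common standard deviation

# Ten "M" labels followed by ten "F" labels
gender <- rep(names(mu), each = n_per_group)

# rnorm() accepts a vector of means, so no loop over persons is needed
sim <- data.frame(
  Gender = gender,
  Height = rnorm(length(gender), mean = mu[gender], sd = sigma)
)
sim
```

Indexing `mu` by the gender labels gives each simulated person the mean of his or her own group in a single call.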

HW

HW 1

Download the data set consisting of 9 different measurements on 1,597 faculty members, including their salaries, gender, ranks, etc., recorded over several years, by executing the first few lines of the salary script. Note that the number of observations differs from one faculty member to another. Generate a plot (see the example below) showing the distribution of faculty salaries in their final years next to that for all years, by rank (Assist, Assoc, Full). You will need to do a bit of programming for data manipulation to get the final-year salary for each faculty member.
...
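
The data-manipulation step (one final-year row per faculty member) might be sketched as follows. The toy data frame and its column names `id`, `year`, `rank`, and `salary` are hypothetical; the real data set may use different names and values.

```r
# Toy data for illustration only (hypothetical columns and values)
salary_toy <- data.frame(
  id     = c(1, 1, 1, 2, 2, 3),
  year   = c(90, 91, 92, 91, 92, 95),
  rank   = c("Assist", "Assist", "Assoc", "Full", "Full", "Assist"),
  salary = c(3000, 3200, 3900, 6000, 6300, 3500)
)

# Sort by faculty member and year, then keep the last row for each id
ord <- salary_toy[order(salary_toy$id, salary_toy$year), ]
final <- ord[!duplicated(ord$id, fromLast = TRUE), ]
final
```

With `final` in hand, the two distributions can be compared by rank, for example by stacking `final` and the full data with an indicator column and passing the result to a boxplot.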

HW 4

Simulate sequences of 100 coin tosses and look for the length of the longest streak (run) in each sequence. Investigate the relationship between the probability of heads (or tails) and the maximum run length. (Hint: ?rle).
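
A minimal sketch of one approach, following the `rle` hint. The function name `longest_run`, the grid of probabilities `p_grid`, and the 1000 repetitions per probability are assumptions made for illustration.

```r
# Length of the longest run (of heads or tails) in one sequence of tosses
longest_run <- function(n_toss = 100, p_head = 0.5) {
  tosses <- rbinom(n_toss, size = 1, prob = p_head)  # 1 = head, 0 = tail
  max(rle(tosses)$lengths)
}

set.seed(1)

# Assumed grid of head probabilities to investigate
p_grid <- seq(0.1, 0.9, by = 0.1)

# Average longest run over 1000 simulated sequences for each probability
mean_max_run <- sapply(p_grid, function(p) {
  mean(replicate(1000, longest_run(100, p)))
})

data.frame(p_head = p_grid, mean_max_run = mean_max_run)
```

Plotting `mean_max_run` against `p_grid` should show the longest streak growing as the coin becomes more unbalanced in either direction.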

HW 5

Use the geometrical properties of the following plot to write an R function to estimate the value of π (i.e., the area of a unit circle) through Monte Carlo simulation.
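
The plot is not reproduced here, so this sketch assumes the usual setup: a quarter of the unit circle inscribed in the unit square, where the share of uniformly scattered points that fall inside the quarter circle approximates π/4. The function name `estimate_pi` and the argument `n_points` are hypothetical.

```r
# Monte Carlo estimate of pi from points scattered over the unit square
estimate_pi <- function(n_points = 1e5) {
  x <- runif(n_points)
  y <- runif(n_points)
  inside <- x^2 + y^2 <= 1  # TRUE when a point lies in the quarter circle
  4 * mean(inside)          # quarter-circle area is pi / 4
}

set.seed(1)
pi_hat <- estimate_pi(1e6)
pi_hat
```

The estimate tightens slowly: its standard error shrinks with the square root of `n_points`, so each extra digit of accuracy costs roughly a hundred times more points.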