Exercise 1: math attainment

input data

read in a plain text file whose first line holds the variable names, and assign the result to an object

dta <- read.table("/Users/haolunfu/Documents/資料管理/week3/math_attainment.txt", header = TRUE)

checking data

structure of data

str(dta)

## 'data.frame':    39 obs. of  3 variables:
##  $ math2: int  28 56 51 13 39 41 30 13 17 32 ...
##  $ math1: int  18 22 44 8 20 12 16 5 9 18 ...
##  $ cc   : num  328 406 387 167 328 ...

first 6 rows

head(dta)

##   math2 math1     cc
## 1    28    18 328.20
## 2    56    22 406.03
## 3    51    44 386.94
## 4    13     8 166.91
## 5    39    20 328.20
## 6    41    12 328.20

descriptive statistics

variable mean

colMeans(dta)

##     math2     math1        cc 
##  28.76923  15.35897 188.83667

variable sd

apply(dta, 2, sd)

##     math2     math1        cc 
## 10.720029  7.744224 84.842513
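apply() first converts the data frame to a matrix, which is harmless here because every column is numeric. The following is a minimal sketch of a column-wise alternative. It uses the built-in women data as a stand-in (an assumption, since the math attainment file does not ship with R), and the object names are illustrative.

```r
# Stand-in data: the built-in women data frame (not the math attainment file)
d <- datasets::women

# apply() coerces the data frame to a matrix, then works column by column
sd_apply <- apply(d, 2, sd)

# sapply() iterates over the columns of the data frame directly
sd_sapply <- sapply(d, sd)

# both give a named numeric vector with one value per column
print(sd_apply)
print(sd_sapply)
```

With mixed column types, sapply() on a numeric subset is usually the safer choice, because apply() would turn the whole matrix into character.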

correlation matrix

cor(dta)

##           math2     math1        cc
## math2 1.0000000 0.7443604 0.6570098
## math1 0.7443604 1.0000000 0.5956771
## cc    0.6570098 0.5956771 1.0000000

plot data

specify square plot region

par(pty="s")

scatter plot of math2 by math1

plot(math2 ~ math1, data=dta, xlim=c(0, 60), ylim=c(0, 60),
     xlab="Math score at Year 1", ylab="Math score at Year 2")
# add grid lines
grid()


# regression analysis: regress math2 on math1
dta.lm <- lm(math2 ~ math1, data=dta)

# add the fitted regression line (dashed)
abline(dta.lm, lty=2)

# add plot title
title("Mathematics Attainment")

regression analysis

regress math2 on math1

# show results
summary(dta.lm)

## 
## Call:
## lm(formula = math2 ~ math1, data = dta)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.430  -5.521  -0.369   4.253  20.388 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   12.944      2.607   4.965 1.57e-05 ***
## math1          1.030      0.152   6.780 5.57e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.255 on 37 degrees of freedom
## Multiple R-squared:  0.5541, Adjusted R-squared:  0.542 
## F-statistic: 45.97 on 1 and 37 DF,  p-value: 5.571e-08

# show anova table
anova(dta.lm)

## Analysis of Variance Table
## 
## Response: math2
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## math1      1 2419.6 2419.59  45.973 5.571e-08 ***
## Residuals 37 1947.3   52.63                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
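As a quick consistency check, the figures reported by cor(), summary() and anova() can be tied together by hand. The sketch below only re-uses numbers copied from the output shown in this document; the variable names are illustrative.

```r
# Values copied from the output shown in this document
r        <- 0.7443604   # cor(math2, math1)
t_math1  <- 6.780       # t value for math1
ss_model <- 2419.6      # Sum Sq for math1
ss_resid <- 1947.3      # Sum Sq for Residuals

# In simple regression, R-squared equals the squared correlation ...
r2_from_cor <- r^2
# ... and also the model share of the total sum of squares
r2_from_ss <- ss_model / (ss_model + ss_resid)
# The F statistic equals the squared t statistic of the single slope
f_from_t <- t_math1^2

round(c(r2_from_cor, r2_from_ss, f_from_t), 4)
```

Up to rounding, the first two values agree with the reported Multiple R-squared of 0.5541 and the third with the F-statistic of 45.97.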

diagnostics

specify maximum plot region

par(pty="m")

# set up an empty plot of scaled residuals against fitted values
plot(scale(resid(dta.lm)) ~ fitted(dta.lm), 
     ylim=c(-3.5, 3.5), type="n",
     xlab="Fitted values", ylab="Standardized residuals") 

# label each observation with its row name
text(fitted(dta.lm), scale(resid(dta.lm)), labels=rownames(dta), cex=0.5)  

# add grid lines
grid()

# add a horizontal red dashed line at zero
abline(h=0, lty=2, col="red")

normality check

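# normal Q-Q plot of the scaled residuals
qqnorm(scale(resid(dta.lm)))

# add the reference line through the first and third quartiles
qqline(scale(resid(dta.lm)))

# add grid lines
grid()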
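The plots above use scale(resid(dta.lm)), which divides every residual by the same sample standard deviation. R also provides rstandard(), which divides each residual by its own estimated standard deviation and so accounts for leverage. The sketch below compares the two on a stand-in model fitted to the built-in women data (an assumption; it is not the math attainment fit).

```r
# Stand-in model on built-in data (not the math attainment fit)
fit <- lm(weight ~ height, data = datasets::women)

# Approach used in the text: centre and scale the raw residuals
z_scaled <- as.vector(scale(resid(fit)))

# Internally studentized residuals: each residual is divided by
# its own estimated standard deviation, sigma * sqrt(1 - leverage)
z_std <- unname(rstandard(fit))

# Side-by-side comparison for the first few observations
head(cbind(scaled = z_scaled, rstandard = z_std))
```

The two versions are close for observations with low leverage and differ most at the extremes of the predictor.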

Exercise 2: Women

Load dataset

library(datasets)
dta <- datasets::women

Show the first 6 rows

head(dta)

##   height weight
## 1     58    115
## 2     59    117
## 3     60    120
## 4     61    123
## 5     62    126
## 6     63    129

List all values of the data by column or variable

As you can see, there are two variables (height and weight) in the data

c(dta)

## $height
##  [1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## 
## $weight
##  [1] 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164

List all values of the data as a single vector, treating the two columns as one

c(as.matrix(dta))

##  [1]  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72 115 117 120 123
## [20] 126 129 132 135 139 142 146 150 154 159 164
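The difference between the two calls comes from the type of the result: c() on a data frame returns a list of columns, while c() on a matrix returns one plain vector. A small sketch, with unlist() shown as another way to flatten (the object names are illustrative):

```r
d <- datasets::women

as_list   <- c(d)              # a list with one element per column
as_vector <- c(as.matrix(d))   # one numeric vector, columns stacked
flat      <- unlist(d)         # also one vector, but with names such as height1

str(as_list)
length(as_vector)
head(flat, 3)
```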

Exercise 3: Race and Birthweight

Load the data

Show the first 6 rows

library(MASS)
dat <- MASS::birthwt
head(dat)

##    low age lwt race smoke ptl ht ui ftv  bwt
## 85   0  19 182    2     0   0  0  1   0 2523
## 86   0  33 155    3     0   0  0  0   3 2551
## 87   0  20 105    1     1   0  0  0   1 2557
## 88   0  21 108    1     1   0  0  1   2 2594
## 89   0  18 107    1     1   0  0  1   0 2600
## 91   0  21 124    3     0   0  0  0   0 2622

Recode the race names

dat$race <- c("White", "Black", "Other")[dat$race]
head(dat)

##    low age lwt  race smoke ptl ht ui ftv  bwt
## 85   0  19 182 Black     0   0  0  1   0 2523
## 86   0  33 155 Other     0   0  0  0   3 2551
## 87   0  20 105 White     1   0  0  0   1 2557
## 88   0  21 108 White     1   0  0  1   2 2594
## 89   0  18 107 White     1   0  0  1   0 2600
## 91   0  21 124 Other     0   0  0  0   0 2622

Count the mothers in each race group

aggregate(bwt ~ race, length, data=dat)

##    race bwt
## 1 Black  26
## 2 Other  67
## 3 White  96

There are 26 Black mothers in this data frame.
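The same counts can be obtained without aggregate(). The sketch below is one possible alternative, not the method used above: it recodes race as a labelled factor and tabulates it. The copy bw is a name introduced here for illustration.

```r
library(MASS)

# Work on a copy so the original birthwt is left untouched
bw <- MASS::birthwt

# Recode 1/2/3 as a labelled factor instead of a character vector
bw$race <- factor(bw$race, levels = 1:3,
                  labels = c("White", "Black", "Other"))

# Count mothers per group; table() keeps the factor's level order
race_counts <- table(bw$race)
race_counts
```

A factor keeps the groups in the order White, Black, Other, whereas the character recoding is sorted alphabetically by aggregate().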

Exercise 4: UCBAdmissions

Load dataset

Data structure

library(datasets)
dat <- datasets::UCBAdmissions
str(dat)

##  'table' num [1:2, 1:2, 1:6] 512 313 89 19 353 207 17 8 120 205 ...
##  - attr(*, "dimnames")=List of 3
##   ..$ Admit : chr [1:2] "Admitted" "Rejected"
##   ..$ Gender: chr [1:2] "Male" "Female"
##   ..$ Dept  : chr [1:6] "A" "B" "C" "D" ...

Number of male applicants by Admit and Dept

dat[,1,]

##           Dept
## Admit        A   B   C   D   E   F
##   Admitted 512 353 120 138  53  22
##   Rejected 313 207 205 279 138 351

Number of male applicants to Dept A by Admit

dat[,1,1]

## Admitted Rejected 
##      512      313

Number of admitted male applicants by Dept

dat[1,1,]

##   A   B   C   D   E   F 
## 512 353 120 138  53  22
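Positional indices such as dat[,1,] depend on remembering the order of the dimensions. Because the table carries dimnames, the same slices can be selected by name, as in this sketch (the object names are illustrative):

```r
ucb <- datasets::UCBAdmissions

# Index by dimension name rather than by position
male_by_admit_dept <- ucb[, "Male", ]
male_dept_a        <- ucb[, "Male", "A"]
male_admitted      <- ucb["Admitted", "Male", ]

# Collapse over departments: total admitted and rejected male applicants
male_totals <- apply(male_by_admit_dept, 1, sum)
male_totals
```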

Exercise 5: chickwts

Load dataset

Show the data structure

library(datasets)
dat <- datasets::chickwts
str(dat)

## 'data.frame':    71 obs. of  2 variables:
##  $ weight: num  179 160 136 227 217 168 108 124 143 140 ...
##  $ feed  : Factor w/ 6 levels "casein","horsebean",..: 2 2 2 2 2 2 2 2 2 2 ...

As you can see, column 2 is the feed type

List column 2 (feed type) for the whole dataset

chickwts[,2]

##  [1] horsebean horsebean horsebean horsebean horsebean horsebean horsebean
##  [8] horsebean horsebean horsebean linseed   linseed   linseed   linseed  
## [15] linseed   linseed   linseed   linseed   linseed   linseed   linseed  
## [22] linseed   soybean   soybean   soybean   soybean   soybean   soybean  
## [29] soybean   soybean   soybean   soybean   soybean   soybean   soybean  
## [36] soybean   sunflower sunflower sunflower sunflower sunflower sunflower
## [43] sunflower sunflower sunflower sunflower sunflower sunflower meatmeal 
## [50] meatmeal  meatmeal  meatmeal  meatmeal  meatmeal  meatmeal  meatmeal 
## [57] meatmeal  meatmeal  meatmeal  casein    casein    casein    casein   
## [64] casein    casein    casein    casein    casein    casein    casein   
## [71] casein   
## Levels: casein horsebean linseed meatmeal soybean sunflower

List the feed type for the whole dataset, kept as a one-column data frame

chickwts["feed"]

##         feed
## 1  horsebean
## 2  horsebean
## 3  horsebean
## 4  horsebean
## 5  horsebean
## 6  horsebean
## 7  horsebean
## 8  horsebean
## 9  horsebean
## 10 horsebean
## 11   linseed
## 12   linseed
## 13   linseed
## 14   linseed
## 15   linseed
## 16   linseed
## 17   linseed
## 18   linseed
## 19   linseed
## 20   linseed
## 21   linseed
## 22   linseed
## 23   soybean
## 24   soybean
## 25   soybean
## 26   soybean
## 27   soybean
## 28   soybean
## 29   soybean
## 30   soybean
## 31   soybean
## 32   soybean
## 33   soybean
## 34   soybean
## 35   soybean
## 36   soybean
## 37 sunflower
## 38 sunflower
## 39 sunflower
## 40 sunflower
## 41 sunflower
## 42 sunflower
## 43 sunflower
## 44 sunflower
## 45 sunflower
## 46 sunflower
## 47 sunflower
## 48 sunflower
## 49  meatmeal
## 50  meatmeal
## 51  meatmeal
## 52  meatmeal
## 53  meatmeal
## 54  meatmeal
## 55  meatmeal
## 56  meatmeal
## 57  meatmeal
## 58  meatmeal
## 59  meatmeal
## 60    casein
## 61    casein
## 62    casein
## 63    casein
## 64    casein
## 65    casein
## 66    casein
## 67    casein
## 68    casein
## 69    casein
## 70    casein
## 71    casein
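The two listings differ because of what each subsetting form returns: [, 2] drops to a factor, while ["feed"] keeps a one-column data frame. A short sketch of the distinction, with a frequency table as a more compact summary (the object names are illustrative):

```r
d <- datasets::chickwts

by_position <- d[, 2]       # factor (the data frame dimension is dropped)
by_name_df  <- d["feed"]    # one-column data frame
by_name_vec <- d[["feed"]]  # factor, same as d$feed

class(by_position)
class(by_name_df)

# A compact summary is often easier to read than all 71 values
table(by_name_vec)
```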

Exercise 6: MASS

Load dataset

dat <- MASS::menarche
str(dat)

## 'data.frame':    25 obs. of  3 variables:
##  $ Age     : num  9.21 10.21 10.58 10.83 11.08 ...
##  $ Total   : num  376 200 93 120 90 88 105 111 100 93 ...
##  $ Menarche: num  0 0 0 2 2 5 10 17 16 29 ...

Age: Average age of the group. (The groups are reasonably age homogeneous.)

Total: Total number of children in the group.

Menarche: Number who have reached menarche
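Given these definitions, a natural next step is the proportion of each age group that has reached menarche. This is a sketch of one way to compute it, not part of the original exercise; the copy men and the column prop are names introduced here for illustration.

```r
library(MASS)

men <- MASS::menarche

# Proportion of each age group that has reached menarche
# (prop is a name introduced here for illustration)
men$prop <- men$Menarche / men$Total

head(men)
```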

Week 3 In-class exercise

Hao-Lun Fu

2020-03-23
