Exercise 1: math attainment

input data

read in a plain-text file whose first line holds the variable names, and assign the result to an object
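A minimal sketch of this step, assuming a whitespace-delimited file with a header line. The file name is not given in this report, so the sketch writes the six rows printed under "first 6 rows" to a temporary file and reads that back. The object name `dta` is taken from the `lm()` call shown in the regression output; the temporary file is purely a stand-in.

```r
# Hypothetical stand-in for the real data file: the six rows printed
# under "first 6 rows", written to a temporary plain-text file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("math2 math1 cc",
             "28 18 328.20",
             "56 22 406.03",
             "51 44 386.94",
             "13 8 166.91",
             "39 20 328.20",
             "41 12 328.20"), tmp)

# header = TRUE takes the variable names from the first line of the file.
dta <- read.table(tmp, header = TRUE)
```

With the real file, only the path passed to `read.table()` would change.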

checking data

structure of data

## 'data.frame':    39 obs. of  3 variables:
##  $ math2: int  28 56 51 13 39 41 30 13 17 32 ...
##  $ math1: int  18 22 44 8 20 12 16 5 9 18 ...
##  $ cc   : num  328 406 387 167 328 ...

first 6 rows

##   math2 math1     cc
## 1    28    18 328.20
## 2    56    22 406.03
## 3    51    44 386.94
## 4    13     8 166.91
## 5    39    20 328.20
## 6    41    12 328.20
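The two checks above are most likely the result of `str()` and `head()`. A small sketch, using only the six rows printed above as stand-in data (the full data set has 39 observations):

```r
# Stand-in data: the six rows printed above, not the full 39-row data set.
dta <- data.frame(
  math2 = c(28L, 56L, 51L, 13L, 39L, 41L),
  math1 = c(18L, 22L, 44L, 8L, 20L, 12L),
  cc    = c(328.20, 406.03, 386.94, 166.91, 328.20, 328.20)
)

str(dta)   # variable types and a preview of the values
head(dta)  # first 6 rows (the default for head)
```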

descriptive statistics

variable mean

##     math2     math1        cc 
##  28.76923  15.35897 188.83667

variable sd

##     math2     math1        cc 
## 10.720029  7.744224 84.842513

correlation matrix

##           math2     math1        cc
## math2 1.0000000 0.7443604 0.6570098
## math1 0.7443604 1.0000000 0.5956771
## cc    0.6570098 0.5956771 1.0000000
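One way to obtain summaries of this shape is sketched below. The exact calls used for this report are not shown, so `colMeans()`, `sapply(..., sd)` and `cor()` are assumptions. The stand-in data are the six rows from the head output, so the numbers will differ from the 39-row results printed above.

```r
# Stand-in data: the six rows shown under "first 6 rows".
dta <- data.frame(
  math2 = c(28L, 56L, 51L, 13L, 39L, 41L),
  math1 = c(18L, 22L, 44L, 8L, 20L, 12L),
  cc    = c(328.20, 406.03, 386.94, 166.91, 328.20, 328.20)
)

colMeans(dta)     # mean of each variable, as a named vector
sapply(dta, sd)   # standard deviation of each variable
cor(dta)          # matrix of pairwise Pearson correlations
```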

plot data

specify square plot region

scatter plot of math2 against math1
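A sketch of the two plotting steps, assuming base graphics: `par(pty = "s")` requests a square plotting region, and the formula interface of `plot()` puts math1 on the horizontal axis. The data are again the six rows from the head output, used as a stand-in.

```r
# Stand-in data: the six rows shown under "first 6 rows".
dta <- data.frame(
  math2 = c(28L, 56L, 51L, 13L, 39L, 41L),
  math1 = c(18L, 22L, 44L, 8L, 20L, 12L),
  cc    = c(328.20, 406.03, 386.94, 166.91, 328.20, 328.20)
)

op <- par(pty = "s")               # "s" = square plotting region
plot(math2 ~ math1, data = dta,    # math2 on the y-axis, math1 on the x-axis
     xlab = "math1", ylab = "math2")
par(op)                            # restore the previous graphics settings
```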

regression analysis

regress math2 on math1

## 
## Call:
## lm(formula = math2 ~ math1, data = dta)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.430  -5.521  -0.369   4.253  20.388 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   12.944      2.607   4.965 1.57e-05 ***
## math1          1.030      0.152   6.780 5.57e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.255 on 37 degrees of freedom
## Multiple R-squared:  0.5541, Adjusted R-squared:  0.542 
## F-statistic: 45.97 on 1 and 37 DF,  p-value: 5.571e-08
## Analysis of Variance Table
## 
## Response: math2
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## math1      1 2419.6 2419.59  45.973 5.571e-08 ***
## Residuals 37 1947.3   52.63                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
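The two tables above have the form produced by `summary()` and `anova()` on a fitted `lm` object. The sketch below uses the formula from the `Call:` line; the object name `fit` and the six-row stand-in data are assumptions, so the estimates will not match the printed 39-row results.

```r
# Stand-in data: the six rows shown under "first 6 rows".
dta <- data.frame(
  math2 = c(28L, 56L, 51L, 13L, 39L, 41L),
  math1 = c(18L, 22L, 44L, 8L, 20L, 12L),
  cc    = c(328.20, 406.03, 386.94, 166.91, 328.20, 328.20)
)

fit <- lm(math2 ~ math1, data = dta)  # same formula as in the Call: line
summary(fit)   # coefficients, residual standard error, R-squared, F test
anova(fit)     # analysis of variance table for the same fit
```

With a single predictor the two tables carry the same test: in the output above, R-squared is the squared correlation between math2 and math1 (0.7444^2 ≈ 0.554), and the F statistic is the squared t value for math1 (6.780^2 ≈ 45.97).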

diagnostics

specify maximum plot region
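A sketch of what the diagnostics step might look like, assuming the standard `plot()` method for `lm` objects. `par(pty = "m")` sets the maximal plotting region; the 2 x 2 layout via `mfrow` is an added assumption, not something stated in this report. The fit uses the six-row stand-in data.

```r
# Stand-in data: the six rows shown under "first 6 rows".
dta <- data.frame(
  math2 = c(28L, 56L, 51L, 13L, 39L, 41L),
  math1 = c(18L, 22L, 44L, 8L, 20L, 12L),
  cc    = c(328.20, 406.03, 386.94, 166.91, 328.20, 328.20)
)
fit <- lm(math2 ~ math1, data = dta)

op <- par(pty = "m", mfrow = c(2, 2))  # "m" = maximal plotting region; 2 x 2 grid
plot(fit)                              # default diagnostic plots for an lm fit
par(op)                                # restore the previous graphics settings
```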

Exercise 2: Women

Show the first 6 rows

##   height weight
## 1     58    115
## 2     59    117
## 3     60    120
## 4     61    123
## 5     62    126
## 6     63    129

List all values of the data by column (variable)

The data contain two variables, height and weight.

## $height
##  [1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
## 
## $weight
##  [1] 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164

List all values of the data as a single vector: the two columns are concatenated, so heights and weights run together as if they were one variable

##  [1]  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72 115 117 120 123
## [20] 126 129 132 135 139 142 146 150 154 159 164
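The three outputs in this exercise have the shape produced by the calls below on the built-in `women` data. The original commands are not shown, so treat this as a plausible reconstruction.

```r
head(women)           # first 6 rows of the built-in data frame

# A data frame is a list of columns, so c() returns a list
# with one element per variable.
c(women)

# A matrix is a single vector with dimensions, so c() drops the
# dimensions and returns all 30 values, column after column.
c(as.matrix(women))
```

The contrast is the point of the exercise: the same `c()` call keeps the variables apart for a data frame but merges them for a matrix.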

Exercise 3: Race and Birthweight

Load the data

Show the first 6 rows

##    low age lwt race smoke ptl ht ui ftv  bwt
## 85   0  19 182    2     0   0  0  1   0 2523
## 86   0  33 155    3     0   0  0  0   3 2551
## 87   0  20 105    1     1   0  0  0   1 2557
## 88   0  21 108    1     1   0  0  1   2 2594
## 89   0  18 107    1     1   0  0  1   0 2600
## 91   0  21 124    3     0   0  0  0   0 2622

Recode the numeric race codes as names (1 = White, 2 = Black, 3 = Other)

##    low age lwt  race smoke ptl ht ui ftv  bwt
## 85   0  19 182 Black     0   0  0  1   0 2523
## 86   0  33 155 Other     0   0  0  0   3 2551
## 87   0  20 105 White     1   0  0  0   1 2557
## 88   0  21 108 White     1   0  0  1   2 2594
## 89   0  18 107 White     1   0  0  1   0 2600
## 91   0  21 124 Other     0   0  0  0   0 2622
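A sketch of the load and recode steps, assuming the data are `birthwt` from the MASS package (its columns match the printout). The object name `bw` is hypothetical. The mapping 1 = White, 2 = Black, 3 = Other is read off the two printouts above.

```r
data(birthwt, package = "MASS")   # MASS is distributed with standard R
bw <- birthwt                     # work on a copy; `bw` is a hypothetical name

# Index a vector of labels by the numeric code: 1 -> White, 2 -> Black, 3 -> Other.
bw$race <- c("White", "Black", "Other")[bw$race]

head(bw)
```

Storing race as character here is a choice; `factor(birthwt$race, labels = ...)` would work as well but would keep the original level order.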

Count the mothers in each race group (the bwt column below holds the counts, not birth weights)

##    race bwt
## 1 Black  26
## 2 Other  67
## 3 White  96

There are 26 Black mothers in this data frame.
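A table of this shape can be produced with `aggregate()` and `length`, which would explain why the count column is labelled `bwt`. This is a sketch under the assumption that the data are `birthwt` from MASS; `bw` and `counts` are hypothetical names.

```r
data(birthwt, package = "MASS")
bw <- birthwt
bw$race <- c("White", "Black", "Other")[bw$race]   # 1 = White, 2 = Black, 3 = Other

# length() counts the bwt values within each race group, so the bwt
# column of the result is a count of mothers.
counts <- aggregate(bwt ~ race, data = bw, FUN = length)
counts

table(bw$race)   # a more direct route to the same counts
```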

Exercise 4: UCBAdmissions

Load dataset

Data structure

##  'table' num [1:2, 1:2, 1:6] 512 313 89 19 353 207 17 8 120 205 ...
##  - attr(*, "dimnames")=List of 3
##   ..$ Admit : chr [1:2] "Admitted" "Rejected"
##   ..$ Gender: chr [1:2] "Male" "Female"
##   ..$ Dept  : chr [1:6] "A" "B" "C" "D" ...

Number of male applicants by admission status (Admit) and department (Dept)

##           Dept
## Admit        A   B   C   D   E   F
##   Admitted 512 353 120 138  53  22
##   Rejected 313 207 205 279 138 351

Number of male applicants to Dept A by admission status

## Admitted Rejected 
##      512      313

Number of admitted male applicants by department

##   A   B   C   D   E   F 
## 512 353 120 138  53  22
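The three slices above can be obtained by indexing the 2 x 2 x 6 table along its Admit, Gender and Dept dimensions. The sketch uses dimension names for readability; the report may have used numeric indices such as `[, 1, ]`, which give the same result.

```r
str(UCBAdmissions)                    # dimensions: Admit x Gender x Dept

UCBAdmissions[, "Male", ]             # males: Admit (rows) by Dept (columns)
UCBAdmissions[, "Male", "A"]          # males applying to Dept A, by Admit
UCBAdmissions["Admitted", "Male", ]   # admitted males, by Dept
```

Leaving an index empty keeps that whole dimension, so each extra fixed index removes one dimension from the result.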

Exercise 5: chickwts

Load dataset

Show the data structure

## 'data.frame':    71 obs. of  2 variables:
##  $ weight: num  179 160 136 227 217 168 108 124 143 140 ...
##  $ feed  : Factor w/ 6 levels "casein","horsebean",..: 2 2 2 2 2 2 2 2 2 2 ...

Column 2 (feed) holds the feed type.

List column 2 (feed type) for the whole dataset, as a factor

##  [1] horsebean horsebean horsebean horsebean horsebean horsebean horsebean
##  [8] horsebean horsebean horsebean linseed   linseed   linseed   linseed  
## [15] linseed   linseed   linseed   linseed   linseed   linseed   linseed  
## [22] linseed   soybean   soybean   soybean   soybean   soybean   soybean  
## [29] soybean   soybean   soybean   soybean   soybean   soybean   soybean  
## [36] soybean   sunflower sunflower sunflower sunflower sunflower sunflower
## [43] sunflower sunflower sunflower sunflower sunflower sunflower meatmeal 
## [50] meatmeal  meatmeal  meatmeal  meatmeal  meatmeal  meatmeal  meatmeal 
## [57] meatmeal  meatmeal  meatmeal  casein    casein    casein    casein   
## [64] casein    casein    casein    casein    casein    casein    casein   
## [71] casein   
## Levels: casein horsebean linseed meatmeal soybean sunflower

List the feed type for the whole dataset, as a one-column data frame

##         feed
## 1  horsebean
## 2  horsebean
## 3  horsebean
## 4  horsebean
## 5  horsebean
## 6  horsebean
## 7  horsebean
## 8  horsebean
## 9  horsebean
## 10 horsebean
## 11   linseed
## 12   linseed
## 13   linseed
## 14   linseed
## 15   linseed
## 16   linseed
## 17   linseed
## 18   linseed
## 19   linseed
## 20   linseed
## 21   linseed
## 22   linseed
## 23   soybean
## 24   soybean
## 25   soybean
## 26   soybean
## 27   soybean
## 28   soybean
## 29   soybean
## 30   soybean
## 31   soybean
## 32   soybean
## 33   soybean
## 34   soybean
## 35   soybean
## 36   soybean
## 37 sunflower
## 38 sunflower
## 39 sunflower
## 40 sunflower
## 41 sunflower
## 42 sunflower
## 43 sunflower
## 44 sunflower
## 45 sunflower
## 46 sunflower
## 47 sunflower
## 48 sunflower
## 49  meatmeal
## 50  meatmeal
## 51  meatmeal
## 52  meatmeal
## 53  meatmeal
## 54  meatmeal
## 55  meatmeal
## 56  meatmeal
## 57  meatmeal
## 58  meatmeal
## 59  meatmeal
## 60    casein
## 61    casein
## 62    casein
## 63    casein
## 64    casein
## 65    casein
## 66    casein
## 67    casein
## 68    casein
## 69    casein
## 70    casein
## 71    casein
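The two listings above contain the same values but differ in type: the first prints as a factor with its levels, the second as a data frame with row numbers and a `feed` header. A sketch of indexing calls that produce each form follows; the exact commands used in the report are not shown, and `f` and `d` are hypothetical names.

```r
str(chickwts)            # 71 observations: weight (numeric), feed (factor)

f <- chickwts[, 2]       # matrix-style indexing drops to a factor vector
d <- chickwts["feed"]    # list-style indexing keeps a one-column data frame

class(f)
class(d)
```

`chickwts[, 2, drop = FALSE]` would also keep the data frame form.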

Exercise 6: MASS

Load the dataset and show its structure

## 'data.frame':    25 obs. of  3 variables:
##  $ Age     : num  9.21 10.21 10.58 10.83 11.08 ...
##  $ Total   : num  376 200 93 120 90 88 105 111 100 93 ...
##  $ Menarche: num  0 0 0 2 2 5 10 17 16 29 ...

Age: Average age of the group. (The groups are reasonably age homogeneous.)

Total: Total number of children in the group.

Menarche: Number who have reached menarche
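The structure and variable names above match the `menarche` data frame in the MASS package, so the loading step was presumably along these lines (an assumption, since the command itself is not shown):

```r
data(menarche, package = "MASS")   # MASS is distributed with standard R
str(menarche)                      # 25 age groups, 3 numeric variables
head(menarche)                     # Age, Total and Menarche for the first groups
```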