Exercise 1: Math Attainment
input data
read in a plain text file with variable names and assign a name to it
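A minimal sketch of this step, assuming a whitespace-delimited file whose first line holds the variable names. The original file is not named here, so the file name `math_attainment.txt` is hypothetical. To stay self-contained, the sketch first writes three of the rows printed below to a temporary file. It then reads them back with `read.table(header = TRUE)`.

```r
# Hypothetical file: the name and the three rows are for illustration only
path <- file.path(tempdir(), "math_attainment.txt")
writeLines(c("math2 math1 cc",
             "28 18 328.20",
             "56 22 406.03",
             "51 44 386.94"), path)

# header = TRUE takes the variable names from the first line
dta <- read.table(path, header = TRUE)
str(dta)
```

As in the structure output below, whole-number columns are read as `int` and `cc` as `num`.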
checking data
structure of data
## 'data.frame': 39 obs. of 3 variables:
## $ math2: int 28 56 51 13 39 41 30 13 17 32 ...
## $ math1: int 18 22 44 8 20 12 16 5 9 18 ...
## $ cc : num 328 406 387 167 328 ...
first 6 rows
## math2 math1 cc
## 1 28 18 328.20
## 2 56 22 406.03
## 3 51 44 386.94
## 4 13 8 166.91
## 5 39 20 328.20
## 6 41 12 328.20
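A sketch of these checks. It assumes a small stand-in data frame built from the 6 rows printed above, whereas the real `dta` has 39 rows. `anyNA()` is an extra check for missing values that the original does not show.

```r
# Stand-in data: only the first 6 rows shown above
dta <- data.frame(math2 = c(28L, 56L, 51L, 13L, 39L, 41L),
                  math1 = c(18L, 22L, 44L, 8L, 20L, 12L),
                  cc    = c(328.20, 406.03, 386.94, 166.91, 328.20, 328.20))
str(dta)      # variable types and first values
head(dta)     # first 6 rows
anyNA(dta)    # TRUE if any value is missing
```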
descriptive statistics
variable means
## math2 math1 cc
## 28.76923 15.35897 188.83667
variable standard deviations
## math2 math1 cc
## 10.720029 7.744224 84.842513
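One way to get per-variable means and standard deviations is with `colMeans()` and `sapply(..., sd)`. This sketch uses the 6 rows shown earlier as stand-in data. Its results therefore differ from the full-data values above.

```r
# Stand-in data: first 6 rows only (the full data has 39 rows)
dta <- data.frame(math2 = c(28, 56, 51, 13, 39, 41),
                  math1 = c(18, 22, 44, 8, 20, 12),
                  cc    = c(328.20, 406.03, 386.94, 166.91, 328.20, 328.20))
m <- colMeans(dta)     # mean of each column
s <- sapply(dta, sd)   # sample sd of each column (denominator n - 1)
m
s
```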
correlation matrix
## math2 math1 cc
## math2 1.0000000 0.7443604 0.6570098
## math1 0.7443604 1.0000000 0.5956771
## cc 0.6570098 0.5956771 1.0000000
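A correlation matrix like the one above comes from `cor()` on the numeric columns. This is a sketch on the 6-row stand-in data, so the coefficients will not match the table.

```r
# Stand-in data: first 6 rows only
dta <- data.frame(math2 = c(28, 56, 51, 13, 39, 41),
                  math1 = c(18, 22, 44, 8, 20, 12),
                  cc    = c(328.20, 406.03, 386.94, 166.91, 328.20, 328.20))
r <- cor(dta)   # Pearson correlations between all pairs of columns
round(r, 4)
```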
plot data
specify square plot region
scatter plot of math2 by math1
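A minimal sketch of these two steps, assuming the 6-row stand-in data. `par(pty = "s")` requests a square plot region. Saving and restoring the old settings keeps later plots unaffected. The `grid()` call is an optional addition.

```r
# Stand-in data: first 6 rows only
dta <- data.frame(math2 = c(28, 56, 51, 13, 39, 41),
                  math1 = c(18, 22, 44, 8, 20, 12))
op <- par(pty = "s")              # square plot region; keep previous settings
plot(math2 ~ math1, data = dta,
     xlab = "math1", ylab = "math2")
grid()                            # optional reference grid
par(op)                           # restore previous settings
```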

regression analysis
regress math2 on math1
##
## Call:
## lm(formula = math2 ~ math1, data = dta)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.430 -5.521 -0.369 4.253 20.388
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.944 2.607 4.965 1.57e-05 ***
## math1 1.030 0.152 6.780 5.57e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.255 on 37 degrees of freedom
## Multiple R-squared: 0.5541, Adjusted R-squared: 0.542
## F-statistic: 45.97 on 1 and 37 DF, p-value: 5.571e-08
## Analysis of Variance Table
##
## Response: math2
## Df Sum Sq Mean Sq F value Pr(>F)
## math1 1 2419.6 2419.59 45.973 5.571e-08 ***
## Residuals 37 1947.3 52.63
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
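As a cross-check, several numbers in the regression and ANOVA output can be recomputed from the other printed values. The slope is r·sd(math2)/sd(math1), and the intercept is mean(math2) − slope·mean(math1). R² is SS_reg / (SS_reg + SS_res), which equals r² in simple regression. F is SS_reg / MS_res, and the residual standard error is √MS_res. The sketch uses only numbers copied from the output above, so it agrees with the printed values up to rounding.

```r
# Values copied from the printed output (rounded as printed)
r      <- 0.7443604                    # cor(math2, math1)
mean_y <- 28.76923; mean_x <- 15.35897
sd_y   <- 10.720029; sd_x  <- 7.744224
ss_reg <- 2419.6;   ss_res <- 1947.3
n      <- 39;       df_res <- n - 2    # simple regression: n - 2 residual df

# least-squares coefficients from summary statistics
b1 <- r * sd_y / sd_x                  # slope
b0 <- mean_y - b1 * mean_x             # intercept

# ANOVA-based quantities
ms_res <- ss_res / df_res              # residual mean square
f      <- ss_reg / ms_res              # F on 1 and df_res df (MS_reg = SS_reg / 1)
r2     <- ss_reg / (ss_reg + ss_res)   # proportion of total SS explained
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res
sigma  <- sqrt(ms_res)                 # residual standard error

round(c(b0 = b0, b1 = b1, r2 = r2, r_squared = r^2,
        adj_r2 = adj_r2, F = f, sigma = sigma), 4)
```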
diagnostics
specify maximum plot region
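par(pty = "m")
# empty frame: standardized residuals against fitted values
plot(scale(resid(dta.lm)) ~ fitted(dta.lm),
     ylim = c(-3.5, 3.5), type = "n",
     xlab = "Fitted values", ylab = "Standardized residuals")
# label each point with its row name instead of a plotting symbol
text(fitted(dta.lm), scale(resid(dta.lm)), labels = rownames(dta), cex = 0.5)
# add reference grid lines
grid()
# add a horizontal red dashed line at zero
abline(h = 0, lty = 2, col = "red")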

normality check
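A common normality check is a normal Q-Q plot of the standardized residuals. The plot used in the original is not shown, so this is an assumption. The sketch fits `math2 ~ math1` on the 6-row stand-in data. `shapiro.test()` is an optional formal test added for illustration.

```r
# Stand-in data: first 6 rows only
dta <- data.frame(math2 = c(28, 56, 51, 13, 39, 41),
                  math1 = c(18, 22, 44, 8, 20, 12))
dta.lm <- lm(math2 ~ math1, data = dta)
z <- as.vector(scale(resid(dta.lm)))   # residuals rescaled to mean 0, sd 1
qqnorm(z, ylab = "Standardized residuals")
qqline(z, lty = 2)                     # line through the first and third quartiles
sw <- shapiro.test(resid(dta.lm))      # Shapiro-Wilk test of normality
sw
```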

Show the first 6 rows of the data
## height weight
## 1 58 115
## 2 59 117
## 3 60 120
## 4 61 123
## 5 62 126
## 6 63 129
List all values of the data by column (variable)
As you can see, there are two variables (height and weight) in the data.
## $height
## [1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
##
## $weight
## [1] 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164
List all values of the data as a single vector: the height values come first, followed by the weight values
## [1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 115 117 120 123
## [20] 126 129 132 135 139 142 146 150 154 159 164
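The values above match R's built-in `women` dataset. Assuming that is the data used, the three listings can be sketched as follows.

```r
head(women)                        # first 6 rows
as.list(women)                     # one vector per column ($height, $weight)
unlist(women, use.names = FALSE)   # all 30 values in one unnamed vector
```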
Exercise 3: Race and Birthweight
Load the data
Show the first 6 rows
## low age lwt race smoke ptl ht ui ftv bwt
## 85 0 19 182 2 0 0 0 1 0 2523
## 86 0 33 155 3 0 0 0 0 3 2551
## 87 0 20 105 1 1 0 0 0 1 2557
## 88 0 21 108 1 1 0 0 1 2 2594
## 89 0 18 107 1 1 0 0 1 0 2600
## 91 0 21 124 3 0 0 0 0 0 2622
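The columns and row names above match the `birthwt` data frame in the MASS package. Assuming that is the source, loading and showing it is a sketch like this.

```r
library(MASS)    # provides the birthwt data frame
dta <- birthwt
head(dta)        # first 6 rows; row names start at 85
```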
Recode the race codes as names (1 = White, 2 = Black, 3 = Other)
## low age lwt race smoke ptl ht ui ftv bwt
## 85 0 19 182 Black 0 0 0 1 0 2523
## 86 0 33 155 Other 0 0 0 0 3 2551
## 87 0 20 105 White 1 0 0 0 1 2557
## 88 0 21 108 White 1 0 0 1 2 2594
## 89 0 18 107 White 1 0 0 1 0 2600
## 91 0 21 124 Other 0 0 0 0 0 2622
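A minimal sketch of the recoding, assuming MASS `birthwt` as the data. Indexing a character vector by the integer code maps 1, 2 and 3 to the names shown above. The result is a character column.

```r
library(MASS)
dta <- birthwt
# race codes: 1 = White, 2 = Black, 3 = Other
dta$race <- c("White", "Black", "Other")[dta$race]
head(dta)
```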
Count the mothers in each race group
## race bwt
## 1 Black 26
## 2 Other 67
## 3 White 96
There are 26 Black mothers in this data frame.
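The `race`/`bwt` layout above looks like `aggregate()` with `length` as the summary function, but that is an assumption. In this sketch `bwt` is only the column being counted. `table()` gives the same counts.

```r
library(MASS)
dta <- birthwt
dta$race <- c("White", "Black", "Other")[dta$race]
# number of rows (mothers) in each race group, in alphabetical order
counts <- aggregate(bwt ~ race, data = dta, FUN = length)
counts
table(dta$race)   # same counts as a frequency table
```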
Exercise 4: UCBAdmissions
Load the dataset
Data structure
## 'table' num [1:2, 1:2, 1:6] 512 313 89 19 353 207 17 8 120 205 ...
## - attr(*, "dimnames")=List of 3
## ..$ Admit : chr [1:2] "Admitted" "Rejected"
## ..$ Gender: chr [1:2] "Male" "Female"
## ..$ Dept : chr [1:6] "A" "B" "C" "D" ...
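`UCBAdmissions` is a built-in 2 × 2 × 6 contingency table. A sketch of loading it and inspecting its structure and dimension names:

```r
dta <- UCBAdmissions   # Admit x Gender x Dept table of counts
str(dta)
dimnames(dta)          # level names of each dimension
```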
Number of male applicants by admission status and department
## Dept
## Admit A B C D E F
## Admitted 512 353 120 138 53 22
## Rejected 313 207 205 279 138 351
Number of male applicants to department A, by admission status
## Admitted Rejected
## 512 313
Number of male applicants admitted, by department
## A B C D E F
## 512 353 120 138 53 22
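The three counts above are slices of the table, indexed by dimension name. Leaving a position blank keeps that whole dimension.

```r
dta <- UCBAdmissions
dta[, "Male", ]             # Admit x Dept for males
dta[, "Male", "A"]          # males in department A, by admission status
dta["Admitted", "Male", ]   # admitted males, by department
```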
Exercise 5: chickwts
Load the dataset
Show the data structure
## 'data.frame': 71 obs. of 2 variables:
## $ weight: num 179 160 136 227 217 168 108 124 143 140 ...
## $ feed : Factor w/ 6 levels "casein","horsebean",..: 2 2 2 2 2 2 2 2 2 2 ...
As the structure shows, column 2 (feed) is the feed type, a factor with 6 levels.
List column 2 (feed type) for the whole dataset
## [1] horsebean horsebean horsebean horsebean horsebean horsebean horsebean
## [8] horsebean horsebean horsebean linseed linseed linseed linseed
## [15] linseed linseed linseed linseed linseed linseed linseed
## [22] linseed soybean soybean soybean soybean soybean soybean
## [29] soybean soybean soybean soybean soybean soybean soybean
## [36] soybean sunflower sunflower sunflower sunflower sunflower sunflower
## [43] sunflower sunflower sunflower sunflower sunflower sunflower meatmeal
## [50] meatmeal meatmeal meatmeal meatmeal meatmeal meatmeal meatmeal
## [57] meatmeal meatmeal meatmeal casein casein casein casein
## [64] casein casein casein casein casein casein casein
## [71] casein
## Levels: casein horsebean linseed meatmeal soybean sunflower
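Indexing the built-in `chickwts` data frame with `[, 2]` returns the `feed` column as a factor vector, like the listing above. The `table()` call is an added convenience that counts chicks per feed type.

```r
dta <- chickwts
str(dta)
dta[, 2]          # column 2 as a factor (same as dta$feed)
table(dta$feed)   # number of chicks on each feed
```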
List the feed type for the whole dataset as a one-column data frame
## feed
## 1 horsebean
## 2 horsebean
## 3 horsebean
## 4 horsebean
## 5 horsebean
## 6 horsebean
## 7 horsebean
## 8 horsebean
## 9 horsebean
## 10 horsebean
## 11 linseed
## 12 linseed
## 13 linseed
## 14 linseed
## 15 linseed
## 16 linseed
## 17 linseed
## 18 linseed
## 19 linseed
## 20 linseed
## 21 linseed
## 22 linseed
## 23 soybean
## 24 soybean
## 25 soybean
## 26 soybean
## 27 soybean
## 28 soybean
## 29 soybean
## 30 soybean
## 31 soybean
## 32 soybean
## 33 soybean
## 34 soybean
## 35 soybean
## 36 soybean
## 37 sunflower
## 38 sunflower
## 39 sunflower
## 40 sunflower
## 41 sunflower
## 42 sunflower
## 43 sunflower
## 44 sunflower
## 45 sunflower
## 46 sunflower
## 47 sunflower
## 48 sunflower
## 49 meatmeal
## 50 meatmeal
## 51 meatmeal
## 52 meatmeal
## 53 meatmeal
## 54 meatmeal
## 55 meatmeal
## 56 meatmeal
## 57 meatmeal
## 58 meatmeal
## 59 meatmeal
## 60 casein
## 61 casein
## 62 casein
## 63 casein
## 64 casein
## 65 casein
## 66 casein
## 67 casein
## 68 casein
## 69 casein
## 70 casein
## 71 casein
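This listing is likely the same column kept as a data frame, which prints with row numbers and a `feed` header. That is an assumption; `drop = FALSE` is one way to get this.

```r
dta <- chickwts
feed_df <- dta[, 2, drop = FALSE]   # keep the data-frame structure
feed_df
```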
Exercise 6: MASS
Load the menarche dataset from the MASS package and show its structure
## 'data.frame': 25 obs. of 3 variables:
## $ Age : num 9.21 10.21 10.58 10.83 11.08 ...
## $ Total : num 376 200 93 120 90 88 105 111 100 93 ...
## $ Menarche: num 0 0 0 2 2 5 10 17 16 29 ...
Age: Average age of the group. (The groups are reasonably age homogeneous.)
Total: Total number of children in the group.
Menarche: Number who have reached menarche.
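The variables above match `menarche` in the MASS package. The sketch below loads it, assuming that is the source. The `prop` column is an illustrative addition, not part of the original exercise. It divides Menarche by Total to give the share of each age group that has reached menarche.

```r
library(MASS)   # provides the menarche data frame
dta <- menarche
str(dta)
# hypothetical derived column: proportion reaching menarche in each group
dta$prop <- dta$Menarche / dta$Total
head(dta)
```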