1. Load the libraries
library(rstatix)
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(openxlsx)
library(rio)
library(agricolae)
2. Import the existing data from R
data("mtcars")
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
3. Copy the data for use
4. Save the data into Excel
5. Import the data
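Steps 3 to 5 show no code, so the following is only a sketch of one way to do them. It assumes the working copy is called `data` (the name used in step 16), writes to a temporary file whose name is made up for illustration, and uses `openxlsx` in both directions; `rio::import()` should be able to read the same file.

```r
library(openxlsx)

# Step 3: work on a copy so the built-in mtcars stays untouched
data <- mtcars

# Step 4: save to Excel (hypothetical file name in a temporary folder).
# Row names (the car models) are not written unless rowNames = TRUE.
path <- file.path(tempdir(), "mtcars_copy.xlsx")
write.xlsx(data, path)

# Step 5: read the file back; rows are now labelled 1, 2, 3, ...
data <- read.xlsx(path)
dim(data)
```

Dropping the row names on export would explain why the later outputs are numbered 1, 2, 3 instead of showing car models.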
6. Create a new dataset selecting rows 1:10
## [1] 10 11
7. Create a new dataset selecting rows 2, 3:6, 18
## [1] 6 11
8. Create a new dataset deleting rows 15, 18, 20
## [1] 29 11
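The code for steps 6 to 8 is not shown. A minimal sketch with `dplyr::slice()` that gives the same dimensions is below; `mtcars` stands in for the imported data and the object names are hypothetical.

```r
library(dplyr)

data <- mtcars  # stand-in for the imported data

rows_first10 <- data |> slice(1:10)            # step 6: rows 1 to 10
rows_picked  <- data |> slice(c(2, 3:6, 18))   # step 7: rows 2, 3 to 6 and 18
rows_dropped <- data |> slice(-c(15, 18, 20))  # step 8: drop rows 15, 18, 20

dim(rows_first10)
dim(rows_picked)
dim(rows_dropped)
```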
9. Create a new dataset selecting cyl = 6
## mpg cyl disp hp drat wt qsec vs am gear carb
## 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## 3 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## 4 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## 5 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## 6 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## 7 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
10. Create a new dataset selecting cyl = 6 and 8
## mpg cyl disp hp drat wt qsec vs am gear carb
## 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## 3 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## 4 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## 5 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## 6 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## 7 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## 8 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## 9 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## 10 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## 11 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## 12 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## 13 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## 14 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## 15 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## 16 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## 17 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## 18 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## 19 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## 20 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## 21 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
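Steps 9 and 10 are row filters on `cyl`. One possible way to write them with `dplyr::filter()` is sketched below; `mtcars` stands in for the imported data and the object names are hypothetical.

```r
library(dplyr)

data <- mtcars  # stand-in for the imported data

# Step 9: keep only the 6-cylinder cars
cyl6 <- data |> dplyr::filter(cyl == 6)

# Step 10: keep cars with 6 or 8 cylinders
cyl68 <- data |> dplyr::filter(cyl %in% c(6, 8))

nrow(cyl6)
nrow(cyl68)
```

`%in%` is used in step 10 because `cyl == c(6, 8)` would recycle the vector and silently drop matching rows.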
Create a new dataset selecting mpg, cyl, disp, drat, vs
## [1] "mpg" "cyl" "disp" "drat" "vs"
11. Create a new dataset deleting cyl and am
## [1] "mpg" "disp" "hp" "drat" "wt" "qsec" "vs" "gear" "carb"
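Both column steps can be sketched with `dplyr::select()`. The first call keeps `cyl` because the printed column names include it; `mtcars` stands in for the imported data and the object names are hypothetical.

```r
library(dplyr)

data <- mtcars  # stand-in for the imported data

# Keep only the listed columns
kept <- data |> select(mpg, cyl, disp, drat, vs)

# Step 11: drop cyl and am, keep everything else
dropped <- data |> select(-cyl, -am)

names(kept)
names(dropped)
```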
12. Create a new dataset arranging large to small for wt
## mpg cyl disp hp drat wt qsec vs am gear carb
## 1 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## 2 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## 3 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## 4 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## 5 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## 6 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## 7 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## 8 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## 9 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## 10 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## 11 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## 12 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## 13 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## 14 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## 15 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## 16 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## 17 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## 18 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## 19 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## 20 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## 21 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## 22 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## 23 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## 24 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## 25 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## 26 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## 27 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## 28 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## 29 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## 30 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## 31 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## 32 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
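A sort from largest to smallest `wt` can be sketched with `arrange()` and `desc()`; `mtcars` stands in for the imported data and `sorted` is a hypothetical name.

```r
library(dplyr)

data <- mtcars  # stand-in for the imported data

# Step 12: heaviest cars first
sorted <- data |> arrange(desc(wt))
head(sorted$wt)
```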
13. Create a column (kw = 0.75 x hp)
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb" "kw"
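The new column can be added with `mutate()`, as sketched below. The factor 0.75 is the rounded conversion given in the step; `mtcars` stands in for the imported data.

```r
library(dplyr)

data <- mtcars  # stand-in for the imported data

# Step 13: approximate power in kilowatts from horsepower
data <- data |> mutate(kw = 0.75 * hp)
names(data)
```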
II. Descriptive statistics
14. Calculate totals for all variables
## mpg cyl disp hp drat wt qsec vs
## 642.900 198.000 7383.100 4694.000 115.090 102.952 571.160 14.000
## am gear carb kw
## 13.000 118.000 90.000 3520.500
15. Calculate means for all variables
## mpg cyl disp hp drat wt qsec
## 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250 17.848750
## vs am gear carb kw
## 0.437500 0.406250 3.687500 2.812500 110.015625
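Because every column is numeric, the totals and means in steps 14 and 15 can be obtained with base R's `colSums()` and `colMeans()`. This is a sketch; `mtcars` plus the `kw` column stands in for the working data.

```r
data <- mtcars          # stand-in for the imported data
data$kw <- 0.75 * data$hp

totals <- colSums(data)   # step 14: one total per column
means  <- colMeans(data)  # step 15: one mean per column

totals
means
```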
16. Count the number of cyl
data |> freq_table(cyl)
## # A tibble: 3 × 3
## cyl n prop
## <dbl> <int> <dbl>
## 1 4 11 34.4
## 2 6 7 21.9
## 3 8 14 43.8
17. Count the number of cyl by am
## # A tibble: 6 × 4
## cyl am n prop
## <dbl> <dbl> <int> <dbl>
## 1 4 0 3 27.3
## 2 4 1 8 72.7
## 3 6 0 4 57.1
## 4 6 1 3 42.9
## 5 8 0 12 85.7
## 6 8 1 2 14.3
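The code for step 17 is not shown. The sketch below rebuilds the same table with `dplyr` only, computing the percentage within each `cyl` group; passing both variables to `freq_table()` from step 16 is probably what produced the output, but that is an assumption.

```r
library(dplyr)

data <- mtcars  # stand-in for the imported data

cyl_by_am <- data |>
  count(cyl, am) |>                           # one row per cyl/am combination
  group_by(cyl) |>
  mutate(prop = round(100 * n / sum(n), 1)) |>  # percentage within each cyl
  ungroup()

cyl_by_am
```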
20. Convert the result above so that cyl 4, 6, 8 become columns,
keeping only the mean and variable
## [1] "cyl" "variable" "n" "mean" "median" "sd"
## # A tibble: 7 × 4
## variable `4` `6` `8`
## <fct> <dbl> <dbl> <dbl>
## 1 mpg 26.7 19.7 15.1
## 2 disp 105. 183. 353.
## 3 hp 82.6 122. 209.
## 4 drat 4.07 3.59 3.23
## 5 wt 2.29 3.12 4.00
## 6 qsec 19.1 18.0 16.8
## 7 kw 62.0 91.7 157.
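The summary table that step 20 reshapes is not shown, so the sketch below first rebuilds a long table with the same column names (`cyl`, `variable`, `n`, `mean`, `median`, `sd`) using `dplyr` and `tidyr`, then pivots it. The original most likely came from an `rstatix` summary function; that, and the object names, are assumptions.

```r
library(dplyr)
library(tidyr)

data <- mtcars |> mutate(kw = 0.75 * hp)  # stand-in for the working data
vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec", "kw")

# Long summary: one row per cyl and variable
long_stats <- data |>
  select(cyl, all_of(vars)) |>
  pivot_longer(all_of(vars), names_to = "variable", values_to = "value") |>
  group_by(cyl, variable) |>
  summarise(n = n(), mean = mean(value), median = median(value),
            sd = sd(value), .groups = "drop")

# Step 20: keep cyl, variable and mean, then spread cyl into columns
wide_means <- long_stats |>
  select(cyl, variable, mean) |>
  pivot_wider(names_from = cyl, values_from = mean)

wide_means
```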
III. Inferential statistics
21. Do a one-sample t-test for mpg, selecting cyl = 4
H0: mpg = 20
H1: mpg ≠ 20
## mpg cyl disp hp drat wt qsec vs am gear carb kw
## 1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 69.75
## 2 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 46.50
## 3 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 71.25
## 4 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 49.50
## 5 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 39.00
## 6 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 48.75
## 7 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 72.75
## 8 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 49.50
## 9 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 68.25
## 10 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 84.75
## 11 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 81.75
Result
## # A tibble: 1 × 7
## .y. group1 group2 n statistic df p
## * <chr> <chr> <chr> <int> <dbl> <dbl> <dbl>
## 1 mpg 1 null model 11 4.90 10 0.000623
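A sketch of how this result can be obtained with `rstatix::t_test()`: the formula `mpg ~ 1` requests a one-sample test and `mu` sets the value under H0. `mtcars` stands in for the working data and `cyl4` is a hypothetical name.

```r
library(dplyr)
library(rstatix)

cyl4 <- mtcars |> dplyr::filter(cyl == 4)  # the 11 four-cylinder cars

# One-sample t-test of mean mpg against 20
res <- cyl4 |> t_test(mpg ~ 1, mu = 20)
res
```

With p well below 0.05, H0 is rejected: the mean mpg of the 4-cylinder cars differs from 20.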
Check normality
##
## Shapiro-Wilk normality test
##
## data: df$mpg
## W = 0.94756, p-value = 0.1229
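The output above is labelled `df$mpg`, and W = 0.94756 is the value obtained from all 32 cars. Since the t-test used only the cyl = 4 subset, the normality check arguably belongs on that subset. A sketch, with `cyl4` as a hypothetical name:

```r
cyl4 <- subset(mtcars, cyl == 4)  # same sample as the one-sample t-test

res_all  <- shapiro.test(mtcars$mpg)  # all 32 cars, as in the output above
res_cyl4 <- shapiro.test(cyl4$mpg)    # only the 11 cars that were tested

res_all
res_cyl4
```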
22. Plot histogram
H0: mpg = 20
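The plotting code is not shown. A minimal base R sketch is below; it assumes the histogram is of mpg for the cyl = 4 subset and that the H0 value is marked with a vertical line, neither of which the source confirms.

```r
cyl4 <- subset(mtcars, cyl == 4)

# Histogram of mpg for the 4-cylinder cars
h <- hist(cyl4$mpg, main = "mpg for cyl = 4", xlab = "mpg")

# Dashed line at the hypothesised mean
abline(v = 20, lty = 2)
```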

23. Do a two-sample t-test for mpg comparing cyl = 4 and cyl = 8
H0: cyl4 = cyl8
H1: cyl4 ≠ cyl8
## mpg cyl disp hp drat wt qsec vs am gear carb kw
## 1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 69.75
## 2 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 131.25
## 3 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 183.75
## 4 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 46.50
## 5 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 71.25
## 6 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 135.00
## 7 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 135.00
## 8 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 135.00
## 9 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 153.75
## 10 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 161.25
## 11 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 172.50
## 12 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 49.50
## 13 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 39.00
## 14 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 48.75
## 15 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 72.75
## 16 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 112.50
## 17 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 112.50
## 18 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 183.75
## 19 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 131.25
## 20 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 49.50
## 21 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 68.25
## 22 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 84.75
## 23 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 198.00
## 24 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 251.25
## 25 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 81.75
Result
## # A tibble: 1 × 8
## .y. group1 group2 n1 n2 statistic df p
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
## 1 mpg 4 8 11 14 7.60 15.0 0.00000164
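A sketch of the two-sample test with `rstatix::t_test()`. The fractional df of 15.0 in the result indicates Welch's test, which is the default when `var.equal` is left unset. `mtcars` stands in for the working data and `two` is a hypothetical name.

```r
library(dplyr)
library(rstatix)

two <- mtcars |> dplyr::filter(cyl %in% c(4, 8))  # 11 + 14 cars

# Welch two-sample t-test of mpg between the two cyl groups
res <- two |> t_test(mpg ~ cyl)
res
```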
24. Check normality
## # A tibble: 2 × 4
## cyl variable statistic p
## <dbl> <chr> <dbl> <dbl>
## 1 4 mpg 0.912 0.261
## 2 8 mpg 0.932 0.323
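The per-group normality check can be sketched with `group_by()` followed by `rstatix::shapiro_test()`; `two` is a hypothetical name for the cyl 4 and 8 subset.

```r
library(dplyr)
library(rstatix)

two <- mtcars |> dplyr::filter(cyl %in% c(4, 8))

# Shapiro-Wilk test of mpg within each cyl group
norm_by_cyl <- two |>
  group_by(cyl) |>
  shapiro_test(mpg)

norm_by_cyl
```

Both p-values are above 0.05, so normality is not rejected in either group.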
25. Check variance
## # A tibble: 1 × 9
## .y. group1 group2 n1 n2 statistic p p.adj p.adj.signif
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <chr>
## 1 mpg 4 8 11 14 -4.22 0.0000246 0.0000246 ****
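The table above has the layout of a pairwise comparison (with `p.adj` columns) and does not look like the output of a variance test. As an assumption about what this step intends, the sketch below compares the two group variances with base R's F test, `var.test()`; Levene's test would be a common alternative.

```r
two <- subset(mtcars, cyl %in% c(4, 8))

# F test: ratio of the mpg variance in cyl = 4 to that in cyl = 8
res <- var.test(mpg ~ cyl, data = two)
res
```

If the variances look unequal, Welch's t-test is the safer choice for comparing the means.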
26. Do a two-sample t-test with unequal variance
## # A tibble: 1 × 8
## .y. group1 group2 n1 n2 statistic df p
## * <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl>
## 1 mpg 4 8 11 14 7.60 15.0 0.00000164
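This result is identical to step 23 because `t_test()` already assumes unequal variances by default. The sketch below sets `var.equal` explicitly to show both variants side by side; the object names are hypothetical.

```r
library(dplyr)
library(rstatix)

two <- mtcars |> dplyr::filter(cyl %in% c(4, 8))

welch   <- two |> t_test(mpg ~ cyl, var.equal = FALSE)  # unequal variances
student <- two |> t_test(mpg ~ cyl, var.equal = TRUE)   # pooled variance

welch$df
student$df
```

The pooled test uses n1 + n2 - 2 = 23 degrees of freedom, while Welch's correction lowers this to about 15.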
27. Do ANOVA for wt by cyl
## Analysis of Variance Table
##
## Response: wt
## Df Sum Sq Mean Sq F value Pr(>F)
## cyl 1 18.172 18.1723 47.379 1.218e-07 ***
## Residuals 30 11.507 0.3835
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
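In the table above `cyl` has 1 degree of freedom, which means it entered the model as a number (a regression slope) and not as a three-level group. For a one-way ANOVA across the cylinder groups, `cyl` would normally be converted to a factor first. A sketch, with `mtcars` standing in for the working data:

```r
data <- mtcars  # stand-in for the imported data

# cyl as a number: a single slope, 1 df (matches the table above)
fit_num <- lm(wt ~ cyl, data = data)

# cyl as a factor: three groups, 2 df
data$cyl <- factor(data$cyl)
fit <- lm(wt ~ cyl, data = data)

anova(fit_num)
anova(fit)
```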
28. Do post-hoc
##
## Study: fit ~ "cyl"
##
## LSD t Test for wt
## P value adjustment method: bonferroni
##
## Mean Square Error: 0.3835487
##
## cyl, means and individual ( 95 %) CI
##
## wt std r se LCL UCL Min Max Q25 Q50
## 4 2.285727 0.5695637 11 0.1867299 1.904374 2.667081 1.513 3.190 1.8850 2.200
## 6 3.117143 0.3563455 7 0.2340783 2.639091 3.595195 2.620 3.460 2.8225 3.215
## 8 3.999214 0.7594047 14 0.1655184 3.661181 4.337248 3.170 5.424 3.5325 3.755
## Q75
## 4 2.62250
## 6 3.44000
## 8 4.01375
##
## Alpha: 0.05 ; DF Error: 30
## Critical Value of t: 2.535742
##
## Groups according to probability of means differences and alpha level( 0.05 )
##
## Treatments with the same letter are not significantly different.
##
## wt groups
## 8 3.999214 a
## 6 3.117143 b
## 4 2.285727 c
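A sketch of the post-hoc call with `agricolae::LSD.test()` and a Bonferroni adjustment. It converts `cyl` to a factor before fitting, so the error df would be 29, not the 30 shown above; that choice is an assumption about the intended analysis.

```r
library(agricolae)

data <- mtcars  # stand-in for the imported data
data$cyl <- factor(data$cyl)

fit <- aov(wt ~ cyl, data = data)

# Pairwise LSD comparisons of mean wt between cyl groups
out <- LSD.test(fit, "cyl", p.adj = "bonferroni")
out$groups
```

Groups that share a letter in `out$groups` are not significantly different at alpha = 0.05.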
29. Test normality
##
## Shapiro-Wilk normality test
##
## data: fit$residuals
## W = 0.89969, p-value = 0.006083
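The residual check can be sketched as below. It assumes the one-way model with `cyl` as a factor, so the statistic may differ from the value printed above, which appears to come from the fit with numeric `cyl`.

```r
data <- mtcars  # stand-in for the imported data
data$cyl <- factor(data$cyl)

fit <- aov(wt ~ cyl, data = data)

# Shapiro-Wilk test on the model residuals
res <- shapiro.test(residuals(fit))
res
```

A p-value below 0.05, as in the output above, suggests the residuals are not normal; a rank-based alternative such as `kruskal.test()` may then be worth considering.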
30. Test variance
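No code or output is shown for this step. One option is Levene's test in its median-centred (Brown-Forsythe) form, sketched below in base R as a one-way ANOVA on absolute deviations from each group median. The choice of test and the column name `abs_dev` are assumptions; `rstatix` and `car` offer ready-made versions.

```r
data <- mtcars  # stand-in for the imported data
data$cyl <- factor(data$cyl)

# Absolute deviation of each wt from the median of its cyl group
data$abs_dev <- abs(data$wt - ave(data$wt, data$cyl, FUN = median))

# One-way ANOVA on the deviations: a small p-value suggests unequal variances
lev <- anova(lm(abs_dev ~ cyl, data = data))
lev
```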
