Classwork

1. Load the libraries

library(rstatix)

## 
## Attaching package: 'rstatix'

## The following object is masked from 'package:stats':
## 
##     filter

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)
library(openxlsx)
library(rio)
library(agricolae)

2. Import the existing data from R

data("mtcars")
head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

3. Copy the data for use

4. Save the data into Excel

5. Import the data
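Steps 3 to 5 could look roughly like the sketch below. The file name `mtcars.xlsx` and the temporary directory are assumptions for illustration, since the original path is not shown.

```r
library(openxlsx)
library(rio)

# Step 3: work on a copy so the built-in mtcars stays untouched
data <- mtcars

# Step 4: write the copy to an Excel file (hypothetical path in a temp directory)
path <- file.path(tempdir(), "mtcars.xlsx")
write.xlsx(data, path)

# Step 5: read the file back; row names (car models) are not written by default
data <- import(path)
dim(data)
```

Because the row names are dropped on export, later outputs show row numbers instead of car names.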

6. Create a new dataset selecting rows 1:10

## [1] 10 11
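One way to get this result, assuming `data` holds the 32-row mtcars table (the name `data1` is hypothetical):

```r
library(dplyr)

data <- mtcars               # stands in for the imported data
data1 <- slice(data, 1:10)   # keep rows 1 to 10
dim(data1)
```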

7. Create a new dataset selecting rows 2, 3:6, 18

## [1]  6 11
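A possible sketch with `dplyr::slice()`, again using mtcars as a stand-in and a hypothetical name `data2`:

```r
library(dplyr)

data <- mtcars
data2 <- slice(data, c(2, 3:6, 18))   # rows 2, 3 to 6, and 18: six rows in total
dim(data2)
```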

8. Create a new dataset deleting rows 15, 18, 20

## [1] 29 11
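Negative indices drop rows. A minimal sketch (the name `data3` is hypothetical):

```r
library(dplyr)

data <- mtcars
data3 <- slice(data, -c(15, 18, 20))   # drop rows 15, 18, and 20
dim(data3)
```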

9. Create a new dataset selecting cyl = 6

##    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## 2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## 3 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## 4 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## 5 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## 6 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## 7 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
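The seven rows above could be obtained with a filter such as the following (the name `data_cyl6` is hypothetical):

```r
library(dplyr)

data <- mtcars
data_cyl6 <- filter(data, cyl == 6)   # keep only six-cylinder cars
nrow(data_cyl6)
```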

10. Create a new dataset selecting cyl = 6 or 8

##     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## 3  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## 4  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## 5  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## 6  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## 7  19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## 8  17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## 9  16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## 10 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## 11 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## 12 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## 13 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## 14 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## 15 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## 16 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## 17 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## 18 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## 19 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## 20 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## 21 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
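A row cannot have cyl equal to 6 and 8 at once, so the condition is "6 or 8". One way to write it, as a sketch (the name `data_cyl68` is hypothetical):

```r
library(dplyr)

data <- mtcars
data_cyl68 <- filter(data, cyl %in% c(6, 8))   # rows where cyl is 6 or 8
nrow(data_cyl68)
```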

Create a new dataset selecting mpg, disp, drat, vs

## [1] "mpg"  "cyl"  "disp" "drat" "vs"
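A sketch that selects exactly the four columns named in the task. Note that the names printed above also include `cyl`, so the original code presumably kept that column as well; the name `data_sel` is hypothetical.

```r
library(dplyr)

data <- mtcars
data_sel <- select(data, mpg, disp, drat, vs)   # keep only the listed columns
names(data_sel)
```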

11. Create a new dataset deleting cyl and am

## [1] "mpg"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "gear" "carb"
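Columns can be dropped with a minus sign inside `select()`. A minimal sketch (the name `data_drop` is hypothetical):

```r
library(dplyr)

data <- mtcars
data_drop <- select(data, -cyl, -am)   # remove cyl and am, keep the rest
names(data_drop)
```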

12. Create a new dataset arranging large to small for wt

##     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 1  10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## 2  14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## 3  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## 4  16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## 5  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## 6  13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## 7  15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## 8  17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## 9  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## 10 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## 11 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## 12 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## 13 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## 14 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## 15 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## 16 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## 17 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## 18 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## 19 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## 20 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## 21 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## 22 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
## 23 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## 24 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## 25 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## 26 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## 27 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## 28 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## 29 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## 30 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## 31 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## 32 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
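Sorting from largest to smallest can be sketched with `arrange()` and `desc()` (the name `data_sorted` is hypothetical):

```r
library(dplyr)

data <- mtcars
data_sorted <- arrange(data, desc(wt))   # heaviest cars first
head(data_sorted$wt)
```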

13. Create a column (kW = 0.75 x hp)

##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb" "kw"
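A possible way to add the column, using the factor 0.75 given in the task and the lower-case name `kw` seen in the output:

```r
library(dplyr)

data <- mtcars
data <- mutate(data, kw = 0.75 * hp)   # approximate power in kilowatts
names(data)
```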

II. Descriptive statistics

14. Calculate total for all variables

##      mpg      cyl     disp       hp     drat       wt     qsec       vs 
##  642.900  198.000 7383.100 4694.000  115.090  102.952  571.160   14.000 
##       am     gear     carb       kw 
##   13.000  118.000   90.000 3520.500
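Column totals like these can be computed with base R, assuming every column is numeric:

```r
data <- mtcars
data$kw <- 0.75 * data$hp   # column added in step 13

totals <- colSums(data)     # one total per column
totals
```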

15. Calculate means for all variables

##        mpg        cyl       disp         hp       drat         wt       qsec 
##  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 
##         vs         am       gear       carb         kw 
##   0.437500   0.406250   3.687500   2.812500 110.015625
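The matching sketch for column means:

```r
data <- mtcars
data$kw <- 0.75 * data$hp   # column added in step 13

means <- colMeans(data)     # one mean per column
means
```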

16. Count the number of cyl

data |> freq_table(cyl)

## # A tibble: 3 × 3
##     cyl     n  prop
##   <dbl> <int> <dbl>
## 1     4    11  34.4
## 2     6     7  21.9
## 3     8    14  43.8

17. Count the number of cars by cyl and am

## # A tibble: 6 × 4
##     cyl    am     n  prop
##   <dbl> <dbl> <int> <dbl>
## 1     4     0     3  27.3
## 2     4     1     8  72.7
## 3     6     0     4  57.1
## 4     6     1     3  42.9
## 5     8     0    12  85.7
## 6     8     1     2  14.3
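A sketch following the pattern of step 16. With two variables, `freq_table()` reports the percentage of the last variable within each level of the first, which is consistent with the proportions above adding up to 100 within each cyl.

```r
library(rstatix)

data <- mtcars
tab <- data |> freq_table(cyl, am)   # counts of am within each cyl
tab
```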

18. Calculate the mean, median, and SD of wt by cyl

## # A tibble: 3 × 4
##     cyl  Mean   Med    SD
##   <dbl> <dbl> <dbl> <dbl>
## 1     4  2.29  2.2  0.570
## 2     6  3.12  3.22 0.356
## 3     8  4.00  3.76 0.759
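One way to produce a table with these column names, as a sketch using dplyr (the name `wt_stats` is hypothetical):

```r
library(dplyr)

data <- mtcars
wt_stats <- data |>
  group_by(cyl) |>
  summarise(Mean = mean(wt), Med = median(wt), SD = sd(wt))   # one row per cyl
wt_stats
```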

19. Calculate the mean, median, and SD by cyl for all variables except vs, am, carb, and gear

## # A tibble: 21 × 6
##      cyl variable     n   mean median     sd
##    <dbl> <fct>    <dbl>  <dbl>  <dbl>  <dbl>
##  1     4 mpg         11  26.7   26     4.51 
##  2     4 disp        11 105.   108    26.9  
##  3     4 hp          11  82.6   91    20.9  
##  4     4 drat        11   4.07   4.08  0.365
##  5     4 wt          11   2.29   2.2   0.57 
##  6     4 qsec        11  19.1   18.9   1.68 
##  7     4 kw          11  62.0   68.2  15.7  
##  8     6 mpg          7  19.7   19.7   1.45 
##  9     6 disp         7 183.   168.   41.6  
## 10     6 hp           7 122.   110    24.3  
## # ℹ 11 more rows
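The long table above has the layout returned by `rstatix::get_summary_stats()`. A sketch under that assumption (the name `stats_long` is hypothetical):

```r
library(dplyr)
library(rstatix)

data <- mtcars |> mutate(kw = 0.75 * hp)

stats_long <- data |>
  select(-vs, -am, -carb, -gear) |>                       # drop the excluded variables
  group_by(cyl) |>
  get_summary_stats(show = c("mean", "median", "sd"))     # n is always included
stats_long
```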

20. Convert the result above to wide format, with cyl 4, 6, and 8 as columns, keeping only variable and mean

## [1] "cyl"      "variable" "n"        "mean"     "median"   "sd"

## # A tibble: 7 × 4
##   variable    `4`    `6`    `8`
##   <fct>     <dbl>  <dbl>  <dbl>
## 1 mpg       26.7   19.7   15.1 
## 2 disp     105.   183.   353.  
## 3 hp        82.6  122.   209.  
## 4 drat       4.07   3.59   3.23
## 5 wt         2.29   3.12   4.00
## 6 qsec      19.1   18.0   16.8 
## 7 kw        62.0   91.7  157.
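Reshaping from long to wide can be sketched with `tidyr::pivot_wider()`. The summary table is recomputed here so the block runs on its own; the names `stats_long` and `stats_wide` are hypothetical.

```r
library(dplyr)
library(tidyr)
library(rstatix)

stats_long <- mtcars |>
  mutate(kw = 0.75 * hp) |>
  select(-vs, -am, -carb, -gear) |>
  group_by(cyl) |>
  get_summary_stats(show = c("mean", "median", "sd"))

stats_wide <- stats_long |>
  ungroup() |>
  select(cyl, variable, mean) |>                      # keep only what is needed
  pivot_wider(names_from = cyl, values_from = mean)   # one column per cyl level
stats_wide
```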

III. Inferential statistics

21. Do one-sample t-test for mpg by selecting cyl=4

H0: mean mpg = 20

H1: mean mpg ≠ 20

##     mpg cyl  disp  hp drat    wt  qsec vs am gear carb    kw
## 1  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 69.75
## 2  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2 46.50
## 3  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2 71.25
## 4  32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1 49.50
## 5  30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2 39.00
## 6  33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 48.75
## 7  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1 72.75
## 8  27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1 49.50
## 9  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2 68.25
## 10 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2 84.75
## 11 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2 81.75

Result

## # A tibble: 1 × 7
##   .y.   group1 group2         n statistic    df        p
## * <chr> <chr>  <chr>      <int>     <dbl> <dbl>    <dbl>
## 1 mpg   1      null model    11      4.90    10 0.000623
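The result table has the layout of `rstatix::t_test()`. A sketch of a one-sample test against 20, assuming the subset is stored in `df` as the later output suggests:

```r
library(dplyr)
library(rstatix)

df <- mtcars |> filter(cyl == 4)      # four-cylinder cars only

res <- df |> t_test(mpg ~ 1, mu = 20) # one-sample t-test, H0: mean mpg = 20
res
```

With p well below 0.05, H0 would be rejected: the mean mpg of four-cylinder cars differs from 20.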

Check normality

## 
##  Shapiro-Wilk normality test
## 
## data:  df$mpg
## W = 0.94756, p-value = 0.1229
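A sketch of the normality check on the cyl = 4 subset. The W value printed above differs from the cyl = 4 value reported in step 24, which suggests the original `df` may have held a different set of rows when this test was run; that reading is an inference, not something the source states.

```r
df <- subset(mtcars, cyl == 4)   # four-cylinder cars only

sw <- shapiro.test(df$mpg)       # H0: mpg is normally distributed
sw
```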

22. Plot a histogram

H0: mpg = 20
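The plot itself is not preserved here. A base R sketch of such a histogram follows; the title, axis label, and the dashed line at the hypothesised mean of 20 are assumptions.

```r
df <- subset(mtcars, cyl == 4)

h <- hist(df$mpg, main = "mpg for cyl = 4", xlab = "mpg")  # histogram of the subset
abline(v = 20, lty = 2)                                     # hypothesised mean under H0
```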

23. Do two-sample t-test for mpg by cyl = 4 & 8

H0: mean mpg (cyl 4) = mean mpg (cyl 8)

H1: mean mpg (cyl 4) ≠ mean mpg (cyl 8)

##     mpg cyl  disp  hp drat    wt  qsec vs am gear carb     kw
## 1  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1  69.75
## 2  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 131.25
## 3  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4 183.75
## 4  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2  46.50
## 5  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2  71.25
## 6  16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3 135.00
## 7  17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3 135.00
## 8  15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3 135.00
## 9  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4 153.75
## 10 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4 161.25
## 11 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4 172.50
## 12 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1  49.50
## 13 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2  39.00
## 14 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1  48.75
## 15 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1  72.75
## 16 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2 112.50
## 17 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2 112.50
## 18 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4 183.75
## 19 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2 131.25
## 20 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1  49.50
## 21 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2  68.25
## 22 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2  84.75
## 23 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4 198.00
## 24 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8 251.25
## 25 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2  81.75

Result

## # A tibble: 1 × 8
##   .y.   group1 group2    n1    n2 statistic    df          p
## * <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>      <dbl>
## 1 mpg   4      8         11    14      7.60  15.0 0.00000164
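A sketch of the two-sample comparison with `rstatix::t_test()` (the names `df48` and `res2` are hypothetical). The fractional df of 15.0 in the table is what a Welch test would give, and Welch is the default in this function.

```r
library(dplyr)
library(rstatix)

df48 <- mtcars |> filter(cyl %in% c(4, 8))   # keep the two groups being compared

res2 <- df48 |> t_test(mpg ~ cyl)            # Welch two-sample t-test by default
res2
```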

24. Check normality

## # A tibble: 2 × 4
##     cyl variable statistic     p
##   <dbl> <chr>        <dbl> <dbl>
## 1     4 mpg          0.912 0.261
## 2     8 mpg          0.932 0.323
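Normality within each group can be checked with a grouped Shapiro-Wilk test. A sketch (the name `norm_by_cyl` is hypothetical):

```r
library(dplyr)
library(rstatix)

norm_by_cyl <- mtcars |>
  filter(cyl %in% c(4, 8)) |>
  group_by(cyl) |>
  shapiro_test(mpg)      # one test per cyl group
norm_by_cyl
```

Both p-values are above 0.05, so normality is not rejected in either group.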

25. Check variance

## # A tibble: 1 × 9
##   .y.   group1 group2    n1    n2 statistic         p     p.adj p.adj.signif
## * <chr> <chr>  <chr>  <int> <int>     <dbl>     <dbl>     <dbl> <chr>       
## 1 mpg   4      8         11    14     -4.22 0.0000246 0.0000246 ****
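The function behind the table above is not shown. One common way to compare the variances of two groups is the F test in base R, sketched below as an alternative rather than as the original method (the names `df48` and `vt` are hypothetical).

```r
df48 <- subset(mtcars, cyl %in% c(4, 8))

vt <- var.test(mpg ~ cyl, data = df48)   # H0: the two group variances are equal
vt
```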

26. Do two-sample t-test with unequal variance

## # A tibble: 1 × 8
##   .y.   group1 group2    n1    n2 statistic    df          p
## * <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>      <dbl>
## 1 mpg   4      8         11    14      7.60  15.0 0.00000164
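This table is identical to the one in step 23 because `rstatix::t_test()` uses `var.equal = FALSE` unless told otherwise. The sketch below makes the choice explicit and shows the pooled-variance version for contrast (the names `welch` and `pooled` are hypothetical).

```r
library(dplyr)
library(rstatix)

df48 <- mtcars |> filter(cyl %in% c(4, 8))

welch  <- df48 |> t_test(mpg ~ cyl, var.equal = FALSE)  # unequal variances (Welch)
pooled <- df48 |> t_test(mpg ~ cyl, var.equal = TRUE)   # equal variances (Student)
welch
pooled
```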

27. Do ANOVA for wt by cyl

## Analysis of Variance Table
## 
## Response: wt
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## cyl        1 18.172 18.1723  47.379 1.218e-07 ***
## Residuals 30 11.507  0.3835                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
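The table above shows 1 Df for cyl, which is what `lm()` gives when cyl is left numeric (a linear trend). For a one-way ANOVA comparing the three groups, cyl would usually be converted to a factor. A sketch showing both (the names `fit_num` and `fit_fac` are hypothetical):

```r
data <- mtcars

fit_num <- lm(wt ~ cyl, data = data)           # cyl numeric: 1 Df, as in the table above
fit_fac <- lm(wt ~ factor(cyl), data = data)   # cyl as factor: 2 Df, one-way ANOVA

anova(fit_num)
anova(fit_fac)
```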

28. Do a post-hoc test

## 
## Study: fit ~ "cyl"
## 
## LSD t Test for wt 
## P value adjustment method: bonferroni 
## 
## Mean Square Error:  0.3835487 
## 
## cyl,  means and individual ( 95 %) CI
## 
##         wt       std  r        se      LCL      UCL   Min   Max    Q25   Q50
## 4 2.285727 0.5695637 11 0.1867299 1.904374 2.667081 1.513 3.190 1.8850 2.200
## 6 3.117143 0.3563455  7 0.2340783 2.639091 3.595195 2.620 3.460 2.8225 3.215
## 8 3.999214 0.7594047 14 0.1655184 3.661181 4.337248 3.170 5.424 3.5325 3.755
##       Q75
## 4 2.62250
## 6 3.44000
## 8 4.01375
## 
## Alpha: 0.05 ; DF Error: 30
## Critical Value of t: 2.535742 
## 
## Groups according to probability of means differences and alpha level( 0.05 )
## 
## Treatments with the same letter are not significantly different.
## 
##         wt groups
## 8 3.999214      a
## 6 3.117143      b
## 4 2.285727      c
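The output format is that of `agricolae::LSD.test()` with a Bonferroni adjustment. The sketch below treats cyl as a factor, so its error degrees of freedom would be 29 rather than the 30 printed above; that change is an assumption about what a one-way design calls for, not the original code.

```r
library(agricolae)

data <- mtcars
data$cyl <- factor(data$cyl)                        # treat cyl as a grouping factor

fit <- aov(wt ~ cyl, data = data)
out <- LSD.test(fit, "cyl", p.adj = "bonferroni")   # pairwise comparisons with letters
out$groups
```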

29. Test normality

## 
##  Shapiro-Wilk normality test
## 
## data:  fit$residuals
## W = 0.89969, p-value = 0.006083
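For ANOVA, normality is checked on the model residuals. A sketch using the factor version of the model; its numbers may differ from those printed above if the original model kept cyl numeric.

```r
data <- mtcars
fit <- aov(wt ~ factor(cyl), data = data)

sw_res <- shapiro.test(residuals(fit))   # H0: residuals are normally distributed
sw_res
```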

Lyhour Hin

2025-01-03

30. Test variance
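No output is given for this step. One option for checking equal variances across the three cyl groups is Bartlett's test from base R, sketched below as an assumption about what the step intends (the name `bt` is hypothetical).

```r
data <- mtcars

bt <- bartlett.test(wt ~ factor(cyl), data = data)   # H0: all group variances are equal
bt
```

Bartlett's test is sensitive to non-normality, so a Levene-type test may be preferable when the residuals fail the normality check.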