1. Load the libraries

library(rstatix)
## 
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
## 
##     filter
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(openxlsx)
library(rio)
library(agricolae)

2. Load the built-in mtcars dataset from R

data("mtcars")
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

3. Copy the data for use

4. Save the data into Excel

5. Import the data
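
Steps 3 to 5 might look like the sketch below. It assumes the copy is called `df` and the re-imported table `data` (both names appear later in this document), and it writes to a temporary path, which is only a placeholder. `write.xlsx()` does not write row names by default, which would explain why later outputs show row numbers instead of car names.

```r
library(openxlsx)
library(rio)

# Step 3: work on a copy so the built-in dataset stays untouched
df <- mtcars

# Step 4: save the copy to an Excel file (placeholder path)
path <- file.path(tempdir(), "mtcars.xlsx")
write.xlsx(df, path)

# Step 5: read the file back in
data <- import(path)
dim(data)
```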

6. Create a new dataset selecting rows 1:10

## [1] 10 11

7. Create a new dataset selecting rows 2, 3:6, 18

## [1]  6 11

8. Create a new dataset deleting rows 15, 18, 20

## [1] 29 11
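
The row selections in steps 6 to 8 can be sketched with `dplyr::slice()`. The object names are illustrative, and `mtcars` with its row names removed stands in for the imported data.

```r
library(dplyr)

data <- mtcars
rownames(data) <- NULL  # stand-in for the imported data

rows_1_10 <- slice(data, 1:10)             # step 6: keep rows 1 to 10
rows_mixed <- slice(data, c(2, 3:6, 18))   # step 7: keep rows 2, 3 to 6 and 18
rows_dropped <- slice(data, -c(15, 18, 20)) # step 8: drop rows 15, 18 and 20

dim(rows_1_10)
dim(rows_mixed)
dim(rows_dropped)
```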

9. Create a new dataset selecting cyl = 6

##    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 1 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## 2 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## 3 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## 4 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## 5 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## 6 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## 7 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6

10. Create a new dataset selecting cyl = 6 and 8

##     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## 3  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## 4  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## 5  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## 6  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## 7  19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## 8  17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## 9  16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## 10 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## 11 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## 12 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## 13 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## 14 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## 15 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## 16 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## 17 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## 18 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## 19 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## 20 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## 21 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
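
A minimal sketch of the two value-based filters in steps 9 and 10, assuming `dplyr::filter()` was used. The object names are illustrative.

```r
library(dplyr)

data <- mtcars
rownames(data) <- NULL  # stand-in for the imported data

cyl_6 <- filter(data, cyl == 6)             # step 9: only 6-cylinder cars
cyl_6_8 <- filter(data, cyl %in% c(6, 8))   # step 10: 6- or 8-cylinder cars

nrow(cyl_6)
nrow(cyl_6_8)
```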

Create a new dataset selecting mpg, cyl, disp, drat, vs

## [1] "mpg"  "cyl"  "disp" "drat" "vs"

11. Create a new dataset dropping the columns cyl and am

## [1] "mpg"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "gear" "carb"
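
Column selection and removal can be sketched with `dplyr::select()`. The column list in the first call follows the printed output, which includes `cyl`. The object names are illustrative.

```r
library(dplyr)

data <- mtcars
rownames(data) <- NULL  # stand-in for the imported data

kept <- select(data, mpg, cyl, disp, drat, vs)  # keep only the named columns
names(kept)

dropped <- select(data, -cyl, -am)              # drop cyl and am, keep the rest
names(dropped)
```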

12. Create a new dataset sorted by wt from largest to smallest

##     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 1  10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## 2  14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## 3  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## 4  16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## 5  19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## 6  13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## 7  15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## 8  17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## 9  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## 10 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## 11 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## 12 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## 13 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## 14 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## 15 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## 16 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## 17 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## 18 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## 19 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## 20 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## 21 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## 22 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
## 23 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## 24 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## 25 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## 26 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## 27 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## 28 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## 29 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## 30 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## 31 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## 32 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
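
One way to get this ordering is `dplyr::arrange()` with `desc()`. This is a sketch with an illustrative object name.

```r
library(dplyr)

data <- mtcars
rownames(data) <- NULL  # stand-in for the imported data

by_weight <- arrange(data, desc(wt))  # heaviest car first
head(by_weight, 3)
```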

13. Create a column kw (kW = 0.75 × hp)

##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb" "kw"
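
The new column can be sketched with `dplyr::mutate()`, using the column name `kw` from the printed output and the conversion factor given in the step title.

```r
library(dplyr)

data <- mtcars
rownames(data) <- NULL  # stand-in for the imported data

data <- mutate(data, kw = 0.75 * hp)  # add kw as the last column
names(data)
```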

II. Descriptive statistics

14. Calculate totals for all variables

##      mpg      cyl     disp       hp     drat       wt     qsec       vs 
##  642.900  198.000 7383.100 4694.000  115.090  102.952  571.160   14.000 
##       am     gear     carb       kw 
##   13.000  118.000   90.000 3520.500

15. Calculate means for all variables

##        mpg        cyl       disp         hp       drat         wt       qsec 
##  20.090625   6.187500 230.721875 146.687500   3.596563   3.217250  17.848750 
##         vs         am       gear       carb         kw 
##   0.437500   0.406250   3.687500   2.812500 110.015625
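
Column totals and means for an all-numeric data frame can be obtained with base R's `colSums()` and `colMeans()`. The sketch below rebuilds the `kw` column first so the results cover the same twelve variables.

```r
data <- mtcars
rownames(data) <- NULL       # stand-in for the imported data
data$kw <- 0.75 * data$hp    # column added in step 13

totals <- colSums(data)      # step 14: one total per column
means <- colMeans(data)      # step 15: one mean per column

totals
means
```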

16. Count the number of cars for each value of cyl

data |> freq_table(cyl)
## # A tibble: 3 × 3
##     cyl     n  prop
##   <dbl> <int> <dbl>
## 1     4    11  34.4
## 2     6     7  21.9
## 3     8    14  43.8

17. Count the number of cars by cyl and am

## # A tibble: 6 × 4
##     cyl    am     n  prop
##   <dbl> <dbl> <int> <dbl>
## 1     4     0     3  27.3
## 2     4     1     8  72.7
## 3     6     0     4  57.1
## 4     6     1     3  42.9
## 5     8     0    12  85.7
## 6     8     1     2  14.3
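
A two-way count like the one above can be sketched by passing both variables to `rstatix::freq_table()`. With two variables, `prop` is the percentage within each `cyl` group, which matches the printed values (for example 3 of the 11 four-cylinder cars is 27.3).

```r
library(rstatix)

data <- mtcars
rownames(data) <- NULL  # stand-in for the imported data

counts <- freq_table(data, cyl, am)  # counts per cyl/am pair, prop within cyl
counts
```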

18. Please calculate mean, median, and SD for wt by cyl

## # A tibble: 3 × 4
##     cyl  Mean   Med    SD
##   <dbl> <dbl> <dbl> <dbl>
## 1     4  2.29  2.2  0.570
## 2     6  3.12  3.22 0.356
## 3     8  4.00  3.76 0.759
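
A grouped summary of this shape can be sketched with `group_by()` and `summarise()`. The column names `Mean`, `Med` and `SD` are taken from the printed table.

```r
library(dplyr)

data <- mtcars
rownames(data) <- NULL  # stand-in for the imported data

wt_by_cyl <- data |>
  group_by(cyl) |>
  summarise(Mean = mean(wt), Med = median(wt), SD = sd(wt))

wt_by_cyl
```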

19. Please calculate mean, median, and SD by cyl for all variables except vs, am, carb and gear

## # A tibble: 21 × 6
##      cyl variable     n   mean median     sd
##    <dbl> <fct>    <dbl>  <dbl>  <dbl>  <dbl>
##  1     4 mpg         11  26.7   26     4.51 
##  2     4 disp        11 105.   108    26.9  
##  3     4 hp          11  82.6   91    20.9  
##  4     4 drat        11   4.07   4.08  0.365
##  5     4 wt          11   2.29   2.2   0.57 
##  6     4 qsec        11  19.1   18.9   1.68 
##  7     4 kw          11  62.0   68.2  15.7  
##  8     6 mpg          7  19.7   19.7   1.45 
##  9     6 disp         7 183.   168.   41.6  
## 10     6 hp           7 122.   110    24.3  
## # ℹ 11 more rows

20. Please reshape the result above so that cyl 4, 6, 8 become columns, keeping only variable and mean

## [1] "cyl"      "variable" "n"        "mean"     "median"   "sd"
## # A tibble: 7 × 4
##   variable    `4`    `6`    `8`
##   <fct>     <dbl>  <dbl>  <dbl>
## 1 mpg       26.7   19.7   15.1 
## 2 disp     105.   183.   353.  
## 3 hp        82.6  122.   209.  
## 4 drat       4.07   3.59   3.23
## 5 wt         2.29   3.12   4.00
## 6 qsec      19.1   18.0   16.8 
## 7 kw        62.0   91.7  157.
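
Steps 19 and 20 could be reproduced roughly as follows, assuming `rstatix::get_summary_stats()` for the long table and `tidyr::pivot_wider()` for the reshape. The object names `stats_long` and `stats_wide` are illustrative.

```r
library(dplyr)
library(tidyr)
library(rstatix)

data <- mtcars
rownames(data) <- NULL       # stand-in for the imported data
data$kw <- 0.75 * data$hp    # column added in step 13

# Step 19: one row per cyl/variable pair with n, mean, median and sd
stats_long <- data |>
  select(-vs, -am, -carb, -gear) |>
  group_by(cyl) |>
  get_summary_stats(show = c("mean", "median", "sd")) |>
  ungroup()

# Step 20: keep variable and mean, spread the cyl values into columns
stats_wide <- stats_long |>
  select(cyl, variable, mean) |>
  pivot_wider(names_from = cyl, values_from = mean)

stats_wide
```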

III. Inferential statistics

21. Do a one-sample t-test for mpg, selecting cyl = 4

H0: mean mpg = 20

H1: mean mpg ≠ 20

##     mpg cyl  disp  hp drat    wt  qsec vs am gear carb    kw
## 1  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 69.75
## 2  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2 46.50
## 3  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2 71.25
## 4  32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1 49.50
## 5  30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2 39.00
## 6  33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 48.75
## 7  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1 72.75
## 8  27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1 49.50
## 9  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2 68.25
## 10 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2 84.75
## 11 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2 81.75

Result

## # A tibble: 1 × 7
##   .y.   group1 group2         n statistic    df        p
## * <chr> <chr>  <chr>      <int>     <dbl> <dbl>    <dbl>
## 1 mpg   1      null model    11      4.90    10 0.000623
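
The test above can be sketched with `rstatix::t_test()`, using the formula `mpg ~ 1` for a one-sample test and `mu = 20` from H0. The object name `cyl4` is illustrative.

```r
library(dplyr)
library(rstatix)

data <- mtcars
rownames(data) <- NULL  # stand-in for the imported data

cyl4 <- filter(data, cyl == 4)              # the 11 four-cylinder cars
one_sample <- t_test(cyl4, mpg ~ 1, mu = 20) # H0: mean mpg = 20
one_sample
```

With p well below 0.05, H0 is rejected: the mean mpg of four-cylinder cars differs from 20.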

Check normality

## 
##  Shapiro-Wilk normality test
## 
## data:  df$mpg
## W = 0.94756, p-value = 0.1229
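
A sketch of the normality check with base R's `shapiro.test()`. The printed statistic (W = 0.94756) is what this test returns for the full 32-row `mpg` column, so `df` appears to be the unfiltered copy. For the cyl = 4 subset used in the t-test, the statistic is 0.912, as listed in step 24.

```r
df <- mtcars                       # assumed: the unfiltered copy from step 3
cyl4 <- subset(mtcars, cyl == 4)   # the subset used in the t-test

all_cars <- shapiro.test(df$mpg)   # all 32 cars, as in the printed output
subset_only <- shapiro.test(cyl4$mpg) # only the 11 four-cylinder cars

all_cars
subset_only
```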

22. Plot a histogram
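
The plotted variable is not shown in the source. The sketch below assumes it is mpg for the cyl = 4 subset and marks the hypothesised mean of 20 from step 21. The title and axis label are illustrative.

```r
cyl4 <- subset(mtcars, cyl == 4)  # assumed subset

# Draw the histogram and keep the bin counts it returns
h <- hist(cyl4$mpg, main = "mpg, cyl = 4", xlab = "mpg")
abline(v = 20, lty = 2)           # dashed line at the H0 value
```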


23. Do a two-sample t-test for mpg between cyl = 4 and cyl = 8

H0: mean mpg (cyl = 4) = mean mpg (cyl = 8)

H1: mean mpg (cyl = 4) ≠ mean mpg (cyl = 8)

##     mpg cyl  disp  hp drat    wt  qsec vs am gear carb     kw
## 1  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1  69.75
## 2  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 131.25
## 3  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4 183.75
## 4  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2  46.50
## 5  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2  71.25
## 6  16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3 135.00
## 7  17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3 135.00
## 8  15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3 135.00
## 9  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4 153.75
## 10 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4 161.25
## 11 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4 172.50
## 12 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1  49.50
## 13 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2  39.00
## 14 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1  48.75
## 15 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1  72.75
## 16 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2 112.50
## 17 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2 112.50
## 18 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4 183.75
## 19 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2 131.25
## 20 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1  49.50
## 21 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2  68.25
## 22 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2  84.75
## 23 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4 198.00
## 24 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8 251.25
## 25 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2  81.75

Result

## # A tibble: 1 × 8
##   .y.   group1 group2    n1    n2 statistic    df          p
## * <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>      <dbl>
## 1 mpg   4      8         11    14      7.60  15.0 0.00000164
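
A sketch of the comparison with `rstatix::t_test()`. By default this function runs Welch's test (`var.equal = FALSE`), which is why the degrees of freedom come out as 15.0 rather than 23. The object name `cyl48` is illustrative.

```r
library(dplyr)
library(rstatix)

data <- mtcars
rownames(data) <- NULL  # stand-in for the imported data

cyl48 <- filter(data, cyl %in% c(4, 8))  # 11 + 14 cars
two_sample <- t_test(cyl48, mpg ~ cyl)   # Welch's t-test by default
two_sample
```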

24. Check normality

## # A tibble: 2 × 4
##     cyl variable statistic     p
##   <dbl> <chr>        <dbl> <dbl>
## 1     4 mpg          0.912 0.261
## 2     8 mpg          0.932 0.323
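
Per-group normality can be sketched by combining `group_by()` with `rstatix::shapiro_test()`, which returns one row per group.

```r
library(dplyr)
library(rstatix)

cyl48 <- filter(mtcars, cyl %in% c(4, 8))  # same subset as in step 23

normality <- cyl48 |>
  group_by(cyl) |>
  shapiro_test(mpg)  # Shapiro-Wilk test within each cyl group

normality
```

Both p-values are above 0.05, so normality is not rejected in either group.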

25. Check variance

## # A tibble: 1 × 9
##   .y.   group1 group2    n1    n2 statistic         p     p.adj p.adj.signif
## * <chr> <chr>  <chr>  <int> <int>     <dbl>     <dbl>     <dbl> <chr>       
## 1 mpg   4      8         11    14     -4.22 0.0000246 0.0000246 ****
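
The table above has the layout that `rstatix::dunn_test()` returns (a z-type statistic plus `p.adj` and `p.adj.signif`), and that test compares group ranks rather than variances. A more usual check for equal variances is Levene's test. The sketch below uses `rstatix::levene_test()` as a swapped-in alternative, with `cyl` converted to a factor first.

```r
library(dplyr)
library(rstatix)

cyl48 <- mtcars |>
  filter(cyl %in% c(4, 8)) |>
  mutate(cyl = factor(cyl))  # grouping variable as a factor

variance_check <- levene_test(cyl48, mpg ~ cyl)  # H0: equal variances
variance_check
```

A small p-value here would argue for the unequal-variance (Welch) form of the t-test.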

26. Do two-sample t-test with unequal variance

## # A tibble: 1 × 8
##   .y.   group1 group2    n1    n2 statistic    df          p
## * <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl>      <dbl>
## 1 mpg   4      8         11    14      7.60  15.0 0.00000164
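
This result is identical to step 23 because `rstatix::t_test()` already assumes unequal variances unless told otherwise. The sketch below sets `var.equal` explicitly in both directions to make the difference visible. The pooled version uses n1 + n2 - 2 = 23 degrees of freedom.

```r
library(dplyr)
library(rstatix)

cyl48 <- filter(mtcars, cyl %in% c(4, 8))

welch <- t_test(cyl48, mpg ~ cyl, var.equal = FALSE)  # unequal variances
pooled <- t_test(cyl48, mpg ~ cyl, var.equal = TRUE)  # pooled variance

welch$df
pooled$df
```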

27. Do ANOVA for wt by cyl

## Analysis of Variance Table
## 
## Response: wt
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## cyl        1 18.172 18.1723  47.379 1.218e-07 ***
## Residuals 30 11.507  0.3835                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
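
The table above shows 1 degree of freedom for `cyl`, which indicates that `cyl` entered the model as a numeric predictor. The sketch below assumes the model was fitted with `lm()` and stored as `fit` (the name seen in later output), and also shows the factor form, which treats the three cylinder counts as groups and uses 2 degrees of freedom.

```r
# Model as it appears in the printed table: cyl is numeric (1 df)
fit <- lm(wt ~ cyl, data = mtcars)
tab_numeric <- anova(fit)

# One-way ANOVA with cyl as a grouping factor (2 df)
fit_factor <- lm(wt ~ factor(cyl), data = mtcars)
tab_factor <- anova(fit_factor)

tab_numeric
tab_factor
```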

28. Do a post-hoc test

## 
## Study: fit ~ "cyl"
## 
## LSD t Test for wt 
## P value adjustment method: bonferroni 
## 
## Mean Square Error:  0.3835487 
## 
## cyl,  means and individual ( 95 %) CI
## 
##         wt       std  r        se      LCL      UCL   Min   Max    Q25   Q50
## 4 2.285727 0.5695637 11 0.1867299 1.904374 2.667081 1.513 3.190 1.8850 2.200
## 6 3.117143 0.3563455  7 0.2340783 2.639091 3.595195 2.620 3.460 2.8225 3.215
## 8 3.999214 0.7594047 14 0.1655184 3.661181 4.337248 3.170 5.424 3.5325 3.755
##       Q75
## 4 2.62250
## 6 3.44000
## 8 4.01375
## 
## Alpha: 0.05 ; DF Error: 30
## Critical Value of t: 2.535742 
## 
## Groups according to probability of means differences and alpha level( 0.05 )
## 
## Treatments with the same letter are not significantly different.
## 
##         wt groups
## 8 3.999214      a
## 6 3.117143      b
## 4 2.285727      c
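
The output above comes from `agricolae::LSD.test()` with a Bonferroni adjustment. A sketch of the call, assuming `fit` is the `lm()` model of wt on numeric cyl (which matches the 30 error degrees of freedom shown):

```r
library(agricolae)

fit <- lm(wt ~ cyl, data = mtcars)  # assumed model from step 27

# Pairwise comparison of cyl groups, Bonferroni-adjusted, printed to console
posthoc <- LSD.test(fit, "cyl", p.adj = "bonferroni", console = TRUE)
posthoc$groups
```

Each cylinder group receives its own letter, so all three mean weights differ at the 0.05 level.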

29. Test normality

## 
##  Shapiro-Wilk normality test
## 
## data:  fit$residuals
## W = 0.89969, p-value = 0.006083
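
The residual check can be sketched as below, again assuming `fit` is the `lm()` model from step 27.

```r
fit <- lm(wt ~ cyl, data = mtcars)  # assumed model from step 27

resid_normality <- shapiro.test(fit$residuals)  # H0: residuals are normal
resid_normality
```

The printed p-value of 0.006 is below 0.05, so normality of the residuals is rejected. A rank-based alternative such as the Kruskal-Wallis test may be worth considering.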

30. Test variance
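
No output is shown for this step. One possible approach, given as an assumption rather than the original code, is Levene's test across the three cylinder groups via `rstatix::levene_test()`. It is less sensitive to non-normal data than Bartlett's test.

```r
library(dplyr)
library(rstatix)

grouped <- mutate(mtcars, cyl = factor(cyl))  # cyl as a grouping factor

wt_variance <- levene_test(grouped, wt ~ cyl) # H0: equal variance of wt across cyl
wt_variance
```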