Descriptive statistics

Import Library

library(ggplot2)  # For graphics and the msleep data set

Load data set

We will use the “msleep” data set from the “ggplot2” package.

data("msleep")
head(msleep)
## # A tibble: 6 × 11
##   name    genus vore  order conservation sleep_total sleep_rem sleep_cycle awake
##   <chr>   <chr> <chr> <chr> <chr>              <dbl>     <dbl>       <dbl> <dbl>
## 1 Cheetah Acin… carni Carn… lc                  12.1      NA        NA      11.9
## 2 Owl mo… Aotus omni  Prim… <NA>                17         1.8      NA       7  
## 3 Mounta… Aplo… herbi Rode… nt                  14.4       2.4      NA       9.6
## 4 Greate… Blar… omni  Sori… lc                  14.9       2.3       0.133   9.1
## 5 Cow     Bos   herbi Arti… domesticated         4         0.7       0.667  20  
## 6 Three-… Brad… herbi Pilo… <NA>                14.4       2.2       0.767   9.6
## # ℹ 2 more variables: brainwt <dbl>, bodywt <dbl>

?msleep  # Open the help page describing the data set

Numerical Data

Let’s see how many observations and variables are in this data set.

str(msleep)
## tibble [83 × 11] (S3: tbl_df/tbl/data.frame)
##  $ name        : chr [1:83] "Cheetah" "Owl monkey" "Mountain beaver" "Greater short-tailed shrew" ...
##  $ genus       : chr [1:83] "Acinonyx" "Aotus" "Aplodontia" "Blarina" ...
##  $ vore        : chr [1:83] "carni" "omni" "herbi" "omni" ...
##  $ order       : chr [1:83] "Carnivora" "Primates" "Rodentia" "Soricomorpha" ...
##  $ conservation: chr [1:83] "lc" NA "nt" "lc" ...
##  $ sleep_total : num [1:83] 12.1 17 14.4 14.9 4 14.4 8.7 7 10.1 3 ...
##  $ sleep_rem   : num [1:83] NA 1.8 2.4 2.3 0.7 2.2 1.4 NA 2.9 NA ...
##  $ sleep_cycle : num [1:83] NA NA NA 0.133 0.667 ...
##  $ awake       : num [1:83] 11.9 7 9.6 9.1 20 9.6 15.3 17 13.9 21 ...
##  $ brainwt     : num [1:83] NA 0.0155 NA 0.00029 0.423 NA NA NA 0.07 0.0982 ...
##  $ bodywt      : num [1:83] 50 0.48 1.35 0.019 600 ...

The “msleep” data set contains 83 observations and 11 variables.
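The same counts can also be read directly instead of from the str() header. A minimal sketch, assuming ggplot2 is installed so that msleep is available; the names n_obs and n_vars are only illustrative:

```r
library(ggplot2)  # provides the msleep data set
data("msleep")

n_obs  <- nrow(msleep)  # number of observations (rows)
n_vars <- ncol(msleep)  # number of variables (columns)
dim(msleep)             # both at once: rows first, then columns
```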

Now let’s calculate summary statistics for the numerical variables only (columns 6 to 11).

summary(msleep[, 6:11])  # summary() drops NAs itself and reports their count
##   sleep_total      sleep_rem      sleep_cycle         awake      
##  Min.   : 1.90   Min.   :0.100   Min.   :0.1167   Min.   : 4.10  
##  1st Qu.: 7.85   1st Qu.:0.900   1st Qu.:0.1833   1st Qu.:10.25  
##  Median :10.10   Median :1.500   Median :0.3333   Median :13.90  
##  Mean   :10.43   Mean   :1.875   Mean   :0.4396   Mean   :13.57  
##  3rd Qu.:13.75   3rd Qu.:2.400   3rd Qu.:0.5792   3rd Qu.:16.15  
##  Max.   :19.90   Max.   :6.600   Max.   :1.5000   Max.   :22.10  
##                  NA's   :22      NA's   :51                      
##     brainwt            bodywt        
##  Min.   :0.00014   Min.   :   0.005  
##  1st Qu.:0.00290   1st Qu.:   0.174  
##  Median :0.01240   Median :   1.670  
##  Mean   :0.28158   Mean   : 166.136  
##  3rd Qu.:0.12550   3rd Qu.:  41.750  
##  Max.   :5.71200   Max.   :6654.000  
##  NA's   :27
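Selecting columns 6 to 11 by position works here, but it depends on the column order. A hedged alternative is to select the numeric columns by type. The sketch below also shows where na.rm = TRUE does matter: unlike summary(), functions such as mean() and sd() return NA when values are missing. The names num_cols, num_data and sds are illustrative, not part of the original analysis.

```r
library(ggplot2)  # provides the msleep data set
data("msleep")

# Logical vector marking which columns are numeric
num_cols <- vapply(msleep, is.numeric, logical(1))
num_data <- msleep[, num_cols]
summary(num_data)

# mean() propagates missing values unless told to remove them
mean(msleep$sleep_rem)                # NA, because sleep_rem has missing values
mean(msleep$sleep_rem, na.rm = TRUE)  # mean of the non-missing values

# Standard deviation of every numeric column, ignoring NAs
sds <- vapply(num_data, sd, numeric(1), na.rm = TRUE)
sds
```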

Categorical Data

table(msleep$vore)
## 
##   carni   herbi insecti    omni 
##      19      32       5      20
proportions(table(msleep$vore))
## 
##      carni      herbi    insecti       omni 
## 0.25000000 0.42105263 0.06578947 0.26315789
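Note that table() drops missing values by default, so the proportions above are computed over the 76 animals with a known diet (19 + 32 + 5 + 20), not over all 83 rows. A minimal sketch that keeps the missing values as their own category, using the useNA argument of table(); the name vore_counts is illustrative:

```r
library(ggplot2)  # provides the msleep data set
data("msleep")

# Count each diet category and keep NA as a separate group if present
vore_counts <- table(msleep$vore, useNA = "ifany")
vore_counts

# Proportions now refer to all rows, including those with unknown diet
round(proportions(vore_counts), 3)
```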

Plotting Numerical Data

Check the relationship between two numerical variables.

ggplot(msleep, aes(x = sleep_total, y = sleep_rem)) +
  geom_point()
## Warning: Removed 22 rows containing missing values (`geom_point()`).
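The warning comes from the 22 animals with no sleep_rem value, which cannot be drawn. As a sketch under that reading, the incomplete rows can be dropped silently with na.rm = TRUE in geom_point(), and the strength of the relationship can be summarised with a Pearson correlation computed on the complete pairs only. The names p and r are illustrative.

```r
library(ggplot2)  # provides ggplot() and the msleep data set
data("msleep")

# Same scatter plot, but rows with missing values are removed without a warning
p <- ggplot(msleep, aes(x = sleep_total, y = sleep_rem)) +
  geom_point(na.rm = TRUE)
p

# Pearson correlation, using only rows where both variables are present
r <- cor(msleep$sleep_total, msleep$sleep_rem, use = "complete.obs")
r
```

A value of r close to 1 or -1 would indicate a strong linear relationship, while a value near 0 would indicate a weak one.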