Descriptive Statistics

For this exercise, please upload the following dataset:

Download data_example1_Clean.xlsx here

Import the dataset “data_example1_Clean.xlsx” and name it data1

library(readxl)

## Warning: package 'readxl' was built under R version 3.6.1

data1=read_excel("data_example1_Clean.xlsx")

A. Descriptive Statistics for Categorical Data (1-Way Frequency Table)

Notes:

First, install the package “gmodels”, then load it with the function ‘library’.

install.packages("gmodels")

library(gmodels)

## Warning: package 'gmodels' was built under R version 3.6.1

To generate a frequency table for the variable gender:

freq.gender=CrossTable(data1$gender,format="SPSS")

## 
##    Cell Contents
## |-------------------------|
## |                   Count |
## |             Row Percent |
## |-------------------------|
## 
## Total Observations in Table:  28 
## 
##           |   Female  |     Male  | 
##           |-----------|-----------|
##           |       11  |       17  | 
##           |   39.286% |   60.714% | 
##           |-----------|-----------|
## 
##
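
As a cross-check, the same counts and percentages can be obtained with base R alone. The sketch below is only an illustration: it rebuilds the gender column from the counts printed in the table above (11 Female, 17 Male) instead of reading the Excel file, so the vector `gender` is a stand-in for data1$gender.

```r
# Stand-in for data1$gender, rebuilt from the counts in the table above
gender <- rep(c("Female", "Male"), times = c(11, 17))

# table() counts each category; prop.table() converts counts to proportions
freq <- table(gender)
pct  <- round(prop.table(freq) * 100, 3)

# Put counts and percentages side by side
cbind(Count = freq, Percent = pct)
```

The percentages should agree with the Row Percent line printed by CrossTable.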

B. Descriptive Statistics for Categorical Data (2-Way Frequency Table)

To generate a frequency table for the variable race by gender:

Notes: prop.r=TRUE requests row percentages.

freq.gender.race=CrossTable(data1$gender,data1$race,expected=FALSE, prop.r=TRUE, prop.c=FALSE,prop.t=FALSE, prop.chisq=FALSE, chisq = FALSE, fisher=FALSE, mcnemar=FALSE, format="SPSS")

## 
##    Cell Contents
## |-------------------------|
## |                   Count |
## |             Row Percent |
## |-------------------------|
## 
## Total Observations in Table:  28 
## 
##              | data1$race 
## data1$gender |   Indian  |    Malay  | Row Total | 
## -------------|-----------|-----------|-----------|
##       Female |        3  |        8  |       11  | 
##              |   27.273% |   72.727% |   39.286% | 
## -------------|-----------|-----------|-----------|
##         Male |        5  |       12  |       17  | 
##              |   29.412% |   70.588% |   60.714% | 
## -------------|-----------|-----------|-----------|
## Column Total |        8  |       20  |       28  | 
## -------------|-----------|-----------|-----------|
## 
##
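
To see what "row percentage" means, the sketch below recomputes the table with base R. It assumes nothing about the Excel file: the two vectors are stand-ins rebuilt from the cell counts printed above, and margin = 1 tells prop.table() to divide each cell by its row total.

```r
# Stand-ins for data1$gender and data1$race, rebuilt from the cell counts above
gender <- rep(c("Female", "Male"), times = c(11, 17))
race   <- c(rep(c("Indian", "Malay"), times = c(3, 8)),    # Female rows
            rep(c("Indian", "Malay"), times = c(5, 12)))   # Male rows

# Two-way table of counts
tab <- table(gender, race)

# margin = 1 divides each cell by its row total, giving row percentages
row.pct <- round(prop.table(tab, margin = 1) * 100, 3)
row.pct
```

Each row of row.pct adds up to 100, which is a quick way to confirm that the percentages are by row and not by column.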

C. Descriptive Statistics for Numerical Data (1-Way Frequency Table)

First, install the package “psych”, then load it with the function ‘library’.

install.packages("psych")

library(psych)

## Warning: package 'psych' was built under R version 3.6.1

To describe the variable ptage:

desc.age=describe(data1$ptage,IQR = TRUE)
desc.age

##    vars  n  mean   sd median trimmed  mad min max range  skew kurtosis
## X1    1 27 41.78 5.63     44   41.91 5.93  34  48    14 -0.23    -1.68
##      se IQR
## X1 1.08  13
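
Two of the columns in this output can be checked by hand, which helps when reading the table. The sketch below uses only the values printed above and assumes the usual definitions: standard error = sd / sqrt(n), and range = max - min.

```r
# Values copied from the describe() output for ptage
n   <- 27
s   <- 5.63
mn  <- 34
mx  <- 48

# Standard error of the mean: sd divided by the square root of n
se <- s / sqrt(n)
round(se, 2)

# Range: largest value minus smallest value
rng <- mx - mn
rng
```

Both results should match the se and range columns reported by describe.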

Notes:

The distribution of numerical data can be checked using the skewness (skew) and kurtosis values.

As a rule of thumb, the data can be treated as approximately normally distributed if the skewness lies between -1 and +1¹ and the kurtosis lies between -3 and +3².

If the data are approximately normally distributed, report the mean and standard deviation (sd).

Otherwise, report the median and interquartile range (IQR).

For ptage, the skewness is -0.23 and the kurtosis is -1.68, both within these ranges, so the mean and sd should be reported.

¹ Bulmer, M. G. (1979). Principles of Statistics. NY: Dover Books on Mathematics.

² Balanda, K. P. and MacGillivray, H. L. (May 1988). “Kurtosis: A Critical Review”. The American Statistician 42(2), pp. 111-119.
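
The rule of thumb in the notes can be written as a small check. This is only a sketch: the skew and kurtosis values are typed in from the describe output for ptage, and the object names (approx.normal, report) are made up for illustration.

```r
# Values copied from the describe() output for ptage
skew <- -0.23
kurt <- -1.68

# Rule of thumb from the notes: skewness within -1 to +1, kurtosis within -3 to +3
approx.normal <- abs(skew) <= 1 && abs(kurt) <= 3

# Decide which summary to report
report <- if (approx.normal) "mean (SD)" else "median (IQR)"
report
```

In practice the two values could be taken from desc.age$skew and desc.age$kurtosis instead of being typed in.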

C1. Extract and Combine Mean(SD) for Numerical Data (1-Way Frequency Table)

Notes: ‘cbind’ is a function that combines values as columns.

age.meansd=cbind("Mean"=desc.age$mean,"SD"=desc.age$sd)
age.meansd

##          Mean       SD
## [1,] 41.77778 5.625036
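
For a report, the two numbers are often written as a single "mean (SD)" label. One possible way, sketched below with base R's sprintf, rounds both values to two decimals; the values are typed in from the output above and the name age.label is made up for illustration.

```r
# Values copied from the age.meansd output above
age.mean <- 41.77778
age.sd   <- 5.625036

# "%.2f" formats a number with two decimal places
age.label <- sprintf("%.2f (%.2f)", age.mean, age.sd)
age.label
```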

Exercise

Describe height, weight, bmi, sysbp and diasbp.

D. Descriptive Statistics for Numerical Data (2-Way Frequency Table)

To describe the variable ptage by gender:

Notes: ‘mat=TRUE’ returns the output in matrix format.

desc.age.gender=describeBy(data1$ptage,data1$gender,IQR=TRUE,mat = TRUE) 
desc.age.gender

##     item group1 vars  n     mean       sd median  trimmed    mad min max
## X11    1 Female    1 11 42.81818 6.096199   45.0 43.22222 4.4478  34  48
## X12    2   Male    1 16 41.06250 5.359960   41.5 41.07143 7.4130  34  48
##     range        skew  kurtosis       se IQR
## X11    14 -0.48040234 -1.705179 1.838073  11
## X12    14 -0.06648471 -1.720123 1.339990  10

D1. Extract and Combine Mean(SD) For Numerical Data (2-Way Frequency Table)

Notes: ‘cbind’ is a function that combines values as columns.

age.gender.meansd=cbind(desc.age.gender$mean,desc.age.gender$sd)
# take the row labels from the describeBy output so they follow its group order (Female, then Male)
rownames(age.gender.meansd)=as.character(desc.age.gender$group1)
colnames(age.gender.meansd)=c("Mean","SD")
age.gender.meansd

##            Mean       SD
## Female 42.81818 6.096199
## Male   41.06250 5.359960
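
Group summaries come back in the order of the group labels (alphabetical by default, so Female before Male), which makes hand-typed row names easy to get wrong. The sketch below shows the same idea with base R's tapply on a small made-up data frame (toy is hypothetical, not the course dataset): the row names come from the result itself.

```r
# Hypothetical toy data, not the course dataset
toy <- data.frame(
  gender = c("Male", "Female", "Male", "Female", "Male"),
  ptage  = c(40, 45, 36, 47, 44)
)

# tapply() splits ptage by gender and applies the function to each group
grp.mean <- tapply(toy$ptage, toy$gender, mean)
grp.sd   <- tapply(toy$ptage, toy$gender, sd)

# The group labels travel with the results, so they cannot be swapped
toy.meansd <- cbind("Mean" = grp.mean, "SD" = grp.sd)
toy.meansd
```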

Exercise:

Describe height, weight, bmi, sysbp and diasbp among male and female.

Exporting Results

To export results from R to an Excel file:

Notes:

First, install the package “writexl”, then load it with the function ‘library’.

install.packages("writexl")

library(writexl)

## Warning: package 'writexl' was built under R version 3.6.1

Export the ‘age.gender.meansd’ results to an Excel file:

write_xlsx(as.data.frame(age.gender.meansd), path="age.gender.meansd.xlsx")

The Excel file will then appear in your project folder.
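
One thing to watch: to my knowledge write_xlsx does not write row names, so the group labels would be missing from the Excel file. A possible workaround, sketched below, is to move the row names into an ordinary column before exporting. The matrix m is a stand-in with values rounded from the describeBy output, and the column name Gender is an assumption made for illustration.

```r
# Stand-in for age.gender.meansd, values rounded from the describeBy output
m <- cbind("Mean" = c(42.82, 41.06), "SD" = c(6.10, 5.36))
rownames(m) <- c("Female", "Male")

# Move the row names into a real column so they survive the export
out <- data.frame(Gender = rownames(m), m, row.names = NULL)
out

# Then export as before (requires the writexl package):
# write_xlsx(out, path = "age.gender.meansd.xlsx")
```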


Evi Diana Omar,Shahrul Aiman Soelar,Fatimah Diana Amin Nordin

September 6, 2019
