Module 5: Data Presentation in R

In this module we will review the basic and fundamental terms used in data presentation.

Descriptive Statistics in R

Descriptive statistics commonly uses five kinds of measures, namely:

  1. Measures of Central Tendency
  2. Measures of Dispersion
  3. Measures of Position
  4. Measures of Shape
  5. Measures of Frequency

Measures of Central Tendency

There are four common measures of central tendency, namely:

  • Mean
  • Mode
  • Median
  • Mid-range

Mean

There are four types of mean, namely:

  • Arithmetic Mean
  • Trimmed Mean
  • Geometric Mean
  • Harmonic Mean

x<-c(11,12,23,25,21,22,23,24,25,26,12,11,10,9,8,3,33)
mean(x) # arithmetic mean
## [1] 17.52941
mean(x,trim=0.2) # 20% trimmed mean: drops 20% of the observations from each end before averaging
## [1] 17.63636
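
Base R has no dedicated function for the geometric or harmonic mean. The following is a minimal sketch that computes both from their definitions, assuming every value is strictly positive; the names geo_mean and har_mean are illustrative and do not come from any package.

```r
# Same sample as used in this module
x <- c(11,12,23,25,21,22,23,24,25,26,12,11,10,9,8,3,33)

# Geometric mean: exponentiate the mean of the logs (requires x > 0)
geo_mean <- exp(mean(log(x)))

# Harmonic mean: reciprocal of the mean of the reciprocals (requires x != 0)
har_mean <- 1 / mean(1 / x)

# Show the three means side by side
c(arithmetic = mean(x), geometric = geo_mean, harmonic = har_mean)
```

For positive data the three means are always ordered harmonic, then geometric, then arithmetic, which gives a quick sanity check on the result.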
library(EnvStats)
summaryFull(x)
##                                       x
## N                            17.0000000
## Mean                         17.5300000
## Median                       21.0000000
## 10% Trimmed Mean             17.4700000
## Geometric Mean               15.1600000
## Skew                         -0.0001133
## Kurtosis                     -1.1280000
## Min                           3.0000000
## Max                          33.0000000
## Range                        30.0000000
## 1st Quartile                 11.0000000
## 3rd Quartile                 24.0000000
## Standard Deviation            8.4200000
## Geometric Standard Deviation  1.8440000
## Interquartile Range          13.0000000
## Median Absolute Deviation    13.3400000
## Coefficient of Variation      0.4803000
## attr(,"class")
## [1] "summaryStats"
## attr(,"stats.in.rows")
## [1] TRUE
## attr(,"drop0trailing")
## [1] TRUE
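
The other measures of central tendency listed earlier (median, mode and mid-range) can be sketched in base R as follows. Base R has no function for the statistical mode (mode() reports how an object is stored), so this sketch tabulates the values instead; the names mid_range, counts and modes are illustrative.

```r
# Same sample as used in this module
x <- c(11,12,23,25,21,22,23,24,25,26,12,11,10,9,8,3,33)

# Median: the middle value of the sorted data
median(x)

# Mid-range: the average of the smallest and largest values
mid_range <- (min(x) + max(x)) / 2
mid_range

# Mode: the value or values that occur most often
counts <- table(x)
modes <- as.numeric(names(counts)[counts == max(counts)])
modes
```

For this sample the median is 21, the mid-range is 18, and four values (11, 12, 23 and 25) tie for the mode because each occurs twice.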

Data Visualisation in R

In this section we will review five common graphs used to present data in R.

Box Plot

The box plot presents the five-number summary of the data, namely:

  • Minimum
  • First Quartile
  • Median (Q2)
  • Third Quartile
  • Maximum

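The box plot is also used to detect outliers: by default, boxplot() draws any observation lying more than 1.5 times the interquartile range beyond the box as an individual point.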
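
The numbers behind a box plot can be inspected without drawing it. The sketch below uses fivenum() and boxplot.stats() from base R; the value 80 appended to the sample is a hypothetical extreme observation added only to illustrate outlier detection, and the names x_out and stats are illustrative.

```r
# Sample from this module plus one hypothetical extreme value (80)
x <- c(11,12,23,25,21,22,23,24,25,26,12,11,10,9,8,3,33)
x_out <- c(x, 80)

# Tukey's five-number summary: minimum, lower hinge, median, upper hinge, maximum
fivenum(x_out)

# Statistics that boxplot() uses (default coef = 1.5)
stats <- boxplot.stats(x_out)
stats$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
stats$out    # observations beyond the whiskers, drawn as individual points
```

Here the hinges are 11 and 25, so any value above 25 + 1.5 * 14 = 46 is flagged; the upper whisker therefore stops at 33 and 80 is reported as an outlier. Note that the hinges returned by fivenum() can differ slightly from the quartiles returned by quantile().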

y<-rnorm(1000) # standard normal sample; no seed is set, so the values differ on every run
boxplot(y)

mean(y)
## [1] -0.03557044
set.seed(123) # fix the random seed so that z is reproducible
z<-rnorm(1000)
boxplot(z)

mean(z)
## [1] 0.01612787
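
The difference between y and z above is the call to set.seed(). The following sketch illustrates the effect: resetting the same seed reproduces the same draws, while drawing again without resetting it does not. The names a, b and unseeded are illustrative.

```r
# Two samples drawn after setting the same seed are identical
set.seed(123)
a <- rnorm(1000)
set.seed(123)
b <- rnorm(1000)
identical(a, b)

# A further draw without resetting the seed continues the random stream
unseeded <- rnorm(1000)
identical(a, unseeded)
```

The first comparison returns TRUE and the second FALSE, which is why the mean of z can be reproduced exactly while the mean of y cannot.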
boxplot(z,col="orange",border="blue",main="Normal Distribution Data",
        xlab="Time in Years",ylab="Cumulative Frequency")

plot(density(z))

# border is not an argument of plot(); the colour of the density curve is set with col
plot(density(z),col="darkblue",main="Normal Distribution Data",
     xlab="Time in Years",ylab="Cumulative Frequency")
# polygon() draws on the existing plot, so the title and axis labels belong in the plot() call
polygon(density(z),col="magenta",border="green")

Violin Plot

set.seed(123)
data<-rweibull(1000,shape=2,scale=3)  # Weibull distribution with shape 2 and scale 3
library(vioplot)
vioplot(data)

vioplot(data, col="magenta", main="Weibull Distribution", xlab="Time in Months",
        ylab="Average Width")
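
A violin plot combines a mirrored kernel density estimate with the quartile summary of a box plot. The sketch below draws a rough equivalent by hand in base R to show what the plot is made of; it is an illustration under that assumption, not the method vioplot() uses internally, and the names w, d and q are illustrative.

```r
# Weibull sample with the same parameters as above
set.seed(123)
w <- rweibull(1000, shape = 2, scale = 3)

d <- density(w)                        # kernel density estimate
q <- quantile(w, c(0.25, 0.5, 0.75))   # quartiles for the inner box

# Empty plot: density on the horizontal axis, data values on the vertical axis
plot(NA, xlim = c(-1, 1) * max(d$y), ylim = range(d$x),
     xaxt = "n", xlab = "", ylab = "Value", main = "Hand-drawn violin")

# Mirror the density curve around zero to form the violin outline
polygon(c(d$y, rev(-d$y)), c(d$x, rev(d$x)), col = "magenta")

# Thick bar from the first to the third quartile, white dot at the median
segments(0, q[1], 0, q[3], lwd = 4)
points(0, q[2], pch = 19, col = "white")
```

The width of the shape at any height is proportional to the estimated density at that value, so the widest part marks where the observations are most concentrated.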