Module 5: Data Presentation in R

In this module, we will review the basic terms and measures used in the context of data presentation.

Descriptive Statistics in R

There are five common categories of measures used in descriptive statistics:

  1. Measures of central tendency
  2. Measures of dispersion
  3. Measures of position
  4. Measures of shape
  5. Measures of frequency
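For orientation, here is a rough base R sketch. It computes one or more example measures from each of the five categories for the sample vector `x` used later in this module. The skewness line is our own assumption: it uses a simple moment-based formula. Packages such as EnvStats may use a different (for example, bias-corrected) definition, so this value need not match the Skew reported by `summaryFull()`.

```r
x <- c(11, 12, 23, 25, 21, 22, 23, 24, 25, 26, 12, 11, 10, 9, 8, 3, 33)

# 1. Central tendency
mean(x)
median(x)

# 2. Dispersion
sd(x)           # sample standard deviation
diff(range(x))  # range = max - min
IQR(x)          # interquartile range (Q3 - Q1)

# 3. Position
quantile(x, probs = c(0.25, 0.50, 0.75))

# 4. Shape: simple moment-based skewness (one of several definitions)
m2 <- mean((x - mean(x))^2)
m3 <- mean((x - mean(x))^3)
skew <- m3 / m2^1.5
skew

# 5. Frequency: how often each value occurs
table(x)
```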

Measures of Central Tendency

There are four common measures of central tendency:

  • Mean
  • Mode
  • Median
  • Mid-range
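Base R provides `mean()` and `median()`, but it has no built-in function for the statistical mode. Note that `mode()` returns an object's storage mode, such as "numeric", and not its most frequent value. The helpers `stat_mode()` and `mid_range()` below are hypothetical names written for this sketch. The mode helper returns every value tied for the highest frequency, because a data set can have more than one mode.

```r
x <- c(11, 12, 23, 25, 21, 22, 23, 24, 25, 26, 12, 11, 10, 9, 8, 3, 33)

# Hypothetical helper: all values that occur with the highest frequency
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}

# Hypothetical helper: midpoint between the smallest and largest value
mid_range <- function(v) (min(v) + max(v)) / 2

mean(x)        # 17.52941
median(x)      # 21
stat_mode(x)   # 11 12 23 25 (four values tie, each occurring twice)
mid_range(x)   # 18, i.e. (3 + 33) / 2
```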

Types of Mean

There are four common types of mean:

  • Arithmetic mean
  • Trimmed mean
  • Geometric mean
  • Harmonic mean
x <- c(11, 12, 23, 25, 21, 22, 23, 24, 25, 26, 12, 11, 10, 9, 8, 3, 33)
mean(x)              # arithmetic mean
## [1] 17.52941
mean(x, trim = 0.2)  # 20% trimmed mean: drops floor(0.2 * 17) = 3 values from each end
## [1] 17.63636
library(EnvStats)
## 
## Attaching package: 'EnvStats'
## The following objects are masked from 'package:stats':
## 
##     predict, predict.lm
## The following object is masked from 'package:base':
## 
##     print.default
geoMean(x)  # geometric mean, provided by the EnvStats package
## [1] 15.16286
summaryFull(x)
##                                       x
## N                            17.0000000
## Mean                         17.5300000
## Median                       21.0000000
## 10% Trimmed Mean             17.4700000
## Geometric Mean               15.1600000
## Skew                         -0.0001133
## Kurtosis                     -1.1280000
## Min                           3.0000000
## Max                          33.0000000
## Range                        30.0000000
## 1st Quartile                 11.0000000
## 3rd Quartile                 24.0000000
## Standard Deviation            8.4200000
## Geometric Standard Deviation  1.8440000
## Interquartile Range          13.0000000
## Median Absolute Deviation    13.3400000
## Coefficient of Variation      0.4803000
## attr(,"class")
## [1] "summaryStats"
## attr(,"stats.in.rows")
## [1] TRUE
## attr(,"drop0trailing")
## [1] TRUE
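The harmonic mean appears in the list of mean types but is not computed above. The sketch below computes all four types in base R from their definitions. The geometric mean is written as `exp(mean(log(x)))`, which should reproduce the `geoMean(x)` value above. The geometric and harmonic formulas as written assume strictly positive data. For positive data the three classical means always satisfy arithmetic >= geometric >= harmonic.

```r
x <- c(11, 12, 23, 25, 21, 22, 23, 24, 25, 26, 12, 11, 10, 9, 8, 3, 33)

arith_mean   <- mean(x)
trimmed_mean <- mean(x, trim = 0.2)  # averages the middle 11 of the 17 sorted values
geo_mean     <- exp(mean(log(x)))    # nth root of the product; requires x > 0
harm_mean    <- 1 / mean(1 / x)      # reciprocal of the mean reciprocal; requires x > 0

round(c(arithmetic = arith_mean, trimmed = trimmed_mean,
        geometric = geo_mean, harmonic = harm_mean), 3)
```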

Data Visualization in R

In this section we will review five common graphs used for presenting data in R.

Box Plot

The box plot displays the five-number summary of the data:

  • Minimum
  • First Quartile
  • Median
  • Third Quartile
  • Maximum
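The five numbers can also be computed directly. `fivenum()` returns Tukey's five-number summary, the same summary `boxplot()` uses (via `boxplot.stats()`) to draw the box. `quantile()` and `summary()` report the quartiles by R's default quantile method. For the 17 values of `x` the two approaches agree, but for other sample sizes Tukey's hinges and the default quartiles can differ slightly.

```r
x <- c(11, 12, 23, 25, 21, 22, 23, 24, 25, 26, 12, 11, 10, 9, 8, 3, 33)

fivenum(x)    # minimum, lower hinge, median, upper hinge, maximum: 3 11 21 24 33
quantile(x)   # 0%, 25%, 50%, 75%, 100% quantiles
summary(x)    # the same five values plus the mean
```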

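The box plot is also used to detect outliers. By default, R's boxplot() draws any point that lies more than 1.5 times the box length (the interquartile range) beyond either end of the box as an individual point, flagging it as a potential outlier.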
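To see which points the box plot flags, call `boxplot.stats()`, the function `boxplot()` uses internally; its `$out` element lists the points beyond the whiskers. The vector `v` below is a hypothetical example: the sample `x` with one artificially large value (80) appended. The second half applies the same 1.5 × IQR rule by hand using `quantile()`. That version uses the default quartiles rather than Tukey's hinges, so near the fences the two can occasionally disagree.

```r
x <- c(11, 12, 23, 25, 21, 22, 23, 24, 25, 26, 12, 11, 10, 9, 8, 3, 33)
v <- c(x, 80)  # hypothetical data: x plus one extreme value

# Points beyond 1.5 * box length from the hinges, as boxplot() would draw them
boxplot.stats(v)$out  # 80

# The same rule written out with quartiles from quantile()
q <- quantile(v, c(0.25, 0.75))
fence <- 1.5 * IQR(v)
v[v < q[1] - fence | v > q[2] + fence]  # 80
```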

y <- rnorm(1000)  # 1000 draws from the standard normal distribution; no seed is set, so values vary between runs
boxplot(y, col = "red")

mean(y)
## [1] -0.006555659
set.seed(123)     # fix the random seed so the results are reproducible
z <- rnorm(1000)
boxplot(z, col = "black", border = "darkblue",
        main = "Normal Distribution Data",
        ylab = "Time elapsing", xlab = "Cumulative frequency")

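d <- density(z)   # kernel density estimate of z
plot(d)

plot(d, col = "green")
polygon(d, col = "gold", border = "red")    # shade the area under the density curve
polygon(d, col = "orange", border = "red")  # drawn on top, so the orange fill replaces the gold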