In this module, we review the basic and fundamental terms used in data presentation.
There are five common measures used in the area of descriptive statistics namely:
There are four common measures of central tendency namely:
There are four different types of mean namely:
x<-c(11,12,23,25,21,22,23,24,25,26,12,11,10,9,8,3,33)
mean(x)
## [1] 17.52941
mean(x, trim = 0.2) # 20% trimmed mean: drops floor(0.2 * n) values from each end before averaging
## [1] 17.63636
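To make the trimmed mean less of a black box, the sketch below reproduces it by hand: sort the data, drop `floor(n * trim)` observations from each end, and average what remains. The variable names (`k`, `kept`, `manual_trimmed`) are illustrative only and do not come from the original module.

```r
# Same data as above
x <- c(11, 12, 23, 25, 21, 22, 23, 24, 25, 26, 12, 11, 10, 9, 8, 3, 33)

trim <- 0.2
n <- length(x)
k <- floor(n * trim)               # number of observations dropped from each end (3 here)

x_sorted <- sort(x)
kept <- x_sorted[(k + 1):(n - k)]  # the middle n - 2k observations

manual_trimmed <- mean(kept)
print(manual_trimmed)

# Should agree with the built-in trimmed mean
print(all.equal(manual_trimmed, mean(x, trim = trim)))
```

With 17 observations and a 20% trim, three values are removed from each tail, so the trimmed mean is the average of the 11 central values.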
library(EnvStats)
## Warning: package 'EnvStats' was built under R version 4.1.3
##
## Attaching package: 'EnvStats'
## The following objects are masked from 'package:stats':
##
## predict, predict.lm
## The following object is masked from 'package:base':
##
## print.default
summaryFull(x)
## x
## N 17.0000000
## Mean 17.5300000
## Median 21.0000000
## 10% Trimmed Mean 17.4700000
## Geometric Mean 15.1600000
## Skew -0.0001133
## Kurtosis -1.1280000
## Min 3.0000000
## Max 33.0000000
## Range 30.0000000
## 1st Quartile 11.0000000
## 3rd Quartile 24.0000000
## Standard Deviation 8.4200000
## Geometric Standard Deviation 1.8440000
## Interquartile Range 13.0000000
## Median Absolute Deviation 13.3400000
## Coefficient of Variation 0.4803000
## attr(,"class")
## [1] "summaryStats"
## attr(,"stats.in.rows")
## [1] TRUE
## attr(,"drop0trailing")
## [1] TRUE
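Some of the figures reported by `summaryFull()` can be cross-checked with base R alone. The sketch below computes the geometric mean as the exponential of the mean of the logs (valid here because every value in `x` is positive) and the 10% trimmed mean with `mean()`. The names `geo_mean` and `trimmed_10` are illustrative.

```r
# Same data as above
x <- c(11, 12, 23, 25, 21, 22, 23, 24, 25, 26, 12, 11, 10, 9, 8, 3, 33)

# Geometric mean: exp of the arithmetic mean of the logs (requires all x > 0)
geo_mean <- exp(mean(log(x)))
print(geo_mean)

# 10% trimmed mean: drops floor(0.1 * 17) = 1 value from each end
trimmed_10 <- mean(x, trim = 0.1)
print(trimmed_10)

# Coefficient of variation: standard deviation divided by the mean
cv <- sd(x) / mean(x)
print(cv)
```

The results should match the Geometric Mean, 10% Trimmed Mean, and Coefficient of Variation rows in the output above, up to rounding.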
In this section, we review the five common graphs used to present data in R.
The box plot presents the five-number summary of the data, namely the minimum, first quartile, median, third quartile, and maximum.
The box plot is utilised to detect the presence of outliers in the data.
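As a rough illustration of what the box plot draws, the sketch below uses base R's `fivenum()` for the five-number summary and `boxplot.stats()` for the points that would be flagged as outliers. By default, `boxplot.stats()` flags values lying more than 1.5 times the box length beyond the hinges. The extra value `80` is a made-up observation added purely to trigger the outlier rule; it is not part of the original data.

```r
x <- c(11, 12, 23, 25, 21, 22, 23, 24, 25, 26, 12, 11, 10, 9, 8, 3, 33)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
print(five)

# No point in x lies beyond the whiskers, so nothing is flagged
print(boxplot.stats(x)$out)

# Hypothetical extreme value appended for illustration only
x_with_outlier <- c(x, 80)
flagged <- boxplot.stats(x_with_outlier)$out
print(flagged)
```

Calling `boxplot(x_with_outlier)` would draw the flagged value as an individual point beyond the upper whisker.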
y<-rnorm(1000) # 1000 draws from a standard normal distribution; no seed is set, so results differ between runs
boxplot(y)
mean(y)
## [1] -0.03557044
set.seed(123)
z<-rnorm(1000)
boxplot(z)
mean(z)
## [1] 0.01612787
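The two runs above differ in that the second one calls `set.seed()` before sampling. A minimal sketch of what that buys: resetting the seed to the same value replays the same sequence of random numbers, which makes the plots and summaries reproducible. The variable names are illustrative.

```r
set.seed(123)
first_draw <- rnorm(5)

set.seed(123)                # reset the generator to the same state
second_draw <- rnorm(5)

# Same seed, same numbers
print(identical(first_draw, second_draw))  # TRUE

# Without resetting the seed, the generator simply continues its stream
third_draw <- rnorm(5)
print(identical(first_draw, third_draw))
```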
boxplot(z,col="orange",border="blue",main="Normal Distribution Data",xlab="Time in Years",ylab="Cumulative Frequency")
plot(density(z))
# "border" is not a graphical parameter for plot(), so it is omitted here; use polygon() to fill and outline the curve
plot(density(z),col="darkblue",main="Normal Distribution Data",xlab="Time in Years",ylab="Cumulative Frequency")
# polygon() draws on top of the existing plot, so titles and axis labels are set in plot() rather than here
polygon(density(z),col="magenta",border="green")
set.seed(123)
data<-rweibull(1000,2,3) # Weibull Distribution
library(vioplot)
## Warning: package 'vioplot' was built under R version 4.1.3
## Loading required package: sm
## Warning: package 'sm' was built under R version 4.1.3
## Package 'sm', version 2.2-5.7: type help(sm) for summary information
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.1.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
vioplot(data)
vioplot(data, col="magenta", main="Weibull Distribution", xlab="Time in Months",
ylab="Average Width")