library("dplyr")
library("data.table")
library("moments")
library("outliers")
library("ggplot2")

Reading the data (House prices Kaggle Data)

Structure

Central tendency

Measures of central tendency answer the most basic question about a variable: which value is the most “typical”? The mean, the median, and the mode each answer it in a different way.

## [1] 180921.2
## [1] 163000
## [1] 140000
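
The three values above appear to be the mean, the median, and the mode of the sale price (the first two match the `summary()` output further down). A minimal sketch of how such values can be computed follows. The `prices` vector is a small illustrative stand-in, not the Kaggle data, and `stat_mode` is a hypothetical helper, since base R has no built-in function for the statistical mode.

```r
# Illustrative toy vector; NOT the Kaggle SalePrice column.
prices <- c(120000, 140000, 140000, 155000, 163000, 210000, 340000)

mean_price   <- mean(prices)    # arithmetic mean
median_price <- median(prices)  # middle value of the sorted data

# Hypothetical helper: returns the most frequent value.
# If several values tie, the first one encountered wins.
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
mode_price <- stat_mode(prices)

print(mean_price)
print(median_price)
print(mode_price)
```

In a right-skewed variable such as a price, the mean usually sits above the median, which in turn sits above the mode; the output above shows the same ordering.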

Variability

The variability measures provide information about the dispersion of the data.

## Minimum Sale price
## [1] 34900
## Maximum Sale price
## [1] 755000
## Range of Sale Price
## [1]  34900 755000
## [1] 720100
## [1]  34900 129950 163000 214000 755000
##      10%      45%      67%      98% 
## 106475.0 155000.0 191000.0 394931.1
## [1] 84025
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   34900  129975  163000  180921  214000  755000
## [1] 6311111264
## [1] 79442.5
## [1] 68082.77
## [1] 56338.8
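
The output above is consistent with the usual base R dispersion functions. The sketch below shows one way to produce the same kinds of summaries; it assumes an illustrative `prices` vector rather than the Kaggle data, and the percentile levels simply mirror the ones printed above.

```r
# Illustrative toy vector; NOT the Kaggle SalePrice column.
prices <- c(120000, 140000, 140000, 155000, 163000, 210000, 340000)

price_range  <- range(prices)      # c(minimum, maximum)
price_spread <- diff(price_range)  # maximum minus minimum

five_num <- fivenum(prices)        # Tukey's five-number summary (uses hinges)
pctiles  <- quantile(prices, probs = c(0.10, 0.45, 0.67, 0.98))
iqr_val  <- IQR(prices)            # 3rd quartile minus 1st quartile

var_price <- var(prices)           # sample variance (n - 1 denominator)
sd_price  <- sd(prices)            # sample standard deviation

print(price_range)
print(price_spread)
print(five_num)
print(pctiles)
print(iqr_val)
print(summary(prices))
print(var_price)
print(sd_price)
```

Note that `fivenum()` reports Tukey's hinges, while `summary()` and `quantile()` interpolate quartiles by default. This is why the second value of the five-number summary can differ slightly from the `1st Qu.` entry of `summary()` on the same data.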

Mean absolute deviation formula

\[ MAD = \frac{1}{n}\sum_{i=1}^n|x_i-\mu| \]

Median absolute deviation formula

\[ MAD = \operatorname{median}\left(|x_i-\operatorname{median}(x)|\right) \]
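
Both formulas translate directly into base R. The following is a minimal sketch on an illustrative `prices` vector (not the Kaggle data). One detail worth knowing: R's built-in `mad()` multiplies the median absolute deviation by a constant of 1.4826 by default, so it matches the formula above only when called with `constant = 1`.

```r
# Illustrative toy vector; NOT the Kaggle SalePrice column.
prices <- c(120000, 140000, 140000, 155000, 163000, 210000, 340000)

# Mean absolute deviation: average distance from the mean.
mean_abs_dev <- mean(abs(prices - mean(prices)))

# Median absolute deviation: median distance from the median.
median_abs_dev <- median(abs(prices - median(prices)))

# Built-in equivalent, unscaled and with the default scaling constant.
mad_raw    <- mad(prices, constant = 1)
mad_scaled <- mad(prices)  # default constant = 1.4826

print(mean_abs_dev)
print(median_abs_dev)
print(mad_raw)
print(mad_scaled)
```

The default scaling makes `mad()` a consistent estimator of the standard deviation for normally distributed data, which is convenient when the two are compared side by side.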

Shape

Two additional measures that describe the shape of a distribution are skewness and kurtosis. Skewness is a measure of the symmetry of a distribution. Negative values indicate a left-skewed distribution, where extreme values lie to the left and typically pull the mean below the median. Positive values indicate a right-skewed distribution, where extreme values lie to the right and typically pull the mean above the median. Kurtosis describes how heavy the tails of the distribution are; a normal distribution has a kurtosis of 3, so larger values point to heavier tails.

## [1] 1.880941
## [1] 9.509812
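
The two values above are the skewness and the kurtosis of the sale price. The `moments` package loaded at the top provides `skewness()` and `kurtosis()` for this; the sketch below spells out the same moment-based definitions in base R so that it runs without extra packages. `central_moment` is a hypothetical helper and `prices` is an illustrative vector, not the Kaggle data.

```r
# Illustrative toy vector; NOT the Kaggle SalePrice column.
prices <- c(120000, 140000, 140000, 155000, 163000, 210000, 340000)

# Hypothetical helper: k-th central moment with a 1/n denominator.
central_moment <- function(x, k) mean((x - mean(x))^k)

# Skewness: third central moment scaled by the second to the power 3/2.
skew <- central_moment(prices, 3) / central_moment(prices, 2)^(3 / 2)

# Kurtosis (not excess kurtosis): fourth central moment over the squared second.
kurt <- central_moment(prices, 4) / central_moment(prices, 2)^2

print(skew)
print(kurt)
```

A positive skewness together with a kurtosis well above 3, as in the output above, describes a right-skewed distribution with heavy tails.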

Outliers

Outliers in data can distort predictions and reduce their accuracy. Consequently, it's important to understand whether outliers are present and, if so, which observations are considered outliers.

## [1] 755000
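
The single value above looks like the result of `outlier()` from the `outliers` package, which returns the observation with the largest difference from the sample mean. The sketch below reproduces that idea in base R and adds the common 1.5 × IQR boxplot rule as a second, swapped-in check. It assumes an illustrative `prices` vector, not the Kaggle data.

```r
# Illustrative toy vector; NOT the Kaggle SalePrice column.
prices <- c(120000, 140000, 140000, 155000, 163000, 210000, 340000)

# Observation farthest from the mean (the idea behind outliers::outlier()).
most_extreme <- prices[which.max(abs(prices - mean(prices)))]

# Boxplot rule: values beyond 1.5 times the hinge spread from the hinges.
flagged <- boxplot.stats(prices)$out

print(most_extreme)
print(flagged)
```

The first approach always returns exactly one value, even when the data contain no real outlier, so it is best read as "the most extreme observation". The boxplot rule can return zero, one, or many values.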

Visualization