Let us first read the data as follows:
df = read.csv("C:/Users/Lenovo/OneDrive/Desktop/dataset (1).csv",
header = TRUE)
head(df)
Now let us calculate the descriptive statistics for the data as follows:
summary(df)
##       UID            STATEID       state.Name          hs_degree
##  Min.   :    38   Min.   :32.00   Length:10771       Min.   :0.1308
##  1st Qu.: 59962   1st Qu.:34.00   Class :character   1st Qu.:0.8243
##  Median :119814   Median :34.00   Mode  :character   Median :0.9080
##  Mean   :151350   Mean   :33.53                      Mean   :0.8714
##  3rd Qu.:219980   3rd Qu.:34.00                      3rd Qu.:0.9559
##  Max.   :265927   Max.   :34.00                      Max.   :1.0000
##  NA's   :56       NA's   :56                         NA's   :36
##  hs_degree_male   hs_degree_female
##  Min.   :0.0000   Min.   :0.0000
##  1st Qu.:0.8183   1st Qu.:0.8239
##  Median :0.9113   Median :0.9094
##  Mean   :0.8701   Mean   :0.8729
##  3rd Qu.:0.9635   3rd Qu.:0.9598
##  Max.   :1.0000   Max.   :1.0000
##  NA's   :41       NA's   :50
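Here observe that for UID the mean exceeds the median (151350 > 119814), so UID may be positively skewed. For STATEID, hs_degree, hs_degree_male and hs_degree_female the mean is below the median, so they may be negatively skewed. This comparison is only a rough indication and needs to be checked against the histograms and the skewness coefficients.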
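The mean-versus-median comparison can also be done for all numeric columns at once. The following is a minimal sketch on a small made-up data frame; `toy` and its values are hypothetical and not taken from the dataset. With the real data, `df` would take the place of `toy`.

```r
# Hypothetical toy data standing in for df; the values are made up.
toy <- data.frame(
  a = c(1, 2, 3, 4, 50),             # long right tail
  b = c(0.1, 0.8, 0.9, 0.95, 1.0),   # long left tail
  g = c("x", "y", "x", "y", "x"),    # character column, skipped below
  stringsAsFactors = FALSE
)

# Keep the numeric columns only, then compute mean and median per column.
num     <- toy[sapply(toy, is.numeric)]
means   <- sapply(num, mean,   na.rm = TRUE)
medians <- sapply(num, median, na.rm = TRUE)

# mean > median hints at positive skew, mean < median at negative skew.
hint <- ifelse(means > medians, "mean > median",
        ifelse(means < medians, "mean < median", "mean = median"))

print(data.frame(mean = means, median = medians, hint = hint))
```

In this toy example column `a` is flagged as "mean > median" and column `b` as "mean < median", which matches the direction of their tails.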
Now let us plot histograms of UID, hs_degree, hs_degree_male and hs_degree_female as follows:
par(mfrow=c(2,2))
hist(df$UID)
hist(df$hs_degree)
hist(df$hs_degree_male)
hist(df$hs_degree_female)
From the histograms, observe that hs_degree, hs_degree_male and hs_degree_female are negatively skewed.
The skewness of each numeric variable is as follows:
library(moments)
skewness(df[,c(1,2,4,5,6)], na.rm = TRUE)
##              UID          STATEID        hs_degree   hs_degree_male
##       -0.0805874       -1.2647848       -1.6541721       -1.6645290
## hs_degree_female
##       -1.6765083
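Here observe that STATEID, hs_degree, hs_degree_male and hs_degree_female are clearly negatively skewed. For UID the skewness is -0.08, which is very close to zero, so UID is approximately symmetric rather than positively skewed as the mean and median comparison suggested.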
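To make the sign of the coefficient easier to interpret, the skewness can be computed by hand. The sketch below uses the usual moment-based definition, m3 / m2^(3/2), with central moments taken with a 1/n divisor; this is assumed to be the quantity that `skewness()` reports. The helper name `skew_by_hand` and the two example vectors are hypothetical.

```r
# Moment-based sample skewness: m3 / m2^(3/2), where m2 and m3 are the
# second and third central moments with a 1/n divisor. NAs are dropped.
skew_by_hand <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Hypothetical vectors, not taken from the dataset.
left_tail <- c(0.1, 0.8, 0.9, 0.95, 1.0, NA)  # one small value pulls the tail left
symmetric <- c(1, 2, 3, 4, 5)                 # perfectly symmetric sample

skew_by_hand(left_tail)  # negative
skew_by_hand(symmetric)  # zero
```

A value near zero, such as the one for UID, therefore indicates near symmetry, while the values around -1.65 for the hs_degree variables indicate a pronounced left tail.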
Now, the kurtosis of each variable is as follows:
kurtosis(df[,c(1,2,4,5,6)], na.rm = TRUE)
##              UID          STATEID        hs_degree   hs_degree_male
##         1.751474         2.599681         6.177343         6.267130
## hs_degree_female
##         6.468598
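Here observe that hs_degree, hs_degree_male and hs_degree_female are leptokurtic (kurtosis greater than 3), whereas UID and STATEID are platykurtic (kurtosis less than 3). The reference value is 3 because kurtosis() reports Pearson's kurtosis, not the excess kurtosis, and a normal distribution has a Pearson kurtosis of 3.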
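The classification against the reference value 3 can be sketched in the same way. The code below assumes the moment-based Pearson kurtosis, m4 / m2^2, with a 1/n divisor; the helper names `kurt_by_hand` and `classify` and the example vectors are hypothetical and only illustrate the rule.

```r
# Pearson's kurtosis: m4 / m2^2, where m2 and m4 are the second and fourth
# central moments with a 1/n divisor. NAs are dropped.
kurt_by_hand <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

# Compare against 3, the Pearson kurtosis of a normal distribution.
classify <- function(k) {
  if (k > 3) "leptokurtic" else if (k < 3) "platykurtic" else "mesokurtic"
}

# Hypothetical vectors, not taken from the dataset.
flat  <- c(1, 2, 3, 4, 5)                   # evenly spread values
heavy <- c(-10, -1, 0, 0, 0, 0, 1, 10)      # sharp centre with extreme tails

classify(kurt_by_hand(flat))   # "platykurtic" (kurtosis is 1.7)
classify(kurt_by_hand(heavy))  # "leptokurtic"
```

Applied to the output above, values such as 1.75 for UID fall below 3, and values above 6 for the hs_degree variables fall well above it.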