Chapter: Summarizing the Data Set

#Statisticians use the five-number summary to describe the distribution of a data set: the "Panch Tatva" (five elements) of statistics
#In R, fivenum(x) returns the values of the Panch Tatva
#summary(x) returns the five-number summary plus the mean
#Note: fivenum() uses Tukey's hinges while summary() uses quantile(), so the two can report slightly different quartiles
#The five numbers are:
#1: Sample minimum, or smallest value (Xmin)
#2: First, or lower, quartile (Q1)
#3: Median, or middle value
#4: Third, or upper, quartile (Q3)
#5: Sample maximum, or largest value (Xmax)
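
As a small illustration of the two functions described above, the sketch below runs fivenum() and summary() on a made-up vector. The vector x and its values are hypothetical and chosen only so that the hinge and the quantile-based quartile disagree; they are not taken from any data set in this chapter.

```r
# Small made-up vector (hypothetical values) with an even number of observations
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(x)
print(five)

# summary() takes its quartiles from quantile() and also reports the mean
smry <- summary(x)
print(smry)

# Compare the upper hinge with the default (type 7) third quartile
upper_hinge <- five[4]
third_quartile <- unname(quantile(x, 0.75))
print(c(upper_hinge = upper_hinge, third_quartile = third_quartile))
```

For this vector the upper hinge is 9.5 while the default quantile() gives 9.25, so the two summaries do not have to match exactly. On large data sets such as storms the difference is usually negligible.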
library(ggplot2)
library(tidyverse)

Drawing Histograms

ggplot(mpg, aes(displ)) +
  geom_histogram(col = "black") +
  theme(text = element_text(size = 30))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
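
The stat_bin() message asks for an explicit bin width. A minimal sketch of one way to respond to it is shown here; the value binwidth = 0.5 (half a litre of engine displacement) is an assumption chosen for illustration, not a value from the original analysis.

```r
library(ggplot2)

# Same histogram of engine displacement, but with an explicit bin width.
# binwidth = 0.5 is an assumed value; try several and compare the shapes.
p <- ggplot(mpg, aes(displ)) +
  geom_histogram(binwidth = 0.5, col = "black") +
  theme(text = element_text(size = 30))

print(p)
```

Setting binwidth (or bins) yourself silences the message and makes the plot reproducible when the default changes.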

Activity 2: Storms data.frame

#Instructions
#This activity has 2 parts.

#Part 1 - Follow these steps to generate your outputs for this activity:

#1. Load the storms data.frame contained in the tidyverse package.
#2. Summarise the wind speed of the storms in the storms data.frame.
#Note: To do this, give the mean, standard deviation, and five-number summary of wind, and produce a histogram of wind.
#Part 2 - Once you have generated your outputs, answer the questions.
#Load the storms data.frame-------------------------------------------
data(storms)
head(storms)
## # A tibble: 6 x 13
##   name   year month   day  hour   lat  long status category  wind pressure
##   <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>  <ord>    <int>    <int>
## 1 Amy    1975     6    27     0  27.5 -79   tropi~ -1          25     1013
## 2 Amy    1975     6    27     6  28.5 -79   tropi~ -1          25     1013
## 3 Amy    1975     6    27    12  29.5 -79   tropi~ -1          25     1013
## 4 Amy    1975     6    27    18  30.5 -79   tropi~ -1          25     1013
## 5 Amy    1975     6    28     0  31.5 -78.8 tropi~ -1          25     1012
## 6 Amy    1975     6    28     6  32.4 -78.7 tropi~ -1          25     1012
## # ... with 2 more variables: ts_diameter <dbl>, hu_diameter <dbl>
#Mean Function-------------------------------------------------------
mean(storms$wind)
## [1] 53.495
#Standard Deviation Function-----------------------------------------
sd(storms$wind)
## [1] 26.21387
#Five-Number Summary Function-----------------------------------------
fivenum(storms$wind)
## [1]  10  30  45  65 160
#Summary Function-----------------------------------------------------
summary(storms$wind)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.00   30.00   45.00   53.49   65.00  160.00
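
The separate calls above can also be collected into a single table. The following is a sketch using dplyr's summarise(); the column names (mean_wind, sd_wind, and so on) are hypothetical labels introduced here. Newer dplyr releases ship an updated storms data set, so the numbers printed on your machine may differ from the output shown in this chapter.

```r
library(dplyr)

# One-row table holding the mean, standard deviation, and five-number summary of wind.
# Column names are illustrative only.
wind_summary <- storms %>%
  summarise(
    mean_wind   = mean(wind),
    sd_wind     = sd(wind),
    min_wind    = min(wind),
    q1_wind     = fivenum(wind)[2],  # lower hinge
    median_wind = median(wind),
    q3_wind     = fivenum(wind)[4],  # upper hinge
    max_wind    = max(wind)
  )

print(wind_summary)
```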
#Drawing Histogram----------------------------------------------------
ggplot(storms, aes(wind)) +
  geom_histogram(col = "blue") +
  theme(text = element_text(size = 30))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#Q:1 What is the mean wind speed of storms in the storms data.frame?
#Ans : 53.495
#Q:2 What is the standard deviation of the wind speed of storms in the storms data.frame?
#Ans : 26.214
#Q:3 What is the lower quartile of wind speed of storms in the storms data.frame?
#Ans : 30
#Q:4 What is the maximum wind speed of storms in the storms data.frame?
#Ans : 160
#Q:5 Which range contains the highest peak on the histogram of wind speed for the storms data.frame?
#0-20
#21-40 
#41-60
#61-80
#81+
#Ans : 21-40
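
The answer to Q:5 was read off the histogram. As a rough cross-check, the sketch below counts observations in the same five ranges with cut(). The variable name wind_ranges is hypothetical, and counting per range is not identical to finding the tallest of 30 histogram bins, so treat it as supporting evidence only. Results depend on the version of storms installed with dplyr.

```r
library(dplyr)  # provides the storms data set

# Bin wind speeds into the ranges offered in Q:5.
# With right-closed intervals, (20, 40] corresponds to the "21-40" option.
wind_ranges <- cut(
  storms$wind,
  breaks = c(0, 20, 40, 60, 80, Inf),
  labels = c("0-20", "21-40", "41-60", "61-80", "81+")
)

# Number of observations in each range
range_counts <- table(wind_ranges)
print(range_counts)

# Name of the range with the most observations
print(names(which.max(range_counts)))
```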