Chapter : Summarizing the Data Set
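#Statisticians use the five-number summary to describe the distribution of a data set: the "Panch Tatva" (five elements) of statistics
#In R, fivenum(x) returns these five values
#summary(x) returns the five-number summary plus the mean; its quartiles come from quantile() and can differ slightly from the hinges that fivenum() reports
#The five numbers are: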
#1: Sample Minimum or Smallest Value (Xmin)
#2: First or Lower Quartile (Q1)
#3: Median or Middle Value
#4: Third or Upper Quartile
#5: Sample Maximum or Largest Value
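#A minimal sketch of how the two functions compare, using a small made-up vector (the name x_demo and its values are illustrative, not from the chapter):

```r
# Hypothetical sample of eight values, chosen only for illustration
x_demo <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(x_demo)
print(five)

# summary() adds the mean and computes its quartiles with quantile()
smry <- summary(x_demo)
print(smry)
```

#For this vector fivenum() reports hinges of 2.5 and 6.5, while summary() reports quartiles of 2.75 and 6.25; the minimum, median, and maximum agree.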
library(ggplot2)
library(tidyverse)
Drawing Histograms
ggplot(mpg, aes(displ)) +
  geom_histogram(col = "black") +
  theme(text = element_text(size = 30))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
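#The message above asks for an explicit bin width. A minimal sketch, assuming a width of 0.5 litres suits displ (this value is an illustrative choice, not one taken from the chapter):

```r
library(ggplot2)

# binwidth = 0.5 is an assumed value; try a few widths and compare the shapes
p_displ <- ggplot(mpg, aes(displ)) +
  geom_histogram(binwidth = 0.5, col = "black") +
  theme(text = element_text(size = 30))

print(p_displ)
```

#Setting binwidth (or bins) explicitly silences the stat_bin() message and records the choice in the code.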

Activity 2: Storms data.frame
#Instructions
#This activity has 2 parts.
#Part 1 - Follow these steps to generate your outputs for this activity:
#Load the storms data.frame, which ships with dplyr and is available once the tidyverse is attached, and
#summarise the wind speed of the storms it contains. Note: to do this, give the mean, standard deviation, and five-number summary of wind, and produce a histogram of wind.
#Part 2 - Once you have generated your outputs, answer the questions.
#Get the storms data.frame-------------------------------------------
data(storms)
head(storms)
## # A tibble: 6 x 13
##   name   year month   day  hour   lat  long status category  wind pressure
##   <chr> <dbl> <dbl> <int> <dbl> <dbl> <dbl> <chr>  <ord>    <int>    <int>
## 1 Amy    1975     6    27     0  27.5 -79   tropi~ -1          25     1013
## 2 Amy    1975     6    27     6  28.5 -79   tropi~ -1          25     1013
## 3 Amy    1975     6    27    12  29.5 -79   tropi~ -1          25     1013
## 4 Amy    1975     6    27    18  30.5 -79   tropi~ -1          25     1013
## 5 Amy    1975     6    28     0  31.5 -78.8 tropi~ -1          25     1012
## 6 Amy    1975     6    28     6  32.4 -78.7 tropi~ -1          25     1012
## # ... with 2 more variables: ts_diameter <dbl>, hu_diameter <dbl>
#Mean Function-------------------------------------------------------
mean(storms$wind)
## [1] 53.495
#Standard Deviation Function-----------------------------------------
sd(storms$wind)
## [1] 26.21387
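#sd() returns the sample standard deviation, which divides by n - 1 rather than n. A minimal sketch of the same calculation by hand, on a small made-up vector (wind_demo is hypothetical, not taken from storms):

```r
# Hypothetical wind speeds in knots, for illustration only
wind_demo <- c(25, 30, 45, 65, 100)

# Sum of squared deviations from the mean, divided by n - 1, then square-rooted
n <- length(wind_demo)
manual_sd <- sqrt(sum((wind_demo - mean(wind_demo))^2) / (n - 1))

# The hand calculation should agree with the built-in function
print(all.equal(manual_sd, sd(wind_demo)))
```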
#Five-Number Summary Function-----------------------------------------
fivenum(storms$wind)
## [1] 10 30 45 65 160
#Summary Function-----------------------------------------------------
summary(storms$wind)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   10.00   30.00   45.00   53.49   65.00  160.00
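#The separate calls above can also be collected into one table. A sketch using dplyr::summarise(), assuming dplyr (part of the tidyverse) is installed; the object and column names are illustrative choices:

```r
library(dplyr)

# One row holding the mean, standard deviation, and five-number summary of wind
wind_summary <- storms %>%
  summarise(
    mean   = mean(wind),
    sd     = sd(wind),
    min    = min(wind),
    q1     = fivenum(wind)[2],
    median = median(wind),
    q3     = fivenum(wind)[4],
    max    = max(wind)
  )

print(wind_summary)
```

#The exact figures may depend on the installed dplyr version, because later releases ship a longer storms data set.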
#Drawing Histogram----------------------------------------------------
ggplot(storms, aes(wind)) +
  geom_histogram(col = "blue") +
  theme(text = element_text(size = 30))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#Q:1 What is the mean wind speed of storms in the storms data.frame?
#Ans : 53.495 knots
#Q:2 What is the standard deviation of the wind speed of storms in the storms data.frame?
#Ans : 26.214 knots
#Q:3 What is the lower quartile of wind speed of storms in the storms data.frame?
#Ans : 30 knots
#Q:4 What is the maximum wind speed of storms in the storms data.frame?
#Ans : 160 knots
#Q:5 Which range contains the highest peak on the histogram of wind speed for the storms data.frame?
#0-20
#21-40
#41-60
#61-80
#81+
#Ans : 21-40
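#The answer to Q5 can be cross-checked without reading the plot. A sketch that counts observations in the ranges offered by the question, assuming dplyr is installed (wind_range and range_counts are illustrative names):

```r
library(dplyr)  # provides the storms data set

# Bin wind into the ranges from Q5; wind is stored as whole numbers,
# so the interval (20, 40] corresponds to 21-40, and so on
wind_range <- cut(
  storms$wind,
  breaks = c(0, 20, 40, 60, 80, Inf),
  labels = c("0-20", "21-40", "41-60", "61-80", "81+")
)

# Number of observations per range, and the range with the most observations
range_counts <- table(wind_range)
print(range_counts)
print(names(which.max(range_counts)))
```

#These coarse counts are a cross-check rather than a substitute for the plot: the histogram's 30 narrower bins show where within a range the peak falls.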