—————————————————————————
Student Name : Sachid Deshmukh
—————————————————————————
1] Library Initialization
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.4.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
2] Data Set-1 Analysis
Read the Crime Rate data reported for three countries in 1999
Crime.Rate = read.csv("https://raw.githubusercontent.com/mlforsachid/MSDSQ1/master/Data607/Week6/CrimeRate.csv", stringsAsFactors = F)
Preview Data
head(Crime.Rate)
## country year crimeinfo value
## 1 Afghanistan 1999 cases 745
## 2 Afghanistan 1999 population 19987071
## 3 Brazil 1999 cases 37737
## 4 Brazil 1999 population 172006362
## 5 China 1999 cases 212258
## 6 China 1999 population 1272915272
str(Crime.Rate)
## 'data.frame': 6 obs. of 4 variables:
## $ country : chr "Afghanistan" "Afghanistan" "Brazil" "Brazil" ...
## $ year : int 1999 1999 1999 1999 1999 1999
## $ crimeinfo: chr "cases" "population" "cases" "population" ...
## $ value : int 745 19987071 37737 172006362 212258 1272915272
Note that the crimeinfo column is stacked: it holds two different variables. Rows labelled cases give the number of crime cases reported for a country, and rows labelled population give that country’s total population. Let’s spread the crimeinfo column into one column per variable.
Crime.Rate = tidyr::spread(Crime.Rate, crimeinfo, value)
Preview the spread (wide-format) Crime Rate data
head(Crime.Rate)
## country year cases population
## 1 Afghanistan 1999 745 19987071
## 2 Brazil 1999 37737 172006362
## 3 China 1999 212258 1272915272
Note how the crimeinfo column is spread into separate cases and population columns. This also reduces the data frame from six rows to three, one per country and year.
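A minimal sketch of why the wide layout helps: once cases and population sit side by side, a per-capita rate is a single column operation. The data frame is rebuilt inline from the preview above so the example runs without network access. The names crime_long, crime_wide and rate_per_100k are illustrative and not part of the original analysis.

```r
library(tidyr)

# Rebuild the stacked data from the preview above
# (assumption: these are the same values the CSV holds)
crime_long <- data.frame(
  country   = rep(c("Afghanistan", "Brazil", "China"), each = 2),
  year      = 1999L,
  crimeinfo = rep(c("cases", "population"), times = 3),
  value     = c(745, 19987071, 37737, 172006362, 212258, 1272915272),
  stringsAsFactors = FALSE
)

# One row per country/year, one column per crimeinfo category
crime_wide <- spread(crime_long, crimeinfo, value)

# Hypothetical derived column: reported cases per 100,000 inhabitants
crime_wide$rate_per_100k <- crime_wide$cases / crime_wide$population * 1e5

print(crime_wide)
```

In the stacked layout the same calculation would need the cases and population rows to be matched up by country first.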
3] Data Set-2 Analysis
Read Student Grade data for three subjects
Stu.Grade = read.csv("https://raw.githubusercontent.com/mlforsachid/MSDSQ1/master/Data607/Week6/StudentGrades.csv", stringsAsFactors = F)
Preview Data
head(Stu.Grade)
## name math science history
## 1 James 68 56 80
## 2 Bob 90 50 67
## 3 Amit 45 89 90
str(Stu.Grade)
## 'data.frame': 3 obs. of 4 variables:
## $ name : chr "James" "Bob" "Amit"
## $ math : int 68 90 45
## $ science: int 56 50 89
## $ history: int 80 67 90
Note that the grades for each subject are in a separate column. Let’s gather them into a single subject column, with the scores in a grades column.
Stu.Grade = tidyr::gather(Stu.Grade, "subject", "grades", 2:4)
Preview the gathered (long-format) Student Grade data
head(Stu.Grade)
## name subject grades
## 1 James math 68
## 2 Bob math 90
## 3 Amit math 45
## 4 James science 56
## 5 Bob science 50
## 6 Amit science 89
Note how the individual subject columns are collapsed into a single subject column. Their values are captured under the newly created grades column.
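A short sketch of the same reshaping with the subject columns selected by name (math:history) rather than by position, followed by a per-subject average that the long layout makes straightforward. The data frame is rebuilt inline from the preview above. The names stu_wide, stu_long, subject_means and mean_grade are illustrative, and the summary step is an added example, not part of the original analysis.

```r
library(tidyr)
library(dplyr)

# Rebuild the wide data from the preview above
# (assumption: these are the same values the CSV holds)
stu_wide <- data.frame(
  name    = c("James", "Bob", "Amit"),
  math    = c(68, 90, 45),
  science = c(56, 50, 89),
  history = c(80, 67, 90),
  stringsAsFactors = FALSE
)

# Select the subject columns by name, so the call still works
# if the column order in the file changes
stu_long <- gather(stu_wide, key = "subject", value = "grades", math:history)

# With one row per student and subject, grouping by subject is direct
subject_means <- stu_long %>%
  group_by(subject) %>%
  summarise(mean_grade = mean(grades))

print(subject_means)
```

Selecting by name is a design choice: the positional range 2:4 silently picks the wrong columns if a new column is added before the subjects.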
4] Data Set-3 Analysis
Read City Temperature data for three cities
City.Temp = read.csv("https://raw.githubusercontent.com/mlforsachid/MSDSQ1/master/Data607/Week6/CityTemp.csv", stringsAsFactors = F)
Preview Data
head(City.Temp)
## city date temp
## 1 Redmond 10/01/2018 40
## 2 Bellevue 10/02/2018 38
## 3 Seattle 10/03/2018 42
str(City.Temp)
## 'data.frame': 3 obs. of 3 variables:
## $ city: chr "Redmond" "Bellevue" "Seattle"
## $ date: chr "10/01/2018" "10/02/2018" "10/03/2018"
## $ temp: int 40 38 42
Note the date column, which stores month, day and year in a single string. Let’s split them into separate columns.
City.Temp = tidyr::separate(City.Temp, "date", c("month", "day", "year"), sep="/")
Preview the separated City Temperature data
head(City.Temp)
## city month day year temp
## 1 Redmond 10 01 2018 40
## 2 Bellevue 10 02 2018 38
## 3 Seattle 10 03 2018 42
Note how the date column is split into three separate columns (month, day and year).
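One detail worth checking: the day values print as 01, 02, 03, which indicates the new columns are still character strings. The sketch below rebuilds the data inline from the preview above and compares the default behaviour with the convert = TRUE argument of separate(). The names city_temp, parts_chr and parts_int are illustrative.

```r
library(tidyr)

# Rebuild the data from the preview above
# (assumption: these are the same values the CSV holds)
city_temp <- data.frame(
  city = c("Redmond", "Bellevue", "Seattle"),
  date = c("10/01/2018", "10/02/2018", "10/03/2018"),
  temp = c(40L, 38L, 42L),
  stringsAsFactors = FALSE
)

# Default: the new columns stay character, so "01" keeps its leading zero
parts_chr <- separate(city_temp, date, c("month", "day", "year"), sep = "/")

# convert = TRUE runs type conversion on the new columns,
# turning the date parts into integers
parts_int <- separate(city_temp, date, c("month", "day", "year"),
                      sep = "/", convert = TRUE)

str(parts_chr)
str(parts_int)
```

The integer version suits arithmetic or sorting on the date parts, while the character version preserves the original formatting.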