This is my homework report for week 3, produced with R Markdown. In this homework I imported Cincinnati weather data, studied its code book and learned about different aspects of the data. In the third step I visualized the data to gain a better understanding of it.
Summary:
The data is daily weather data for Cincinnati covering 22 years, from 1995 to 2016. The data suggests that July to September are the hottest months of the year, whereas November to February are the coolest. The overall yearly variation in temperature seems roughly constant, although there is a slight increase in average temperature for 2016.
Within a particular month there is not much variation in temperature from one day to the next. Daily temperatures remain high in summer and low in winter.
library(tidyverse) ## loads ggplot2 (used for the plots below) along with the other tidyverse packages
The data description can be found at:
http://academic.udayton.edu/kissock/http/Weather/source.htm
Note on missing data for the weather data set:
http://academic.udayton.edu/kissock/http/Weather/missingdata.htm
# setwd("C:/tauseef/data_wrangling/Data Wrangling with R (BANA 8090)") ## not needed: the data is read directly from the URL below
filename <- "http://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt"
col_names <- c("month", "day", "year", "avg_daily_temp")
ohcincin <- read.table(filename, header = FALSE, sep = "", col.names = col_names, strip.white = TRUE)
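To see what `read.table()` does with this layout without downloading the file, here is a minimal sketch that parses a few inline lines. I am assuming the file has four whitespace-separated columns (month, day, year, temperature in °F) as described in the code book. The first two rows copy the values shown in the `str()` output below, and the third row, with the `-99` missing-value code, is made up for illustration.

```r
# Hypothetical sample in the assumed layout: month, day, year, temperature (deg F).
# The first two rows match the output below; the -99 row is invented.
sample_lines <- c(
  " 1  1  1995  41.1",
  " 1  2  1995  22.2",
  " 1  3  1995 -99.0"
)

col_names <- c("month", "day", "year", "avg_daily_temp")

# sep = "" (the default) splits on any run of whitespace, so uneven spacing is fine
sample_df <- read.table(text = sample_lines, header = FALSE, sep = "",
                        col.names = col_names)

str(sample_df)
```

The `-99` is read as an ordinary number here, which is why it has to be replaced with `NA` explicitly later on.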
str(ohcincin)
## 'data.frame': 7963 obs. of 4 variables:
## $ month : int 1 1 1 1 1 1 1 1 1 1 ...
## $ day : int 1 2 3 4 5 6 7 8 9 10 ...
## $ year : int 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 ...
## $ avg_daily_temp: num 41.1 22.2 22.8 14.9 9.5 23.8 31.1 26.9 31.3 31.5 ...
nrow(ohcincin) ## number of observations in data
## [1] 7963
ncol(ohcincin) ## number of variables in data
## [1] 4
### change month, day and year from integer to factor ###
ohcincin$month<-as.factor(ohcincin$month)
ohcincin$day<-as.factor(ohcincin$day)
ohcincin$year<-as.factor(ohcincin$year)
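One detail worth checking about this conversion: because the columns are integers, `as.factor()` orders the levels numerically, so the months appear as 1 to 12 on the plot axes. A short sketch with made-up month numbers shows the difference from converting character strings, which would sort alphabetically:

```r
# Made-up month numbers, stored as integers like the columns read above
month_int <- c(10L, 2L, 1L, 2L)

# as.factor() on an integer vector sorts the levels numerically
month_from_int <- as.factor(month_int)
levels(month_from_int)

# The same values stored as character strings sort alphabetically instead
month_from_chr <- as.factor(as.character(month_int))
levels(month_from_chr)
```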
########################### replace the missing values, coded as -99, with NA ####
ohcincin[ohcincin == -99] <- NA
ohcincin_n<-na.omit(ohcincin)
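The `str()` output below shows that `na.omit()` dropped 14 rows (7963 to 7949). As a minimal sketch, counting the `-99` codes before replacing them makes that number explicit. I am assuming the `-99` code only appears in the temperature column, so the replacement is limited to that column. The toy data frame is hypothetical.

```r
# Hypothetical toy data frame with the same columns; -99 is the missing-value code
toy <- data.frame(
  month = c(1, 1, 1, 2),
  day = c(1, 2, 3, 1),
  year = c(1995, 1995, 1995, 1995),
  avg_daily_temp = c(41.1, -99, 22.8, -99)
)

# Count the sentinel values first, so the number of dropped rows is documented
n_missing <- sum(toy$avg_daily_temp == -99)

# Replace -99 with NA in the temperature column only (assumed to be the only coded column)
toy$avg_daily_temp[toy$avg_daily_temp == -99] <- NA
toy_complete <- na.omit(toy)

cat("rows with -99:", n_missing, "\n")
cat("rows kept:", nrow(toy_complete), "of", nrow(toy), "\n")
```

On the real data the same count would be `sum(ohcincin$avg_daily_temp == -99)`, run before the replacement step.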
######################summary stats after removing missing values##########
head(ohcincin_n)
## month day year avg_daily_temp
## 1 1 1 1995 41.1
## 2 1 2 1995 22.2
## 3 1 3 1995 22.8
## 4 1 4 1995 14.9
## 5 1 5 1995 9.5
## 6 1 6 1995 23.8
str(ohcincin_n)
## 'data.frame': 7949 obs. of 4 variables:
## $ month : Factor w/ 12 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ day : Factor w/ 31 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ year : Factor w/ 22 levels "1995","1996",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ avg_daily_temp: num 41.1 22.2 22.8 14.9 9.5 23.8 31.1 26.9 31.3 31.5 ...
## - attr(*, "na.action")=Class 'omit' Named int [1:14] 1454 1455 1460 1461 1471 2726 2727 2728 2729 2807 ...
## .. ..- attr(*, "names")= chr [1:14] "1454" "1455" "1460" "1461" ...
nrow(ohcincin_n) ## number of observations in data
## [1] 7949
ncol(ohcincin_n) ## number of variables in data
## [1] 4
sum(is.na(ohcincin_n)) ## missing values in the data
## [1] 0
summary(ohcincin_n)
## month day year avg_daily_temp
## 5 : 682 2 : 262 1996 : 366 Min. :-2.20
## 7 : 682 3 : 262 2000 : 366 1st Qu.:40.20
## 1 : 681 4 : 262 2004 : 366 Median :57.10
## 3 : 681 5 : 262 2012 : 366 Mean :54.73
## 8 : 681 6 : 262 1995 : 365 3rd Qu.:70.70
## 10 : 670 8 : 262 1997 : 365 Max. :89.20
## (Other):3872 (Other):6377 (Other):5755
ggplot(data = ohcincin_n, mapping = aes(x = year, y = avg_daily_temp)) +
geom_boxplot()
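The above graph shows the yearly distribution of average daily temperature at Cincinnati: each box marks the median and the interquartile range, and the whiskers together with the outlier points reach the minimum and maximum. Year 2016 seems to be the hottest year within the decade. However, 2016 appears to be incomplete. If the file has one row per day, the 7963 rows minus the 7670 days of 1995 to 2015 leave only 293 rows for 2016, so its box likely covers only January to mid-October and misses the cold November and December days.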
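A boxplot shows medians, so a "hottest year" claim is easier to check with the yearly means and the number of days behind each one. Here is a minimal base R sketch on made-up numbers (not from the data set):

```r
# Hypothetical toy data: two daily readings for each of three years
toy <- data.frame(
  year = factor(c(2014, 2014, 2015, 2015, 2016, 2016)),
  avg_daily_temp = c(50, 54, 52, 56, 55, 59)
)

# How many days contribute to each year (a short year can bias its mean)
days_per_year <- table(toy$year)

# Mean of the daily averages per year, warmest first
yearly_mean <- tapply(toy$avg_daily_temp, toy$year, mean)
ranked_years <- sort(yearly_mean, decreasing = TRUE)

print(days_per_year)
print(ranked_years)
```

On the real data, `table(ohcincin_n$year)` would show directly whether 2016 has fewer days than the other years.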
ggplot(data = ohcincin_n) +
geom_point(mapping = aes(x = month, y = avg_daily_temp))
The above graph shows the monthly variation of average daily temperature at Cincinnati. April to July seem to be the hottest and November to February the coolest months of the year.
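Because the scatter plot has many overlapping points per month, ranking the months by their mean temperature gives a more precise answer to which months are hottest and coolest. A minimal base R sketch, with made-up values for four months:

```r
# Hypothetical toy data: two readings for each of four months (deg F, invented)
toy <- data.frame(
  month = factor(c(1, 1, 4, 4, 7, 7, 11, 11)),
  avg_daily_temp = c(30, 34, 52, 56, 76, 78, 44, 46)
)

# Mean temperature per month, ranked from warmest to coolest
monthly_mean <- tapply(toy$avg_daily_temp, toy$month, mean)
ranked_months <- sort(monthly_mean, decreasing = TRUE)

print(ranked_months)
```

The same call on the cleaned data would be `tapply(ohcincin_n$avg_daily_temp, ohcincin_n$month, mean)`.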
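ggplot(data = ohcincin_n, mapping = aes(x = day, y = avg_daily_temp)) +
stat_summary(fun.min = min, ## lowest value for each calendar day (ggplot2 >= 3.3.0; older versions used fun.ymin)
fun.max = max, ## highest value (formerly fun.ymax)
fun = median) + ## point at the median (formerly fun.y)
facet_wrap(~ month, nrow = 6)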
Within a particular month, the daily temperature stays roughly constant from one day to the next.
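One way to put a number on "roughly constant" is to compare the day-to-day spread inside each month with the spread between the monthly means. Here is a minimal base R sketch on invented values for January and July. If the within-month standard deviations are much smaller than the between-month one, the month explains most of the variation.

```r
# Hypothetical toy data: four daily readings in January and four in July (invented)
toy <- data.frame(
  month = factor(rep(c(1, 7), each = 4)),
  avg_daily_temp = c(28, 35, 22, 31, 75, 78, 74, 77)
)

# Spread of daily values inside each month
within_sd <- tapply(toy$avg_daily_temp, toy$month, sd)

# Spread of the monthly means around each other
between_sd <- sd(tapply(toy$avg_daily_temp, toy$month, mean))

print(round(within_sd, 2))
print(round(between_sd, 2))
```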