This is my homework report for week 3, produced with R Markdown. In this homework I work with Cincinnati weather data and perform the following three steps:
Review the codebook
Learn about the data
Visualize the data
For this homework assignment, I used the following packages:
library(knitr) # for knitting R code to HTML files
library(gdata) # for reading the .xlsx file in the exercise
## gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
##
## gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
##
## Attaching package: 'gdata'
## The following object is masked from 'package:stats':
##
## nobs
## The following object is masked from 'package:utils':
##
## object.size
## The following object is masked from 'package:base':
##
## startsWith
library(ggplot2) # for creating graphs
library(lubridate) # for working with dates
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library(scales) # to access breaks/formatting functions
library(gridExtra) # for arranging plots
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:gdata':
##
## combine
library(grid) # for arranging plots
The data set has four fields: ‘Month’, ‘Day’, ‘Year’, and ‘Average Daily Temperature (F)’.
The value “-99” is used as a no-data flag where a measurement was not available.
The data contains average daily temperatures from January 1, 1995 to October 19, 2016 for 167 international cities. The source data is taken from the Global Summary of the Day (GSOD) database archived by the National Climatic Data Center (NCDC).
cincy_url <- "http://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt"
weather_cincy <- read.table(cincy_url, header = FALSE, sep = "", col.names = c('Month', 'Day', 'Year', 'AverageTemp'))
# number of rows and variables
dim(weather_cincy)
## [1] 7963 4
# names of variables
names(weather_cincy)
## [1] "Month" "Day" "Year" "AverageTemp"
head(weather_cincy)
tail(weather_cincy)
# set missing values to NA
weather_cincy$AverageTemp[weather_cincy$AverageTemp == -99] <- NA
# count of missing values
sum(is.na(weather_cincy$AverageTemp))
## [1] 14
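As an alternative to recoding after import, the no-data flag can be converted while the file is read. The following is a minimal sketch that assumes the flag always appears in the file as the exact string `-99`. The three-row sample and the name `sample_weather` are made up for illustration and are not part of the Cincinnati data.

```r
# Made-up sample in the same four-column layout as the weather file
sample_text <- "
1 1 1995 41.1
1 2 1995 -99
1 3 1995 22.2
"

# na.strings converts matching fields to NA during import
sample_weather <- read.table(text = sample_text, header = FALSE, sep = "",
                             col.names = c("Month", "Day", "Year", "AverageTemp"),
                             na.strings = "-99")

# count of missing values in the sample
sum(is.na(sample_weather$AverageTemp))
```

If the flag were written in another form in the file (for example `-99.0`), it would not match, and recoding after import as shown above would be the safer choice.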
#summary of data set
summary(weather_cincy$AverageTemp)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -2.20 40.20 57.10 54.73 70.70 89.20 14
#Range of the year
range(weather_cincy$Year)
## [1] 1995 2016
# create Date variable
weather_cincy$Date <- as.Date(paste(weather_cincy$Year,
  weather_cincy$Month, weather_cincy$Day, sep = "-"),
  format = "%Y-%m-%d")
# drop 2016, which only has data through October 19
weather_cincy <- weather_cincy[weather_cincy$Year != 2016, ]
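Because the date is assembled from three separate columns, it can be worth checking that every row produced a valid date. The sketch below uses a made-up three-row data frame (`toy`, with one deliberately impossible date) rather than the real data; the same check could be applied to `weather_cincy`.

```r
# Made-up rows; February 30 does not exist and should fail to parse
toy <- data.frame(Month = c(1, 2, 2),
                  Day   = c(15, 28, 30),
                  Year  = c(2015, 2015, 2015))

# Same construction as above: paste the parts, then parse with an explicit format
toy$Date <- as.Date(paste(toy$Year, toy$Month, toy$Day, sep = "-"),
                    format = "%Y-%m-%d")

# Rows whose date could not be parsed come back as NA
invalid_rows <- toy[is.na(toy$Date), ]
invalid_rows
```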
I have created three different visualizations of this data set.
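# build month names from the Month column so facets appear in calendar order, not alphabetically
weather_cincy$month_name <- factor(month.name[weather_cincy$Month], levels = month.name)
ggplot(data = weather_cincy) +
  geom_point(mapping = aes(x = Year, y = AverageTemp)) +
  facet_wrap(~ month_name, nrow = 4)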
This visualization shows monthly facets of the average daily temperatures from 1995 to 2015.
June through August have the highest temperatures, whereas November and December have low temperatures.
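The seasonal pattern read off the facets can also be checked numerically by averaging the temperature per month. This is a minimal sketch: `monthly_means` is a hypothetical helper, and the `toy` data frame holds made-up values, not the Cincinnati measurements.

```r
# Hypothetical helper: mean temperature per calendar month, ignoring NAs
monthly_means <- function(df) {
  means <- tapply(df$AverageTemp, df$Month, mean, na.rm = TRUE)
  data.frame(Month    = month.name[as.integer(names(means))],
             MeanTemp = as.numeric(means))
}

# Made-up values for three months, including one missing reading
toy <- data.frame(Month       = c(1, 1, 7, 7, 12),
                  AverageTemp = c(30, 34, 76, NA, 38))

toy_means <- monthly_means(toy)
toy_means
```

Applied to the real data, the call would be `monthly_means(weather_cincy)`.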
# plot the mean temperature per year (ggplot2 >= 3.3.0 uses `fun`; older versions used `fun.y`)
ggplot(data = weather_cincy) +
  geom_smooth(mapping = aes(x = Year, y = AverageTemp),
              stat = "summary",
              fun = "mean")
This visualization shows the average temperature for each year, in °F, from 1995 to 2015.
The year 2012 had the highest average temperature. This might be related to global warming, although a single warm year is not enough to confirm a trend.
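The warmest year can be confirmed directly from the yearly means instead of being read off the plot. The sketch below runs on a made-up data frame (`toy`); with the real data, `toy` would be replaced by `weather_cincy`.

```r
# Made-up values for three years, including one missing reading
toy <- data.frame(Year        = c(2010, 2010, 2011, 2011, 2012, 2012),
                  AverageTemp = c(50, 54, 53, NA, 56, 58))

# Mean temperature per year, ignoring NAs
yearly_mean <- tapply(toy$AverageTemp, toy$Year, mean, na.rm = TRUE)

# The names of the result are the years; pick the one with the largest mean
warmest_year <- as.integer(names(which.max(yearly_mean)))
warmest_year
```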
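# point = yearly median, line = yearly min to max
# (ggplot2 >= 3.3.0 argument names; older versions used fun.ymin, fun.ymax, fun.y)
ggplot(data = weather_cincy) +
  stat_summary(mapping = aes(x = Year, y = AverageTemp),
               fun.min = min,
               fun.max = max,
               fun = median
  )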
This visualization shows the range of temperatures for each year and its median temperature.
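The three statistics drawn by the plot can also be tabulated, which makes the exact values easier to compare across years. This is a sketch on made-up data (`toy`); the column names `Min`, `Median`, and `Max` are chosen here for illustration.

```r
# Made-up values for two years, including one missing reading
toy <- data.frame(Year        = c(2014, 2014, 2014, 2015, 2015, 2015),
                  AverageTemp = c(10, 55, 90, 20, NA, 80))

# One row per year with the same three summaries the plot shows
yearly_range <- data.frame(
  Year   = sort(unique(toy$Year)),
  Min    = as.numeric(tapply(toy$AverageTemp, toy$Year, min, na.rm = TRUE)),
  Median = as.numeric(tapply(toy$AverageTemp, toy$Year, median, na.rm = TRUE)),
  Max    = as.numeric(tapply(toy$AverageTemp, toy$Year, max, na.rm = TRUE))
)
yearly_range
```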
weather_cincy$Date <- as.Date(weather_cincy$Date)
TempDaily <- ggplot(weather_cincy, aes(x=Date, y=AverageTemp)) +
geom_point() +
ggtitle("Daily Air Temperature") +
xlab("Date") + ylab("Temperature (F)") +
scale_x_date(labels = date_format("%b-%y")) +
theme(plot.title = element_text(lineheight=.8,size = 10)) +
theme(text = element_text(size=10))
TempDaily