This HTML document is created from the associated R Markdown file. For week three of this homework assignment, I use Cincinnati weather data and perform a basic data analysis on it. I have also created three graphs with R visualization functions to support the analysis. The broad categories covered in this R Markdown file are:
Review the codebook
Learn about the data
Visualize the data
The package used in this assignment to execute the R code is listed below:
library(ggplot2) ##Package to produce complex multi-layered graphs in R
The source data are taken from the Global Summary of the Day (GSOD) database archived by the National Climatic Data Center (NCDC). The data set has four fields: month, day, year, and average daily temperature (F). The average daily temperatures are computed from 24 hourly temperature readings. The data set contains data from January 1, 1995 to present.
A value of “-99” is used as the missing-value flag whenever data are not available.
#Creating data frame after extracting data from given URL
url <- "http://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt"
cincy_weather <- read.table(url,header=FALSE,sep="",col.names=c("Month","Day","Year","AverageTemp"))
#Number of rows and variables
dim(cincy_weather)
## [1] 7963 4
#Names of variables
names(cincy_weather)
## [1] "Month" "Day" "Year" "AverageTemp"
#Checking top and bottom values
head(cincy_weather)
## Month Day Year AverageTemp
## 1 1 1 1995 41.1
## 2 1 2 1995 22.2
## 3 1 3 1995 22.8
## 4 1 4 1995 14.9
## 5 1 5 1995 9.5
## 6 1 6 1995 23.8
tail(cincy_weather)
## Month Day Year AverageTemp
## 7958 10 14 2016 54.4
## 7959 10 15 2016 63.2
## 7960 10 16 2016 68.7
## 7961 10 17 2016 71.1
## 7962 10 18 2016 74.4
## 7963 10 19 2016 75.3
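Since `head()` and `tail()` show the series running from January 1, 1995 to October 19, 2016, a quick sanity check is whether the 7963 rows correspond to one row per calendar day. The sketch below is a minimal illustration and not part of the original analysis: it uses a small made-up data frame (`toy_weather`, a hypothetical name) to show how the `Month`, `Day`, and `Year` columns could be combined into a `Date` column, and then counts the days in that range.

```r
# Toy rows in the same layout as cincy_weather (first two and last date only)
toy_weather <- data.frame(
  Month = c(1, 1, 10),
  Day   = c(1, 2, 19),
  Year  = c(1995, 1995, 2016)
)

# Combine the three columns into a single Date column
toy_weather$Date <- as.Date(ISOdate(toy_weather$Year, toy_weather$Month, toy_weather$Day))

# Number of calendar days from the first to the last date, inclusive
expected_days <- as.integer(max(toy_weather$Date) - min(toy_weather$Date)) + 1
expected_days
# 7963
```

The result equals the row count reported by `dim()`, which suggests the file holds one row for every day in the range, with gaps marked by the missing-value flag rather than by absent rows.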
#Replace missing values equal to -99 with NA
cincy_weather$AverageTemp[cincy_weather$AverageTemp == -99] <- NA
#Counting missing values
sum(is.na(cincy_weather$AverageTemp))
## [1] 14
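As an alternative to replacing the flag after loading, `read.table()` can convert it to `NA` while reading through its `na.strings` argument. The sketch below is only an illustration under assumptions: it reads a few made-up lines through the `text` argument instead of the real URL, and `toy_weather` is a hypothetical name. Note that `na.strings` matches the field text exactly, so a file that writes the flag in another form (for example `-99.0`) would need that string listed as well; how the source file writes it has not been checked here.

```r
# Made-up sample in the same four-column, whitespace-separated layout
sample_text <- "
 1 1 1995 41.1
 1 2 1995 -99
 1 3 1995 22.8
"

# na.strings turns the missing-value flag into NA at read time
toy_weather <- read.table(text = sample_text, header = FALSE, sep = "",
                          col.names = c("Month", "Day", "Year", "AverageTemp"),
                          na.strings = "-99")

# Count of missing temperatures in the toy sample
sum(is.na(toy_weather$AverageTemp))
```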
#Listing the rows that have incomplete data
cincy_weather[!complete.cases(cincy_weather),]
## Month Day Year AverageTemp
## 1454 12 24 1998 NA
## 1455 12 25 1998 NA
## 1460 12 30 1998 NA
## 1461 12 31 1998 NA
## 1471 1 10 1999 NA
## 2726 6 18 2002 NA
## 2727 6 19 2002 NA
## 2728 6 20 2002 NA
## 2729 6 21 2002 NA
## 2807 9 7 2002 NA
## 2982 3 1 2003 NA
## 4623 8 28 2007 NA
## 5016 9 24 2008 NA
## 5213 4 9 2009 NA
#Omitting rows with missing values
cincy_weather_final <- na.omit(cincy_weather)
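To confirm what `na.omit()` removes, the row counts before and after can be compared, and the dropped row indices can be read from the `na.action` attribute of the result. The following is a minimal sketch on a made-up data frame (`toy_weather` and `toy_final` are hypothetical names, and the temperatures are illustrative only).

```r
# Toy data with one missing temperature in the second row
toy_weather <- data.frame(
  Month       = c(12, 12, 12),
  Day         = c(23, 24, 26),
  Year        = c(1998, 1998, 1998),
  AverageTemp = c(30.5, NA, 28.1)
)

# Drop every row that contains at least one NA
toy_final <- na.omit(toy_weather)

# Number of rows dropped
nrow(toy_weather) - nrow(toy_final)

# Original row indices that were removed
attr(toy_final, "na.action")
```

Applied to the real data, the difference in row counts should match the 14 missing values counted earlier, since the incomplete rows listed above are all missing `AverageTemp`.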
#Displaying summary statistics of final dataframe
summary(cincy_weather_final)
## Month Day Year AverageTemp
## Min. : 1.000 Min. : 1.00 Min. :1995 Min. :-2.20
## 1st Qu.: 4.000 1st Qu.: 8.00 1st Qu.:2000 1st Qu.:40.20
## Median : 6.000 Median :16.00 Median :2005 Median :57.10
## Mean : 6.477 Mean :15.71 Mean :2005 Mean :54.73
## 3rd Qu.: 9.000 3rd Qu.:23.00 3rd Qu.:2011 3rd Qu.:70.70
## Max. :12.000 Max. :31.00 Max. :2016 Max. :89.20
#Graph 1
ggplot(data=cincy_weather_final) +
geom_smooth(mapping = aes(x = Year, y = AverageTemp),color="tomato", se = FALSE) +
facet_wrap(~ Month, nrow = 5) +
ggtitle("Monthwise Average Temperature") +
ylab("Average Temperature(F)")
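The facets in Graph 1 are labelled with month numbers. If month names are preferred, one option is to convert `Month` to a factor using the built-in `month.abb` constant. The sketch below shows only the conversion, on a made-up data frame; `toy_weather` and `MonthName` are hypothetical names introduced for illustration.

```r
# Toy data with a numeric Month column, as in cincy_weather_final
toy_weather <- data.frame(
  Month       = c(1, 6, 10, 12),
  AverageTemp = c(41.1, 72.3, 63.2, 35.0)
)

# Map month numbers 1..12 to "Jan".."Dec"; the levels keep calendar order
toy_weather$MonthName <- factor(toy_weather$Month, levels = 1:12, labels = month.abb)
toy_weather$MonthName
```

With such a column, `facet_wrap(~ MonthName, nrow = 5)` would label the panels by name, and the factor levels keep them in calendar order rather than alphabetical order.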
#Graph 2
ggplot(data=cincy_weather_final) +
geom_point(mapping = aes(x = Year,y = AverageTemp),color="tomato") +
geom_smooth(mapping = aes(x = Year, y = AverageTemp),color="tomato") +
ggtitle("Yearwise Average Temperature") +
ylab("Average Temperature(F)")
#Graph 3
ggplot(data = cincy_weather_final) +
stat_summary(mapping = aes(x = Year, y = AverageTemp),
color="tomato",
geom = "pointrange",
#ggplot2 3.3.0+ argument names (formerly fun.ymin, fun.ymax, fun.y)
fun.min = min,
fun.max = max,
fun = median
) +
ggtitle("Summarised Average Temperature") +
ylab("Average Temperature(F)")
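Graph 3 draws, for each year, a point at the median and a line from the minimum to the maximum daily average temperature. The numbers behind such a plot can be tabulated with base R. The sketch below is an illustration on made-up values, not output from the real data; `toy_weather`, `by_year`, and `yearly_stats` are hypothetical names.

```r
# Toy data: three daily readings for each of two years
toy_weather <- data.frame(
  Year        = c(1995, 1995, 1995, 1996, 1996, 1996),
  AverageTemp = c(41.1, 22.2, 22.8, 30.0, 55.5, 70.1)
)

# Split the temperatures into one vector per year
by_year <- split(toy_weather$AverageTemp, toy_weather$Year)

# One row per year with the three statistics used in the pointrange plot
yearly_stats <- data.frame(
  Year   = as.integer(names(by_year)),
  Min    = sapply(by_year, min),
  Median = sapply(by_year, median),
  Max    = sapply(by_year, max)
)
yearly_stats
```

Running the same steps on `cincy_weather_final` would give the exact values that each pointrange represents.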