This is the assignment report for Week 3. In this assignment, I analyzed a data set and created visualizations from it. Working on the Week 3 homework taught me various ways in which data can be presented and analyzed using R.
To complete this assignment and run the code, I used the following packages:
library(gdata)     # scraping Excel files from a URL
library(tidyverse) # ggplot2 for plotting
library(dplyr)     # group-wise means
The data set has 4 variables described below:
Year: The year, ranging from 1995 to 2016.
Month: The month of the year, represented by a number.
Day: The day of the month.
Avg_Temp: The average temperature for that day.
A value of ‘-99’ is used as a no-data flag for values that were not available.
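As a minimal sketch of why the no-data flag has to be recoded before any averaging, the example below uses a made-up vector `temps` (not the real data) in which one reading is flagged with -99. The names `temps`, `temps_clean`, `mean_with_flag`, and `mean_without_flag` are illustrative only.

```r
# Toy vector standing in for Avg_Temp; -99 marks a missing reading.
temps <- c(41.1, 22.2, -99, 14.9)

# Left as-is, the flag is treated as a real temperature and drags the mean down.
mean_with_flag <- mean(temps)

# Recode the flag as NA, then ask mean() to skip missing values.
temps_clean <- replace(temps, temps == -99, NA)
mean_without_flag <- mean(temps_clean, na.rm = TRUE)

print(mean_with_flag)
print(mean_without_flag)
```

The same idea applies to the full `Avg_Temp` column: the flag is converted to `NA` once, and every later summary uses `na.rm = TRUE`.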
The data contains average daily temperatures from January 1, 1995 to October 19, 2016. The source data come from the Global Summary of the Day (GSOD) database archived by the National Climatic Data Center (NCDC). The average daily temperatures posted on this site are computed from 24 hourly temperature readings in the GSOD data.
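The date range above suggests a quick sanity check: if the file holds exactly one row per calendar day, the number of rows should equal the number of days in the range. A small sketch in base R, where `first_day`, `last_day`, and `expected_rows` are names introduced here for illustration:

```r
# Endpoints of the period described in the report.
first_day <- as.Date("1995-01-01")
last_day  <- as.Date("2016-10-19")

# One entry per calendar day, both endpoints included.
all_days <- seq(first_day, last_day, by = "day")
expected_rows <- length(all_days)

print(expected_rows)
```

Comparing `expected_rows` with `nrow(weather)` shows whether any days are missing from or duplicated in the file, under the one-row-per-day assumption.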
cincy_url <- "http://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt"
weather <- read.table(cincy_url, header = FALSE, sep = "", col.names = c('Month', 'Day', 'Year', 'Avg_Temp'))
ncol(weather)
## [1] 4
nrow(weather)
## [1] 7963
names(weather)
## [1] "Month" "Day" "Year" "Avg_Temp"
range(weather$Year)
## [1] 1995 2016
weather$Avg_Temp[weather$Avg_Temp==-99]<- NA
sum(is.na(weather$Avg_Temp))
## [1] 14
str(weather)
## 'data.frame': 7963 obs. of 4 variables:
## $ Month : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Day : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Year : int 1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 ...
## $ Avg_Temp: num 41.1 22.2 22.8 14.9 9.5 23.8 31.1 26.9 31.3 31.5 ...
summary(weather$Avg_Temp)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -2.20 40.20 57.10 54.73 70.70 89.20 14
# Month-wise mean of Avg_Temp only; Day and Year are not averaged
weather %>%
  group_by(Month) %>%
  summarise(Avg_Temp = mean(Avg_Temp, na.rm = TRUE))
## # A tibble: 12 × 2
##    Month Avg_Temp
##    <int>    <dbl>
##  1     1 31.13157
##  2     2 33.79003
##  3     3 43.81483
##  4     4 54.89772
##  5     5 64.04047
##  6     6 72.16692
##  7     7 75.38490
##  8     8 74.96579
##  9     9 67.90091
## 10    10 56.03746
## 11    11 44.66365
## 12    12 35.53354
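The month-wise summary above can be turned into a year-wise one by grouping on `Year` instead. The sketch below uses a made-up data frame `toy_weather` with the same column names as `weather`, so it runs without downloading the file; the values are invented and `yearly_mean` is a name chosen for illustration.

```r
library(dplyr)

# Toy data with the same column names as `weather` (values are made up).
toy_weather <- data.frame(
  Year     = c(1995, 1995, 1996, 1996),
  Month    = c(1, 7, 1, 7),
  Day      = c(1, 1, 1, 1),
  Avg_Temp = c(30, 76, NA, 74)
)

# One row per year; missing readings are skipped rather than propagated.
yearly_mean <- toy_weather %>%
  group_by(Year) %>%
  summarise(Avg_Temp = mean(Avg_Temp, na.rm = TRUE))

print(yearly_mean)
```

Replacing `toy_weather` with `weather` should give one mean per year for the real data.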
I have created three visualizations based on the dataset.
1. Visualization # 1: This visualization shows the month-wise distribution of average temperature across years.
ggplot(data = weather) +
  geom_point(mapping = aes(x = Year, y = Avg_Temp)) +
  facet_wrap(~Month, nrow = 6) +
  ggtitle("Average Temperature v/s Year")
2. Visualization # 2: This visualization shows the year-wise distribution of average temperature across months.
ggplot(data = weather) +
  geom_smooth(
    mapping = aes(x = Month, y = Avg_Temp, group = Year, colour = Year),
    show.legend = FALSE
  ) +
  ggtitle("Average Temperature v/s Month")
# Inference: the average temperature appears to have risen over the years.
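The inference above is read off the plot by eye. One way to check it numerically is to regress the yearly mean temperature on the year and look at the slope. The sketch below is only an illustration: `yearly_trend` is a hypothetical helper written for this report, and `toy` is synthetic data with a known rise of 0.1 degrees per year, not the Cincinnati data.

```r
# Hypothetical helper: slope (degrees per year) of yearly mean Avg_Temp on Year.
yearly_trend <- function(df) {
  # aggregate() with a formula drops rows where Avg_Temp is NA by default.
  yearly <- aggregate(Avg_Temp ~ Year, data = df, FUN = mean)
  fit <- lm(Avg_Temp ~ Year, data = yearly)
  unname(coef(fit)["Year"])
}

# Synthetic data: two readings per year around a mean that rises 0.1 per year.
toy <- data.frame(
  Year     = rep(1995:2004, each = 2),
  Avg_Temp = rep(50 + 0.1 * (0:9), each = 2) + c(-5, 5)
)

slope <- yearly_trend(toy)
print(slope)
```

Calling `yearly_trend(weather)` would give the corresponding slope for the real data. Since the 2016 records stop on October 19, that year's mean leaves out the colder months, so it may be worth excluding 2016 before fitting.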
3. Visualization # 3: This visualization gives a graphical summary (minimum, median, and maximum) of the average temperature for each year.
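# fun.min / fun.max / fun replace fun.ymin / fun.ymax / fun.y,
# which were deprecated in ggplot2 3.3.0
ggplot(data = weather) +
  stat_summary(
    mapping = aes(x = Year, y = Avg_Temp),
    fun.min = min,
    fun.max = max,
    fun = median
  ) +
  ggtitle("Statistical Summary of Average Temperature with Year")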