Synopsis

This is the assignment report for week 3. In this assignment, I analyzed a data set and created visualizations of it. Working on the week 3 homework taught me various ways in which data can be presented and analyzed using R.

Packages Required

To complete this assignment and run the code, I used the following packages:

library(gdata)     # reading Excel files from a URL
library(tidyverse) # ggplot2 for the plots
library(dplyr)     # grouped means (also attached by tidyverse)

Source Code

The data set has 4 variables described below:

Year: The year of the observation, ranging from 1995 to 2016.

Month: The month of the year, represented by a number.

Day: The day of the month.

Avg_Temp: The average temperature for that particular day.

The value -99 is used as a no-data flag for days on which no temperature was available.
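
The effect of the flag is easy to see on a small example. The sketch below uses a made-up vector `temps` (not values from the data set) to show why -99 should be recoded to NA before any summary is computed.

```r
# Made-up temperatures for illustration; -99 marks a missing reading.
temps <- c(41.1, 22.2, -99, 14.9)

# Leaving the flag in place drags the mean far below any real reading.
mean_with_flag <- mean(temps)

# Recode the flag as NA, then skip missing values when averaging.
temps[temps == -99] <- NA
mean_without_flag <- mean(temps, na.rm = TRUE)

print(c(with_flag = mean_with_flag, without_flag = mean_without_flag))
```

Recoding to NA also makes the gaps visible in `summary()` and lets ggplot2 drop them with a warning instead of plotting them as real temperatures.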

Data Description

The data contains average daily temperatures from January 1, 1995 to October 19, 2016. The source data come from the Global Summary of the Day (GSOD) database archived by the National Climatic Data Center (NCDC). Each average daily temperature is computed from 24 hourly temperature readings in the GSOD data.

cincy_url <- "http://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt"
weather <- read.table(cincy_url, header = FALSE, sep = "", col.names = c('Month', 'Day', 'Year', 'Avg_Temp'))
ncol(weather)
## [1] 4
nrow(weather)
## [1] 7963
names(weather)
## [1] "Month"    "Day"      "Year"     "Avg_Temp"
range(weather$Year)
## [1] 1995 2016
weather$Avg_Temp[weather$Avg_Temp == -99] <- NA
sum(is.na(weather$Avg_Temp))
## [1] 14
str(weather)
## 'data.frame':    7963 obs. of  4 variables:
##  $ Month   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Day     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Year    : int  1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 ...
##  $ Avg_Temp: num  41.1 22.2 22.8 14.9 9.5 23.8 31.1 26.9 31.3 31.5 ...
summary(weather$Avg_Temp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   -2.20   40.20   57.10   54.73   70.70   89.20      14
# Mean temperature for each month across all years
weather %>%
  group_by(Month) %>%
  summarise(Avg_Temp = mean(Avg_Temp, na.rm = TRUE))
## # A tibble: 12 × 2
##    Month Avg_Temp
##    <int>    <dbl>
## 1      1 31.13157
## 2      2 33.79003
## 3      3 43.81483
## 4      4 54.89772
## 5      5 64.04047
## 6      6 72.16692
## 7      7 75.38490
## 8      8 74.96579
## 9      9 67.90091
## 10    10 56.03746
## 11    11 44.66365
## 12    12 35.53354
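
A quick consistency check on the numbers above: if the file holds one row per calendar day, the stated date range should account for all 7963 rows reported by `nrow(weather)`. The sketch below counts the days in that range and shows one way to build a proper date column. `add_date` is a hypothetical helper name, the `demo` rows are made up, and the helper assumes the `Month`, `Day`, and `Year` column names used above.

```r
# Number of calendar days from January 1, 1995 to October 19, 2016.
all_days <- seq(as.Date("1995-01-01"), as.Date("2016-10-19"), by = "day")
length(all_days)  # 7963

# Hypothetical helper: add a Date column built from Month, Day, and Year.
add_date <- function(df) {
  df$Date <- as.Date(ISOdate(df$Year, df$Month, df$Day))
  df
}

# Made-up rows for illustration, not taken from the file.
demo <- data.frame(Month = c(1, 2), Day = c(1, 29), Year = c(1995, 2016),
                   Avg_Temp = c(41.1, NA))
print(add_date(demo))
```

With a `Date` column in place, `range(add_date(weather)$Date)` would confirm the first and last day directly.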

Data Visualizations

I have created three visualizations based on the dataset.

1. Visualization #1: This visualization shows the distribution of average temperature across years, with one panel per month.

ggplot(data = weather) + 
  geom_point(mapping = aes(x = Year, y = Avg_Temp)) +
  facet_wrap(~Month, nrow=6) + 
  ggtitle("Average Temperature v/s Year")

2. Visualization #2: This visualization shows average temperature by month, with one smoothed curve per year.

ggplot(data = weather) +
  geom_smooth(
    mapping = aes(x = Month, y = Avg_Temp, group = Year, colour=Year),
    show.legend = FALSE
  ) +
  ggtitle("Average Temperature v/s Month")

# Inference: Average temperature has risen over the years.
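
One way to check an inference like this numerically is to compute the mean temperature per year and fit a straight line through those means. The following is a minimal sketch and not part of the original analysis: `yearly_trend` is a hypothetical helper, the `demo` data are made up, and the helper assumes the `Year` and `Avg_Temp` column names used above. Because the 2016 data stop on October 19, that year lacks its colder late months, so the sketch lets the caller leave out incomplete years.

```r
# Hypothetical helper: slope of the yearly mean temperature, in degrees per year.
yearly_trend <- function(df, drop_years = integer(0)) {
  df <- df[!(df$Year %in% drop_years), ]
  # The formula interface of aggregate() drops rows with NA before averaging.
  yearly <- aggregate(Avg_Temp ~ Year, data = df, FUN = mean)
  fit <- lm(Avg_Temp ~ Year, data = yearly)
  unname(coef(fit)["Year"])
}

# Made-up data for illustration only.
demo <- data.frame(
  Year = rep(2001:2004, each = 2),
  Avg_Temp = c(50, 52, 51, 53, 52, 54, NA, 55)
)

slope_all <- yearly_trend(demo)
slope_trimmed <- yearly_trend(demo, drop_years = 2004)
print(c(all_years = slope_all, without_2004 = slope_trimmed))
```

On the real data the call would be `yearly_trend(weather, drop_years = 2016)`. A positive slope would be consistent with the inference, though a line through roughly twenty yearly means is only a rough indicator.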

3. Visualization #3: This visualization gives a graphical summary (minimum, median, and maximum) of average temperature for each year.

ggplot(data = weather) + 
  stat_summary(
    mapping = aes(x = Year, y = Avg_Temp),
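    # Range and centre of each year's temperatures
    # (argument names as of ggplot2 3.3.0; older versions use fun.ymin, fun.ymax, fun.y)
    fun.min = min,
    fun.max = max,
    fun = median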
  ) + 
  ggtitle("Statistical Summary of Average Temperature with Year")
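
The three statistics that this plot draws for each year can also be tabulated. A minimal sketch in base R, using made-up rows rather than the real data and assuming the `Year` and `Avg_Temp` column names from above; `year_summary` is a hypothetical helper name:

```r
# Hypothetical helper: minimum, median, and maximum of Avg_Temp for each year.
year_summary <- function(df) {
  stats <- function(x) c(min = min(x), median = median(x), max = max(x))
  # The formula interface drops rows where Avg_Temp is NA before summarising.
  out <- aggregate(Avg_Temp ~ Year, data = df, FUN = stats)
  # aggregate() returns the statistics as a matrix column; flatten it.
  data.frame(Year = out$Year, out$Avg_Temp)
}

# Made-up rows for illustration only.
demo <- data.frame(
  Year = c(2001, 2001, 2001, 2002, 2002, 2002),
  Avg_Temp = c(10, 30, 20, NA, 40, 60)
)

s <- year_summary(demo)
print(s)
```

Calling `year_summary(weather)` would give one row per year, which makes it easier to read off exact values than the plot alone.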