Synopsis

This R Markdown file was created for the Week 3 assignment of the Data Wrangling in R course taught at UC. This week, I scraped the Cincinnati weather data file from the University of Dayton website, computed summary statistics for the data, and created three visualizations of it.

Packages required

I used the following packages to create this RMD file:

library(printr)  # for proper formatting while printing
library(tidyverse)  # for creating visualizations

Data source

The file contains the average daily temperature for Cincinnati from 1995 to the present. The University of Dayton sources and regularly updates this data from the Global Summary of the Day (GSOD) database archived by the National Climatic Data Center (NCDC). Each average daily temperature is calculated from 24 hourly temperature readings in the GSOD data.

Variable description

The file has 4 variables:

month: the month the observation corresponds to

day: the day of the month the observation corresponds to

year: the year the observation corresponds to

avg_temp: the average daily temperature in degrees Fahrenheit, measured as the mean of 24 hourly temperature readings in the GSOD data

Missing values

The missing values are represented by -99 in the original file, which I later changed to NA for computational purposes.
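
As a minimal sketch of why this recoding matters, the toy data frame below (hypothetical values, not taken from the real file) shows how a -99 sentinel distorts the mean until it is replaced with NA. The names toy and missing_code are illustrative assumptions.

```r
# Toy stand-in for the weather data; the values are made up for illustration
toy <- data.frame(
  month = c(1, 1, 1),
  day = c(1, 2, 3),
  year = c(1995, 1995, 1995),
  avg_temp = c(41.1, -99, 22.8)
)

missing_code <- -99  # sentinel used for missing readings in the source file

# Mean with the sentinel still present: pulled far below any real reading
mean_with_sentinel <- mean(toy$avg_temp)

# Recode the sentinel as NA, then skip NA values when averaging
toy$avg_temp[toy$avg_temp == missing_code] <- NA
mean_without_sentinel <- mean(toy$avg_temp, na.rm = TRUE)

print(c(with_sentinel = mean_with_sentinel,
        without_sentinel = mean_without_sentinel))
```

After the recoding, functions such as mean() and median() can ignore the missing days through na.rm = TRUE instead of treating -99 as a real temperature.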

Data Description

cincy_url <- "http://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt"
cincy_data <- read.table(cincy_url, sep = "", header = FALSE,
                         col.names = c("month", "day", "year", "avg_temp"))
head(cincy_data)
 month   day   year   avg_temp
     1     1   1995       41.1
     1     2   1995       22.2
     1     3   1995       22.8
     1     4   1995       14.9
     1     5   1995        9.5
     1     6   1995       23.8
## computations for printing inline in summary statistics below
num_var <- ncol(cincy_data)
num_row <- nrow(cincy_data)
# recode the -99 missing-value code as NA before computing statistics
cincy_data$avg_temp[cincy_data$avg_temp == -99] <- NA
n_miss <- sum(is.na(cincy_data))
# na.rm = TRUE skips the missing days
mean_temp <- mean(cincy_data$avg_temp, na.rm = TRUE)
med_temp <- median(cincy_data$avg_temp, na.rm = TRUE)
min_temp <- min(cincy_data$avg_temp, na.rm = TRUE)
max_temp <- max(cincy_data$avg_temp, na.rm = TRUE)

Summary statistics

  • Number of variables = 4

  • Number of observations = 7963

  • Number of missing values = 14

  • Mean value of avg_temp = 54.7322934

  • Median value of avg_temp = 57.1

  • Minimum value of avg_temp = -2.2

  • Maximum value of avg_temp = 89.2
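
The figures listed above could also be gathered in one step. The sketch below wraps the same base R calls in a hypothetical helper, temp_summary(), and runs it on a small made-up data frame rather than the real file, so the printed numbers are illustrative only.

```r
# Toy stand-in for cincy_data; the values are illustrative, not from the real file
toy <- data.frame(
  month = c(1, 1, 1, 1),
  day = 1:4,
  year = 1995,
  avg_temp = c(41.1, 22.2, NA, 14.9)
)

# Hypothetical helper: collects the summary figures in one named vector
temp_summary <- function(df) {
  c(
    n_var = ncol(df),
    n_obs = nrow(df),
    n_miss = sum(is.na(df$avg_temp)),
    mean = mean(df$avg_temp, na.rm = TRUE),
    median = median(df$avg_temp, na.rm = TRUE),
    min = min(df$avg_temp, na.rm = TRUE),
    max = max(df$avg_temp, na.rm = TRUE)
  )
}

toy_stats <- temp_summary(toy)
print(round(toy_stats, 1))
```

Returning a named vector keeps each figure accessible by name, for example toy_stats[["mean"]], which is convenient for inline printing in an R Markdown document.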

Data visualization

Average temperature by year

The following plot shows the mean of the daily average temperatures for each year from 1995 to the present:

ggplot(cincy_data, aes(x = factor(year), y = avg_temp)) +
  stat_summary(fun = "mean", geom = "bar", fill = "red")
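
To check the bar heights numerically, the yearly means can be computed directly. This is a minimal sketch using base R's aggregate() on a made-up data frame (toy is an assumption, not the real data); the formula interface drops rows with NA by default.

```r
# Toy data: two years, with one missing reading in 1996
toy <- data.frame(
  year = c(1995, 1995, 1996, 1996),
  avg_temp = c(40, 50, NA, 60)
)

# One mean per year; rows with NA in avg_temp are omitted by the formula method
yearly_means <- aggregate(avg_temp ~ year, data = toy, FUN = mean)
print(yearly_means)
```

The same call on cincy_data would give one row per year, which can be compared against the bars in the plot.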

Average temperature by month

The following plot shows how temperature varies by month, based on the daily average temperature data from 1995 to the present:

ggplot(cincy_data, aes(x = factor(month), y = avg_temp)) +
  stat_summary(fun = "mean", geom = "bar", fill = "blue")
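
The x-axis of this plot shows month numbers. As an optional variation, and only a sketch, the month column could be turned into a labelled factor with R's built-in month.abb constant so that the axis reads Jan, Feb, and so on. The vector months below is made up for illustration.

```r
# Illustrative month numbers, standing in for cincy_data$month
months <- c(1, 2, 12)

# Fix the levels to 1..12 so the calendar order is kept even if some months are absent
month_f <- factor(months, levels = 1:12, labels = month.abb)

print(month_f)
```

Using aes(x = factor(month, levels = 1:12, labels = month.abb)) in the plot call would apply the same labelling.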

Average monthly temperature across years

The following visualization shows how the average monthly temperature varies across the years from 1995 to the present, with one panel per year:

ggplot(cincy_data, aes(x = factor(month), y = avg_temp)) +
  stat_summary(fun = "mean", geom = "bar", fill = "green", color = "black") +
  facet_wrap(~year)
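
Each panel above summarises one year's data by month. A minimal sketch of the same month-by-year means as a table, using base R's tapply() on a made-up data frame (toy and its values are assumptions), looks like this:

```r
# Toy data covering two years and two months
toy <- data.frame(
  year = c(1995, 1995, 1995, 1996, 1996),
  month = c(1, 1, 2, 1, 2),
  avg_temp = c(30, 40, 50, 20, 60)
)

# Rows are years, columns are months, cells are mean temperatures
monthly_by_year <- tapply(toy$avg_temp, list(toy$year, toy$month),
                          mean, na.rm = TRUE)
print(monthly_by_year)
```

Each row of the resulting matrix corresponds to one facet of the plot, and each column to one bar within it.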