Synopsis

This is my report for the week 3 assignment on scraping data, exploring it, and visualising it. The dataset being analysed contains the daily average temperatures for Cincinnati from 1 January 1995 to 19 October 2016.

Packages Required

The tidyverse package bundles several packages (including ggplot2) that can be used for data manipulation and visualisation.

library(tidyverse)

Source Code

The data fields are as follows: V1 = month, V2 = day, V3 = year, V4 = average daily temperature (°F). Missing temperature readings are coded as -99.
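
The default names V1 to V4 are hard to read in later code. A minimal sketch of assigning descriptive names at read time is shown below. It uses an inline three-row sample (values copied from the first rows of the dataset) instead of the URL, and the column names `month`, `day`, `year` and `temp_f` are illustrative choices, not part of the dataset.

```r
# Three-row sample in the same whitespace-separated layout as the source file.
sample_text <- "
 1  1  1995  41.1
 1  2  1995  22.2
 1  3  1995  22.8
"

# col.names replaces the default V1..V4 with descriptive names.
# These names are hypothetical and chosen only for readability.
weather <- read.table(
  text = sample_text,
  col.names = c("month", "day", "year", "temp_f")
)

names(weather)
```

The same `col.names` argument should work unchanged when `text = sample_text` is swapped for the file URL.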

Data Description

The number of rows and columns

cincinnati <- read.table("http://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt")
ncol(cincinnati)
## [1] 4
nrow(cincinnati)
## [1] 7963

Names of variables and their data types

cincinnati <- read.table("http://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt")
names(cincinnati)
## [1] "V1" "V2" "V3" "V4"
str(cincinnati)
## 'data.frame':    7963 obs. of  4 variables:
##  $ V1: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ V2: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ V3: int  1995 1995 1995 1995 1995 1995 1995 1995 1995 1995 ...
##  $ V4: num  41.1 22.2 22.8 14.9 9.5 23.8 31.1 26.9 31.3 31.5 ...
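
Since month, day and year are stored as three separate integer columns, it can help to combine them into a single Date column for sorting or time-series plots. The sketch below does this on a small toy data frame; the `date` column name is an assumption added for illustration.

```r
# Toy rows in the V1 (month), V2 (day), V3 (year), V4 (temperature) layout.
toy <- data.frame(
  V1 = c(1L, 1L, 10L),
  V2 = c(1L, 2L, 19L),
  V3 = c(1995L, 1995L, 2016L),
  V4 = c(41.1, 22.2, 75.3)
)

# Build a YYYY-MM-DD string for each row and parse it as a Date.
# `date` is a hypothetical column name, not part of the dataset.
toy$date <- as.Date(sprintf("%d-%02d-%02d", toy$V3, toy$V1, toy$V2))

str(toy$date)
```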

A sneak peek into the dataset

cincinnati <- read.table("http://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt")
head(cincinnati)
##   V1 V2   V3   V4
## 1  1  1 1995 41.1
## 2  1  2 1995 22.2
## 3  1  3 1995 22.8
## 4  1  4 1995 14.9
## 5  1  5 1995  9.5
## 6  1  6 1995 23.8
tail(cincinnati)
##      V1 V2   V3   V4
## 7958 10 14 2016 54.4
## 7959 10 15 2016 63.2
## 7960 10 16 2016 68.7
## 7961 10 17 2016 71.1
## 7962 10 18 2016 74.4
## 7963 10 19 2016 75.3
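
The first and last records above are 1 January 1995 and 19 October 2016. As a rough consistency check, the sketch below counts the calendar days in that range so the result can be compared with the row count of the dataset. It assumes nothing beyond those two dates.

```r
# First and last dates taken from the head() and tail() output.
first_day <- as.Date("1995-01-01")
last_day  <- as.Date("2016-10-19")

# One element per calendar day, both endpoints included.
calendar <- seq(first_day, last_day, by = "day")

length(calendar)
```

The length is 7963, the same as the number of rows, which suggests the file has one row per calendar day and marks missing readings with a placeholder value rather than dropping the day.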

Checking for missing values

cincinnati <- read.table("http://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt")
cincinnati[cincinnati==-99] <- NA
sum(is.na(cincinnati))
## [1] 14
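
`sum(is.na(...))` gives one total for the whole data frame. To see where the missing values sit, a per-column and per-year breakdown can be useful. The sketch below shows one way to do that on a hypothetical four-row data frame, recoding -99 only in the temperature column.

```r
# Hypothetical rows: two valid readings and two -99 placeholders.
toy <- data.frame(
  V1 = c(1L, 1L, 2L, 2L),
  V2 = c(1L, 2L, 1L, 2L),
  V3 = c(1995L, 1995L, 1996L, 1996L),
  V4 = c(41.1, -99, 30.2, -99)
)

# Recode the -99 marker in the temperature column only.
toy$V4[toy$V4 == -99] <- NA

# Missing values per column.
per_column <- colSums(is.na(toy))

# Missing temperature readings per year.
per_year <- table(toy$V3[is.na(toy$V4)])

per_column
per_year
```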

Basic Statistics

cincinnati <- read.table("http://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt")
summary(cincinnati)
##        V1               V2              V3             V4        
##  Min.   : 1.000   Min.   : 1.00   Min.   :1995   Min.   :-99.00  
##  1st Qu.: 4.000   1st Qu.: 8.00   1st Qu.:2000   1st Qu.: 40.10  
##  Median : 6.000   Median :16.00   Median :2005   Median : 57.00  
##  Mean   : 6.479   Mean   :15.72   Mean   :2005   Mean   : 54.46  
##  3rd Qu.: 9.000   3rd Qu.:23.00   3rd Qu.:2011   3rd Qu.: 70.70  
##  Max.   :12.000   Max.   :31.00   Max.   :2016   Max.   : 89.20
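
The minimum of -99.00 for V4 in the summary above is the missing-value marker, not a real temperature, because this chunk reads the file again without recoding it. The sketch below uses four hypothetical readings to show how much a single -99 can pull a mean down, and how recoding it to NA avoids that.

```r
# Hypothetical readings; -99 marks a missing day.
temps <- c(40, 50, 60, -99)

# Marker treated as a real temperature.
with_marker <- mean(temps)

# Marker recoded to NA and excluded from the mean.
temps[temps == -99] <- NA
without_marker <- mean(temps, na.rm = TRUE)

c(with_marker = with_marker, without_marker = without_marker)
```

Here the mean moves from 12.75 to 50 once the marker is excluded. In the full dataset the effect is much smaller, since only a few readings are missing.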

Data Visualization

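Visualization 1: This visualization shows that the highest yearly mean temperature was in 2016 and the lowest was in 1996. The 2016 data end on 19 October, so that year's mean leaves out most of the colder months and is not directly comparable with the full years.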

library(tidyverse)
cincinnati <- read.table("http://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt")
# Recode the -99 missing-value marker to NA so it is excluded from the means
cincinnati[cincinnati == -99] <- NA
# Mean temperature per year, ignoring missing days
cin2 <- with(cincinnati, aggregate(x = V4, by = list(V3), FUN = mean, na.rm = TRUE))
names(cin2)[names(cin2) == 'x'] <- 'Average.Temperature'
names(cin2)[names(cin2) == 'Group.1'] <- 'Year'
ggplot(cin2, aes(x = Year, y = Average.Temperature)) + geom_line(color = "blue") + geom_point()
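
Because the last year in the data is incomplete, it can help to keep the number of days next to each yearly mean. The sketch below is one possible way to do this with base R on a hypothetical five-row data frame; the `Days` column is an illustrative addition.

```r
# Hypothetical data: a year with one missing reading and a shorter year.
toy <- data.frame(
  V3 = c(1995L, 1995L, 1995L, 2016L, 2016L),
  V4 = c(30, -99, 50, 70, 80)
)
toy$V4[toy$V4 == -99] <- NA

# The formula interface drops rows with NA by default (na.action = na.omit).
yearly <- aggregate(V4 ~ V3, data = toy, FUN = mean)
names(yearly) <- c("Year", "Average.Temperature")

# Number of rows (days) recorded for each year, missing readings included.
yearly$Days <- as.vector(table(toy$V3)[as.character(yearly$Year)])

yearly
```

A year with noticeably fewer days than the others can then be dropped or flagged before comparing means.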

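Visualization 2: This visualization shows that, averaged over all months and years, the 30th day of the month has the highest mean temperature and the 5th day has the lowest. Day 31 occurs in only seven months of the year, so its mean is based on fewer observations than the other days.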

library(tidyverse)
cincinnati <- read.table("http://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt")
# Recode the -99 missing-value marker to NA so it is excluded from the means
cincinnati[cincinnati == -99] <- NA
# Mean temperature per day of the month, ignoring missing days
cin2 <- with(cincinnati, aggregate(x = V4, by = list(V2), FUN = mean, na.rm = TRUE))
names(cin2)[names(cin2) == 'x'] <- 'Average.Temperature'
names(cin2)[names(cin2) == 'Group.1'] <- 'Day'
ggplot(cin2, aes(x = Day, y = Average.Temperature)) + geom_line(color = "pink") + geom_point()
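
Days of the month are not equally represented: every month has a 1st, but not every month has a 29th, 30th or 31st. The sketch below counts how often each day of the month occurs between 1 January 1995 and 19 October 2016, assuming one record per calendar day in that range.

```r
# Every calendar day in the period covered by the dataset.
calendar <- seq(as.Date("1995-01-01"), as.Date("2016-10-19"), by = "day")

# Day of the month (1 to 31) for each date.
day_of_month <- as.integer(format(calendar, "%d"))

# How many times each day of the month occurs.
counts <- table(day_of_month)

counts[c("1", "29", "30", "31")]
```

Means for the last few days of the month are based on fewer readings and on a different mix of months, so small differences between days should be read with care.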

Visualization 3: This visualization shows that, averaged across all years, July has the highest mean temperature and January has the lowest.

library(tidyverse)
cincinnati <- read.table("http://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt")
# Recode the -99 missing-value marker to NA so it is excluded from the means
cincinnati[cincinnati == -99] <- NA
# Mean temperature per month, ignoring missing days
cin2 <- with(cincinnati, aggregate(x = V4, by = list(V1), FUN = mean, na.rm = TRUE))
names(cin2)[names(cin2) == 'x'] <- 'Average.Temperature'
names(cin2)[names(cin2) == 'Group.1'] <- 'Month'
ggplot(cin2, aes(x = Month, y = Average.Temperature)) + geom_line(color = "green") + geom_point()
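
The month axis in this plot shows the numbers 1 to 12. A small sketch of turning those numbers into ordered month labels is given below, using R's built-in `month.abb` constant. The data frame is a stand-in for the aggregated table, and `Month.Name` is a hypothetical column name.

```r
# Stand-in for the aggregated table: one row per month number.
cin_month <- data.frame(Month = 1:12)

# Map month numbers to abbreviations and fix the level order to the
# calendar order, so plots do not sort the labels alphabetically.
cin_month$Month.Name <- factor(month.abb[cin_month$Month], levels = month.abb)

levels(cin_month$Month.Name)
```

If a factor like this is used on the x axis in ggplot2, `geom_line()` needs `aes(group = 1)` to connect the points into a single line.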