This R Markdown file is part of the week 3 exercise. It uses R code chunks to load the data from a URL into a data frame. The columns are identified by checking their unique values. nrow gives the number of records in the whole data set, and summary gives the minimum, quartiles, median, mean, and maximum of each column. Three visualizations follow: a histogram of temperature, a bar plot of month versus temperature, and a plot of year against temperature.
library(ggplot2)
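The ggplot2 library is loaded for plotting; the plots in this file are drawn with the base R functions hist, barplot, and plot.
Running ??ggplot2 searches the help pages for more details about ggplot2.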
#Source Code
urlink <- "http://academic.udayton.edu/kissock/http/Weather/gsod95-current/OHCINCIN.txt"
datacincin <- read.table(urlink)
head(datacincin)
## V1 V2 V3 V4
## 1 1 1 1995 41.1
## 2 1 2 1995 22.2
## 3 1 3 1995 22.8
## 4 1 4 1995 14.9
## 5 1 5 1995 9.5
## 6 1 6 1995 23.8
colnames(datacincin)<-c("Month","Day","Year","Temperature")
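The column names can also be supplied while reading, which avoids the separate colnames step. The sketch below is only an illustration: it passes the six rows shown above through the `text` argument instead of the URL so that it runs offline, and `sample_lines` and `sample_df` are hypothetical names that are not part of the exercise.

```r
# Six rows copied from the head() output above; used in place of the URL
# so the example runs without network access.
sample_lines <- "
 1  1  1995  41.1
 1  2  1995  22.2
 1  3  1995  22.8
 1  4  1995  14.9
 1  5  1995   9.5
 1  6  1995  23.8
"

# col.names assigns the headers at read time.
sample_df <- read.table(text = sample_lines,
                        col.names = c("Month", "Day", "Year", "Temperature"))

# Show the column names and types.
str(sample_df)
```

With the real file, the same `col.names` argument could be passed alongside `urlink`.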
The columns are renamed, and the first few rows of the data set datacincin are displayed again. The unique values of each column are then listed.
head(datacincin)
## Month Day Year Temperature
## 1 1 1 1995 41.1
## 2 1 2 1995 22.2
## 3 1 3 1995 22.8
## 4 1 4 1995 14.9
## 5 1 5 1995 9.5
## 6 1 6 1995 23.8
unique(datacincin$Month)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12
unique(datacincin$Day)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31
unique(datacincin$Year)
## [1] 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008
## [15] 2009 2010 2011 2012 2013 2014 2015 2016
unique(datacincin$Temperature)
## [1] 41.1 22.2 22.8 14.9 9.5 23.8 31.1 26.9 31.3 31.5 44.4
## [12] 58.0 60.2 45.6 33.7 35.5 41.4 46.0 34.6 25.1 24.0 23.0
## [23] 21.4 20.7 22.1 26.5 22.9 23.2 20.4 35.0 37.2 30.8 26.4
## [34] 17.6 9.0 13.3 13.8 15.3 31.7 6.3 25.9 38.9 35.8 33.1
## [45] 37.6 44.7 42.3 36.5 34.2 50.6 37.5 52.8 54.1 43.6 33.0
## [56] 24.5 27.9 33.3 47.7 59.1 26.6 30.4 48.7 57.7 61.4 64.0
## [67] 60.3 58.5 49.4 55.0 58.4 50.4 51.6 42.1 45.2 49.1 40.2
## [78] 43.4 38.4 39.9 44.0 49.3 50.8 57.5 63.8 59.5 57.3 57.4
## [89] 46.9 48.3 50.0 61.2 66.5 69.6 65.4 52.5 51.9 45.8 49.6
## [100] 52.6 48.4 54.8 55.7 46.1 51.0 54.2 55.4 54.3 56.8 61.1
## [111] 66.2 67.1 58.2 70.4 66.9 61.8 66.7 66.4 58.1 62.4 64.2
## [122] 63.2 67.0 74.7 67.5 63.0 65.1 70.8 66.0 61.0 65.8 67.6
## [133] 69.0 69.7 68.1 70.5 73.7 75.6 75.9 76.1 69.9 63.1 62.7
## [144] 66.1 71.3 71.6 73.2 77.6 74.5 73.3 72.1 71.8 74.0 75.7
## [155] 72.6 71.9 73.9 64.5 74.3 72.9 72.3 68.9 75.0 76.5 79.7
## [166] 82.9 84.6 82.4 77.3 80.9 76.4 75.4 74.1 72.2 77.8 77.5
## [177] 81.7 80.3 82.7 79.3 79.1 83.3 81.6 73.5 75.5 76.6 76.8
## [188] 80.1 83.5 81.5 82.1 82.3 81.2 76.9 78.9 70.6 76.2 77.0
## [199] 74.9 75.3 69.5 73.6 74.6 76.0 72.0 73.8 71.4 65.3 58.3
## [210] 62.3 47.9 51.7 53.6 55.2 61.5 64.1 66.3 63.7 64.7 59.7
## [221] 61.3 55.9 62.8 64.9 59.3 48.0 44.1 60.7 55.3 41.7 59.6
## [232] 55.8 42.2 47.8 49.5 47.6 46.8 58.9 64.8 41.3 29.1 33.5
## [243] 47.4 35.1 26.3 23.7 35.2 33.2 39.1 43.5 38.6 30.5 39.4
## [254] 30.1 30.9 45.4 39.2 49.7 43.3 39.8 29.7 25.4 19.0 8.1
## [265] 18.0 25.7 29.8 43.9 33.6 35.7 21.3 25.6 19.3 21.8 21.0
## [276] 29.0 39.3 40.6 19.2 21.9 17.3 16.8 13.7 22.5 27.3 28.7
## [287] 36.4 52.0 56.0 13.0 25.2 32.6 21.5 23.1 36.8 24.7 14.0
## [298] 9.9 9.6 2.5 -2.2 5.4 18.7 32.2 43.7 41.0 44.6 26.8
## [309] 37.1 30.6 22.7 23.3 46.4 49.2 44.8 50.7 61.6 20.2 23.5
## [320] 33.4 21.1 13.9 14.3 56.4 56.9 42.6 32.0 27.0 27.6 49.8
## [331] 54.9 29.5 38.8 48.9 40.5 53.9 34.3 38.1 31.8 36.1 38.3
## [342] 62.2 40.8 60.0 66.8 65.5 65.9 44.3 54.6 44.9 52.2 56.7
## [353] 51.8 56.1 64.3 70.0 57.6 46.2 45.5 51.1 67.8 78.0 63.9
## [364] 60.6 67.2 67.4 63.5 65.6 70.2 76.7 72.4 68.6 75.2 80.8
## [375] 81.4 69.8 70.9 77.1 78.4 80.6 70.3 68.3 70.7 69.2 78.1
## [386] 79.9 80.2 71.5 73.0 78.7 79.5 79.2 79.0 75.8 73.4 60.9
## [397] 60.4 62.0 57.1 56.6 69.1 58.6 52.4 48.5 47.0 51.4 59.8
## [408] 63.6 62.1 53.3 65.2 40.0 32.8 39.7 43.1 29.4 37.4 31.6
## [419] 46.6 44.5 28.8 35.3 40.1 59.4 52.7 15.7 12.1 21.2 28.9
## [430] 38.0 59.0 60.8 22.3 28.6 3.2 5.1 9.3 5.6 7.8 31.4
## [441] 18.4 26.2 17.8 27.1 42.0 41.5 34.9 27.4 30.0 29.6 49.9
## [452] 54.7 40.3 42.7 36.3 52.3 47.5 43.8 46.5 39.5 53.2 51.2
## [463] 51.3 39.0 54.0 29.2 34.1 50.9 48.2 52.1 48.8 57.9 56.5
## [474] 58.7 53.8 53.1 54.4 67.9 57.8 64.6 63.4 68.7 72.7 80.0
## [485] 73.1 77.9 69.4 65.0 74.4 78.6 85.5 72.5 78.8 68.8 68.4
## [496] 62.6 62.5 71.0 71.2 71.1 69.3 67.3 63.3 74.8 59.9 57.2
## [507] 60.1 52.9 47.3 46.7 38.7 45.1 36.7 45.3 45.0 42.8 35.4
## [518] 25.8 57.0 41.9 24.2 31.2 36.2 31.9 42.9 42.5 39.6 34.7
## [529] 28.5 21.6 37.7 28.4 37.0 37.3 32.4 32.7 34.0 34.8 35.9
## [540] 40.7 40.4 42.4 43.0 47.1 34.5 19.8 18.1 55.1 48.6 35.6
## [551] 74.2 50.3 44.2 59.2 53.5 65.7 64.4 71.7 80.4 83.0 82.8
## [562] 81.8 77.4 76.3 72.8 77.2 70.1 54.5 53.4 56.3 50.1 53.7
## [573] 45.7 33.9 46.3 49.0 36.0 24.3 14.2 -99.0 13.4 24.4 11.5
## [584] 4.0 19.5 16.7 19.9 50.2 38.5 37.8 27.5 36.9 32.9 41.2
## [595] 30.3 41.6 55.6 79.8 78.2 67.7 80.5 82.5 78.5 77.7 84.0
## [606] 83.9 87.7 87.0 61.9 66.6 53.0 45.9 43.2 29.9 37.9 20.0
## [617] 12.8 28.2 24.1 19.6 40.9 30.7 25.5 7.1 26.7 14.7 8.2
## [628] 8.9 10.7 22.4 23.6 32.1 34.4 36.6 61.7 48.1 50.5 60.5
## [639] 62.9 68.0 68.2 75.1 58.8 20.5 24.8 26.1 12.4 13.2 22.6
## [650] 6.6 10.9 7.9 18.3 16.3 22.0 15.5 20.9 29.3 68.5 79.6
## [661] 80.7 47.2 55.5 25.3 17.1 19.1 18.8 15.0 23.9 56.2 78.3
## [672] 82.2 81.9 82.0 83.8 85.3 81.1 38.2 27.8 24.9 27.2 32.3
## [683] 18.9 16.1 17.9 15.8 5.3 33.8 27.7 14.5 23.4 51.5 30.2
## [694] 28.1 20.6 16.6 17.0 16.0 18.6 10.8 3.4 25.0 11.3 20.3
## [705] 8.0 8.3 11.9 11.2 11.8 32.5 28.0 26.0 83.2 81.0 79.4
## [716] 85.6 85.8 84.1 83.6 20.1 41.8 83.4 83.7 16.2 12.6 12.2
## [727] 10.0 9.1 84.2 86.2 84.8 85.9 84.3 87.3 86.4 21.7 15.2
## [738] 24.6 31.0 81.3 7.6 11.7 1.8 14.6 17.2 13.1 15.1 15.4
## [749] 16.9 19.4 83.1 86.1 85.0 17.4 13.5 7.4 85.1 82.6 86.5
## [760] 84.7 87.6 89.2 87.8 16.5 10.5 10.1 -1.6 3.9 3.6 2.0
## [771] 6.1 17.5 20.8 28.3 4.8 12.0 16.4 1.5 17.7 10.4 12.7
From the unique values we know that the first column ranges from 1 to 12, so it contains the month of the year. The second column ranges from 1 to 31, so it contains the day of the month. The third column is the year in which the temperature was recorded, from 1995 to 2016. The fourth column holds the temperature values.
The summary command summarizes the data set, giving the minimum, quartiles, median, mean, and maximum of each column. nrow gives the number of rows in the data.
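For a column with many distinct values, such as the temperature, printing every unique value is hard to read. A more compact check, sketched here on a small stand-in data frame (`sample_df` is a hypothetical name holding the six rows from the head output), is to look at the range and the number of distinct values per column.

```r
# Small stand-in data frame (rows from the head() output above).
sample_df <- data.frame(
  Month = c(1, 1, 1, 1, 1, 1),
  Day = 1:6,
  Year = rep(1995, 6),
  Temperature = c(41.1, 22.2, 22.8, 14.9, 9.5, 23.8)
)

# Minimum and maximum of every column in one call.
col_ranges <- sapply(sample_df, range)
rownames(col_ranges) <- c("min", "max")
col_ranges

# Number of distinct values per column.
n_distinct <- sapply(sample_df, function(x) length(unique(x)))
n_distinct
```

On the full data, a range of 1 to 12 or 1 to 31 identifies the month and day columns just as the unique values do.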
summary(datacincin)
## Month Day Year Temperature
## Min. : 1.000 Min. : 1.00 Min. :1995 Min. :-99.00
## 1st Qu.: 4.000 1st Qu.: 8.00 1st Qu.:2000 1st Qu.: 40.10
## Median : 6.000 Median :16.00 Median :2005 Median : 57.00
## Mean : 6.479 Mean :15.72 Mean :2005 Mean : 54.46
## 3rd Qu.: 9.000 3rd Qu.:23.00 3rd Qu.:2011 3rd Qu.: 70.70
## Max. :12.000 Max. :31.00 Max. :2016 Max. : 89.20
nrow(datacincin)
## [1] 7963
Next, the number of missing (NA) values in each column is counted.
sum(is.na(datacincin$Month))
## [1] 0
sum(is.na(datacincin$Day))
## [1] 0
sum(is.na(datacincin$Year))
## [1] 0
sum(is.na(datacincin$Temperature))
## [1] 0
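All four counts are zero, yet the summary reports a minimum temperature of -99, far below every other value. One plausible reading, which is an assumption here and not something stated in the exercise, is that -99 is a placeholder for a missing reading. The sketch below shows how such values could be counted and recoded on a small stand-in data frame; `sample_df` and `cleaned_df` are hypothetical names and the rows are made up for illustration.

```r
# Stand-in data; the -99 row is made up to mimic the suspicious minimum.
sample_df <- data.frame(
  Month = c(1, 1, 1, 1),
  Day = 1:4,
  Year = rep(1995, 4),
  Temperature = c(41.1, 22.2, -99, 14.9)
)

# NA count for every column at once.
colSums(is.na(sample_df))

# Count the suspected placeholder values.
sum(sample_df$Temperature == -99)

# Recode them as NA in a copy, leaving the original untouched.
cleaned_df <- sample_df
cleaned_df$Temperature[cleaned_df$Temperature == -99] <- NA

# Statistics then need na.rm = TRUE to skip the recoded values.
mean(cleaned_df$Temperature, na.rm = TRUE)
```

If the assumption holds, leaving -99 in place would pull the mean and the lower tail of the histogram downward.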
The following visualizations are plotted to understand the data better.
hist(datacincin$Temperature)
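The histogram above shows the distribution of the temperature values and their range.
The following graph is a bar plot built from a table of month against temperature. The table has one column per distinct temperature value, so each bar shows how many days recorded that temperature, with the counts for the twelve months stacked within the bar.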
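The default histogram can be made easier to read with explicit bins and axis labels. This is a minimal sketch on a handful of values taken from the unique temperature output; `temps` and `h` are hypothetical names, and the bin width of 10 is an arbitrary choice.

```r
# A few temperature values from the output above, used as stand-in data.
temps <- c(41.1, 22.2, 22.8, 14.9, 9.5, 23.8, 58.0, 60.2, 74.7, 82.9)

# Fixed-width bins from 0 to 90 in steps of 10, with a title and axis label.
h <- hist(temps,
          breaks = seq(0, 90, by = 10),
          main = "Distribution of temperature",
          xlab = "Temperature",
          col = "darkgreen")

# The returned object holds the bin edges and the count in each bin.
h$counts
```

The `breaks` vector must cover every value being plotted, so on the full data it would need adjusting to the actual range.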
counts <- table(datacincin$Month,datacincin$Temperature)
barplot(counts,col="darkgreen",border="red")
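Because table counts how often each temperature value occurs, the bar plot above shows frequencies and not temperature levels. If the goal is to compare months, one option is to plot the mean temperature per month. The sketch below uses tapply on a small stand-in data frame; `sample_df` and `monthly_mean` are hypothetical names, and the pairing of months with temperatures is made up for illustration.

```r
# Stand-in data: two made-up readings for each of three months.
sample_df <- data.frame(
  Month = c(1, 1, 2, 2, 7, 7),
  Temperature = c(41.1, 22.2, 31.7, 25.9, 79.7, 82.9)
)

# Mean temperature for each month; the result is named by month number.
monthly_mean <- tapply(sample_df$Temperature, sample_df$Month, mean)
monthly_mean

# One bar per month, with bar height equal to the mean temperature.
barplot(monthly_mean,
        col = "darkgreen",
        xlab = "Month",
        ylab = "Mean temperature")
```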
The following graph shows a plot of the year against the temperature. The year is plotted on the x axis and the temperature on the y axis.
plot(datacincin$Year,datacincin$Temperature,col="darkgreen")
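A box plot of temperature by month is another way to view the seasonal pattern, since it shows the median and spread for each month side by side. This is a minimal sketch using the formula interface of boxplot; `sample_df` and `b` are hypothetical names, and the pairing of months with temperatures is made up for illustration.

```r
# Stand-in data: three made-up readings for each of three months.
sample_df <- data.frame(
  Month = c(1, 1, 1, 7, 7, 7, 12, 12, 12),
  Temperature = c(41.1, 22.2, 22.8, 79.7, 82.9, 84.6, 35.0, 37.2, 30.8)
)

# One box per month; the formula reads "Temperature grouped by Month".
b <- boxplot(Temperature ~ Month,
             data = sample_df,
             col = "darkgreen",
             xlab = "Month",
             ylab = "Temperature")

# The returned object lists the groups and their five-number summaries.
b$names
```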