title: "Week 3 Final Project"
author: "Bryan Persaud"
date: "8/04/2019"
output: html_document
theUrl <- "https://raw.githubusercontent.com/bpersaud104/R/master/airquality.csv"
airquality <- read.table(file = theUrl, header = TRUE, sep = ",")
head(airquality)
## X Ozone Solar.R Wind Temp Month Day
## 1 1 41 190 7.4 67 5 1
## 2 2 36 118 8.0 72 5 2
## 3 3 12 149 12.6 74 5 3
## 4 4 18 313 11.5 62 5 4
## 5 5 NA NA 14.3 56 5 5
## 6 6 28 NA 14.9 66 5 6
summary(airquality)
## X Ozone Solar.R Wind
## Min. : 1 Min. : 1.00 Min. : 7.0 Min. : 1.700
## 1st Qu.: 39 1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400
## Median : 77 Median : 31.50 Median :205.0 Median : 9.700
## Mean : 77 Mean : 42.13 Mean :185.9 Mean : 9.958
## 3rd Qu.:115 3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500
## Max. :153 Max. :168.00 Max. :334.0 Max. :20.700
## NA's :37 NA's :7
## Temp Month Day
## Min. :56.00 Min. :5.000 Min. : 1.0
## 1st Qu.:72.00 1st Qu.:6.000 1st Qu.: 8.0
## Median :79.00 Median :7.000 Median :16.0
## Mean :77.88 Mean :6.993 Mean :15.8
## 3rd Qu.:85.00 3rd Qu.:8.000 3rd Qu.:23.0
## Max. :97.00 Max. :9.000 Max. :31.0
##
str(airquality)
## 'data.frame': 153 obs. of 7 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
## $ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
## $ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
## $ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
## $ Month : int 5 5 5 5 5 5 5 5 5 5 ...
## $ Day : int 1 2 3 4 5 6 7 8 9 10 ...
wind_mean <- mean(airquality$Wind)
wind_median <- median(airquality$Wind)
print(paste("The mean for wind is = ", round(wind_mean, 2), "and the median for wind is = ", wind_median))
## [1] "The mean for wind is = 9.96 and the median for wind is = 9.7"
temp_mean <- mean(airquality$Temp)
temp_median <- median(airquality$Temp)
print(paste("The mean for temp is = ", round(temp_mean, 2), "and the median for temp is = ", temp_median))
## [1] "The mean for temp is = 77.88 and the median for temp is = 79"
The summary() output shows that the data set records ozone, solar radiation, wind, and temperature for the months of May through September. Ozone and Solar.R contain missing values (37 and 7 NAs, respectively). The str() output shows that the data frame has 153 observations of 7 variables. Every variable is stored as an integer except Wind, which is numeric. Across all of the data, the mean wind is 9.96 and the median is 9.7; the mean temperature is 77.88 and the median is 79. For both variables the mean and median are close to each other: the mean is slightly above the median for wind and slightly below it for temperature.
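Because Ozone and Solar.R contain NAs, the mean() and median() calls used above for Wind and Temp would return NA if applied to those columns unchanged. The following is a minimal sketch of one way to handle that. It assumes R's built-in `datasets::airquality` copy of the data, which has the same measurements but no `X` column; the names `aq`, `na_counts`, `ozone_mean`, and `complete_rows` are illustrative.

```r
# Assumption: the built-in copy of the data set stands in for the CSV.
aq <- datasets::airquality

# Count the missing values in every column.
na_counts <- colSums(is.na(aq))
print(na_counts)

# mean() propagates NA unless the missing values are dropped explicitly.
ozone_mean_raw <- mean(aq$Ozone)             # NA, because Ozone has NAs
ozone_mean <- mean(aq$Ozone, na.rm = TRUE)   # mean of the observed values only
print(round(ozone_mean, 2))

# Alternatively, keep only the rows with no missing value in any column.
complete_rows <- aq[complete.cases(aq), ]
print(nrow(complete_rows))
```

Dropping NAs per column with `na.rm = TRUE` keeps more observations than filtering to complete rows, so the two approaches can give different summaries.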
# Keep only the May rows; subset() already returns a data frame.
may_airquality <- subset(airquality, Month == 5)
# Columns are renamed by position, so a plain character vector is enough.
names(may_airquality) <- c("X", "May.Ozone", "May.Solar.R", "May.Wind", "May.Temp", "Month", "Day")
summary(may_airquality)
## X May.Ozone May.Solar.R May.Wind
## Min. : 1.0 Min. : 1.00 Min. : 8.0 Min. : 5.70
## 1st Qu.: 8.5 1st Qu.: 11.00 1st Qu.: 72.0 1st Qu.: 8.90
## Median :16.0 Median : 18.00 Median :194.0 Median :11.50
## Mean :16.0 Mean : 23.62 Mean :181.3 Mean :11.62
## 3rd Qu.:23.5 3rd Qu.: 31.50 3rd Qu.:284.5 3rd Qu.:14.05
## Max. :31.0 Max. :115.00 Max. :334.0 Max. :20.10
## NA's :5 NA's :4
## May.Temp Month Day
## Min. :56.00 Min. :5 Min. : 1.0
## 1st Qu.:60.00 1st Qu.:5 1st Qu.: 8.5
## Median :66.00 Median :5 Median :16.0
## Mean :65.55 Mean :5 Mean :16.0
## 3rd Qu.:69.00 3rd Qu.:5 3rd Qu.:23.5
## Max. :81.00 Max. :5 Max. :31.0
##
str(may_airquality)
## 'data.frame': 31 obs. of 7 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ May.Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
## $ May.Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
## $ May.Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
## $ May.Temp : int 67 72 74 62 56 66 65 59 61 69 ...
## $ Month : int 5 5 5 5 5 5 5 5 5 5 ...
## $ Day : int 1 2 3 4 5 6 7 8 9 10 ...
head(may_airquality, 31)
## X May.Ozone May.Solar.R May.Wind May.Temp Month Day
## 1 1 41 190 7.4 67 5 1
## 2 2 36 118 8.0 72 5 2
## 3 3 12 149 12.6 74 5 3
## 4 4 18 313 11.5 62 5 4
## 5 5 NA NA 14.3 56 5 5
## 6 6 28 NA 14.9 66 5 6
## 7 7 23 299 8.6 65 5 7
## 8 8 19 99 13.8 59 5 8
## 9 9 8 19 20.1 61 5 9
## 10 10 NA 194 8.6 69 5 10
## 11 11 7 NA 6.9 74 5 11
## 12 12 16 256 9.7 69 5 12
## 13 13 11 290 9.2 66 5 13
## 14 14 14 274 10.9 68 5 14
## 15 15 18 65 13.2 58 5 15
## 16 16 14 334 11.5 64 5 16
## 17 17 34 307 12.0 66 5 17
## 18 18 6 78 18.4 57 5 18
## 19 19 30 322 11.5 68 5 19
## 20 20 11 44 9.7 62 5 20
## 21 21 1 8 9.7 59 5 21
## 22 22 11 320 16.6 73 5 22
## 23 23 4 25 9.7 61 5 23
## 24 24 32 92 12.0 61 5 24
## 25 25 NA 66 16.6 57 5 25
## 26 26 NA 266 14.9 58 5 26
## 27 27 NA NA 8.0 57 5 27
## 28 28 23 13 12.0 67 5 28
## 29 29 45 252 14.9 81 5 29
## 30 30 115 223 5.7 79 5 30
## 31 31 37 279 7.4 76 5 31
may.wind_mean <- mean(may_airquality$May.Wind)
may.wind_median <- median(may_airquality$May.Wind)
print(paste("The mean for the wind for the month of May is = ", round(may.wind_mean, 2), "and the median for the wind for the month of May is = ", may.wind_median))
## [1] "The mean for the wind for the month of May is = 11.62 and the median for the wind for the month of May is = 11.5"
may.temp_mean <- mean(may_airquality$May.Temp)
may.temp_median <- median(may_airquality$May.Temp)
print(paste("The mean for the temperature for the month of May is = ", round(may.temp_mean, 2), "and the median for the temperature for the month of May is = ", may.temp_median))
## [1] "The mean for the temperature for the month of May is = 65.55 and the median for the temperature for the month of May is = 66"
# Keep only the August rows; subset() already returns a data frame.
august_airquality <- subset(airquality, Month == 8)
# Columns are renamed by position, so a plain character vector is enough.
names(august_airquality) <- c("X", "August.Ozone", "August.Solar.R", "August.Wind", "August.Temp", "Month", "Day")
summary(august_airquality)
## X August.Ozone August.Solar.R August.Wind
## Min. : 93.0 Min. : 9.00 Min. : 24.0 Min. : 2.300
## 1st Qu.:100.5 1st Qu.: 28.75 1st Qu.:107.0 1st Qu.: 6.600
## Median :108.0 Median : 52.00 Median :197.5 Median : 8.600
## Mean :108.0 Mean : 59.96 Mean :171.9 Mean : 8.794
## 3rd Qu.:115.5 3rd Qu.: 82.50 3rd Qu.:231.0 3rd Qu.:11.200
## Max. :123.0 Max. :168.00 Max. :273.0 Max. :15.500
## NA's :5 NA's :3
## August.Temp Month Day
## Min. :72.00 Min. :8 Min. : 1.0
## 1st Qu.:79.00 1st Qu.:8 1st Qu.: 8.5
## Median :82.00 Median :8 Median :16.0
## Mean :83.97 Mean :8 Mean :16.0
## 3rd Qu.:88.50 3rd Qu.:8 3rd Qu.:23.5
## Max. :97.00 Max. :8 Max. :31.0
##
str(august_airquality)
## 'data.frame': 31 obs. of 7 variables:
## $ X : int 93 94 95 96 97 98 99 100 101 102 ...
## $ August.Ozone : int 39 9 16 78 35 66 122 89 110 NA ...
## $ August.Solar.R: int 83 24 77 NA NA NA 255 229 207 222 ...
## $ August.Wind : num 6.9 13.8 7.4 6.9 7.4 4.6 4 10.3 8 8.6 ...
## $ August.Temp : int 81 81 82 86 85 87 89 90 90 92 ...
## $ Month : int 8 8 8 8 8 8 8 8 8 8 ...
## $ Day : int 1 2 3 4 5 6 7 8 9 10 ...
head(august_airquality, 31)
## X August.Ozone August.Solar.R August.Wind August.Temp Month Day
## 93 93 39 83 6.9 81 8 1
## 94 94 9 24 13.8 81 8 2
## 95 95 16 77 7.4 82 8 3
## 96 96 78 NA 6.9 86 8 4
## 97 97 35 NA 7.4 85 8 5
## 98 98 66 NA 4.6 87 8 6
## 99 99 122 255 4.0 89 8 7
## 100 100 89 229 10.3 90 8 8
## 101 101 110 207 8.0 90 8 9
## 102 102 NA 222 8.6 92 8 10
## 103 103 NA 137 11.5 86 8 11
## 104 104 44 192 11.5 86 8 12
## 105 105 28 273 11.5 82 8 13
## 106 106 65 157 9.7 80 8 14
## 107 107 NA 64 11.5 79 8 15
## 108 108 22 71 10.3 77 8 16
## 109 109 59 51 6.3 79 8 17
## 110 110 23 115 7.4 76 8 18
## 111 111 31 244 10.9 78 8 19
## 112 112 44 190 10.3 78 8 20
## 113 113 21 259 15.5 77 8 21
## 114 114 9 36 14.3 72 8 22
## 115 115 NA 255 12.6 75 8 23
## 116 116 45 212 9.7 79 8 24
## 117 117 168 238 3.4 81 8 25
## 118 118 73 215 8.0 86 8 26
## 119 119 NA 153 5.7 88 8 27
## 120 120 76 203 9.7 97 8 28
## 121 121 118 225 2.3 94 8 29
## 122 122 84 237 6.3 96 8 30
## 123 123 85 188 6.3 94 8 31
august.wind_mean <- mean(august_airquality$August.Wind)
august.wind_median <- median(august_airquality$August.Wind)
print(paste("The mean for the wind for the month of August is ", round(august.wind_mean, 2), "and the median for the wind for the month of August is ", august.wind_median))
## [1] "The mean for the wind for the month of August is 8.79 and the median for the wind for the month of August is 8.6"
august.temp_mean <- mean(august_airquality$August.Temp)
august.temp_median <- median(august_airquality$August.Temp)
print(paste("The mean for the temperature for the month of August is ", round(august.temp_mean, 2), "and the median for the temperature for the month of August is ", august.temp_median))
## [1] "The mean for the temperature for the month of August is 83.97 and the median for the temperature for the month of August is 82"
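The May and August figures above were computed from two separate subsets. As a hedged alternative, the same statistics can be produced for every month in one pass with tapply(). This sketch assumes the built-in `datasets::airquality` copy of the data; the helper `monthly_summary` is a hypothetical name introduced for illustration, not part of the original analysis.

```r
# Assumption: the built-in copy of the data set stands in for the CSV.
aq <- datasets::airquality

# Hypothetical helper: mean and median of one column for each month.
monthly_summary <- function(df, column) {
  data.frame(
    Month  = sort(unique(df$Month)),
    Mean   = as.vector(tapply(df[[column]], df$Month, mean, na.rm = TRUE)),
    Median = as.vector(tapply(df[[column]], df$Month, median, na.rm = TRUE))
  )
}

wind_by_month <- monthly_summary(aq, "Wind")
temp_by_month <- monthly_summary(aq, "Temp")

print(wind_by_month)
print(temp_by_month)
```

The rows for months 5 and 8 should match the May and August values printed above, and the remaining rows cover June, July, and September without further subsetting.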
# Using the ggplot2 package
require(ggplot2)
## Loading required package: ggplot2
qplot(Month, Wind, data = airquality)
qplot(Month, Temp, data = airquality)
boxplot(airquality$Wind)
boxplot(airquality$Temp)
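The two boxplots above pool all five months together. Since the analysis compares months, a per-month version may be easier to read. The sketch below uses the formula interface of base R's boxplot() and assumes the built-in `datasets::airquality` copy of the data; the variable names and axis labels are illustrative.

```r
# Assumption: the built-in copy of the data set stands in for the CSV.
aq <- datasets::airquality

# One box per month; boxplot() invisibly returns the statistics it drew.
wind_boxes <- boxplot(Wind ~ Month, data = aq,
                      main = "Wind by month", xlab = "Month", ylab = "Wind")
temp_boxes <- boxplot(Temp ~ Month, data = aq,
                      main = "Temperature by month", xlab = "Month", ylab = "Temp")

# Row 3 of $stats holds the median of each box; label it with the month.
wind_medians <- setNames(wind_boxes$stats[3, ], wind_boxes$names)
temp_medians <- setNames(temp_boxes$stats[3, ], temp_boxes$names)
print(wind_medians)
print(temp_medians)
```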
ggplot(data = airquality) + geom_density(aes(x = Wind), fill = "grey50")
ggplot(data = airquality) + geom_density(aes(x = Temp), fill = "grey50")
ggplot(airquality, aes(x = Wind, y = Temp)) + geom_line()
# Meaningful Question for analysis - Conclusion
Based on the findings above, the mean wind for the month of May is 11.62 and the median is 11.5, while the mean temperature is 65.55 and the median is 66. For the month of August, the mean wind is 8.79 and the median is 8.6, while the mean temperature is 83.97 and the median is 82. Comparing the two months, wind is higher in May than in August, and temperature is higher in August than in May. The graphs further show that wind is high in May and June, declines during July and August, and starts to rise slightly in September. Temperature is low in May, increases through June, July, and August, and then decreases in September. The line graph shows the relationship between wind and temperature: for the most part, as the wind goes up, the temperature goes down, so higher temperatures usually coincide with lower wind. This is consistent with the monthly comparison, in which May has higher wind and lower temperature than August.
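The inverse relationship between wind and temperature described in the conclusion can also be checked numerically. The sketch below is one way to do that using only base R, assuming the built-in `datasets::airquality` copy of the data; it is an illustrative addition, not part of the original analysis, and the names `wind_temp_cor`, `fit`, and `slope` are hypothetical.

```r
# Assumption: the built-in copy of the data set stands in for the CSV.
aq <- datasets::airquality

# Pearson correlation: a negative value means temperature tends to
# fall as wind rises.
wind_temp_cor <- cor(aq$Wind, aq$Temp)
print(wind_temp_cor)

# A simple linear fit gives the average change in Temp per unit of Wind.
fit <- lm(Temp ~ Wind, data = aq)
slope <- coef(fit)[["Wind"]]
print(slope)

# A scatter plot with the fitted line shows the same trend visually.
plot(aq$Wind, aq$Temp, xlab = "Wind", ylab = "Temp")
abline(fit)
```

A correlation describes association only; it does not by itself show that wind causes the temperature difference between months.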