title: "Week 3 Final Project"
author: "Bryan Persaud"
date: "8/04/2019"
output: html_document

This project uses the airquality data set, loaded from a CSV copy hosted on GitHub.

theUrl <- "https://raw.githubusercontent.com/bpersaud104/R/master/airquality.csv"
airquality <- read.table(file = theUrl, header = TRUE, sep = ",")
head(airquality)
##   X Ozone Solar.R Wind Temp Month Day
## 1 1    41     190  7.4   67     5   1
## 2 2    36     118  8.0   72     5   2
## 3 3    12     149 12.6   74     5   3
## 4 4    18     313 11.5   62     5   4
## 5 5    NA      NA 14.3   56     5   5
## 6 6    28      NA 14.9   66     5   6

Question: Find the mean and median of wind and temperature for the months of May and August, then compare the two months.

Data Exploration

summary(airquality)
##        X           Ozone           Solar.R           Wind       
##  Min.   :  1   Min.   :  1.00   Min.   :  7.0   Min.   : 1.700  
##  1st Qu.: 39   1st Qu.: 18.00   1st Qu.:115.8   1st Qu.: 7.400  
##  Median : 77   Median : 31.50   Median :205.0   Median : 9.700  
##  Mean   : 77   Mean   : 42.13   Mean   :185.9   Mean   : 9.958  
##  3rd Qu.:115   3rd Qu.: 63.25   3rd Qu.:258.8   3rd Qu.:11.500  
##  Max.   :153   Max.   :168.00   Max.   :334.0   Max.   :20.700  
##                NA's   :37       NA's   :7                       
##       Temp           Month            Day      
##  Min.   :56.00   Min.   :5.000   Min.   : 1.0  
##  1st Qu.:72.00   1st Qu.:6.000   1st Qu.: 8.0  
##  Median :79.00   Median :7.000   Median :16.0  
##  Mean   :77.88   Mean   :6.993   Mean   :15.8  
##  3rd Qu.:85.00   3rd Qu.:8.000   3rd Qu.:23.0  
##  Max.   :97.00   Max.   :9.000   Max.   :31.0  
## 
str(airquality)
## 'data.frame':    153 obs. of  7 variables:
##  $ X      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
wind_mean <- mean(airquality$Wind)
wind_median <- median(airquality$Wind)
print(paste("The mean for wind is = ", round(wind_mean, 2), "and the median for wind is = ", wind_median))
## [1] "The mean for wind is =  9.96 and the median for wind is =  9.7"
temp_mean <- mean(airquality$Temp)
temp_median <- median(airquality$Temp)
print(paste("The mean for temp is = ", round(temp_mean, 2), "and the median for temp is = ", temp_median))
## [1] "The mean for temp is =  77.88 and the median for temp is =  79"

Based on the summary function, the data set records ozone, solar radiation, wind, and temperature for the months of May through September. Ozone and solar radiation contain NA values (37 and 7 respectively), which means some observations are missing. The str function shows that the data frame consists of 153 observations of 7 variables. All of the variables are integers except wind, which is numeric. Across all of the data, the mean wind is 9.96 and the median is 9.7; the mean temperature is 77.88 and the median is 79. For both variables the mean and median are close to each other: the mean for wind is slightly higher than its median, while the mean for temperature is slightly lower than its median.
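The missing values noted above can also be counted directly rather than read off the summary output. The following is a minimal sketch, assuming that the built-in `datasets::airquality` data frame matches the CSV used in this report (the same measurements, without the extra `X` index column); the variable names `na_counts`, `ozone_mean_raw`, and `ozone_mean` are only illustrative.

```r
# Assumption: the built-in airquality data stands in for the CSV loaded above.
data(airquality)

# Count the missing values in each column
na_counts <- colSums(is.na(airquality))
print(na_counts)

# mean() returns NA when any value is missing ...
ozone_mean_raw <- mean(airquality$Ozone)

# ... unless the missing values are dropped explicitly with na.rm = TRUE
ozone_mean <- mean(airquality$Ozone, na.rm = TRUE)
print(round(ozone_mean, 2))
```

Wind and temperature have no missing values, which is why the calls to mean and median in this report work without `na.rm = TRUE`. The same calls on ozone or solar radiation would need it.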

Data Wrangling

may_airquality <- data.frame(subset(airquality, Month == 5))

names(may_airquality) <- c("X" = "X", "Ozone" = "May.Ozone", "Solar.R" = "May.Solar.R", "Wind" = "May.Wind", "Temp" = "May.Temp", "Month" = "Month", "Day" = "Day")

summary(may_airquality)
##        X          May.Ozone       May.Solar.R       May.Wind    
##  Min.   : 1.0   Min.   :  1.00   Min.   :  8.0   Min.   : 5.70  
##  1st Qu.: 8.5   1st Qu.: 11.00   1st Qu.: 72.0   1st Qu.: 8.90  
##  Median :16.0   Median : 18.00   Median :194.0   Median :11.50  
##  Mean   :16.0   Mean   : 23.62   Mean   :181.3   Mean   :11.62  
##  3rd Qu.:23.5   3rd Qu.: 31.50   3rd Qu.:284.5   3rd Qu.:14.05  
##  Max.   :31.0   Max.   :115.00   Max.   :334.0   Max.   :20.10  
##                 NA's   :5        NA's   :4                      
##     May.Temp         Month        Day      
##  Min.   :56.00   Min.   :5   Min.   : 1.0  
##  1st Qu.:60.00   1st Qu.:5   1st Qu.: 8.5  
##  Median :66.00   Median :5   Median :16.0  
##  Mean   :65.55   Mean   :5   Mean   :16.0  
##  3rd Qu.:69.00   3rd Qu.:5   3rd Qu.:23.5  
##  Max.   :81.00   Max.   :5   Max.   :31.0  
## 
str(may_airquality)
## 'data.frame':    31 obs. of  7 variables:
##  $ X          : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ May.Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ May.Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ May.Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ May.Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month      : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Day        : int  1 2 3 4 5 6 7 8 9 10 ...
head(may_airquality, 31)
##     X May.Ozone May.Solar.R May.Wind May.Temp Month Day
## 1   1        41         190      7.4       67     5   1
## 2   2        36         118      8.0       72     5   2
## 3   3        12         149     12.6       74     5   3
## 4   4        18         313     11.5       62     5   4
## 5   5        NA          NA     14.3       56     5   5
## 6   6        28          NA     14.9       66     5   6
## 7   7        23         299      8.6       65     5   7
## 8   8        19          99     13.8       59     5   8
## 9   9         8          19     20.1       61     5   9
## 10 10        NA         194      8.6       69     5  10
## 11 11         7          NA      6.9       74     5  11
## 12 12        16         256      9.7       69     5  12
## 13 13        11         290      9.2       66     5  13
## 14 14        14         274     10.9       68     5  14
## 15 15        18          65     13.2       58     5  15
## 16 16        14         334     11.5       64     5  16
## 17 17        34         307     12.0       66     5  17
## 18 18         6          78     18.4       57     5  18
## 19 19        30         322     11.5       68     5  19
## 20 20        11          44      9.7       62     5  20
## 21 21         1           8      9.7       59     5  21
## 22 22        11         320     16.6       73     5  22
## 23 23         4          25      9.7       61     5  23
## 24 24        32          92     12.0       61     5  24
## 25 25        NA          66     16.6       57     5  25
## 26 26        NA         266     14.9       58     5  26
## 27 27        NA          NA      8.0       57     5  27
## 28 28        23          13     12.0       67     5  28
## 29 29        45         252     14.9       81     5  29
## 30 30       115         223      5.7       79     5  30
## 31 31        37         279      7.4       76     5  31
may.wind_mean <- mean(may_airquality$May.Wind)
may.wind_median <- median(may_airquality$May.Wind)
print(paste("The mean for the wind for the month of May is = ", round(may.wind_mean, 2), "and the median for the wind for the month of May is = ", may.wind_median))
## [1] "The mean for the wind for the month of May is =  11.62 and the median for the wind for the month of May is =  11.5"
may.temp_mean <- mean(may_airquality$May.Temp)
may.temp_median <- median(may_airquality$May.Temp)
print(paste("The mean for the temperature for the month of May is = ", round(may.temp_mean, 2), "and the median for the temperature for the month of May is = ", may.temp_median))
## [1] "The mean for the temperature for the month of May is =  65.55 and the median for the temperature for the month of May is =  66"
august_airquality <- data.frame(subset(airquality, Month == 8))

names(august_airquality) <- c("X" = "X", "Ozone" = "August.Ozone", "Solar.R" = "August.Solar.R", "Wind" = "August.Wind", "Temp" = "August.Temp", "Month" = "Month", "Day" = "Day")

summary(august_airquality)
##        X          August.Ozone    August.Solar.R   August.Wind    
##  Min.   : 93.0   Min.   :  9.00   Min.   : 24.0   Min.   : 2.300  
##  1st Qu.:100.5   1st Qu.: 28.75   1st Qu.:107.0   1st Qu.: 6.600  
##  Median :108.0   Median : 52.00   Median :197.5   Median : 8.600  
##  Mean   :108.0   Mean   : 59.96   Mean   :171.9   Mean   : 8.794  
##  3rd Qu.:115.5   3rd Qu.: 82.50   3rd Qu.:231.0   3rd Qu.:11.200  
##  Max.   :123.0   Max.   :168.00   Max.   :273.0   Max.   :15.500  
##                  NA's   :5        NA's   :3                       
##   August.Temp        Month        Day      
##  Min.   :72.00   Min.   :8   Min.   : 1.0  
##  1st Qu.:79.00   1st Qu.:8   1st Qu.: 8.5  
##  Median :82.00   Median :8   Median :16.0  
##  Mean   :83.97   Mean   :8   Mean   :16.0  
##  3rd Qu.:88.50   3rd Qu.:8   3rd Qu.:23.5  
##  Max.   :97.00   Max.   :8   Max.   :31.0  
## 
str(august_airquality)
## 'data.frame':    31 obs. of  7 variables:
##  $ X             : int  93 94 95 96 97 98 99 100 101 102 ...
##  $ August.Ozone  : int  39 9 16 78 35 66 122 89 110 NA ...
##  $ August.Solar.R: int  83 24 77 NA NA NA 255 229 207 222 ...
##  $ August.Wind   : num  6.9 13.8 7.4 6.9 7.4 4.6 4 10.3 8 8.6 ...
##  $ August.Temp   : int  81 81 82 86 85 87 89 90 90 92 ...
##  $ Month         : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ Day           : int  1 2 3 4 5 6 7 8 9 10 ...
head(august_airquality, 31)
##       X August.Ozone August.Solar.R August.Wind August.Temp Month Day
## 93   93           39             83         6.9          81     8   1
## 94   94            9             24        13.8          81     8   2
## 95   95           16             77         7.4          82     8   3
## 96   96           78             NA         6.9          86     8   4
## 97   97           35             NA         7.4          85     8   5
## 98   98           66             NA         4.6          87     8   6
## 99   99          122            255         4.0          89     8   7
## 100 100           89            229        10.3          90     8   8
## 101 101          110            207         8.0          90     8   9
## 102 102           NA            222         8.6          92     8  10
## 103 103           NA            137        11.5          86     8  11
## 104 104           44            192        11.5          86     8  12
## 105 105           28            273        11.5          82     8  13
## 106 106           65            157         9.7          80     8  14
## 107 107           NA             64        11.5          79     8  15
## 108 108           22             71        10.3          77     8  16
## 109 109           59             51         6.3          79     8  17
## 110 110           23            115         7.4          76     8  18
## 111 111           31            244        10.9          78     8  19
## 112 112           44            190        10.3          78     8  20
## 113 113           21            259        15.5          77     8  21
## 114 114            9             36        14.3          72     8  22
## 115 115           NA            255        12.6          75     8  23
## 116 116           45            212         9.7          79     8  24
## 117 117          168            238         3.4          81     8  25
## 118 118           73            215         8.0          86     8  26
## 119 119           NA            153         5.7          88     8  27
## 120 120           76            203         9.7          97     8  28
## 121 121          118            225         2.3          94     8  29
## 122 122           84            237         6.3          96     8  30
## 123 123           85            188         6.3          94     8  31
august.wind_mean <- mean(august_airquality$August.Wind)
august.wind_median <- median(august_airquality$August.Wind)
print(paste("The mean for the wind for the month of August is ", round(august.wind_mean, 2), "and the median for the wind for the month of August is ", august.wind_median))
## [1] "The mean for the wind for the month of August is  8.79 and the median for the wind for the month of August is  8.6"
august.temp_mean <- mean(august_airquality$August.Temp)
august.temp_median <- median(august_airquality$August.Temp)
print(paste("The mean for the temperature for the month of August is ", round(august.temp_mean, 2), "and the median for the temperature for the month of August is ", august.temp_median))
## [1] "The mean for the temperature for the month of August is  83.97 and the median for the temperature for the month of August is  82"
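The per-month statistics above were computed by building one data frame per month. As a more compact alternative, the same four numbers for each month could be produced in a single pass. This is only a sketch, assuming the built-in `datasets::airquality` data frame matches the CSV used above; `summarise_month`, `may_aug`, and `stats` are hypothetical names introduced for illustration.

```r
# Assumption: the built-in airquality data stands in for the CSV loaded above.
data(airquality)

# Keep only May (5) and August (8)
may_aug <- subset(airquality, Month %in% c(5, 8))

# Hypothetical helper: mean and median of wind and temperature for one month
summarise_month <- function(df) {
  c(wind_mean   = mean(df$Wind),
    wind_median = median(df$Wind),
    temp_mean   = mean(df$Temp),
    temp_median = median(df$Temp))
}

# split() makes one data frame per month; sapply() applies the helper to each.
# t() puts the months in rows (named "5" and "8") and the statistics in columns.
stats <- t(sapply(split(may_aug, may_aug$Month), summarise_month))
print(round(stats, 2))
```

If the built-in data does match the CSV, the rows should agree with the values printed above. Adding another month to the comparison would then only require changing the filter.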

Graphics

# Using the ggplot2 package; library() stops with an error if the package is missing
library(ggplot2)
qplot(Month, Wind, data = airquality)

qplot(Month, Temp, data = airquality)

boxplot(airquality$Wind)

boxplot(airquality$Temp)

ggplot(data = airquality) + geom_density(aes(x = Wind), fill = "grey50")

ggplot(data = airquality) + geom_density(aes(x = Temp), fill = "grey50")

ggplot(airquality, aes(x = Wind, y = Temp)) + geom_line()
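The box plots above summarise wind and temperature across all five months at once. Since the question compares individual months, a box plot grouped by month may be easier to read. The following is a sketch using base R graphics, assuming the built-in `datasets::airquality` data frame matches the CSV used above; `aq`, `MonthName`, `wind_box`, and `temp_box` are illustrative names.

```r
# Assumption: the built-in airquality data stands in for the CSV loaded above.
data(airquality)
aq <- airquality

# Month is stored as an integer; a labelled factor gives one named box per month
aq$MonthName <- factor(aq$Month, levels = 5:9, labels = month.name[5:9])

# One box per month; boxplot() also returns the plotted statistics invisibly
wind_box <- boxplot(Wind ~ MonthName, data = aq,
                    xlab = "Month", ylab = "Wind (mph)")
temp_box <- boxplot(Temp ~ MonthName, data = aq,
                    xlab = "Month", ylab = "Temperature (degrees F)")
```

Each box shows the median and quartiles for one month, so May and August can be compared side by side.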

Meaningful Question for Analysis - Conclusion

Based on the findings above, for the month of May the mean wind is 11.62 and the median is 11.5, while the mean temperature is 65.55 and the median is 66. For the month of August the mean wind is 8.79 and the median is 8.6, while the mean temperature is 83.97 and the median is 82. Comparing the two months, wind is higher in May than in August, and temperature is higher in August than in May.

The graphs further show that wind is high in May and June, declines during July and August, and starts to rise slightly in September. Temperature is low in May, increases through June, July, and August, and then steadily decreases as it approaches September.

The line graph shows the relationship between wind and temperature: for the most part, as the wind goes up, the temperature goes down, so higher temperatures usually coincide with lower wind. This is consistent with the comparison of the two months, since May has higher wind and lower temperature than August.
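The inverse relationship between wind and temperature is read off the graph by eye, but it could also be checked with a correlation coefficient. This is a minimal sketch, assuming the built-in `datasets::airquality` data frame matches the CSV used in this report; `wind_temp_cor` is an illustrative name.

```r
# Assumption: the built-in airquality data stands in for the CSV loaded above.
data(airquality)

# Pearson correlation between daily wind and temperature.
# A negative value means higher wind tends to go with lower temperature.
wind_temp_cor <- cor(airquality$Wind, airquality$Temp)
print(round(wind_temp_cor, 2))
```

A coefficient near -1 would indicate a strong inverse relationship and one near 0 little linear relationship. A correlation does not by itself show that one variable causes the other.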