Data Description

New York Air Quality Measurements

Description

Daily air quality measurements in New York, May to September 1973.

  • Usage

airquality

  • Format

A data frame with 153 observations on 6 variables.

  1. Ozone numeric Ozone (ppb)
  2. Solar.R numeric Solar R (lang)
  3. Wind numeric Wind (mph)
  4. Temp numeric Temperature (degrees F)
  5. Month numeric Month (1–12)
  6. Day numeric Day of month (1–31)
  • Details

Daily readings of the following air quality values for May 1, 1973 (a Tuesday) to September 30, 1973.

Ozone: Mean ozone in parts per billion from 1300 to 1500 hours at Roosevelt Island

Solar.R: Solar radiation in Langleys in the frequency band 4000–7700 Angstroms from 0800 to 1200 hours at Central Park

Wind: Average wind speed in miles per hour at 0700 and 1000 hours at LaGuardia Airport

Temp: Maximum daily temperature in degrees Fahrenheit at LaGuardia Airport.

  • Source

The data were obtained from the New York State Department of Conservation (ozone data) and the National Weather Service (meteorological data).

  • References

Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. (1983) Graphical Methods for Data Analysis. Belmont, CA: Wadsworth.

  • Examples

    require(graphics)
    pairs(airquality, panel = panel.smooth, main = "airquality data")

Load Data

display-table

AirQuality Data : The table below shows the cleaned airquality dataset, ready for analysis.

Data Structure : Schema of the dataframe

'data.frame':   111 obs. of  7 variables:
 $ Ozone  : int  41 36 12 18 23 19 8 16 11 14 ...
 $ Solar.R: int  190 118 149 313 299 99 19 256 290 274 ...
 $ Wind   : num  7.4 8 12.6 11.5 8.6 13.8 20.1 9.7 9.2 10.9 ...
 $ Temp   : int  67 72 74 62 65 59 61 69 66 68 ...
 $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
 $ Day    : int  1 2 3 4 7 8 9 12 13 14 ...
 $ date   : Date, format: "1973-05-01" "1973-05-02" ...

Data-summary

Summary : Statistical summary of the airquality dataset

     Ozone          Solar.R           Wind            Temp      
 Min.   :  1.0   Min.   :  7.0   Min.   : 2.30   Min.   :57.00  
 1st Qu.: 18.0   1st Qu.:113.5   1st Qu.: 7.40   1st Qu.:71.00  
 Median : 31.0   Median :207.0   Median : 9.70   Median :79.00  
 Mean   : 42.1   Mean   :184.8   Mean   : 9.94   Mean   :77.79  
 3rd Qu.: 62.0   3rd Qu.:255.5   3rd Qu.:11.50   3rd Qu.:84.50  
 Max.   :168.0   Max.   :334.0   Max.   :20.70   Max.   :97.00  
     Month            Day             date           
 Min.   :5.000   Min.   : 1.00   Min.   :1973-05-01  
 1st Qu.:6.000   1st Qu.: 9.00   1st Qu.:1973-06-14  
 Median :7.000   Median :16.00   Median :1973-07-28  
 Mean   :7.216   Mean   :15.95   Mean   :1973-07-22  
 3rd Qu.:9.000   3rd Qu.:22.50   3rd Qu.:1973-09-01  
 Max.   :9.000   Max.   :31.00   Max.   :1973-09-30  

Pair plots: Showing the correlation of the variables in the air quality dataset

Data Distribution

histogram

Histograms

boxplots

Ozone

Wind

Data Trend

line-plots

Line Plots

heatmap

Heatmap

Scatter Plot

Final Analysis

Analysis Report

  • As the temperature increases, the wind speed decreases. The two are negatively correlated (r ≈ -0.48), with a 95% confidence interval of -0.6114 to -0.3220.

    Pearson's product-moment correlation

data:  airdata$Temp and airdata$Wind
t = -5.7069, df = 109, p-value = 1.003e-07
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.6113758 -0.3220388
sample estimates:
       cor 
-0.4796409 
  • As the temperature increases, solar radiation also increases. This is a weak positive correlation (r ≈ 0.26).

    Pearson's product-moment correlation

data:  airdata$Temp and airdata$Solar.R
t = 2.8392, df = 109, p-value = 0.005396
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.07993696 0.42788364
sample estimates:
      cor 
0.2624199 
  • Ozone concentration is strongly positively correlated with temperature (r ≈ 0.78).

    Pearson's product-moment correlation

data:  airdata$Ozone and airdata$Temp
t = 12.844, df = 109, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.6892682 0.8407585
sample estimates:
      cor 
0.7759688 

---
title: "airQuality"
output: 
    flexdashboard::flex_dashboard:
        orientation: columns
        vertical_layout: fill
        self_contained: true
        social : ["twitter","facebook","menu"]
        source_code : embed
---

```{r load-library}
library(psych)
library(ggplot2)
library(tidyr)
library(cowplot)
library(reshape2)

```


```{r load-dataset}

airdata <- datasets::airquality
# Remove rows with missing values and add the date each observation was collected.
remove_na <- function(df, n = 0) {
  # Data were collected daily from May 1, 1973 to Sep 30, 1973.
  df_date <- seq.Date(as.Date('1973-05-01'), as.Date('1973-09-30'), by = 'day')
  df['date'] <- df_date
  # Keep only rows with at most n missing values.
  df <- df[rowSums(is.na(df)) <= n, ]
  return(df)
}

airdata <- remove_na(airdata)

```

# Data Description
## Side Bar {.sidebar data-width=150}

## Data Description
###

#### New York Air Quality Measurements
##### Description

Daily air quality measurements in New York, May to September 1973.

- Usage

airquality

- Format

A data frame with 153 observations on 6 variables.

1. Ozone numeric Ozone (ppb)
1. Solar.R numeric Solar R (lang)
1. Wind numeric Wind (mph)
1. Temp numeric Temperature (degrees F)
1. Month numeric Month (1--12)
1. Day numeric Day of month (1--31)

- Details

Daily readings of the following air quality values for May 1, 1973 (a Tuesday) to September 30, 1973.

Ozone: Mean ozone in parts per billion from 1300 to 1500 hours at Roosevelt Island

Solar.R: Solar radiation in Langleys in the frequency band 4000–7700 Angstroms from 0800 to 1200 hours at Central Park

Wind: Average wind speed in miles per hour at 0700 and 1000 hours at LaGuardia Airport

Temp: Maximum daily temperature in degrees Fahrenheit at LaGuardia Airport.

- Source

The data were obtained from the New York State Department of Conservation (ozone data) and the National Weather Service (meteorological data).

- References

Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. (1983) Graphical Methods for Data Analysis. Belmont, CA: Wadsworth.

- Examples

```r
require(graphics)
pairs(airquality, panel = panel.smooth, main = "airquality data")
```

# Load Data
## display-table
### AirQuality Data : The table below shows the cleaned airquality dataset, ready for analysis.
```{r}
DT::datatable(airdata)
```

### Data Structure : Schema of the dataframe
```{r}
str(airdata)
```
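
The `remove_na()` helper keeps rows with at most `n` missing values, so with the default `n = 0` it should return the complete-case subset. The following is a minimal sketch of that equivalence, assuming base R's `complete.cases()` as the comparison; `aq` and `kept` are illustrative names that do not appear elsewhere in this dashboard.

```r
aq <- datasets::airquality

# Same filter as remove_na(df, n = 0): keep rows with no missing values.
kept <- aq[rowSums(is.na(aq)) <= 0, ]

# Base R's complete.cases() selects the same rows.
same_rows <- identical(kept, aq[complete.cases(aq), ])

nrow(aq)     # rows in the raw dataset
nrow(kept)   # rows left after dropping incomplete observations
same_rows
```

Raising `n` would keep rows with some missing measurements, which most of the later plots and tests would then have to handle explicitly.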

## Data-summary
### Summary : Statistical summary of the airquality dataset
```{r}
summary(airdata)
```

### Pair plots: Showing the correlation of the variables in the air quality dataset
```{r}
pairs.panels(airdata, gap=0)
```

# Data Distribution
## histogram
### Histograms
```{r histogram}
#ggplot(gather(airdata[,c(1,2,3,4)]), aes(value)) + geom_histogram(bins = 20,color='blue') + facet_wrap(~key, scales = 'free_x') + theme_classic()

# Bin widths are chosen per variable: a width of 10 would collapse Wind (about 2 to 21 mph) into two or three bars.
g <- ggplot(airdata, aes(x=Ozone)) + geom_histogram(binwidth = 10, color='black', fill='white') + theme_classic()
h <- ggplot(airdata, aes(x=Solar.R)) + geom_histogram(binwidth = 10, color='black', fill='white') + theme_classic()
i <- ggplot(airdata, aes(x=Wind)) + geom_histogram(binwidth = 1, color='black', fill='white') + theme_classic()
j <- ggplot(airdata, aes(x=Temp)) + geom_histogram(binwidth = 2, color='black', fill='white') + theme_classic()

plot_grid(g,h,i,j, nrow = 2, ncol = 2, labels = "AUTO")
```


## boxplots
### Ozone
```{r ozone}
ggplot(airdata, aes(group=Month,x=Month,Ozone)) + geom_boxplot() + theme_classic()
```


### Wind
```{r wind}
ggplot(airdata, aes(group=Month,x=Month,round(Wind))) + geom_boxplot() + theme_classic()
```


# Data Trend
## line-plots
### Line Plots

```{r scale-features}
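# Map each variable to (0, 1) through the normal CDF of its z-score so that
# all four series can share one axis.
# Note: this overwrites the raw columns, so every later chunk (line plots,
# heatmap, scatter plot, cor.test) works on the scaled values.
airdata$Ozone <- pnorm(airdata$Ozone, mean = mean(airdata$Ozone), sd = sd(airdata$Ozone))
airdata$Solar.R <- pnorm(airdata$Solar.R, mean = mean(airdata$Solar.R), sd = sd(airdata$Solar.R))
airdata$Wind <- pnorm(airdata$Wind, mean = mean(airdata$Wind), sd = sd(airdata$Wind))
airdata$Temp <- pnorm(airdata$Temp, mean = mean(airdata$Temp), sd = sd(airdata$Temp))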

```
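
The scaling in the `scale-features` chunk passes each value through a normal CDF fitted to that column's own mean and standard deviation. This is equivalent to taking the standard normal CDF of the z-score. A minimal sketch on a few illustrative temperatures (the vector `x` is made up for the example, not taken from the dataset):

```r
x <- c(57, 72, 79, 97)                  # illustrative values only

z <- (x - mean(x)) / sd(x)              # z-scores
scaled <- pnorm(x, mean = mean(x), sd = sd(x))

# The two formulations agree up to floating-point tolerance.
agree <- isTRUE(all.equal(scaled, pnorm(z)))

# Scaled values lie strictly between 0 and 1 and preserve the ordering of x.
in_unit_interval <- all(scaled > 0 & scaled < 1)
order_preserved <- identical(order(scaled), order(x))
```

Because the transform is monotone but not linear, rank-based summaries are unchanged while Pearson correlations computed on the scaled columns can differ from those on the raw measurements.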


```{r line-plots}

g <- ggplot(airdata, aes(x=date,group=Month))  + geom_line(aes(y=Ozone)) + theme_classic() + geom_point(aes(y=Ozone))
h <- ggplot(airdata, aes(x=date,group=Month))  + geom_line(aes(y=Wind)) + theme_classic() + geom_point(aes(y=Wind))
i <- ggplot(airdata, aes(x=date,group=Month))  + geom_line(aes(y=Solar.R)) + theme_classic() + geom_point(aes(y=Solar.R))
j <- ggplot(airdata, aes(x=date,group=Month))  + geom_line(aes(y=Temp)) + theme_classic() + geom_point(aes(y=Temp))

plot_grid(g,h,i,j, nrow = 2, ncol = 2, labels = 'AUTO')


```

###

```{r}

# Overlay the four scaled series on one axis; the colour legend names each series.
g <- ggplot(airdata, aes(x=date, group=Month)) +
  geom_line(aes(y=Ozone, color='Ozone')) +
  geom_line(aes(y=Wind, color='Wind')) +
  geom_line(aes(y=Solar.R, color='Solar')) +
  geom_line(aes(y=Temp, color='Temp')) +
  ylab('Values') + theme_classic()
g

```



## heatmap
### Heatmap
```{r}


#ggplot(airdata, aes(x= date, y =Wind)) + geom_tile(aes(fill = Wind)) + scale_fill_gradient(low = "#132B43", high = "#56B1F7")


temp <- melt(airdata, id=c("Month","Day","date"))

#temp$scale_value <- pnorm(temp$value,mean = mean(temp$value), sd = sd(temp$value))

ggplot(temp, aes(x=Day, y=variable, fill=value)) +
  geom_tile() +
  facet_grid(Month ~ ., labeller = label_both) +
  ggtitle('Daily air quality measurements in NY, May to Sep 1973') +
  scale_fill_gradientn(colours = rainbow(4), name = "Intensity") +
  theme_linedraw()
  

#  + geom_text(aes(label= round(value,1)))
# scale_fill_gradientn(colours = rainbow(4), name = "Intensity") , + scale_fill_gradient(low='white', high='red')

```


### Scatter Plot
```{r}
# Ozone is the plotted y variable, so map it at the plot level to get the correct axis label.
g <- ggplot(airdata, aes(x=Wind, y=Ozone)) + geom_point(aes(color=Solar.R, size=Ozone)) + ggtitle('Daily air quality measurements in NY, May to Sep 1973')
g
```

# Final Analysis
##
### Analysis Report

- As the temperature increases, the wind speed decreases. The two are negatively correlated (r ≈ -0.48), with a 95% confidence interval of -0.6114 to -0.3220.
```{r}
cor.test(airdata$Temp, airdata$Wind)
```

- As the temperature increases, solar radiation also increases. This is a weak positive correlation (r ≈ 0.26).
```{r}
cor.test(airdata$Temp, airdata$Solar.R)
```

- Ozone concentration is strongly positively correlated with temperature (r ≈ 0.78).
```{r}
cor.test(airdata$Ozone, airdata$Temp)
```
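
Because the `scale-features` chunk overwrites the measurement columns in `airdata`, the three tests above run on the normal-CDF-scaled values rather than on the original units. The sketch below repeats the first test on unscaled data. It assumes the raw data is reloaded from `datasets::airquality`; `raw` and `res` are hypothetical names that do not appear elsewhere in this dashboard.

```r
# Reload the raw measurements and drop incomplete rows,
# matching the rows kept by remove_na() with n = 0.
raw <- na.omit(datasets::airquality)

# Pearson correlation between temperature and wind speed in original units.
res <- cor.test(raw$Temp, raw$Wind)

res$estimate   # sample correlation
res$conf.int   # 95 percent confidence interval
```

Keeping the scaled values in a separate data frame would avoid the ambiguity, at the cost of touching every plotting chunk that currently reads from `airdata`.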

##
###