Introduction
For this assignment, I use an air quality dataset with the following fields:
1. Ozone
2. Solar.R
3. Wind
4. Temp
5. Month
6. Day
This dataset is publicly available at:
"https://d396qusza40orc.cloudfront.net/rprog%2Fdata%2Fquiz1_data.zip"
Data description
Daily readings of the following air quality values for May 1, 1973 (a Tuesday) to September 30, 1973.
Ozone: Mean ozone in parts per billion from 1300 to 1500 hours at Roosevelt Island
Solar.R: Solar radiation in Langleys in the frequency band 4000-7700 Angstroms from 0800 to
1200 hours at Central Park
Wind: Average wind speed in miles per hour at 0700 and 1000 hours at LaGuardia Airport
Temp: Maximum daily temperature in degrees Fahrenheit at La Guardia Airport.
Main Source of data:
The data were obtained from the New York State Department of Conservation (ozone data) and
the National Weather Service (meteorological data).
References:
Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. (1983) Graphical Methods for
Data Analysis. Belmont, CA: Wadsworth.
To ensure data quality, every row with a missing observation was removed from the original
dataset, so the final dataset (111 rows) contains only complete observations for each variable.
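The filtering step described above can be sketched in R as follows. This is a minimal sketch, assuming the data match R's built-in `airquality` data frame from the `datasets` package, whose description reads the same as the one given here. The names `raw` and `aq` are introduced for illustration only.

```r
# Assumption: the built-in `airquality` data frame stands in for the
# downloaded file. `raw` and `aq` are illustrative names.
raw <- datasets::airquality

# Count missing values per column before filtering.
print(colSums(is.na(raw)))

# Keep only the rows that have no missing value in any column.
aq <- na.omit(raw)

# Confirm that no missing values remain and report the row count.
stopifnot(!anyNA(aq))
print(nrow(aq))
```

`na.omit()` drops a row if any of its columns is missing, which matches the "complete observations only" rule stated above.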
First Few Observations of the Dataset
##   Ozone Solar.R Wind Temp Month Day
## 1    41     190  7.4   67     5   1
## 2    36     118  8.0   72     5   2
## 3    12     149 12.6   74     5   3
## 4    18     313 11.5   62     5   4
## 7    23     299  8.6   65     5   7
## 8    19      99 13.8   59     5   8
Summarizing the Dataset
##      Ozone          Solar.R           Wind            Temp       Month
##  Min.   :  1.0   Min.   :  7.0   Min.   : 2.30   Min.   :57.00   5:24
##  1st Qu.: 18.0   1st Qu.:113.5   1st Qu.: 7.40   1st Qu.:71.00   6: 9
##  Median : 31.0   Median :207.0   Median : 9.70   Median :79.00   7:26
##  Mean   : 42.1   Mean   :184.8   Mean   : 9.94   Mean   :77.79   8:23
##  3rd Qu.: 62.0   3rd Qu.:255.5   3rd Qu.:11.50   3rd Qu.:84.50   9:29
##  Max.   :168.0   Max.   :334.0   Max.   :20.70   Max.   :97.00
##
##       Day
##  7      : 5
##  9      : 5
##  13     : 5
##  16     : 5
##  17     : 5
##  18     : 5
##  (Other):81
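The summary above reports counts for Month and Day rather than quartiles, which suggests both columns were treated as factors. A minimal sketch that produces output of this shape, again assuming R's built-in `airquality` data frame stands in for the downloaded file (`aq` is an illustrative name):

```r
# Assumption: built-in `airquality` stands in for the downloaded file.
aq <- na.omit(datasets::airquality)

# Treat Month and Day as categorical so that summary() reports counts.
aq$Month <- factor(aq$Month)
aq$Day <- factor(aq$Day)

# First few observations, then a per-column summary.
print(head(aq))
print(summary(aq))
```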
Univariate Histograms of the Variables
Read together with the summary statistics above, the histogram plots below show that Ozone is
right-skewed: its mean (42.1) lies well above its median (31.0), and the upper tail reaches 168.
Temperature is roughly symmetric, with its mean (77.79) close to its median (79.00).
Wind speed appears double-peaked (bimodal), and solar radiation is left-skewed, with its mean
(184.8) below its median (207.0).
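The shape of each distribution can be checked numerically as well as visually. The sketch below draws the four histograms and computes a moment-based sample skewness, where a positive value indicates a longer right tail and a negative value a longer left tail. It assumes R's built-in `airquality` data frame stands in for the downloaded file. The helper `skewness()` is defined here for illustration and is not a base R function.

```r
# Assumption: built-in `airquality` stands in for the downloaded file.
aq <- na.omit(datasets::airquality)
vars <- c("Ozone", "Solar.R", "Wind", "Temp")

# One histogram per variable in a 2 x 2 grid.
old_par <- par(mfrow = c(2, 2))
for (v in vars) {
  hist(aq[[v]], main = v, xlab = v)
}
par(old_par)

# Illustrative helper: third central moment divided by the
# 1.5th power of the second central moment.
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skew <- sapply(aq[vars], skewness)
print(round(skew, 2))
```

Comparing the mean with the median, as in the summary table, is a quicker check that usually agrees in sign with this statistic.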

Univariate Histograms of the Variables with Relative Frequencies (Density Plots)
Below are the histograms of each variable drawn on a relative-frequency (density) scale.
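A density-scaled histogram can be sketched as below, assuming R's built-in `airquality` data frame stands in for the downloaded file. The overlaid kernel density curve is an addition for illustration and may not appear in the original figures.

```r
# Assumption: built-in `airquality` stands in for the downloaded file.
aq <- na.omit(datasets::airquality)

old_par <- par(mfrow = c(2, 2))
for (v in c("Ozone", "Solar.R", "Wind", "Temp")) {
  x <- aq[[v]]
  # freq = FALSE rescales the bars so that their total area equals 1.
  hist(x, freq = FALSE, main = v, xlab = v)
  # Overlay a kernel density estimate for comparison (illustrative addition).
  lines(density(x), lwd = 2)
}
par(old_par)
```

Because the bar areas sum to 1 on this scale, variables with very different ranges can be compared by shape alone.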

Univariate Bar Plots of the Variables
The graphs below are bar plots showing the distribution (range) of each variable within each month.
June appears to show less variation in all the variables than the other months, while September
appears to show the most. Note that June contributes only 9 complete observations, so its narrower
spread may partly reflect the smaller sample.
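The month-to-month variation can also be tabulated directly, which gives a way to check the visual impression from the plots. A minimal sketch, assuming R's built-in `airquality` data frame stands in for the downloaded file and using the standard deviation as the measure of spread (the original report does not say which measure it used):

```r
# Assumption: built-in `airquality` stands in for the downloaded file.
aq <- na.omit(datasets::airquality)

# Standard deviation of each variable within each month.
sd_by_month <- aggregate(cbind(Ozone, Solar.R, Wind, Temp) ~ Month,
                         data = aq, FUN = sd)
print(sd_by_month)

# Number of complete observations behind each month's figures.
print(table(aq$Month))
```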

Univariate Box Plots
We may be interested in comparing how temperature, ozone concentration, and the other variables
fluctuate across months.
Box plots make this comparison straightforward.
Below are the box plots of temperature, ozone, wind speed, and solar radiation across months (May
through September, shown as 5 through 9 on the Y-axis).
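Box plots grouped by month can be sketched as below, assuming R's built-in `airquality` data frame stands in for the downloaded file. The 2 x 2 layout is a choice made here for illustration.

```r
# Assumption: built-in `airquality` stands in for the downloaded file.
aq <- na.omit(datasets::airquality)
aq$Month <- factor(aq$Month)

old_par <- par(mfrow = c(2, 2))
for (v in c("Temp", "Ozone", "Wind", "Solar.R")) {
  # Build the formula `<variable> ~ Month` from the column name.
  f <- reformulate("Month", response = v)
  # horizontal = TRUE places the months along the Y-axis.
  boxplot(f, data = aq, horizontal = TRUE, main = v)
}
par(old_par)
```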

Below are the box plots of the variables again, this time drawn for each variable as a whole rather
than grouped on a comparable scale (such as months).
Univariate Box Plots

Pairwise Plotting of the Variables (Scatterplot Matrix)
Below is a scatterplot matrix between the variables.
This matrix depicts the various relationships between the (numeric) variables in the dataset.
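A scatterplot matrix of the numeric variables can be produced with `pairs()`, and a correlation matrix puts numbers on the same pairwise relationships. This is a minimal sketch, assuming R's built-in `airquality` data frame stands in for the downloaded file.

```r
# Assumption: built-in `airquality` stands in for the downloaded file.
aq <- na.omit(datasets::airquality)
num_vars <- aq[, c("Ozone", "Solar.R", "Wind", "Temp")]

# One scatter plot for every pair of numeric variables.
pairs(num_vars, main = "Air quality scatterplot matrix")

# Pearson correlations for the same pairs.
cor_mat <- cor(num_vars)
print(round(cor_mat, 2))
```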

Next, we explore multivariate relationships in the data.
We start by posing a question, given in the heading below:
How do Ozone and temperature measurements relate?
To answer this, we draw a scatter plot of the two variables, Ozone against Temp.

The plot shows that as temperature goes up, the ozone concentration goes up as well, so the two variables are positively related.
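The direction and strength of this relationship can be checked with a correlation coefficient and a fitted straight line. The sketch below assumes R's built-in `airquality` data frame stands in for the downloaded file. The linear fit is added here for illustration and is not part of the original analysis.

```r
# Assumption: built-in `airquality` stands in for the downloaded file.
aq <- na.omit(datasets::airquality)

# Scatter plot of ozone against temperature.
plot(Ozone ~ Temp, data = aq,
     xlab = "Temperature (degrees F)", ylab = "Ozone (ppb)")

# Least-squares line (illustrative addition): the sign of the Temp slope
# gives the direction of the relationship.
fit <- lm(Ozone ~ Temp, data = aq)
abline(fit, lty = 2)

# Pearson correlation between the two variables.
r <- cor(aq$Ozone, aq$Temp)
print(r)
```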
Solar Radiation Vs. Wind Speed
Similarly, we want to learn how solar radiation varies with wind speed.
Here is the plot:

We cannot discern any clear relationship. The two variables appear to be independent of each other.
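A correlation test offers a rough check on this visual impression. The sketch below assumes R's built-in `airquality` data frame stands in for the downloaded file. Note that a weak linear correlation does not by itself establish independence, since it only rules out a strong straight-line relationship.

```r
# Assumption: built-in `airquality` stands in for the downloaded file.
aq <- na.omit(datasets::airquality)

# Scatter plot of solar radiation against wind speed.
plot(Solar.R ~ Wind, data = aq,
     xlab = "Wind (mph)", ylab = "Solar radiation (Langleys)")

# Pearson correlation test: the null hypothesis is zero linear correlation.
ct <- cor.test(aq$Solar.R, aq$Wind)
print(ct$estimate)
print(ct$p.value)
```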
Summary
From the various univariate and multivariate plots above, we can infer that:
1. Solar radiation and wind speed show no apparent relationship with each other.
2. There is a clear positive relationship between ozone concentration and temperature: ozone rises as temperature rises.
3. September appears to show the greatest variation in ozone, temperature, wind speed,
and solar radiation.
4. The scatter plot shows that wind speed and temperature have a negative relationship.
5. The box plots show that solar radiation varies the most across the months. On the raw
measurement scale, wind speed varies the least (interquartile range 4.1, against 13.5 for
temperature, 44.0 for ozone, and 142.0 for solar radiation).
6. The histograms, together with the summary statistics, show that ozone is right-skewed while
solar radiation is left-skewed. Temperature is roughly symmetric, while wind speed appears bimodal.