Introduction

The air quality of a polluted Italian city was measured by metal oxide chemical sensors located at ground level. Five different gas concentrations were measured: CO, NMHC, NOx, NO2 and O3. The temperature, humidity and time of day were also recorded. First, plot temperature against humidity to see whether these two variables are related.

library(ggplot2)
ggplot(qual, aes(x = T, y = AH)) + geom_point(size = 0.75) + labs(x = "Temperature in degrees Celsius", y = "Absolute Humidity (mg/m^3)")

The plot shows a positive correlation. Generally, as the temperature increases, humidity increases as well.
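The visual impression from the scatter plot can be backed up with a correlation coefficient. The sketch below is only an illustration: qual_demo is a hypothetical stand-in for the qual data frame, and its values are made up, so the printed number says nothing about the real data.

```r
# Hypothetical stand-in for `qual`; the values are invented for illustration.
qual_demo <- data.frame(
  T  = c(5.2, 9.8, 14.1, 18.6, 23.4, 27.9, 31.0),
  AH = c(0.42, 0.61, 0.78, 0.95, 1.21, 1.38, 1.62)
)

# Pearson correlation between temperature and absolute humidity.
# use = "complete.obs" drops any row where either value is NA.
r <- cor(qual_demo$T, qual_demo$AH, use = "complete.obs")
print(round(r, 2))
```

A value near 1 would agree with the upward trend in the plot, while a value near 0 would suggest the trend is weaker than it looks.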

Scatter Plots

Now, create a scatter plot of each gas concentration against temperature and against humidity. With five gases and two weather variables, this gives ten plots (p0 through p9) from the Air Quality dataset.
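The code that builds the individual plots is not shown in this report. One way it could be done is with a small helper function, sketched here under assumptions: qual_demo is a hypothetical stand-in for qual with random values, only two sensor columns are included, and the helper name gas_scatter is invented for this example.

```r
library(ggplot2)

# Hypothetical stand-in for `qual`; column names and values are assumptions.
set.seed(1)
qual_demo <- data.frame(
  T       = runif(50, 0, 35),
  AH      = runif(50, 0.2, 2),
  PT08.S1 = runif(50, 700, 1800),
  PT08.S4 = runif(50, 800, 2400)
)

# Build one scatter plot of a sensor column (yvar) against a weather column (xvar).
# The .data pronoun lets column names be passed as strings.
gas_scatter <- function(data, xvar, yvar) {
  ggplot(data, aes(x = .data[[xvar]], y = .data[[yvar]])) +
    geom_point(size = 0.75) +
    labs(x = xvar, y = yvar)
}

# Every (weather variable, gas) pair; x varies fastest, so each gas gets
# its temperature plot followed by its humidity plot.
combos <- expand.grid(
  x = c("T", "AH"),
  y = c("PT08.S1", "PT08.S4"),
  stringsAsFactors = FALSE
)

# One ggplot object per pair, collected in a list.
plots <- Map(function(x, y) gas_scatter(qual_demo, x, y), combos$x, combos$y)
length(plots)
```

A list built this way can also be drawn in one call with gridExtra's grid.arrange(grobs = plots, ncol = 2) instead of arranging each pair separately.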

Side by side plots:

library(gridExtra)
grid.arrange(p0, p1, ncol=2)

grid.arrange(p2, p3, ncol=2)

grid.arrange(p4, p5, ncol=2)

grid.arrange(p6, p7, ncol=2)

grid.arrange(p8, p9, ncol=2)

Most of the scatter plots above don't show a clear pattern, except those for the fourth gas, nitrogen dioxide (NO2). Its plots show a positive correlation between the NO2 concentration and both temperature and humidity.

A scatter plot with color can show the relationship between three separate variables.

ggplot(qual, aes(x = AH, y = PT08.S4, color = T)) + geom_point(size = 0.75) + labs(x = "Absolute Humidity", y = "Gas Concentration (NO2)")

We can see the correlation between absolute humidity and the nitrogen dioxide concentration: as AH increases, the gas concentration increases. The color represents the temperature in degrees Celsius. As the shade of blue gets lighter, the temperature increases.

Interestingly, the points become lighter as the absolute humidity increases. This makes sense given the positive correlation between temperature and humidity in the first plot.

Another data visualization technique that can be used to find correlations is the correlogram.

Correlogram

A correlogram shows multiple variables plotted against each other. The data frame was reduced to a new one, qual2, so that the ggpairs function receives only the variables of interest.
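The subsetting step itself is not shown. A minimal sketch of how it might look follows, assuming a hypothetical data frame qual_demo whose column names and values are invented for illustration; the real qual may be laid out differently.

```r
# Hypothetical stand-in for `qual`, including a column we do not want to plot.
qual_demo <- data.frame(
  Time    = c("18:00", "19:00", "20:00", "21:00"),
  PT08.S1 = c(1360, 1292, 1402, 1376),
  PT08.S4 = c(1692, 1559, 1555, 1584),
  T       = c(13.6, 13.3, 11.9, 11.0),
  AH      = c(0.76, 0.73, 0.75, 0.79)
)

# Keep only the numeric columns that should appear in the correlogram.
keep <- c("PT08.S1", "PT08.S4", "T", "AH")
qual2_demo <- qual_demo[, keep]

# Inspect the structure of the reduced data frame.
str(qual2_demo)
```

Dropping non-numeric columns such as the time of day keeps every panel of the correlogram a numeric-versus-numeric comparison.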

library(GGally)
ggpairs(qual2, title = "Correlations of Gas Concentrations, Temperature and Humidity")

Each correlation is also given as a number between -1 and 1. Values closest to zero indicate the weakest correlation. Values closer to -1 mean a stronger negative correlation, and values closer to 1 mean a stronger positive correlation.
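The same coefficients can be computed directly as a matrix with base R's cor function. This is a sketch on a hypothetical data frame, qual2_demo, with made-up values, so the resulting numbers are for illustration only.

```r
# Hypothetical stand-in for `qual2`; the values are invented for illustration.
qual2_demo <- data.frame(
  CO   = c(1.2, 2.0, 2.6, 3.1, 4.0, 4.4),
  NMHC = c(150, 210, 260, 330, 400, 450),
  T    = c(20, 12, 25, 9, 17, 14)
)

# Pairwise Pearson correlations between all columns.
# "pairwise.complete.obs" uses every row that is complete for a given pair.
cor_mat <- cor(qual2_demo, use = "pairwise.complete.obs")

# Round to two decimals for readability.
print(round(cor_mat, 2))
```

Each diagonal entry is 1 because a variable is perfectly correlated with itself, and the matrix is symmetric, so only one triangle needs to be read.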

ggcorr(qual2) + ggtitle("Visualization by Color")

The correlogram above makes the correlation between each pair of variables much easier to read. Instead of small scatter plots, decimals and hard-to-read axes, this chart uses only color. Red is a strong positive correlation and blue is a strong negative correlation.

It is interesting to see that some gas concentrations, such as CO and NMHC, are heavily correlated with each other. The correlations seen in the earlier scatter plots are also visible in this chart.

Conclusion

It is interesting to see that most of the gas concentrations are positively related to each other. This fits with more gas being released into the atmosphere overall from emissions and pollution. For example, CO and NMHC are strongly correlated, so as the CO concentration increases, the NMHC concentration tends to increase as well.

This small sample of gas concentrations from the Italian city may be representative of ozone emissions and global warming more broadly. In this data, a gas concentration is generally higher when the concentrations of other gases are higher and, most clearly for NO2, when temperature and humidity are higher.

References