The purpose of this analysis is to identify columns, and column values, in the dataset whose meaning is unclear.
We first load the data and inspect the values in each column to determine which columns, or which of their values, are ambiguous.
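The loading step can be sketched in Python using only the standard library. This is a minimal illustration, not the code that produced the output below (that output appears to come from R). The dataset's file name is not given here, so the example reads a small in-memory stand-in built from values in the first two rows of that output. The function names `load_rows` and `preview` are hypothetical.

```python
import csv
import io
from typing import IO


def load_rows(handle: IO[str]):
    """Read CSV text from `handle` and return (column names, rows as dicts)."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def preview(columns, rows, n=3):
    """Map each column name to its first `n` values for a quick visual check."""
    return {name: [row[name] for row in rows[:n]] for name in columns}


if __name__ == "__main__":
    # Stand-in for the real file (its name is not given here): a few columns
    # from the first two rows of the output below. For the actual dataset,
    # pass open(path, newline="", encoding="utf-8") to load_rows instead.
    sample = io.StringIO(
        "country,location_name,last_updated_epoch,condition_text\n"
        "Afghanistan,Kabul,1693301400,Sunny\n"
        "Afghanistan,Kabul,1693364400,Sunny\n"
    )
    columns, rows = load_rows(sample)
    print(len(rows), "rows,", len(columns), "columns")
    for name, values in preview(columns, rows).items():
        print(f"{name}: {values}")
```

`csv.DictReader` keeps every value as a string. Converting numeric columns, for example with `float()`, is a separate step.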
##       country location_name latitude longitude   timezone last_updated_epoch
## 1 Afghanistan         Kabul    34.52     69.18 Asia/Kabul         1693301400
## 2 Afghanistan         Kabul    34.52     69.18 Asia/Kabul         1693364400
## 3 Afghanistan         Kabul    34.52     69.18 Asia/Kabul         1693439100
## 4 Afghanistan         Kabul    34.52     69.18 Asia/Kabul         1693525500
## 5 Afghanistan         Kabul    34.52     69.18 Asia/Kabul         1693611000
## 6 Afghanistan         Kabul    34.52     69.18 Asia/Kabul         1693698300
##      last_updated temperature_celsius temperature_fahrenheit
## 1 8/29/2023 14:00                28.8                   83.8
## 2  8/30/2023 7:30                21.3                   70.3
## 3  8/31/2023 4:15                18.1                   64.6
## 4   9/1/2023 4:15                19.2                   66.6
## 5   9/2/2023 4:00                18.5                   65.3
## 6   9/3/2023 4:15                17.0                   62.6
##           condition_text wind_mph wind_kph wind_degree wind_direction
## 1                  Sunny      7.2     11.5          74            ENE
## 2                  Sunny      2.2      3.6         199            SSW
## 3                  Clear      2.2      3.6         256            WSW
## 4                  Clear      2.2      3.6         282            WNW
## 5 Moderate rain at times      2.2      3.6         262              W
## 6                  Clear      2.2      3.6         237            WSW
##   pressure_mb pressure_in precip_mm precip_in humidity cloud feels_like_celsius
## 1        1004       29.64       0.0      0.00       19     0               26.7
## 2        1011       29.84       0.0      0.00       54     4               21.3
## 3        1010       29.83       0.0      0.00       40     0               18.1
## 4        1010       29.83       0.0      0.00       49     5               19.2
## 5        1010       29.82       0.5      0.02       40    87               18.6
## 6        1009       29.79       0.0      0.00       27     0               17.0
##   feels_like_fahrenheit visibility_km visibility_miles uv_index gust_mph
## 1                  80.1            10                6        7      8.3
## 2                  70.3            10                6        6      2.5
## 3                  64.6            10                6        1      3.4
## 4                  66.6            10                6        1      3.1
## 5                  65.5            10                6        1      2.7
## 6                  62.6            10                6        1      2.9
##   gust_kph air_quality_Carbon_Monoxide air_quality_Ozone
## 1     13.3                       647.5             130.2
## 2      4.0                      2964.0              57.2
## 3      5.4                       754.4              46.5
## 4      5.0                      1228.3              45.4
## 5      4.3                       454.0              52.9
## 6      4.7                       701.0              64.4
##   air_quality_Nitrogen_dioxide air_quality_Sulphur_dioxide air_quality_PM2.5
## 1                          1.2                         0.4               7.9
## 2                         20.9                         0.8              31.7
## 3                          6.4                         0.4               7.7
## 4                         12.7                         0.7              20.9
## 5                          4.7                         0.4              10.8
## 6                          6.8                         0.6              12.2
##   air_quality_PM10 air_quality_us_epa_index air_quality_gb_defra_index sunrise
## 1             11.1                        1                          1 5:24 AM
## 2             39.3                        2                          3 5:25 AM
## 3             12.8                        1                          1 5:25 AM
## 4             52.4                        2                          2 5:26 AM
## 5             24.3                        1                          1 5:26 AM
## 6             25.9                        1                          2 5:27 AM
##    sunset moonrise moonset     moon_phase moon_illumination
## 1 6:24 PM  5:39 PM 2:48 AM Waxing Gibbous                93
## 2 6:23 PM  6:18 PM 4:05 AM      Full Moon                98
## 3 6:23 PM  6:18 PM 4:05 AM      Full Moon                98
## 4 6:21 PM  6:52 PM 5:22 AM Waning Gibbous               100
## 5 6:20 PM  7:23 PM 6:36 AM Waning Gibbous                99
## 6 6:19 PM  7:53 PM 7:48 AM Waning Gibbous                94
Several columns are hard to interpret without carefully reading the dataset’s documentation. Their summary statistics are:
##  condition_text     last_updated_epoch  air_quality_Carbon_Monoxide
##  Length:2534        Min.   :1.693e+09   Min.   :  123.5
##  Class :character   1st Qu.:1.694e+09   1st Qu.:  220.3
##  Mode  :character   Median :1.694e+09   Median :  270.4
##                     Mean   :1.694e+09   Mean   :  488.5
##                     3rd Qu.:1.694e+09   3rd Qu.:  433.9
##                     Max.   :1.694e+09   Max.   :18158.0
##  air_quality_Ozone  moon_phase         moon_illumination
##  Min.   :  0.00     Length:2534        Min.   : 30.00
##  1st Qu.: 18.10     Class :character   1st Qu.: 60.00
##  Median : 35.80     Mode  :character   Median : 88.00
##  Mean   : 40.93                        Mean   : 76.68
##  3rd Qu.: 55.80                        3rd Qu.: 98.00
##  Max.   :320.40                        Max.   :100.00
##  air_quality_us_epa_index air_quality_gb_defra_index
##  Min.   :1.000            Min.   : 1.000
##  1st Qu.:1.000            1st Qu.: 1.000
##  Median :1.000            Median : 1.000
##  Mean   :1.464            Mean   : 2.053
##  3rd Qu.:2.000            3rd Qu.: 2.000
##  Max.   :6.000            Max.   :10.000
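The statistics above appear to be the output of R’s `summary()`. To reproduce them outside R, the minimal Python sketch below assumes a numeric column has already been loaded as a list of numbers. It uses the six `air_quality_us_epa_index` values from the `head()` output only as sample input, and the function name `r_style_summary` is hypothetical. R formats its printed values for display, so the raw floats returned here may show more digits.

```python
from statistics import mean, quantiles


def r_style_summary(values):
    """Min, quartiles, mean and max, as in R's summary() for a numeric column.

    statistics.quantiles(method="inclusive") uses the same linear
    interpolation as R's default quantile type 7, which summary() relies on.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "Min.": data[0],
        "1st Qu.": q1,
        "Median": median,
        "Mean": mean(data),
        "3rd Qu.": q3,
        "Max.": data[-1],
    }


if __name__ == "__main__":
    # air_quality_us_epa_index values from the first six rows shown earlier.
    for label, value in r_style_summary([1, 2, 1, 2, 1, 1]).items():
        print(f"{label:8}{value}")
```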
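Last Updated Epoch: The values (e.g., 1693301400) look like timestamps, but it is not immediately clear to a layperson what the numbers mean or how they translate to a date and time. They are Unix timestamps: the number of seconds elapsed since 00:00:00 UTC on January 1, 1970 (the Unix epoch). This representation makes it easy for programs to calculate time intervals and convert between time zones. For example, 1693301400 is 09:30 UTC on August 29, 2023. That is 14:00 in Kabul (UTC+4:30), which matches the first row’s last_updated value of 8/29/2023 14:00.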
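The conversion described above can be checked with Python’s standard `datetime` module. This is a minimal sketch. It assumes a fixed UTC+04:30 offset for Kabul (Afghanistan does not observe daylight saving time), which avoids depending on the system’s time zone database. The function name `epoch_to_datetimes` is illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Kabul uses a fixed UTC+04:30 offset with no daylight saving time,
# so a fixed-offset tzinfo stands in for the "Asia/Kabul" timezone column.
KABUL = timezone(timedelta(hours=4, minutes=30))


def epoch_to_datetimes(epoch_seconds: int, local_tz: timezone = KABUL):
    """Convert a Unix timestamp to a (UTC datetime, local datetime) pair."""
    utc = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return utc, utc.astimezone(local_tz)


if __name__ == "__main__":
    # last_updated_epoch values from the first three rows of the head() output.
    for epoch in (1693301400, 1693364400, 1693439100):
        utc, local = epoch_to_datetimes(epoch)
        print(epoch, utc.strftime("%Y-%m-%d %H:%M UTC"), local.strftime("%m/%d/%Y %H:%M"))
```

A fixed offset is not enough for locations that observe daylight saving time. For those, `zoneinfo.ZoneInfo` built from the dataset’s timezone column (e.g. "Asia/Kabul") is the more general choice, provided time zone data is installed.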
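Condition Text: This column likely describes the general weather condition in words (e.g., “Sunny”, “Clear”, “Moderate rain at times”). Without documentation, it is not clear whether these labels come from a fixed set of categories or are subjective descriptions. In either case, the column is meant for general, human-readable understanding rather than precise measurement.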
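One way to check whether `condition_text` draws on a fixed vocabulary is to tabulate its distinct values. A small number of recurring labels suggests predefined categories rather than free text. In the minimal Python sketch below, the six labels from the `head()` output stand in for the full column, and the function name `condition_counts` is hypothetical.

```python
from collections import Counter


def condition_counts(labels):
    """Count how often each condition_text label occurs, most common first."""
    # Normalise whitespace and case so that "Sunny" and "sunny " count as one label.
    normalised = (label.strip().lower() for label in labels)
    return Counter(normalised).most_common()


if __name__ == "__main__":
    # condition_text values from the first six rows of the head() output.
    sample = ["Sunny", "Sunny", "Clear", "Clear", "Moderate rain at times", "Clear"]
    for label, count in condition_counts(sample):
        print(f"{count:3d}  {label}")
```

Lower-casing is a design choice. If capitalisation might carry meaning in this dataset, drop `.lower()` so that variants are counted separately.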
Air Quality Carbon Monoxide and Air Quality Ozone: The values in these columns are presumably concentrations of the named pollutants, but the column names do not state the units. Without that context, the significance and health implications of the numbers are unclear, and users cannot judge how severe or safe the air quality is. The carbon monoxide column, for instance, ranges from 123.5 to 18158.0. Stating concentrations in parts per million (ppm) or micrograms per cubic meter (µg/m³) would follow scientific and regulatory conventions.
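If the units were known, converting between the two conventions mentioned above would be a short formula. At 25 °C and 1 atm an ideal gas occupies about 24.45 L/mol, so µg/m³ = ppm × molecular weight × 1000 / 24.45. A minimal Python sketch follows. The data do not state the columns’ unit, so treating 270.4 (the median CO value) as µg/m³ is an assumption made purely for illustration. The function names are hypothetical.

```python
# Molar volume of an ideal gas in litres per mole at 25 °C and 1 atm.
MOLAR_VOLUME_L = 24.45

# Molecular weights in g/mol for the two pollutants discussed above.
MOLECULAR_WEIGHT = {"carbon_monoxide": 28.01, "ozone": 48.00}


def ppm_to_ugm3(ppm: float, molecular_weight: float) -> float:
    """Convert a gas concentration from ppm to µg/m³ (25 °C, 1 atm)."""
    return ppm * molecular_weight * 1000 / MOLAR_VOLUME_L


def ugm3_to_ppm(ugm3: float, molecular_weight: float) -> float:
    """Convert a gas concentration from µg/m³ to ppm (25 °C, 1 atm)."""
    return ugm3 * MOLAR_VOLUME_L / (molecular_weight * 1000)


if __name__ == "__main__":
    # Assumption for illustration only: the median CO value (270.4) is in µg/m³.
    co_ppm = ugm3_to_ppm(270.4, MOLECULAR_WEIGHT["carbon_monoxide"])
    print(f"270.4 µg/m³ CO is about {co_ppm:.3f} ppm")
```

Under that assumption, the median carbon monoxide reading corresponds to roughly 0.236 ppm. If the column were already in ppm, the same number would mean something very different, which is exactly why the unit matters.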
Air Quality Index: Without knowledge of their scales, it is unclear what the “air_quality_us_epa_index” and “air_quality_gb_defra_index” columns measure. They refer to specific air quality rating scales, and a reader needs to know which numerical values correspond to “good” or “poor” air quality. The summary shows that the US EPA index ranges from 1 to 6 and the UK DEFRA index from 1 to 10.
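Once the scales are known, translating index values into labels is a lookup. The sketch below makes two assumptions. First, the dataset encodes the six US EPA AQI categories as 1 to 6 in order, which is consistent with this analysis treating 1 as “Good”. Second, the DEFRA column follows the UK Daily Air Quality Index bands (1 to 3 Low, 4 to 6 Moderate, 7 to 9 High, 10 Very High). Both mappings come from the agencies’ published scales, not from the dataset itself, and the function names are hypothetical.

```python
# Assumption: the dataset encodes the six US EPA AQI categories as 1-6, in order.
US_EPA_CATEGORIES = {
    1: "Good",
    2: "Moderate",
    3: "Unhealthy for Sensitive Groups",
    4: "Unhealthy",
    5: "Very Unhealthy",
    6: "Hazardous",
}


def epa_label(index: int) -> str:
    """Return the US EPA category name for an index value of 1-6."""
    if index not in US_EPA_CATEGORIES:
        raise ValueError(f"US EPA index must be 1-6, got {index}")
    return US_EPA_CATEGORIES[index]


def defra_band(index: int) -> str:
    """Return the UK Daily Air Quality Index band for a value of 1-10."""
    if not 1 <= index <= 10:
        raise ValueError(f"DEFRA index must be 1-10, got {index}")
    if index <= 3:
        return "Low"
    if index <= 6:
        return "Moderate"
    if index <= 9:
        return "High"
    return "Very High"


if __name__ == "__main__":
    # (US EPA, DEFRA) pairs from rows 1, 2 and 4 of the head() output.
    for epa, defra in [(1, 1), (2, 3), (2, 2)]:
        print(f"EPA {epa}: {epa_label(epa):<10} DEFRA {defra}: {defra_band(defra)}")
```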
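Moon Phase: Without context, it is not immediately evident what the different phases signify, how moon illumination is measured, or what its percentages represent. The moon_illumination column presumably gives the percentage of the Moon’s visible disc that is lit, and it ranges from 30 to 100 in this dataset. Without knowledge of lunar phases, some users might also not understand the implications of each phase (such as its effect on tides or on how much light there is at night).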
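The relationship between phase and illumination can be approximated geometrically. The sketch below is a simplified model, not the data provider’s method. It assumes a circular lunar orbit, so the angle between the Sun and the Moon as seen from Earth grows linearly with the Moon’s age (days since new moon). Under that model, the lit fraction of the disc is (1 − cos angle) / 2. The eight equal-width bins in the hypothetical `phase_name` function are one common naming convention; the dataset’s own thresholds are unknown.

```python
import math

# Mean length of the synodic month (new moon to new moon), in days.
SYNODIC_MONTH_DAYS = 29.530588

PHASE_NAMES = [
    "New Moon", "Waxing Crescent", "First Quarter", "Waxing Gibbous",
    "Full Moon", "Waning Gibbous", "Last Quarter", "Waning Crescent",
]


def illuminated_percent(age_days: float) -> float:
    """Approximate percentage of the Moon's disc that is lit at a given age.

    Simplified model: circular orbit, so the Sun-Moon angle seen from Earth
    is proportional to the Moon's age within the synodic month.
    """
    angle = 2 * math.pi * (age_days % SYNODIC_MONTH_DAYS) / SYNODIC_MONTH_DAYS
    return 100 * (1 - math.cos(angle)) / 2


def phase_name(age_days: float) -> str:
    """Name the phase using eight equal bins, each centred on a named phase."""
    fraction = (age_days % SYNODIC_MONTH_DAYS) / SYNODIC_MONTH_DAYS
    return PHASE_NAMES[int((fraction * 8 + 0.5) % 8)]


if __name__ == "__main__":
    for age in (0, 7.4, 14.8, 18.0, 22.1):
        print(f"age {age:5.1f} d: {phase_name(age):<16} {illuminated_percent(age):5.1f}% lit")
```

The real orbit is elliptical and inclined, so observed illumination can differ noticeably from this approximation.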
For this part we focus on one of the air quality indices, “air_quality_us_epa_index”. The challenge with this index is that, without understanding its categories or breakpoints, the raw numbers may be misleading or difficult to interpret for users unfamiliar with the system.
We now turn to visualizing the air_quality_us_epa_index column. This column posed a challenge because the dataset itself provides little information about the EPA’s Air Quality Index. The index uses a numerical scale from 1 to 6, where higher values indicate greater air quality concern. For example, a value of 1 corresponds to “Good”, meaning air quality is satisfactory.
Our visualization will explore the fluctuations in the EPA’s Air Quality Index over a specific period, from August 30th to September 9th, 2023. This will allow us to observe and analyze the changes in air quality during this timeframe.
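Before plotting, the readings need to be reduced to one value per day within the chosen window. The minimal Python sketch below does this with a hypothetical function, `daily_mean_epa`, and averaging per calendar day is an assumed design choice. The sample input is the Kabul rows from the `head()` output, which cover only part of the date range. The resulting date/value pairs could then be passed to any plotting library.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean


def daily_mean_epa(rows, start: date, end: date):
    """Average air_quality_us_epa_index per calendar day within [start, end].

    `rows` is an iterable of (last_updated, epa_index) pairs, where
    last_updated uses the dataset's "M/D/YYYY H:MM" format.
    """
    by_day = defaultdict(list)
    for last_updated, epa_index in rows:
        day = datetime.strptime(last_updated, "%m/%d/%Y %H:%M").date()
        if start <= day <= end:
            by_day[day].append(epa_index)
    # Sorted by date so the result can be plotted directly as a time series.
    return {day: mean(values) for day, values in sorted(by_day.items())}


if __name__ == "__main__":
    # Kabul rows from the head() output; the full analysis would use every row.
    sample = [
        ("8/29/2023 14:00", 1), ("8/30/2023 7:30", 2), ("8/31/2023 4:15", 1),
        ("9/1/2023 4:15", 2), ("9/2/2023 4:00", 1), ("9/3/2023 4:15", 1),
    ]
    window = daily_mean_epa(sample, date(2023, 8, 30), date(2023, 9, 9))
    for day, value in window.items():
        print(day.isoformat(), value)
```

If the dataset covers several locations, the grouping key could be extended from `day` to `(country, day)` so that each country gets its own series.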