Data Dive 5

  1. A list of at least 3 columns (or values) in your data which are unclear until you read the documentation.
  2. At least one element of your data that is unclear even after reading the documentation.
  3. Build a visualization which uses a column of data that is affected by the issue you brought up in bullet #2, above. In this visualization, find a way to highlight the issue, and explain what is unclear and why it might be unclear.

Unclear Columns Analysis

The purpose of this analysis is to identify columns in the dataset whose meaning, or whose values, are unclear.

We first load the data and inspect the values in each column to see which columns, or which values within them, are unclear.

##       country location_name latitude longitude   timezone last_updated_epoch
## 1 Afghanistan         Kabul    34.52     69.18 Asia/Kabul         1693301400
## 2 Afghanistan         Kabul    34.52     69.18 Asia/Kabul         1693364400
## 3 Afghanistan         Kabul    34.52     69.18 Asia/Kabul         1693439100
## 4 Afghanistan         Kabul    34.52     69.18 Asia/Kabul         1693525500
## 5 Afghanistan         Kabul    34.52     69.18 Asia/Kabul         1693611000
## 6 Afghanistan         Kabul    34.52     69.18 Asia/Kabul         1693698300
##      last_updated temperature_celsius temperature_fahrenheit
## 1 8/29/2023 14:00                28.8                   83.8
## 2  8/30/2023 7:30                21.3                   70.3
## 3  8/31/2023 4:15                18.1                   64.6
## 4   9/1/2023 4:15                19.2                   66.6
## 5   9/2/2023 4:00                18.5                   65.3
## 6   9/3/2023 4:15                17.0                   62.6
##           condition_text wind_mph wind_kph wind_degree wind_direction
## 1                  Sunny      7.2     11.5          74            ENE
## 2                  Sunny      2.2      3.6         199            SSW
## 3                  Clear      2.2      3.6         256            WSW
## 4                  Clear      2.2      3.6         282            WNW
## 5 Moderate rain at times      2.2      3.6         262              W
## 6                  Clear      2.2      3.6         237            WSW
##   pressure_mb pressure_in precip_mm precip_in humidity cloud feels_like_celsius
## 1        1004       29.64       0.0      0.00       19     0               26.7
## 2        1011       29.84       0.0      0.00       54     4               21.3
## 3        1010       29.83       0.0      0.00       40     0               18.1
## 4        1010       29.83       0.0      0.00       49     5               19.2
## 5        1010       29.82       0.5      0.02       40    87               18.6
## 6        1009       29.79       0.0      0.00       27     0               17.0
##   feels_like_fahrenheit visibility_km visibility_miles uv_index gust_mph
## 1                  80.1            10                6        7      8.3
## 2                  70.3            10                6        6      2.5
## 3                  64.6            10                6        1      3.4
## 4                  66.6            10                6        1      3.1
## 5                  65.5            10                6        1      2.7
## 6                  62.6            10                6        1      2.9
##   gust_kph air_quality_Carbon_Monoxide air_quality_Ozone
## 1     13.3                       647.5             130.2
## 2      4.0                      2964.0              57.2
## 3      5.4                       754.4              46.5
## 4      5.0                      1228.3              45.4
## 5      4.3                       454.0              52.9
## 6      4.7                       701.0              64.4
##   air_quality_Nitrogen_dioxide air_quality_Sulphur_dioxide air_quality_PM2.5
## 1                          1.2                         0.4               7.9
## 2                         20.9                         0.8              31.7
## 3                          6.4                         0.4               7.7
## 4                         12.7                         0.7              20.9
## 5                          4.7                         0.4              10.8
## 6                          6.8                         0.6              12.2
##   air_quality_PM10 air_quality_us_epa_index air_quality_gb_defra_index sunrise
## 1             11.1                        1                          1 5:24 AM
## 2             39.3                        2                          3 5:25 AM
## 3             12.8                        1                          1 5:25 AM
## 4             52.4                        2                          2 5:26 AM
## 5             24.3                        1                          1 5:26 AM
## 6             25.9                        1                          2 5:27 AM
##    sunset moonrise moonset     moon_phase moon_illumination
## 1 6:24 PM  5:39 PM 2:48 AM Waxing Gibbous                93
## 2 6:23 PM  6:18 PM 4:05 AM      Full Moon                98
## 3 6:23 PM  6:18 PM 4:05 AM      Full Moon                98
## 4 6:21 PM  6:52 PM 5:22 AM Waning Gibbous               100
## 5 6:20 PM  7:23 PM 6:36 AM Waning Gibbous                99
## 6 6:19 PM  7:53 PM 7:48 AM Waning Gibbous                94
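
A preview like the one above is what `head()` prints for a data frame. The following is a minimal sketch of that step, assuming the data comes from a CSV file. The inline sample (a few columns from the first rows shown above) stands in for the real file so the snippet runs on its own, and the name `weather` is an assumption.

```r
# A few columns from the first rows shown above, embedded as CSV text so the
# example is self-contained. With the real file, pass its path to read.csv().
csv_text <- "country,location_name,last_updated_epoch,last_updated,condition_text,air_quality_us_epa_index
Afghanistan,Kabul,1693301400,8/29/2023 14:00,Sunny,1
Afghanistan,Kabul,1693364400,8/30/2023 7:30,Sunny,2
Afghanistan,Kabul,1693439100,8/31/2023 4:15,Clear,1"

weather <- read.csv(text = csv_text, stringsAsFactors = FALSE)

head(weather)  # first rows, as in the preview above
str(weather)   # column names and types at a glance
```

`str()` is a useful companion to `head()` here because it shows each column's type, which already hints at which columns are codes, measurements, or free text.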

Identifying Unclear Columns

Five columns (or groups of columns) in the dataset are unclear without reading the documentation:

  1. Last Updated Epoch: It’s unclear what this column represents. It appears to be a timestamp.
  2. Condition Text: Without context, it’s not clear whether these labels are a fixed categorization based on specific criteria or a subjective description.
  3. Air Quality Carbon Monoxide and Air Quality Ozone: The values in these columns are presumably measurements of different air pollutants, but it’s not clear what the units of measurement are.
  4. Air Quality Index: These indices might have specific categorizations or health advisories associated, but without knowledge of these scales, the numbers don’t provide much information.
  5. Moon Phase: It indicates the current phase of the moon (e.g., full moon, waxing crescent); however, it is unclear what information the phase provides.

##  condition_text     last_updated_epoch  air_quality_Carbon_Monoxide
##  Length:2534        Min.   :1.693e+09   Min.   :  123.5            
##  Class :character   1st Qu.:1.694e+09   1st Qu.:  220.3            
##  Mode  :character   Median :1.694e+09   Median :  270.4            
##                     Mean   :1.694e+09   Mean   :  488.5            
##                     3rd Qu.:1.694e+09   3rd Qu.:  433.9            
##                     Max.   :1.694e+09   Max.   :18158.0            
##  air_quality_Ozone  moon_phase        moon_illumination
##  Min.   :  0.00    Length:2534        Min.   : 30.00   
##  1st Qu.: 18.10    Class :character   1st Qu.: 60.00   
##  Median : 35.80    Mode  :character   Median : 88.00   
##  Mean   : 40.93                       Mean   : 76.68   
##  3rd Qu.: 55.80                       3rd Qu.: 98.00   
##  Max.   :320.40                       Max.   :100.00   
##  air_quality_us_epa_index air_quality_gb_defra_index
##  Min.   :1.000            Min.   : 1.000            
##  1st Qu.:1.000            1st Qu.: 1.000            
##  Median :1.000            Median : 1.000            
##  Mean   :1.464            Mean   : 2.053            
##  3rd Qu.:2.000            3rd Qu.: 2.000            
##  Max.   :6.000            Max.   :10.000
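
The summaries above look like the output of `summary()` applied to the unclear columns. Below is a sketch of that step on a small hand-typed sample (values copied from the preview); the data frame name `weather` and the exact column selection are assumptions.

```r
# Sample values copied from the first six rows of the preview
weather <- data.frame(
  condition_text = c("Sunny", "Sunny", "Clear", "Clear",
                     "Moderate rain at times", "Clear"),
  last_updated_epoch = c(1693301400, 1693364400, 1693439100,
                         1693525500, 1693611000, 1693698300),
  air_quality_Carbon_Monoxide = c(647.5, 2964.0, 754.4, 1228.3, 454.0, 701.0),
  air_quality_us_epa_index = c(1, 2, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

unclear_cols <- c("condition_text", "last_updated_epoch",
                  "air_quality_Carbon_Monoxide", "air_quality_us_epa_index")
summary(weather[, unclear_cols])

# For a text column, a frequency table says more than summary() does
condition_counts <- table(weather$condition_text)
print(condition_counts)
```

For character columns `summary()` reports only length and class, so a frequency table is a quick way to see whether the values form a small fixed set.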

Encoding Rationale and Implications of Not Reading Documentation

  1. Last Updated Epoch: It appears to be a timestamp, but it’s not immediately clear to a layperson what the numbers mean or how they translate to a date and time. Unix timestamps count the seconds elapsed since 00:00:00 UTC on January 1, 1970 (the Unix epoch), which makes it easy for programs to calculate time intervals and convert to different time zones.

  2. Condition Text: This column likely describes the general weather conditions (e.g., “Sunny”), but without context it’s not clear whether this is a subjective description. A short text label is meant to give readers a general understanding of the weather at a glance.

  3. Air Quality Carbon Monoxide and Air Quality Ozone: The values in these columns are presumably measurements of different air pollutants. Without the units, the significance and health implications of these numbers are unclear, and users might not understand how severe or safe the air quality levels are. Reporting pollutant concentrations in parts per million (ppm) or micrograms per cubic meter (µg/m³) follows scientific and regulatory standards.

  4. Air Quality Index: Without knowledge of their scales, it is unclear what the “air_quality_us_epa_index” and “air_quality_gb_defra_index” columns measure. They refer to specific air quality scales that define, for example, which numerical value corresponds to “good” or “poor” air quality. In the summary above, the US index runs from 1 to 6 and the UK index from 1 to 10, so the same number does not mean the same thing on both scales.

  5. Moon Phase: Without context, it’s not immediately evident what the different phases signify or how moon illumination is measured and what its percentages represent. Without knowledge of lunar phases, some users might not understand the implications of each phase (like how it affects tides or light at night).
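
To make the Unix timestamp in item 1 concrete, here is a small sketch that converts the first `last_updated_epoch` value from the preview into a readable date. The variable names are illustrative, and the time zone is taken from the `timezone` column of the same row.

```r
epoch <- 1693301400  # first last_updated_epoch value in the preview

# Interpret the number as seconds since 1970-01-01 00:00:00 UTC
ts <- as.POSIXct(epoch, origin = "1970-01-01", tz = "UTC")

utc_text   <- format(ts, "%Y-%m-%d %H:%M", tz = "UTC")
kabul_text <- format(ts, "%Y-%m-%d %H:%M", tz = "Asia/Kabul")

print(utc_text)    # "2023-08-29 09:30"
print(kabul_text)  # "2023-08-29 14:00", matching last_updated in row 1
```

The local-time result agrees with the `last_updated` value in the same row, which suggests that `last_updated` is simply the epoch rendered in the location's own time zone.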

Elements Unclear Even After Reading Documentation

For this part we focus on the air quality indices, specifically “air_quality_us_epa_index”. The challenge with these indices is that, without understanding their specific categories or breakpoints, the raw numbers may be misleading or difficult to interpret for users unfamiliar with these systems.

Visualization

We are now going to focus on visualizing the air_quality_us_epa_index column. This column posed a challenge due to incomplete information regarding the EPA’s Air Quality Index. The index uses a numerical scale ranging from 1 to 6 to indicate varying levels of air quality concern. For example, a numerical value of 1 corresponds to “Good,” signifying satisfactory air quality.
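
One way to make the raw index readable is to attach category names to the numbers. The sketch below does this with a factor. Only 1 = “Good” is stated in the text above; the labels for 2 through 6 follow the EPA's standard category order and are an assumption that should be checked against the dataset's documentation.

```r
# Category names in the EPA's standard order. Only 1 = "Good" is confirmed by
# the text above; the mapping of 2-6 is an assumption to verify in the docs.
epa_labels <- c(
  "Good", "Moderate", "Unhealthy for Sensitive Groups",
  "Unhealthy", "Very Unhealthy", "Hazardous"
)

epa_index <- c(1, 2, 1, 2, 1, 1)  # first six rows of the preview

# levels = 1:6 keeps all six categories even if some never occur in the data
epa_category <- factor(epa_index, levels = 1:6, labels = epa_labels)
print(table(epa_category))
```

Keeping all six levels in the factor means that categories with zero observations still appear in tables and plots, which makes gaps in the data visible.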

Our visualization will explore the fluctuations in the EPA’s Air Quality Index over a specific period, from August 30th to September 9th, 2023. This will allow us to observe and analyze the changes in air quality during this timeframe.
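
A minimal base-R sketch of such a plot follows. It uses only the six Kabul rows from the preview as sample data, so it illustrates the approach rather than reproducing the full figure; the data frame name `kabul` and the plot styling are assumptions.

```r
# Sample rows copied from the preview (Kabul only)
kabul <- data.frame(
  last_updated = c("8/29/2023 14:00", "8/30/2023 7:30", "8/31/2023 4:15",
                   "9/1/2023 4:15", "9/2/2023 4:00", "9/3/2023 4:15"),
  air_quality_us_epa_index = c(1, 2, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Parse the date part of last_updated (the trailing time of day is ignored)
kabul$date <- as.Date(kabul$last_updated, format = "%m/%d/%Y")

# Keep the window described above: August 30 to September 9, 2023
in_window <- kabul$date >= as.Date("2023-08-30") &
  kabul$date <= as.Date("2023-09-09")
window <- kabul[in_window, ]

plot(window$date, window$air_quality_us_epa_index,
     type = "b", pch = 19, ylim = c(1, 6), yaxt = "n",
     xlab = "Date", ylab = "air_quality_us_epa_index (raw value)",
     main = "US EPA index, Kabul")
axis(2, at = 1:6, las = 1)
# Dotted lines at each level stress that the values are discrete categories
abline(h = 1:6, lty = 3, col = "grey60")
```

Fixing the y-axis to the full 1 to 6 range, rather than letting it fit the observed values, highlights the issue: the plot shows bare numbers whose meaning depends on a scale the reader has to look up.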