Project 2 - Data Transformation

The goal of this assignment is to give you practice in preparing different datasets for downstream analysis work.

Your task is to: Choose any three of the “wide” datasets identified in the course Slack channel. (You may use your own dataset; please don’t use my Sample Post dataset, since that was used in your previous assignment!) For each of the three chosen datasets: • Create a .CSV file (or optionally, a MySQL database!) that includes all of the information included in the dataset. You’re encouraged to use a “wide” structure similar to how the information appears in the discussion item, so that you can practice tidying and transformations as described below. • Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data. [Most of your grade will be based on this step!] • Perform the analysis requested in the discussion item. • Your code should be in an R Markdown file, posted to rpubs.com, and should include narrative descriptions of your data cleanup work, analysis, and conclusions.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(lubridate)
library(anytime)

DATA SET 1 - Global Weather

This dataset provides weather information daily starting from August 29, 2023, for capital cities around the world. It includes different features of the weather which reflect weather conditions globally. For example, wind direction, precipitation amount, pressure, humidity, and more. I will be tidying this data so that for each location I can see the temperature for October 14, 2023, in Fahrenheit, precipitation amount, percentage of cloud coverage, wind in miles per hour, humidity, and feels-like in Farenheit.

rawlink_weather <- 'https://raw.githubusercontent.com/rkasa01/DATA607_Project2_Data/main/GlobalWeatherRepository.csv' 
weather_data <- read.csv(rawlink_weather)
head(weather_data)
##               country    location_name latitude longitude        timezone
## 1         Afghanistan            Kabul    34.52     69.18      Asia/Kabul
## 2             Albania           Tirana    41.33     19.82   Europe/Tirane
## 3             Algeria          Algiers    36.76      3.05  Africa/Algiers
## 4             Andorra Andorra La Vella    42.50      1.52  Europe/Andorra
## 5              Angola           Luanda    -8.84     13.23   Africa/Luanda
## 6 Antigua and Barbuda     Saint John's    17.12    -61.85 America/Antigua
##   last_updated_epoch     last_updated temperature_celsius
## 1         1693301400 2023-08-29 14:00                28.8
## 2         1693301400 2023-08-29 11:30                27.0
## 3         1693301400 2023-08-29 10:30                28.0
## 4         1693301400 2023-08-29 11:30                10.2
## 5         1693301400 2023-08-29 10:30                25.0
## 6         1693301400 2023-08-29 05:30                29.0
##   temperature_fahrenheit condition_text wind_mph wind_kph wind_degree
## 1                   83.8          Sunny      7.2     11.5          74
## 2                   80.6  Partly cloudy      3.8      6.1         210
## 3                   82.4  Partly cloudy      8.1     13.0         240
## 4                   50.4          Sunny      6.0      9.7         345
## 5                   77.0  Partly cloudy      2.2      3.6         270
## 6                   84.2     Light rain      9.4     15.1          90
##   wind_direction pressure_mb pressure_in precip_mm precip_in humidity cloud
## 1            ENE        1004       29.64       0.0      0.00       19     0
## 2            SSW        1006       29.71       0.0      0.00       54    75
## 3            WSW        1014       29.94       0.0      0.00       30    25
## 4            NNW        1015       29.97       0.0      0.00       51     6
## 5              W        1016       30.00       0.0      0.00       69    75
## 6              E        1015       29.97       0.3      0.01       79    75
##   feels_like_celsius feels_like_fahrenheit visibility_km visibility_miles
## 1               26.7                  80.1            10                6
## 2               28.0                  82.3            10                6
## 3               27.4                  81.3            10                6
## 4                8.9                  48.0            10                6
## 5               26.9                  80.4            10                6
## 6               34.0                  93.2            10                6
##   uv_index gust_mph gust_kph air_quality_Carbon_Monoxide air_quality_Ozone
## 1        7      8.3     13.3                       647.5             130.2
## 2        6      7.4     11.9                       433.9             104.4
## 3        7      3.4      5.4                       647.5              16.6
## 4        4      7.4     11.9                       190.3              68.0
## 5        6      3.6      5.8                      2136.2             147.3
## 6        1     23.3     37.4                       200.3              16.6
##   air_quality_Nitrogen_dioxide air_quality_Sulphur_dioxide air_quality_PM2.5
## 1                          1.2                         0.4               7.9
## 2                          3.6                         1.8              28.2
## 3                         63.1                        12.6               6.4
## 4                          0.2                         0.2               0.5
## 5                         52.8                        26.9             139.6
## 6                          0.5                         0.5               0.8
##   air_quality_PM10 air_quality_us.epa.index air_quality_gb.defra.index  sunrise
## 1             11.1                        1                          1 05:24 AM
## 2             29.6                        2                          3 06:04 AM
## 3              7.9                        1                          1 06:16 AM
## 4              0.8                        1                          1 07:16 AM
## 5            203.3                        4                         10 06:11 AM
## 6              1.9                        1                          1 05:53 AM
##     sunset moonrise  moonset     moon_phase moon_illumination
## 1 06:24 PM 05:39 PM 02:48 AM Waxing Gibbous                93
## 2 07:19 PM 06:50 PM 03:25 AM Waxing Gibbous                93
## 3 07:21 PM 06:46 PM 03:50 AM Waxing Gibbous                93
## 4 08:34 PM 08:08 PM 04:38 AM Waxing Gibbous                93
## 5 06:06 PM 04:43 PM 04:41 AM Waxing Gibbous                93
## 6 06:23 PM 05:36 PM 04:09 AM Waxing Gibbous                93

Here is the dataset with the specific data which I am looking for:

selected_weather_data <- weather_data %>%
  select(country,location_name, temperature_fahrenheit, precip_in, cloud, wind_mph, humidity, feels_like_fahrenheit)

head(selected_weather_data )
##               country    location_name temperature_fahrenheit precip_in cloud
## 1         Afghanistan            Kabul                   83.8      0.00     0
## 2             Albania           Tirana                   80.6      0.00    75
## 3             Algeria          Algiers                   82.4      0.00    25
## 4             Andorra Andorra La Vella                   50.4      0.00     6
## 5              Angola           Luanda                   77.0      0.00    75
## 6 Antigua and Barbuda     Saint John's                   84.2      0.01    75
##   wind_mph humidity feels_like_fahrenheit
## 1      7.2       19                  80.1
## 2      3.8       54                  82.3
## 3      8.1       30                  81.3
## 4      6.0       51                  48.0
## 5      2.2       69                  80.4
## 6      9.4       79                  93.2

I then wanted to bring back the “last_updated” variable, to only view weather information corresponding to the last date of the recorded data: September 3rd, 2023.

sept_3_weather_data <- weather_data %>%
  mutate(Date = as.Date(last_updated, format = "%Y-%m-%d")) %>% 
  filter(Date == as.Date("2023-09-03")) 
sept_3_weather_data <- sept_3_weather_data %>%
  select(country, location_name, temperature_fahrenheit, feels_like_fahrenheit, precip_in, cloud, humidity, wind_mph, last_updated)

head(sept_3_weather_data)
##       country    location_name temperature_fahrenheit feels_like_fahrenheit
## 1 Afghanistan            Kabul                   62.6                  62.6
## 2     Albania           Tirana                   68.0                  68.0
## 3     Algeria          Algiers                   80.6                  84.8
## 4     Andorra Andorra La Vella                   45.0                  42.6
## 5      Angola           Luanda                   71.6                  76.2
## 6     Armenia          Yerevan                   78.8                  79.8
##   precip_in cloud humidity wind_mph     last_updated
## 1      0.00     0       27      2.2 2023-09-03 04:15
## 2      0.00     0       88      2.2 2023-09-03 01:45
## 3      0.25    75       54     10.5 2023-09-03 00:30
## 4      0.00    76       73      4.5 2023-09-03 01:45
## 5      0.00    75       88      6.9 2023-09-03 00:45
## 6      0.00     0       39     15.0 2023-09-03 03:45

After that, I was curious and wanted to answer the following question: Which location experienced the highest, and lowest temperature on September 3rd, 2023?

Here is the (further) tidied information which we would need to answer that question:

sept_3_weather_data_temp <- sept_3_weather_data %>%
  select(country, location_name, temperature_fahrenheit, last_updated)

head(sept_3_weather_data_temp)
##       country    location_name temperature_fahrenheit     last_updated
## 1 Afghanistan            Kabul                   62.6 2023-09-03 04:15
## 2     Albania           Tirana                   68.0 2023-09-03 01:45
## 3     Algeria          Algiers                   80.6 2023-09-03 00:30
## 4     Andorra Andorra La Vella                   45.0 2023-09-03 01:45
## 5      Angola           Luanda                   71.6 2023-09-03 00:45
## 6     Armenia          Yerevan                   78.8 2023-09-03 03:45

And here is how I found the highest and lowest temperature globally on this date:

row_with_lowest_temp <- which.min(sept_3_weather_data_temp$temperature_fahrenheit)
lowest_temp_location <- sept_3_weather_data_temp$location_name[row_with_lowest_temp]
lowest_temp_country <- sept_3_weather_data_temp$country[row_with_lowest_temp]
lowest_temp_value <- sept_3_weather_data_temp$temperature_fahrenheit[row_with_lowest_temp]
row_with_highest_temp <- which.max(sept_3_weather_data_temp$temperature_fahrenheit)
highest_temp_location <- sept_3_weather_data_temp$location_name[row_with_highest_temp]
highest_temp_country <- sept_3_weather_data_temp$country[row_with_highest_temp]
highest_temp_value <- sept_3_weather_data_temp$temperature_fahrenheit[row_with_highest_temp]

cat("Location with the lowest temperature:", lowest_temp_location, "\n")
## Location with the lowest temperature: Andorra La Vella
cat("Country:", lowest_temp_country, "\n")
## Country: Andorra
cat("Temperature:", lowest_temp_value, "°F\n\n")
## Temperature: 45 °F
cat("Location with the highest temperature:", highest_temp_location, "\n")
## Location with the highest temperature: Carreria
cat("Country:", highest_temp_country, "\n")
## Country: Paraguay
cat("Temperature:", highest_temp_value, "°F\n")
## Temperature: 99.1 °F

Here we can see that the lowest temperature globally recorded was in Andorra La Vell, Andorra, with a temperature of 45 °F, whereas the highest temperature was found in Carreria, Paraguay, with a temperature of 99.1 °F!