Assignment 1 - Solution
Data Description
The following code generates the dataset required for this assignment. It creates a data frame with 155 rows and 13 columns. The variables of this data frame are as follows:
Date: Date of the reading (in YYYY-MM-DD format)
Temperature: Temperature reading in Celsius
Humidity: Humidity reading as a percentage
Pressure: Atmospheric pressure in millibars
WindSpeed: Wind speed in kilometers per hour
WindDirection: Wind direction
DewPoint: Dew point temperature in Celsius
CloudCover: Cloud cover as a percentage
Precipitation: Precipitation amount in millimeters
Visibility: Visibility distance in kilometers
UVIndex: UV index reading
Condition: The global weather condition over the day
Location: The city of the recorded data
# Set the seed for reproducibility
set.seed(246)
# Create a sequence of dates for few months in 2024
date_March <- seq(from = as.Date("2024-03-01"), to = as.Date("2024-03-31"), by = 1)
date_May <- seq(from = as.Date("2024-05-01"), to = as.Date("2024-05-31"), by = 1)
date_July <- seq(from = as.Date("2024-07-01"), to = as.Date("2024-07-31"), by = 1)
date_October <- seq(from = as.Date("2024-10-01"), to = as.Date("2024-10-31"), by = 1)
date_December <- seq(from = as.Date("2024-12-01"), to = as.Date("2024-12-31"), by = 1)
dates = c(date_March, date_May, date_July, date_October, date_December)
# Determine the correct number of repetitions for Location
cities <- c("Canberra", "Melbourne", "Sydney")
Location <- rep(cities, length.out = length(dates))
# Create a data frame to store the weather data
weather <- data.frame(
Date = dates,
Temperature = round(runif(length(dates), 2, 37), 1),
Humidity = sample(50:100, length(dates), replace = TRUE),
Pressure = sample(995:1015, length(dates), replace = TRUE),
WindSpeed = sample(5:20, length(dates), replace = TRUE),
WindDirection = sample(c("N", "NE", "E", "SE", "S", "SW", "W", "NW"), length(dates), replace = TRUE),
DewPoint = round(runif(length(dates), 10, 15), 1),
CloudCover = sample(0:100, length(dates), replace = TRUE),
Precipitation = round(runif(length(dates), 0, 15), 1),
Visibility = sample(5:20, length(dates), replace = TRUE),
UVIndex = sample(1:12, length(dates), replace = TRUE),
Condition = sample(c("Sunny", "Partly Cloudy", "Rainy", "Snowy"), length(dates), replace = TRUE),
Location = Location
)
Initial Section:
Submission for Part A: Data Understanding
Please follow this structure:
1- The data set description goes here
The dataset contains simulated daily weather readings for three Australian cities (Canberra, Melbourne, and Sydney) over five months of 2024: March, May, July, October, and December. It has 155 rows, one per day, and 13 variables covering temperature, humidity, atmospheric pressure, wind speed and direction, dew point, cloud cover, precipitation, visibility, UV index, the overall weather condition, and the city. Each day is assigned to a single city, with the three cities repeating in turn.
Because the values are generated at random with a fixed seed, the data is best treated as practice material for the kinds of analysis used on real weather records: comparing months and cities, checking relationships between variables (e.g., temperature and humidity), and summarising quantities such as visibility and the UV index.
# Set the seed for reproducibility
set.seed(246)
# Create a sequence of dates for few months in 2024
date_March <- seq(from = as.Date("2024-03-01"), to = as.Date("2024-03-31"), by = 1)
date_May <- seq(from = as.Date("2024-05-01"), to = as.Date("2024-05-31"), by = 1)
date_July <- seq(from = as.Date("2024-07-01"), to = as.Date("2024-07-31"), by = 1)
date_October <- seq(from = as.Date("2024-10-01"), to = as.Date("2024-10-31"), by = 1)
date_December <- seq(from = as.Date("2024-12-01"), to = as.Date("2024-12-31"), by = 1)
dates = c(date_March, date_May, date_July, date_October, date_December)
# Determine the correct number of repetitions for Location
cities <- c("Canberra", "Melbourne", "Sydney")
Location <- rep(cities, length.out = length(dates))
# Create a data frame to store the weather data
weather <- data.frame(
Date = dates,
Temperature = round(runif(length(dates), 2, 37), 1),
Humidity = sample(50:100, length(dates), replace = TRUE),
Pressure = sample(995:1015, length(dates), replace = TRUE),
WindSpeed = sample(5:20, length(dates), replace = TRUE),
WindDirection = sample(c("N", "NE", "E", "SE", "S", "SW", "W", "NW"), length(dates), replace = TRUE),
DewPoint = round(runif(length(dates), 10, 15), 1),
CloudCover = sample(0:100, length(dates), replace = TRUE),
Precipitation = round(runif(length(dates), 0, 15), 1),
Visibility = sample(5:20, length(dates), replace = TRUE),
UVIndex = sample(1:12, length(dates), replace = TRUE),
Condition = sample(c("Sunny", "Partly Cloudy", "Rainy", "Snowy"), length(dates), replace = TRUE),
Location = Location
)
# Summary of the weather dataset
summary(weather)
## Date Temperature Humidity Pressure
## Min. :2024-03-01 Min. : 2.00 Min. : 50.00 Min. : 995
## 1st Qu.:2024-05-08 1st Qu.: 9.20 1st Qu.: 66.00 1st Qu.:1000
## Median :2024-07-16 Median :17.60 Median : 77.00 Median :1004
## Mean :2024-07-28 Mean :18.06 Mean : 76.31 Mean :1004
## 3rd Qu.:2024-10-23 3rd Qu.:26.35 3rd Qu.: 88.00 3rd Qu.:1009
## Max. :2024-12-31 Max. :37.00 Max. :100.00 Max. :1015
## WindSpeed WindDirection DewPoint CloudCover
## Min. : 5.00 Length:155 Min. :10.00 Min. : 0.00
## 1st Qu.: 9.00 Class :character 1st Qu.:11.00 1st Qu.: 24.50
## Median :13.00 Mode :character Median :12.30 Median : 52.00
## Mean :12.44 Mean :12.44 Mean : 51.24
## 3rd Qu.:16.00 3rd Qu.:13.65 3rd Qu.: 78.00
## Max. :20.00 Max. :15.00 Max. :100.00
## Precipitation Visibility UVIndex Condition
## Min. : 0.100 Min. : 5.00 Min. : 1.000 Length:155
## 1st Qu.: 3.850 1st Qu.: 9.00 1st Qu.: 3.000 Class :character
## Median : 7.600 Median :13.00 Median : 6.000 Mode :character
## Mean : 7.408 Mean :12.85 Mean : 6.297
## 3rd Qu.:10.850 3rd Qu.:17.00 3rd Qu.: 9.500
## Max. :14.800 Max. :20.00 Max. :12.000
## Location
## Length:155
## Class :character
## Mode :character
##
##
##
#----------------------------------#
# 3- The code for task 3 goes here
hist(weather$Temperature,
main = "Histogram of Temperature Readings",
xlab = "Temperature (°C)",
col = "blue",
border = "black")
4- The reflection and notes for tasks 2 and 3 go here
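For task 2, the Date column runs from 2024-03-01 to 2024-12-31; by construction, each of the five selected months contributes 31 daily readings.
Temperature: ranges from 2.0 to 37.0 °C, with a mean of 18.06 and a median of 17.60; the middle half of the readings lies between 9.20 and 26.35.
Humidity: ranges from 50% to 100%, with a mean of 76.31 and a median of 77.
Pressure: ranges from 995 to 1015 millibars, with both the mean and the median at about 1004.
Wind Speed/Direction: wind speed ranges from 5 to 20 km/h, with a mean of 12.44; WindDirection is a character column, so only its length (155) and class are reported.
Dew Point, Cloud Cover, Precipitation, Visibility, UV Index: each has a mean close to its median, so none of them shows strong skew.
Condition: stored as a character column, so summary() reports only its length and class. The frequency of each weather type is not shown in this output; obtaining it would need table() or a conversion to factor.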
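The point about character columns can be illustrated with a minimal sketch. It uses a small made-up vector (`condition_example` is hypothetical and is not the assignment data) to show what summary() reports for character data and how table() would give the counts instead:

```r
# Hypothetical stand-in for weather$Condition (not the assignment data)
condition_example <- c("Sunny", "Rainy", "Sunny", "Snowy", "Rainy", "Sunny")

# summary() on a character vector reports only Length / Class / Mode
print(summary(condition_example))

# table() counts how often each distinct value occurs
condition_counts <- table(condition_example)
print(condition_counts)

# Name of the most frequent condition
most_common <- names(condition_counts)[which.max(condition_counts)]
print(most_common)
```

Applied to the real data frame, the same idea would be table(weather$Condition).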
For task 3, the histogram of temperature readings displays the distribution of temperatures:
Shape: the shape of the histogram (normal, skewed, bimodal, and so on) shows how temperatures vary across the dataset. Because the readings are drawn with runif(), the bars should be roughly level between 2 and 37 °C rather than bell-shaped.
Bins: the bin width affects the appearance of the histogram; narrower bins show more detail, while wider bins smooth out variation.
Outliers: isolated bars far from the rest, or gaps between bars, may signal outliers or exceptional temperature events.
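The effect of bin width can be checked numerically without drawing anything. The sketch below is only illustrative: it draws 155 values the same way the assignment code draws Temperature, and the break points (10-degree and 2-degree bins over 0 to 40) are arbitrary choices made for this example.

```r
# Simulated temperatures, drawn the same way as in the assignment code
set.seed(246)
temps <- round(runif(155, 2, 37), 1)

# Compute bin counts without plotting, for two illustrative bin widths
wide_bins   <- hist(temps, breaks = seq(0, 40, by = 10), plot = FALSE)
narrow_bins <- hist(temps, breaks = seq(0, 40, by = 2),  plot = FALSE)

# Few wide bins give a smooth picture; many narrow bins show more detail and more noise
print(wide_bins$counts)
print(narrow_bins$counts)
```

Both sets of counts add up to the same 155 readings; only the grouping changes.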
Submission for part B: Vector and Matrix Manipulation
Please follow this structure:
#----------------------------------#
# 1- The code for task 1 goes here
# Ensure the Date column is in Date format
weather$Date <- as.Date(weather$Date)
# Extract the month and year from the Date
weather$Month <- format(weather$Date, "%Y-%m")
# Calculate average temperatures for each month
average_temperatures <- aggregate(Temperature ~ Month, data = weather, FUN = mean)
# Create a vector containing the average temperature readings
average_temperature_vector <- average_temperatures$Temperature
# Print the average temperature vector
print(average_temperature_vector)
## [1] 18.86774 18.78387 17.02903 18.04839 17.57419
#----------------------------------#
# 2- The code for task 2 goes here
# Calculate average humidity for each city by month
average_humidity <- aggregate(Humidity ~ Month + Location, data = weather, FUN = mean)
# Create separate vectors for each city
average_humidity_canberra <- average_humidity$Humidity[average_humidity$Location == "Canberra"]
average_humidity_melbourne <- average_humidity$Humidity[average_humidity$Location == "Melbourne"]
average_humidity_sydney <- average_humidity$Humidity[average_humidity$Location == "Sydney"]
# Print the vectors
print(average_humidity_canberra)
## [1] 76.45455 71.50000 76.60000 74.72727 73.60000
print(average_humidity_melbourne)
## [1] 81.20000 80.09091 78.10000 78.30000 74.18182
print(average_humidity_sydney)
## [1] 84.00000 74.40000 77.18182 66.00000 78.20000
#----------------------------------#
# 3- The code for task 3 goes here
# Calculate average for each variable by month
average_monthly_data <- aggregate(cbind(Temperature, Humidity, Pressure, WindSpeed) ~ Month, data = weather, FUN = mean)
# Convert to matrix, excluding the Month column
average_monthly_matrix <- as.matrix(average_monthly_data[, -1]) # Exclude the Month column
# Set row names to the months for better readability
rownames(average_monthly_matrix) <- average_monthly_data$Month
# Print the matrix
print(average_monthly_matrix)
## Temperature Humidity Pressure WindSpeed
## 2024-03 18.86774 80.41935 1002.387 13.00000
## 2024-05 18.78387 75.48387 1006.452 12.54839
## 2024-07 17.02903 77.29032 1003.903 12.06452
## 2024-10 18.04839 73.06452 1005.161 12.32258
## 2024-12 17.57419 75.29032 1003.484 12.25806
#----------------------------------#
# 4- The code for task 4 goes here
# Calculate averages for each variable by city
average_city_data <- aggregate(cbind(Temperature, Humidity, Pressure, WindSpeed) ~ Location, data = weather, FUN = mean)
# Convert to matrix, excluding the Location column
average_city_matrix <- as.matrix(average_city_data[, -1]) # Exclude the Location column
# Set row names to the city names for better readability
rownames(average_city_matrix) <- average_city_data$Location
# Print the matrix
print(average_city_matrix)
## Temperature Humidity Pressure WindSpeed
## Canberra 19.07500 74.61538 1003.788 12.86538
## Melbourne 18.23269 78.32692 1005.077 11.82692
## Sydney 16.85098 75.98039 1003.961 12.62745
#----------------------------------#
# 5- The code for task 5 goes here
# Calculate average for each variable by month and city
average_array_data <- aggregate(cbind(Temperature, Humidity, Pressure, WindSpeed, UVIndex) ~ Month + Location, data = weather, FUN = mean)
# Convert to array
# aggregate() returns one row per month within each city, so as.matrix() yields
# values ordered by month, then city, then variable. Fill a
# months x cities x variables array first, then permute it to
# months x variables x cities.
average_array <- aperm(
  array(as.matrix(average_array_data[, -c(1, 2)]),
        dim = c(length(unique(average_array_data$Month)),
                length(unique(average_array_data$Location)),
                ncol(average_array_data) - 2)),
  perm = c(1, 3, 2)
)
# Set dimnames for better readability
dimnames(average_array) <- list(unique(average_array_data$Month),
c("Temperature", "Humidity", "Pressure", "WindSpeed", "UVIndex"),
unique(average_array_data$Location))
# Print the array
print(average_array)
## , , Canberra
##
## Temperature Humidity Pressure WindSpeed UVIndex
## 2024-03 21.15455 76.45455 1002.636 11.54545 7.909091
## 2024-05 17.99000 71.50000 1005.400 15.20000 8.300000
## 2024-07 18.18000 76.60000 1002.800 11.80000 5.200000
## 2024-10 18.31818 74.72727 1005.818 12.90909 6.090909
## 2024-12 19.60000 73.60000 1002.200 13.00000 7.000000
##
## , , Melbourne
##
## Temperature Humidity Pressure WindSpeed UVIndex
## 2024-03 17.53000 81.20000 1002.500 12.60000 5.100000
## 2024-05 18.51818 80.09091 1009.182 10.36364 5.272727
## 2024-07 17.77000 78.10000 1005.000 12.50000 4.500000
## 2024-10 16.29000 78.30000 1004.600 11.70000 6.700000
## 2024-12 20.77273 74.18182 1003.818 12.09091 4.909091
##
## , , Sydney
##
## Temperature Humidity Pressure WindSpeed UVIndex
## 2024-03 17.69000 84.00000 1002.000 15.00000 5.900000
## 2024-05 19.87000 74.40000 1004.500 12.30000 8.400000
## 2024-07 15.30909 77.18182 1003.909 11.90909 6.090909
## 2024-10 19.51000 66.00000 1005.000 12.30000 7.400000
## 2024-12 12.03000 78.20000 1004.400 11.70000 5.800000
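When a long table of group averages is turned into a three-dimensional array, the order in which array() fills its cells matters. The following is a minimal sketch on made-up numbers (the data frame `toy` and its values are hypothetical) showing one way to go from rows ordered by month within city to a months x variables x cities layout using aperm():

```r
# Toy data: 2 months x 2 cities, 2 variables (values are made up)
toy <- data.frame(
  Month       = rep(c("2024-03", "2024-05"), times = 2),
  Location    = rep(c("Canberra", "Sydney"), each = 2),
  Temperature = c(21, 18, 17, 19),
  Humidity    = c(76, 71, 84, 74)
)
vars   <- c("Temperature", "Humidity")
months <- unique(toy$Month)
locs   <- unique(toy$Location)

# The matrix is stored column by column: rows run month-within-city and
# columns are variables, so the natural layout is months x cities x variables
by_city <- array(as.matrix(toy[, vars]),
                 dim = c(length(months), length(locs), length(vars)),
                 dimnames = list(months, locs, vars))

# Swap the second and third dimensions: months x variables x cities
toy_array <- aperm(by_city, c(1, 3, 2))
print(toy_array[, , "Sydney"])
```

A quick sanity check is to look up one known cell, for example toy_array["2024-03", "Humidity", "Sydney"], and compare it with the corresponding row of the long table.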
#----------------------------------#
# 6- The code for task 6 goes here
# Task 1: Calculate average temperatures for each month
weather$Date <- as.Date(weather$Date)
weather$Month <- format(weather$Date, "%Y-%m")
average_temperatures <- aggregate(Temperature ~ Month, data = weather, FUN = mean)
average_temperature_vector <- average_temperatures$Temperature
# Task 3: Calculate average for each variable by month
average_monthly_data <- aggregate(cbind(Temperature, Humidity, Pressure, WindSpeed) ~ Month, data = weather, FUN = mean)
average_monthly_matrix <- as.matrix(average_monthly_data[, -1]) # Exclude the Month column
# Transpose the average_monthly_matrix
transposed_matrix <- t(average_monthly_matrix)
# Perform matrix multiplication
result <- transposed_matrix %*% average_temperature_vector
# Print the result
print(result)
## [,1]
## Temperature 1633.410
## Humidity 6893.254
## Pressure 90690.299
## WindSpeed 1124.263
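To see what this product computes, here is a small sketch with made-up numbers (the matrix `m` is hypothetical). Each entry of the result is the sum, over months, of a variable's monthly average multiplied by that month's average temperature:

```r
# Toy matrix: 3 months (rows) x 2 variables (columns); values are made up
m <- matrix(c(10, 20, 30,    # Temperature by month
              50, 60, 70),   # Humidity by month
            nrow = 3,
            dimnames = list(c("m1", "m2", "m3"), c("Temperature", "Humidity")))
temp_vec <- m[, "Temperature"]

# t(m) is 2 x 3 and temp_vec is treated as a 3 x 1 column, so the product is 2 x 1
product <- t(m) %*% temp_vec

# The same numbers computed entry by entry
manual <- c(sum(m[, "Temperature"] * temp_vec),
            sum(m[, "Humidity"] * temp_vec))
print(product)
print(manual)
```

The transpose is needed because the untransposed matrix has one column per variable, which does not match the length of the monthly temperature vector.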
#----------------------------------#
# 7- The code for task 7 goes here
# Custom function to determine if daily temperature is above or below the monthly average
temperature_status <- function(temperatures, monthly_avg) {
return(ifelse(temperatures > monthly_avg, "a", "b"))
}
# Calculate monthly averages
monthly_avg_data <- aggregate(Temperature ~ Month, data = weather, FUN = mean)
# Initialize a new column in the data frame for status
weather$status <- NA
# Loop through each month to assign status based on average temperature
for (month in unique(weather$Month)) {
# Get the monthly average temperature for the current month
monthly_avg <- monthly_avg_data$Temperature[monthly_avg_data$Month == month]
# Apply the function to the temperatures for the current month
weather$status[weather$Month == month] <- temperature_status(weather$Temperature[weather$Month == month], monthly_avg)
}
# Print the updated data frame
print(head(weather))
## Date Temperature Humidity Pressure WindSpeed WindDirection DewPoint
## 1 2024-03-01 26.6 52 1004 14 SE 12.6
## 2 2024-03-02 9.2 82 1007 19 S 11.7
## 3 2024-03-03 23.0 94 1002 9 E 11.7
## 4 2024-03-04 12.1 53 995 12 SE 11.4
## 5 2024-03-05 15.5 82 995 12 S 11.5
## 6 2024-03-06 24.8 95 1005 20 SE 14.8
## CloudCover Precipitation Visibility UVIndex Condition Location Month
## 1 32 7.2 20 12 Rainy Canberra 2024-03
## 2 27 12.9 15 8 Sunny Melbourne 2024-03
## 3 57 9.9 7 4 Sunny Sydney 2024-03
## 4 30 13.1 12 5 Snowy Canberra 2024-03
## 5 97 9.5 16 2 Rainy Melbourne 2024-03
## 6 56 6.8 18 9 Partly Cloudy Sydney 2024-03
## status
## 1 a
## 2 b
## 3 a
## 4 b
## 5 b
## 6 a
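The month-by-month loop can also be cross-checked with a vectorised approach. This is only an alternative sketch on made-up data (the data frame `toy` is hypothetical), using base R's ave() to attach each row's monthly mean before applying the same "a"/"b" rule:

```r
# Toy data (made-up values): two months with three readings each
toy <- data.frame(
  Month       = rep(c("2024-03", "2024-05"), each = 3),
  Temperature = c(10, 20, 30, 5, 5, 20)
)

# ave() returns, for every row, the mean of that row's Month group
toy$MonthlyAvg <- ave(toy$Temperature, toy$Month)

# Same labelling rule as temperature_status(): "a" if above the average, "b" otherwise
toy$status <- ifelse(toy$Temperature > toy$MonthlyAvg, "a", "b")
print(toy)
```

Note that a reading exactly equal to the monthly average is labelled "b" under this rule, as it is in the loop version.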
Submission for part C: Looping and Conditional Statements
Please follow this structure:
#----------------------------------#
# 1- The code for task 1 goes here
# Initialize a vector to store average pressure readings
average_pressure <- numeric(length(unique(weather$Month)))
# Get unique months
unique_months <- unique(weather$Month)
# Loop through each month to calculate average pressure
for (i in seq_along(unique_months)) {
month <- unique_months[i]
# Calculate average pressure for the current month
avg_pressure <- mean(weather$Pressure[weather$Month == month], na.rm = TRUE)
# Store the result in the vector
average_pressure[i] <- avg_pressure
}
# Set names for the average pressure vector
names(average_pressure) <- unique_months
# Print the average pressure readings
print(average_pressure)
## 2024-03 2024-05 2024-07 2024-10 2024-12
## 1002.387 1006.452 1003.903 1005.161 1003.484
#----------------------------------#
# 2- The code for task 2 goes here
# Filter the data for Sydney and check temperatures
days_above_25_sydney <- sum(weather$Temperature[weather$Location == "Sydney"] > 25)
# Display the result
print(days_above_25_sydney)
## [1] 13
#----------------------------------#
# 3- The code for task 3 goes here
# Initialize variables to calculate the sum of humidity and count of days
humidity_sum <- 0
count_days <- 0
# Loop through the weather data
for (i in seq_len(nrow(weather))) {
if (weather$Location[i] == "Canberra" && weather$Temperature[i] < 21) {
humidity_sum <- humidity_sum + weather$Humidity[i] # Add humidity to the sum
count_days <- count_days + 1 # Increment the count
}
}
# Calculate the average humidity if count_days is greater than 0 to avoid division by zero
if (count_days > 0) {
average_humidity <- humidity_sum / count_days
} else {
average_humidity <- NA # No days found
}
# Print the result
print(paste("Average humidity for days below 21°C in Canberra:", average_humidity))
## [1] "Average humidity for days below 21°C in Canberra: 77.1333333333333"
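The running sum and counter in the loop amount to a conditional mean. As a cross-check, the same quantity can be sketched with logical indexing; the data frame `toy` below is made up for illustration:

```r
# Toy data (made-up values)
toy <- data.frame(
  Location    = c("Canberra", "Canberra", "Sydney", "Canberra"),
  Temperature = c(15, 25, 10, 20),
  Humidity    = c(70, 90, 60, 80)
)

# TRUE for Canberra rows with a temperature below 21 degrees
keep <- toy$Location == "Canberra" & toy$Temperature < 21

# Guard against an empty selection, mirroring the count_days > 0 check
avg_humidity <- if (any(keep)) mean(toy$Humidity[keep]) else NA
print(avg_humidity)
```

Here rows 1 and 4 qualify, so the result is the mean of 70 and 80.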
#----------------------------------#
# 4- The code for task 4 goes here
# Count the number of days with UV index above 7 in Canberra and Sydney
days_uv_above_7 <- sum(weather$UVIndex[weather$Location %in% c("Canberra", "Sydney")] > 7)
# Print the result
print(paste("Number of days with UV index above 7 in Canberra and Sydney:", days_uv_above_7))
## [1] "Number of days with UV index above 7 in Canberra and Sydney: 48"
#----------------------------------#
# 5- The code for task 5 goes here
#Get the unique months and cities
unique_months <- unique(weather$Month)
cities <- c("Canberra", "Melbourne", "Sydney")
#Initialize a matrix to store total precipitation
precipitation_matrix <- matrix(0, nrow = length(unique_months), ncol = length(cities))
#Set row names as months and column names as cities
rownames(precipitation_matrix) <- unique_months
colnames(precipitation_matrix) <- cities
#Loop through each month and each city to calculate total precipitation
for (i in seq_along(unique_months)) {
for (j in seq_along(cities)) {
# Filter for the current month and city
monthly_data <- weather[weather$Month == unique_months[i] & weather$Location == cities[j], ]
# Calculate the total precipitation for that month and city
total_precipitation <- sum(monthly_data$Precipitation, na.rm = TRUE)
# Store the result in the matrix
precipitation_matrix[i, j] <- total_precipitation
}
}
#Print the precipitation matrix
print(precipitation_matrix)
## Canberra Melbourne Sydney
## 2024-03 71.1 70.4 91.1
## 2024-05 70.6 95.9 59.9
## 2024-07 46.4 90.8 94.8
## 2024-10 93.2 58.7 82.4
## 2024-12 71.3 73.0 78.7
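A month-by-city table of totals like this one can be cross-checked without explicit loops. The sketch below uses made-up data (the data frame `toy` is hypothetical) and base R's tapply() with two grouping vectors:

```r
# Toy data (made-up values)
toy <- data.frame(
  Month         = c("2024-03", "2024-03", "2024-03", "2024-05", "2024-05"),
  Location      = c("Canberra", "Sydney", "Canberra", "Canberra", "Sydney"),
  Precipitation = c(1.5, 2.0, 0.5, 3.0, 4.0)
)

# Rows are months, columns are cities, cells are summed precipitation
totals <- tapply(toy$Precipitation, list(toy$Month, toy$Location), sum)
print(totals)
```

One difference from the loop version: tapply() leaves NA in a cell when a month and city combination has no rows, whereas the loop starts from a matrix of zeros.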
Submission for part D: Data Frame Manipulation
#----------------------------------#
# 1- The code for task 1 goes here
# Load necessary library
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Specify the file names (assuming they're in the working directory)
file_names <- c("201808.csv", "201809.csv", "201810.csv", "201811.csv", "201812.csv")
# Import the CSV files into a list of data frames, skipping the first 7 rows
weather_data_list <- lapply(file_names, function(file) {
read_csv(file, skip = 7) # Read each file and skip the first 7 rows
})
## Rows: 31 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Date, Direction of maximum wind gust, 9am wind direction, 9am win...
## dbl (13): Minimum temperature, Maximum temperature, Rainfall (mm), Speed of...
## lgl (2): Evaporation (mm), Sunshine (hours)
## time (1): Time of maximum wind gust
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 30 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Date, Direction of maximum wind gust, 9am wind direction, 9am win...
## dbl (13): Minimum temperature, Maximum temperature, Rainfall (mm), Speed of...
## lgl (2): Evaporation (mm), Sunshine (hours)
## time (1): Time of maximum wind gust
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 31 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Date, Direction of maximum wind gust, 9am wind direction, 9am win...
## dbl (13): Minimum temperature, Maximum temperature, Rainfall (mm), Speed of...
## lgl (2): Evaporation (mm), Sunshine (hours)
## time (1): Time of maximum wind gust
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 30 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Date, Direction of maximum wind gust, 9am wind direction, 9am win...
## dbl (13): Minimum temperature, Maximum temperature, Rainfall (mm), Speed of...
## lgl (2): Evaporation (mm), Sunshine (hours)
## time (1): Time of maximum wind gust
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 31 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Date, Direction of maximum wind gust, 9am wind direction, 9am win...
## dbl (13): Minimum temperature, Maximum temperature, Rainfall (mm), Speed of...
## lgl (2): Evaporation (mm), Sunshine (hours)
## time (1): Time of maximum wind gust
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Combine all data frames into one
combined_weather_data <- bind_rows(weather_data_list)
# Display the combined data frame
print(combined_weather_data)
## # A tibble: 153 × 21
## Date `Minimum temperature` `Maximum temperature` `Rainfall (mm)`
## <chr> <dbl> <dbl> <dbl>
## 1 1/08/2018 7.6 15.4 0
## 2 2/08/2018 -3.8 14.3 0
## 3 3/08/2018 -3.6 19.5 0
## 4 4/08/2018 3.7 12.8 13.8
## 5 5/08/2018 -1 15 0
## 6 6/08/2018 1.2 13.7 0
## 7 7/08/2018 2.4 9.7 6.6
## 8 8/08/2018 2.6 12.1 0
## 9 9/08/2018 1.6 13.7 0
## 10 10/08/2018 -2.5 15.6 0.2
## # ℹ 143 more rows
## # ℹ 17 more variables: `Evaporation (mm)` <lgl>, `Sunshine (hours)` <lgl>,
## # `Direction of maximum wind gust` <chr>,
## # `Speed of maximum wind gust (km/h)` <dbl>,
## # `Time of maximum wind gust` <time>, `9am Temperature` <dbl>,
## # `9am relative humidity (%)` <dbl>, `9am cloud amount (oktas)` <dbl>,
## # `9am wind direction` <chr>, `9am wind speed (km/h)` <chr>, …
#----------------------------------#
# 2- The code for task 2 goes here
# Check the dimensions of the combined data frame
dimensions <- dim(combined_weather_data)
# Display the dimensions
print(dimensions)
## [1] 153 21
#----------------------------------#
# 3- The code for task 3 goes here
# Loop through each column in the combined data frame
for (col_name in names(combined_weather_data)) {
cat("Column:", col_name, "\n") # Print the column name
cat("Structure:\n")
str(combined_weather_data[[col_name]]) # Check the structure of the column
cat("Summary:\n")
print(summary(combined_weather_data[[col_name]])) # Get the summary of the column
cat("\n") # Add a line break for better readability
}
## Column: Date
## Structure:
## chr [1:153] "1/08/2018" "2/08/2018" "3/08/2018" "4/08/2018" "5/08/2018" ...
## Summary:
## Length Class Mode
## 153 character character
##
## Column: Minimum temperature
## Structure:
## num [1:153] 7.6 -3.8 -3.6 3.7 -1 1.2 2.4 2.6 1.6 -2.5 ...
## Summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -6.400 2.200 6.500 6.829 11.400 17.800
##
## Column: Maximum temperature
## Structure:
## num [1:153] 15.4 14.3 19.5 12.8 15 13.7 9.7 12.1 13.7 15.6 ...
## Summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.70 15.60 21.90 21.69 26.40 36.80
##
## Column: Rainfall (mm)
## Structure:
## num [1:153] 0 0 0 13.8 0 0 6.6 0 0 0.2 ...
## Summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.000 0.000 1.766 0.200 33.200
##
## Column: Evaporation (mm)
## Structure:
## logi [1:153] NA NA NA NA NA NA ...
## Summary:
## Mode NA's
## logical 153
##
## Column: Sunshine (hours)
## Structure:
## logi [1:153] NA NA NA NA NA NA ...
## Summary:
## Mode NA's
## logical 153
##
## Column: Direction of maximum wind gust
## Structure:
## chr [1:153] "NW" "NNW" "NW" "NNW" "NW" "NW" "WNW" "WNW" "NNW" "NNW" "NW" ...
## Summary:
## Length Class Mode
## 153 character character
##
## Column: Speed of maximum wind gust (km/h)
## Structure:
## num [1:153] 54 26 72 54 43 61 61 70 30 43 ...
## Summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 22.00 37.00 44.00 45.23 52.00 81.00
##
## Column: Time of maximum wind gust
## Structure:
## 'hms' num [1:153] 01:38:00 12:23:00 20:56:00 04:50:00 ...
## - attr(*, "units")= chr "secs"
## Summary:
## Length Class1 Class2 Mode
## 153 hms difftime numeric
##
## Column: 9am Temperature
## Structure:
## num [1:153] 10.9 3.3 3.7 7.8 5 9.2 4.9 6.5 6.3 0.5 ...
## Summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.50 9.60 13.60 13.43 17.00 29.50
##
## Column: 9am relative humidity (%)
## Structure:
## num [1:153] 54 87 84 77 85 72 90 65 94 99 ...
## Summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 27.00 54.00 65.00 64.05 72.00 99.00
##
## Column: 9am cloud amount (oktas)
## Structure:
## num [1:153] NA 1 NA 8 NA 8 5 8 7 8 ...
## Summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.000 3.250 8.000 5.977 8.000 8.000 67
##
## Column: 9am wind direction
## Structure:
## chr [1:153] "WNW" NA NA "NNW" NA "N" "NW" "NW" "SSW" NA "SE" "W" "NNW" ...
## Summary:
## Length Class Mode
## 153 character character
##
## Column: 9am wind speed (km/h)
## Structure:
## chr [1:153] "22" "Calm" "Calm" "24" "Calm" "28" "30" "30" "9" "Calm" "7" ...
## Summary:
## Length Class Mode
## 153 character character
##
## Column: 9am MSL pressure (hPa)
## Structure:
## num [1:153] 1019 1027 1018 1017 1018 ...
## Summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 997.7 1013.9 1018.2 1017.4 1021.8 1031.2
##
## Column: 3pm Temperature
## Structure:
## num [1:153] 14.7 13.7 17.9 11.6 14.6 10.9 8.8 9.9 12.9 14.9 ...
## Summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.40 14.10 19.80 19.97 24.80 36.20
##
## Column: 3pm relative humidity (%)
## Structure:
## num [1:153] 32 43 35 52 48 44 54 60 50 42 ...
## Summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 12.00 27.00 37.00 39.49 48.00 99.00
##
## Column: 3pm cloud amount (oktas)
## Structure:
## num [1:153] NA NA 4 5 NA 8 NA 8 NA NA ...
## Summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.000 2.000 6.000 5.239 8.000 8.000 61
##
## Column: 3pm wind direction
## Structure:
## chr [1:153] "NW" "NE" "NNW" "WNW" "NW" "WNW" "WNW" "WNW" "N" "NW" "NNW" ...
## Summary:
## Length Class Mode
## 153 character character
##
## Column: 3pm wind speed (km/h)
## Structure:
## num [1:153] 19 9 39 22 24 31 44 35 15 28 ...
## Summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.00 15.00 22.00 21.84 28.00 44.00
##
## Column: 3pm MSL pressure (hPa)
## Structure:
## num [1:153] 1019 1022 1010 1016 1012 ...
## Summary:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 996.7 1010.7 1015.1 1014.4 1018.6 1027.5
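The structure output shows two columns read as character: Date (day/month/year strings) and 9am wind speed, which mixes numbers with the text "Calm". A possible follow-up is sketched below on a few values copied from that output. Recoding "Calm" as 0 km/h is an assumption made for this illustration, not something the assignment specifies.

```r
# Example values copied from the printed Date and 9am wind speed columns
date_chr <- c("1/08/2018", "2/08/2018", "10/08/2018")
wind_chr <- c("22", "Calm", "7")

# Parse day/month/year strings into Date objects
date_parsed <- as.Date(date_chr, format = "%d/%m/%Y")
print(date_parsed)

# Assumption: treat "Calm" as 0 km/h before converting to numeric
wind_num <- as.numeric(ifelse(wind_chr == "Calm", "0", wind_chr))
print(wind_num)
```

If "Calm" should instead be treated as missing, replacing "0" with NA in the ifelse() call would give that behaviour.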
#----------------------------------#
# 4- The code for task 4 goes here
# Load dplyr if not already loaded
library(dplyr)
# Remove columns that contain only NA values
cleaned_data <- combined_weather_data %>%
select(where(~ !all(is.na(.))))
# Check the dimensions of the cleaned data frame
dim(cleaned_data)
## [1] 153 19
## # A tibble: 153 × 19
## Date `Minimum temperature` `Maximum temperature` `Rainfall (mm)`
## <chr> <dbl> <dbl> <dbl>
## 1 1/08/2018 7.6 15.4 0
## 2 2/08/2018 -3.8 14.3 0
## 3 3/08/2018 -3.6 19.5 0
## 4 4/08/2018 3.7 12.8 13.8
## 5 5/08/2018 -1 15 0
## 6 6/08/2018 1.2 13.7 0
## 7 7/08/2018 2.4 9.7 6.6
## 8 8/08/2018 2.6 12.1 0
## 9 9/08/2018 1.6 13.7 0
## 10 10/08/2018 -2.5 15.6 0.2
## # ℹ 143 more rows
## # ℹ 15 more variables: `Direction of maximum wind gust` <chr>,
## # `Speed of maximum wind gust (km/h)` <dbl>,
## # `Time of maximum wind gust` <time>, `9am Temperature` <dbl>,
## # `9am relative humidity (%)` <dbl>, `9am cloud amount (oktas)` <dbl>,
## # `9am wind direction` <chr>, `9am wind speed (km/h)` <chr>,
## # `9am MSL pressure (hPa)` <dbl>, `3pm Temperature` <dbl>, …
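Dropping the columns that contain only NA can also be done in base R. The sketch below is an equivalent under that assumption, run on a made-up data frame (`toy` is hypothetical): it counts the non-missing values per column and keeps the columns with at least one.

```r
# Toy data: column b contains only NA (values are made up)
toy <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, NA),
  c = c("x", NA, "z")
)

# Number of non-missing values per column; keep columns with at least one
keep_cols <- colSums(!is.na(toy)) > 0
cleaned_toy <- toy[, keep_cols, drop = FALSE]
print(names(cleaned_toy))
```

Columns with some missing values, such as a and c here, are kept; only the entirely empty column is removed.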
#----------------------------------#
# 5- The code for task 5 goes here
# Change column names to replace spaces or dots with underscores
colnames(cleaned_data) <- gsub("[ .]", "_", colnames(cleaned_data))
# Display the updated column names
print(colnames(cleaned_data))
## [1] "Date" "Minimum_temperature"
## [3] "Maximum_temperature" "Rainfall_(mm)"
## [5] "Direction_of_maximum_wind_gust" "Speed_of_maximum_wind_gust_(km/h)"
## [7] "Time_of_maximum_wind_gust" "9am_Temperature"
## [9] "9am_relative_humidity_(%)" "9am_cloud_amount_(oktas)"
## [11] "9am_wind_direction" "9am_wind_speed_(km/h)"
## [13] "9am_MSL_pressure_(hPa)" "3pm_Temperature"
## [15] "3pm_relative_humidity_(%)" "3pm_cloud_amount_(oktas)"
## [17] "3pm_wind_direction" "3pm_wind_speed_(km/h)"
## [19] "3pm_MSL_pressure_(hPa)"
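The renamed columns still contain parentheses, slashes, and percent signs, because the pattern only targets spaces and dots. If stricter names were wanted, one possible sketch (an alternative, not what the task requires) is to collapse every run of non-alphanumeric characters into a single underscore and then trim underscores from the ends. The three names below are taken from the original column list.

```r
# A few of the original column names
old_names <- c("Rainfall (mm)",
               "Speed of maximum wind gust (km/h)",
               "9am relative humidity (%)")

# Replace each run of characters other than letters and digits with one underscore
new_names <- gsub("[^A-Za-z0-9]+", "_", old_names)

# Remove any leading or trailing underscores left behind
new_names <- gsub("^_+|_+$", "", new_names)
print(new_names)
```

Names that begin with a digit, such as the 9am and 3pm columns, would still need backticks when used with `$`.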
#----------------------------------#
# 6- The code for task 6 goes here
#----------------------------------#
# 7- The code for task 7 goes here
# Load the readr library if not already loaded
library(readr)
# Define the output file path
output_file_path <- "C:/Users/pengw/OneDrive/Desktop/introdution to data sicence/data_for_part_D/data_for_part_D.csv"
# Save the cleaned data frame to a CSV file
write_csv(cleaned_data, output_file_path)
# Print a message to confirm saving
cat("Combined weather data saved to:", output_file_path, "\n")
## Combined weather data saved to: C:/Users/pengw/OneDrive/Desktop/introdution to data sicence/data_for_part_D/data_for_part_D.csv
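An absolute path like the one above only works on the machine where it was written. A more portable pattern is sketched below. It is illustrative only: it writes a small made-up data frame to a temporary directory with base R's write.csv(), builds the path with file.path(), and reads the file back to confirm the save.

```r
# Build the output path from parts; tempdir() stands in for a project folder here
out_file <- file.path(tempdir(), "data_for_part_D.csv")

# Small made-up data frame standing in for cleaned_data
toy <- data.frame(Date = c("1/08/2018", "2/08/2018"),
                  Rainfall = c(0, 13.8))

# Save without row names, then read the file back as a check
write.csv(toy, out_file, row.names = FALSE)
check <- read.csv(out_file, stringsAsFactors = FALSE)
print(check)
```

In a real project, tempdir() would be replaced by a folder relative to the working directory.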
Overall Conclusion
Your overall reflection about this assignment goes here …