Assignment 1 - Solution


Data Description

The following code generates the required dataset for this assignment. It creates a data frame with 155 rows and 13 columns. The variables of this data frame are as follows:

  • Date: Date of the reading (in YYYY-MM-DD format)
  • Temperature: Temperature reading in Celsius
  • Humidity: Humidity reading as a percentage
  • Pressure: Atmospheric pressure in millibars
  • WindSpeed: Wind speed in kilometers per hour
  • WindDirection: Wind direction as one of eight compass points (N, NE, E, SE, S, SW, W, NW)
  • DewPoint: Dew point temperature in Celsius
  • CloudCover: Cloud cover as a percentage
  • Precipitation: Precipitation amount in millimeters
  • Visibility: Visibility distance in kilometers
  • UVIndex: UV index reading
  • Condition: The overall weather condition for the day (Sunny, Partly Cloudy, Rainy or Snowy)
  • Location: The city of the recorded data
# Set the seed for reproducibility
set.seed(246)

# Create a sequence of dates for five months in 2024
date_March <- seq(from = as.Date("2024-03-01"), to = as.Date("2024-03-31"), by = 1)
date_May <- seq(from = as.Date("2024-05-01"), to = as.Date("2024-05-31"), by = 1)
date_July <- seq(from = as.Date("2024-07-01"), to = as.Date("2024-07-31"), by = 1)
date_October <- seq(from = as.Date("2024-10-01"), to = as.Date("2024-10-31"), by = 1)
date_December <- seq(from = as.Date("2024-12-01"), to = as.Date("2024-12-31"), by = 1)

dates = c(date_March, date_May, date_July, date_October, date_December)

# Determine the correct number of repetitions for Location
cities <- c("Canberra", "Melbourne", "Sydney")
Location <- rep(cities, length.out = length(dates))

# Create a data frame to store the weather data
weather <- data.frame(
  Date = dates,
  Temperature = round(runif(length(dates), 2, 37), 1),
  Humidity = sample(50:100, length(dates), replace = TRUE),
  Pressure = sample(995:1015, length(dates), replace = TRUE),
  WindSpeed = sample(5:20, length(dates), replace = TRUE),
  WindDirection = sample(c("N", "NE", "E", "SE", "S", "SW", "W", "NW"), length(dates), replace = TRUE),
  DewPoint = round(runif(length(dates), 10, 15), 1),
  CloudCover = sample(0:100, length(dates), replace = TRUE),
  Precipitation = round(runif(length(dates), 0, 15), 1),
  Visibility = sample(5:20, length(dates), replace = TRUE),
  UVIndex = sample(1:12, length(dates), replace = TRUE),
  Condition = sample(c("Sunny", "Partly Cloudy", "Rainy", "Snowy"), length(dates), replace = TRUE),
  Location = Location
)
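As a quick sanity check on the generator, a sketch like the following can confirm that the five chosen months contribute 31 days each (155 rows in total) and show how the recycled `Location` vector is split between the cities. It rebuilds only `dates` and `Location`, neither of which draws random numbers.

```r
# Rebuild the dates and Location columns exactly as the generator does
dates <- c(
  seq(from = as.Date("2024-03-01"), to = as.Date("2024-03-31"), by = 1),
  seq(from = as.Date("2024-05-01"), to = as.Date("2024-05-31"), by = 1),
  seq(from = as.Date("2024-07-01"), to = as.Date("2024-07-31"), by = 1),
  seq(from = as.Date("2024-10-01"), to = as.Date("2024-10-31"), by = 1),
  seq(from = as.Date("2024-12-01"), to = as.Date("2024-12-31"), by = 1)
)
Location <- rep(c("Canberra", "Melbourne", "Sydney"), length.out = length(dates))

# Days per month: all five chosen months have 31 days
days_per_month <- table(format(dates, "%Y-%m"))
print(days_per_month)

# Rows per city: 155 is not a multiple of 3, so one city gets one fewer row
rows_per_city <- table(Location)
print(rows_per_city)
```

Because the cities are recycled day by day, each date carries a reading for only one city, and Canberra and Melbourne end up with 52 rows each while Sydney has 51.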

Initial Section:

Submission for Part A: Data Understanding

Please follow this structure:


1- The data set description goes here

The dataset contains daily weather data for three Australian cities (Canberra, Melbourne and Sydney) over five months of 2024 (March, May, July, October and December). It has 155 rows and 13 variables covering temperature, humidity, atmospheric pressure, wind speed and direction, dew point, cloud cover, precipitation, visibility, UV index and the overall weather condition, giving a detailed daily weather record. Each date has a single reading, and the city rotates from one day to the next.

The data can serve as a resource for analysing seasonal trends, weather patterns and local climate variation, for building predictive models for weather forecasting, for finding correlations between variables (e.g., temperature and humidity), and for exploring factors that affect visibility and the UV index. Note, however, that every reading is drawn independently at random with runif() and sample(), so any patterns found here reflect the simulation rather than real Australian weather.


#----------------------------------#

# 2-  The code for task 2 goes here

# Summary of the weather dataset
summary(weather)
##       Date             Temperature       Humidity         Pressure   
##  Min.   :2024-03-01   Min.   : 2.00   Min.   : 50.00   Min.   : 995  
##  1st Qu.:2024-05-08   1st Qu.: 9.20   1st Qu.: 66.00   1st Qu.:1000  
##  Median :2024-07-16   Median :17.60   Median : 77.00   Median :1004  
##  Mean   :2024-07-28   Mean   :18.06   Mean   : 76.31   Mean   :1004  
##  3rd Qu.:2024-10-23   3rd Qu.:26.35   3rd Qu.: 88.00   3rd Qu.:1009  
##  Max.   :2024-12-31   Max.   :37.00   Max.   :100.00   Max.   :1015  
##    WindSpeed     WindDirection         DewPoint       CloudCover    
##  Min.   : 5.00   Length:155         Min.   :10.00   Min.   :  0.00  
##  1st Qu.: 9.00   Class :character   1st Qu.:11.00   1st Qu.: 24.50  
##  Median :13.00   Mode  :character   Median :12.30   Median : 52.00  
##  Mean   :12.44                      Mean   :12.44   Mean   : 51.24  
##  3rd Qu.:16.00                      3rd Qu.:13.65   3rd Qu.: 78.00  
##  Max.   :20.00                      Max.   :15.00   Max.   :100.00  
##  Precipitation      Visibility       UVIndex        Condition        
##  Min.   : 0.100   Min.   : 5.00   Min.   : 1.000   Length:155        
##  1st Qu.: 3.850   1st Qu.: 9.00   1st Qu.: 3.000   Class :character  
##  Median : 7.600   Median :13.00   Median : 6.000   Mode  :character  
##  Mean   : 7.408   Mean   :12.85   Mean   : 6.297                     
##  3rd Qu.:10.850   3rd Qu.:17.00   3rd Qu.: 9.500                     
##  Max.   :14.800   Max.   :20.00   Max.   :12.000                     
##    Location        
##  Length:155        
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
#----------------------------------#

# 3-  The code for task 3 goes here

hist(weather$Temperature, 
     main = "Histogram of Temperature Readings", 
     xlab = "Temperature (°C)", 
     col = "blue", 
     border = "black")

#----------------------------------#

4- The reflection and notes for tasks 2 and 3 go here

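For task 2, the dates run from 2024-03-01 to 2024-12-31. All five selected months (March, May, July, October and December) have 31 days, so each contributes 31 of the 155 rows.

Temperature: readings range from 2.00 to 37.00 °C, with a median of 17.60 °C and a mean of 18.06 °C; the middle half of the values lies between 9.20 °C (1st quartile) and 26.35 °C (3rd quartile).

Humidity: values range from 50% to 100%, with a median of 77% and a mean of 76.31%.

Pressure: readings range from 995 to 1015 millibars, with the median and the (rounded) mean both at 1004 millibars.

Wind Speed/Direction: wind speed ranges from 5 to 20 km/h (median 13, mean 12.44). WindDirection is a character column, so summary() reports only its length (155), class and mode.

Dew Point, Cloud Cover, Precipitation, Visibility, UV Index: dew point ranges from 10.00 to 15.00 °C (mean 12.44), cloud cover from 0% to 100% (mean 51.24%), precipitation from 0.1 to 14.8 mm (mean 7.408 mm), visibility from 5 to 20 km (mean 12.85 km) and the UV index from 1 to 12 (mean 6.297). Each variable describes a different aspect of the daily weather.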

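Condition: because Condition is stored as a character vector, summary() reports only its length (155), class and mode, not the frequency of each weather type; the same applies to Location. Counting the categories requires table(weather$Condition).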
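For character columns such as `Condition`, `summary()` reports only length, class and mode. A minimal sketch of obtaining actual frequencies with `table()`, shown on a small hypothetical vector rather than the real column:

```r
# Hypothetical stand-in for weather$Condition
condition <- c("Sunny", "Rainy", "Sunny", "Snowy", "Partly Cloudy", "Sunny", "Rainy")

# summary() on a character vector: length, class and mode only
print(summary(condition))

# table() counts each category; prop.table() turns counts into proportions
condition_counts <- table(condition)
print(condition_counts)
print(round(prop.table(condition_counts), 2))

# Converting to a factor makes summary() report the counts as well
print(summary(factor(condition)))
```

On the real data the equivalent call would be `table(weather$Condition)`.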

For task 3, the histogram of temperature readings shows the distribution of temperatures visually.

Shape: the shape of the histogram (normal, skewed, bimodal, and so on) shows how temperatures vary across the dataset. Since Temperature is generated with runif(length(dates), 2, 37), the bars should be roughly level between 2 °C and 37 °C rather than bell-shaped.

Bins: the choice of bin width changes how the histogram looks; narrower bins show more detail, while wider bins smooth out variation.

Outliers: isolated bars far from the rest, or unusually tall or short bars, may point to outliers or exceptional temperature events.
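To see the bin-width effect in numbers, `hist(..., plot = FALSE)` returns the bin counts without drawing anything. This sketch assumes the temperatures can be reproduced by calling `set.seed(246)` and then `runif(155, 2, 37)`, because `Temperature` is the first random column the generator creates; treat that as an assumption rather than a guarantee.

```r
# Assumption: Temperature is the first random draw after set.seed(246)
set.seed(246)
temperature <- round(runif(155, 2, 37), 1)

# Wide bins (10 degrees) versus narrow bins (2 degrees) over the same range
wide <- hist(temperature, breaks = seq(0, 40, by = 10), plot = FALSE)
narrow <- hist(temperature, breaks = seq(0, 40, by = 2), plot = FALSE)

print(wide$counts)    # 4 bins: a coarse, smoothed view
print(narrow$counts)  # 20 bins: more detail, but also more noise
```

Calling `plot(wide)` or `plot(narrow)` draws the corresponding histogram for a visual comparison.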


Submission for part B: Vector and Matrix Manipulation

Please follow this structure:

#----------------------------------#

# 1-  The code for task 1 goes here
# Ensure the Date column is in Date format
weather$Date <- as.Date(weather$Date)

# Extract the month and year from the Date
weather$Month <- format(weather$Date, "%Y-%m")

# Calculate average temperatures for each month
average_temperatures <- aggregate(Temperature ~ Month, data = weather, FUN = mean)

# Create a vector containing the average temperature readings
average_temperature_vector <- average_temperatures$Temperature

# Print the average temperature vector
print(average_temperature_vector)
## [1] 18.86774 18.78387 17.02903 18.04839 17.57419
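The vector printed above has lost its month labels. A minimal alternative, sketched on a small hypothetical data frame, is `tapply()`, which returns the monthly means as a named vector in one step:

```r
# Small hypothetical stand-in for the weather data frame
toy <- data.frame(
  Month = c("2024-03", "2024-03", "2024-05", "2024-05", "2024-07"),
  Temperature = c(20.0, 16.0, 18.5, 19.5, 12.0)
)

# Group means; the result is named by month
monthly_means <- tapply(toy$Temperature, toy$Month, mean)
print(monthly_means)

# Same numbers from aggregate(), with the labels re-attached via setNames()
via_aggregate <- aggregate(Temperature ~ Month, data = toy, FUN = mean)
print(setNames(via_aggregate$Temperature, via_aggregate$Month))
```

On the real data this would be `tapply(weather$Temperature, weather$Month, mean)`.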
#----------------------------------#

# 2-  The code for task 2 goes here
# Calculate average humidity for each city by month
average_humidity <- aggregate(Humidity ~ Month + Location, data = weather, FUN = mean)

# Create separate vectors for each city
average_humidity_canberra <- average_humidity$Humidity[average_humidity$Location == "Canberra"]
average_humidity_melbourne <- average_humidity$Humidity[average_humidity$Location == "Melbourne"]
average_humidity_sydney <- average_humidity$Humidity[average_humidity$Location == "Sydney"]

# Print the vectors
print(average_humidity_canberra)
## [1] 76.45455 71.50000 76.60000 74.72727 73.60000
print(average_humidity_melbourne)
## [1] 81.20000 80.09091 78.10000 78.30000 74.18182
print(average_humidity_sydney)
## [1] 84.00000 74.40000 77.18182 66.00000 78.20000
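Instead of three separate vectors, the same month-by-city averages can be laid out as one matrix by giving `tapply()` two grouping variables. The data frame below is a hypothetical stand-in:

```r
toy <- data.frame(
  Month    = c("2024-03", "2024-03", "2024-03", "2024-05", "2024-05", "2024-05"),
  Location = c("Canberra", "Melbourne", "Sydney", "Canberra", "Melbourne", "Sydney"),
  Humidity = c(70, 80, 90, 60, 75, 85)
)

# Rows are months, columns are cities
humidity_by_month_city <- tapply(toy$Humidity, list(toy$Month, toy$Location), mean)
print(humidity_by_month_city)

# One column corresponds to one of the per-city vectors
print(humidity_by_month_city[, "Canberra"])
```

A single matrix keeps the month and city labels together, which makes the per-city vectors easier to compare.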
#----------------------------------#

# 3-  The code for task 3 goes here
# Calculate average for each variable by month
average_monthly_data <- aggregate(cbind(Temperature, Humidity, Pressure, WindSpeed) ~ Month, data = weather, FUN = mean)

# Convert to matrix, excluding the Month column
average_monthly_matrix <- as.matrix(average_monthly_data[, -1])  # Exclude the Month column

# Set row names to the months for better readability
rownames(average_monthly_matrix) <- average_monthly_data$Month

# Print the matrix
print(average_monthly_matrix)
##         Temperature Humidity Pressure WindSpeed
## 2024-03    18.86774 80.41935 1002.387  13.00000
## 2024-05    18.78387 75.48387 1006.452  12.54839
## 2024-07    17.02903 77.29032 1003.903  12.06452
## 2024-10    18.04839 73.06452 1005.161  12.32258
## 2024-12    17.57419 75.29032 1003.484  12.25806
#----------------------------------#

# 4-  The code for task 4 goes here
# Calculate averages for each variable by city
average_city_data <- aggregate(cbind(Temperature, Humidity, Pressure, WindSpeed) ~ Location, data = weather, FUN = mean)

# Convert to matrix, excluding the Location column
average_city_matrix <- as.matrix(average_city_data[, -1])  # Exclude the Location column

# Set row names to the city names for better readability
rownames(average_city_matrix) <- average_city_data$Location

# Print the matrix
print(average_city_matrix)
##           Temperature Humidity Pressure WindSpeed
## Canberra     19.07500 74.61538 1003.788  12.86538
## Melbourne    18.23269 78.32692 1005.077  11.82692
## Sydney       16.85098 75.98039 1003.961  12.62745
#----------------------------------#

# 5-  The code for task 5 goes here
# Calculate average for each variable by month and city
average_array_data <- aggregate(cbind(Temperature, Humidity, Pressure, WindSpeed, UVIndex) ~ Month + Location, data = weather, FUN = mean)

# Convert to array
# The dimensions of the array will be: months x variables x cities.
# aggregate() returns one row per (Month, Location) pair, so the rows must be
# placed slice by slice; filling the array straight from the 15 x 5 matrix
# scatters the columns across the wrong variables and cities.
array_months <- unique(average_array_data$Month)
array_cities <- unique(average_array_data$Location)
array_vars <- c("Temperature", "Humidity", "Pressure", "WindSpeed", "UVIndex")

# Create an empty array with readable dimnames
average_array <- array(NA_real_,
                       dim = c(length(array_months), length(array_vars), length(array_cities)),
                       dimnames = list(array_months, array_vars, array_cities))

# Fill one city slice at a time, matching rows by month name
for (city in array_cities) {
  city_rows <- average_array_data[average_array_data$Location == city, ]
  average_array[as.character(city_rows$Month), , city] <- as.matrix(city_rows[, array_vars])
}

# Print the array
print(average_array)
## , , Canberra
## 
##         Temperature Humidity Pressure WindSpeed  UVIndex
## 2024-03    21.15455 76.45455 1002.636  11.54545 7.909091
## 2024-05    17.99000 71.50000 1005.400  15.20000 8.300000
## 2024-07    18.18000 76.60000 1002.800  11.80000 5.200000
## 2024-10    18.31818 74.72727 1005.818  12.90909 6.090909
## 2024-12    19.60000 73.60000 1002.200  13.00000 7.000000
## 
## , , Melbourne
## 
##         Temperature Humidity Pressure WindSpeed  UVIndex
## 2024-03    17.53000 81.20000 1002.500  12.60000 5.100000
## 2024-05    18.51818 80.09091 1009.182  10.36364 5.272727
## 2024-07    17.77000 78.10000 1005.000  12.50000 4.500000
## 2024-10    16.29000 78.30000 1004.600  11.70000 6.700000
## 2024-12    20.77273 74.18182 1003.818  12.09091 4.909091
## 
## , , Sydney
## 
##         Temperature Humidity Pressure WindSpeed  UVIndex
## 2024-03    17.69000 84.00000 1002.000  15.00000 5.900000
## 2024-05    19.87000 74.40000 1004.500  12.30000 8.400000
## 2024-07    15.30909 77.18182 1003.909  11.90909 6.090909
## 2024-10    19.51000 66.00000 1005.000  12.30000 7.400000
## 2024-12    12.03000 78.20000 1004.400  11.70000 5.800000
#----------------------------------#

# 6-  The code for task 6 goes here
# Task 1: Calculate average temperatures for each month
weather$Date <- as.Date(weather$Date)
weather$Month <- format(weather$Date, "%Y-%m")
average_temperatures <- aggregate(Temperature ~ Month, data = weather, FUN = mean)
average_temperature_vector <- average_temperatures$Temperature

# Task 3: Calculate average for each variable by month
average_monthly_data <- aggregate(cbind(Temperature, Humidity, Pressure, WindSpeed) ~ Month, data = weather, FUN = mean)
average_monthly_matrix <- as.matrix(average_monthly_data[, -1])  # Exclude the Month column

# Transpose the average_monthly_matrix
transposed_matrix <- t(average_monthly_matrix)

# Perform matrix multiplication
result <- transposed_matrix %*% average_temperature_vector

# Print the result
print(result)
##                  [,1]
## Temperature  1633.410
## Humidity     6893.254
## Pressure    90690.299
## WindSpeed    1124.263
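Each entry of `result` is a weighted sum: for variable j it adds up, over the five months i, the monthly average M[i, j] multiplied by that month's average temperature v[i], i.e. r_j = sum_i M_ij v_i. A small sketch with hypothetical numbers confirms this reading of `t(M) %*% v`:

```r
# Hypothetical 2-month x 2-variable matrix and a 2-element temperature vector
M <- matrix(c(10, 20,    # Temperature column
              50, 60),   # Humidity column
            nrow = 2,
            dimnames = list(c("2024-03", "2024-05"), c("Temperature", "Humidity")))
v <- c(10, 20)

# Matrix product: (variables x months) %*% (months x 1) -> variables x 1
product <- t(M) %*% v
print(product)

# The same numbers as explicit weighted sums over months
# (M * v multiplies row i of M by v[i] because v recycles down each column)
print(colSums(M * v))
```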
#----------------------------------#

# 7-  The code for task 7 goes here
# Custom function to determine if daily temperature is above or below the monthly average
# ("a" = above the monthly average, "b" = at or below it)
temperature_status <- function(temperatures, monthly_avg) {
  return(ifelse(temperatures > monthly_avg, "a", "b"))
}
# Calculate monthly averages
monthly_avg_data <- aggregate(Temperature ~ Month, data = weather, FUN = mean)
# Initialize a new column in the data frame for status
weather$status <- NA

# Loop through each month to assign status based on average temperature
for (month in unique(weather$Month)) {
  # Get the monthly average temperature for the current month
  monthly_avg <- monthly_avg_data$Temperature[monthly_avg_data$Month == month]
  
  # Apply the function to the temperatures for the current month
  weather$status[weather$Month == month] <- temperature_status(weather$Temperature[weather$Month == month], monthly_avg)
}
# Print the updated data frame
print(head(weather))
##         Date Temperature Humidity Pressure WindSpeed WindDirection DewPoint
## 1 2024-03-01        26.6       52     1004        14            SE     12.6
## 2 2024-03-02         9.2       82     1007        19             S     11.7
## 3 2024-03-03        23.0       94     1002         9             E     11.7
## 4 2024-03-04        12.1       53      995        12            SE     11.4
## 5 2024-03-05        15.5       82      995        12             S     11.5
## 6 2024-03-06        24.8       95     1005        20            SE     14.8
##   CloudCover Precipitation Visibility UVIndex     Condition  Location   Month
## 1         32           7.2         20      12         Rainy  Canberra 2024-03
## 2         27          12.9         15       8         Sunny Melbourne 2024-03
## 3         57           9.9          7       4         Sunny    Sydney 2024-03
## 4         30          13.1         12       5         Snowy  Canberra 2024-03
## 5         97           9.5         16       2         Rainy Melbourne 2024-03
## 6         56           6.8         18       9 Partly Cloudy    Sydney 2024-03
##   status
## 1      a
## 2      b
## 3      a
## 4      b
## 5      b
## 6      a
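The loop over months can also be replaced by `ave()`, which returns each row's group mean aligned with the original rows. This sketch uses a hypothetical mini data frame and keeps the same labels as `temperature_status()`: "a" for above the monthly average and "b" otherwise.

```r
toy <- data.frame(
  Month = c("2024-03", "2024-03", "2024-03", "2024-05", "2024-05"),
  Temperature = c(10, 20, 30, 15, 25)
)

# ave() gives the monthly mean repeated for every row of that month
toy$monthly_avg <- ave(toy$Temperature, toy$Month, FUN = mean)

# "a" = above the monthly average, "b" = at or below it
toy$status <- ifelse(toy$Temperature > toy$monthly_avg, "a", "b")
print(toy)
```

A day exactly equal to its monthly average is labelled "b", which matches the `>` comparison in the original function.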
#----------------------------------#

Submission for part C: Looping and Conditional Statements

Please follow this structure:

#----------------------------------#

# 1-  The code for task 1 goes here
# Initialize a vector to store average pressure readings
average_pressure <- numeric(length(unique(weather$Month)))

# Get unique months
unique_months <- unique(weather$Month)

# Loop through each month to calculate average pressure
for (i in seq_along(unique_months)) {
  month <- unique_months[i]
  
  # Calculate average pressure for the current month
  avg_pressure <- mean(weather$Pressure[weather$Month == month], na.rm = TRUE)
  
  # Store the result in the vector
  average_pressure[i] <- avg_pressure
}

# Set names for the average pressure vector
names(average_pressure) <- unique_months

# Print the average pressure readings
print(average_pressure)
##  2024-03  2024-05  2024-07  2024-10  2024-12 
## 1002.387 1006.452 1003.903 1005.161 1003.484
#----------------------------------#

# 2-  The code for task 2 goes here

# Filter the data for Sydney and check temperatures
days_above_25_sydney <- sum(weather$Temperature[weather$Location == "Sydney"] > 25)

# Display the result
print(days_above_25_sydney)
## [1] 13
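Since part C is about loops, the same count can also be written with an explicit `for` loop and an `if` condition. The sketch below runs on a hypothetical mini data frame; note that `> 25` excludes days of exactly 25 °C.

```r
toy <- data.frame(
  Location = c("Sydney", "Canberra", "Sydney", "Sydney", "Melbourne"),
  Temperature = c(26.1, 30.0, 25.0, 31.4, 27.2)
)

# Explicit loop with a conditional: count Sydney days strictly above 25
days_above_25 <- 0
for (i in seq_len(nrow(toy))) {
  if (toy$Location[i] == "Sydney" && toy$Temperature[i] > 25) {
    days_above_25 <- days_above_25 + 1
  }
}
print(days_above_25)

# Vectorised form, as in the solution above
print(sum(toy$Temperature[toy$Location == "Sydney"] > 25))
```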
#----------------------------------#

# 3-  The code for task 3 goes here

# Initialize variables to calculate the sum of humidity and count of days
humidity_sum <- 0
count_days <- 0

# Loop through the weather data
for (i in 1:nrow(weather)) {
    if (weather$Location[i] == "Canberra" && weather$Temperature[i] < 21) {
        humidity_sum <- humidity_sum + weather$Humidity[i]  # Add humidity to the sum
        count_days <- count_days + 1  # Increment the count
    }
}

# Calculate the average humidity if count_days is greater than 0 to avoid division by zero
if (count_days > 0) {
    average_humidity <- humidity_sum / count_days
} else {
    average_humidity <- NA  # No days found
}

# Print the result
print(paste("Average humidity for days below 21°C in Canberra:", average_humidity))
## [1] "Average humidity for days below 21°C in Canberra: 77.1333333333333"
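The loop above can be cross-checked with a single vectorised expression: subset the humidity values with a logical condition and take their mean. `mean()` of an empty vector returns `NaN`, which plays the role of the `NA` fallback. Hypothetical data:

```r
toy <- data.frame(
  Location = c("Canberra", "Canberra", "Sydney", "Canberra"),
  Temperature = c(15.0, 22.0, 10.0, 20.9),
  Humidity = c(80, 60, 90, 70)
)

# Logical mask: Canberra days below 21 degrees
below_21_canberra <- toy$Location == "Canberra" & toy$Temperature < 21
avg_humidity <- mean(toy$Humidity[below_21_canberra])
print(avg_humidity)

# With no matching rows, mean() returns NaN instead of failing
print(mean(toy$Humidity[toy$Location == "Perth"]))
```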
#----------------------------------#

# 4-  The code for task 4 goes here
# Count the number of days with UV index above 7 in Canberra and Sydney
days_uv_above_7 <- sum(weather$UVIndex[weather$Location %in% c("Canberra", "Sydney")] > 7)

# Print the result
print(paste("Number of days with UV index above 7 in Canberra and Sydney:", days_uv_above_7))
## [1] "Number of days with UV index above 7 in Canberra and Sydney: 48"
#----------------------------------#

# 5-  The code for task 5 goes here

# Get the unique months and cities
unique_months <- unique(weather$Month)
cities <- c("Canberra", "Melbourne", "Sydney")

# Initialize a matrix to store total precipitation
precipitation_matrix <- matrix(0, nrow = length(unique_months), ncol = length(cities))

# Set row names as months and column names as cities
rownames(precipitation_matrix) <- unique_months
colnames(precipitation_matrix) <- cities

# Loop through each month and each city to calculate total precipitation
for (i in seq_along(unique_months)) {
  for (j in seq_along(cities)) {
    # Filter for the current month and city
    monthly_data <- weather[weather$Month == unique_months[i] & weather$Location == cities[j], ]

    # Calculate the total precipitation for that month and city
    total_precipitation <- sum(monthly_data$Precipitation, na.rm = TRUE)

    # Store the result in the matrix
    precipitation_matrix[i, j] <- total_precipitation
  }
}

# Print the precipitation matrix
print(precipitation_matrix)
##         Canberra Melbourne Sydney
## 2024-03     71.1      70.4   91.1
## 2024-05     70.6      95.9   59.9
## 2024-07     46.4      90.8   94.8
## 2024-10     93.2      58.7   82.4
## 2024-12     71.3      73.0   78.7
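As a cross-check on the nested loops, `xtabs()` with a numeric left-hand side sums that variable over every combination of the grouping variables, producing the same month-by-city layout in one call. Hypothetical data:

```r
toy <- data.frame(
  Month = c("2024-03", "2024-03", "2024-03", "2024-05", "2024-05"),
  Location = c("Canberra", "Canberra", "Sydney", "Melbourne", "Sydney"),
  Precipitation = c(1.5, 2.5, 4.0, 3.0, 0.5)
)

# Sum of Precipitation for each Month x Location cell (empty cells become 0)
precip_table <- xtabs(Precipitation ~ Month + Location, data = toy)
print(precip_table)
```

On the real data, `xtabs(Precipitation ~ Month + Location, data = weather)` should reproduce `precipitation_matrix`.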
#----------------------------------#
Submission for part D: Data Frame Manipulation



#----------------------------------#

# 1-  The code for task 1 goes here

# Load necessary library
library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Specify the file names (assuming they're in the working directory)
file_names <- c("201808.csv", "201809.csv", "201810.csv", "201811.csv", "201812.csv")

# Import the CSV files into a list of data frames, skipping the first 7 rows
weather_data_list <- lapply(file_names, function(file) {
  read_csv(file, skip = 7)  # Read each file and skip the first 7 rows
})
## Rows: 31 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (5): Date, Direction of maximum wind gust, 9am wind direction, 9am win...
## dbl  (13): Minimum temperature, Maximum temperature, Rainfall (mm), Speed of...
## lgl   (2): Evaporation (mm), Sunshine (hours)
## time  (1): Time of maximum wind gust
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 30 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (5): Date, Direction of maximum wind gust, 9am wind direction, 9am win...
## dbl  (13): Minimum temperature, Maximum temperature, Rainfall (mm), Speed of...
## lgl   (2): Evaporation (mm), Sunshine (hours)
## time  (1): Time of maximum wind gust
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 31 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (5): Date, Direction of maximum wind gust, 9am wind direction, 9am win...
## dbl  (13): Minimum temperature, Maximum temperature, Rainfall (mm), Speed of...
## lgl   (2): Evaporation (mm), Sunshine (hours)
## time  (1): Time of maximum wind gust
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 30 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (5): Date, Direction of maximum wind gust, 9am wind direction, 9am win...
## dbl  (13): Minimum temperature, Maximum temperature, Rainfall (mm), Speed of...
## lgl   (2): Evaporation (mm), Sunshine (hours)
## time  (1): Time of maximum wind gust
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 31 Columns: 21
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (5): Date, Direction of maximum wind gust, 9am wind direction, 9am win...
## dbl  (13): Minimum temperature, Maximum temperature, Rainfall (mm), Speed of...
## lgl   (2): Evaporation (mm), Sunshine (hours)
## time  (1): Time of maximum wind gust
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Combine all data frames into one
combined_weather_data <- bind_rows(weather_data_list)

# Display the combined data frame
print(combined_weather_data)
## # A tibble: 153 × 21
##    Date       `Minimum temperature` `Maximum temperature` `Rainfall (mm)`
##    <chr>                      <dbl>                 <dbl>           <dbl>
##  1 1/08/2018                    7.6                  15.4             0  
##  2 2/08/2018                   -3.8                  14.3             0  
##  3 3/08/2018                   -3.6                  19.5             0  
##  4 4/08/2018                    3.7                  12.8            13.8
##  5 5/08/2018                   -1                    15               0  
##  6 6/08/2018                    1.2                  13.7             0  
##  7 7/08/2018                    2.4                   9.7             6.6
##  8 8/08/2018                    2.6                  12.1             0  
##  9 9/08/2018                    1.6                  13.7             0  
## 10 10/08/2018                  -2.5                  15.6             0.2
## # ℹ 143 more rows
## # ℹ 17 more variables: `Evaporation (mm)` <lgl>, `Sunshine (hours)` <lgl>,
## #   `Direction of maximum wind gust` <chr>,
## #   `Speed of maximum wind gust (km/h)` <dbl>,
## #   `Time of maximum wind gust` <time>, `9am Temperature` <dbl>,
## #   `9am relative humidity (%)` <dbl>, `9am cloud amount (oktas)` <dbl>,
## #   `9am wind direction` <chr>, `9am wind speed (km/h)` <chr>, …
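The import step relies on each file having exactly seven preamble lines before the header row. A base-R sketch of the same read-and-stack pattern, using two small temporary files with hypothetical content written by a hypothetical helper `make_file()`, shows what `skip = 7` does and adds a column recording which file each row came from (the solution above uses readr and dplyr instead):

```r
# Hypothetical helper: write a small CSV with 7 preamble lines before the header
make_file <- function(rows) {
  path <- tempfile(fileext = ".csv")
  preamble <- paste("Preamble line", 1:7)
  writeLines(c(preamble, "Date,Rainfall", rows), path)
  path
}
files <- c(make_file(c("1/08/2018,0", "2/08/2018,1.2")),
           make_file(c("1/09/2018,0.4")))

# skip = 7 drops the preamble, so line 8 becomes the header
data_list <- lapply(seq_along(files), function(k) {
  df <- read.csv(files[k], skip = 7)
  df$source_file <- basename(files[k])
  df
})

# Stack the per-file data frames into one
combined <- do.call(rbind, data_list)
print(combined)
```

With the real files, the row counts add up the same way: 31 + 30 + 31 + 30 + 31 = 153.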
#----------------------------------#

# 2-  The code for task 2 goes here
# Check the dimensions of the combined data frame
dimensions <- dim(combined_weather_data)

# Display the dimensions
print(dimensions)
## [1] 153  21
#----------------------------------#

# 3-  The code for task 3 goes here
# Loop through each column in the combined data frame
for (col_name in names(combined_weather_data)) {
  cat("Column:", col_name, "\n")  # Print the column name
  cat("Structure:\n")
  str(combined_weather_data[[col_name]])  # Check the structure of the column
  cat("Summary:\n")
  print(summary(combined_weather_data[[col_name]]))  # Get the summary of the column
  cat("\n")  # Add a line break for better readability
}
## Column: Date 
## Structure:
##  chr [1:153] "1/08/2018" "2/08/2018" "3/08/2018" "4/08/2018" "5/08/2018" ...
## Summary:
##    Length     Class      Mode 
##       153 character character 
## 
## Column: Minimum temperature 
## Structure:
##  num [1:153] 7.6 -3.8 -3.6 3.7 -1 1.2 2.4 2.6 1.6 -2.5 ...
## Summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -6.400   2.200   6.500   6.829  11.400  17.800 
## 
## Column: Maximum temperature 
## Structure:
##  num [1:153] 15.4 14.3 19.5 12.8 15 13.7 9.7 12.1 13.7 15.6 ...
## Summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9.70   15.60   21.90   21.69   26.40   36.80 
## 
## Column: Rainfall (mm) 
## Structure:
##  num [1:153] 0 0 0 13.8 0 0 6.6 0 0 0.2 ...
## Summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   1.766   0.200  33.200 
## 
## Column: Evaporation (mm) 
## Structure:
##  logi [1:153] NA NA NA NA NA NA ...
## Summary:
##    Mode    NA's 
## logical     153 
## 
## Column: Sunshine (hours) 
## Structure:
##  logi [1:153] NA NA NA NA NA NA ...
## Summary:
##    Mode    NA's 
## logical     153 
## 
## Column: Direction of maximum wind gust 
## Structure:
##  chr [1:153] "NW" "NNW" "NW" "NNW" "NW" "NW" "WNW" "WNW" "NNW" "NNW" "NW" ...
## Summary:
##    Length     Class      Mode 
##       153 character character 
## 
## Column: Speed of maximum wind gust (km/h) 
## Structure:
##  num [1:153] 54 26 72 54 43 61 61 70 30 43 ...
## Summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   22.00   37.00   44.00   45.23   52.00   81.00 
## 
## Column: Time of maximum wind gust 
## Structure:
##  'hms' num [1:153] 01:38:00 12:23:00 20:56:00 04:50:00 ...
##  - attr(*, "units")= chr "secs"
## Summary:
##   Length   Class1   Class2     Mode 
##      153      hms difftime  numeric 
## 
## Column: 9am Temperature 
## Structure:
##  num [1:153] 10.9 3.3 3.7 7.8 5 9.2 4.9 6.5 6.3 0.5 ...
## Summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.50    9.60   13.60   13.43   17.00   29.50 
## 
## Column: 9am relative humidity (%) 
## Structure:
##  num [1:153] 54 87 84 77 85 72 90 65 94 99 ...
## Summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   27.00   54.00   65.00   64.05   72.00   99.00 
## 
## Column: 9am cloud amount (oktas) 
## Structure:
##  num [1:153] NA 1 NA 8 NA 8 5 8 7 8 ...
## Summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   3.250   8.000   5.977   8.000   8.000      67 
## 
## Column: 9am wind direction 
## Structure:
##  chr [1:153] "WNW" NA NA "NNW" NA "N" "NW" "NW" "SSW" NA "SE" "W" "NNW" ...
## Summary:
##    Length     Class      Mode 
##       153 character character 
## 
## Column: 9am wind speed (km/h) 
## Structure:
##  chr [1:153] "22" "Calm" "Calm" "24" "Calm" "28" "30" "30" "9" "Calm" "7" ...
## Summary:
##    Length     Class      Mode 
##       153 character character 
## 
## Column: 9am MSL pressure (hPa) 
## Structure:
##  num [1:153] 1019 1027 1018 1017 1018 ...
## Summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   997.7  1013.9  1018.2  1017.4  1021.8  1031.2 
## 
## Column: 3pm Temperature 
## Structure:
##  num [1:153] 14.7 13.7 17.9 11.6 14.6 10.9 8.8 9.9 12.9 14.9 ...
## Summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.40   14.10   19.80   19.97   24.80   36.20 
## 
## Column: 3pm relative humidity (%) 
## Structure:
##  num [1:153] 32 43 35 52 48 44 54 60 50 42 ...
## Summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.00   27.00   37.00   39.49   48.00   99.00 
## 
## Column: 3pm cloud amount (oktas) 
## Structure:
##  num [1:153] NA NA 4 5 NA 8 NA 8 NA NA ...
## Summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   2.000   6.000   5.239   8.000   8.000      61 
## 
## Column: 3pm wind direction 
## Structure:
##  chr [1:153] "NW" "NE" "NNW" "WNW" "NW" "WNW" "WNW" "WNW" "N" "NW" "NNW" ...
## Summary:
##    Length     Class      Mode 
##       153 character character 
## 
## Column: 3pm wind speed (km/h) 
## Structure:
##  num [1:153] 19 9 39 22 24 31 44 35 15 28 ...
## Summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.00   15.00   22.00   21.84   28.00   44.00 
## 
## Column: 3pm MSL pressure (hPa) 
## Structure:
##  num [1:153] 1019 1022 1010 1016 1012 ...
## Summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   996.7  1010.7  1015.1  1014.4  1018.6  1027.5
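The structure check shows two columns that probably need converting before analysis: `Date` is character in day/month/year form (e.g. "1/08/2018"), and `9am wind speed (km/h)` is character because calm mornings are recorded as "Calm". A sketch on hypothetical vectors, assuming "Calm" should be treated as 0 km/h:

```r
date_chr <- c("1/08/2018", "15/09/2018", "31/12/2018")
wind_chr <- c("22", "Calm", "9")

# %d/%m/%Y accepts single-digit days such as "1/08/2018"
dates <- as.Date(date_chr, format = "%d/%m/%Y")
print(dates)

# Assumption: "Calm" means 0 km/h; everything else is parsed as a number
wind_kmh <- as.numeric(ifelse(wind_chr == "Calm", "0", wind_chr))
print(wind_kmh)
```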
#----------------------------------#

# 4-  The code for task 4 goes here
# Load dplyr if not already loaded
library(dplyr)

# Remove columns that contain only NA values
cleaned_data <- combined_weather_data %>%
  select(where(~ !all(is.na(.))))

# Check the dimensions of the cleaned data frame
dim(cleaned_data)
## [1] 153  19
# Display the cleaned data frame
print(cleaned_data)
## # A tibble: 153 × 19
##    Date       `Minimum temperature` `Maximum temperature` `Rainfall (mm)`
##    <chr>                      <dbl>                 <dbl>           <dbl>
##  1 1/08/2018                    7.6                  15.4             0  
##  2 2/08/2018                   -3.8                  14.3             0  
##  3 3/08/2018                   -3.6                  19.5             0  
##  4 4/08/2018                    3.7                  12.8            13.8
##  5 5/08/2018                   -1                    15               0  
##  6 6/08/2018                    1.2                  13.7             0  
##  7 7/08/2018                    2.4                   9.7             6.6
##  8 8/08/2018                    2.6                  12.1             0  
##  9 9/08/2018                    1.6                  13.7             0  
## 10 10/08/2018                  -2.5                  15.6             0.2
## # ℹ 143 more rows
## # ℹ 15 more variables: `Direction of maximum wind gust` <chr>,
## #   `Speed of maximum wind gust (km/h)` <dbl>,
## #   `Time of maximum wind gust` <time>, `9am Temperature` <dbl>,
## #   `9am relative humidity (%)` <dbl>, `9am cloud amount (oktas)` <dbl>,
## #   `9am wind direction` <chr>, `9am wind speed (km/h)` <chr>,
## #   `9am MSL pressure (hPa)` <dbl>, `3pm Temperature` <dbl>, …
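The same filter can be written in base R: `colSums(!is.na(df)) > 0` is `TRUE` for every column that has at least one non-missing value. A hypothetical example:

```r
toy <- data.frame(
  Rainfall = c(0, 1.2, NA),
  Evaporation = c(NA, NA, NA),
  Sunshine = NA,
  Humidity = c(54, 87, 84)
)

# Keep columns with at least one non-missing value
keep <- colSums(!is.na(toy)) > 0
cleaned_toy <- toy[, keep, drop = FALSE]
print(names(cleaned_toy))
```

`drop = FALSE` keeps the result a data frame even if only one column survives.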
#----------------------------------#
# 5-  The code for task 5 goes here

# Change column names to replace spaces or dots with underscores
colnames(cleaned_data) <- gsub("[ .]", "_", colnames(cleaned_data))

# Display the updated column names
print(colnames(cleaned_data))
##  [1] "Date"                              "Minimum_temperature"              
##  [3] "Maximum_temperature"               "Rainfall_(mm)"                    
##  [5] "Direction_of_maximum_wind_gust"    "Speed_of_maximum_wind_gust_(km/h)"
##  [7] "Time_of_maximum_wind_gust"         "9am_Temperature"                  
##  [9] "9am_relative_humidity_(%)"         "9am_cloud_amount_(oktas)"         
## [11] "9am_wind_direction"                "9am_wind_speed_(km/h)"            
## [13] "9am_MSL_pressure_(hPa)"            "3pm_Temperature"                  
## [15] "3pm_relative_humidity_(%)"         "3pm_cloud_amount_(oktas)"         
## [17] "3pm_wind_direction"                "3pm_wind_speed_(km/h)"            
## [19] "3pm_MSL_pressure_(hPa)"
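Replacing only spaces and dots still leaves parentheses, `%` and `/` in the names, so columns such as `Rainfall_(mm)` must be wrapped in backticks. A stricter sketch, using a hypothetical helper `clean_names()`, collapses every run of non-alphanumeric characters into one underscore and trims leading and trailing underscores; names that start with a digit (e.g. `9am_...`) still need backticks unless passed through `make.names()`.

```r
# Hypothetical helper for stricter column-name cleaning
clean_names <- function(x) {
  x <- gsub("[^A-Za-z0-9]+", "_", x)  # any run of other characters -> "_"
  gsub("^_+|_+$", "", x)              # drop leading/trailing underscores
}

original <- c("Rainfall (mm)", "9am relative humidity (%)", "Speed of maximum wind gust (km/h)")
print(clean_names(original))

# make.names() additionally prefixes "X" to names starting with a digit
print(make.names(clean_names(original)))
```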
#----------------------------------#


# 6-  The code for task 6 goes here





#----------------------------------#

# 7-  The code for task 7 goes here
# Load the readr library if not already loaded
library(readr)

# Define the output file path (an absolute, machine-specific path; adjust before running elsewhere)
output_file_path <- "C:/Users/pengw/OneDrive/Desktop/introdution to data sicence/data_for_part_D/data_for_part_D.csv"


# Save the cleaned data frame to a CSV file
write_csv(cleaned_data, output_file_path)

# Print a message to confirm saving
cat("Combined weather data saved to:", output_file_path, "\n")
## Combined weather data saved to: C:/Users/pengw/OneDrive/Desktop/introdution to data sicence/data_for_part_D/data_for_part_D.csv
#----------------------------------#

Overall Conclusion

Your overall reflection about this assignment goes here …