Question: What are the long-term trends and patterns in the water level of Lake Huron from 1875 to 1972, and is there any evidence of significant changes in water levels over this period?
In this final project I will utilize the skills I have learned over the past 3 weeks.
I will do data exploration by reading data from a file and using the summary() function to explore the Lake Huron dataset.
For data wrangling, I will first create a subset of the original data, then change the names of the columns and replace some values in one column.
I will create four charts to display the data graphically.
Conclusion: Looking at the data on the charts, over the span of the 97 years water levels have fluctuated in a decreasing trend after hitting a maximum in 1876. Water levels were mostly below the median from the early 1900s to the early 1970s, hitting an all-time low for the period in 1964.
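To put a rough number on the decreasing trend described above, here is a minimal sketch that fits an ordinary least-squares line to the series. It assumes the copy of the data that ships with base R as the `LakeHuron` time series (annual levels in feet, 1875 to 1972), so it runs without downloading anything. Year-to-year levels are strongly autocorrelated, so the OLS p-value is optimistic. Treat it as indicative only, not as a formal test.

```r
# Base R ships the same series as the ts object `LakeHuron` (datasets package)
lake <- data.frame(
  year  = as.numeric(time(LakeHuron)),  # 1875, 1876, ..., 1972
  level = as.numeric(LakeHuron)         # annual water level in feet
)

# Ordinary least-squares trend: level = intercept + slope * year
trend <- lm(level ~ year, data = lake)
slope_per_year <- coef(trend)[["year"]]
p_value <- summary(trend)$coefficients["year", "Pr(>|t|)"]

cat("Estimated change per decade (ft):", round(10 * slope_per_year, 3), "\n")
cat("OLS p-value for the slope (assumes independent errors):", signif(p_value, 3), "\n")
```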
library(dplyr)
library(ggplot2)
library(plotly)
water_data <- read.csv('https://vincentarelbundock.github.io/Rdatasets/csv/datasets/LakeHuron.csv')
summary(water_data)
## X time value
## Min. : 1.00 Min. :1875 Min. :576.0
## 1st Qu.:25.25 1st Qu.:1899 1st Qu.:578.1
## Median :49.50 Median :1924 Median :579.1
## Mean :49.50 Mean :1924 Mean :579.0
## 3rd Qu.:73.75 3rd Qu.:1948 3rd Qu.:579.9
## Max. :98.00 Max. :1972 Max. :581.9
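The Rdatasets CSV is a copy of R's built-in `LakeHuron` series. A minimal sketch can therefore rebuild the same `time`/`value` layout locally, which is handy when working offline. The name `water_data_local` is illustrative only, and the CSV's `X` row-index column is dropped here.

```r
# Rebuild the CSV's time/value columns from the built-in ts object
water_data_local <- data.frame(
  time  = as.integer(time(LakeHuron)),  # years 1875-1972
  value = as.numeric(LakeHuron)         # water level in feet
)

summary(water_data_local)
```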
head(water_data)
## X time value
## 1 1 1875 580.38
## 2 2 1876 581.86
## 3 3 1877 580.97
## 4 4 1878 580.80
## 5 5 1879 579.79
## 6 6 1880 580.39
# Calculate the number of years water level was at its lowest and highest
lowest_water_level <- min(water_data$value)
highest_water_level <- max(water_data$value)
years_lowest <- water_data[water_data$value == lowest_water_level, ]$time
years_highest <- water_data[water_data$value == highest_water_level, ]$time
num_years_lowest <- length(unique(years_lowest))
num_years_highest <- length(unique(years_highest))
# Print the results
cat('Number of years water level was at its lowest:', num_years_lowest, "\n")
## Number of years water level was at its lowest: 1
cat("Number of years water level was at its highest:", num_years_highest, "\n")
## Number of years water level was at its highest: 1
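The counts above show that each extreme occurs in only one year, but not which year. A small sketch using base R's `which.min()` and `which.max()` reports the year directly. It rebuilds the data from the built-in `LakeHuron` series under the illustrative name `lh`, so it runs standalone without touching `water_data`.

```r
# Standalone copy of the data with the same column names as the CSV
lh <- data.frame(time = as.integer(time(LakeHuron)),
                 value = as.numeric(LakeHuron))

# which.min()/which.max() return the row index of the first minimum/maximum
year_lowest  <- lh$time[which.min(lh$value)]
year_highest <- lh$time[which.max(lh$value)]

cat("Lowest level:", min(lh$value), "in", year_lowest, "\n")
cat("Highest level:", max(lh$value), "in", year_highest, "\n")
```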
# Plot scatter plot
ggplot(water_data, aes(x = time, y = value)) +
  geom_point() +
  labs(title = "Water Level Scatter Plot",
       x = "Year",
       y = "Water Level")
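# Plot box plot: one box per decade (each year has a single reading,
# so a box per year would collapse to a single line)
ggplot(water_data, aes(x = factor(10 * (time %/% 10)), y = value)) +
  geom_boxplot() +
  labs(title = "Water Level Box Plot by Decade",
       x = "Decade",
       y = "Water Level")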
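Each year contributes only one reading, so grouping by decade is one way to summarize how the spread shifts over time. Below is a minimal base-R sketch with `aggregate()`. The decade bucketing (`10 * (year %/% 10)`) is an illustrative choice, not part of the original analysis.

```r
lake <- data.frame(year = as.integer(time(LakeHuron)),
                   level = as.numeric(LakeHuron))

# Bucket years into decades: 1875 -> 1870, 1972 -> 1970
lake$decade <- 10L * (lake$year %/% 10L)

# Median, minimum and maximum water level within each decade
decade_summary <- aggregate(level ~ decade, data = lake,
                            FUN = function(x) c(median = median(x),
                                                min = min(x),
                                                max = max(x)))
print(decade_summary)
```

`aggregate()` stores the three statistics as a matrix column, so an individual value is read as, for example, `decade_summary$level[, "median"]`.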
# Plot histogram
ggplot(water_data, aes(x = value)) +
  geom_histogram(bins = 20, fill = "green", color = "blue") +
  labs(title = "Water Level Histogram",
       x = "Water Level",
       y = "Frequency")
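The conclusion's point about levels sitting below the median can be checked numerically. This sketch compares the share of years below the overall median before and after 1900. The 1900 cut-off echoes the "early 1900s" wording of the conclusion and is otherwise an arbitrary assumption.

```r
lake <- data.frame(year = as.integer(time(LakeHuron)),
                   level = as.numeric(LakeHuron))

overall_median <- median(lake$level)
lake$below_median <- lake$level < overall_median

# mean() of a logical vector gives the proportion of TRUE values
share_before_1900 <- mean(lake$below_median[lake$year < 1900])
share_from_1900   <- mean(lake$below_median[lake$year >= 1900])

cat("Overall median (ft):", overall_median, "\n")
cat("Share of years below median, 1875-1899:", round(share_before_1900, 2), "\n")
cat("Share of years below median, 1900-1972:", round(share_from_1900, 2), "\n")
```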
# Create an interactive plot
plot <- plot_ly(water_data, x = ~time, y = ~value, type = 'scatter', mode = 'lines',
                line = list(color = 'blue')) %>%
  layout(title = "Water Level of Lake Huron (1875 - 1972)",
         xaxis = list(title = "Year"),
         yaxis = list(title = "Water Level"))
# Display the interactive plot
plot
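To separate the long-term pattern from year-to-year noise, one common option is a centered moving average. The sketch below calls `stats::filter()` with its namespace, because `dplyr` and `plotly` both mask `filter` once loaded. The 11-year window is an assumption chosen so that the window centers exactly on each year.

```r
lake <- LakeHuron  # built-in annual ts, 1875-1972

# Centered 11-year moving average; the first and last 5 values are NA
# because the window runs past the ends of the series there
smooth_11yr <- stats::filter(lake, filter = rep(1 / 11, 11), sides = 2)

plot(lake, col = "grey60", xlab = "Year", ylab = "Water Level (ft)",
     main = "Lake Huron with 11-year moving average")
lines(smooth_11yr, col = "blue", lwd = 2)
```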
water_level_subset <- water_data[ , c("time", "value")] # creating a subset
head(water_level_subset, n=5)
## time value
## 1 1875 580.38
## 2 1876 581.86
## 3 1877 580.97
## 4 1878 580.80
## 5 1879 579.79
water_level_ch <- water_level_subset
colnames(water_level_ch) <- c("Year", "Water_lvl")
head(water_level_ch, n=5)
## Year Water_lvl
## 1 1875 580.38
## 2 1876 581.86
## 3 1877 580.97
## 4 1878 580.80
## 5 1879 579.79
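The subset and the rename can also be done in one step with `dplyr::select()`, which accepts `new_name = old_name` pairs. This is a minimal sketch, assuming `dplyr` is installed. It rebuilds the raw data from the built-in series under the illustrative name `lh_raw`, so it runs on its own.

```r
library(dplyr)

# Same three columns as the downloaded CSV
lh_raw <- data.frame(X = seq_along(LakeHuron),
                     time = as.integer(time(LakeHuron)),
                     value = as.numeric(LakeHuron))

# Keep only time/value and rename them in a single call
water_level_ch2 <- lh_raw %>%
  select(Year = time, Water_lvl = value)

head(water_level_ch2, n = 5)
```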
# Replace water levels between 575.00 and 576.80 with 575.96 (the record low)
my_modified_data <- water_level_ch %>%
  mutate(Water_lvl = ifelse(Water_lvl >= 575.00 & Water_lvl <= 576.80, 575.96, Water_lvl))
summary(my_modified_data)
## Year Water_lvl
## Min. :1875 Min. :576.0
## 1st Qu.:1899 1st Qu.:578.1
## Median :1924 Median :579.1
## Mean :1924 Mean :579.0
## 3rd Qu.:1948 3rd Qu.:579.9
## Max. :1972 Max. :581.9
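The `ifelse()` rule above overwrites every level between 575.00 and 576.80 with 575.96, the series minimum, so the rounded summary barely moves. Listing the rows the rule touches makes the effect visible. This sketch rebuilds the renamed data from the built-in series under the illustrative name `lh_ch`.

```r
lh_ch <- data.frame(Year = as.integer(time(LakeHuron)),
                    Water_lvl = as.numeric(LakeHuron))

# Same condition as the mutate()/ifelse() step
in_range <- lh_ch$Water_lvl >= 575.00 & lh_ch$Water_lvl <= 576.80
affected <- lh_ch[in_range, ]
print(affected)

# Effect of the replacement on the mean
modified <- ifelse(in_range, 575.96, lh_ch$Water_lvl)
cat("Mean before:", mean(lh_ch$Water_lvl), " after:", mean(modified), "\n")
```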
lowest_water_level2 <- min(my_modified_data$Water_lvl)
highest_water_level2 <- max(my_modified_data$Water_lvl)
years_lowest2 <- my_modified_data[my_modified_data$Water_lvl == lowest_water_level2, ]$Year
years_highest2 <- my_modified_data[my_modified_data$Water_lvl == highest_water_level2, ]$Year
num_years_lowest2 <- length(unique(years_lowest2))
num_years_highest2 <- length(unique(years_highest2))
# Print the results
cat('Number of years water level was at its lowest:', num_years_lowest2, "\n")
## Number of years water level was at its lowest: 5
cat("Number of years water level was at its highest:", num_years_highest2, "\n")
## Number of years water level was at its highest: 1
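The min/max counting is written out twice, once per data frame. The sketch below wraps the same logic in a hypothetical helper, `count_extreme_years()`, which is not part of the original project, so it can be reused on any pair of year and level vectors.

```r
# Hypothetical helper: how many distinct years hit the min and the max level
count_extreme_years <- function(years, lvl) {
  c(lowest  = length(unique(years[lvl == min(lvl)])),
    highest = length(unique(years[lvl == max(lvl)])))
}

years <- as.integer(time(LakeHuron))
lvl   <- as.numeric(LakeHuron)
count_extreme_years(years, lvl)

# Same rule as the ifelse() replacement step
lvl_mod <- ifelse(lvl >= 575.00 & lvl <= 576.80, 575.96, lvl)
count_extreme_years(years, lvl_mod)
```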
# Plot the modified data on a histogram
ggplot(my_modified_data, aes(x = Water_lvl)) +
  geom_histogram(bins = 20, fill = "green", color = "blue") +
  labs(title = "Water Level Histogram (Modified Data)",
       x = "Water Level",
       y = "Frequency")
# Read the same file from my GitHub repository
mygithubfile <- 'https://raw.githubusercontent.com/MRobinson112/final_assignment/main/LakeHuron.csv'
water_data <- read.csv(mygithubfile)
head(water_data)
## X time value
## 1 1 1875 580.38
## 2 2 1876 581.86
## 3 3 1877 580.97
## 4 4 1878 580.80
## 5 5 1879 579.79
## 6 6 1880 580.39
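Reading from a personal GitHub URL fails if the file moves or the machine is offline. One defensive pattern is sketched here with a hypothetical `load_lake_huron()` helper. It tries the URL and falls back to the built-in `LakeHuron` series in the same `time`/`value` layout. It assumes the remote CSV, when reachable, has `time` and `value` columns as shown above.

```r
# Hypothetical helper: try a remote CSV, fall back to the built-in series
load_lake_huron <- function(url) {
  local_copy <- data.frame(time = as.integer(time(LakeHuron)),
                           value = as.numeric(LakeHuron))
  remote <- tryCatch(suppressWarnings(read.csv(url)),
                     error = function(e) NULL)
  # Use the download only if it arrived and has the expected columns
  if (is.null(remote) || !all(c("time", "value") %in% names(remote))) {
    message("Falling back to the built-in LakeHuron series")
    return(local_copy)
  }
  remote[, c("time", "value")]
}

lake <- load_lake_huron(
  "https://raw.githubusercontent.com/MRobinson112/final_assignment/main/LakeHuron.csv")
head(lake)
```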