Module07_Exercise_Part1: Sea Temperature and Salinity Analysis
Author
Cienna Kim
Introduction: Data Loading and Setup
library(ggplot2)

# Load the dataset (ensure Temperature.csv is in the data folder of your working directory)
temp <- read.csv("./data/Temperature.csv")

# Check the structure and variable names of the dataset
str(temp)
Histogram of Salinity Values
This section visualizes the overall distribution of salinity values across the entire dataset. The x-axis is mapped to salinity using aes(), and geom_histogram() is applied to generate the frequency distribution.
ggplot(temp, aes(x = Salinity)) +
  geom_histogram(binwidth = 0.1, color = "navy", fill = "lightblue") +
  labs(title = "Histogram of Salinity Values", x = "Salinity", y = "Count") +
  theme_minimal()
Warning: Removed 798 rows containing non-finite outside the scale range
(`stat_bin()`).
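The warning above reports that 798 rows had a missing or otherwise non-finite Salinity value and were dropped before binning. As a minimal sketch, using a made-up vector rather than the real data, such values could be counted ahead of plotting like this:

```r
# Toy salinity values (hypothetical, not taken from Temperature.csv)
salinity <- c(29.8, NA, 31.2, NaN, 30.5, Inf)

# Count the values a histogram would drop: NA, NaN, and +/-Inf
n_dropped <- sum(!is.finite(salinity))
n_dropped

# Keep only the finite values
salinity_ok <- salinity[is.finite(salinity)]
```

Assuming the Salinity column is numeric, the same check on the real data frame would be sum(!is.finite(temp$Salinity)).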
Histograms of Salinity Values by Year and Month
This task splits the histogram into panels by year and by month to compare how the salinity distribution changes over time. Adding facet_wrap(~ Year) or facet_wrap(~ Month) to the plot draws one subplot per year or per month.
# Salinity histograms wrapped by Year
ggplot(temp, aes(x = Salinity)) +
  geom_histogram(binwidth = 0.1, color = "navy", fill = "salmon") +
  facet_wrap(~ Year) +
  labs(title = "Salinity Histograms by Year", x = "Salinity", y = "Count") +
  theme_minimal()
Warning: Removed 798 rows containing non-finite outside the scale range
(`stat_bin()`).
# Salinity histograms wrapped by Month
ggplot(temp, aes(x = Salinity)) +
  geom_histogram(binwidth = 0.1, color = "navy", fill = "seagreen") +
  facet_wrap(~ Month) +
  labs(title = "Salinity Histograms by Month", x = "Salinity", y = "Count") +
  theme_minimal()
Warning: Removed 798 rows containing non-finite outside the scale range
(`stat_bin()`).
Boxplots of Temperature Values by Station (Ordered by Median)
This section creates a box-and-whisker plot to compare the temperature distributions across the 31 sampling stations. To satisfy the bonus criteria, the stations on the x-axis are reordered by median temperature with reorder(), which takes Station, Temperature, and median as its arguments. The x-axis labels are rotated 45 degrees (angle = 45) so that the station names do not overlap.
# na.rm = TRUE is passed on to median() so that stations with missing temperatures are still ranked
ggplot(temp, aes(x = reorder(Station, Temperature, FUN = median, na.rm = TRUE), y = Temperature)) +
  geom_boxplot(fill = "orange", alpha = 0.7) +
  labs(title = "Temperature Boxplots Ordered by Median per Station",
       x = "Station (Ordered from Low to High Median Temperature)",
       y = "Temperature") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Warning: Removed 927 rows containing non-finite outside the scale range
(`stat_boxplot()`).
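Because the warning above shows that some temperature values are missing, it may be worth checking how reorder() treats them. The sketch below uses made-up station names and temperatures. It illustrates that median() returns NA for a station with a missing value unless na.rm = TRUE is passed through reorder():

```r
# Toy data (hypothetical station names and temperatures)
station <- factor(c("A", "A", "A", "B", "B", "B", "C", "C", "C"))
temperature <- c(12, 14, 13, 8, NA, 9, 10, 11, 10)

# Without na.rm, the median for station B is NA, so B cannot be ranked by its median
scores_default <- attr(reorder(station, temperature, FUN = median), "scores")

# Extra arguments are passed on to FUN, so na.rm = TRUE ignores the missing value
lv_na_rm <- levels(reorder(station, temperature, FUN = median, na.rm = TRUE))
lv_na_rm
```

With na.rm = TRUE the medians are 8.5 for B, 10 for C, and 13 for A, so the levels run from the coldest to the warmest station.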
Exporting the Boxplot Figure to a PNG File
This step exports the boxplot created above to an image file in the working directory. ggsave() takes the file name and saves the most recent plot; its width and height arguments can set the horizontal and vertical size in inches. They are omitted here, so the size of the current graphics device is used.
# Save the last active plot as a PNG file
ggsave("station_temp_boxplot.png")
Saving 7 x 5 in image
Warning: Removed 927 rows containing non-finite outside the scale range
(`stat_boxplot()`).
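If a fixed figure size is wanted instead of the 7 x 5 in default reported above, width and height can be given explicitly. The following is only a sketch: the toy data, the 10 x 6 in size, the 300 dpi setting, and the temporary output path are all illustrative assumptions, not part of the assignment.

```r
library(ggplot2)

# Toy plot (hypothetical data) standing in for the station boxplot
toy <- data.frame(Station = rep(c("A", "B"), each = 3),
                  Temperature = c(12, 14, 13, 8, 9, 10))
p <- ggplot(toy, aes(x = Station, y = Temperature)) +
  geom_boxplot()

# Write to a temporary folder; width and height are in inches by default
out_file <- file.path(tempdir(), "station_temp_boxplot.png")
ggsave(out_file, plot = p, width = 10, height = 6, dpi = 300)
```

Passing the plot object through the plot argument avoids depending on which plot happened to be drawn last.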
Part 2: Time Series Analysis
0. Creating Continuous Time Variable (Decimal Date)
This is a data preparation step for the continuous time-series analysis. Following the assignment guidelines, a cumulative time variable (decdate) is calculated by adding the fractional year component (dDay3 / 365) to the Year variable.
# Create a decimal date variable for continuous time tracking
temp$decdate <- temp$Year + temp$dDay3 / 365
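As a worked illustration of the formula, the sketch below applies it to three made-up rows. It assumes dDay3 holds the day of the year, which the source does not state explicitly.

```r
# Toy rows (hypothetical values); dDay3 is assumed to be the day of the year
toy <- data.frame(Year = c(1990, 1990, 1991), dDay3 = c(73, 292, 146))

# Year plus the fraction of the year elapsed
toy$decdate <- toy$Year + toy$dDay3 / 365
toy
```

Day 73 of 1990 becomes 1990.2 because 73 / 365 = 0.2. Dividing by 365 ignores leap years, which shifts dates in those years by a fraction of a day at most.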
Scatterplots of Temperature and Salinity Over Time
These long-term time-series scatterplots place the newly created continuous variable decdate on the x-axis to show how temperature and salinity change over time. geom_point() draws one point per observation.
# Temperature time series scatterplot
ggplot(temp, aes(x = decdate, y = Temperature)) +
  geom_point(alpha = 0.4, color = "plum") +
  labs(title = "Temperature Trends Over Time", x = "Decimal Date", y = "Temperature") +
  theme_minimal()
Warning: Removed 927 rows containing missing values or values outside the scale range
(`geom_point()`).
# Salinity time series scatterplot
ggplot(temp, aes(x = decdate, y = Salinity)) +
  geom_point(alpha = 0.4, color = "pink") +
  labs(title = "Salinity Trends Over Time", x = "Decimal Date", y = "Salinity") +
  theme_minimal()
Warning: Removed 798 rows containing missing values or values outside the scale range
(`geom_point()`).
Scatterplot of Salinity Partitioned by Areas
This task compares how salinity changes over time across the 10 areas (Area). Adding facet_wrap(~ Area) draws one subplot per area.
ggplot(temp, aes(x = decdate, y = Salinity, color = Area)) +
  geom_point(alpha = 0.5) +
  facet_wrap(~ Area) +
  labs(title = "Salinity Trends Over Time by Geographic Area", x = "Decimal Date", y = "Salinity") +
  theme_minimal() +
  theme(legend.position = "none")
Warning: Removed 798 rows containing missing values or values outside the scale range
(`geom_point()`).
Lineplots of Salinity by Station Grouped into Different Areas
This section constructs a time-series line plot (spaghetti diagram) tracking salinity changes at each station over time, paneled by geographic area. To keep a line from connecting observations from different stations, aes(group = Station) is specified so that each station gets its own trajectory. For the bonus task, the subset() function filters observations to isolate area ‘OS’ directly inside the ggplot() call.
ggplot(temp, aes(x = decdate, y = Salinity)) +
  geom_line(aes(group = Station, color = Station), alpha = 0.7) +
  facet_wrap(~ Area) +
  labs(title = "Salinity Lineplots by Station Wrapped by Area", x = "Decimal Date", y = "Salinity") +
  theme_minimal() +
  theme(legend.position = "none")
Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_line()`).
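To see what the group aesthetic changes, the sketch below builds two line plots from a small made-up data frame and inspects how many separate lines each one would draw. The data and object names are hypothetical; layer_data() is the ggplot2 helper that returns the computed data for a layer.

```r
library(ggplot2)

# Toy data (hypothetical): two stations, three time points each
toy <- data.frame(
  decdate  = c(1990.1, 1990.5, 1990.9, 1990.2, 1990.6, 1991.0),
  Salinity = c(30.1, 30.4, 30.2, 28.7, 28.9, 29.1),
  Station  = rep(c("A", "B"), each = 3)
)

# Grouped: one line per station
p_grouped <- ggplot(toy, aes(x = decdate, y = Salinity)) +
  geom_line(aes(group = Station))

# Ungrouped: a single line through all six points
p_single <- ggplot(toy, aes(x = decdate, y = Salinity)) +
  geom_line()

# Number of distinct groups, i.e. separate lines, in each plot
n_grouped <- length(unique(layer_data(p_grouped, 1)$group))
n_single  <- length(unique(layer_data(p_single, 1)$group))
```

Mapping a discrete variable such as Station to color also splits the data into groups, so stating group explicitly mainly makes the intent clear.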
Salinity Lineplot for Area ‘OS’ Only
This bonus task draws the time-series line plot using only the ‘OS’ subset of the 10 areas. To filter the data on the spot, the base R function subset(temp, Area == "OS") is passed as the data argument of ggplot().
ggplot(subset(temp, Area == "OS"), aes(x = decdate, y = Salinity)) +
  geom_line(aes(group = Station, color = Station)) +
  labs(title = "Salinity Lineplots for Area 'OS' Only", x = "Decimal Date", y = "Salinity") +
  theme_minimal()
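For reference, here is a small sketch of how subset() behaves compared with bracket indexing when the filter column contains missing values. The toy data frame and the area code "WZ" are made up for illustration.

```r
# Toy data (hypothetical area codes and salinity values)
toy <- data.frame(
  Area     = c("OS", "WZ", "OS", NA, "WZ"),
  Salinity = c(30.1, 28.7, 30.4, 29.9, 28.9)
)

# subset() drops rows where the condition evaluates to NA
os_subset <- subset(toy, Area == "OS")

# Bracket indexing keeps an all-NA row for each NA in the condition
os_bracket <- toy[toy$Area == "OS", ]

nrow(os_subset)
nrow(os_bracket)
```

This NA handling is one reason subset() can be convenient for quick filtering inside a ggplot() call.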