PCC Rail Bridge Analysis

Author

Naomi Surendorj

Introduction:

My dataset is about railroad bridges from the Palouse River and Coulee City (PCC) rail system in Washington State. The data has information on each bridge, like where it is, what type it is, how long it is, the year it was built, latitude and longitude. There are both categorical and numeric variables in this dataset, so it works for this project.

For my project, I decided to explore the bridges to see if anything stood out in the data. I decided to focus on which bridge type is the longest on average because I wanted to compare the types of bridges. I also made a map to see where the bridges are located in the PCC system. I chose this dataset because it was different from what I am usually interested in.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)
library(leaflet)
rail <- read_csv("WSDOT_-_Palouse_River_and_Coulee_City_Rail_System_-_Rail_Bridges.csv")
Rows: 164 Columns: 20
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (7): LineSegmentNumber, BridgeIdentification, Location, State, Types, C...
dbl (13): X, Y, OBJECTID, ID, Milepost, Latitude, Longitude, BridgeSectionNu...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(rail)
# A tibble: 6 × 20
         X      Y OBJECTID    ID LineSegmentNumber BridgeIdentification Location
     <dbl>  <dbl>    <dbl> <dbl> <chr>             <chr>                <chr>   
1  -1.31e7 6.02e6        1     1 0378              1.38                 Cheney  
2  -1.31e7 6.02e6        2     2 0378              1.63                 Cheney  
3  -1.31e7 6.04e6        3     3 0378              11.96                Cheney  
4  -1.31e7 6.04e6        4     4 0378              12.14                Cheney  
5  -1.31e7 6.04e6        5     5 0378              13.69                Cheney  
6  -1.31e7 6.05e6        6     6 0378              15.54                Reardon 
# ℹ 13 more variables: State <chr>, Milepost <dbl>, Latitude <dbl>,
#   Longitude <dbl>, BridgeSectionNumber <dbl>, NumberOfSpans <dbl>,
#   Types <chr>, Bent <dbl>, YearConstructed <dbl>, SectionLengthFt <dbl>,
#   EstHgtMax <dbl>, Comment <chr>, LineName <chr>
names(rail)
 [1] "X"                    "Y"                    "OBJECTID"            
 [4] "ID"                   "LineSegmentNumber"    "BridgeIdentification"
 [7] "Location"             "State"                "Milepost"            
[10] "Latitude"             "Longitude"            "BridgeSectionNumber" 
[13] "NumberOfSpans"        "Types"                "Bent"                
[16] "YearConstructed"      "SectionLengthFt"      "EstHgtMax"           
[19] "Comment"              "LineName"            
rail_clean <- rail |>
  mutate(Longitude = as.numeric(Longitude),
         Latitude = as.numeric(Latitude))

rail_small <- rail_clean |>
  select(Location, Types, NumberOfSpans, YearConstructed,
         SectionLengthFt, EstHgtMax, Latitude, Longitude)

rail_small <- rail_small |>
  filter(!is.na(SectionLengthFt))

head(rail_small)
# A tibble: 6 × 8
  Location Types NumberOfSpans YearConstructed SectionLengthFt EstHgtMax
  <chr>    <chr>         <dbl>           <dbl>           <dbl>     <dbl>
1 Cheney   ODPT              1            1956              15         2
2 Cheney   WSB               1              NA              16         4
3 Cheney   ODPT              2            1960              30         4
4 Cheney   ODPT              1            1960              17         4
5 Cheney   ODPT              4            1945              64         6
6 Reardon  ODPT              1            1947              15         3
# ℹ 2 more variables: Latitude <dbl>, Longitude <dbl>
rail_small |>
  summarize(
    avg_length = mean(SectionLengthFt),
    shortest = min(SectionLengthFt),
    longest = max(SectionLengthFt),
    avg_year = mean(YearConstructed, na.rm = TRUE))
# A tibble: 1 × 4
  avg_length shortest longest avg_year
       <dbl>    <dbl>   <dbl>    <dbl>
1       72.7        8     355    1949.

Final Visualization

avg_type <- rail_small |>
  group_by(Types) |>
  summarize(avg_length = mean(SectionLengthFt, na.rm = TRUE)) |>
  arrange(desc(avg_length))

ggplot(avg_type, aes(x = reorder(Types, avg_length), y = avg_length, fill = avg_length)) +
  geom_col(width = 0.65) +
  coord_flip() +
  labs(
    title = "Average Bridge Length by Bridge Type",
    x = "Bridge Type",
    y = "Average Length (Feet)",
    fill = "Avg Length",
    caption = "Data Source: WSDOT"
  ) +
  theme_minimal() +
  scale_fill_viridis_c(option = "plasma") +  # purple/magenta gradient
  geom_text(aes(label = round(avg_length, 1)),
            hjust = -0.15,
            size = 3.3,
            color = "black") +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    axis.text.x = element_text(size = 9),
    axis.text.y = element_text(size = 9)
  ) +
  ylim(0, max(avg_type$avg_length) * 1.15)

leaflet(data = rail_small) |>
  addTiles() |>
  addCircleMarkers(
    lat = ~Latitude,
    lng = ~Longitude,
    radius = ~sqrt(SectionLengthFt) / sqrt(max(rail_small$SectionLengthFt, na.rm = TRUE)) * 20,
    color = "purple",
    stroke = TRUE,
    weight = 1,
    fillOpacity = 0.65,
    popup = ~paste(
      "<b>Location:</b>", Location, "<br>",
      "<b>Bridge Type:</b>", Types, "<br>",
      "<b>Length (ft):</b>", SectionLengthFt, "<br>",
      "<b>Year Built:</b>", YearConstructed))