library(readxl)
library(dplyr)
library(ggplot2)
Lab 1: National Bridge Inventory Analysis
Lab 1: Getting Started with R, RStudio and Quarto
Since 1968 the Federal Highway Administration (FHA) has been keeping record of Bridges and tunnels throughout the Unites States (US). This data keeps a record of various bridge information for the surveyed bridges to assist is making sure that the US infrastructure is still safe for public use. The collected data is published for public review and use each year so for transparency to the public. This is a very important study since the United States (US) infrastructure has come under fire in the last decade. According to the Congressional Research Service Kirk (2018), 9% of bridges are classified as structurally deficient as of 2016 and nearly 57% of the deck surface of bridges are structurally deficient as well. This investigation will filter this data set and prepare it for use for this work. Once the data preparation is complete, the age of these bridges is assessed followed the rating of each bridge.
Part 1: Understanding the Data
The data set for this investigation is a compilation of the FHA’s National Bridge Inventory Administration (2024a) (NBI) data set and the National Centers for Environmental Information’s (NCEI) U.S. Climate Normals data set Environmental Information (2023). The NBI data includes information about bridges across the US including State and county information, global coordinates, year of construction, amount of traffic, type of structure, bridge span, and more. The NCEI data compiles climate data for locations throughout the US to act as a baseline climate measurement to see the changes in climate, or assist in weather predictions. The course instructor compiled these two data sets into one set containing 621,547 rows and 33 Columns. The NBI data set is appended by Information regarding precipitation and temperature to give a more complete understanding of the location a given bridge is located.
The columns of the data set include different metrics by which the bridge is classified and the rows represent an individual bridge. The columns contain the information listed below. Administration (2024b)
- STATE_CODE_001: Defines the state or US territory the bridge resides in.
- COUNTY_CODE_003: Defines the US County or parish the bridge resides in.
- STRUCTURE_NUMBER_008: Gives an identification number to a given structure.
- LAT_016: Global Latitude of the structure.
- LONG_017: Global longitude of a given structure
- YEAR_BUILT_027: The year of structure construction.
- ADT_029: Average Daily Traffic, a measure of how many vehicles use the bridge daily.
- YEAR_ADT_030: Year that average daily traffic is recorded.
- STRUCTURE_KIND_043A: Type of material used and its design.
- STRUCTURE_TYPE_043B: The type of structure.
- APPR_TYPE_044B: The type of design and construction.
- MAX_SPAN_LEN_MT_048: Maximum length of bridge span.
- DECK_COND_058: Deck condition rating (0-9).
- SUPERSTRUCTURE_COND_059: Superstructure condition rating (0-9).
- SUBSTRUCTURE_COND_060: Substructure condition rating (0-9).
- CULVERT_COND_062: The condition of the culverts.
- OPR_RATING_METH_063: Method used to determine operating Rating.
- INV_RATING_METH_065: Method used to determine inventory rating.
- INVENTORY_RATING_066: Inventory rating.
- STRUCTURAL_EVAL_067: Overall structural evaluation.
- YEAR_RECONSTRUCTED_106: Year the structured was constructed.
- DECK_STRUCTURE_TYPE_107: Type of materials used for the structure’s deck.
- PERCENT_ADT_TRUCK_109: Average daily truck traffic.
- SCOUR_CRITICAL_113: The amount of sediment on or flowing on the structure.
- BRIDGE_CONDITION: Condition of the bridge as defined by the Code of Federal Regulations (CFR)
- LOWEST_RATING: The lowest rating among the deck, superstructure, and substructure.
- DECK_AREA: Area of the deck of the structure as defined by the Code of Federal Regulations (CFR)
- LAT: Global Latitude (NCEI)
- LONG: Global Longitude (NCEI)
- Weather Data (NCEI):
- ANN-PRCP-NORMAL: Annual precipitation normal.
- DJF-PRCP-NORMAL: Winter precipitation normal (December-January-February).
- ANN-TMAX-NORMAL: Annual maximum temperature normal.
- ANN-TMIN-NORMAL: Annual minimum temperature normal.
Evaluation of the bridge condition is quantified on a scale form 0 to 9 as per the NHA Administration (2024c). The rating criteria is described in the list below.
The condition ratings for bridges are on a scale from 0 to 9:
- 9: Excellent condition.
- 7-8: Good condition.
- 5-6: Fair condition.
- 3-4: Poor condition.
- 0-2: Critical condition requiring immediate attention.
Part II: Import Data into R
This data set is very large, consequently, it is imperative that a computer software is used for both the filtering and post-processing of this data using R-coding to complete these tasks. The first step is to import the necessary libraries for this script. The code block below shows the that the libraries “readxl,” “dyplyr,” and “ggplot2” are imported which handle reading Excel files, filtering data, and plotting results respectively. The line that declares “warning: false” avoids two unnecessary warnings from appearing in the final document.
A working directory is the base directory (or folder) that the script runs off. Defining the working directory is important so that when the script references files, the referenced file paths do not have to be referenced back to the root directory. The next code block first assigns the variable “WD” as the current file location using the “getwd()” function to set the working directory of the script as the script’s current directory. The next line defines an if statement such that if a file path does not exist then then the current file location is the working directory.
<- getwd()
WD if (!is.null(WD)) {setwd(WD)}
The following code block assigns the variable “dataset” to the excel file containing the compiled data set.
<- read_excel("Source Docs/NBI2023.xlsx") dataset
Finally, using the head function, the first five rows and columns are shown below. The “n=c(x,y)” attribute is defined to avoid the R warning that there are y amount more columns beyond the 5 that the margins of the document are limited to.
head(dataset, n = c(5,5))
# A tibble: 5 × 5
STATE_CODE_001 COUNTY_CODE_003 STRUCTURE_NUMBER_008 LAT_016 LONG_017
<chr> <chr> <chr> <chr> <chr>
1 01 053 00000000000S702 31061094 087341348
2 01 053 00000000000S703 31062020 087340890
3 01 113 0000000000M0022 32174330 084583799
4 01 059 000000883039900 34270600 087583100
5 01 079 000001014002450 34485200 087225400
Part III: Filter Data
For this work, the only structure data considered is the dtructure data with corresponding weather data. Not every structure has weather data associated with it so it is necessary to filter the data to show only the structures with weather data. The method for filtering data is shown in the code block below. Here, a new variable “filtered_data” is defined which is initially a copy of the “dataset” variable. The next four lines use the “filter” function from the dyplr library to check whether a row for the columns “ANN-PRCP-NORMAL,” “DJF-PRCP-NORMAL,” “ANN-TMAX-NORMAL,” or “ANN-TMIN-NORMAL” are null. If they are null, then the row is removed, thus, leaving only the structures including weather data. Finally, the head function is used again to show the first five rows and columns. Comparing this data with the previous data shows that rows have been removed, therefore, verifying the filtering step works properly.
<- dataset %>%
filtered_data filter(!is.na(dataset$'ANN-PRCP-NORMAL'),
!is.na(dataset$'DJF-PRCP-NORMAL'),
!is.na(dataset$'ANN-TMAX-NORMAL'),
!is.na(dataset$'ANN-TMIN-NORMAL'))
head(filtered_data, n = c(5,5))
# A tibble: 5 × 5
STATE_CODE_001 COUNTY_CODE_003 STRUCTURE_NUMBER_008 LAT_016 LONG_017
<chr> <chr> <chr> <chr> <chr>
1 01 059 000000883039900 34270600 087583100
2 01 073 000042 33302400 086494800
3 01 073 000075 33311596 086532040
4 01 073 000080 33330000 086480000
5 01 073 000090 33304487 086480888
Part IV: Plot Histogram of Bridge Ages
This data set is from 2023. Therefore, to compute the bridge age, the year the data is recorded is used for calculation. The code block below defines a vector “age” as 2023 minus the column which contains the year the structure was built. The “as.numeric” function makes sure that only integers or floats are included in this vector
<- as.numeric(2023 - filtered_data$YEAR_BUILT_027) age
Now that the age of each structure is computed, it is plotted as a histogram in Figure 1 below. The first line in the code block below defines a label for the figure, the second defines a caption for the figure, and the third declares that warnings will not be shown to keep the output clean. To plot the histogram, the library ggplot2 is used shown in line 5. The attributes of this function include the filtered_data variable to count the structures, the x axis which is the “age” variable, and the declaration of the number of bins. Next, x and y labels are defined followed by the x-limits of the plot which is defined from zero to the oldest bridge.
ggplot(filtered_data, aes(x = age)) +
geom_histogram(bins = 100) +
labs(x = "Age of Bridge (years)", y = "Number of Bridges")+
xlim(0, max(age))
Part V: Plot Bar Chart of the Lowest Ratings
Structure rankings are important to make sure that a structure is safe for operation under public use. Therefore, it is important to know how many of the structure’s rankings are low. To do this, a new variable “Lowest_Ranking” is defined which pulls the column “LOWEST_RANKING” from the filtered data set and uses the “as.numeric” function to make sure that each entry is an integer or float.
<- as.numeric(filtered_data$LOWEST_RATING) Lowest_Ranking
The code block below plots Figure 2 which is the histogram for the number of bridges vs. the lowest bridge rankings. It is very similar to the code block plotting Figure 1 above except the label and caption reflect this new histogram. Furthermore, the x-axis is the used the “Lowest_Ranking” variable, the number of bins is adjusted to 5 bins to reflect the steps of the score, and the x-axis limit is from zero to 9 to reflect the over all range of scores.
ggplot(filtered_data, aes(x = Lowest_Ranking)) +
geom_histogram(bins = 5) +
labs(x = "Lowest Bridge Rankings", y = "Number of Bridges")+
xlim(0, 9)
Part VI: Conclusions
US infrastructure has been criticized for many years. The data presented in this investigation shows that majority of the bridges are under 100 years old, but the rankings show that the majority of bridges are in poor to good condition. In fact, over 50,000 bridges are in poor to fair condition. This investigation shows that there is a need for federal and state governments examine and repair bridges all across the US before the crisis gets to a point where the money required for these repairs is too high, thus, using unnecessary tax-payer dollars. Furthermore, this investigation is a fantastic introduction to the syntax and scripting of R and the user interface of the IDE RStudio. This is an incredibly powerful software that needs to be taught earlier in the academic endeavors of STEM based students.