DATA 606 Data Project Proposal

library(tidyverse)

Data Preparation

# load data
Transit_data <- read.csv("C:\\Users\\Oluwaseyi\\Documents\\Data 606\\Transportation Data\\Monthly_Transportation_Statistics.csv")

Research question

You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.

Are fatalities across different modes of transportation (air, rail, and road) associated with government spending on, public usage of, and other variables describing the related infrastructure?

Cases

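What are the cases, and how many are there? The dataset contains 924 observations. Judging by the dataset name, each case appears to be one month of transportation statistics.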
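The case count can be checked directly from the loaded data frame. The sketch below is only an illustration: `toy_transit` is a hypothetical stand-in with made-up values so that the code runs without the CSV, and `Transit_data` would take its place with the real data. The column names are the ones used later in this proposal.

```r
# Hypothetical stand-in for Transit_data; the values are made up for illustration.
toy_transit <- data.frame(
  Highway.Fatalities = c(3100, NA, 2950, 3020),
  Rail.Fatalities = c(60, 55, NA, NA)
)

# Number of cases (rows) and variables (columns).
n_cases <- nrow(toy_transit)
n_vars <- ncol(toy_transit)

# Non-missing observations per variable. A row counts as a case even when
# some of its variables are missing, so these can be smaller than n_cases.
n_complete <- colSums(!is.na(toy_transit))

print(c(cases = n_cases, variables = n_vars))
print(n_complete)
```

Reporting the non-missing count alongside the row count matters here because the usable sample for any one variable may be smaller than the total number of cases.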

Data collection

The dataset was downloaded from Kaggle.

Type of study

This is an observational study: the data were collected without any intervention in, or control over, the processes that generated them.

Data Source

https://www.kaggle.com/datasets/utkarshx27/monthly-transportation-statistics/data

Dependent Variable

What is the response variable? Is it quantitative or qualitative? The response variable is the number of fatalities by mode of transport (highway, rail, and air). It is quantitative.

Independent Variable(s)

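Government construction spending on each mode of transport, and public usage of each mode (airline traffic, passenger rail ridership, and highway vehicle miles traveled). These variables are quantitative.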
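One possible first look at how a response variable relates to an explanatory variable is a pairwise correlation. The sketch below is an assumption about how this could be done, not the analysis plan of this proposal. `toy_transit` is a hypothetical stand-in with made-up values, and the pairing of rail fatalities with rail ridership is chosen only as an example.

```r
# Hypothetical stand-in for Transit_data; the values are made up for illustration.
toy_transit <- data.frame(
  Rail.Fatalities = c(50, 54, 58, 61, NA),
  Passenger.Rail.Passengers = c(2.5e6, 2.7e6, 2.9e6, NA, 3.1e6)
)

# Rows where both variables are observed; only these enter the correlation.
both_observed <- complete.cases(
  toy_transit$Rail.Fatalities,
  toy_transit$Passenger.Rail.Passengers
)
n_pairs <- sum(both_observed)

# Pearson correlation, dropping rows with a missing value in either variable.
rail_cor <- cor(
  toy_transit$Passenger.Rail.Passengers,
  toy_transit$Rail.Fatalities,
  use = "complete.obs"
)

print(n_pairs)
print(rail_cor)
```

Because this is an observational study, a correlation of this kind would describe association only and would not support a causal claim.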

Relevant summary statistics

Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plots, boxplots, etc.). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

# response variables: fatalities by mode of transport
Highway_Fatality <- summary(Transit_data$Highway.Fatalities)
Rail_Fatality <- summary(Transit_data$Rail.Fatalities)
Air_Fatality <- summary(Transit_data$Air.Safety...General.Aviation.Fatalities)

# explanatory variables: public usage of each mode
Air_Traffic <- summary(Transit_data$U.S..Airline.Traffic...Total...Seasonally.Adjusted)
Rail_Traffic <- summary(Transit_data$Passenger.Rail.Passengers)
Vehicle_Miles <- summary(Transit_data$Highway.Vehicle.Miles.Traveled...All.Systems)

# explanatory variables: state and local government construction spending
Transit_spending <- summary(Transit_data$State.and.Local.Government.Construction.Spending...Mass.Transit)
Highway_Spending <- summary(Transit_data$State.and.Local.Government.Construction.Spending...Highway.and.Street)
Air_spending <- summary(Transit_data$State.and.Local.Government.Construction.Spending...Air)
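The summaries above are stored in objects but not displayed. As a minimal sketch of how they could be shown together with one visualization, the code below builds a single summary table and a scatter plot. `toy_transit` (with made-up values) and the helper `summarise_column` are hypothetical names introduced only for this illustration; with the real data, the relevant columns of `Transit_data` would be used instead.

```r
library(ggplot2)

# Hypothetical stand-in for Transit_data; the values are made up for illustration.
toy_transit <- data.frame(
  Highway.Fatalities = c(3100, 2900, 2950, 3020, NA, 3200),
  Highway.Vehicle.Miles.Traveled...All.Systems =
    c(2.6e11, 2.4e11, 2.5e11, 2.55e11, 2.3e11, 2.7e11)
)

# Hypothetical helper: five-number summary, mean, and missing count for one column.
summarise_column <- function(x) {
  c(
    min = min(x, na.rm = TRUE),
    q1 = unname(quantile(x, 0.25, na.rm = TRUE)),
    median = median(x, na.rm = TRUE),
    mean = mean(x, na.rm = TRUE),
    q3 = unname(quantile(x, 0.75, na.rm = TRUE)),
    max = max(x, na.rm = TRUE),
    n_missing = sum(is.na(x))
  )
}

# One row per variable, one column per statistic.
summary_table <- t(sapply(toy_transit, summarise_column))
print(summary_table)

# Scatter plot of the response against one explanatory variable.
fatalities_plot <- ggplot(
  toy_transit,
  aes(x = Highway.Vehicle.Miles.Traveled...All.Systems, y = Highway.Fatalities)
) +
  geom_point(na.rm = TRUE) +
  labs(x = "Highway vehicle miles traveled", y = "Highway fatalities")
print(fatalities_plot)
```

Collecting the statistics in one matrix keeps the variables side by side for comparison, and the missing-value count makes it clear how many cases each summary is based on.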