R Markdown

This is an R Markdown document containing Sean Amato’s work for the week 2 bridge homework.

Question 1: Use the summary function to gain an overview of the data set (USSeatBelts.csv). Display the mean and median for 2 columns (fatalities and age).

Summary:

us_seatbelts <- 
  read.csv('https://raw.githubusercontent.com/samato0624/R/main/USSeatBelts.csv')
summary(us_seatbelts)
##        X          state                year          miles       
##  Min.   :  1   Length:765         Min.   :1983   Min.   :  3099  
##  1st Qu.:192   Class :character   1st Qu.:1986   1st Qu.: 11401  
##  Median :383   Mode  :character   Median :1990   Median : 30319  
##  Mean   :383                      Mean   :1990   Mean   : 41448  
##  3rd Qu.:574                      3rd Qu.:1994   3rd Qu.: 52312  
##  Max.   :765                      Max.   :1997   Max.   :285612  
##                                                                  
##    fatalities          seatbelt        speed65            speed70         
##  Min.   :0.008327   Min.   :0.0600   Length:765         Length:765        
##  1st Qu.:0.017341   1st Qu.:0.4200   Class :character   Class :character  
##  Median :0.021199   Median :0.5500   Mode  :character   Mode  :character  
##  Mean   :0.021490   Mean   :0.5289                                        
##  3rd Qu.:0.024774   3rd Qu.:0.6500                                        
##  Max.   :0.045470   Max.   :0.8700                                        
##                     NA's   :209                                           
##    drinkage           alcohol              income           age       
##  Length:765         Length:765         Min.   : 8372   Min.   :28.23  
##  Class :character   Class :character   1st Qu.:14266   1st Qu.:34.39  
##  Mode  :character   Mode  :character   Median :17624   Median :35.39  
##                                        Mean   :17993   Mean   :35.14  
##                                        3rd Qu.:21080   3rd Qu.:36.13  
##                                        Max.   :35863   Max.   :39.17  
##                                                                       
##    enforce         
##  Length:765        
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 

Mean and Median for Fatalities and Age:

mean(us_seatbelts$fatalities)
## [1] 0.02148951
median(us_seatbelts$fatalities)
## [1] 0.02119896
mean(us_seatbelts$age)
## [1] 35.13719
median(us_seatbelts$age)
## [1] 35.39177
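
The four calls above can also be collapsed into one. The following is a minimal sketch in base R, assuming a small made-up data frame (`toy`, a hypothetical stand-in for `us_seatbelts`) so that it runs without downloading the CSV:

```r
# Toy stand-in for us_seatbelts; the values are made up for illustration.
toy <- data.frame(
  fatalities = c(0.010, 0.020, 0.030, 0.040),
  age        = c(30, 34, 36, 40)
)

# Apply mean() and median() to each selected column in one pass.
# na.rm = TRUE skips missing values, which matters for columns that
# contain NAs (seatbelt has 209 of them in the summary above).
stats <- sapply(toy[c("fatalities", "age")],
                function(x) c(mean   = mean(x, na.rm = TRUE),
                              median = median(x, na.rm = TRUE)))
print(stats)
```

The result is a small matrix with one column per variable and one row per statistic, which is easier to scan than four separate outputs.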

Question 2: Create a new data frame with a subset of columns (state, fatalities, age, income) and rows (income >= 21080, the 3rd quartile).

library(dplyr)
# Keep four columns, then keep rows at or above the 3rd-quartile income (21080).
filtered_df <- us_seatbelts %>%
  select(state, fatalities, age, income) %>%
  filter(income >= 21080)
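
The cutoff of 21080 is typed in by hand from the summary output. As a hedged alternative, the threshold could be computed from the data so that it stays correct if the file changes. The sketch below uses base R and a made-up data frame (`toy`, `cutoff`, and `top_income` are hypothetical names, not part of the homework code):

```r
# Toy stand-in for us_seatbelts; all values are made up for illustration.
toy <- data.frame(
  state      = c("AK", "AZ", "CA", "CO", "CT", "DE", "FL", "GA"),
  year       = 1990,
  fatalities = c(0.030, 0.025, 0.020, 0.018, 0.015, 0.022, 0.024, 0.026),
  age        = c(30, 35, 34, 35, 37, 36, 38, 33),
  income     = c(10000, 12000, 14000, 16000, 18000, 20000, 22000, 24000)
)

# 75th percentile (3rd quartile) of income, computed from the data.
cutoff <- quantile(toy$income, probs = 0.75, names = FALSE)

# Keep rows at or above the cutoff and only the four columns of interest.
top_income <- subset(toy, income >= cutoff,
                     select = c(state, fatalities, age, income))
print(top_income)
```

With the toy incomes above, the cutoff is 20500 and two rows remain.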

Question 3: Create new names for each column.

colnames(filtered_df) <- c("STATE", "FATALITIES", "AGE", "INCOME")
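
Assigning names by position works here, but it silently mislabels columns if their order ever changes. When the new names are just upper-case versions of the old ones, a small sketch like the following (on a made-up one-row data frame, `toy`) derives them from the existing names instead:

```r
# Toy data frame with lower-case column names; values are made up.
toy <- data.frame(state = "CO", fatalities = 0.02, age = 35, income = 22000)

# Derive the new names from the existing ones rather than retyping them.
names(toy) <- toupper(names(toy))
print(names(toy))
# [1] "STATE"      "FATALITIES" "AGE"        "INCOME"
```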

Question 4: Repeat question 1 with the new data frame (filtered_df). This checks for a difference in fatalities by income; the data frame is filtered to the top 25% of observations by income.

Summary:

summary(filtered_df)
##     STATE             FATALITIES            AGE            INCOME     
##  Length:192         Min.   :0.008327   Min.   :29.83   Min.   :21080  
##  Class :character   1st Qu.:0.013494   1st Qu.:35.41   1st Qu.:22287  
##  Mode  :character   Median :0.015744   Median :36.11   Median :23634  
##                     Mean   :0.016023   Mean   :35.95   Mean   :24456  
##                     3rd Qu.:0.018054   3rd Qu.:36.76   3rd Qu.:25632  
##                     Max.   :0.030117   Max.   :39.17   Max.   :35863

Mean and Median for Fatalities and Age:

mean(filtered_df$FATALITIES)
## [1] 0.01602266
median(filtered_df$FATALITIES)
## [1] 0.01574439
mean(filtered_df$AGE)
## [1] 35.95432
median(filtered_df$AGE)
## [1] 36.10976

Conclusions: Comparing the mean fatality rate across all state-year observations (0.0215) with the mean for observations in the top 25% by income (0.0160), there is approximately a 25% decrease. I can speculate that people with more money can afford vehicles with more safety features (e.g. seatbelts) and can more easily afford the maintenance required to keep their vehicles safe. However, it’s tough to say what’s causing the difference without investigating further. The mean age increased by only about 2.3% (35.14 vs 35.95), so I can’t say how much age contributes to the fatality rate, but typically the older people are the more money they make, and that could contribute to the increase in mean age.
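
The percentage changes can be checked directly from the means printed in questions 1 and 4. This is a small sketch; `pct_change` is a helper name introduced here for illustration:

```r
# Relative change from an old value to a new value, in percent.
pct_change <- function(old, new) (new - old) / old * 100

# Means copied from the output of questions 1 and 4.
fatality_change <- pct_change(old = 0.02148951, new = 0.01602266)
age_change      <- pct_change(old = 35.13719,   new = 35.95432)

print(round(fatality_change, 1))  # -25.4
print(round(age_change, 1))       # 2.3
```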

Question 5: Rename three distinct values in a column (State column: AK -> Alaska, AZ -> Arizona, CA -> California).

# Replace three state abbreviations with full names; leave all others unchanged.
df <- filtered_df %>%
  mutate(STATE = ifelse(STATE == 'AK', 'Alaska', 
                        ifelse(STATE == 'AZ', 'Arizona', 
                               ifelse(STATE == 'CA', 'California', STATE))))

Question 6: Display enough rows to show examples of all the changes made to the data set in steps 1-5.

The data set loaded in question 1 had 13 columns, including an index column (X), and 765 rows;
in question 2 I reduced the columns to 4 and kept only rows with income at or above the 3rd quartile ($21080) of the original data set;
in question 3 I capitalized the column names;
in question 4 there was no change to the data;
in question 5 I changed 3 of the state abbreviations to their full names;
in question 6 I show only the first 20 rows, which is enough to illustrate that CO was not renamed.

top_20_rows <- head(df, n = 20)
print(top_20_rows)
##         STATE FATALITIES      AGE INCOME
## 1      Alaska 0.02511813 29.82771  21496
## 2      Alaska 0.02811768 30.21070  22073
## 3      Alaska 0.03011741 30.46439  22711
## 4      Alaska 0.02048193 30.75657  23417
## 5      Alaska 0.02110114 31.17860  23971
## 6      Alaska 0.01968408 31.44535  24310
## 7      Alaska 0.01755186 31.60147  24969
## 8     Arizona 0.02186659 35.70044  21998
## 9  California 0.02005206 33.83672  21363
## 10 California 0.01817223 33.79849  21491
## 11 California 0.01596660 33.89958  22191
## 12 California 0.01563016 33.98206  22430
## 13 California 0.01556208 34.05400  22953
## 14 California 0.01516802 34.17737  23983
## 15 California 0.01434670 34.29433  25142
## 16 California 0.01291262 34.48209  26314
## 17         CO 0.01708540 34.55241  22117
## 18         CO 0.01738614 34.78416  23019
## 19         CO 0.01839808 34.98942  24304
## 20         CO 0.01707202 35.19139  25627
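
The nested ifelse() calls in question 5 get harder to read as more states are added. One possible alternative, sketched here in base R on a made-up data frame, is a named lookup vector (`state_names` and `toy` are hypothetical names; AK is the postal code for Alaska):

```r
# Lookup table: abbreviation -> full name. Extend it to rename more states.
state_names <- c(AK = "Alaska", AZ = "Arizona", CA = "California")

# Toy column of abbreviations; CO is deliberately not in the lookup.
toy <- data.frame(STATE = c("AK", "AZ", "CA", "CO"),
                  stringsAsFactors = FALSE)

# Replace only the values that have an entry in the lookup table.
matched <- toy$STATE %in% names(state_names)
toy$STATE[matched] <- unname(state_names[toy$STATE[matched]])
print(toy$STATE)
# [1] "Alaska"     "Arizona"    "California" "CO"
```

Values without an entry in the lookup, such as CO, pass through unchanged, which matches the behavior of the final `STATE` fallback in the nested ifelse() version.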

Question 7: Have this program read the CSV file from github.

#Used in question 1.
us_seatbelts <- 
  read.csv('https://raw.githubusercontent.com/samato0624/R/main/USSeatBelts.csv')