Kruzlic HW2 Try 1

DACSS 601 Homework 2 Example

Bryn Kruzlic
2022-02-12

Functions: These are the libraries needed to read and wrangle the dataset: readr (for read_csv()) and dplyr (for filter() and arrange())

Variables: These are the variables in the dataset railroad

  1. state: the two-letter postal code for each of the 53 rows; these include entries that are not states, such as DC, AE, and AP
  2. total_employees: the number of railroad employees recorded for each state code
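
One way to check which values of state are not actually states is to compare them against base R's built-in state.abb vector. The sketch below uses only the ten codes visible in the printed output further down, not the full file; the names codes and non_states are illustrative.

```r
# First ten codes shown in the printed tibble for this dataset
codes <- c("AE", "AK", "AL", "AP", "AR", "AZ", "CA", "CO", "CT", "DC")

# state.abb is a built-in base R vector holding the 50 state abbreviations
non_states <- setdiff(codes, state.abb)
non_states
# "AE" "AP" "DC"
```

Running the same comparison on Data$state would list every non-state code in the full dataset.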

This is how to read in a dataset using read_csv

A ‘tibble’ is the tidyverse's version of a data frame; read_csv() returns one

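library(readr)
# read_csv() takes the column names from the file's header row
Data <- read_csv("C:/Users/Bryn Kruzlic/OneDrive/Desktop/DACSS601/railroad_2012_clean_state.csv")
View(Data) # opens the data viewer (interactive sessions only)
tibble::as_tibble(Data) # read_csv() already returns a tibble, so this prints it unchanged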
# A tibble: 53 x 2
   state total_employees
   <chr>           <dbl>
 1 AE                  2
 2 AK                103
 3 AL               4257
 4 AP                  1
 5 AR               3871
 6 AZ               3153
 7 CA              13137
 8 CO               3650
 9 CT               2592
10 DC                279
# ... with 43 more rows
summary(Data)
    state           total_employees
 Length:53          Min.   :    1  
 Class :character   1st Qu.: 1917  
 Mode  :character   Median : 3379  
                    Mean   : 4819  
                    3rd Qu.: 6092  
                    Max.   :19839  
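
If the column types need to be stated explicitly rather than guessed, read_csv() accepts a col_types argument. The following is a minimal sketch, not the code used for this homework: it reads three rows copied from the output above as inline text (which assumes readr 2.0 or later for I()), and the names csv_text and sample_data are made up for illustration.

```r
library(readr)

# Inline CSV text built from the first three rows of the printed output;
# I() tells read_csv() to treat the string as data rather than a file path
csv_text <- I("state,total_employees\nAE,2\nAK,103\nAL,4257\n")

sample_data <- read_csv(
  csv_text,
  col_types = cols(
    state = col_character(),
    total_employees = col_double()
  )
)

# read_csv() returns a tibble, which is also a regular data frame
inherits(sample_data, "tbl_df")
is.data.frame(sample_data)
```

Both checks should return TRUE, which is why calling as_tibble() on the result of read_csv() changes nothing.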

Perform at least 2 basic data-wrangling operations

library(dplyr)
filter(Data, total_employees > 1000) %>%
  arrange(desc(total_employees))
# A tibble: 42 x 2
   state total_employees
   <chr>           <dbl>
 1 TX              19839
 2 IL              19131
 3 NY              17050
 4 NE              13176
 5 CA              13137
 6 PA              12769
 7 OH               9056
 8 GA               8605
 9 IN               8537
10 MO               8419
# ... with 32 more rows

We use filter() to keep only the states with more than 1000 total employees (42 of the 53 rows), then arrange() with desc() to sort them in descending order of total_employees.
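
The same two steps can be tried on a small hand-built tibble to see exactly which rows survive. This is only a sketch: the six rows are copied from the outputs above, and the names sample_data and large_states are illustrative.

```r
library(dplyr)

# Six rows taken from the printed output above; the full file has 53
sample_data <- tibble::tibble(
  state = c("AE", "AK", "AL", "CA", "TX", "DC"),
  total_employees = c(2, 103, 4257, 13137, 19839, 279)
)

large_states <- sample_data %>%
  filter(total_employees > 1000) %>%   # keep rows above the threshold
  arrange(desc(total_employees))       # sort largest first

large_states$state
# "TX" "CA" "AL"
```

AE, AK, and DC are dropped because their totals are 1000 or less, and the remaining rows are ordered from the largest total to the smallest.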