── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 2930 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): state, county
dbl (1): total_employees
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# read_csv(here()) does not work because this document is not saved in the default working directory, so setwd() is needed to redirect the working directory.
railroad
# A tibble: 2,930 × 3
state county total_employees
<chr> <chr> <dbl>
1 AE APO 2
2 AK ANCHORAGE 7
3 AK FAIRBANKS NORTH STAR 2
4 AK JUNEAU 3
5 AK MATANUSKA-SUSITNA 2
6 AK SITKA 1
7 AK SKAGWAY MUNICIPALITY 88
8 AL AUTAUGA 102
9 AL BALDWIN 143
10 AL BARBOUR 1
# ℹ 2,920 more rows
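As an alternative to setwd(), the file path can be passed to read_csv() explicitly. The sketch below is illustrative only: the file name `railroad_example.csv` and the temporary folder are hypothetical stand-ins, and the three rows are copied from the printout above so the example runs on its own. A likely reason read_csv(here()) fails is that here::here() called with no arguments returns the project root directory rather than a file, so a file name would have to be supplied, for example here::here("some_folder", "some_file.csv") with placeholder names.

```r
library(readr)

# Hypothetical setup: write a tiny stand-in file so the example runs anywhere.
# In the real project this would be the existing railroad CSV file.
data_dir <- tempdir()
csv_path <- file.path(data_dir, "railroad_example.csv")
write_csv(
  data.frame(
    state = c("AE", "AK", "AK"),
    county = c("APO", "ANCHORAGE", "JUNEAU"),
    total_employees = c(2, 7, 3)
  ),
  csv_path
)

# Read from an explicit path instead of changing the working directory.
# show_col_types = FALSE quiets the column specification message.
railroad_example <- read_csv(csv_path, show_col_types = FALSE)
railroad_example
```

Building the path this way keeps the working directory unchanged, so the document should render the same way regardless of where it is saved.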
Rows: 30977 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): Domain Code, Domain, Area, Element, Item, Unit, Flag, Flag Description
dbl (6): Area Code, Element Code, Item Code, Year Code, Year, Value
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# read_csv(here()) does not work because this document is not saved in the default working directory, so setwd() is needed to redirect the working directory.
as_tibble(birds)
# A tibble: 30,977 × 14
`Domain Code` Domain `Area Code` Area `Element Code` Element `Item Code`
<chr> <chr> <dbl> <chr> <dbl> <chr> <dbl>
1 QA Live Anim… 2 Afgh… 5112 Stocks 1057
2 QA Live Anim… 2 Afgh… 5112 Stocks 1057
3 QA Live Anim… 2 Afgh… 5112 Stocks 1057
4 QA Live Anim… 2 Afgh… 5112 Stocks 1057
5 QA Live Anim… 2 Afgh… 5112 Stocks 1057
6 QA Live Anim… 2 Afgh… 5112 Stocks 1057
7 QA Live Anim… 2 Afgh… 5112 Stocks 1057
8 QA Live Anim… 2 Afgh… 5112 Stocks 1057
9 QA Live Anim… 2 Afgh… 5112 Stocks 1057
10 QA Live Anim… 2 Afgh… 5112 Stocks 1057
# ℹ 30,967 more rows
# ℹ 7 more variables: Item <chr>, `Year Code` <dbl>, Year <dbl>, Unit <chr>,
# Value <dbl>, Flag <chr>, `Flag Description` <chr>
3. Summary & Description
3.1 Railroad data summary
The summary focuses on the total number of employees. The table “sum_railroad” below shows the sum, median, mean, variance, and standard deviation of the number of employees for each state, to describe the data and show the differences between states. Because states have different numbers of counties, another table, “summary_railroad_table,” also includes the number of counties in each state, which makes the differences between states clearer.
# A tibble: 53 × 7
state n total_number median_number mean_number var_number sd_number
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AE 1 2 2 2 NA NA
2 AK 6 103 2.5 17.2 1209. 34.8
3 AL 67 4257 26 63.5 16943. 130.
4 AP 1 1 1 1 NA NA
5 AR 72 3871 16.5 53.8 17197. 131.
6 AZ 15 3153 94 210. 51885. 228.
7 CA 55 13137 61 239. 301916. 549.
8 CO 57 3650 10 64.0 16320. 128.
9 CT 8 2592 125 324 270606. 520.
10 DC 1 279 279 279 NA NA
# ℹ 43 more rows
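The code that produced this table is not shown, so the following is only a sketch of how a per-state summary with these column names could be built with dplyr. It uses seven rows copied from the railroad printout, not the full data, and the object names `railroad_sample` and `summary_sample` are hypothetical.

```r
library(dplyr)

# A few rows copied from the railroad printout (not the full data set).
railroad_sample <- tibble(
  state = c("AE", "AK", "AK", "AK", "AL", "AL", "AL"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", "JUNEAU",
             "AUTAUGA", "BALDWIN", "BARBOUR"),
  total_employees = c(2, 7, 2, 3, 102, 143, 1)
)

# One row per state: county count plus the five statistics.
summary_sample <- railroad_sample %>%
  group_by(state) %>%
  summarise(
    n = n(),                                   # number of counties
    total_number = sum(total_employees),
    median_number = median(total_employees),
    mean_number = mean(total_employees),
    var_number = var(total_employees),         # NA when n == 1
    sd_number = sd(total_employees)            # NA when n == 1
  )
summary_sample
```

States with a single county (AE, AP, and DC in the table) show NA for the variance and standard deviation because both statistics need at least two observations.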
3.2 Birds data summary
The summary is divided into four parts: area, year, item, and flag description. Filtering the columns in Excel showed that, besides the variable “value,” these are the four variables that are divided into multiple categories. The other variables either have only one category or provide numerical codes for the categories of these four variables.
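The same check can also be done in R rather than Excel by counting the distinct values in each column. The sketch below assumes a small made-up tibble, `birds_sample`, with placeholder values; only the column names come from the birds data.

```r
library(dplyr)

# Hypothetical miniature data; the values are placeholders, not real records.
birds_sample <- tibble(
  Domain = c("domain_a", "domain_a", "domain_a", "domain_a"),
  Area = c("area_1", "area_1", "area_2", "area_2"),
  Item = c("item_x", "item_x", "item_x", "item_y"),
  Year = c(1961, 1962, 1961, 1961)
)

# Count the distinct values in every column; a count of 1 means the
# column has a single category and adds nothing to a grouped summary.
distinct_counts <- birds_sample %>%
  summarise(across(everything(), n_distinct))
distinct_counts
```

Columns whose count is greater than 1 are the candidates for grouping variables.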
The summary connects these four qualitative variables to one quantitative variable, “value.” The data set does not make clear what “value” measures, so I treat it as the number of animals/birds.
All the tables below contain the sum, median, mean, variance, and standard deviation of “value,” which should give some description and understanding of the data across the different categorical variables. In addition, the “summary_items_birds_table” table includes the count for each kind of animal (or bird), which gives more information about the data set and shows the differences between types of animals (or birds). The table “summary_flag_description_birds_table” likewise contains the count for each flag description, for the same purpose.
# A tibble: 6 × 6
`Flag Description` flag_total_number flag_median_number flag_mean_number
<chr> <dbl> <dbl> <dbl>
1 Aggregate, may include … 2232340190 8003 345885.
2 Data not available 0 NA NaN
3 FAO data based on imput… 57689560 501 47559.
4 FAO estimate 165898597 465 16578.
5 Official data 346111032 2500 32128.
6 Unofficial figure 174414299 1952 116743.
# ℹ 2 more variables: flag_var_number <dbl>, flag_sd_number <dbl>
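The “Data not available” row (total 0, median NA, mean NaN) is the pattern R returns when every value in a group is missing and the statistics are computed with na.rm = TRUE. The sketch below reproduces that behaviour under this assumption; the `Value` numbers are made up for illustration, only the flag labels and summary column names are taken from the output above, and the original code is not shown.

```r
library(dplyr)

# Illustrative values only; the flag labels are taken from the output above.
birds_sample <- tibble(
  `Flag Description` = c("Official data", "Official data",
                         "FAO estimate", "Data not available"),
  Value = c(100, 300, 50, NA)
)

# Summarise Value per flag description, dropping missing values first.
flag_summary <- birds_sample %>%
  group_by(`Flag Description`) %>%
  summarise(
    flag_total_number = sum(Value, na.rm = TRUE),     # 0 if all values are NA
    flag_median_number = median(Value, na.rm = TRUE), # NA if all values are NA
    flag_mean_number = mean(Value, na.rm = TRUE),     # NaN if all values are NA
    flag_var_number = var(Value, na.rm = TRUE),
    flag_sd_number = sd(Value, na.rm = TRUE)
  )
flag_summary
```

A total of 0 for a group with no recorded values can be misleading, so it may be worth reading that row as "no data" rather than "zero birds".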
# A tibble: 6 × 1
`birds$\`Flag Description\``
<chr>
1 FAO estimate
2 Official data
3 FAO data based on imputation methodology
4 Data not available
5 Unofficial figure
6 Aggregate, may include official, semi-official, estimated or calculated data