Import the data set using a Tidyverse function and NOT with a Base R function. How many rows and columns are in the data set?
I imported my data using the “read_csv” Tidyverse function. There are 3142 rows and 35 columns.
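A minimal sketch of the import step. The real file name is not given here, so the example writes a tiny made-up CSV to a temporary path (the `path` variable and its contents are placeholders) and reads it back with `read_csv()`; `dim()` then reports rows and columns.

```r
library(readr)

# Hypothetical stand-in for the real file: a tiny CSV written to a temp path
# so the example runs on its own. Point `path` at the actual data file instead.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "census_id,state,county,total_pop",
  "1,State X,County A,1000",
  "2,State Y,County B,2500"
), path)

# read_csv() is the Tidyverse (readr) reader; read.csv() is the Base R one.
counties <- read_csv(path, show_col_types = FALSE)

dim(counties)  # number of rows, then number of columns
```

On the real file, the same `dim()` call is what gives the row and column counts reported above.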
Do any data types need to be changed? Show any code to change variable types and show code/output for a command after you’re finished.
State and County are categorical, so they may need to be transformed into dummy variables.
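One possible way to handle the categorical columns is to convert them to factors, which R modeling functions expand into dummy variables automatically. This is only a sketch on toy data; the column names `state` and `county` are assumed and may differ in the real file.

```r
library(dplyr)

# Toy data; `state` and `county` are assumed column names.
counties <- tibble(
  state     = c("State X", "State X", "State Y"),
  county    = c("County A", "County B", "County C"),
  total_pop = c(100, 200, 300)
)

# Convert both character columns to factors in one step.
counties <- counties %>%
  mutate(across(c(state, county), as.factor))

# glimpse() shows each column with its type, confirming the change.
glimpse(counties)
```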
Are there any missing values? How will you handle missing values? Will you impute a missing value with, for example, a mean or median value for the entire column, or will you remove the entire observation? Give a rationale for your decision and show any code/output to handle missing values.
There are two NA values: one in income and one in child_poverty. Dropping the affected rows would remove at most 2 of 3,142 observations, or about 0.06% of the full data frame. I think it is perfectly reasonable to drop the record(s) associated with these 2 NA values.
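A sketch of how the missing values could be counted and dropped, using toy data with one NA in each of the two columns named above. `drop_na()` comes from tidyr; the toy values are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Toy data with one NA in `income` and one in `child_poverty`.
counties <- tibble(
  county        = c("A", "B", "C", "D"),
  income        = c(50000, NA, 42000, 61000),
  child_poverty = c(20.1, 15.3, NA, 9.8)
)

# Count the missing values in every column.
na_counts <- counties %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# Remove every row that contains at least one NA.
counties_clean <- counties %>% drop_na()

# Number of rows removed by the drop.
nrow(counties) - nrow(counties_clean)
```

If both NA values fell in the same row, only one row would be removed, which is why comparing `nrow()` before and after is a useful check.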
Use the summary() function to examine any unusual values. Are there any? If so, how will you handle these unusual values? Show any code/output to handle unusual values.
Notes:
* For the sake of time, you do not need to create any visualizations or other statistical summaries for every variable; the summary function will suffice for this homework.
* You should read the data dictionary for this homework to understand the context behind each variable.
total_pop has a very wide range, as do men, women, citizen, and employed. The data dictionary says that the employed variable is supposed to be a percentage.
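A sketch of the check and one possible fix. It assumes, as a guess, that `employed` was recorded as a head count rather than a percentage, so dividing by `total_pop` recovers the percentage; the new column name `employed_pct` is made up, and the right fix depends on what the data dictionary actually specifies.

```r
library(dplyr)

# Toy data; values are made up to show a wide range.
counties <- tibble(
  total_pop = c(1000, 250000, 9000000),
  employed  = c(450, 120000, 4500000)
)

# summary() prints min, quartiles, mean, and max for each column.
summary(counties)

# ASSUMPTION: `employed` is a count, so convert it to a percentage
# of the total population.
counties <- counties %>%
  mutate(employed_pct = employed / total_pop * 100)

range(counties$employed_pct)
```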
How many counties have more women than men?
1,985 counties (63.18%) have more women than men.
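A count and a percentage like these can be computed in one `summarise()` call, since summing a logical vector counts the TRUE values and taking its mean gives the proportion. The sketch below uses toy data with the `men` and `women` columns named in the data set.

```r
library(dplyr)

# Toy data: counties A and C have more women than men; D is a tie.
counties <- tibble(
  county = c("A", "B", "C", "D"),
  men    = c(100, 250, 300, 80),
  women  = c(120, 240, 310, 80)
)

result <- counties %>%
  summarise(
    n_more_women   = sum(women > men),        # count of TRUE values
    pct_more_women = mean(women > men) * 100  # share of all counties
  )

result
```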
How many counties have an unemployment rate lower than 10%?
2,420 counties have an unemployment rate lower than 10%.
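A sketch of the filter behind this count, on toy data. The column name `unemployment` is assumed; note that "lower than 10%" is a strict comparison, so a county at exactly 10 is excluded.

```r
library(dplyr)

# Toy data; `unemployment` is an assumed column name holding percentages.
counties <- tibble(
  county       = c("A", "B", "C", "D"),
  unemployment = c(4.2, 10, 12.5, 9.9)
)

# Keep counties strictly below 10% and count them.
n_low <- counties %>%
  filter(unemployment < 10) %>%
  nrow()

n_low
```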
What are the top 10 counties with the highest mean commute?
| Census ID | Name | State | mean_commute |
|---|---|---|---|
| 42103 | Pike | Pennsylvania | 44 |
| 42103 | Bronx | New York | 43 |
| 42103 | Charles | Maryland | 42.8 |
| 42103 | Warren | Virginia | 42.7 |
| 42103 | Queens | New York | 42.6 |
| 42103 | Richmond | New York | 42.5 |
| 42103 | Westmoreland | Virginia | 42.4 |
| 42103 | Park | Colorado | 41.7 |
| 42103 | Kings | New York | 41.4 |
| 42103 | Clay | West Virginia | 41 |
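A ranking like this can be produced by sorting in descending order and keeping the first ten rows. The sketch below uses twelve made-up counties; `census_id`, `county`, and `state` are assumed column names. Selecting the ID column straight from the data, rather than typing it by hand, keeps each ID matched to its own county.

```r
library(dplyr)

# Toy data: twelve made-up counties.
counties <- tibble(
  census_id    = 1:12,
  county       = paste("County", LETTERS[1:12]),
  state        = rep(c("State X", "State Y"), 6),
  mean_commute = c(20.5, 44.0, 31.2, 18.9, 42.8, 25.0,
                   39.1, 27.3, 41.4, 22.2, 35.6, 29.8)
)

top10 <- counties %>%
  arrange(desc(mean_commute)) %>%                    # longest commute first
  select(census_id, county, state, mean_commute) %>%
  head(10)                                           # keep the first ten rows

top10
```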
Create a new variable that calculates the percentage of women for each county and then find the top 10 counties with the lowest percentages. Show the census ID, county name, state, and the percentage in your final answer (sorted by ascending percentage)
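One way to approach this, sketched on toy data: compute the percentage with `mutate()`, sort ascending, and keep ten rows. It assumes the percentage is `women` divided by `total_pop` (dividing by `men + women` would be an alternative), and the names `pct_women`, `census_id`, `county`, and `state` are placeholders.

```r
library(dplyr)

# Toy data: twelve made-up counties of 1,000 people each.
counties <- tibble(
  census_id = 1:12,
  county    = paste("County", LETTERS[1:12]),
  state     = rep(c("State X", "State Y"), 6),
  total_pop = rep(1000, 12),
  women     = c(510, 480, 350, 505, 420, 499,
                530, 465, 390, 515, 445, 502)
)

lowest10 <- counties %>%
  mutate(pct_women = women / total_pop * 100) %>%  # new percentage column
  arrange(pct_women) %>%                           # lowest percentage first
  select(census_id, county, state, pct_women) %>%
  head(10)

lowest10
```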
County Name