Assignment 3

Question One

In its original form, the data set is 3,142 rows by 35 columns. I can tell this from the output of the str (structure) function.
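As a minimal sketch of that check, the snippet below runs str and dim on a tiny made-up data frame. The object name counties and its columns are placeholders I am assuming, not the real file.

```r
# Toy stand-in for the county data (hypothetical names and values).
counties <- data.frame(
  census_id = c(1001, 1003, 2013),
  county    = c("Autauga", "Baldwin", "Aleutians East Borough"),
  stringsAsFactors = FALSE
)

# str() prints the number of observations (rows) and variables (columns),
# followed by the type and first few values of each column.
str(counties)

# dim() returns the same row and column counts as an integer vector.
dims <- dim(counties)
dims
```

On the real data, the first line of the str output is where the row and column counts appear.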

Question Two

I have to change the census ID to a string, because it is not an actual number but rather a unique code. Moreover, I change state from a character to a factor, because a state is a category as there are only 50. The mutate function allows me to change this information, creating a new column with the updated information. What’s more, the glimpse function gives a quick look into the data, and we can see that the values seem to be all set.
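The type changes described above can be sketched as follows. This is only an illustration on made-up rows, and the column names census_id and state are assumptions based on the output shown later in this write-up.

```r
library(dplyr)

# Toy stand-in for the county data (hypothetical values).
counties <- tibble(
  census_id = c(1001, 1003, 2013),
  state     = c("Alabama", "Alabama", "Alaska"),
  county    = c("Autauga", "Baldwin", "Aleutians East Borough")
)

counties_typed <- counties %>%
  mutate(
    census_id = as.character(census_id),  # an identifier, not a quantity
    state     = as.factor(state)          # a fixed set of categories
  )

# glimpse() lists each column with its type, so the change is easy to confirm.
glimpse(counties_typed)
```

Because the new columns reuse the old names, mutate overwrites the originals here rather than adding extra columns.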

Question Three

There are two columns with an NA value: one NA in income and another in child poverty. It seems, though, that these were perhaps just mistakes. In order to keep using the important data that is included elsewhere in these rows, I will replace each NA with the mean of its column.

I run the colSums function twice to make sure that I have gotten rid of all the NA values.
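A minimal sketch of this mean imputation, assuming columns named income and child_poverty (hypothetical names) and using made-up values:

```r
library(dplyr)

# Toy data with one NA in each of the two columns (hypothetical values).
counties <- tibble(
  county        = c("A", "B", "C", "D"),
  income        = c(40000, NA, 60000, 50000),
  child_poverty = c(20, 10, NA, 30)
)

# Count the NA values per column before the fix.
colSums(is.na(counties))

counties_filled <- counties %>%
  mutate(
    # Replace each NA with the mean of the non-missing values in that column.
    income        = ifelse(is.na(income),
                           mean(income, na.rm = TRUE), income),
    child_poverty = ifelse(is.na(child_poverty),
                           mean(child_poverty, na.rm = TRUE), child_poverty)
  )

# Count again after the fix; every column should now report zero.
colSums(is.na(counties_filled))
```

Checking with colSums both before and after makes it easy to see that the replacement actually took effect.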

Question Four

While there are certainly some “outliers,” I don’t think that any values are particularly unusual. Looking at the information with the summary function, I see no negative values or complete zeros for information like transportation. Because we are speaking about counties, all these statistics are sure to vary just as lifestyles vary across the country. For example, while LA county has well over 10 million people, there are small rural counties in West Texas that have just a handful of people. This doesn’t necessarily make any value unusual. Likewise, there are richer counties where the childhood poverty rate would be zero and some where it would be very high.

Question Five

1985 counties have more women than men. The filter function is an easy way to find this out and the summarise function helps me know just how many there are exactly.

Question 6

2420 counties have unemployment rates less than 10%. Much like the question above, the filter function is an easy way to narrow down the rows we are looking for before asking the summarise function to count how many results we get from the output.
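The filter-then-count pattern used in Questions Five and 6 can be sketched like this. The column names men, women, and unemployment are assumptions, and the four rows are made up.

```r
library(dplyr)

# Toy stand-in for the county data (hypothetical values).
counties <- tibble(
  county       = c("A", "B", "C", "D"),
  men          = c(100, 250, 300, 80),
  women        = c(120, 240, 310, 80),
  unemployment = c(4.5, 12.0, 9.9, 10.0)
)

# Keep only counties with more women than men, then count the rows left.
more_women <- counties %>%
  filter(women > men) %>%
  summarise(n_counties = n())

# Same pattern with a different condition: unemployment strictly below 10.
low_unemployment <- counties %>%
  filter(unemployment < 10) %>%
  summarise(n_counties = n())

more_women
low_unemployment
```

Note that a strict comparison leaves out ties, so a county with equal numbers of men and women, or with unemployment of exactly 10, is not counted.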

Question 7

The 10 counties with longest mean commutes are

1.Pike Pennsylvania
2.Bronx New York
3.Charles Maryland
4.Warren Virginia
5.Queens New York
6.Richmond New York
7.Westmoreland Virginia
8.Park Colorado
9.Kings New York
10.Clay West Virginia

The select function keeps only the variables I mention. From there, the top_n function lets me easily grab the 10 longest mean commutes. Finally, the arrange function puts the counties in descending order, which I have to specify because the default is ascending.
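A minimal sketch of this select, top_n, and arrange chain on made-up data, taking the top 3 instead of the top 10. The column name mean_commute is an assumption.

```r
library(dplyr)

# Toy stand-in for the county data (hypothetical values).
counties <- tibble(
  county       = c("A", "B", "C", "D", "E"),
  state        = c("X", "X", "Y", "Y", "Z"),
  mean_commute = c(25.1, 44.0, 31.2, 40.5, 18.3),
  total_pop    = c(1000, 2000, 1500, 800, 300)
)

longest <- counties %>%
  select(county, state, mean_commute) %>%  # keep only the columns of interest
  top_n(3, mean_commute) %>%               # the 3 rows with the largest values
  arrange(desc(mean_commute))              # arrange() sorts ascending by default

longest
```

top_n only picks the rows and does not sort them, which is why the arrange step is still needed. Newer versions of dplyr mark top_n as superseded by slice_max, though it still works.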

Question 8

The counties with the lowest percentage of women are

1.Forest Pennsylvania
2.Bent Colorado
3.Sussex Virginia
4.Wheeler Georgia
5.Lassen California
6.Concho Texas
7.Chattahoochee Georgia
8.Aleutians East Borough Alaska
9.West Feliciana Louisiana
10.Pershing Nevada

The mutate function allows me to create a new column that I then multiply by 100 to match the percentage form of the other data. From there, the select function keeps only the variables I mention. I am then able to grab the 10 counties with the lowest percentage of women, as specified by the -10. Finally, the arrange function puts the counties in ascending order.
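The steps above can be sketched as follows, using the bottom 2 rather than the bottom 10. I am assuming the percentage is women divided by total population, and the names women, total_pop, and pct_women are hypothetical.

```r
library(dplyr)

# Toy stand-in for the county data (hypothetical values).
counties <- tibble(
  county    = c("A", "B", "C", "D"),
  state     = c("X", "X", "Y", "Y"),
  total_pop = c(1000, 2000, 500, 800),
  women     = c(510, 900, 200, 416)
)

lowest_women <- counties %>%
  mutate(pct_women = women / total_pop * 100) %>%  # share of women, in percent
  select(county, state, pct_women) %>%
  top_n(-2, pct_women) %>%                         # negative n takes the smallest values
  arrange(pct_women)                               # ascending is the default

lowest_women
```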

Question 9

Step One: I use the mutate function to create a variable that adds all the percentages of each race together.

Step Two: Using the select and top_n functions, I create a chain of functions that allows me to pull the bottom 10 counties by the sum of the race percentages. Then the arrange function places them in ascending order.

A. The counties with the lowest sum of race percentages are

1 Hawaii Hawaii
2 Maui Hawaii
3 Mayes Oklahoma
4 Honolulu Hawaii
5 Pontotoc Oklahoma
6 Grundy Tennessee
7 Yakutat City and Borough Alaska
8 Johnston Oklahoma
9 Kauai Hawaii
10 Alfalfa Oklahoma

B. Hawaii has the lowest race sum. I am able to find this because I pull the average across the counties of each state using the summarise function. Since the averages are arranged in ascending order, the slice function then pulls the lowest state average.

C. 11 counties have a perfect make up of 100. We are able to see this using the simple filter function, asking it to produce values more than 100 and then pairing them with the other information we were asked to include.

D. Through grouping all counties together by state, and then filtering by average state, we can see that no states have a perfect 100 percent of racial makeup from the groups presented.
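A minimal sketch of the Question 9 workflow on made-up data. The six race columns and their names are assumptions, as are the thresholds used in the two filters.

```r
library(dplyr)

# Toy stand-in for the county data (hypothetical names and values).
counties <- tibble(
  county   = c("A", "B", "C", "D"),
  state    = c("X", "X", "Y", "Y"),
  hispanic = c(10, 5, 20, 30),
  white    = c(70, 80, 60, 50),
  black    = c(15, 10, 10, 15),
  native   = c(1, 2, 3, 2),
  asian    = c(2, 3, 4, 3),
  pacific  = c(0, 0, 1, 0.5)
)

# Step one: add up the race percentages for each county.
with_sum <- counties %>%
  mutate(race_sum = hispanic + white + black + native + asian + pacific)

# Average the county sums within each state, lowest average first.
state_avg <- with_sum %>%
  group_by(state) %>%
  summarise(mean_race_sum = mean(race_sum)) %>%
  arrange(mean_race_sum)

lowest_state <- state_avg %>% slice(1)               # state with the lowest average
over_100     <- with_sum %>% filter(race_sum > 100)  # counties summing past 100
exact_states <- state_avg %>% filter(mean_race_sum == 100)

lowest_state
over_100
exact_states
```

Whether the county filter should test for greater than 100 or exactly 100 depends on how the question is worded, so the comparison operator is the part to adjust.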

Question 10

A. I create a new variable called carpool_rank with the mutate function.

## # A tibble: 10 × 5
##    census_id county   state    carpool carpool_rank
##    <chr>     <chr>    <fct>      <dbl>        <int>
##  1 13061     Clay     Georgia     29.9            1
##  2 18087     LaGrange Indiana     27              2
##  3 13165     Jenkins  Georgia     25.3            3
##  4 5133      Sevier   Arkansas    24.4            4
##  5 20175     Seward   Kansas      23.4            5
##  6 48079     Cochran  Texas       22.8            6
##  7 48247     Jim Hogg Texas       22.6            7
##  8 48393     Roberts  Texas       22.4            8
##  9 39075     Holmes   Ohio        21.8            9
## 10 21197     Powell   Kentucky    21.6           10

## # A tibble: 10 × 5
##    census_id county   state        carpool carpool_rank
##    <chr>     <chr>    <fct>          <dbl>        <int>
##  1 48261     Kenedy   Texas            0           3141
##  2 48269     King     Texas            0           3141
##  3 48235     Irion    Texas            0.9         3140
##  4 31183     Wheeler  Nebraska         1.3         3139
##  5 36061     New York New York         1.9         3138
##  6 13309     Wheeler  Georgia          2.3         3136
##  7 38029     Emmons   North Dakota     2.3         3136
##  8 30019     Daniels  Montana          2.6         3134
##  9 31057     Dundy    Nebraska         2.6         3134
## 10 46069     Hyde     South Dakota     2.8         3132

D. Arizona is best for carpooling

E. 1. Arizona
2. Utah
3. Arkansas
4. Hawaii
5. Alaska

These four code chunks essentially all work the same. Using the group_by function, I make sure that all the counties from one state are grouped together, before then using functions like summarise and arrange to make sure that the data is aggregated the way I want and then ranked how I want. Finally, the slice function gives us the top rows for each variable, helping us answer the questions.
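The ranking and the state-level chunks can be sketched as below on made-up data. I am assuming min_rank(desc(carpool)) for the rank, which is consistent with the tied ranks in the output above, and assuming a simple mean of county carpool rates per state. The actual chunks may weight or aggregate differently.

```r
library(dplyr)

# Toy stand-in for the county data (hypothetical values).
counties <- tibble(
  county  = c("A", "B", "C", "D", "E"),
  state   = c("X", "Y", "Y", "X", "Z"),
  carpool = c(12.0, 0, 29.9, 0, 12.0)
)

# Rank counties so the highest carpool rate gets rank 1.
# min_rank() gives tied values the same (smallest) rank.
ranked <- counties %>%
  mutate(carpool_rank = min_rank(desc(carpool))) %>%
  arrange(carpool_rank)

# Group counties by state, average the carpool rate,
# sort from highest to lowest, and keep the top 2 states.
top_states <- ranked %>%
  group_by(state) %>%
  summarise(mean_carpool = mean(carpool)) %>%
  arrange(desc(mean_carpool)) %>%
  slice(1:2)

ranked
top_states
```

Because summarise drops the grouping when there is only one grouping variable, the final slice works on the whole sorted table rather than within each state.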