AFI 2.4
library(tidyverse)
Begin by loading the tidyverse package in the code chunk above and adding your name as the author.
The readr package includes functions for reading tabular data into R. Each code chunk below should read in the specified data file, store it in an appropriately named object, and then print the data. The data files are stored in the data folder.
Importing Data
Let’s start by reading in the AllCountries.csv data. Modify this code by filling in the ______ to do so:
all_countries <- read_csv("data/AllCountries.csv")
all_countries
# A tibble: 217 × 25
Country LandArea Population Density GDP Rural CO2 PumpPrice Military
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Afghanistan 653. 30.6 46.8 665 74.1 35.3 1.28 8.65
2 Albania 27.4 2.90 106. 4460 44.6 12.8 1.81 NA
3 Algeria 2382. 39.2 16.5 5361 30.5 24.6 0.29 NA
4 American Sa… 0.2 0.055 275 NA 12.7 NA NA NA
5 Andorra 0.47 0.079 168. NA 13.8 9.5 1.67 NA
6 Angola 1247. 21.5 17.2 5783 57.5 44.8 0.63 13.8
7 Antigua and… 0.44 0.09 204. 13342 75.4 16.6 NA NA
8 Argentina 2737. 41.4 15.1 14715 8.5 16.9 1.46 NA
9 Armenia 28.5 2.98 105. 3505 37 13.9 1.25 16.8
10 Aruba 0.18 0.103 572. NA 57.9 10.4 NA NA
# ℹ 207 more rows
# ℹ 16 more variables: Health <dbl>, ArmedForces <dbl>, Internet <dbl>,
# Cell <dbl>, HIV <dbl>, Hunger <dbl>, Diabetes <dbl>, BirthRate <dbl>,
# DeathRate <dbl>, ElderlyPop <dbl>, LifeExpectancy <dbl>, FemaleLabor <dbl>,
# Unemployment <dbl>, EnergyUse <dbl>, Electricity <dbl>, Developed <dbl>
That was easy, right? Note that including the #| message: false option at the beginning of the code chunk suppresses unneeded messages, which cleans up your output for whoever reads it (me!).
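Here is a minimal sketch of what that looks like in practice. The file and its contents are made up for illustration (they are not the lab data), and show_col_types = FALSE is an extra readr argument, assuming readr 2.0 or later, that silences the column specification message for a single call.

```r
#| message: false
# The line above is a Quarto-style chunk option; it has to come first in the chunk.
library(readr)

# Hypothetical stand-in for a file in the data folder.
path <- tempfile(fileext = ".csv")
writeLines(c("Country,Population", "Albania,2.90", "Aruba,0.103"), path)

# Without the chunk option, read_csv() prints a column specification message.
# show_col_types = FALSE turns that message off for this one call.
small <- read_csv(path, show_col_types = FALSE)
small
```

The chunk option is usually the tidier choice in a report, since it also hides the startup messages from library(tidyverse).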
Now try the minn_stp_weather.csv data:
minn_stp_weather <- read_csv("data/minn_stp_weather.csv", skip = 17)
minn_stp_weather
# A tibble: 1,388 × 12
MonthY MonthS Year LowTemp HighTemp WarmestMin ColdestHigh AveMin AveMax
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 1900 -15 51 36 -4 12.6 30
2 2 2 1900 -17 37 20 1 -1.3 18.5
3 3 3 1900 -10 54 31 8 17.2 33.9
4 4 4 1900 26 81 60 34 42.5 63
5 5 5 1900 33 90 68 50 49.9 74.1
6 6 6 1900 46 94 69 67 57.4 80.1
7 7 7 1900 51 95 69 66 60.7 81.2
8 8 8 1900 58 94 75 74 67 86.3
9 9 9 1900 36 92 70 54 52.5 70.5
10 10 10 1900 34 79 60 48 48.3 67.2
# ℹ 1,378 more rows
# ℹ 3 more variables: meanTemp <dbl>, TotPrecip <chr>, Max24hrPrecip <chr>
Did you look at the data file before trying?
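If you would rather peek at a file from inside R, one option is read_lines() with n_max. The sketch below uses a small made-up file (not the real weather data) with two lines of notes above the header, so the right value is skip = 2. The actual file needs its own count.

```r
library(readr)

# Hypothetical file: two lines of notes, then the real header and data.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Station notes (not data)",
  "Source: made up for this sketch",
  "Month,Year,LowTemp",
  "1,1900,-15",
  "2,1900,-17"
), path)

# Look at the first few raw lines to see where the header row starts.
read_lines(path, n_max = 4)

# Two lines come before the header, so skip them.
weather <- read_csv(path, skip = 2, show_col_types = FALSE)
weather
```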
Now try the white_nonhisp_death_rates_from_1999_to_2013.txt data:
white_nonhisp_death_rates_from_1999_to_2013 <- read_delim("data/white_nonhisp_death_rates_from_1999_to_2013.txt", delim = "\t")
white_nonhisp_death_rates_from_1999_to_2013
# A tibble: 150 × 5
Age Year Deaths Population Rate
<dbl> <dbl> <dbl> <dbl> <dbl>
1 45 1999 8304 3166393 262.
2 45 2000 8604 3207271 268.
3 45 2001 8836 3152637 280.
4 45 2002 9217 3256317 283
5 45 2003 9287 3260376 285.
6 45 2004 9210 3211340 287.
7 45 2005 9352 3279109 285.
8 45 2006 9100 3222835 282.
9 45 2007 8805 3137876 281.
10 45 2008 8751 3074171 285.
# ℹ 140 more rows
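For tab-separated files, readr also has read_tsv(), which should behave like read_delim() with delim = "\t". The sketch below checks that on a tiny made-up file. The values are only placeholders, not the real death rate data.

```r
library(readr)

# Hypothetical tab-separated file.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Age\tYear\tDeaths",
  "45\t1999\t8304",
  "45\t2000\t8604"
), path)

# Two ways to read the same tab-delimited file.
with_delim <- read_delim(path, delim = "\t", show_col_types = FALSE)
with_tsv <- read_tsv(path, show_col_types = FALSE)

# Compare one column from each result.
identical(with_delim$Deaths, with_tsv$Deaths)
```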
Here’s a trickier one. Read in the deaton.txt data:
deaton <- read_table("data/deaton.txt")
deaton
# A tibble: 10 × 4
age death_rate_1989 death_rate_2013 change
<dbl> <dbl> <dbl> <dbl>
1 45 262. 261. -1.6
2 46 293. 290. -3.1
3 47 306. 324. 17.6
4 48 337. 343. 5.7
5 49 359 384. 25.5
6 50 377. 422. 45.5
7 51 429 466. 37.1
8 52 445. 481. 36.4
9 53 545. 527. -18.4
10 54 555. 573. 17.4
You might need to search for a function that wasn’t discussed.
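The function used above is read_table(), which splits columns on runs of whitespace. That helps when columns are padded with a varying number of spaces, where a single-space delimiter generally will not line up. A small sketch on made-up text (I am assuming deaton.txt is laid out roughly like this, so check the file yourself):

```r
library(readr)

# Hypothetical file with columns padded by several spaces.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "age   rate_a   rate_b",
  "45    262.1    260.5",
  "46    293.0    289.9"
), path)

# read_table() treats any run of whitespace as one column separator.
ragged <- read_table(path, show_col_types = FALSE)
ragged
```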
Now read in the pga2004.csv data (the help page for read_csv might be useful):
pga2004_col_names <- c("Name", "Age", "Average Drive (Yards)",
"Driving accuracy %", "Greens on regulation (%)",
"Average", "Save Percent", "Money Rank", "Events",
"Total Winnings ($)", "Average Winnings ($)")
pga2004 <- read_csv("data/pga2004.csv", n_max = 195, col_names = pga2004_col_names)
pga2004
# A tibble: 195 × 11
Name Age `Average Drive (Yards)` `Driving accuracy %`
<chr> <dbl> <dbl> <dbl>
1 Aaron Baddeley 23 288 53.1
2 Adam Scott 24 295. 57.7
3 Alex Cejka 34 286. 64.2
4 Andre Stolz 34 298. 59
5 Arjun Atwal 31 289. 60.5
6 Arron Oberholser 29 285. 68.8
7 Bart Bryant 42 282. 74.2
8 Ben Crane 28 284. 64.4
9 Ben Curtis 27 282. 64.3
10 Bernhard Langer 47 282. 62.6
# ℹ 185 more rows
# ℹ 7 more variables: `Greens on regulation (%)` <dbl>, Average <dbl>,
# `Save Percent` <dbl>, `Money Rank` <dbl>, Events <dbl>,
# `Total Winnings ($)` <dbl>, `Average Winnings ($)` <dbl>
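Two read_csv() arguments do the work here: col_names supplies the column names (so the first line of the file is read as data), and n_max caps the number of rows read. Here is a small sketch with made-up players and numbers, not the real PGA file:

```r
library(readr)

# Hypothetical file with no header row and four data rows.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Player A,23,288.0",
  "Player B,24,295.4",
  "Player C,34,285.8",
  "Player D,31,289.4"
), path)

# Names with spaces or symbols are allowed; they print in backticks.
golf_names <- c("Name", "Age", "Average Drive (Yards)")

# Supply the names ourselves and stop after three rows.
golf <- read_csv(path, col_names = golf_names, n_max = 3,
                 show_col_types = FALSE)
golf

# Non-syntactic names need backticks when you refer to them.
golf$`Average Drive (Yards)`
```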
Lastly, read in the noise.txt data:
noise_col_names <- c("V0", "V1", "V2", "V3", "V4", "V5")
noise <- read_delim("data/noise.txt", delim = " ", skip = 1, col_names = noise_col_names)
noise
# A tibble: 3,020 × 6
V0 V1 V2 V3 V4 V5
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 -0.675 0.831 -1.24 0.207 0.290
2 2 0.974 -0.0118 -0.415 0.192 -0.167
3 3 -0.745 -1.03 1.79 0.464 -1.88
4 4 1.06 1.01 -0.203 -0.550 1.21
5 5 0.493 -0.215 -0.192 -1.84 0.0446
6 6 -1.21 -0.873 0.582 -0.190 -0.953
7 7 2.00 0.622 -0.967 -1.71 -0.448
8 8 -0.0600 0.920 -0.431 0.350 -0.527
9 9 -1.13 -0.0190 -0.430 -1.12 -0.224
10 10 0.988 -0.322 -0.00720 -1.36 -0.324
# ℹ 3,010 more rows
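Combining skip = 1 with col_names is one way to throw away a file's first line and use your own names instead. The sketch below assumes the skipped line is a header you want to replace. That is a guess about noise.txt, and the file here is made up.

```r
library(readr)

# Hypothetical space-delimited file whose first line is an unwanted header.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "id x y",
  "1 -0.675 0.831",
  "2 0.974 -0.0118"
), path)

new_names <- c("V0", "V1", "V2")

# Skip the original first line, then name the columns ourselves.
noise_small <- read_delim(path, delim = " ", skip = 1,
                          col_names = new_names, show_col_types = FALSE)
noise_small
```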