library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.2.0
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(odbc)
library(DBI)
library(RSQLite)
host_city_locations <- read_csv('host_city_locations.csv')
## Rows: 52 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Season, City
## dbl (3): Year, lon, lat
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
athlete_events <- read_csv('athlete_events.csv')
## Rows: 271116 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): Name, Sex, Team, NOC, Games, Season, City, Sport, Event, Medal
## dbl (5): ID, Age, Height, Weight, Year
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(athlete_events)
## # A tibble: 6 × 15
## ID Name Sex Age Height Weight Team NOC Games Year Season City
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 1 A Dijiang M 24 180 80 China CHN 1992… 1992 Summer Barc…
## 2 2 A Lamusi M 23 170 60 China CHN 2012… 2012 Summer Lond…
## 3 3 Gunnar N… M 24 NA NA Denm… DEN 1920… 1920 Summer Antw…
## 4 4 Edgar Li… M 34 NA NA Denm… DEN 1900… 1900 Summer Paris
## 5 5 Christin… F 21 185 82 Neth… NED 1988… 1988 Winter Calg…
## 6 5 Christin… F 21 185 82 Neth… NED 1988… 1988 Winter Calg…
## # ℹ 3 more variables: Sport <chr>, Event <chr>, Medal <chr>
head(host_city_locations)
## # A tibble: 6 × 5
## Year Season City lon lat
## <dbl> <chr> <chr> <dbl> <dbl>
## 1 1896 Summer Athina 23.7 38.0
## 2 1900 Summer Paris 2.35 48.9
## 3 1904 Summer St. Louis -90.2 38.6
## 4 1906 Summer Athina 23.7 38.0
## 5 1908 Summer London -0.128 51.5
## 6 1912 Summer Stockholm 18.1 59.3
Create a SQLite database called olympics.db. Inside this database you must create two tables, one for each dataset. The table created from athlete_events.csv must be named athlete_events, and the table created from host_city_locations.csv must be named host_city_locations. Both tables must be stored inside the same database file olympics.db. (We did something similar in the R activity)
olympics.db <- dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(olympics.db, athlete_events)
copy_to(olympics.db, host_city_locations)
Write SQL queries to answer the questions.
SELECT count(*)
FROM athlete_events
| count(*) |
|---|
| 271116 |
There are a total of 271116 records in athlete_events
SELECT COUNT(DISTINCT athlete_events.Name) as athletes_in_londonnn
FROM athlete_events
JOIN host_city_locations
ON athlete_events.Year = host_city_locations.Year
WHERE host_city_locations.City = "London"
| athletes_in_londonnn |
|---|
| 17590 |
I cannot believe that first line worked. Or maybe it didn’t idk
There were a total of 17590 different athletes that competed in Olympics held in London.
SELECT lat, city
FROM host_city_locations
ORDER BY lat desc
| lat | City |
|---|---|
| 61.11527 | Lillehammer |
| 60.16986 | Helsinki |
| 59.91387 | Oslo |
| 59.32932 | Stockholm |
| 59.32932 | Stockholm |
| 55.75583 | Moskva |
| 52.52001 | Berlin |
| 52.37022 | Amsterdam |
| 51.50735 | London |
| 51.50735 | London |
Lillehammer (Norway) is the furthest north?
SELECT DISTINCT City
FROM host_city_locations
| City |
|---|
| Athina |
| Paris |
| St. Louis |
| London |
| Stockholm |
| Antwerpen |
| Chamonix |
| Amsterdam |
| Sankt Moritz |
| Los Angeles |