library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(odbc)
library(DBI)
library(RSQLite)
host_city_locations <- read_csv('host_city_locations.csv')
## Rows: 52 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Season, City
## dbl (3): Year, lon, lat
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
athlete_events <- read_csv('athlete_events.csv')
## Rows: 271116 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): Name, Sex, Team, NOC, Games, Season, City, Sport, Event, Medal
## dbl  (5): ID, Age, Height, Weight, Year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(athlete_events)
## # A tibble: 6 × 15
##      ID Name      Sex     Age Height Weight Team  NOC   Games  Year Season City 
##   <dbl> <chr>     <chr> <dbl>  <dbl>  <dbl> <chr> <chr> <chr> <dbl> <chr>  <chr>
## 1     1 A Dijiang M        24    180     80 China CHN   1992…  1992 Summer Barc…
## 2     2 A Lamusi  M        23    170     60 China CHN   2012…  2012 Summer Lond…
## 3     3 Gunnar N… M        24     NA     NA Denm… DEN   1920…  1920 Summer Antw…
## 4     4 Edgar Li… M        34     NA     NA Denm… DEN   1900…  1900 Summer Paris
## 5     5 Christin… F        21    185     82 Neth… NED   1988…  1988 Winter Calg…
## 6     5 Christin… F        21    185     82 Neth… NED   1988…  1988 Winter Calg…
## # ℹ 3 more variables: Sport <chr>, Event <chr>, Medal <chr>
head(host_city_locations)
## # A tibble: 6 × 5
##    Year Season City          lon   lat
##   <dbl> <chr>  <chr>       <dbl> <dbl>
## 1  1896 Summer Athina     23.7    38.0
## 2  1900 Summer Paris       2.35   48.9
## 3  1904 Summer St. Louis -90.2    38.6
## 4  1906 Summer Athina     23.7    38.0
## 5  1908 Summer London     -0.128  51.5
## 6  1912 Summer Stockholm  18.1    59.3

Part A: Create the database:

Create a SQLite database called olympics.db. Inside this database you must create two tables, one for each dataset. The table created from athlete_events.csv must be named athlete_events, and the table created from host_city_locations.csv must be named host_city_locations. Both tables must be stored inside the same database file olympics.db. (We did something similar in the R activity)

olympics.db <- dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(olympics.db, athlete_events)
copy_to(olympics.db, host_city_locations)

Part B: SQL queries

Write SQL queries to answer the questions.

  1. How many records are in athlete_events?
SELECT count(*) 
FROM athlete_events
1 records
count(*)
271116

There are a total of 271116 records in athlete_events

  1. How many different athletes competed in Olympics held in London?

SELECT COUNT(DISTINCT athlete_events.Name) as athletes_in_londonnn
FROM athlete_events
JOIN host_city_locations
ON athlete_events.Year = host_city_locations.Year
WHERE host_city_locations.City = "London"
1 records
athletes_in_londonnn
17590

I cannot believe that first line worked. Or maybe it didn’t idk

There were a total of 17590 different athletes that competed in Olympics held in London.

  1. Which Olympics host city is furthest north?
SELECT lat, city
FROM host_city_locations
ORDER BY lat desc
Displaying records 1 - 10
lat City
61.11527 Lillehammer
60.16986 Helsinki
59.91387 Oslo
59.32932 Stockholm
59.32932 Stockholm
55.75583 Moskva
52.52001 Berlin
52.37022 Amsterdam
51.50735 London
51.50735 London

Lillehammer (Norway) is the furthest north?

  1. What is the list of host cities (with no duplicates)?
SELECT DISTINCT City
FROM host_city_locations
Displaying records 1 - 10
City
Athina
Paris
St. Louis
London
Stockholm
Antwerpen
Chamonix
Amsterdam
Sankt Moritz
Los Angeles