Lily Chang Olympics SQL Database Assignment

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(odbc)
library(DBI)
library(RSQLite)

host_city_locations <- read_csv('host_city_locations.csv')

## Rows: 52 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Season, City
## dbl (3): Year, lon, lat
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

athlete_events <- read_csv('athlete_events.csv')

## Rows: 271116 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): Name, Sex, Team, NOC, Games, Season, City, Sport, Event, Medal
## dbl  (5): ID, Age, Height, Weight, Year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(athlete_events)

## # A tibble: 6 × 15
##      ID Name      Sex     Age Height Weight Team  NOC   Games  Year Season City 
##   <dbl> <chr>     <chr> <dbl>  <dbl>  <dbl> <chr> <chr> <chr> <dbl> <chr>  <chr>
## 1     1 A Dijiang M        24    180     80 China CHN   1992…  1992 Summer Barc…
## 2     2 A Lamusi  M        23    170     60 China CHN   2012…  2012 Summer Lond…
## 3     3 Gunnar N… M        24     NA     NA Denm… DEN   1920…  1920 Summer Antw…
## 4     4 Edgar Li… M        34     NA     NA Denm… DEN   1900…  1900 Summer Paris
## 5     5 Christin… F        21    185     82 Neth… NED   1988…  1988 Winter Calg…
## 6     5 Christin… F        21    185     82 Neth… NED   1988…  1988 Winter Calg…
## # ℹ 3 more variables: Sport <chr>, Event <chr>, Medal <chr>

head(host_city_locations)

## # A tibble: 6 × 5
##    Year Season City          lon   lat
##   <dbl> <chr>  <chr>       <dbl> <dbl>
## 1  1896 Summer Athina     23.7    38.0
## 2  1900 Summer Paris       2.35   48.9
## 3  1904 Summer St. Louis -90.2    38.6
## 4  1906 Summer Athina     23.7    38.0
## 5  1908 Summer London     -0.128  51.5
## 6  1912 Summer Stockholm  18.1    59.3

Part A: Create the database:

Create a SQLite database called olympics.db. Inside this database you must create two tables, one for each dataset. The table created from athlete_events.csv must be named athlete_events, and the table created from host_city_locations.csv must be named host_city_locations. Both tables must be stored inside the same database file olympics.db. (We did something similar in the R activity)

olympics.db <- dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(olympics.db, athlete_events)
copy_to(olympics.db, host_city_locations)

Part B: SQL queries

Write SQL queries to answer the questions.

How many records are in athlete_events?

SELECT count(*) 
FROM athlete_events

1 records
count(*)
271116

There are a total of 271116 records in athlete_events

How many different athletes competed in Olympics held in London?


SELECT COUNT(DISTINCT athlete_events.Name) as athletes_in_londonnn
FROM athlete_events
JOIN host_city_locations
ON athlete_events.Year = host_city_locations.Year
WHERE host_city_locations.City = "London"

1 records
athletes_in_londonnn
17590

I cannot believe that first line worked. Or maybe it didn’t idk

There were a total of 17590 different athletes that competed in Olympics held in London.

Which Olympics host city is furthest north?

You might have to use CAST or CONVERT to be able to work with the data in the right way.
You can answer this question either by creating one query with a subquery, or by creating two different queries and executing them one after the other.

SELECT lat, city
FROM host_city_locations
ORDER BY lat desc

Displaying records 1 - 10
lat	City
61.11527	Lillehammer
60.16986	Helsinki
59.91387	Oslo
59.32932	Stockholm
59.32932	Stockholm
55.75583	Moskva
52.52001	Berlin
52.37022	Amsterdam
51.50735	London
51.50735	London

Lillehammer (Norway) is the furthest north?

What is the list of host cities (with no duplicates)?

SELECT DISTINCT City
FROM host_city_locations

Displaying records 1 - 10
City
Athina
Paris
St. Louis
London
Stockholm
Antwerpen
Chamonix
Amsterdam
Sankt Moritz
Los Angeles

Lily Chang Olympics SQL Database Assignment

Lily C

2026-03-11

Part A: Create the database:

Part B: SQL queries