HW 6: Olympics SQL Database

Author

Lydia Baick

Part A:

Importing Libraries

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(odbc)
Warning: package 'odbc' was built under R version 4.4.3
library(DBI) 
library(RSQLite)
Warning: package 'RSQLite' was built under R version 4.4.3

Loading data

setwd("C:/Users/lydia/OneDrive/Desktop/Data205")
host_city_locations <- read.csv("host_city_locations.csv")
athlete_events <- read.csv("athlete_events.csv")

Connecting to a SQLite database and creating two tables

olympics.db <- dbConnect(RSQLite::SQLite(),"olympics.db")
dbWriteTable(olympics.db, "host_city_locations", host_city_locations, overwrite = TRUE)
dbWriteTable(olympics.db, "athlete_events",athlete_events, overwrite = TRUE)

Part B:

How many records are in athlete_events?

SELECT COUNT(*) AS "Records"
FROM athlete_events;
1 records
Records
271116

How many different athletes competed in Olympics held in London?

SELECT COUNT(DISTINCT Name) AS "Number of Athletes"
FROM athlete_events
WHERE City = "London";
1 records
Number of Athletes
16924

Which Olympics host city is furthest north?

Checking lat data type

class(host_city_locations$lat)
[1] "numeric"
SELECT City AS "City furthest North"
FROM host_city_locations
ORDER BY lat DESC
LIMIT 1;
1 records
City furthest North
Lillehammer

What is the list of host cities (with no duplicates)?

SELECT DISTINCT City AS "Host Cities"
FROM host_city_locations;
Displaying records 1 - 10
Host Cities
Athina
Paris
St. Louis
London
Stockholm
Antwerpen
Chamonix
Amsterdam
Sankt Moritz
Los Angeles

Saving the output of Question 4

host_cities <- dbGetQuery(olympics.db, '
SELECT DISTINCT City
FROM host_city_locations;
')

Converting output of Question 4 query as a CSV file

write.csv(host_cities, file = "host_cities_LydiaBaick.csv", row.names = FALSE)
getwd()
[1] "C:/Users/lydia/OneDrive/Desktop/Data205"

Disconnecting

dbDisconnect(olympics.db)