In this analysis, we use the sqldf package in R to query
and manipulate Olympic medals data. The sqldf package
allows us to run SQL queries directly on R data frames, which is useful
for those familiar with SQL but new to R. This dataset contains
information about Olympic medalists by gender, country, and event.
We’ll work with multiple datasets: medallists,
female_total_medals, male_total_medals,
mixed_total_medals, and open_total_medals.
# Load necessary libraries
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
# Read in the datasets
medallists <- read.csv("medallists.csv")
female_total_medals <- read.csv("female_total.csv")
male_total_medals <- read.csv("total_medal_by_men.csv")
mixed_total_medals <- read.csv("mixed_total.csv")
open_total_medals <- read.csv("open__total.csv")
In this query, we use SQL to filter the medallists dataset and select only the female athletes who won a gold medal.
# Query to filter female athletes who won gold medals
gold_female_medalists <- sqldf("SELECT * FROM medallists WHERE gender = 'Female' AND medal_type = 'Gold Medal'")
print(head(gold_female_medalists))
## medal_date medal_type medal_code name gender country_code
## 1 2024-07-27 Gold Medal 1 BROWN Grace Female AUS
## 2 2024-07-27 Gold Medal 1 KONG Man Wai Vivian Female HKG
## 3 2024-07-27 Gold Medal 1 TSUNODA Natsumi Female JPN
## 4 2024-07-27 Gold Medal 1 TITMUS Ariarne Female AUS
## 5 2024-07-28 Gold Medal 1 FOX Jessica Female AUS
## 6 2024-07-28 Gold Medal 1 FERRAND PREVOT Pauline Female FRA
## country country_long nationality team team_gender
## 1 Australia Australia Australia
## 2 Hong Kong, China Hong Kong, China Hong Kong, China
## 3 Japan Japan Japan
## 4 Australia Australia Australia
## 5 Australia Australia Australia
## 6 France France France
## discipline event event_type
## 1 Cycling Road Women's Individual Time Trial ATH
## 2 Fencing Women's Épée Individual HATH
## 3 Judo Women -48 kg HATH
## 4 Swimming Women's 400m Freestyle ATH
## 5 Canoe Slalom Women's Kayak Single ATH
## 6 Cycling Mountain Bike Women's Cross-country ATH
## url_event
## 1 /en/paris-2024/results/cycling-road/women-s-individual-time-trial/fnl-000100--
## 2 /en/paris-2024/results/fencing/women-s-epee-individual/fnl-000100--
## 3 /en/paris-2024/results/judo/women--48-kg/fnl-000100--
## 4 /en/paris-2024/results/swimming/women-s-400m-freestyle/fnl-000100--
## 5 /en/paris-2024/results/canoe-slalom/women-s-kayak-single/fnl-000100--
## 6 /en/paris-2024/results/cycling-mountain-bike/women-s-cross-country/fnl-000100--
## birth_date code_athlete code_team
## 1 1992-07-07 1940173
## 2 1994-02-08 1963262
## 3 1992-08-06 1896735
## 4 2000-09-07 1946150
## 5 1994-06-11 1940205
## 6 1992-02-10 1895672
Here, we calculate the total number of medals won by men from each country by aggregating the male_total_medals dataset.
# Aggregating total medals by country for male athletes
male_medals_by_country <- sqldf("SELECT country, SUM(total) as total_medals FROM male_total_medals GROUP BY country")
male_medals_by_country
## country total_medals
## 1 0
## 2 Albania 2
## 3 Algeria 1
## 4 Armenia 4
## 5 Austira 2
## 6 Australia 20
## 7 Azerbaijan 7
## 8 Bahrain 2
## 9 Botswana 2
## 10 Brazil 7
## 11 Bulgaria 5
## 12 Cabo Verde 1
## 13 Canada 9
## 14 Chile 1
## 15 Chinese Taipei 3
## 16 Colombia 2
## 17 Cote dIvoire 1
## 18 Croatia 3
## 19 Czechina 3
## 20 Democratic Peopl's Republic of Korea 1
## 21 Denmark 5
## 22 Dominican Republic 2
## 23 Ecuador 1
## 24 Egypt 2
## 25 Ethiopia 2
## 26 Fiji 1
## 27 France 38
## 28 Georgia 7
## 29 Germany 11
## 30 Great Britain 30
## 31 Greece 7
## 32 Grenada 2
## 33 Guatemala 1
## 34 Hong Kong, China 1
## 35 Hungary 11
## 36 India 4
## 37 Indonesia 2
## 38 Ireland 5
## 39 Islamic Republic of Iran 10
## 40 Italy 23
## 41 Jamaica 5
## 42 Japan 23
## 43 Jordan 1
## 44 Kazakhstan 5
## 45 Kenya 4
## 46 Kyrgyzstan 4
## 47 Lithuania 2
## 48 Malaysia 2
## 49 Mexico 3
## 50 Morocco 2
## 51 Netherlands 11
## 52 New Zeland 5
## 53 Norway 4
## 54 Pakistan 1
## 55 People's Republic of China 34
## 56 Peru 1
## 57 Philippines 2
## 58 Poland 2
## 59 Portugal 3
## 60 Puerto Rico 1
## 61 Qatar 1
## 62 Republic of Moldova 3
## 63 Romania 3
## 64 Serbia 3
## 65 Singapore 1
## 66 Slovakia 1
## 67 Slovenia 1
## 68 South Africa 3
## 69 Spain 12
## 70 Sweden 5
## 71 Switzerland 2
## 72 Tajikistan 3
## 73 Thailand 4
## 74 Tunisia 3
## 75 Turkiye 2
## 76 Uganda 1
## 77 Ukraine 6
## 78 United States of America 52
## 79 Uzbekistan 11
## 80 Zambia 1
In this example, we join the female_total_medals and male_total_medals datasets to compare the total medals won by men and women from the same countries.
# Join male and female medal data on country
medal_comparison <- sqldf("
SELECT m.country, f.total as female_total, m.total as male_total
FROM female_total_medals f
JOIN male_total_medals m ON f.country = m.country
")
medal_comparison
## country female_total male_total
## 1 United States of America 67 52
## 2 People's Republic of China 50 34
## 3 Japan 18 23
## 4 Australia 30 20
## 5 France 23 38
## 6 Netherlands 21 11
## 7 Great Britain 28 30
## 8 Germany 15 11
## 9 Canada 17 9
## 10 Italy 15 23
## 11 Uzbekistan 2 11
## 12 Hungary 8 11
## 13 Spain 4 12
## 14 Sweden 5 5
## 15 Kenya 7 4
## 16 Norway 4 4
## 17 Ireland 2 5
## 18 Brazil 12 7
## 19 Islamic Republic of Iran 2 10
## 20 Ukraine 6 6
## 21 Romania 6 3
## 22 Bulgaria 2 5
## 23 Serbia 1 3
## 24 Denmark 3 5
## 25 Croatia 4 3
## 26 Bahrain 2 2
## 27 Slovenia 2 1
## 28 Chinese Taipei 4 3
## 29 Hong Kong, China 3 1
## 30 Philippines 2 2
## 31 Algeria 2 1
## 32 Indonesia 1 2
## 33 Poland 8 2
## 34 Kazakhstan 1 5
## 35 Jamaica 1 5
## 36 South Africa 3 3
## 37 Thailand 3 4
## 38 Ethiopia 2 2
## 39 Switzerland 5 2
## 40 Ecuador 3 1
## 41 Portugal 1 3
## 42 Greece 1 7
## 43 Egypt 1 2
## 44 Chile 1 1
## 45 Uganda 1 1
## 46 Dominican Republic 1 2
## 47 Guatemala 1 1
## 48 Turkiye 5 2
## 49 Mexico 2 3
## 50 Colombia 2 2
## 51 Kyrgyzstan 2 4
## 52 India 1 4
## 53 Republic of Moldova 1 3
## 54 Puerto Rico 1 1
This query helps in comparing the total number of medals won by male and female athletes from the same countries.
This query selects countries that have won gold medals in mixed events by querying the mixed_total_medals dataset.
# Query to find countries with gold medals in mixed events
mixed_gold_medals <- sqldf("SELECT country FROM mixed_total_medals WHERE gold > 0")
mixed_gold_medals
## country
## 1 Peopl's Republic of China
## 2 Italy
## 3 United States of America
## 4 Republic of Korea
## 5 Germany
## 6 Austria
## 7 Czechia
## 8 Spain
## 9 France
## 10 Netherlands
## 11 Serbia
Lastly, we compare the number of gold medals won by countries in open and mixed events by querying both the open_total_medals and mixed_total_medals datasets.
# Query to compare medals between open and mixed events
comparison_open_mixed <- sqldf("
SELECT o.country, o.gold as open_gold, m.gold as mixed_gold
FROM open_total_medals o
LEFT JOIN mixed_total_medals m ON o.country = m.country
")
comparison_open_mixed
## country open_gold mixed_gold
## 1 Germany 4 1
## 2 Great Britain 2 0
## 3 People's Republic of China 1 NA
## 4 United States of America 0 1
## 5 France 0 1
## 6 Australia 0 0
## 7 Denmark 0 NA
## 8 Switzerland 0 NA
## 9 Spain 0 1
## 10 Japan 0 0
## 11 Netherlands 0 1
LEFT JOIN: This SQL statement performs a left join between the open_total_medals and mixed_total_medals datasets. All countries in the open_total_medals dataset will be included, and data from the mixed_total_medals dataset is included where available.
o.gold as open_gold, m.gold as mixed_gold: We select and rename the gold medal counts from both datasets for easy comparison.
This query allows for a side-by-side comparison of gold medals won in open and mixed events.
The sqldf package allows for easy SQL querying in R, making it an excellent tool for SQL users transitioning to R. We’ve shown examples of how to filter, aggregate, and join data across multiple datasets, providing a flexible and familiar way to analyze Olympic medal data. By combining the power of SQL with R, you can gain insights while gradually exploring R’s native functions.