Introduction

In this analysis, we use the sqldf package in R to query and manipulate Olympic medals data. The sqldf package allows us to run SQL queries directly on R data frames, which is useful for those familiar with SQL but new to R. This dataset contains information about Olympic medalists by gender, country, and event.

We’ll work with multiple datasets: medallists, female_total_medals, male_total_medals, mixed_total_medals, and open_total_medals.

Load Libraries and Data

# Load necessary libraries
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
# Read in the datasets
medallists <- read.csv("medallists.csv")
female_total_medals <- read.csv("female_total.csv")
male_total_medals <- read.csv("total_medal_by_men.csv")
mixed_total_medals <- read.csv("mixed_total.csv")
open_total_medals <- read.csv("open__total.csv")

Example 1: Selecting Female Medalists Who Won Gold

In this query, we use SQL to filter the medallists dataset and select only the female athletes who won a gold medal.

# Query to filter female athletes who won gold medals
gold_female_medalists <- sqldf("SELECT * FROM medallists WHERE gender = 'Female' AND medal_type = 'Gold Medal'")
print(head(gold_female_medalists))
##   medal_date medal_type medal_code                   name gender country_code
## 1 2024-07-27 Gold Medal          1            BROWN Grace Female          AUS
## 2 2024-07-27 Gold Medal          1    KONG Man Wai Vivian Female          HKG
## 3 2024-07-27 Gold Medal          1        TSUNODA Natsumi Female          JPN
## 4 2024-07-27 Gold Medal          1         TITMUS Ariarne Female          AUS
## 5 2024-07-28 Gold Medal          1            FOX Jessica Female          AUS
## 6 2024-07-28 Gold Medal          1 FERRAND PREVOT Pauline Female          FRA
##            country     country_long      nationality team team_gender
## 1        Australia        Australia        Australia                 
## 2 Hong Kong, China Hong Kong, China Hong Kong, China                 
## 3            Japan            Japan            Japan                 
## 4        Australia        Australia        Australia                 
## 5        Australia        Australia        Australia                 
## 6           France           France           France                 
##              discipline                         event event_type
## 1          Cycling Road Women's Individual Time Trial        ATH
## 2               Fencing       Women's Épée Individual       HATH
## 3                  Judo                  Women -48 kg       HATH
## 4              Swimming        Women's 400m Freestyle        ATH
## 5          Canoe Slalom          Women's Kayak Single        ATH
## 6 Cycling Mountain Bike         Women's Cross-country        ATH
##                                                                         url_event
## 1  /en/paris-2024/results/cycling-road/women-s-individual-time-trial/fnl-000100--
## 2             /en/paris-2024/results/fencing/women-s-epee-individual/fnl-000100--
## 3                           /en/paris-2024/results/judo/women--48-kg/fnl-000100--
## 4             /en/paris-2024/results/swimming/women-s-400m-freestyle/fnl-000100--
## 5           /en/paris-2024/results/canoe-slalom/women-s-kayak-single/fnl-000100--
## 6 /en/paris-2024/results/cycling-mountain-bike/women-s-cross-country/fnl-000100--
##   birth_date code_athlete code_team
## 1 1992-07-07      1940173          
## 2 1994-02-08      1963262          
## 3 1992-08-06      1896735          
## 4 2000-09-07      1946150          
## 5 1994-06-11      1940205          
## 6 1992-02-10      1895672

Explanation :

  • SELECT *: This SQL statement selects all columns from the dataset.
  • WHERE gender = ‘Female’ AND medal_type = ‘Gold Medal’: We filter the data to include only rows where the gender is “Female” and the medal type is “Gold Medal”. This query is useful for quickly identifying the gold-medal-winning female athletes.

Example 2: Aggregating Medals by Country for Male Athletes

Here, we calculate the total number of medals won by men from each country by aggregating the male_total_medals dataset.

# Aggregating total medals by country for male athletes
male_medals_by_country <- sqldf("SELECT country, SUM(total) as total_medals FROM male_total_medals GROUP BY country")
male_medals_by_country
##                                 country total_medals
## 1                                                  0
## 2                               Albania            2
## 3                               Algeria            1
## 4                               Armenia            4
## 5                               Austira            2
## 6                             Australia           20
## 7                            Azerbaijan            7
## 8                               Bahrain            2
## 9                              Botswana            2
## 10                               Brazil            7
## 11                             Bulgaria            5
## 12                           Cabo Verde            1
## 13                               Canada            9
## 14                                Chile            1
## 15                       Chinese Taipei            3
## 16                             Colombia            2
## 17                         Cote dIvoire            1
## 18                              Croatia            3
## 19                             Czechina            3
## 20 Democratic Peopl's Republic of Korea            1
## 21                              Denmark            5
## 22                   Dominican Republic            2
## 23                              Ecuador            1
## 24                                Egypt            2
## 25                             Ethiopia            2
## 26                                 Fiji            1
## 27                               France           38
## 28                              Georgia            7
## 29                              Germany           11
## 30                        Great Britain           30
## 31                               Greece            7
## 32                              Grenada            2
## 33                            Guatemala            1
## 34                     Hong Kong, China            1
## 35                              Hungary           11
## 36                                India            4
## 37                            Indonesia            2
## 38                              Ireland            5
## 39             Islamic Republic of Iran           10
## 40                                Italy           23
## 41                              Jamaica            5
## 42                                Japan           23
## 43                               Jordan            1
## 44                           Kazakhstan            5
## 45                                Kenya            4
## 46                           Kyrgyzstan            4
## 47                            Lithuania            2
## 48                             Malaysia            2
## 49                               Mexico            3
## 50                              Morocco            2
## 51                          Netherlands           11
## 52                           New Zeland            5
## 53                               Norway            4
## 54                             Pakistan            1
## 55           People's Republic of China           34
## 56                                 Peru            1
## 57                          Philippines            2
## 58                               Poland            2
## 59                             Portugal            3
## 60                          Puerto Rico            1
## 61                                Qatar            1
## 62                  Republic of Moldova            3
## 63                              Romania            3
## 64                               Serbia            3
## 65                            Singapore            1
## 66                             Slovakia            1
## 67                             Slovenia            1
## 68                         South Africa            3
## 69                                Spain           12
## 70                               Sweden            5
## 71                          Switzerland            2
## 72                           Tajikistan            3
## 73                             Thailand            4
## 74                              Tunisia            3
## 75                              Turkiye            2
## 76                               Uganda            1
## 77                              Ukraine            6
## 78             United States of America           52
## 79                           Uzbekistan           11
## 80                               Zambia            1

Explanation :

  • SUM(total): This function calculates the total number of medals won by male athletes per country.
  • GROUP BY country: The data is grouped by the country column, which ensures that the sum is calculated for each country. This query is useful for comparing the overall medal performance of male athletes by country.

Example 3: Comparing Medal Counts Between Male and Female Athletes

In this example, we join the female_total_medals and male_total_medals datasets to compare the total medals won by men and women from the same countries.

# Join male and female medal data on country
medal_comparison <- sqldf("
  SELECT m.country, f.total as female_total, m.total as male_total
  FROM female_total_medals f
  JOIN male_total_medals m ON f.country = m.country
")
medal_comparison
##                       country female_total male_total
## 1    United States of America           67         52
## 2  People's Republic of China           50         34
## 3                       Japan           18         23
## 4                   Australia           30         20
## 5                      France           23         38
## 6                 Netherlands           21         11
## 7               Great Britain           28         30
## 8                     Germany           15         11
## 9                      Canada           17          9
## 10                      Italy           15         23
## 11                 Uzbekistan            2         11
## 12                    Hungary            8         11
## 13                      Spain            4         12
## 14                     Sweden            5          5
## 15                      Kenya            7          4
## 16                     Norway            4          4
## 17                    Ireland            2          5
## 18                     Brazil           12          7
## 19   Islamic Republic of Iran            2         10
## 20                    Ukraine            6          6
## 21                    Romania            6          3
## 22                   Bulgaria            2          5
## 23                     Serbia            1          3
## 24                    Denmark            3          5
## 25                    Croatia            4          3
## 26                    Bahrain            2          2
## 27                   Slovenia            2          1
## 28             Chinese Taipei            4          3
## 29           Hong Kong, China            3          1
## 30                Philippines            2          2
## 31                    Algeria            2          1
## 32                  Indonesia            1          2
## 33                     Poland            8          2
## 34                 Kazakhstan            1          5
## 35                    Jamaica            1          5
## 36               South Africa            3          3
## 37                   Thailand            3          4
## 38                   Ethiopia            2          2
## 39                Switzerland            5          2
## 40                    Ecuador            3          1
## 41                   Portugal            1          3
## 42                     Greece            1          7
## 43                      Egypt            1          2
## 44                      Chile            1          1
## 45                     Uganda            1          1
## 46         Dominican Republic            1          2
## 47                  Guatemala            1          1
## 48                    Turkiye            5          2
## 49                     Mexico            2          3
## 50                   Colombia            2          2
## 51                 Kyrgyzstan            2          4
## 52                      India            1          4
## 53        Republic of Moldova            1          3
## 54                Puerto Rico            1          1

Explanation :

  • JOIN female_total_medals f JOIN male_total_medals m ON f.country = m.country: This SQL statement performs an inner join between the two datasets, matching countries that appear in both datasets.
  • f.total as female_total, m.total as male_total: This selects the total medals for each gender, renaming them as female_total and male_total.

This query helps in comparing the total number of medals won by male and female athletes from the same countries.

Example 4: Finding Countries with Gold Medals in Mixed Events

This query selects countries that have won gold medals in mixed events by querying the mixed_total_medals dataset.

# Query to find countries with gold medals in mixed events
mixed_gold_medals <- sqldf("SELECT country FROM mixed_total_medals WHERE gold > 0")
mixed_gold_medals
##                      country
## 1  Peopl's Republic of China
## 2                      Italy
## 3   United States of America
## 4          Republic of Korea
## 5                    Germany
## 6                    Austria
## 7                    Czechia
## 8                      Spain
## 9                     France
## 10               Netherlands
## 11                    Serbia

Explanation :

  • WHERE gold > 0: This condition filters the dataset to include only countries that have won at least one gold medal in mixed events. This query is useful for identifying which countries have excelled in mixed-gender or mixed-category events.

Example 5: Comparing Medals in Open and Mixed Events

Lastly, we compare the number of gold medals won by countries in open and mixed events by querying both the open_total_medals and mixed_total_medals datasets.

# Query to compare medals between open and mixed events
comparison_open_mixed <- sqldf("
  SELECT o.country, o.gold as open_gold, m.gold as mixed_gold
  FROM open_total_medals o
  LEFT JOIN mixed_total_medals m ON o.country = m.country
")
comparison_open_mixed
##                       country open_gold mixed_gold
## 1                     Germany         4          1
## 2               Great Britain         2          0
## 3  People's Republic of China         1         NA
## 4    United States of America         0          1
## 5                      France         0          1
## 6                   Australia         0          0
## 7                     Denmark         0         NA
## 8                 Switzerland         0         NA
## 9                       Spain         0          1
## 10                      Japan         0          0
## 11                Netherlands         0          1

Explanation :

  • LEFT JOIN: This SQL statement performs a left join between the open_total_medals and mixed_total_medals datasets. All countries in the open_total_medals dataset will be included, and data from the mixed_total_medals dataset is included where available.

  • o.gold as open_gold, m.gold as mixed_gold: We select and rename the gold medal counts from both datasets for easy comparison.

This query allows for a side-by-side comparison of gold medals won in open and mixed events.

Conclusion

The sqldf package allows for easy SQL querying in R, making it an excellent tool for SQL users transitioning to R. We’ve shown examples of how to filter, aggregate, and join data across multiple datasets, providing a flexible and familiar way to analyze Olympic medal data. By combining the power of SQL with R, you can gain insights while gradually exploring R’s native functions.