Image source: Organisation for Economic Co-operation and Development (OECD). (n.d.). Tourism GDP. https://www.oecd.org/en/data/indicators/tourism-gdp.html
This image is from the OECD Tourism GDP indicator page, which shows how much tourism contributes directly to national GDP.
This project examines the relationship between tourism and economic performance across countries. Tourism contributes to the global economy by generating income, creating jobs, and supporting industries such as transportation and hospitality, which makes its link to GDP worth studying. The dataset is based on World Bank World Development Indicators and UNWTO tourism statistics, which compile economic and tourism data reported by government statistical agencies. The categorical variables are Country and Year. The quantitative variables used in the analysis are GDP, tourism receipts, tourism arrivals, inflation, and unemployment. These variables are used to compare countries and to analyze the relationship between tourism and economic performance.
The main research question is: What is the relationship between tourism activity and GDP across countries?
GDP (Gross Domestic Product) is the total market value of all final goods and services produced in a country over a specific time period. It is commonly used to measure the size of an economy, so a higher GDP indicates a larger economy. The project uses visualizations and multiple linear regression to examine the relationship between tourism and GDP.
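One standard way to see where tourism enters GDP is the expenditure identity from national accounts. It is included here as general background and is not part of this project's regression model:

```latex
\mathrm{GDP} = C + I + G + (X - M)
```

Here C is household consumption, I is investment, G is government spending, X is exports, and M is imports. Spending by international visitors is recorded as an export of services, so international tourism receipts enter the identity through X. Spending by resident households on domestic trips falls mainly under C. Because receipts are themselves a component of GDP, part of any association between the two reflects this accounting link rather than an independent effect.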
# Load libraries needed for data analysis and visualization
library(tidyverse) # Collection of packages used for data science and visualization
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.1 ✔ readr 2.2.0
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr) # Imports CSV files using read_csv()
library(dplyr) # Used for data cleaning
library(ggplot2) # Creates graphs and visualizations
library(plotly) # Adds interactivity to graphs
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
# Import the tourism and economic dataset using read_csv()
data <- read_csv("world_tourism_economy_data.csv") # assumes the CSV is in the working directory; adjust the path if needed
## Rows: 6650 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): country, country_code
## dbl (9): year, tourism_receipts, tourism_arrivals, tourism_exports, tourism_...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(data)
## # A tibble: 6 × 11
## country country_code year tourism_receipts tourism_arrivals tourism_exports
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Aruba ABW 1999 782000000 972000 62.5
## 2 Africa E… AFE 1999 8034209108. 15309378. 12.2
## 3 Afghanis… AFG 1999 NA NA NA
## 4 Africa W… AFW 1999 1443612847. 3897975. 3.97
## 5 Angola AGO 1999 31000000 45000 0.584
## 6 Albania ALB 1999 218000000 371000 56.0
## # ℹ 5 more variables: tourism_departures <dbl>, tourism_expenditures <dbl>,
## # gdp <dbl>, inflation <dbl>, unemployment <dbl>
# Display the structure of the dataset
str(data)
## spc_tbl_ [6,650 × 11] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ country : chr [1:6650] "Aruba" "Africa Eastern and Southern" "Afghanistan" "Africa Western and Central" ...
## $ country_code : chr [1:6650] "ABW" "AFE" "AFG" "AFW" ...
## $ year : num [1:6650] 1999 1999 1999 1999 1999 ...
## $ tourism_receipts : num [1:6650] 7.82e+08 8.03e+09 NA 1.44e+09 3.10e+07 ...
## $ tourism_arrivals : num [1:6650] 972000 15309378 NA 3897975 45000 ...
## $ tourism_exports : num [1:6650] 62.543 12.204 NA 3.974 0.584 ...
## $ tourism_departures : num [1:6650] NA NA NA NA NA NA NA NA NA NA ...
## $ tourism_expenditures: num [1:6650] 9.5 7.76 NA 6.15 2.49 ...
## $ gdp : num [1:6650] 1.72e+09 2.65e+11 NA 1.39e+11 6.15e+09 ...
## $ inflation : num [1:6650] 2.28 7.82 NA 0.372 248.196 ...
## $ unemployment : num [1:6650] NA NA NA NA NA ...
## - attr(*, "spec")=
## .. cols(
## .. country = col_character(),
## .. country_code = col_character(),
## .. year = col_double(),
## .. tourism_receipts = col_double(),
## .. tourism_arrivals = col_double(),
## .. tourism_exports = col_double(),
## .. tourism_departures = col_double(),
## .. tourism_expenditures = col_double(),
## .. gdp = col_double(),
## .. inflation = col_double(),
## .. unemployment = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
# Display summary statistics
summary(data)
## country country_code year tourism_receipts
## Length:6650 Length:6650 Min. :1999 Min. :1.000e+05
## Class :character Class :character 1st Qu.:2005 1st Qu.:2.690e+08
## Mode :character Mode :character Median :2011 Median :1.553e+09
## Mean :2011 Mean :3.063e+10
## 3rd Qu.:2017 3rd Qu.:9.144e+09
## Max. :2023 Max. :1.863e+12
## NA's :2361
## tourism_arrivals tourism_exports tourism_departures
## Min. :9.000e+02 Min. : 0.00096 Min. :2.000e+03
## 1st Qu.:5.290e+05 1st Qu.: 4.65773 1st Qu.:1.051e+06
## Median :2.508e+06 Median : 8.30680 Median :4.634e+06
## Mean :6.264e+07 Mean : 15.50685 Mean :8.246e+07
## 3rd Qu.:1.818e+07 3rd Qu.: 18.50671 3rd Qu.:4.509e+07
## Max. :2.403e+09 Max. :101.96700 Max. :2.034e+09
## NA's :1701 NA's :2536 NA's :4061
## tourism_expenditures gdp inflation unemployment
## Min. : 0.1578 Min. :1.396e+07 Min. :-16.860 Min. : 0.039
## 1st Qu.: 4.0747 1st Qu.:6.087e+09 1st Qu.: 1.865 1st Qu.: 4.250
## Median : 5.7548 Median :3.682e+10 Median : 3.629 Median : 6.548
## Mean : 6.6527 Mean :2.090e+12 Mean : 6.319 Mean : 7.961
## 3rd Qu.: 7.9851 3rd Qu.:4.267e+11 3rd Qu.: 6.563 3rd Qu.: 9.895
## Max. :28.1923 Max. :1.062e+14 Max. :557.202 Max. :57.000
## NA's :2477 NA's :226 NA's :982 NA's :2992
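The NA's rows in the summary show how much data is missing in each column. A compact way to see every count at once is colSums(is.na(...)). The sketch below applies it to a small data frame typed in by hand from the first five rows printed by head(data) and str(data), with values rounded as printed. It is an illustration only and does not re-read the CSV. On the full dataset, colSums(is.na(data)) should return the same counts as the NA's rows above.

```r
# Hand-copied sample: the first five rows shown by head(data) and str(data),
# rounded as printed (illustration only, not re-read from the CSV)
toy <- data.frame(
  country          = c("Aruba", "Africa Eastern and Southern", "Afghanistan",
                       "Africa Western and Central", "Angola"),
  tourism_receipts = c(7.82e8, 8.03e9, NA, 1.44e9, 3.10e7),
  tourism_arrivals = c(972000, 15309378, NA, 3897975, 45000),
  gdp              = c(1.72e9, 2.65e11, NA, 1.39e11, 6.15e9),
  inflation        = c(2.28, 7.82, NA, 0.372, 248.196),
  unemployment     = rep(NA_real_, 5),
  stringsAsFactors = FALSE
)

# is.na() returns a TRUE/FALSE matrix; colSums() counts the TRUEs in each column
na_counts <- colSums(is.na(toy))

# Share of rows missing in each column, largest first
na_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)

na_counts  # unemployment has 5 NAs; each other numeric column has 1
na_share
```

Among the five model variables, unemployment has the most missing values in the full summary (2,992). It is therefore the main constraint on how many rows survive the cleaning step.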
# Clean and select relevant variables
clean_data <- data %>%
select(country, year, tourism_receipts, tourism_arrivals,
gdp, inflation, unemployment) %>% # keep only needed columns
mutate(
GDP_Billions = gdp / 1000000000, # convert GDP to billions
year = as.factor(year) # treat year as categorical
) %>%
filter(
!is.na(gdp),
!is.na(tourism_receipts),
!is.na(tourism_arrivals),
!is.na(inflation),
!is.na(unemployment)
) # remove missing values
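The five !is.na() conditions keep only rows with no missing value in any of the model variables. Base R's complete.cases() expresses the same rule in one call, and tidyr's drop_na() is the tidyverse equivalent. The sketch below reuses the same hand-copied rows from head(data) as an illustration. None of them survives, because unemployment is missing in all five.

```r
# Hand-copied sample: the first five rows shown by head(data) and str(data),
# rounded as printed (illustration only, not re-read from the CSV)
toy <- data.frame(
  country          = c("Aruba", "Africa Eastern and Southern", "Afghanistan",
                       "Africa Western and Central", "Angola"),
  tourism_receipts = c(7.82e8, 8.03e9, NA, 1.44e9, 3.10e7),
  tourism_arrivals = c(972000, 15309378, NA, 3897975, 45000),
  gdp              = c(1.72e9, 2.65e11, NA, 1.39e11, 6.15e9),
  inflation        = c(2.28, 7.82, NA, 0.372, 248.196),
  unemployment     = rep(NA_real_, 5),
  stringsAsFactors = FALSE
)

# Variables that must all be present for a row to be kept (same as the filter above)
needed <- c("gdp", "tourism_receipts", "tourism_arrivals", "inflation", "unemployment")

# complete.cases() is TRUE only for rows with no NA in the selected columns
kept <- toy[complete.cases(toy[, needed]), ]
nrow(kept)  # 0

# Dropping unemployment from the rule shows which column is doing the filtering
without_unemp <- toy[complete.cases(toy[, setdiff(needed, "unemployment")]), ]
without_unemp$country
```

On the full dataset this rule keeps 2,373 of the 6,650 rows. That is the number of observations implied by the regression output's 2,368 residual degrees of freedom plus its 5 estimated coefficients.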
# Summarize average GDP and tourism by country
country_summary <- clean_data %>%
group_by(country) %>% # group by country
summarise(
avg_gdp = mean(gdp, na.rm = TRUE), # average GDP
avg_tourism = mean(tourism_receipts, na.rm = TRUE) # average tourism income
) %>%
arrange(desc(avg_gdp)) # sort from highest GDP to lowest
country_summary # display result
## # A tibble: 190 × 3
## country avg_gdp avg_tourism
## <chr> <dbl> <dbl>
## 1 World 6.50e13 1.23e12
## 2 High income 4.40e13 7.33e11
## 3 OECD members 4.22e13 6.57e11
## 4 Post-demographic dividend 3.99e13 6.69e11
## 5 Europe & Central Asia 1.88e13 4.16e11
## 6 North America 1.64e13 2.10e11
## 7 United States 1.53e13 1.60e11
## 8 European Union 1.30e13 3.13e11
## 9 Euro area 1.27e13 2.92e11
## 10 Early-demographic dividend 9.46e12 1.88e11
## # ℹ 180 more rows
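The code calculates the average GDP and average tourism receipts for each value of the country column and sorts them from highest to lowest average GDP. The country column includes regional and income-group aggregates (such as World, High income, and OECD members) as well as individual countries, and these aggregates occupy most of the top rows. Summarizing at this level simplifies the dataset and makes overall economic performance easier to compare.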
The results show differences in economic size and tourism activity across entries. Higher average GDP generally goes with higher average tourism receipts, but the ordering is not consistent. For example, the European Union and the Euro area have lower average GDP than the United States and North America but higher average tourism receipts. This suggests that tourism is influenced by factors beyond economic size. Overall, this summary helps identify general patterns between tourism and GDP and provides a foundation for the visualizations and regression analysis that follow.
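As the country_summary output shows, the country column mixes individual countries with aggregates such as World, High income, and OECD members. Ranking countries therefore requires removing the aggregates first. The sketch below does this with %in% on a small data frame typed in from the first ten rows of country_summary (values as printed). The aggregate_names vector is a hypothetical, partial list built only from names visible in the printed output. A complete analysis would need the full list of aggregates in the dataset, or would keep the country_code column during cleaning so that aggregates could be identified by code.

```r
# First ten rows of country_summary as printed above (3 significant digits)
summary_top10 <- data.frame(
  country = c("World", "High income", "OECD members", "Post-demographic dividend",
              "Europe & Central Asia", "North America", "United States",
              "European Union", "Euro area", "Early-demographic dividend"),
  avg_gdp = c(6.50e13, 4.40e13, 4.22e13, 3.99e13, 1.88e13,
              1.64e13, 1.53e13, 1.30e13, 1.27e13, 9.46e12),
  avg_tourism = c(1.23e12, 7.33e11, 6.57e11, 6.69e11, 4.16e11,
                  2.10e11, 1.60e11, 3.13e11, 2.92e11, 1.88e11),
  stringsAsFactors = FALSE
)

# HYPOTHETICAL, partial list: only aggregate names visible in the printed output
aggregate_names <- c(
  "World", "High income", "OECD members", "Post-demographic dividend",
  "Europe & Central Asia", "North America", "European Union", "Euro area",
  "Early-demographic dividend", "Africa Eastern and Southern",
  "Africa Western and Central"
)

# Keep rows whose name is NOT in the aggregate list
countries_only <- summary_top10[!summary_top10$country %in% aggregate_names, ]
countries_only$country  # "United States"
```

In the project's dplyr pipeline, the same step would be filter(!country %in% aggregate_names), placed before group_by(country).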
# Multiple Linear Regression Model
tourism_model <- lm(
gdp ~ tourism_receipts + tourism_arrivals + inflation + unemployment,
data = clean_data
)
# Full model summary: coefficients, standard errors, p-values, and R-squared
summary(tourism_model)
##
## Call:
## lm(formula = gdp ~ tourism_receipts + tourism_arrivals + inflation +
## unemployment, data = clean_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.527e+13 -2.731e+11 -1.039e+11 2.968e+10 1.649e+13
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.704e+11 6.696e+10 2.545 0.0110 *
## tourism_receipts 5.835e+01 8.760e-01 66.603 < 2e-16 ***
## tourism_arrivals -2.232e+03 5.671e+02 -3.936 8.51e-05 ***
## inflation -1.746e+09 4.036e+09 -0.433 0.6653
## unemployment -1.417e+10 6.377e+09 -2.222 0.0264 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.703e+12 on 2368 degrees of freedom
## Multiple R-squared: 0.9638, Adjusted R-squared: 0.9637
## F-statistic: 1.575e+04 on 4 and 2368 DF, p-value: < 2.2e-16
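The raw estimates above are in different units: GDP per unit of receipts, per arrival, and per point of inflation or unemployment. Their sizes therefore cannot be compared directly. One common way to compare predictors is to refit the model on standardized (z-scored) variables. Each coefficient then gives the change in GDP, in standard deviations, per one-standard-deviation change in the predictor. The sketch below uses simulated data with the same column names, not the project's data. It checks the result against the rescaling formula beta_std = beta * sd(x) / sd(y).

```r
set.seed(42)
n <- 200

# Simulated stand-in for clean_data; the distributions and true effects are made up
sim <- data.frame(
  tourism_receipts = rlnorm(n, meanlog = 20, sdlog = 1.5),
  tourism_arrivals = rlnorm(n, meanlog = 14, sdlog = 1.5),
  inflation        = rnorm(n, mean = 4, sd = 3),
  unemployment     = rnorm(n, mean = 7, sd = 3)
)
sim$gdp <- 50 * sim$tourism_receipts - 1000 * sim$tourism_arrivals -
  1e9 * sim$unemployment + rnorm(n, sd = 1e10)

model_formula <- gdp ~ tourism_receipts + tourism_arrivals + inflation + unemployment
predictors <- c("tourism_receipts", "tourism_arrivals", "inflation", "unemployment")

# Fit on the original scale and on z-scored variables
fit_raw <- lm(model_formula, data = sim)
fit_std <- lm(model_formula, data = as.data.frame(scale(sim)))

# Standardized slopes (the intercept is about 0 after centering, so it is dropped)
std_coefs <- coef(fit_std)[predictors]

# The same numbers recovered from the raw fit: beta * sd(x) / sd(y)
manual <- coef(fit_raw)[predictors] * sapply(sim[predictors], sd) / sd(sim$gdp)
round(cbind(std_coefs, manual), 3)
```

To apply this to the project, pass only the numeric model columns of clean_data to scale(). It fails on the character country column, and year is a factor.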
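old_par <- par(mfrow = c(2, 2)) # show the four diagnostic plots in a 2 x 2 grid
plot(tourism_model) # standard lm diagnostic plots
par(old_par) # restore the previous plotting layout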
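The regression model uses tourism receipts, tourism arrivals, inflation, and unemployment to explain GDP. Each coefficient estimates how much predicted GDP changes when that variable increases by one unit while the other three stay fixed. A positive estimate indicates a positive association with GDP, and a negative estimate indicates a negative one. Tourism receipts have a positive, highly significant coefficient (58.35, p < 2e-16). Tourism arrivals have a negative coefficient (-2,232, p = 8.51e-05), which suggests that, for a given level of tourism revenue, more visitors are not associated with higher GDP. Unemployment is negatively associated with GDP (p = 0.026), while inflation is not statistically significant (p = 0.665). The model explains about 96% of the variation in GDP (R-squared = 0.9638), but the results describe associations, not causal effects on GDP. The diagnostic plots (residuals vs fitted, normal Q-Q, scale-location, and residuals vs leverage) are used to check whether the linearity, normality, and constant-variance assumptions are reasonably satisfied.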
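Written out with the estimates from the summary output, the fitted model is shown below, followed by a worked reading of the receipts coefficient with the other predictors held fixed. Units follow whatever currency units the dataset uses for GDP and receipts.

```latex
\begin{aligned}
\widehat{\mathrm{GDP}}_i &= 1.704\times10^{11} + 58.35\,\mathrm{Receipts}_i - 2232\,\mathrm{Arrivals}_i \\
&\quad - 1.746\times10^{9}\,\mathrm{Inflation}_i - 1.417\times10^{10}\,\mathrm{Unemployment}_i \\[4pt]
\Delta\widehat{\mathrm{GDP}} &= 58.35 \times \left(1\times10^{9}\right) = 5.835\times10^{10}
\end{aligned}
```

In words, an increase of 1 billion in tourism receipts is associated with roughly 58 billion higher predicted GDP when arrivals, inflation, and unemployment are held constant. A coefficient this far above 1 is best read as receipts partly standing in for overall economic size, and possibly for the very large aggregate entries in the data. It should not be read as a direct tourism multiplier.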
library(plotly)
# Loads plotly package to add interactivity (hover, zoom, pan)
top_countries <- clean_data %>%
group_by(country) %>%
summarise(avg_gdp = mean(gdp, na.rm = TRUE)) %>%
arrange(desc(avg_gdp)) %>%
slice(1:8)
scatter_plot <- clean_data %>%
filter(country %in% top_countries$country) %>%
# Filters dataset to only include top 8 GDP countries for clearer comparison
ggplot(aes(tourism_receipts, gdp, color = country)) +
# X-axis = tourism receipts (independent variable)
# Y-axis = GDP (dependent economic outcome)
# Color separates each country for comparison
geom_point(size = 3, alpha = 0.8) +
labs(
title = "Tourism vs GDP (Top 8 Countries)",
# Main title describing relationship being analyzed
x = "Tourism Receipts",
# Label for x-axis (tourism economic input)
y = "GDP",
# Label for y-axis (economic output)
color = "Country",
# Legend title for country grouping
caption = "Source: World Bank / Tourism Dataset"
# Data source citation
) +
scale_color_brewer(palette = "Set2") +
# color palette
theme_minimal() +
# Clean base theme for readability
theme(
legend.position = "right",
# Moves legend to right side for clarity
panel.background = element_rect(fill = "#ffe6f0"),
# Custom background color (pink theme)
plot.background = element_rect(fill = "#ffe6f0"),
# Ensures full plot background matches theme
panel.grid.major = element_line(color = "white"),
# Makes major grid lines subtle and clean
panel.grid.minor = element_blank()
# Removes minor grid lines to reduce clutter
)
# Convert static ggplot into interactive plotly chart
ggplotly(scatter_plot)
# Adds interactivity (hover tooltips, zoom, pan) without changing visual design
This graph shows the relationship between tourism receipts and GDP for the top 8 entries by average GDP, which include aggregates such as World and High income as well as the United States. Overall, there appears to be a positive association between the two variables. Observations with higher GDP tend to have higher tourism receipts, suggesting that larger economies often attract or generate more tourism activity. However, the relationship is not perfectly linear, so GDP alone does not fully explain tourism performance.
Factors such as geography, tourism policy, infrastructure, and global appeal likely play an important role. Some countries with very high GDP do not have proportionally high tourism receipts, showing that economic size and tourism success are related but not interchangeable.
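A correlation coefficient can quantify a positive but not perfectly linear pattern like this one. Pearson's correlation measures linear association and is sensitive to the very large values in GDP and receipts. Spearman's rank correlation measures monotonic association and is unchanged by log transforms. The sketch below uses simulated data, not the project's data, to compute both with base R's cor().

```r
set.seed(1)
n <- 100

# Simulated receipts and GDP spanning several orders of magnitude, with a noisy positive link
receipts <- rlnorm(n, meanlog = 21, sdlog = 2)
gdp <- receipts * rlnorm(n, meanlog = 3.5, sdlog = 1)

pearson_raw <- cor(receipts, gdp)                      # linear, raw scale
pearson_log <- cor(log10(receipts), log10(gdp))        # linear, log-log scale
spearman    <- cor(receipts, gdp, method = "spearman") # rank-based

# Ranks do not change under a monotonic transform, so Spearman is identical on the log scale
spearman_log <- cor(log10(receipts), log10(gdp), method = "spearman")

round(c(pearson_raw = pearson_raw, pearson_log = pearson_log, spearman = spearman), 3)
```

On the project's data, the equivalent call would be cor(clean_data$tourism_receipts, clean_data$gdp, method = "spearman"). Adding scale_x_log10() and scale_y_log10() to the ggplot would make points easier to compare when values span several orders of magnitude.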
# This step groups the data by country and calculates mean GDP
top_countries <- clean_data %>%
group_by(country) %>% # group observations by country
summarise(
avg_gdp = mean(gdp, na.rm = TRUE) # average GDP per country
) %>%
arrange(desc(avg_gdp)) %>% # sort from highest to lowest GDP
slice(1:8) # keep only top 8 countries
# Bar chart showing top 8 countries by average GDP
# Countries are reordered for clearer ranking
ggplot(top_countries, aes(x = reorder(country, avg_gdp), y = avg_gdp, fill = country)) +
geom_col(show.legend = FALSE) +
# Creates bars where height equals average GDP
coord_flip() +
# Flips coordinates so country names are readable horizontally
labs(
title = "Top 8 Countries by Average GDP", # Main title
x = "Country", # X-axis label
y = "Average GDP", # Y-axis label
caption = "Data Source: World Bank / Tourism Dataset" # citation
) +
scale_fill_brewer(palette = "Set2") +
# Applies a color palette
theme_minimal() +
# Uses a clean base theme
theme(
plot.background = element_rect(fill = "#ffe6f0"),
# Custom background color for full plot
panel.background = element_rect(fill = "#ffe6f0"),
# Matches panel background
panel.grid.major = element_line(color = "white"),
# Light grid lines
panel.grid.minor = element_blank(),
# Removes minor grid lines
plot.title = element_text(face = "bold")
# Makes title stand out
)
This visualization compares the average GDP of the top 8 entries in the dataset. Because the country column contains regional and income-group aggregates as well as individual countries, most of these bars are aggregates. World has the highest average GDP (6.50e13), followed by High income, OECD members, and Post-demographic dividend, and the United States (1.53e13) is the only individual country in the top 8. These aggregates overlap with one another and with their members (the United States, for example, is also part of North America, High income, and World). The chart is therefore better read as a comparison of economic groupings than as a ranking of countries. It does still point to concentrated economic output: the High income group's average GDP (4.40e13) is roughly two-thirds of the World average.
Overall, the plots show that countries with higher GDP tend to generate more tourism revenue, indicating a positive relationship between economic size and tourism activity. However, the relationship is not perfectly consistent or linear. This suggests that while GDP matters, tourism is also influenced by other factors such as geography, infrastructure, and cultural appeal. Countries with similar GDP levels can still have very different tourism performance.
The second visualization, a bar chart of the top 8 entries by average GDP, is dominated by aggregates such as World, High income, and OECD members, with the United States as the only individual country among them. It suggests that economic output is concentrated in high-income economies. However, ranking individual countries and comparing how output is distributed among them would require removing the aggregates first.
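One limitation of this study is that the visualizations focus only on the top 8 entries by average GDP. This limits how well they represent all countries and may bias the findings toward larger economies. The cleaned data also still contains regional and income-group aggregates alongside individual countries, so the same economic activity can be counted several times. These very large aggregate observations may also inflate the regression's R-squared. The data are observational, so they can show relationships but cannot prove that tourism causes changes in GDP. In addition, differences in how countries collect economic and tourism data may affect accuracy, and important factors such as population, geography, and political stability were not included.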
According to the World Tourism Organization (UNWTO, 2023), international tourism is an important part of the global economy, contributing to GDP, creating jobs, and generating income for many countries. The World Bank (2024) also notes that stronger economies often invest more in infrastructure and services, which can support higher tourism activity.
However, GDP is not the only factor that affects tourism. Other influences such as geography, political stability, culture, and climate also play a major role. As a result, some smaller countries can earn high tourism revenue due to natural attractions or historical sites (UNWTO, 2023; World Bank, 2024).
World Bank. (2024). World development indicators. https://data.worldbank.org
World Tourism Organization (UNWTO). (2023). International tourism highlights. https://www.unwto.org/tourism-data