Our initial motivation for this project was that travel and tourism is one of the world’s largest economic sectors, supporting 292 million jobs and generating 10.2% of global GDP (World Travel & Tourism Council, WTTC, data). It is a key industry for many countries and contributes significantly to their national GDP.
Countries like Seychelles (21 percent of GDP comes from tourism), Malta (14 percent), Mauritius (11 percent) and Barbados (11 percent) are especially reliant on foreign visitors. Cape Verde, Croatia and Cambodia also depend on tourism for more than 10 percent of their GDP. “For many of these countries, if they didn’t have travel and tourism, they wouldn’t have the GDP or economies that they have,” said Rochelle Turner, director of research at WTTC (see the first article in the Related work part).
Famous touristic attractions across the world
Therefore, it is of great interest to see how tourism performance has evolved over the last decades, with the appearance of the Internet and social media. People are traveling more, and various social, economic, cultural and geographical factors influence their choice of travel destination. In this project, however, we analyze how investments in travel and tourism affect tourism performance. We consider it important to know whether these investments have been efficient over time and, for the countries where they have not, to try to find an explanation.
Our research is a quantitative analysis of countries’ tourism performance and focuses on the impact of investments made in this industry. We would like to see whether these investments have proven efficient over time and whether there is a high correlation between the investments and actual tourism performance.
More precisely, we are trying to find answers to the following questions:
The two large databases used up to this point are “Capital investment in Travel and Tourism” from the World Bank Data (GovData360.csv) and InboundTourism.csv (initially converted from xls to csv format) from the United Nations World Tourism Data. We chose these databases because they are known to be trustworthy and provide the values we need for the majority of the world’s countries, so we can see global trends. The sources are as follows:
Our first dataset comes from the World Bank, an international financial institution that provides leveraged loans to developing countries for investment projects.
Over the past 25 years, the World Bank has worked to measure financial investments made from 1995 to 2028 (predicted) in 177 countries around the world. The project was piloted by the World Travel & Tourism Council, a non-profit organization that aims to raise awareness of the importance of the Travel & Tourism sector as one of the world’s largest economic sectors.
The GovData360 dataset contains annual measures in the following categories:
Since our research focuses on government investments, we have chosen to keep this category for our data, focusing on government investments in tourism infrastructure and promotion, in nominal dollars.
Government spending indicates how much a country’s government spends on tourism per year. Note that it is not expressed as a percentage of GDP but as an absolute value in billions of US dollars at nominal prices, so that countries can later be compared. This is one of the most important variables in our research, because we focus primarily on the impact of government spending on travel and tourism (as the title of our project suggests).
In our exploratory data analysis, we benchmarked Government Spending against Capital Investment. Capital investment represents the total amount of money invested in the tourism sector by both government and private investors (hotels, restaurants, etc.), so it gives a broader measure of total investment in the country of interest. To illustrate this dataset, we take the example of the USA (below). We can see that government spending on tourism in the country has grown significantly over the past 25 years.
Located in New York, USA, Times Square is one of the world’s most visited tourist attractions, drawing an estimated 50 million visitors annually
| Country ISO3 | Country Name | Indicator Id | Indicator | Subindicator Type | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | 2024 | 2025 | 2026 | 2027 | 2028 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| USA | United States | 24658 | Government spending on travel and Tourism service | Local currency _in bn (Nominal prices) | 2.711 | 0.993 | 3.581 | 4.80 | 5.02 | 7.71 | 2.566 | 0.104 | 7.56 | 11.75 | 12.51 | 13.27 | 14.10 | 15.19 | 15.73 | 16.33 | 16.47 | 16.64 | 16.59 | 16.929 | 17.33 | 17.73 | 18.27 | 18.753 | 19.36 | 20.04 | 20.76 | 21.58 | 22.44 | 23.34 | 24.29 | 25.27 | 26.30 | 27.40 |
| USA | United States | 24659 | Government spending on travel and Tourism service | Local currency in bn (Real prices) | 4.084 | 1.470 | 5.209 | 6.90 | 7.11 | 10.68 | 3.476 | 0.138 | 9.89 | 14.96 | 15.43 | 15.88 | 16.43 | 17.36 | 17.85 | 18.31 | 18.09 | 17.95 | 17.61 | 17.651 | 17.88 | 18.06 | 18.27 | 18.428 | 18.74 | 19.07 | 19.39 | 19.76 | 20.15 | 20.55 | 20.97 | 21.40 | 21.83 | 22.30 |
| USA | United States | 24660 | Government spending on travel and Tourism service | US$ in bn (Nominal prices) | 2.711 | 0.993 | 3.581 | 4.80 | 5.02 | 7.71 | 2.566 | 0.104 | 7.56 | 11.75 | 12.51 | 13.27 | 14.10 | 15.19 | 15.73 | 16.33 | 16.47 | 16.64 | 16.59 | 16.929 | 17.33 | 17.73 | 18.27 | 18.753 | 19.36 | 20.04 | 20.76 | 21.58 | 22.44 | 23.34 | 24.29 | 25.27 | 26.30 | 27.40 |
| USA | United States | 24661 | Government spending on travel and Tourism service | US$ in bn (Real prices) | 4.084 | 1.470 | 5.209 | 6.90 | 7.11 | 10.68 | 3.476 | 0.138 | 9.89 | 14.96 | 15.43 | 15.88 | 16.43 | 17.36 | 17.85 | 18.31 | 18.09 | 17.95 | 17.61 | 17.651 | 17.88 | 18.06 | 18.27 | 18.428 | 18.74 | 19.07 | 19.39 | 19.76 | 20.15 | 20.55 | 20.97 | 21.40 | 21.83 | 22.30 |
| USA | United States | 24662 | Government spending on travel and Tourism service | % growth | 4.710 | -64.009 | 254.389 | 32.51 | 3.04 | 50.18 | -67.455 | -96.023 | 7051.54 | 51.27 | 3.15 | 2.91 | 3.51 | 5.64 | 2.84 | 2.55 | -1.20 | -0.80 | -1.89 | 0.253 | 1.29 | 1.03 | 1.17 | 0.847 | 1.67 | 1.80 | 1.67 | 1.91 | 1.96 | 2.00 | 2.05 | 2.02 | 2.00 | 2.15 |
| USA | United States | 24663 | Government spending on travel and Tourism service | % share of total tourism expenditure | 0.643 | 0.229 | 0.794 | 1.02 | 1.00 | 1.45 | 0.451 | 0.020 | 1.17 | 1.71 | 1.71 | 1.72 | 1.73 | 1.74 | 1.75 | 1.76 | 1.77 | 1.77 | 1.78 | 1.793 | 1.80 | 1.81 | 1.82 | 1.827 | 1.84 | 1.85 | 1.86 | 1.86 | 1.87 | 1.88 | 1.89 | 1.90 | 1.91 | 1.92 |
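As a quick sanity check on the table above, the `% growth` row can be compared with year-over-year changes computed from the real-price series. The sketch below uses only the first five USA values copied from the table. That the growth row is derived from real prices is our inference from the numbers, not something stated by the data provider.

```r
# Real-price government spending for the USA, 1995-1999 (US$ bn), from the table above
real <- c(`1995` = 4.084, `1996` = 1.470, `1997` = 5.209, `1998` = 6.90, `1999` = 7.11)

# "% growth" row reported in the same table for 1996-1999
reported <- c(`1996` = -64.009, `1997` = 254.389, `1998` = 32.51, `1999` = 3.04)

# Year-over-year growth in percent: (x_t / x_{t-1} - 1) * 100
computed <- (real[-1] / real[-length(real)] - 1) * 100

# Differences stay small; they come from rounding in the published values
round(computed - reported, 3)
```

The same comparison against the nominal series gives noticeably larger gaps, which is why we read the growth rates as real-price growth.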
Source: http://data.un.org/DocumentData.aspx?id=438
This dataset comes from the World Tourism Organisation, the United Nations agency that promotes tourism development on a global scale. This dataset contains factual measures of passenger and financial flows generated by tourists over the past 25 years from 1995 to 2019.
The dataset contains observations by country divided into two groups, which are further divided into different categories:
Total Arrivals measures the number of people entering the country of interest by any means of transport (car, bus, airplane, boat, etc.). It is our first KPI for measuring tourism performance. The variable is reported in thousands, per country and per year, from 1995 to 2019.
The Total Arrivals group is divided into tourists arriving only for the day, those stopping over on a cruise, and those staying overnight in the country visited.
For consistency, we use the variable Total Arrivals of visitors staying overnight in the rest of this report. This excludes day-trippers and cruise stopovers, which correspond more to indirect tourism than to the primary purpose of the trip.
Indeed, when you travel to a country (especially a distant one), you usually stay there at least one night. Excursions without a hotel stay correspond, for example, to a one-day visit to Bosnia and Herzegovina from Dubrovnik: these tourists chose to visit Croatia first, and that choice is what we want to measure.
The second part of the dataset is devoted to the tourist income per country per year, and is divided into several categories:
To be consistent with our choice to only take Total Arrivals of tourists who stay at least one night, we have chosen to focus here on Total Tourism Expenditures for the countries (related to passenger transport AND travel).
In addition, we decided to measure Tourism Expenditures in nominal rather than real terms to facilitate comparison between countries that have not all had the same GDP and inflation over the past 25 years.
Here is an overview of these two categories in our dataset, taking for example the beautiful island of St Vincent and the Grenadines in the Caribbean. You will see that some data are missing for some countries but that overall countries have managed to provide meaningful data under the leadership of the United Nations.
St-Vincent and the Grenadines, whose Tourism metrics are presented below
| Country | Arrivals | Total | Overnights | Of | Units | Notes | Series | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 | Unnamed: 33 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SAINT VINCENT AND THE GRENADINES | Arrivals | Total arrivals | Overnights | of which, nationals residing abroad | Thousands | 1/ | VF | 218 | 216 | 200 | 202 | 223 | 256 | 254 | 247 | 242 | 262 | 256 | 306 | 328 | 250 | 271 | 231 | 208 | 200 | 200 | 205 | 207 | 227 | 303 | 356 | .. | NA |
| SAINT VINCENT AND THE GRENADINES | Arrivals | Total arrivals | Overnights visitors (tourists) | of which, nationals residing abroad | Thousands | 1/ | TF | 60 | 58 | 65 | 67 | 68 | 73 | 71 | 78 | 79 | 87 | 96 | 97 | 90 | 84 | 75 | 72 | 74 | 74 | 72 | 71 | 75 | 79 | 76 | 80 | .. | NA |
| SAINT VINCENT AND THE GRENADINES | Arrivals | Total arrivals | Same-day visitors (excursionists) | of which, nationals residing abroad | Thousands | 1/ | TF | 158 | 158 | 135 | 135 | 155 | 183 | 183 | 170 | 163 | 175 | 161 | 209 | 238 | 166 | 196 | 159 | 134 | 126 | 128 | 134 | 131 | 148 | 227 | 276 | .. | NA |
| SAINT VINCENT AND THE GRENADINES | Arrivals | Total arrivals | of which, cruise passengers | of which, cruise passengers | Thousands | 2/ | TF | 127 | 128 | 107 | 114 | 137 | 162 | 168 | 157 | 149 | 162 | 152 | 200 | 231 | 160 | 190 | 153 | 130 | 122 | 126 | 132 | 130 | 147 | 226 | 275 | .. | NA |
| SAINT VINCENT AND THE GRENADINES | Arrivals by region | Total arrivals | of which, cruise passengers | of which, cruise passengers | Thousands | 2/ | TF | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
| SAINT VINCENT AND THE GRENADINES | Tourism expenditure in the country | Hotels and similar establishments | Overnights | of which, nationals residing abroad | US$ Millions | 1/ | IMF | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | 178.5 | 211.1 | 221.3 | 216.4 | 241 | .. | NA |
| SAINT VINCENT AND THE GRENADINES | Tourism expenditure in the country | Travel | Overnights | of which, nationals residing abroad | US$ Millions | 1/ | IMF | 53 | 64 | 71 | 73 | 85 | 82 | 89 | 91 | 91 | 96 | 104 | 113 | 110 | 96 | 88 | 86 | 92 | 94 | 92 | 175 | 207 | 216 | 211 | 235 | .. | NA |
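The breakdown of arrivals described earlier can be checked directly on the table above. The sketch below copies the 1995-1999 values for Saint Vincent and the Grenadines and reads the first row (series VF) as total arrivals, which is our interpretation of the row labels. In these figures, cruise passengers appear to be counted within same-day visitors rather than as a separate third group.

```r
# Arrivals for Saint Vincent and the Grenadines, 1995-1999 (thousands), from the table above
total     <- c(218, 216, 200, 202, 223)  # total arrivals (series VF)
overnight <- c(60, 58, 65, 67, 68)       # overnight visitors (tourists)
same_day  <- c(158, 158, 135, 135, 155)  # same-day visitors (excursionists)
cruise    <- c(127, 128, 107, 114, 137)  # of which, cruise passengers

# Overnight and same-day visitors add up to the total
all(overnight + same_day == total)

# Cruise passengers never exceed same-day visitors
all(cruise <= same_day)

# Share of arrivals that stay overnight, in percent
overnight_share <- round(100 * overnight / total, 1)
overnight_share
```

For this island, fewer than a third of arrivals stay overnight in these years, which illustrates how much the choice of KPI can change the picture for cruise destinations.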
These databases provide country-specific investment data (for more than 177 countries) on infrastructure, hospitality and tourism promotion from 1995 to 2019, together with tourism performance indicators (annual number of tourists, total expenditure) for the same countries. They contain a large number of variables: government spending, for example, is available in US$ and in local currency, at real and nominal prices, and arrivals are broken down by region, means of transport, etc. Our goal, however, is not to use everything but to work with totals and focus on the most relevant variables.
To make these large databases easier to work with, we rebuilt the datasets and centralized the data.
We started by merging and reorganizing the columns of the two data sources individually.
#Load dplyr for %>%, select(), relocate() and rename()
library(dplyr)
#We start by merging the csv category and sub-category columns into one string: Indicator
InboundTourism$Indicator <- paste(InboundTourism$Arrivals, " ", InboundTourism$Total, " ", InboundTourism$Overnights, " ", InboundTourism$Of, " ")
#We then rename the one country whose name differs between the two datasets: the USA
InboundTourism$Country[InboundTourism$Country == "UNITED STATES OF AMERICA"] <- "USA"
#Lastly, we remove unused columns and re-order the rest so that the indicator comes right after the country
FinalInput <- InboundTourism %>% select(-c(2,3,4,5,6,7,8,34)) %>% relocate(27, .before = 2)
#For GovData360, we start by removing Country ISO3 and Indicator Id, which are not useful for our analysis
Output <- GovData360 %>% select(-c(1,3)) %>% rename(Country = 1)
#We upper-case the country names to match InboundTourism, then rename the USA
Output$Country <- toupper(Output$Country)
Output$Country[Output$Country == "UNITED STATES"] <- "USA"
#As in the other dataset, we merge category and sub-category names into one string per observation
Output$Indicator <- paste(Output$Indicator, " ", Output$`Subindicator Type`)
#Finally, we remove Subindicator Type (column 3) and the years 2020-2028 (columns 29 to 37)
FinalOutput <- Output %>% select(-c(3,29:37))
#We then ensure that the column names from InboundTourism are the same as in GovData360
colnames(FinalInput) <- colnames(FinalOutput)
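A side effect of the `paste()` calls above is worth spelling out, because it explains the runs of dots in the cleaned column names shown later. The sketch below is only an illustration on one indicator label taken from our data. It shows that the explicit `" "` arguments combine with the default separator, and how `make.names()` (which `data.frame()` applies to column names by default) rewrites the result.

```r
# paste() already inserts sep = " " between arguments, so an explicit " "
# argument produces three consecutive spaces between the two parts
indicator <- paste("Business Tourism Spending", " ", "% share")
indicator

# Every character that is invalid in a syntactic name is replaced by "."
clean_name <- make.names(indicator)
clean_name

# Variant with single spaces (not what the pipeline above uses)
single_space <- paste("Business Tourism Spending", "% share")
single_space
```

We kept the original separators so that the indicator names stay identical throughout the project, but the single-space variant would give shorter names.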
After preparing the Inbound Tourism and Government Spending dataframes individually, we merged the two sources, dropping the countries that were not present in both sets:
#We start by finding the countries present in BOTH datasets
x = intersect(FinalInput$Country, FinalOutput$Country)
#We then create a new dataframe, FinalCombined, by row-binding the observations of the matched countries
FinalCombined<- rbind(FinalInput[FinalInput$Country %in% x,],FinalOutput[FinalOutput$Country %in% x,])
#We sort the rows by country so that the observations from both sources sit together
FinalCombined <- FinalCombined[order(FinalCombined$Country),]
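Because `intersect()` drops unmatched countries silently, it can be useful to list what was lost on each side, for instance to spot names that differ only in spelling. The sketch below is a minimal illustration on toy vectors standing in for `FinalInput$Country` and `FinalOutput$Country`. The two "TOYLAND" entries are made up.

```r
# Toy country vectors; "TOYLAND A" and "TOYLAND B" are invented placeholders
input_countries  <- c("ALBANIA", "ALGERIA", "USA", "TOYLAND A")
output_countries <- c("ALBANIA", "ALGERIA", "USA", "TOYLAND B")

# Countries kept by the merge
kept <- intersect(input_countries, output_countries)

# Countries dropped from each source because they have no match in the other
dropped_from_input  <- setdiff(input_countries, output_countries)
dropped_from_output <- setdiff(output_countries, input_countries)

kept
dropped_from_input
dropped_from_output
```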
At this stage, we had one big file containing our two data sources, but it was not in the format we wanted. The dataframe looked like this:
| Country | Indicator | 1995 | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 | 2017 | 2018 | 2019 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALBANIA | Tourism expenditure in the country Hotels and similar establishments Overnights of which, nationals residing abroad | 70 | 94 | 34 | 60 | 218 | 398 | 451 | 492 | 537 | 756 | 880 | 1057 | 1479 | 1850 | 2013 | 1778 | 1833 | 1623 | 1670 | 1849 | 1613 | 1821 | 2050 | 2306 | 2458 |
| ALBANIA | Tourism expenditure in the country Travel Overnights of which, nationals residing abroad | 65 | 77 | 27 | 54 | 211 | 389 | 446 | 487 | 522 | 735 | 854 | 1012 | 1378 | 1715 | 1828 | 1611 | 1632 | 1463 | 1473 | 1700 | 1499 | 1693 | 1943 | 2186 | 2329 |
What we wanted was one row per observation, so that we could then group each variable by country or by year. We had to reshape our data like this:
#gather() and pivot_wider() come from tidyr
library(tidyr)
#First stack the year columns into (year, value) pairs, then spread the indicators into columns
nw2 <- FinalCombined %>%
gather(year, value, -c(Country,Indicator)) %>%
pivot_wider(names_from = Indicator, values_from = value,values_fn = list) %>%
select(-c(3,8))
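To make the two reshaping steps easier to follow, here is a small sketch on the two ALBANIA rows shown earlier, restricted to three years. It uses `pivot_longer()`, the tidyr function that supersedes `gather()`, and shortened indicator labels chosen only for readability. Since each country, year and indicator combination is unique here, `values_fn = list` is not needed in this toy case.

```r
library(tidyr)

# Two ALBANIA rows from the table above, first three years only;
# the indicator labels are shortened for this illustration
wide_by_year <- data.frame(
  Country   = c("ALBANIA", "ALBANIA"),
  Indicator = c("Expenditure Hotels", "Expenditure Travel"),
  `1995` = c(70, 65),
  `1996` = c(94, 77),
  `1997` = c(34, 27),
  check.names = FALSE
)

# Step 1: one row per country, indicator and year
long <- pivot_longer(wide_by_year, cols = -c(Country, Indicator),
                     names_to = "year", values_to = "value")

# Step 2: one row per country and year, one column per indicator
wide_by_indicator <- pivot_wider(long, names_from = Indicator, values_from = value)
wide_by_indicator
```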
Now our data looked like this:
| Country | year | Arrivals Total arrivals Overnights of which, nationals residing abroad | Arrivals Total arrivals Overnights visitors (tourists) of which, nationals residing abroad | Arrivals Total arrivals Same-day visitors (excursionists) of which, nationals residing abroad | Arrivals Total arrivals of which, cruise passengers of which, cruise passengers | Tourism expenditure in the country Hotels and similar establishments Overnights of which, nationals residing abroad | Tourism expenditure in the country Travel Overnights of which, nationals residing abroad | Business Tourism Spending Local currency in bn (Real prices) | Business Tourism Spending Local currency _in bn (Nominal prices) | Business Tourism Spending % share | Business Tourism Spending US$ in bn (Nominal prices) | Business Tourism Spending US$ in bn (Real prices) | Business Tourism Spending % growth | Travel and Tourism direct contribution to employment Thousands of jobs | Travel and Tourism direct contribution to employment % share of total employment | Travel and Tourism direct contribution to employment % growth | Travel and Tourism direct contribution to GDP Local currency _in bn (Nominal prices) | Travel and Tourism direct contribution to GDP Local currency in bn (Real prices) | Travel and Tourism direct contribution to GDP Percentage share of total GDP | Travel and Tourism direct contribution to GDP US$ in bn (Nominal prices) | Travel and Tourism direct contribution to GDP US$ in bn (Real prices) | Travel and Tourism direct contribution to GDP % growth | Domestic Tourism Spending Local currency _in bn (Nominal prices) | Domestic Tourism Spending Local currency in bn (Real prices) | Domestic Tourism Spending % share | Domestic Tourism Spending US$ in bn (Nominal prices) | Domestic Tourism Spending US$ in bn (Real prices) | Domestic Tourism Spending % growth | Government spending on travel and Tourism service Local currency _in bn (Nominal prices) | Government spending on travel and Tourism service Local currency in bn (Real prices) | Government spending on travel and Tourism service US$ in bn (Nominal prices) | Government spending on travel and Tourism service US$ in bn (Real prices) | Government spending on travel and Tourism service % growth | Government spending on travel and Tourism service % share of total tourism expenditure | Internal Travel and Tourism consumption Local currency _in bn (Nominal prices) | Internal Travel and Tourism consumption Local currency in bn (Real prices) | Internal Travel and Tourism consumption % share | Internal Travel and Tourism consumption US$ in bn (Nominal prices) | Internal Travel and Tourism consumption US$ in bn (Real prices) | Internal Travel and Tourism consumption % growth | Capital investment in Travel and Tourism Local currency _in bn (Nominal prices) | Capital investment in Travel and Tourism Local currency in bn (Real prices) | Capital investment in Travel and Tourism % exports | Capital investment in Travel and Tourism US$ in bn (Nominal prices) | Capital investment in Travel and Tourism US$ in bn (Real prices) | Capital investment in Travel and Tourism % growth | Leisure Tourism Spending Local currency _in bn (Nominal prices) | Leisure Tourism Spending Local currency in bn (Real prices) | Leisure Tourism Spending % share | Leisure Tourism Spending US$ in bn (Nominal prices) | Leisure Tourism Spending US$ in bn (Real prices) | Leisure Tourism Spending % growth | Outbound Travel & Tourism Expenditure Local currency _in bn (Nominal prices) | Outbound Travel & Tourism Expenditure Local currency in bn (Real prices) | Outbound Travel & Tourism Expenditure US$ in bn (Nominal prices) | Outbound Travel & Tourism Expenditure US$ in bn (Real prices) | Outbound Travel & Tourism Expenditure Percentage share of total GDP | Outbound Travel & Tourism Expenditure % growth | Travel and Tourism total contribution to employment % share of total employment | Travel and Tourism total contribution to employment Thousands of jobs | Travel and Tourism total contribution to employment % growth | Travel and Tourism total contribution to GDP Local currency _in bn (Nominal prices) | Travel and Tourism total contribution to GDP Local currency in bn (Real prices) | Travel and Tourism total contribution to GDP Percentage share of total GDP | Travel and Tourism total contribution to GDP US$ in bn (Nominal prices) | Travel and Tourism total contribution to GDP US$ in bn (Real prices) | Travel and Tourism total contribution to GDP % growth | Visitor Exports (Foreign spending) Local currency _in bn (Nominal prices) | Visitor Exports (Foreign spending) Local currency in bn (Real prices) | Visitor Exports (Foreign spending) % exports | Visitor Exports (Foreign spending) US$ in bn (Nominal prices) | Visitor Exports (Foreign spending) US$ in bn (Real prices) | Visitor Exports (Foreign spending) % growth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALBANIA | 1995 | 304 | .. | .. | .. | 70 | 65 | 47.6551 | 17.6904 | 3.028 | 0.190835 | 0.399456 | 0.09 | 34.6444 | 3.03861 | -7.06613 | 8.11397 | 21.8578 | 3.55915 | 0.09 | 0.183217 | 7.29443 | 14.2809 | 38.4704 | 6.19741 | 0.154055 | 0.322468 | 3.59493 | 0.1523 | 0.410273 | 0 | 0 | -0.614172 | 3.85652 | 20.7934 | 56.0143 | 6.91556 | 0.224309 | 0.469525 | 3.6836 | 1.41831 | 3.82071 | 2.96254 | 0.02 | 0.03 | 50.417 | 3.10307 | 8.35919 | 0.505073 | 0.03 | 0.07 | 20.4834 | 1.65644 | 4.46219 | 0.02 | 0.04 | 0.726588 | 21.5374 | 9.86143 | 112.434 | -5.55361 | 25.5505 | 68.829 | 11.2076 | 0.275626 | 0.576941 | 9.14668 | 6.51258 | 17.5439 | 26.7791 | 0.07 | 0.147057 | 3.98182 |
| ALGERIA | 1995 | 520 | .. | .. | .. | .. | 32 | 76.7946 | 17.9741 | 0.578689 | 0.377111 | 0.684838 | 2.87845 | 110.114 | 1.96982 | 5.71007 | 46.2418 | 197.569 | 2.30632 | 0.970189 | 1.76187 | 3.82915 | 65.3535 | 279.224 | 3.23823 | 1.37117 | 2.49006 | 3.68055 | 0.42711 | 1.82483 | 0.01 | 0.02 | 5.17672 | 0.974113 | 71.6346 | 306.059 | 2.733 | 1.50295 | 2.72937 | 4.45416 | 9.62787 | 41.1352 | 1.77693 | 0.202 | 0.366835 | 97.6349 | 53.6605 | 229.265 | 1.71388 | 1.12584 | 2.04454 | 5.04138 | 6.90654 | 29.5082 | 0.144904 | 0.263148 | 0.344466 | 37.5466 | 3.95196 | 220.917 | 9.94066 | 90.6793 | 387.428 | 4.52266 | 1.90252 | 3.455 | 9.8067 | 6.28104 | 26.8358 | 1.17843 | 0.131781 | 0.239316 | 13.1267 |
| ANGOLA | 1995 | .. | 9 | .. | .. | 27 | 10 | 37.7184 | 0 | 0.361265 | 0.03 | 0.228458 | 117.511 | 65.2338 | 1.33935 | 5.56726 | 0 | 40.4769 | 0.710171 | 0.03 | 0.245166 | 161.821 | 0 | 42.9786 | 0.754055 | 0.04 | 0.260319 | 155.711 | 0 | 0 | 0 | 0 | -95.2558 | 1.11433 | 0 | 74.1464 | 1.30081 | 0.06 | 0.4491 | 161.945 | 0 | 47.8929 | 0.08 | 0.04 | 0.290084 | -32.2549 | 0 | 36.428 | 0.3489 | 0.03 | 0.220642 | 122.249 | 0 | 63.1529 | 0.05 | 0.382513 | 1.10802 | 163.786 | 3.02448 | 147.309 | -52.4515 | 0 | 99.3388 | 1.74291 | 0.09 | 0.601688 | 7.69452 | 0 | 31.1678 | 7470.96 | 0.03 | 0.188781 | 171.331 |
| ANTIGUA AND BARBUDA | 1995 | 447 | 220 | 227 | 227 | .. | 247 | 0.05 | 0.04 | 1.03732 | 0.01 | 0.02 | 5.17285 | 7.29162 | 27.5571 | -9.27218 | 0.37362 | 0.536657 | 23.9706 | 0.138378 | 0.198762 | -13.5205 | 0.05 | 0.07 | 2.42384 | 0.02 | 0.03 | 1.50687 | 0.01 | 0.02 | 0 | 0.01 | -1.78796 | 31.5751 | 0.855238 | 1.22844 | 31.1703 | 0.316755 | 0.454977 | -14.8444 | 0.15363 | 0.220669 | 40.5677 | 0.06 | 0.08 | 132.489 | 0.818228 | 1.17528 | 22.6136 | 0.303047 | 0.435288 | -16.5609 | 0.07 | 0.09 | 0.02 | 0.03 | 4.1793 | 7.12719 | 81.1592 | 21.4748 | -8.17361 | 1.15334 | 1.65663 | 73.9959 | 0.427164 | 0.613565 | -11.858 | 0.806051 | 1.15779 | 74.3356 | 0.298537 | 0.42881 | -15.6401 |
| ARGENTINA | 1995 | .. | 2289 | .. | .. | 2550 | 2222 | 66.769 | 3.08315 | 0.585484 | 3.08315 | 4.02189 | -0.710446 | 391.844 | 3.22725 | 1.25432 | 10.3034 | 223.132 | 3.57142 | 10.3034 | 13.4405 | 1.45818 | 16.2571 | 352.065 | 5.59084 | 16.2571 | 21.2069 | -0.846802 | 0.127681 | 2.76507 | 0.127681 | 0.166556 | -4.80203 | 2.5391 | 18.8071 | 407.287 | 5.99327 | 18.8071 | 24.5333 | 0.474417 | 1.56245 | 33.8365 | 3.70031 | 1.56245 | 2.03817 | 32.4357 | 15.7239 | 340.519 | 2.96169 | 15.7239 | 20.5114 | -0.138947 | 3.87053 | 83.8206 | 3.87053 | 5.049 | 1.34162 | -5.63925 | 8.38983 | 1018.67 | 3.35049 | 26.4222 | 572.202 | 9.15859 | 26.4222 | 34.4671 | 3.60544 | 2.55 | 55.223 | 10.1539 | 2.55 | 3.3264 | 10.0559 |
All values were converted to numeric and the country names were added back. We thus ended up with a new reshaped data frame, called nw4, which was stored as Rda and CSV files in our /data folder for further plotting and analysis.
#Convert every column to numeric; non-numeric entries such as ".." become NA
nw4 = data.frame(sapply(nw2, function(x) as.numeric(as.character(x))))
#The conversion also turns the country names into NA, so we restore them from nw2
nw4$Country = nw2$Country
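The effect of this conversion on missing values, and the storage step, can be sketched on a toy slice. The two arrivals values below come from the 1995 rows of the table above. The object name `toy`, the file names and the use of a temporary folder instead of /data are assumptions made for this illustration, and the variant converts only the non-country columns rather than restoring Country afterwards.

```r
# Toy slice mimicking nw2: values are stored as text and ".." marks missing data
toy <- data.frame(
  Country  = c("ALBANIA", "ANGOLA"),
  year     = c("1995", "1995"),
  arrivals = c("304", ".."),
  stringsAsFactors = FALSE
)

# Convert every column except Country; ".." cannot be parsed and becomes NA
# (suppressWarnings hides the expected "NAs introduced by coercion" warning)
value_cols <- setdiff(names(toy), "Country")
toy[value_cols] <- lapply(toy[value_cols], function(x) suppressWarnings(as.numeric(x)))

# Store the result in both formats (a temporary folder stands in for /data)
out_dir <- tempdir()
save(toy, file = file.path(out_dir, "toy.Rda"))
write.csv(toy, file.path(out_dir, "toy.csv"), row.names = FALSE)

toy
```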
Our final cleaned dataframe looked like this:
| Country | year | Arrivals...Total.arrivals...Overnights...of.which..nationals.residing.abroad.. | Arrivals...Total.arrivals...Overnights.visitors..tourists....of.which..nationals.residing.abroad.. | Arrivals...Total.arrivals...Same.day.visitors..excursionists....of.which..nationals.residing.abroad.. | Arrivals...Total.arrivals...of.which..cruise.passengers...of.which..cruise.passengers.. | Tourism.expenditure.in.the.country...Hotels.and.similar.establishments...Overnights...of.which..nationals.residing.abroad.. | Tourism.expenditure.in.the.country...Travel...Overnights...of.which..nationals.residing.abroad.. | Business.Tourism.Spending...Local.currency.in.bn..Real.prices. | Business.Tourism.Spending...Local.currency.._in.bn..Nominal.prices. | Business.Tourism.Spending.....share | Business.Tourism.Spending...US..in.bn..Nominal.prices. | Business.Tourism.Spending...US..in.bn..Real.prices. | Business.Tourism.Spending.....growth | Travel.and.Tourism.direct.contribution.to.employment...Thousands.of.jobs | Travel.and.Tourism.direct.contribution.to.employment.....share.of.total.employment | Travel.and.Tourism.direct.contribution.to.employment.....growth | Travel.and.Tourism.direct.contribution.to.GDP...Local.currency.._in.bn..Nominal.prices. | Travel.and.Tourism.direct.contribution.to.GDP...Local.currency.in.bn..Real.prices. | Travel.and.Tourism.direct.contribution.to.GDP...Percentage.share.of.total.GDP | Travel.and.Tourism.direct.contribution.to.GDP...US..in.bn..Nominal.prices. | Travel.and.Tourism.direct.contribution.to.GDP...US..in.bn..Real.prices. | Travel.and.Tourism.direct.contribution.to.GDP.....growth | Domestic.Tourism.Spending...Local.currency.._in.bn..Nominal.prices. | Domestic.Tourism.Spending...Local.currency.in.bn..Real.prices. | Domestic.Tourism.Spending.....share | Domestic.Tourism.Spending...US..in.bn..Nominal.prices. | Domestic.Tourism.Spending...US..in.bn..Real.prices. | Domestic.Tourism.Spending.....growth | Government.spending.on.travel.and.Tourism.service...Local.currency.._in.bn..Nominal.prices. | Government.spending.on.travel.and.Tourism.service...Local.currency.in.bn..Real.prices. | Government.spending.on.travel.and.Tourism.service...US..in.bn..Nominal.prices. | Government.spending.on.travel.and.Tourism.service...US..in.bn..Real.prices. | Government.spending.on.travel.and.Tourism.service.....growth | Government.spending.on.travel.and.Tourism.service.....share.of.total.tourism.expenditure | Internal.Travel.and.Tourism.consumption...Local.currency.._in.bn..Nominal.prices. | Internal.Travel.and.Tourism.consumption...Local.currency.in.bn..Real.prices. | Internal.Travel.and.Tourism.consumption.....share | Internal.Travel.and.Tourism.consumption...US..in.bn..Nominal.prices. | Internal.Travel.and.Tourism.consumption...US..in.bn..Real.prices. | Internal.Travel.and.Tourism.consumption.....growth | Capital.investment.in.Travel.and.Tourism...Local.currency.._in.bn..Nominal.prices. | Capital.investment.in.Travel.and.Tourism...Local.currency.in.bn..Real.prices. | Capital.investment.in.Travel.and.Tourism.....exports | Capital.investment.in.Travel.and.Tourism...US..in.bn..Nominal.prices. | Capital.investment.in.Travel.and.Tourism...US..in.bn..Real.prices. | Capital.investment.in.Travel.and.Tourism.....growth | Leisure.Tourism.Spending...Local.currency.._in.bn..Nominal.prices. | Leisure.Tourism.Spending...Local.currency.in.bn..Real.prices. | Leisure.Tourism.Spending.....share | Leisure.Tourism.Spending...US..in.bn..Nominal.prices. | Leisure.Tourism.Spending...US..in.bn..Real.prices. | Leisure.Tourism.Spending.....growth | Outbound.Travel...Tourism.Expenditure...Local.currency.._in.bn..Nominal.prices. | Outbound.Travel...Tourism.Expenditure...Local.currency.in.bn..Real.prices. | Outbound.Travel...Tourism.Expenditure...US..in.bn..Nominal.prices. | Outbound.Travel...Tourism.Expenditure...US..in.bn..Real.prices. | Outbound.Travel...Tourism.Expenditure...Percentage.share.of.total.GDP | Outbound.Travel...Tourism.Expenditure.....growth | Travel.and.Tourism.total.contribution.to.employment.....share.of.total.employment | Travel.and.Tourism.total.contribution.to.employment...Thousands.of.jobs | Travel.and.Tourism.total.contribution.to.employment.....growth | Travel.and.Tourism.total.contribution.to.GDP...Local.currency.._in.bn..Nominal.prices. | Travel.and.Tourism.total.contribution.to.GDP...Local.currency.in.bn..Real.prices. | Travel.and.Tourism.total.contribution.to.GDP...Percentage.share.of.total.GDP | Travel.and.Tourism.total.contribution.to.GDP...US..in.bn..Nominal.prices. | Travel.and.Tourism.total.contribution.to.GDP...US..in.bn..Real.prices. | Travel.and.Tourism.total.contribution.to.GDP.....growth | Visitor.Exports..Foreign.spending....Local.currency.._in.bn..Nominal.prices. | Visitor.Exports..Foreign.spending....Local.currency.in.bn..Real.prices. | Visitor.Exports..Foreign.spending......exports | Visitor.Exports..Foreign.spending....US..in.bn..Nominal.prices. | Visitor.Exports..Foreign.spending....US..in.bn..Real.prices. | Visitor.Exports..Foreign.spending......growth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALBANIA | 1995 | 304 | NA | NA | NA | 70 | 65 | 47.66 | 17.69 | 3.028 | 0.191 | 0.399 | 0.09 | 34.64 | 3.04 | -7.07 | 8.114 | 21.858 | 3.56 | 0.090 | 0.183 | 7.29 | 14.28 | 38.47 | 6.197 | 0.154 | 0.322 | 3.595 | 0.152 | 0.41 | 0.000 | 0.000 | -0.614 | 3.857 | 20.793 | 56.01 | 6.92 | 0.224 | 0.470 | 3.684 | 1.418 | 3.821 | 2.96 | 0.020 | 0.030 | 50.4 | 3.103 | 8.36 | 0.505 | 0.030 | 0.070 | 20.483 | 1.66 | 4.46 | 0.020 | 0.040 | 0.727 | 21.54 | 9.86 | 112.4 | -5.55 | 25.55 | 68.83 | 11.21 | 0.276 | 0.577 | 9.15 | 6.513 | 17.54 | 26.78 | 0.070 | 0.147 | 3.98 |
| ALGERIA | 1995 | 520 | NA | NA | NA | NA | 32 | 76.80 | 17.97 | 0.579 | 0.377 | 0.685 | 2.88 | 110.11 | 1.97 | 5.71 | 46.242 | 197.569 | 2.31 | 0.970 | 1.762 | 3.83 | 65.35 | 279.22 | 3.238 | 1.371 | 2.490 | 3.681 | 0.427 | 1.82 | 0.010 | 0.020 | 5.177 | 0.974 | 71.635 | 306.06 | 2.73 | 1.503 | 2.729 | 4.454 | 9.628 | 41.135 | 1.78 | 0.202 | 0.367 | 97.6 | 53.660 | 229.26 | 1.714 | 1.126 | 2.045 | 5.041 | 6.91 | 29.51 | 0.145 | 0.263 | 0.344 | 37.55 | 3.95 | 220.9 | 9.94 | 90.68 | 387.43 | 4.52 | 1.903 | 3.455 | 9.81 | 6.281 | 26.84 | 1.18 | 0.132 | 0.239 | 13.13 |
| ANGOLA | 1995 | NA | 9 | NA | NA | 27 | 10 | 37.72 | 0.00 | 0.361 | 0.030 | 0.228 | 117.51 | 65.23 | 1.34 | 5.57 | 0.000 | 40.477 | 0.71 | 0.030 | 0.245 | 161.82 | 0.00 | 42.98 | 0.754 | 0.040 | 0.260 | 155.711 | 0.000 | 0.00 | 0.000 | 0.000 | -95.256 | 1.114 | 0.000 | 74.15 | 1.30 | 0.060 | 0.449 | 161.945 | 0.000 | 47.893 | 0.08 | 0.040 | 0.290 | -32.3 | 0.000 | 36.43 | 0.349 | 0.030 | 0.221 | 122.249 | 0.00 | 63.15 | 0.050 | 0.383 | 1.108 | 163.79 | 3.02 | 147.3 | -52.45 | 0.00 | 99.34 | 1.74 | 0.090 | 0.602 | 7.70 | 0.000 | 31.17 | 7470.96 | 0.030 | 0.189 | 171.33 |
| ANTIGUA AND BARBUDA | 1995 | 447 | 220 | 227 | 227 | NA | 247 | 0.05 | 0.04 | 1.037 | 0.010 | 0.020 | 5.17 | 7.29 | 27.56 | -9.27 | 0.374 | 0.537 | 23.97 | 0.138 | 0.199 | -13.52 | 0.05 | 0.07 | 2.424 | 0.020 | 0.030 | 1.507 | 0.010 | 0.02 | 0.000 | 0.010 | -1.788 | 31.575 | 0.855 | 1.23 | 31.17 | 0.317 | 0.455 | -14.844 | 0.154 | 0.221 | 40.57 | 0.060 | 0.080 | 132.5 | 0.818 | 1.18 | 22.614 | 0.303 | 0.435 | -16.561 | 0.07 | 0.09 | 0.020 | 0.030 | 4.179 | 7.13 | 81.16 | 21.5 | -8.17 | 1.15 | 1.66 | 74.00 | 0.427 | 0.614 | -11.86 | 0.806 | 1.16 | 74.34 | 0.299 | 0.429 | -15.64 |
| ARGENTINA | 1995 | NA | 2289 | NA | NA | 2550 | 2222 | 66.77 | 3.08 | 0.585 | 3.083 | 4.022 | -0.71 | 391.84 | 3.23 | 1.25 | 10.303 | 223.132 | 3.57 | 10.303 | 13.441 | 1.46 | 16.26 | 352.06 | 5.591 | 16.257 | 21.207 | -0.847 | 0.128 | 2.77 | 0.128 | 0.167 | -4.802 | 2.539 | 18.807 | 407.29 | 5.99 | 18.807 | 24.533 | 0.474 | 1.562 | 33.837 | 3.70 | 1.562 | 2.038 | 32.4 | 15.724 | 340.52 | 2.962 | 15.724 | 20.511 | -0.139 | 3.87 | 83.82 | 3.871 | 5.049 | 1.342 | -5.64 | 8.39 | 1018.7 | 3.35 | 26.42 | 572.20 | 9.16 | 26.422 | 34.467 | 3.60 | 2.550 | 55.22 | 10.15 | 2.550 | 3.326 | 10.06 |
To make our regression analysis more comparable across countries, we added five new data sources to our dataset, each contributing new variables (columns) for every observation (row). The datasets come from the following sources:
We decided to incorporate demographic data, since a country’s social and economic background can strongly influence how many people wish to visit it and, ultimately, its tourism performance. Specifically, the variables used later in our regressions are:
In the datasets mentioned above we found three other variables: the Human Development Index, GDP per capita (measured in constant international $, adjusted for price differences between countries and for inflation to allow comparisons between countries and over time) and the total population of the country. However, we decided not to include these variables in our research, because:
To add these variables, we first had to import and clean each dataset separately.
We then merged the new variables from the cleaned files into our main dataset nw4 using the left_join function from the dplyr package, matching on country and year.
The exception is the Democracy dataset, which has no yearly observations but only one value per country (the age of each democracy in years at the end of 2015), so every year of a given country receives the same value. This is not ideal, but we needed this variable to compare countries, as you will see in our explanatory analysis.
# Adding Gini index to nw4
nw4 <-
  left_join(nw4,
            econ %>% dplyr::select(`Gini index`, Country, year),
            by = c("Country", "year"))
# Adding Democracy Age to nw4 (one value per country, so matched on Country only)
nw4 <-
  left_join(nw4,
            demo %>% dplyr::select(`Democracy Age`, Country),
            by = "Country")
# Adding Human statistics to nw4
nw4 <-
  left_join(nw4,
            human %>% dplyr::select(`Human Development Index (UNDP)`, `GDP per Capita`, `Population, total`, Country, year),
            by = c("Country", "year"))
# Adding Life statistics to nw4
nw4 <-
  left_join(nw4,
            life %>% dplyr::select(`Life expectancy`, `Liberal democracy index`, Country, year),
            by = c("Country", "year"))
# Adding Trade statistics to nw4
nw4 <-
  left_join(nw4,
            trade %>% dplyr::select(`Ratio of exports and imports to GDP (%) (PWT 9.1 (2019))`, Country, year),
            by = c("Country", "year"))
At the end of the cleaning and matching process, our completed dataset looked like this. You can use the search and arrow controls to navigate through each country’s observations.
Most countries have at least some missing values, so dropping every incomplete row would remove the larger part of the database. This is why we decided to keep them at this stage.
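The trade-off described above can be checked before deciding whether to drop incomplete rows. The following is a minimal sketch in base R on a toy data frame; the column names and values are illustrative assumptions, not taken from nw4.

```r
# Toy stand-in for nw4; column names and values are illustrative only.
toy <- data.frame(
  Country  = c("A", "A", "B", "B", "C"),
  year     = c(1995, 1996, 1995, 1996, 1995),
  arrivals = c(304, NA, 520, 530, NA),
  gini     = c(NA, 29.1, NA, NA, 35.2)
)

# Share of missing values in each column.
na_share <- colMeans(is.na(toy))

# Number of rows without any missing value, i.e. what would
# survive a blanket na.omit().
complete_rows <- sum(complete.cases(toy))

print(round(na_share, 2))
print(complete_rows)
```

In this toy example every row has at least one missing value, so removing incomplete rows would leave nothing to analyze, which is the situation the paragraph above describes on a larger scale.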
Our final dataset has a large number of variables (74 columns), so we identified the most important ones, which will be used from now on:
Our graphs are two-dimensional, and the columns we consider most meaningful to plot are:
After plotting the raw data in several graphs (evolution of total arrivals, evolution of tourism expenditure, government spending, tourism performance), we selected 23 countries of interest to make the graphs more readable. Here is an explanation of why we chose these countries.
The first graph shows the evolution of total arrivals per year and per country.
The total number of arrivals per year is a key indicator for measuring the touristic performance of a country. In our database, total arrivals are given in thousands, so we divided the number by 1000 to obtain the value in millions.
It can be observed that the global leaders do not change. France remains in first place over the entire period, confirming that it is the most visited country in the world. The next top destinations are Spain, the USA, China, Italy, Turkey, Mexico, the United Kingdom, Germany and Thailand. This is exactly the list of top 10 country destinations given by the World Tourism Rankings that we presented in the introduction. All countries show increasing trends, which is probably related in part to the growth of the world population.
A temporary dip can be observed in the most visited countries around 2003-2004. It might be caused by external factors such as terrorist attacks or the rise of the internet, which gave more visibility to smaller countries. After identifying the top 10 countries by total arrivals, we used the 2018 data to plot the following graph in Excel:
Graph 1.2: Top 10 tourism destinations based on Total Arrivals in 2018
Besides the number of total arrivals, it shows some of the most visited places in each country (the Eiffel Tower in France, the Statue of Liberty in the USA, the Great Wall of China, the Colosseum in Italy, Big Ben in the UK, etc.).
Graph 2 shows the evolution of the tourism expenditures per year and per country.
Tourism expenditure complements total arrivals as a performance indicator for tourism.
In our database, tourism expenditure is given in US$ millions, so we divided the number by 1000 to obtain the value in billions.
As seen from the graph, the USA has the biggest tourism expenditures, which are considerably higher than for the rest of the countries. The graph shows a general increase over the years.
We can note a temporary decline around 2008, which could be due to the financial crisis of 2007-2008.
Graph 3 shows the evolution of government spending, in billions of US dollars and nominal prices.
In general, we see a slight increase in tourism budgets over the years. The USA, China and Japan stand out. They invest considerably more than the other countries because they are the largest economies by GDP: even if the percentage of GDP invested in tourism is the same as in other countries, the absolute amount is much bigger. The US spends the most on travel and tourism (almost 20b USD in 2019), followed by China, which went from 1b USD to nearly 12b USD in 25 years. Japan has a tourism budget twice that of France, the most visited country in the world. It can be noted that Japan reduced its tourism budget in 2011, probably because of the 2011 Tohoku earthquake and tsunami (the most powerful earthquake ever recorded in Japan, with a magnitude of 9.0-9.1 and 19,747 deaths) and the need to spend more on helping people in need and rebuilding the economy.
A major decrease can be noticed from 2000 until 2004. However, such a severe decrease seems extreme, and we think it could stem from a data collection problem. External factors could also explain such changes in the tourism budget: the terrorist attacks of 2001-2002 (the September 11 attacks against the United States and the 2002 Bali bombings) might have led governments to focus more on other sectors, such as security.
A new variable (called here performance) was created in the dataset, calculated as follows: total tourism expenditure in the country / total government spending. We first divided the tourism expenditure by 1000, as it is originally in US$ millions while government spending is in billions.
This performance indicator shows the amount of tourism expenditure generated for every dollar spent on travel and tourism by the government. Of course, the government does not receive all the revenue that comes from the tourists’ expenses. Only a small part is returned to it (taxes, public transport fares, museum tickets, etc.).
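The calculation can be sketched in base R as follows. The data frame, its column names and its values are illustrative assumptions rather than the actual nw4 columns, and the guard against zero spending is our own addition for the sketch.

```r
# Illustrative values only, not taken from the dataset.
toy <- data.frame(
  Country           = c("A", "B"),
  expenditure_musd  = c(12000, 500),  # tourism expenditure, US$ millions
  gov_spending_busd = c(0.4, 0.05)    # government spending, US$ billions
)

# Bring expenditure to US$ billions so that both terms share a unit.
toy$expenditure_busd <- toy$expenditure_musd / 1000

# Performance: tourism expenditure per dollar of government spending.
# A spending of zero would make the ratio undefined, so it is set to NA.
toy$performance <- ifelse(
  toy$gov_spending_busd > 0,
  toy$expenditure_busd / toy$gov_spending_busd,
  NA_real_
)

print(toy)
```

With these toy numbers, country A generates about 30 dollars of tourism expenditure per dollar of government spending and country B about 10.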
Graph 4 shows the evolution of the return on government spending.
Graph 4 could be very useful to highlight which countries are investing wisely in tourism. However, we cannot analyze anything yet, as the major decrease in government spending in 2002 (also seen in graph 3) pushes the ratio to extreme values. We need to adjust the data by omitting the affected years. It is nevertheless interesting to see that Croatia has had the highest performance score for over 25 years.
As seen previously, there is a problem with the years 2001 and 2002, and the extreme values do not allow us to analyze the performance of the countries. Thus, we have rebuilt the previous graph without the years 2001 and 2002.
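Excluding the two problematic years amounts to a simple row filter. The snippet below is a sketch on made-up values, written in base R; the object and column names are assumptions.

```r
# Made-up performance values with an artificial spike in 2001 and 2002.
toy <- data.frame(
  Country     = rep(c("A", "B"), each = 4),
  year        = rep(2000:2003, times = 2),
  performance = c(20, 400, 380, 22, 10, 250, 240, 11)
)

# Years judged unreliable in the text above.
excluded_years <- c(2001, 2002)

# Keep every row whose year is not in the excluded set.
filtered <- toy[!(toy$year %in% excluded_years), ]

print(filtered)
```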
Graph 5 is presented below:
The results show that Croatia has the highest performance score, which is an interesting finding. It is followed by Greece, South Africa, Thailand and India. These countries are not as prominent as France or the USA, and their higher return on investment can have two explanations: either their governments invest more wisely than others, or these countries are inherently attractive to tourists and do not need as much investment as other countries. We can also note that all these countries are known for good weather and many beaches, which may encourage tourists to choose them as a summer destination even if the government invests less in attracting people.
To explore the data from our analysis, we developed the main interactive tool of our project: the interactive map. The tool was built with Shiny by our team to enrich the experience of the readers of this report.
The tool can be accessed in two ways: via the embedded interface below, or via the following url: https://data-science-geneva.shinyapps.io/mapds2021/. If the embedded window below is empty, please open report.html in your web browser, as RStudio can prevent external sources from being loaded in its built-in browser.
The purpose of the map is to visualize the return on investment of government spending in terms of the tourism expenditures of a country. Tourism expenditures correspond to all the money spent by tourists in the country during the year. To visualize this relationship we use what we call in this report the Tourism Performance Index of a country, which corresponds to an ROI formula:
\(Yearly\:Performance\:Index = \frac{\color{blue}{Tourism \:Expenditures \:in \:US \:BN}}{\color{red}{Government\:Spendings\: in\: US\: BN\; nominal\: prices}}\)
The left pane of the tool contains the map, which shows the Yearly Performance Index of every country in our dataset for the year selected on the slider. If you move the slider, all the Yearly Performance Indexes update according to the selected year. We encourage you to zoom in, zoom out and play with the tool to analyse the evolution of Yearly Performance Indexes worldwide across the years.
The colors of the map are based on a ColorQuantile function that splits the performance indexes of the selected year into quantiles and thereby ranks the countries. The greener a country is, the better its performance index compared with the other countries in the same year.
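The idea behind quantile-based coloring can be sketched in base R, independently of the map code. The sketch assumes four bins and a hypothetical palette; the actual number of bins and colors used in the tool may differ, and the values are invented.

```r
# Invented performance indexes for one year.
perf <- c(A = 5, B = 12, C = 30, D = 8, E = 50, F = 18, G = 3, H = 25)

# Quartile break points, computed on that year's values only.
breaks <- quantile(perf, probs = seq(0, 1, by = 0.25), na.rm = TRUE)

# Assign each country to a quartile bin (1 = lowest, 4 = highest).
# Note: cut() fails if two break points coincide (many tied values).
bin <- cut(perf, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Hypothetical palette, from worst to best.
palette <- c("red", "orange", "yellowgreen", "darkgreen")
colors <- setNames(palette[bin], names(perf))

print(colors)
```

Because the break points are recomputed for each year, a country's color reflects its rank among the countries of that year rather than the absolute value of its index.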
You can see, for example, that some countries, such as Croatia or Greece, are almost always among the top performers worldwide across the years.
The right pane contains the year selector (which controls the left pane) and the interactive plot of the tool.
Please click on a country on the map to open its corresponding interactive plot.
Whereas the left pane of the tool shows performance indexes for the chosen year, the right pane provides an interactive plot containing 3 KPIs for the selected country over the last 25 years.
Those 3 KPIs are:
Government Spendings and Tourism Expenditures are the same variables used in our performance index calculation, as shown above.
For our ROI performance index, we chose Government Spendings (public investments) rather than Capital Investments (public and private investments) as the measure of investment, because government spending was the initial focus of our research.
However, the interactive graph also shows Capital Investments, so you can compare them with our other data.
An interesting fact: if you take the USA for example (by clicking on the country), you will see that in 2008 the country had higher Capital Investments than Tourism Expenditures. This is probably mostly due to the financial crisis. That year, more was therefore invested in tourism in the country than tourists spent there, so in terms of capital investment the return for the year was below break-even.
All the source code of the interactive Map is available (with comments) in the Map folder of the project.
In this section we work on the core of our project: model building.
The goal of the model is, given information about a country (demographics, government spending in nominal prices, etc.) at a given time T, to try to predict:
This is useful because it could help countries forecast their future tourism return on investment based on past data.
The most important variables used in our analysis were:
For clarity, we excluded the years 2001 and 2002 from our analysis because the data for those years was unreliable. More information about this is given in the EDA part of our analysis.
We started our analysis by looking at the effect of government spending alone, first on total arrivals and then on tourism expenditure:
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 4231 | 178 | 23.7 | 0 |
| Government.spending.on.travel.and.Tourism.service…US..in.bn..Nominal.prices. | 5896 | 133 | 44.2 | 0 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.393 | 0.393 | 9604 | 1955 | 0 | 1 | -31988 | 63982 | 64000 | 2.78e+11 | 3019 | 3021 |
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 2889 | 160 | 18.1 | 0 |
| Government.spending.on.travel.and.Tourism.service…US..in.bn..Nominal.prices. | 11482 | 114 | 100.8 | 0 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.793 | 0.793 | 8071 | 10155 | 0 | 1 | -27630 | 55266 | 55284 | 1.73e+11 | 2651 | 2653 |
We then ran multiple regressions (including government spending and our demographic variables), first on total arrivals and then on tourism expenditure:
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -6973.659 | 2657.30 | -2.6243 | 8.78e-03 |
| Government.spending.on.travel.and.Tourism.service…US..in.bn..Nominal.prices. | 22404.343 | 459.53 | 48.7544 | 0.00e+00 |
| Gini index | -100.141 | 23.30 | -4.2974 | 1.85e-05 |
| Democracy Age | -62.415 | 7.06 | -8.8384 | 0.00e+00 |
| Life expectancy | 198.245 | 35.20 | 5.6323 | 2.15e-08 |
| Liberal democracy index | 1455.183 | 1099.58 | 1.3234 | 1.86e-01 |
| Ratio of exports and imports to GDP (%) (PWT 9.1 (2019)) | -0.228 | 4.24 | -0.0537 | 9.57e-01 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.718 | 0.717 | 6972 | 582 | 0 | 6 | -14147 | 28309 | 28351 | 6.66e+10 | 1371 | 1378 |
| Variable | VIF |
|---|---|
| Government.spending.on.travel.and.Tourism.service…US..in.bn..Nominal.prices. | 1.43 |
| Gini index | 1.26 |
| Democracy Age | 2.15 |
| Life expectancy | 2.26 |
| Liberal democracy index | 2.39 |
| Ratio of exports and imports to GDP (%) (PWT 9.1 (2019)) | 1.26 |
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -1.10e+04 | 3063.56 | -3.6018 | 3.29e-04 |
| Government.spending.on.travel.and.Tourism.service…US..in.bn..Nominal.prices. | 9.97e+03 | 420.61 | 23.7129 | 0.00e+00 |
| Gini index | -7.74e+01 | 27.60 | -2.8031 | 5.14e-03 |
| Democracy Age | -3.79e+00 | 9.36 | -0.4050 | 6.86e-01 |
| Life expectancy | 2.25e+02 | 39.65 | 5.6817 | 1.66e-08 |
| Liberal democracy index | 2.78e+03 | 1333.11 | 2.0872 | 3.71e-02 |
| Ratio of exports and imports to GDP (%) (PWT 9.1 (2019)) | -3.98e-01 | 5.29 | -0.0753 | 9.40e-01 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.462 | 0.459 | 8030 | 177 | 0 | 6 | -12988 | 25992 | 26033 | 8e+10 | 1241 | 1248 |
| Variable | VIF |
|---|---|
| Government.spending.on.travel.and.Tourism.service…US..in.bn..Nominal.prices. | 1.30 |
| Gini index | 1.21 |
| Democracy Age | 2.24 |
| Life expectancy | 1.95 |
| Liberal democracy index | 2.39 |
| Ratio of exports and imports to GDP (%) (PWT 9.1 (2019)) | 1.23 |
Finally, we compared Government Spending with Total Capital Investments as the main input variable of our analysis. The results of the last regressions (using Capital Investments instead of Government Spending) are as follows:
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -3697.97 | 2693.06 | -1.37 | 1.70e-01 |
| Capital.investment.in.Travel.and.Tourism…US..in.bn..Nominal.prices. | 1725.82 | 35.92 | 48.05 | 0.00e+00 |
| Gini index | -131.19 | 23.51 | -5.58 | 2.89e-08 |
| Democracy Age | -7.32 | 7.00 | -1.05 | 2.95e-01 |
| Life expectancy | 174.80 | 35.64 | 4.90 | 1.05e-06 |
| Liberal democracy index | -1164.98 | 1114.12 | -1.05 | 2.96e-01 |
| Ratio of exports and imports to GDP (%) (PWT 9.1 (2019)) | -9.46 | 4.21 | -2.25 | 2.49e-02 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.713 | 0.712 | 7036 | 567 | 0 | 6 | -14159 | 28334 | 28376 | 6.79e+10 | 1371 | 1378 |
| Variable | VIF |
|---|---|
| Capital.investment.in.Travel.and.Tourism…US..in.bn..Nominal.prices. | 1.30 |
| Gini index | 1.26 |
| Democracy Age | 2.08 |
| Life expectancy | 2.28 |
| Liberal democracy index | 2.41 |
| Ratio of exports and imports to GDP (%) (PWT 9.1 (2019)) | 1.23 |
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -5036.4 | 2510.99 | -2.006 | 4.51e-02 |
| Capital.investment.in.Travel.and.Tourism…US..in.bn..Nominal.prices. | 1214.6 | 31.70 | 38.315 | 0.00e+00 |
| Gini index | -86.8 | 22.50 | -3.858 | 1.20e-04 |
| Democracy Age | 17.9 | 7.57 | 2.363 | 1.83e-02 |
| Life expectancy | 128.4 | 32.55 | 3.944 | 8.47e-05 |
| Liberal democracy index | -368.3 | 1092.62 | -0.337 | 7.36e-01 |
| Ratio of exports and imports to GDP (%) (PWT 9.1 (2019)) | 11.2 | 4.30 | 2.603 | 9.34e-03 |
| r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual | nobs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.642 | 0.64 | 6552 | 370 | 0 | 6 | -12734 | 25484 | 25525 | 5.33e+10 | 1241 | 1248 |
| Variable | VIF |
|---|---|
| Capital.investment.in.Travel.and.Tourism…US..in.bn..Nominal.prices. | 1.29 |
| Gini index | 1.21 |
| Democracy Age | 2.19 |
| Life expectancy | 1.98 |
| Liberal democracy index | 2.42 |
| Ratio of exports and imports to GDP (%) (PWT 9.1 (2019)) | 1.22 |
Our first research question was: “Is there a correlation between government investment in travel and tourism (infrastructure, public transport, travel campaigns, promotion) and the touristic performance of the country?”. We also wanted to see whether tourism follows a certain pattern through the years (question 3) and whether investments made in tourism could be used to predict future tourism performance (question 2). Finally, we were interested to see whether there are groups of countries that behave similarly in terms of touristic performance (question 4).
The graphs built in the EDA provide some answers to our third research question. They show a global trend for tourists to travel more over the years (graphs 1 and 2). This can be explained in part by the increase in the world’s population. Another reason is the rise of social media: people first see places that they like and then decide to visit them. In addition, air travel has become much more accessible and affordable than before, which has certainly contributed to a sharp increase in long-distance travel.
Governments are spending more money on travel and tourism (graph 3), which is probably linked to the steady increase in national GDP over the last decades. The bigger the budget, the more a country will spend in various sectors including travel and tourism, even if the share of GDP spent remains the same or even decreases.
To answer the first research question, we first visualized the relationship between government spending and the tourism expenditure for our selected countries.
We see a clear positive relationship between government spending and tourism expenditure, but its strength differs among countries, which makes sense, as they all have different resources and economic performance.
If we plot government spending against total arrivals, we obtain the following:
Here we again observe an uneven positive relationship among the selected countries. The two graphs plotted above allow us to answer our fourth research question. From graph 6, we can identify several groups of countries with similar trends in tourism expenditure:
The same analysis can be done using graph 7:
Next, our aim is to go deeper into the research questions and to find the model that best explains the impact of government spending on travel and tourism. We therefore ran several regressions to see which one fits our data best. All countries were included, because a larger sample generally gives more precise estimates and because this research is meant to be conducted on a global level with no particular focus on specific countries. Concerning the time period, we excluded 2001 and 2002 but kept all other years for the same reason: to have more data in the end.
We decided that all our regressions would be linear, since linear regressions are simple to implement and their output coefficients are easy to interpret. Even though linear regression over-simplifies real-world problems by assuming a linear relationship among the variables, we decided to use it and see whether the models provide a good fit for our data.
To answer our most important research question and see the direct impact of government spending on travel and tourism, we measured touristic competitiveness by total arrivals and by tourism expenditure. We first ran two single-variable regressions, each using one of these Key Performance Indicators as the output and government spending as the input, to see which one gave the better fit.
The results show that government spending is indeed a significant variable, with a p-value below 2*10^(-16) (displayed as 0 in the table). However, the R-squared is fairly low (0.39): the model explains only 39% of the variance in total arrivals, which is a rather poor fit. The residual standard error is about 9600.
When we use tourism expenditure as the output variable instead, the linear regression changes considerably. First and foremost, the R-squared rises to 0.79, much better than in the first model. This means that government spending alone explains tourism expenditure much better than total arrivals. The input variable remains significant, and the residual standard error drops to about 8070. Note that each residual standard error is expressed in the units of its own output variable, so the R-squared is the more reliable basis for comparing these two models.
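To make the two statistics discussed above concrete, here is a minimal sketch of a single-variable regression on synthetic data, using base R only. The variable names, the simulated coefficients and the noise level are assumptions for illustration; the data are not those of the project.

```r
set.seed(1)

# Synthetic data: spending in US$ bn, arrivals with a linear trend plus noise.
n <- 200
gov <- runif(n, min = 0, max = 5)
arrivals <- 4000 + 6000 * gov + rnorm(n, sd = 8000)

fit <- lm(arrivals ~ gov)
s <- summary(fit)

coef_table <- coef(s)   # estimate, std. error, t value and p-value per term
r2 <- s$r.squared       # share of the variance of the output explained
rse <- s$sigma          # residual standard error, in units of the output

# Both statistics can be recomputed from the residuals:
rss <- sum(residuals(fit)^2)                # residual sum of squares
tss <- sum((arrivals - mean(arrivals))^2)   # total sum of squares
r2_manual <- 1 - rss / tss
rse_manual <- sqrt(rss / df.residual(fit))

print(coef_table)
print(c(r2 = r2, r2_manual = r2_manual))
print(c(rse = rse, rse_manual = rse_manual))
```

The manual versions show why the R-squared is unit-free (a ratio of sums of squares) while the residual standard error keeps the scale of the output variable.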
After running regressions 1 and 2, we decided to add more variables to our model, because we believe that tourism performance is influenced by many factors, including demographic ones. This could also help us answer our fifth research question. These new variables are defined in point 5 of Data Wrangling: Adding new data sets. We wanted to see whether adding external data such as democracy age, the Gini index, life expectancy, the liberal democracy index or trade openness would improve the fit and decrease the residual standard error of our model. This is why we ran several multiple regressions.
The third regression tries to explain total arrivals using the government spending and the newly added demographic variables.
All variables are significant, except the liberal democracy index and trade openness. The R-squared equals 0.72, so the fit is much better than in simple linear regression 1: adding demographic data to the model improves it considerably. The residual standard error is lower than in the single regressions (about 6970), and the VIF test, which measures multicollinearity, gives values below 5 for all the input variables, meaning that we should not be concerned about serious multicollinearity problems.
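For readers unfamiliar with the VIF, it can be computed by regressing each input variable on all the others and taking 1 / (1 - R²). The sketch below does this in base R on synthetic predictors; `vif_manual` is a hypothetical helper written for this illustration, not the function used to produce the tables above, and it assumes syntactic column names.

```r
# Hypothetical helper: VIF of each column of X, computed from the R-squared
# of the regression of that column on all the other columns.
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    fit <- lm(reformulate(others, response = v), data = X)
    1 / (1 - summary(fit)$r.squared)
  })
}

set.seed(42)
n <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)                  # independent of the others
x3 <- x1 + rnorm(n, sd = 0.3)   # deliberately correlated with x1

vifs <- vif_manual(data.frame(x1, x2, x3))
print(round(vifs, 2))
```

In this synthetic example the two correlated predictors get a high VIF while the independent one stays close to 1, which is the pattern the rule of thumb of 5 is meant to detect.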
Our fourth regression tries to explain tourism expenditure using government spending and the demographic data: Gini index, life expectancy, democracy age, liberal democracy index and trade openness. The adjusted R-squared equals 0.46. Hence the fit is worse than in simple linear regression 2, which explained the same output with government spending alone, although the two models are estimated on different samples (1248 versus 2653 observations) because of missing demographic data. The residual standard error is about 8030. This time, two of the six input variables are insignificant at the 5% level (democracy age and trade openness), and the liberal democracy index is only marginally significant (p = 0.037). The VIF scores do not suggest any multicollinearity problems.
We were curious to compare government spending and capital investments in the regressions, to see which one better explains tourism performance. Government spending represents the investments of the government only (the public sector), while capital investments cover all the investments made in the country. This includes government spending but also the investments of the private sector (hotels, restaurants), which are considerably higher than those of the public sector.
When running the regression, we obtain an R-squared equal to 0.71. This suggests a rather good fit for our data. Surprisingly, democracy age becomes an insignificant variable, while trade openness becomes significant; the liberal democracy index remains insignificant. The residual standard error is about 7040 and the VIF values are all below 5.
Finally, the last regression explains tourism expenditure with capital investments and the demographic data. All the variables are significant except the liberal democracy index. The R-squared equals 0.64, lower than in regressions 3 and 5 but higher than in regression 4. This regression also has the lowest residual standard error (about 6550), meaning that its typical prediction error is the smallest.
After running these six regressions and analyzing the R-squared, the residual standard error and the significance of the input variables, we conclude that the best regressions to use are 3, 5 and 6. Specifically, one of the best choices is to explain total arrivals by government spending plus the demographic variables. Capital investment, used together with the demographic data, also explains total arrivals and tourism expenditure well. To see the reason for this, we must recall the difference between government spending and capital investment. The first represents public sector investments, while the second comprises the investments made by both the public and the private sector and therefore gives a broader measure of total investment in the country of interest. However, the two investment variables should not be used in the same regression model because of multicollinearity problems (government spending is a part of capital investment).
Single regressions should not be used in this case, because they have a high residual standard error. Additionally, since we have extra variables for many years and countries, it would be a pity not to use them: they add value to the regression models and explain touristic performance based on multiple factors. And even if linear regressions simplify the real world, we know that something as complex as the touristic performance of a country cannot be explained by only one variable. As stated in the introduction, there are always several reasons that influence people’s choice of travel destinations. External factors such as terrorist attacks or natural disasters (droughts, earthquakes, floods, hurricanes) can also have a big impact. In short, taking only one variable into account would give an incomplete picture.
Finally, we have answered our first research question: there is a high correlation between the investments made in tourism and the touristic performance of a country. This suggests that governments should take changing tourism circumstances into account and keep investing in order to attract more tourists. Although some countries like France or the USA seem to be always attractive, we should not forget about social media and the development of new technologies. Tourists can now travel more easily than before, so they are motivated to discover new destinations, and this could benefit small countries.
Our second research question was whether we can predict tourism performance using the investments made in it. We believe that under normal circumstances this is possible, because the regression models fit our data reasonably well. We assume that additional data (annual or monthly temperature, number of beaches, number of neighboring countries, etc.) could be added to predict total arrivals and tourism expenditure more accurately.
However, at the current stage, it would be risky to try to predict international tourism indicators, with so many travel restrictions related to the global pandemic and changes taking place every day. As long as this uncertainty lasts and we do not know how the public health situation will evolve, future tourism is very unlikely to be predictable. Over the last two years, the trend has been towards promoting domestic tourism.
In this research, we have studied the global travel and tourism industry. The topic is of great relevance for us, since we live in a very dynamic world in terms of travel. With air travel becoming more accessible and social media gaining great popularity, people travel much more often than before and the travel industry is growing rapidly. The industry is an important player in the world economy, creating many jobs and contributing substantially to global and national GDP.
Our initial questions were related to the countries’ touristic performance. We wanted to see the impact of government spending on travel and tourism and whether this spending was efficient over time. In the single regression analysis, we saw that government spending is a significant variable with a positive coefficient when explaining total arrivals and tourism expenditure. However, it is strongly recommended to include other variables as well. The top 10 countries in terms of total arrivals are certainly not there only because of their investments. There are many other factors that influence people’s decisions concerning travel (attractiveness of a country, personal preferences, quality of life, social and political factors, etc.). Therefore, explaining touristic performance using only investments would be a wrong approach. Government spending and capital investments can of course be used for explaining and predicting touristic performance, but only in combination with other variables. We brainstormed about the factors that can influence tourism performance, and here are some of our ideas:
Factors affecting tourism performance
As this illustration shows, government spending is just one of multiple factors that have an impact on tourism performance. Therefore, a deeper analysis of this topic must take more variables into account.
As graphs 1, 2 and 3 in the EDA show, the tourism industry is growing globally in both size and value: more people are traveling, which results in more money being spent in this industry. The top 10 world destinations are: France, Spain, the USA, China, Italy, Turkey, Mexico, Germany, the United Kingdom and Thailand.
There is a positive relationship between government spending and total arrivals/tourism expenditure, with all three growing over time. The effect of external factors, such as the rise of the Internet (2000s) and negative shocks (terrorist attacks, natural catastrophes), can be seen in the evolution of tourism performance.
The majority of countries show similar patterns, specifically an increase in total arrivals, tourism expenditure and government spending over the years. There is a slight increase in the return on investment, as tourism expenditure grows faster than government spending.
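The return-on-investment argument can be made concrete with a small sketch. It assumes that "return on investment" is read as tourism expenditure per unit of government spending, which is our simplification for illustration. The figures are hypothetical and are not taken from our dataset.

```python
def expenditure_per_unit_spent(tourism_expenditure, government_spending):
    """Tourism expenditure generated per unit of government spending."""
    return tourism_expenditure / government_spending


# Hypothetical values (in $ bn) for one country in an early and a late year.
early_spending, early_expenditure = 10.0, 50.0
late_spending, late_expenditure = 15.0, 90.0

roi_early = expenditure_per_unit_spent(early_expenditure, early_spending)  # 5.0
roi_late = expenditure_per_unit_spent(late_expenditure, late_spending)     # 6.0

# Growth rates over the period: expenditure grows faster than spending,
# which is exactly why the ratio above increases.
spending_growth = late_spending / early_spending - 1
expenditure_growth = late_expenditure / early_expenditure - 1

print(roi_early, roi_late, expenditure_growth > spending_growth)
```

Whenever expenditure grows faster than spending, this ratio rises, whatever the absolute level of the budget.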
However, some countries deviate from this pattern. The USA is a particular case in our research, because it has very high values for government spending (almost $20 bn in 2019). China increases its budget every year and is among the world leaders in terms of total arrivals, but not in terms of tourism expenditure (because of the low price level, people spend relatively less money when traveling to China than to the USA, for example). Croatia, on the other hand, is an example of a smaller tourism budget combined with good tourism performance and high total arrivals.
After running six different single-variable and multivariable linear regressions, we concluded that the best models are the multivariable ones. The regressions indicate that more factors influence the tourism performance of a country than just the investments made in tourism (government spending or capital investment). These two are significant variables, but to obtain a model with the best fit, more data needs to be added. This also corresponds to reality: we cannot explain the evolution of total arrivals or tourism expenditure by investments alone. People also have personal preferences which influence their choice of destination. Sometimes a country becomes a popular destination because of events like the Eurovision Song Contest, the Olympic Games, football matches or other events that do not take place regularly in the same country. The choice of which country to visit is a complex one, and it is becoming more difficult for governments to attract tourists, since destinations are highly interchangeable in the minds of consumers.
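The comparison between single-variable and multivariable models can be sketched as follows. This is a minimal illustration rather than our actual regression code. The data are made up so that arrivals depend on both spending and life expectancy, and the function name `ols_r_squared` is hypothetical. The point is only that adding a relevant second variable raises the share of variance explained.

```python
def ols_r_squared(rows, y):
    """Fit y ~ intercept + predictors by ordinary least squares; return R^2."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    # Augmented normal equations: (X'X | X'y)
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Made-up illustrative data: arrivals depend on BOTH predictors.
spending = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
life_expectancy = [70.0, 65.0, 80.0, 72.0, 78.0, 68.0]
arrivals = [2 + 3 * s + 0.5 * l for s, l in zip(spending, life_expectancy)]

r2_single = ols_r_squared([(s,) for s in spending], arrivals)
r2_multi = ols_r_squared(list(zip(spending, life_expectancy)), arrivals)
print(r2_single, r2_multi)
```

With real data, R² never decreases when a variable is added, so the adjusted R² or an F-test is the fairer basis for comparing models of different sizes.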
The demographic data added in our research (Gini index, life expectancy, democracy age, liberal democracy index, trade openness) is valuable because it provides important insight into what can influence people’s choice of travel destinations. The most significant demographic variables explaining tourism performance are the Gini index and life expectancy, the former having a negative impact and the latter a positive one. We have concluded that the best models are the ones that explain:
Capital investments incorporate both the public and the private sector and therefore represent all the investments made in the country. However, this report focuses on the impact of government spending. The government may of course influence private sector investments through its rules and regulations and by encouraging the opening of new businesses. But from a more theoretical perspective, private sector investments are not directly linked to the government. Nevertheless, it is interesting to see that capital investments can also be used as a variable to explain and analyze the tourism performance of a country.
Our research had a few limitations which we believe made our work more difficult and possibly made the results less coherent. First of all, we had to work with two different main data sets (Inbound Tourism from the UN World Tourism Data and Capital Investments from the World Bank Data). We assume the data were not measured in the same way in both sources, and this could have influenced the fit of the linear regression models. The two initial data sets are quite large and have values for many countries over the period 1995-2019, so it is easy to see the global trends but harder and more time-consuming to analyze countries separately.
Moreover, there is a lot of missing data, which results in a smaller number of observations for the regressions and, consequently, less accurate regression models. Another limitation is that it is difficult to find extra data (demographic, economic, social indicators) for all the countries and all the years from 1995 to 2019. We thought about introducing variables such as the number of beaches (to capture summer attractiveness) or the number of embassies (to measure the level of diplomatic activity of a country). Unfortunately, we could not find data for all the countries and the given period, so we moved on with the demographic variables only.
To improve the quality of our results and obtain more interesting models, the analysis should include more variables: not just demographic data, but also data concerning the economic, social and political situation of the countries. It would also be interesting to incorporate data about the weather: average yearly or monthly temperature, number of rainy days per year, daylight hours, etc. Other variables that could be added are the number of embassies and the number of countries whose nationals need a visa for entry, which would measure the diplomatic openness of a country. Additionally, it would be interesting to include tourism-specific data such as the number of beaches or the presence or absence of mountains and ski resorts. These factors certainly influence people’s choice of tourism destinations. In order to forecast countries’ tourism performance, more information should be gathered.
We know that the Covid-19 pandemic has had a great impact on global tourism, which was not analyzed in this report, as it only takes into account data until 2019. Future research could compare the situation up to 2019 with that in 2020-2021, to see how tourism was affected and whether the decrease in total arrivals and tourism expenditure is statistically significant.
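One possible way to test such a decrease is a paired t-test on the same countries before and during the pandemic. The sketch below is an assumption about how this future analysis could be set up, not something we have run. The arrival figures are invented and the function name `paired_t_statistic` is hypothetical.

```python
import math


def paired_t_statistic(before, after):
    """Return the paired t statistic and degrees of freedom for after - before."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator)
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return mean_diff / standard_error, n - 1


# Invented arrivals (millions) for five hypothetical countries.
arrivals_2019 = [100.0, 80.0, 60.0, 40.0, 20.0]
arrivals_2020 = [30.0, 25.0, 20.0, 10.0, 5.0]

t_stat, df = paired_t_statistic(arrivals_2019, arrivals_2020)
print(t_stat, df)
```

With 4 degrees of freedom, the two-sided 5% critical value of the t distribution is 2.776, so a statistic beyond that in absolute value would indicate a significant change. A real analysis would use far more countries and should check whether the differences are roughly normally distributed.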
It would also be interesting to analyze which countries spend a larger share of their national GDP on travel and tourism (the percentage rather than the absolute amount). This could show which governments budget more effectively for the travel industry, which can also have an impact on future performance (number of visitors, tourism expenditure).
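The difference between ranking by amount and ranking by share can be illustrated with a short sketch. The two countries and their figures are hypothetical and only show why the two rankings can disagree.

```python
def spending_share_of_gdp(spending, gdp):
    """Government tourism spending as a percentage of GDP."""
    return 100.0 * spending / gdp


# Hypothetical (spending, GDP) pairs in $ bn.
countries = {
    "Country A": (20.0, 2000.0),  # large budget, very large economy
    "Country B": (2.0, 50.0),     # small budget, small economy
}

shares = {
    name: spending_share_of_gdp(spending, gdp)
    for name, (spending, gdp) in countries.items()
}
ranked_by_share = sorted(shares, key=shares.get, reverse=True)
ranked_by_amount = sorted(countries, key=lambda n: countries[n][0], reverse=True)

print(shares)
print(ranked_by_share, ranked_by_amount)
```

Here Country A spends ten times more in absolute terms, yet Country B devotes a four times larger share of its GDP to tourism.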
Finally, our research is a good stepping stone for another project, and our final dataset (nw4) can be used in further research, as it has 3800 observations of 82 variables. The dataset has many other interesting variables concerning tourism performance (business tourism spending, travel and tourism direct contribution to employment in thousands of jobs and as a share of total employment, travel and tourism direct contribution to GDP, domestic tourism spending, leisure tourism spending, visitor exports foreign spending, etc.). The variables are measured in both local currency and US dollars, in real and nominal prices. We did not rename the dataset’s long columns and did not delete the variables that were not used in our project, precisely so that the dataset can be reused in future research, by us or by another group of highly motivated data scientists.
Our full report is also available on RPubs at https://rpubs.com/jcrn/848943.