The Economic Question

How have GDP per capita and life expectancy evolved across different continents since 1952? Which continents have seen the fastest growth, and which countries are outliers?

Part 1: Setup and Data Loading (5 points)

# Load the tidyverse package
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Import the wide Gapminder dataset
gapminder_wide <- read_csv("data/gapminder_wide.csv") # The assignment operator (<-) stores the imported data in gapminder_wide. I also renamed the file from gapminder_wide(1).csv to gapminder_wide.csv
## Rows: 142 Columns: 26
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): country, continent
## dbl (24): gdpPercap_1952, gdpPercap_1957, gdpPercap_1962, gdpPercap_1967, gd...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Task 1.1: Use glimpse() to examine the structure of gapminder_wide. In your own words, describe what you see. How many rows and columns are there? What do the column names tell you about the data format?

glimpse(gapminder_wide)
## Rows: 142
## Columns: 26
## $ country        <chr> "Afghanistan", "Albania", "Algeria", "Angola", "Argenti…
## $ continent      <chr> "Asia", "Europe", "Africa", "Africa", "Americas", "Ocea…
## $ gdpPercap_1952 <dbl> 779.4453, 1601.0561, 2449.0082, 3520.6103, 5911.3151, 1…
## $ gdpPercap_1957 <dbl> 820.8530, 1942.2842, 3013.9760, 3827.9405, 6856.8562, 1…
## $ gdpPercap_1962 <dbl> 853.1007, 2312.8890, 2550.8169, 4269.2767, 7133.1660, 1…
## $ gdpPercap_1967 <dbl> 836.1971, 2760.1969, 3246.9918, 5522.7764, 8052.9530, 1…
## $ gdpPercap_1972 <dbl> 739.9811, 3313.4222, 4182.6638, 5473.2880, 9443.0385, 1…
## $ gdpPercap_1977 <dbl> 786.1134, 3533.0039, 4910.4168, 3008.6474, 10079.0267, …
## $ gdpPercap_1982 <dbl> 978.0114, 3630.8807, 5745.1602, 2756.9537, 8997.8974, 1…
## $ gdpPercap_1987 <dbl> 852.3959, 3738.9327, 5681.3585, 2430.2083, 9139.6714, 2…
## $ gdpPercap_1992 <dbl> 649.3414, 2497.4379, 5023.2166, 2627.8457, 9308.4187, 2…
## $ gdpPercap_1997 <dbl> 635.3414, 3193.0546, 4797.2951, 2277.1409, 10967.2820, …
## $ gdpPercap_2002 <dbl> 726.7341, 4604.2117, 5288.0404, 2773.2873, 8797.6407, 3…
## $ gdpPercap_2007 <dbl> 974.5803, 5937.0295, 6223.3675, 4797.2313, 12779.3796, …
## $ lifeExp_1952   <dbl> 28.801, 55.230, 43.077, 30.015, 62.485, 69.120, 66.800,…
## $ lifeExp_1957   <dbl> 30.33200, 59.28000, 45.68500, 31.99900, 64.39900, 70.33…
## $ lifeExp_1962   <dbl> 31.99700, 64.82000, 48.30300, 34.00000, 65.14200, 70.93…
## $ lifeExp_1967   <dbl> 34.02000, 66.22000, 51.40700, 35.98500, 65.63400, 71.10…
## $ lifeExp_1972   <dbl> 36.08800, 67.69000, 54.51800, 37.92800, 67.06500, 71.93…
## $ lifeExp_1977   <dbl> 38.43800, 68.93000, 58.01400, 39.48300, 68.48100, 73.49…
## $ lifeExp_1982   <dbl> 39.854, 70.420, 61.368, 39.942, 69.942, 74.740, 73.180,…
## $ lifeExp_1987   <dbl> 40.822, 72.000, 65.799, 39.906, 70.774, 76.320, 74.940,…
## $ lifeExp_1992   <dbl> 41.674, 71.581, 67.744, 40.647, 71.868, 77.560, 76.040,…
## $ lifeExp_1997   <dbl> 41.763, 72.950, 69.152, 40.963, 73.275, 78.830, 77.510,…
## $ lifeExp_2002   <dbl> 42.129, 75.651, 70.994, 41.003, 74.340, 80.370, 78.980,…
## $ lifeExp_2007   <dbl> 43.828, 76.423, 72.301, 42.731, 75.320, 81.235, 79.829,…

Your answer:

I can see that there are 142 rows and 26 columns. glimpse() lists every column with its type and its first few values. The columns are country, continent, gdpPercap_1952 through gdpPercap_2007, and lifeExp_1952 through lifeExp_2007. The data is in wide format: each variable is repeated once per year, so the year is stored in the column names instead of in its own column, which makes the table look unorganized. The country and continent columns are <chr>, which means character, while the other 24 columns are <dbl>, which holds numeric data.
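The same facts can be checked in code rather than read off the printout. The sketch below uses a small stand-in for gapminder_wide (two countries and two years, with values copied from the glimpse() output above) so that it runs without the CSV file. The names wide_demo, year_cols, variables and years are illustrative only.

```r
library(stringr)
library(tibble)

# Illustrative miniature of gapminder_wide (values copied from the printout above)
wide_demo <- tribble(
  ~country,      ~continent, ~gdpPercap_1952, ~gdpPercap_1957, ~lifeExp_1952, ~lifeExp_1957,
  "Afghanistan", "Asia",     779.4453,        820.8530,        28.801,        30.332,
  "Albania",     "Europe",   1601.0561,       1942.2842,       55.230,        59.280
)

# Number of rows and columns
dim(wide_demo)

# Columns whose names follow the "<variable>_<year>" pattern
col_names <- names(wide_demo)
year_cols <- col_names[str_detect(col_names, "_\\d{4}$")]

# The distinct variables and years that are encoded in those column names
variables <- unique(str_remove(year_cols, "_\\d{4}$"))
years     <- unique(str_extract(year_cols, "\\d{4}$"))

variables
years
```

Two variables spread over many year-suffixed columns is the signature of wide data that pivot_longer() can tidy.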


Part 2: Data Tidying with .value (20 points)

In the lab, you learned how to use pivot_longer() with the .value sentinel to reshape wide data into tidy format.

Task 2.1: Write code to transform gapminder_wide into a tidy dataset with columns: country, continent, year, gdpPercap, and lifeExp. Show the first 10 rows of your tidy dataset.

gap_tidy <- gapminder_wide %>% 
  pivot_longer(
    cols = -c(country, continent),
    names_to = c(".value", "year"),
    names_sep = "_",
    values_drop_na = FALSE
  ) %>% 
  mutate(year = as.numeric(year))

glimpse(gap_tidy)
## Rows: 1,704
## Columns: 5
## $ country   <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
## $ continent <chr> "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asi…
## $ year      <dbl> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …
## $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
head(gap_tidy, 10)
## # A tibble: 10 × 5
##    country     continent  year gdpPercap lifeExp
##    <chr>       <chr>     <dbl>     <dbl>   <dbl>
##  1 Afghanistan Asia       1952      779.    28.8
##  2 Afghanistan Asia       1957      821.    30.3
##  3 Afghanistan Asia       1962      853.    32.0
##  4 Afghanistan Asia       1967      836.    34.0
##  5 Afghanistan Asia       1972      740.    36.1
##  6 Afghanistan Asia       1977      786.    38.4
##  7 Afghanistan Asia       1982      978.    39.9
##  8 Afghanistan Asia       1987      852.    40.8
##  9 Afghanistan Asia       1992      649.    41.7
## 10 Afghanistan Asia       1997      635.    41.8
# The pipe operator (%>% or |>) takes the data on its left (gapminder_wide) and passes it to the function on its right.
# pivot_longer() reorganizes the data. The wide data had a lot of columns,
# like gdpPercap_1952, gdpPercap_1957, gdpPercap_1962... lifeExp_1987, lifeExp_1992...
# The function rearranges the dataset into more rows and fewer columns, which makes the data longer.
# cols = -c(country, continent) tells pivot_longer() not to touch the country and continent columns, because they hold no information about the year. The - symbol excludes them, so these columns are not pivoted.
# names_to splits each column name into 2 parts, e.g. lifeExp_1987 -> lifeExp and 1987.
# Normally both parts would be stored as values in new columns, but ".value" makes the first part the name of an output column. So lifeExp and gdpPercap become column names.
# names_to = c(".value", "year"): the first part becomes the column name and the second part goes into a column named year.
# So the result has 3 new columns named gdpPercap, lifeExp and year.
# names_sep = "_" says where to split the column name: at the _ character.
# values_drop_na = FALSE keeps rows with missing values instead of deleting them.
# mutate(year = as.numeric(year)) converts year from text to numbers, because we need the years as numbers.

Task 2.2: Explain in 2-3 sentences what the .value sentinel does in your code. Why is it the right tool for this dataset?

Your answer:

The .value sentinel tells pivot_longer() to use the first part of each original column name as the name of an output column, instead of storing it as data. With names_to = c(".value", "year"), the part before the underscore (gdpPercap or lifeExp) becomes a column name and the part after it goes into a column named year, so the result has 3 new columns: gdpPercap, lifeExp and year. It is the right tool for this dataset because the wide column names pack two different variables together, and .value separates them in a single pivot_longer() call. The tidy result is more readable and easier to analyse.
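The effect of .value is easiest to see by pivoting the same table with and without it. This is a minimal sketch on a one-row table with made-up values. The names wide_demo, with_value and without_value are illustrative, and "variable" is just a placeholder column name.

```r
library(tidyr)
library(tibble)

# Made-up one-country table in the same wide layout as gapminder_wide
wide_demo <- tribble(
  ~country, ~gdpPercap_1952, ~gdpPercap_1957, ~lifeExp_1952, ~lifeExp_1957,
  "A",      100,             110,             50,            52
)

# With ".value": the part before "_" becomes an output column name
with_value <- pivot_longer(
  wide_demo,
  cols = -country,
  names_to = c(".value", "year"),
  names_sep = "_"
)

# Without ".value": the variable name is stored as data in its own column
without_value <- pivot_longer(
  wide_demo,
  cols = -country,
  names_to = c("variable", "year"),
  names_sep = "_",
  values_to = "value"
)

with_value     # one row per country-year, gdpPercap and lifeExp side by side
without_value  # one row per country-variable-year, still needs pivot_wider()
```

Without .value, a second step would be needed to get gdpPercap and lifeExp into separate columns.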

Task 2.3: From your tidy dataset, filter to keep only observations from 1970 onwards for the following countries: "Turkey", "Brazil", "Korea, Rep.", "Germany", "United States", "China". Save this filtered dataset as gap_filtered.

gap_filtered <- gap_tidy %>% 
  filter(country %in%  c("Turkey", "Brazil", "Korea, Rep.", "Germany", "United States", "China"),
         year >= 1970)

gap_filtered
## # A tibble: 48 × 5
##    country continent  year gdpPercap lifeExp
##    <chr>   <chr>     <dbl>     <dbl>   <dbl>
##  1 Brazil  Americas   1972     4986.    59.5
##  2 Brazil  Americas   1977     6660.    61.5
##  3 Brazil  Americas   1982     7031.    63.3
##  4 Brazil  Americas   1987     7807.    65.2
##  5 Brazil  Americas   1992     6950.    67.1
##  6 Brazil  Americas   1997     7958.    69.4
##  7 Brazil  Americas   2002     8131.    71.0
##  8 Brazil  Americas   2007     9066.    72.4
##  9 China   Asia       1972      677.    63.1
## 10 China   Asia       1977      741.    64.0
## # ℹ 38 more rows
# filter() keeps rows that meet all of the conditions
# country %in% c(...) is the country filter. c() creates a vector and %in% checks whether each row's country is in it, so
# only these 6 countries stay. year >= 1970 keeps observations from 1970 onwards (the first year kept is 1972).
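A common slip is to write == where %in% is needed. The sketch below, on a made-up six-row table, shows why %in% is the safer choice: == compares element by element and recycles the shorter vector, so it can silently drop rows. The names demo, keep, with_in and with_eq are illustrative.

```r
library(dplyr)
library(tibble)

# Made-up data: three countries observed in two years
demo <- tibble(
  country = c("Brazil", "China", "Turkey", "Brazil", "China", "Turkey"),
  year    = c(1967, 1967, 1967, 1972, 1972, 1972)
)

keep <- c("Turkey", "Brazil")

# %in% tests each row's country against the whole vector
with_in <- demo %>% filter(country %in% keep, year >= 1970)

# == pairs rows with a recycled copy of keep (Turkey, Brazil, Turkey, ...),
# so a row matches only when it happens to line up with the right element
with_eq <- demo %>% filter(country == keep, year >= 1970)

with_in  # Brazil 1972 and Turkey 1972
with_eq  # Brazil 1972 only
```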

Part 3: Grouped Summaries (25 points)

Now you will use group_by() and summarize() to answer questions about continents and countries.

Task 3.1: Calculate the average GDP per capita and average life expectancy for each continent across all years (use the full tidy dataset, not the filtered one).

average_gdp_and_lifeExp <- gap_tidy %>% 
  group_by(continent) %>% 
  summarize(
    average_gdp = mean(gdpPercap, na.rm = TRUE),
    average_lifeExp = mean(lifeExp, na.rm = TRUE),
    .groups = "drop"
  )

average_gdp_and_lifeExp
## # A tibble: 5 × 3
##   continent average_gdp average_lifeExp
##   <chr>           <dbl>           <dbl>
## 1 Africa          2194.            48.9
## 2 Americas        7136.            64.7
## 3 Asia            7902.            60.1
## 4 Europe         14469.            71.9
## 5 Oceania        18622.            74.3
# group_by(continent) splits the data into groups using the continent column
# summarize() collapses each group into a single row, so every continent gets its own row.
# mean() calculates the average and na.rm = TRUE ignores missing values.
# .groups = "drop" removes the grouping so the result is a regular ungrouped table

Questions to answer:
- Which continent has the highest average GDP per capita?
- Which continent has the highest average life expectancy?
- Are these the same continent? Why might that be?

Your answer:

Oceania has both the highest average GDP per capita and the highest average life expectancy. This actually surprised me. One reason may be that the countries here are island nations that can do a lot of sea trade. Australia and New Zealand are the most developed countries in this region. When I researched it, I read that Australia is one of the world's largest exporters of ores and energy. Combined with its small population, this raises GDP per capita. Australia also has strict migration rules that favour people who will benefit the country. A country with higher GDP per capita tends to have higher life expectancy, which makes sense when the government spends on the public. These countries have developed, publicly funded healthcare systems. I also read that they spend heavily on early diagnosis and healthy-living campaigns. Government spending on public health increases average life expectancy.

Task 3.2: Find the 5 countries with the highest average GDP per capita across all years. Show the country name and its average GDP per capita.

highest_avg_gdp_top_5 <- gap_tidy %>% 
  group_by(country) %>% 
  summarise(
    avg_gdp = mean(gdpPercap, na.rm = TRUE),
    .groups = "drop"
  ) %>% 
  slice_max(avg_gdp, n=5)

highest_avg_gdp_top_5
## # A tibble: 5 × 2
##   country       avg_gdp
##   <chr>           <dbl>
## 1 Kuwait         65333.
## 2 Switzerland    27074.
## 3 Norway         26747.
## 4 United States  26261.
## 5 Canada         22411.
# slice_max() keeps rows with the highest value
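One detail of slice_max() is worth knowing: by default it keeps ties, so n = 5 can return more than 5 rows. The sketch below uses a made-up table to compare the default with with_ties = FALSE and with the arrange() plus slice_head() alternative. The names demo, top_ties, top_no_ties and top_sorted are illustrative.

```r
library(dplyr)
library(tibble)

# Made-up averages in which B and C tie for first place
demo <- tibble(
  country = c("A", "B", "C", "D"),
  avg_gdp = c(10, 30, 30, 20)
)

# Default: tied rows are all kept, so n = 1 returns two rows here
top_ties <- demo %>% slice_max(avg_gdp, n = 1)

# with_ties = FALSE returns exactly n rows
top_no_ties <- demo %>% slice_max(avg_gdp, n = 1, with_ties = FALSE)

# Equivalent approach: sort in descending order, then take the first rows
top_sorted <- demo %>% arrange(desc(avg_gdp)) %>% slice_head(n = 2)
```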

Look at your result: Do any of these countries surprise you? Why might small, wealthy countries appear at the top?

Your answer:

When I look at the table, I see that Kuwait, Switzerland and Norway have a higher GDP per capita than the United States and Canada. Canada and the US have huge economies, but GDP per capita divides GDP by population, and Kuwait, Norway and Switzerland have much smaller populations than Canada and the US. Kuwait is at the top, which did not surprise me, because I knew it has huge oil reserves and is known for living standards supported by low taxes and government payments to its citizens. When I researched it, I found that Norway's biggest exports are natural gas and crude oil, which bring in a great deal of money. Switzerland does not have these natural resources, but I found that it specializes in high value-added services such as banking and finance. So the US and Canada have high total GDP, but their large populations bring GDP per capita down.

Task 3.3: Calculate the correlation between GDP per capita and life expectancy for each continent. Use the full tidy dataset.

continent_info <- gap_tidy %>% 
  select(country, continent) %>% 
  distinct()

corelation_by_continent <- gap_tidy |>
  group_by(continent) |>
  summarize(
    correlation = cor(gdpPercap, lifeExp, use = "complete.obs"),
    n_obs = n(),
    .groups = "drop"
  )


corelation_by_continent
## # A tibble: 5 × 3
##   continent correlation n_obs
##   <chr>           <dbl> <int>
## 1 Africa          0.426   624
## 2 Americas        0.558   300
## 3 Asia            0.382   396
## 4 Europe          0.781   360
## 5 Oceania         0.956    24
# distinct() removes duplicate rows
# cor() calculates the correlation between 2 variables. Correlation is between -1 and 1. If it is close to 1 there is a strong positive
# correlation. If it is close to 0 there is no linear correlation, and if it is close to -1 there is a strong negative correlation.
# use = "complete.obs" handles missing values. It means only observations where both variables are filled in are used.
# n_obs = n() counts how many rows there are for each continent

Questions to answer:
- In which continent is the relationship strongest (highest correlation)?
- In which continent is it weakest?
- What might explain the differences between continents?

Your answer: Oceania has the highest correlation (0.956) and Asia has the lowest (0.382). In general, as GDP per capita increases, living standards rise, so life expectancy also increases. Money does not buy years of life directly, but it buys conditions that extend life, such as healthcare, hygiene and infrastructure. A strong correlation suggests that economic growth is used for the benefit of the public, for example for clean water, safe food and hospitals. A weak correlation suggests that the money is not reaching the public, and it may also point to income inequality. In Asia, there is a strong link between GDP per capita and life expectancy in countries like Japan and South Korea, but in countries like India life expectancy may not rise as fast as GDP. From this pattern, income inequality may be weakening the correlation. Another factor may be population, since Asia is the most populous continent. I also read that Asia ranks fourth among the continents for income inequality. Finally, Oceania has only 24 observations. With 12 survey years per country, that is just two countries, so I can infer that Australia and New Zealand make up the whole group, which helps explain its very high correlation.
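The n_obs column counts rows, not countries. Because each country contributes one row per survey year, adding n_distinct(country) to the summary shows how many countries stand behind each correlation. This is a minimal sketch on made-up data: continents X and Y, the countries and every value are hypothetical, and n_countries is an illustrative column name.

```r
library(dplyr)
library(tibble)

# Made-up tidy data: continent X has one country, continent Y has two
demo <- tribble(
  ~country, ~continent, ~year, ~gdpPercap, ~lifeExp,
  "A",      "X",        1952,  100,        50,
  "A",      "X",        1957,  200,        55,
  "A",      "X",        1962,  300,        60,
  "B",      "Y",        1952,  100,        60,
  "B",      "Y",        1957,  200,        62,
  "B",      "Y",        1962,  300,        64,
  "C",      "Y",        1952,  1000,       50,
  "C",      "Y",        1957,  1100,       52,
  "C",      "Y",        1962,  1200,       54
)

cor_demo <- demo %>%
  group_by(continent) %>%
  summarize(
    correlation = cor(gdpPercap, lifeExp, use = "complete.obs"),
    n_obs       = n(),                  # rows in the group
    n_countries = n_distinct(country),  # distinct countries in the group
    .groups = "drop"
  )

cor_demo
```

A correlation computed from very few countries mostly reflects each country's own trend over time, so it should be compared with caution against continents that pool dozens of countries.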


Part 4: Data Integration (20 points)

Now you will practice joining two separate datasets: one containing only life expectancy, and one containing only GDP per capita.

Task 4.1: Import gap_life.csv and gap_gdp.csv. Use glimpse() to examine each one.

gap_life <- read.csv("data/gap_life.csv")
glimpse(gap_life)
## Rows: 1,618
## Columns: 3
## $ country <chr> "Mali", "Malaysia", "Zambia", "Greece", "Swaziland", "Iran", "…
## $ year    <int> 1992, 1967, 1987, 2002, 1967, 1997, 2007, 2007, 1957, 2002, 19…
## $ lifeExp <dbl> 48.388, 59.371, 50.821, 78.256, 46.633, 68.042, 73.747, 78.098…
gap_gdp <- read.csv("data/gap_gdp.csv")
glimpse(gap_gdp)
## Rows: 1,618
## Columns: 3
## $ country   <chr> "Bangladesh", "Mongolia", "Taiwan", "Burkina Faso", "Angola"…
## $ year      <int> 1987, 1997, 2002, 1962, 1962, 1977, 2007, 1962, 1992, 1972, …
## $ gdpPercap <dbl> 751.9794, 1902.2521, 23235.4233, 722.5120, 4269.2767, 2785.4…

Task 4.2: Use inner_join() to combine them into a dataset called gap_joined. Join by the columns they have in common.

gap_joined <- inner_join(gap_life, gap_gdp, by = c("country", "year"))

# Combines two tables, keeping matching rows

Task 4.3: Answer the following:
- How many rows are in gap_joined?
- How many unique countries are in gap_joined?
- Compare this to the original number of rows in gap_life.csv and gap_gdp.csv. Why might the joined dataset have fewer rows?

nrow(gap_joined) #how many rows - 1535
## [1] 1535
nrow(gap_life) # - 1618 - original dataset
## [1] 1618
nrow(gap_gdp) # - 1618 - original dataset
## [1] 1618
unique(gap_joined$country) #unique countries
##   [1] "Mali"                     "Malaysia"                
##   [3] "Zambia"                   "Greece"                  
##   [5] "Swaziland"                "Iran"                    
##   [7] "Venezuela"                "Portugal"                
##   [9] "Sweden"                   "Brazil"                  
##  [11] "Pakistan"                 "Algeria"                 
##  [13] "Equatorial Guinea"        "Botswana"                
##  [15] "Haiti"                    "Saudi Arabia"            
##  [17] "Korea, Dem. Rep."         "Niger"                   
##  [19] "Congo, Dem. Rep."         "United States"           
##  [21] "Eritrea"                  "Trinidad and Tobago"     
##  [23] "Colombia"                 "Panama"                  
##  [25] "Comoros"                  "Italy"                   
##  [27] "Nicaragua"                "Gambia"                  
##  [29] "Iceland"                  "Bosnia and Herzegovina"  
##  [31] "Hong Kong, China"         "El Salvador"             
##  [33] "Myanmar"                  "Croatia"                 
##  [35] "Finland"                  "South Africa"            
##  [37] "Ireland"                  "United Kingdom"          
##  [39] "Liberia"                  "Libya"                   
##  [41] "Malawi"                   "Norway"                  
##  [43] "India"                    "Guatemala"               
##  [45] "Netherlands"              "Japan"                   
##  [47] "Mauritania"               "Ghana"                   
##  [49] "Taiwan"                   "Paraguay"                
##  [51] "Morocco"                  "Cuba"                    
##  [53] "Guinea"                   "Denmark"                 
##  [55] "Chad"                     "Zimbabwe"                
##  [57] "Yemen, Rep."              "Austria"                 
##  [59] "Bahrain"                  "Egypt"                   
##  [61] "Angola"                   "Reunion"                 
##  [63] "Senegal"                  "Gabon"                   
##  [65] "Albania"                  "Serbia"                  
##  [67] "Lebanon"                  "Germany"                 
##  [69] "Jamaica"                  "Canada"                  
##  [71] "Montenegro"               "Rwanda"                  
##  [73] "New Zealand"              "Syria"                   
##  [75] "Spain"                    "Slovak Republic"         
##  [77] "Kenya"                    "Guinea-Bissau"           
##  [79] "Cote d'Ivoire"            "Sri Lanka"               
##  [81] "Switzerland"              "Afghanistan"             
##  [83] "Mozambique"               "Togo"                    
##  [85] "Namibia"                  "Tunisia"                 
##  [87] "Uganda"                   "Mongolia"                
##  [89] "Bulgaria"                 "Sao Tome and Principe"   
##  [91] "Uruguay"                  "Nepal"                   
##  [93] "West Bank and Gaza"       "Iraq"                    
##  [95] "Oman"                     "Burkina Faso"            
##  [97] "Cameroon"                 "Philippines"             
##  [99] "Kuwait"                   "Vietnam"                 
## [101] "Benin"                    "Dominican Republic"      
## [103] "Turkey"                   "Somalia"                 
## [105] "Tanzania"                 "Puerto Rico"             
## [107] "Jordan"                   "Peru"                    
## [109] "Cambodia"                 "Chile"                   
## [111] "Burundi"                  "China"                   
## [113] "Israel"                   "Australia"               
## [115] "Mexico"                   "Lesotho"                 
## [117] "Madagascar"               "Sierra Leone"            
## [119] "Korea, Rep."              "Ecuador"                 
## [121] "Slovenia"                 "Honduras"                
## [123] "France"                   "Belgium"                 
## [125] "Indonesia"                "Romania"                 
## [127] "Hungary"                  "Thailand"                
## [129] "Central African Republic" "Argentina"               
## [131] "Congo, Rep."              "Poland"                  
## [133] "Singapore"                "Bangladesh"              
## [135] "Bolivia"                  "Sudan"                   
## [137] "Mauritius"                "Nigeria"                 
## [139] "Djibouti"                 "Costa Rica"              
## [141] "Ethiopia"                 "Czech Republic"

Your answer: There are 1535 rows in gap_joined and 142 unique countries. The original gap_life and gap_gdp datasets each have 1618 rows. The joined dataset is smaller because inner_join() keeps only observations that appear in both datasets. A row is included only if its country and year combination exists in both files. If the combination is missing from either file, the row is dropped. Assuming each country-year pair appears only once per file, this leaves 83 rows from each file unmatched (1618 - 1535).
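To see which rows an inner join drops, anti_join() returns the rows of one table that have no match in the other, and n_distinct() counts unique countries without printing the whole list. This is a minimal sketch on made-up tables. The names life_demo, gdp_demo, joined, life_only and gdp_only are illustrative.

```r
library(dplyr)
library(tibble)

# Made-up tables with the same layout as gap_life and gap_gdp
life_demo <- tribble(
  ~country, ~year, ~lifeExp,
  "A",      1952,  50,
  "A",      1957,  52,
  "B",      1952,  60
)

gdp_demo <- tribble(
  ~country, ~year, ~gdpPercap,
  "A",      1952,  100,
  "B",      1952,  300,
  "B",      1957,  320
)

# Rows whose country-year pair exists in both tables
joined <- inner_join(life_demo, gdp_demo, by = c("country", "year"))

# Rows that the inner join dropped from each side
life_only <- anti_join(life_demo, gdp_demo, by = c("country", "year"))
gdp_only  <- anti_join(gdp_demo, life_demo, by = c("country", "year"))

nrow(joined)                # matched rows
n_distinct(joined$country)  # unique countries among the matched rows
```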

Task 4.4: Check for missing values in gap_joined. Are there any rows where lifeExp or gdpPercap is NA? If so, list them.

gap_joined %>% 
  filter(is.na(lifeExp) | is.na(gdpPercap))  # there is no NA data
## [1] country   year      lifeExp   gdpPercap
## <0 rows> (or 0-length row.names)

Task 4.5: Propose one way an economist could handle these missing values. What are the trade-offs of your proposed method?

Your answer:

I researched this and found many methods. Every method has a benefit but also a drawback. The first question I would ask is why the data is missing. If it is missing at random, I would delete the incomplete observations. This is simple, but it shrinks the sample, and the missing year might have been an unusual one, such as a pandemic year, so deleting it could hide important information. If it is not missing at random, I would use a proxy method, which fills the gap using another dataset that is highly correlated with the original one. The trade-off is that a proxy is only an estimate, so it can introduce error.
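Besides deletion and proxies, another common option is to interpolate within each country between the neighbouring survey years. This swaps in a different technique from the two described above and is only a sketch under assumptions: the data are made up, linear interpolation with base R's approx() is one choice among several, and the column names lifeExp_interp and was_missing are illustrative.

```r
library(dplyr)
library(tibble)

# Made-up data with one gap inside a series (A) and one at the end (B)
demo <- tribble(
  ~country, ~year, ~lifeExp,
  "A",      1952,  50,
  "A",      1957,  NA,
  "A",      1962,  54,
  "B",      1952,  60,
  "B",      1957,  61,
  "B",      1962,  NA
)

filled <- demo %>%
  group_by(country) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(
    # Flag imputed rows so they can be excluded or checked later
    was_missing = is.na(lifeExp),
    # approx() interpolates linearly between observed points; with rule = 1
    # it returns NA outside the observed range instead of extrapolating.
    # It needs at least two observed values per country.
    lifeExp_interp = approx(year, lifeExp, xout = year, rule = 1)$y
  ) %>%
  ungroup()

filled
```

The trade-off is that interpolation assumes a smooth trend between surveys, so it would smooth over exactly the kind of shock year that the paragraph above warns about.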


Part 5: Economic Interpretation (15 points)

Write a short paragraph (5‑8 sentences) addressing the following questions. Use evidence from your analysis in Parts 3 and 4 to support your claims.

Your paragraph:

growth_rate <- gap_tidy %>% 
  filter(year%in% c(1952, 2007)) %>%
  group_by(continent, year) %>% 
  summarize(avg_gdp = mean(gdpPercap, na.rm = TRUE),
            .groups = "drop") %>% 
  pivot_wider(names_from = year, values_from = avg_gdp, names_prefix = "gdp_") %>% 
  mutate(growth = (gdp_2007 - gdp_1952) / gdp_1952)
  
growth_rate
## # A tibble: 5 × 4
##   continent gdp_1952 gdp_2007 growth
##   <chr>        <dbl>    <dbl>  <dbl>
## 1 Africa       1253.    3089.   1.47
## 2 Americas     4079.   11003.   1.70
## 3 Asia         5195.   12473.   1.40
## 4 Europe       5661.   25054.   3.43
## 5 Oceania     10298.   29810.   1.89
# filter(year %in% c(1952, 2007)) keeps only the years 1952 and 2007
# group_by(continent, year) groups by continent and year
# summarize(avg_gdp = mean(gdpPercap, na.rm = TRUE)) calculates average gdp per capita for each continent and year
# and stores it in a new column called avg_gdp
# na.rm = TRUE ignores missing values
# .groups = "drop" ends grouping
# pivot_wider(names_from = year, values_from = avg_gdp, names_prefix = "gdp_") makes the table wide
# names_from = year creates one new column per year
# values_from = avg_gdp fills those new columns with the avg_gdp values
# names_prefix = "gdp_" adds gdp_ to the front of the new column names
# mutate(growth = (gdp_2007 - gdp_1952) / gdp_1952) creates a column named growth: the change relative to the 1952 level.
# The parentheses matter: without them R divides first, so gdp_2007 - gdp_1952 / gdp_1952 is just gdp_2007 - 1.

I calculated the growth rate with the help of ChatGPT. I wrote part of the code myself, and the AI added the pipe operator and redesigned the pivot_wider() step. Oceania has the largest absolute increase in average GDP per capita (from about 10,298 in 1952 to about 29,810 in 2007), although Europe's average grew by the largest multiple (from about 5,661 to about 25,054, more than fourfold). Many Pacific island states gained independence during this period. Oceania also received high migration, exported natural resources and developed its service sector. Australia in particular grew its population after the Second World War with its “populate or perish” policy. As Asian economies industrialized, Australia and New Zealand in particular exported natural resources to them. They also focused on agriculture and developed health technologies. Tourism in the Pacific Islands gained attention.
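Growth can be measured in absolute or relative terms, and the two can rank continents differently. The sketch below uses the rounded continent averages from the gdp_1952 and gdp_2007 columns of the growth_rate table, so its results are approximate. The names avg_demo, growth_demo, absolute_growth, relative_growth and annual_growth are illustrative, and the annualized rate assumes steady compound growth over the 55 years.

```r
library(dplyr)
library(tibble)

# Rounded continent averages copied from the growth_rate table
avg_demo <- tribble(
  ~continent, ~gdp_1952, ~gdp_2007,
  "Africa",    1253,      3089,
  "Americas",  4079,     11003,
  "Asia",      5195,     12473,
  "Europe",    5661,     25054,
  "Oceania",  10298,     29810
)

growth_demo <- avg_demo %>%
  mutate(
    # Change in level
    absolute_growth = gdp_2007 - gdp_1952,
    # Change relative to the starting level; the parentheses are required
    relative_growth = (gdp_2007 - gdp_1952) / gdp_1952,
    # Compound annual growth rate over the 55 years from 1952 to 2007
    annual_growth = (gdp_2007 / gdp_1952)^(1 / (2007 - 1952)) - 1
  )

top_abs <- growth_demo %>% slice_max(absolute_growth, n = 1)
top_rel <- growth_demo %>% slice_max(relative_growth, n = 1)

top_abs$continent  # largest absolute gain
top_rel$continent  # largest relative gain
```

On these figures Oceania has the largest absolute gain and Europe the largest relative gain, so the answer to "fastest growth" depends on which measure is meant.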

From the correlation I calculated earlier:

continent_info <- gap_tidy %>% 
  select(country, continent) %>% 
  distinct()

corelation_by_continent <- gap_tidy |>
  group_by(continent) |>
  summarize(
    correlation = cor(gdpPercap, lifeExp, use = "complete.obs"),
    n_obs = n(),
    .groups = "drop"
  )


corelation_by_continent
## # A tibble: 5 × 3
##   continent correlation n_obs
##   <chr>           <dbl> <int>
## 1 Africa          0.426   624
## 2 Americas        0.558   300
## 3 Asia            0.382   396
## 4 Europe          0.781   360
## 5 Oceania         0.956    24

I can see that Oceania has the highest correlation between GDP per capita and life expectancy. High GDP per capita in Oceania is expected because of Australia and New Zealand. If a government spends on public health, hygiene and similar services, we can expect a high correlation between the two.

The main limitation is that the data does not show all the factors at work. It contains only GDP per capita and life expectancy, not factors like war, migration, health crises or inequality. It also covers a limited period, 1952 to 2007, and World War II had ended only seven years before the first observation. The inner_join() function in Part 4 dropped some observations. We can see a correlation, but we cannot see the causes behind it. The data quality is not bad, but observations are recorded only every five years, so we cannot see short-term trends.


Part 6: Reproducibility (5 points)

Before submitting, check that your document meets these requirements:


Academic Integrity Reminder

You are encouraged to discuss concepts with classmates, but your submitted work must be your own. If you use AI assistants (ChatGPT, Copilot, etc.), you must include an AI Use Log at the end of your document documenting:

Tool Used: Gemini
Prompt Given: what happened in oceania between 1952 and 2007

Tool Used: ChatGPT
Prompt Given: Which continent has seen the most dramatic economic growth since 1952? (Look at the numbers – don’t just guess.), how can I calculate the growth rate
How You Verified or Modified the Output:
growth_rate <- gap_tidy %>%
  filter(year %in% c(1952, 2007)) %>%
  group_by(continent, year) %>%
  summarize(avg_gdp = mean(gdpPercap, na.rm = TRUE), .groups = "drop") %>%
  pivot_wider(names_from = year, values_from = avg_gdp, names_prefix = "gdp_") %>%
  mutate( absolute_growth = gdp_2007 - gdp_1952, growth_rate = ((gdp_2007 - gdp_1952)) )

Submission Checklist


Glossary of Functions Used

Function What it does
select() Keeps only specified columns
filter() Keeps rows that meet conditions
mutate() Adds or modifies columns
pivot_longer() Reshapes wide to long
group_by() Groups data for subsequent operations
summarize() Reduces grouped data to summary stats
inner_join() Combines two tables, keeping matching rows
distinct() Keeps unique rows
slice_max() Keeps rows with highest values
arrange() Sorts rows
contains() Helper for selecting columns with a pattern