---
title: "ASSIGNMENT_CEMRENURHASCAN"
author: "CEMRE NUR HASCAN"
date: "2026-03-10"
output: html_document
---

## The Economic Question

How have GDP per capita and life expectancy evolved across different continents since 1952? Which continents have seen the fastest growth, and which countries are outliers?

Part 1: Setup and Data Loading (5 points)

# Load the tidyverse package
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

# Import the wide Gapminder dataset
gapminder_wide <- read_csv("data/gapminder_wide.csv")

## Rows: 142 Columns: 26
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): country, continent
## dbl (24): gdpPercap_1952, gdpPercap_1957, gdpPercap_1962, gdpPercap_1967, gd...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

#I imported the data

Task 1.1: Use ⁠ glimpse() ⁠ to examine the structure of ⁠ gapminder_wide ⁠. In your own words, describe what you see. How many rows and columns are there? What do the column names tell you about the data format?

glimpse(gapminder_wide)

## Rows: 142
## Columns: 26
## $ country        <chr> "Afghanistan", "Albania", "Algeria", "Angola", "Argenti…
## $ continent      <chr> "Asia", "Europe", "Africa", "Africa", "Americas", "Ocea…
## $ gdpPercap_1952 <dbl> 779.4453, 1601.0561, 2449.0082, 3520.6103, 5911.3151, 1…
## $ gdpPercap_1957 <dbl> 820.8530, 1942.2842, 3013.9760, 3827.9405, 6856.8562, 1…
## $ gdpPercap_1962 <dbl> 853.1007, 2312.8890, 2550.8169, 4269.2767, 7133.1660, 1…
## $ gdpPercap_1967 <dbl> 836.1971, 2760.1969, 3246.9918, 5522.7764, 8052.9530, 1…
## $ gdpPercap_1972 <dbl> 739.9811, 3313.4222, 4182.6638, 5473.2880, 9443.0385, 1…
## $ gdpPercap_1977 <dbl> 786.1134, 3533.0039, 4910.4168, 3008.6474, 10079.0267, …
## $ gdpPercap_1982 <dbl> 978.0114, 3630.8807, 5745.1602, 2756.9537, 8997.8974, 1…
## $ gdpPercap_1987 <dbl> 852.3959, 3738.9327, 5681.3585, 2430.2083, 9139.6714, 2…
## $ gdpPercap_1992 <dbl> 649.3414, 2497.4379, 5023.2166, 2627.8457, 9308.4187, 2…
## $ gdpPercap_1997 <dbl> 635.3414, 3193.0546, 4797.2951, 2277.1409, 10967.2820, …
## $ gdpPercap_2002 <dbl> 726.7341, 4604.2117, 5288.0404, 2773.2873, 8797.6407, 3…
## $ gdpPercap_2007 <dbl> 974.5803, 5937.0295, 6223.3675, 4797.2313, 12779.3796, …
## $ lifeExp_1952   <dbl> 28.801, 55.230, 43.077, 30.015, 62.485, 69.120, 66.800,…
## $ lifeExp_1957   <dbl> 30.33200, 59.28000, 45.68500, 31.99900, 64.39900, 70.33…
## $ lifeExp_1962   <dbl> 31.99700, 64.82000, 48.30300, 34.00000, 65.14200, 70.93…
## $ lifeExp_1967   <dbl> 34.02000, 66.22000, 51.40700, 35.98500, 65.63400, 71.10…
## $ lifeExp_1972   <dbl> 36.08800, 67.69000, 54.51800, 37.92800, 67.06500, 71.93…
## $ lifeExp_1977   <dbl> 38.43800, 68.93000, 58.01400, 39.48300, 68.48100, 73.49…
## $ lifeExp_1982   <dbl> 39.854, 70.420, 61.368, 39.942, 69.942, 74.740, 73.180,…
## $ lifeExp_1987   <dbl> 40.822, 72.000, 65.799, 39.906, 70.774, 76.320, 74.940,…
## $ lifeExp_1992   <dbl> 41.674, 71.581, 67.744, 40.647, 71.868, 77.560, 76.040,…
## $ lifeExp_1997   <dbl> 41.763, 72.950, 69.152, 40.963, 73.275, 78.830, 77.510,…
## $ lifeExp_2002   <dbl> 42.129, 75.651, 70.994, 41.003, 74.340, 80.370, 78.980,…
## $ lifeExp_2007   <dbl> 43.828, 76.423, 72.301, 42.731, 75.320, 81.235, 79.829,…

Your answer:

The dataset has 142 rows and 26 columns. glimpse() prints one line per column, showing its name, its type, and its first few values. The first two columns are country and continent, followed by 24 numeric columns: GDP per capita and life expectancy for each of 12 survey years (gdpPercap_1952 to gdpPercap_2007 and lifeExp_1952 to lifeExp_2007). Each column name combines a variable name and a year, which tells me the data is in wide format: the information for each year is stored in a separate column instead of a separate row.

Part 2: Data Tidying with ⁠ .value ⁠ (20 points)

In the lab, you learned how to use ⁠ pivot_longer() ⁠ with the ⁠ .value ⁠ sentinel to reshape wide data into tidy format.

Task 2.1: Write code to transform ⁠ gapminder_wide ⁠ into a tidy dataset with columns: ⁠ country ⁠, ⁠ continent ⁠, ⁠ year ⁠, ⁠ gdpPercap ⁠, and ⁠ lifeExp ⁠. Show the first 10 rows of your tidy dataset.

gap_tidy <- gapminder_wide %>% 
  pivot_longer(
    cols = -c(country, continent),
    names_to = c(".value", "year"),
    names_sep = "_",
    values_drop_na = FALSE
  ) %>% 
  mutate(year = as.numeric(year))

glimpse(gap_tidy)

## Rows: 1,704
## Columns: 5
## $ country   <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
## $ continent <chr> "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asia", "Asi…
## $ year      <dbl> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …
## $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…

head(gap_tidy, 10)

## # A tibble: 10 × 5
##    country     continent  year gdpPercap lifeExp
##    <chr>       <chr>     <dbl>     <dbl>   <dbl>
##  1 Afghanistan Asia       1952      779.    28.8
##  2 Afghanistan Asia       1957      821.    30.3
##  3 Afghanistan Asia       1962      853.    32.0
##  4 Afghanistan Asia       1967      836.    34.0
##  5 Afghanistan Asia       1972      740.    36.1
##  6 Afghanistan Asia       1977      786.    38.4
##  7 Afghanistan Asia       1982      978.    39.9
##  8 Afghanistan Asia       1987      852.    40.8
##  9 Afghanistan Asia       1992      649.    41.7
## 10 Afghanistan Asia       1997      635.    41.8

# The pipe operator (%>%) sends the dataset (gapminder_wide) to the next function.
# pivot_longer() reshapes the data from wide format to a tidy/long format.
# The dataset originally has many columns like gdpPercap_1952, gdpPercap_1957, lifeExp_1987 etc.

# cols = -c(country, continent) means these two columns are excluded from the pivot
# because they do not contain year information.

# names_to = c(".value", "year") splits the column names into two parts.
# For example lifeExp_1987 becomes lifeExp and 1987.
# The .value part keeps lifeExp and gdpPercap as separate column names,
# while the second part becomes the year column.

# names_sep = "_" tells R to split the column names using the underscore.

# values_drop_na = FALSE means missing values are kept in the dataset.

# mutate(year = as.numeric(year)) converts the year column into numeric format.

Task 2.2: Explain in 2-3 sentences what the ⁠ .value ⁠ sentinel does in your code. Why is it the right tool for this dataset?

Your answer:

The .value sentinel tells pivot_longer() that the first part of each column name (gdpPercap or lifeExp) should become the name of an output column instead of a value in a names column. With names_to = c(".value", "year"), the part before the underscore becomes the variable name and the part after it goes into the year column. It is the right tool here because every wide column encodes two things at once, a variable and a year, so the tidy result has one row per country and year with gdpPercap and lifeExp side by side.
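To see the reshaping on a small scale, here is a minimal sketch using a made-up two-country table. The name toy_wide and all of its values are hypothetical, not taken from the Gapminder file, and the sketch assumes tidyr 1.1.0 or later for names_transform.

```r
library(tidyr)
library(tibble)

# Hypothetical table in the same wide layout as gapminder_wide (values are made up)
toy_wide <- tribble(
  ~country, ~gdpPercap_1952, ~gdpPercap_1957, ~lifeExp_1952, ~lifeExp_1957,
  "A",      100,             110,             50,            52,
  "B",      200,             230,             60,            61
)

# ".value" turns the text before "_" into column names;
# the text after "_" is stored in the year column
toy_tidy <- pivot_longer(
  toy_wide,
  cols = -country,
  names_to = c(".value", "year"),
  names_sep = "_",
  names_transform = list(year = as.integer)
)

toy_tidy
```

The four wide value columns collapse into two value columns, with one row per country and year.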

Task 2.3: From your tidy dataset, filter to keep only observations from 1970 onwards for the following countries: ⁠ “Turkey” ⁠, ⁠ “Brazil” ⁠, ⁠ “Korea, Rep.” ⁠, ⁠ “Germany” ⁠, ⁠ “United States” ⁠, ⁠ “China” ⁠. Save this filtered dataset as ⁠ gap_filtered ⁠.

gap_filtered <- gap_tidy %>% 
  filter(country %in%  c("Turkey", "Brazil", "Korea, Rep.", "Germany", "United States", "China"),
         year >= 1970)

gap_filtered

## # A tibble: 48 × 5
##    country continent  year gdpPercap lifeExp
##    <chr>   <chr>     <dbl>     <dbl>   <dbl>
##  1 Brazil  Americas   1972     4986.    59.5
##  2 Brazil  Americas   1977     6660.    61.5
##  3 Brazil  Americas   1982     7031.    63.3
##  4 Brazil  Americas   1987     7807.    65.2
##  5 Brazil  Americas   1992     6950.    67.1
##  6 Brazil  Americas   1997     7958.    69.4
##  7 Brazil  Americas   2002     8131.    71.0
##  8 Brazil  Americas   2007     9066.    72.4
##  9 China   Asia       1972      677.    63.1
## 10 China   Asia       1977      741.    64.0
## # ℹ 38 more rows

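# filter() keeps only the rows that satisfy all of the given conditions.
# country %in% c(...) is used to filter specific countries.
# c() creates a character vector of country names and %in% checks if each row's country is in that vector.
# year >= 1970 keeps observations from 1970 onwards; the first survey year after 1970 is 1972.
# As a result, the dataset keeps only these six countries, with 8 years each (48 rows).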

Part 3: Grouped Summaries (25 points)

Now you will use group_by() and summarize() to answer questions about continents and countries.

Task 3.1: Calculate the average GDP per capita and average life expectancy for each continent across all years (use the full tidy dataset, not the filtered one).

average_gdp_and_lifeExp <- gap_tidy %>% 
  group_by(continent) %>% 
  summarize(
    average_gdp = mean(gdpPercap, na.rm = TRUE),
    average_lifeExp = mean(lifeExp, na.rm = TRUE),
    .groups = "drop"
  )

average_gdp_and_lifeExp

## # A tibble: 5 × 3
##   continent average_gdp average_lifeExp
##   <chr>           <dbl>           <dbl>
## 1 Africa          2194.            48.9
## 2 Americas        7136.            64.7
## 3 Asia            7902.            60.1
## 4 Europe         14469.            71.9
## 5 Oceania        18622.            74.3

# group_by(continent) groups the data by continent.
# summarize() creates a summary table where each continent has its own row.
# mean() calculates the average values, and na.rm = TRUE ignores missing values.
# .groups = "drop" removes the grouping and returns a regular table.

Questions to answer:
- Which continent has the highest average GDP per capita?
- Which continent has the highest average life expectancy?
- Are these the same continent? Why might that be?

Your answer:

Oceania has the highest average GDP per capita and life expectancy. This result surprised me at first. However, countries like Australia and New Zealand are highly developed economies, which increases the regional average. Australia is one of the world’s largest exporters of minerals and energy, and its relatively small population raises GDP per capita. In addition, these countries have strong healthcare systems and high government spending on public health. Higher income levels often allow governments to invest more in healthcare and social services, which can increase life expectancy.

Task 3.2: Find the 5 countries with the highest average GDP per capita across all years. Show the country name and its average GDP per capita.

highest_avg_gdp_top_5 <- gap_tidy %>% 
  group_by(country) %>% 
  summarise(
    avg_gdp = mean(gdpPercap, na.rm = TRUE),
    .groups = "drop"
  ) %>% 
  slice_max(avg_gdp, n=5)

highest_avg_gdp_top_5

## # A tibble: 5 × 2
##   country       avg_gdp
##   <chr>           <dbl>
## 1 Kuwait         65333.
## 2 Switzerland    27074.
## 3 Norway         26747.
## 4 United States  26261.
## 5 Canada         22411.

# slice_max() keeps rows with the highest value

⁠ Look at your result: Do any of these countries surprise you? Why might small, wealthy countries appear at the top?

Your answer:

When I look at the table, I see that Kuwait, Norway, and Switzerland have higher GDP per capita than the United States and Canada. Although the US and Canada have very large economies, GDP per capita is calculated by dividing total GDP by population. Countries like Kuwait, Norway, and Switzerland have much smaller populations, which increases their GDP per capita. Kuwait being at the top does not surprise me because it has large oil reserves and high living standards. Norway also earns significant income from exporting natural gas and crude oil. Switzerland, on the other hand, does not have many natural resources but has a strong economy based on high value-added services such as banking and finance. The US and Canada have high GDP overall, but their large populations lower their GDP per capita.

Task 3.3: Calculate the correlation between GDP per capita and life expectancy for each continent. Use the full tidy dataset.

continent_info <- gap_tidy %>% 
  select(country, continent) %>% 
  distinct()

corelation_by_continent <- gap_tidy |>
  group_by(continent) |>
  summarize(
    correlation = cor(gdpPercap, lifeExp, use = "complete.obs"),
    n_obs = n(),
    .groups = "drop"
  )

corelation_by_continent

# distinct() removes duplicated rows.
# cor() calculates the correlation between two variables. The value is between -1 and 1.
# If it is close to 1, it shows a strong positive correlation.
# If it is close to 0, it means there is little or no linear relationship.
# If it is close to -1, it shows a strong negative correlation.
# use = "complete.obs" means the function only uses rows where both variables have values.
# n_obs = n() counts the number of observations for each continent.

Questions to answer:
- In which continent is the relationship strongest (highest correlation)?
- In which continent is it weakest?
- What might explain the differences between continents?

Your answer:

Oceania has the highest correlation (0.956), while Asia has the lowest (0.382). In general, as GDP per capita increases, living standards and life expectancy also increase. Money itself does not buy a longer life, but it can provide better conditions such as healthcare, hygiene, clean water, and infrastructure. A strong correlation suggests that economic growth is being used to improve public welfare. A weaker correlation may indicate that wealth is not equally distributed or not fully invested in public services. In Asia, some countries like Japan and South Korea show a strong relationship between GDP per capita and life expectancy. However, in countries such as India, life expectancy does not increase at the same rate as GDP per capita. This difference may be explained by income inequality and the very large population of the continent. Oceania has only 24 observations (two countries over 12 survey years), so its result is dominated by Australia and New Zealand and should be read with caution.

Part 4: Data Integration (20 points)

Now you will practice joining two separate datasets: one containing only life expectancy, and one containing only GDP per capita.

Task 4.1: Import gap_life.csv and gap_gdp.csv. Use glimpse() to examine each one.

gap_life <- read.csv("data/gap_life.csv")
glimpse(gap_life)

## Rows: 1,618
## Columns: 3
## $ country <chr> "Mali", "Malaysia", "Zambia", "Greece", "Swaziland", "Iran", "…
## $ year    <int> 1992, 1967, 1987, 2002, 1967, 1997, 2007, 2007, 1957, 2002, 19…
## $ lifeExp <dbl> 48.388, 59.371, 50.821, 78.256, 46.633, 68.042, 73.747, 78.098…

gap_gdp <- read.csv("data/gap_gdp.csv")
glimpse(gap_gdp)

## Rows: 1,618
## Columns: 3
## $ country   <chr> "Bangladesh", "Mongolia", "Taiwan", "Burkina Faso", "Angola"…
## $ year      <int> 1987, 1997, 2002, 1962, 1962, 1977, 2007, 1962, 1992, 1972, …
## $ gdpPercap <dbl> 751.9794, 1902.2521, 23235.4233, 722.5120, 4269.2767, 2785.4…

⁠ Task 4.2: Use inner_join() to combine them into a dataset called gap_joined. Join by the columns they have in common.

gap_joined <- inner_join(gap_life, gap_gdp, by = c("country", "year"))

# Combines two tables, keeping matching rows

⁠ Task 4.3: Answer the following: - How many rows are in gap_joined? - How many unique countries are in gap_joined? - Compare this to the original number of rows in gap_life.csv and gap_gdp.csv. Why might the joined dataset have fewer rows?

nrow(gap_joined) #how many rows - 1535

## [1] 1535

nrow(gap_life) # - 1618 - original dataset

## [1] 1618

nrow(gap_gdp) # - 1618 - original dataset

## [1] 1618

unique(gap_joined$country) #unique countries

##   [1] "Mali"                     "Malaysia"                
##   [3] "Zambia"                   "Greece"                  
##   [5] "Swaziland"                "Iran"                    
##   [7] "Venezuela"                "Portugal"                
##   [9] "Sweden"                   "Brazil"                  
##  [11] "Pakistan"                 "Algeria"                 
##  [13] "Equatorial Guinea"        "Botswana"                
##  [15] "Haiti"                    "Saudi Arabia"            
##  [17] "Korea, Dem. Rep."         "Niger"                   
##  [19] "Congo, Dem. Rep."         "United States"           
##  [21] "Eritrea"                  "Trinidad and Tobago"     
##  [23] "Colombia"                 "Panama"                  
##  [25] "Comoros"                  "Italy"                   
##  [27] "Nicaragua"                "Gambia"                  
##  [29] "Iceland"                  "Bosnia and Herzegovina"  
##  [31] "Hong Kong, China"         "El Salvador"             
##  [33] "Myanmar"                  "Croatia"                 
##  [35] "Finland"                  "South Africa"            
##  [37] "Ireland"                  "United Kingdom"          
##  [39] "Liberia"                  "Libya"                   
##  [41] "Malawi"                   "Norway"                  
##  [43] "India"                    "Guatemala"               
##  [45] "Netherlands"              "Japan"                   
##  [47] "Mauritania"               "Ghana"                   
##  [49] "Taiwan"                   "Paraguay"                
##  [51] "Morocco"                  "Cuba"                    
##  [53] "Guinea"                   "Denmark"                 
##  [55] "Chad"                     "Zimbabwe"                
##  [57] "Yemen, Rep."              "Austria"                 
##  [59] "Bahrain"                  "Egypt"                   
##  [61] "Angola"                   "Reunion"                 
##  [63] "Senegal"                  "Gabon"                   
##  [65] "Albania"                  "Serbia"                  
##  [67] "Lebanon"                  "Germany"                 
##  [69] "Jamaica"                  "Canada"                  
##  [71] "Montenegro"               "Rwanda"                  
##  [73] "New Zealand"              "Syria"                   
##  [75] "Spain"                    "Slovak Republic"         
##  [77] "Kenya"                    "Guinea-Bissau"           
##  [79] "Cote d'Ivoire"            "Sri Lanka"               
##  [81] "Switzerland"              "Afghanistan"             
##  [83] "Mozambique"               "Togo"                    
##  [85] "Namibia"                  "Tunisia"                 
##  [87] "Uganda"                   "Mongolia"                
##  [89] "Bulgaria"                 "Sao Tome and Principe"   
##  [91] "Uruguay"                  "Nepal"                   
##  [93] "West Bank and Gaza"       "Iraq"                    
##  [95] "Oman"                     "Burkina Faso"            
##  [97] "Cameroon"                 "Philippines"             
##  [99] "Kuwait"                   "Vietnam"                 
## [101] "Benin"                    "Dominican Republic"      
## [103] "Turkey"                   "Somalia"                 
## [105] "Tanzania"                 "Puerto Rico"             
## [107] "Jordan"                   "Peru"                    
## [109] "Cambodia"                 "Chile"                   
## [111] "Burundi"                  "China"                   
## [113] "Israel"                   "Australia"               
## [115] "Mexico"                   "Lesotho"                 
## [117] "Madagascar"               "Sierra Leone"            
## [119] "Korea, Rep."              "Ecuador"                 
## [121] "Slovenia"                 "Honduras"                
## [123] "France"                   "Belgium"                 
## [125] "Indonesia"                "Romania"                 
## [127] "Hungary"                  "Thailand"                
## [129] "Central African Republic" "Argentina"               
## [131] "Congo, Rep."              "Poland"                  
## [133] "Singapore"                "Bangladesh"              
## [135] "Bolivia"                  "Sudan"                   
## [137] "Mauritius"                "Nigeria"                 
## [139] "Djibouti"                 "Costa Rica"              
## [141] "Ethiopia"                 "Czech Republic"

Your answer:

The gap_joined dataset has 1,535 rows and 142 unique countries. In the original datasets, both gap_life and gap_gdp have 1,618 rows, so the joined dataset has 83 fewer rows than either input. The difference occurs because inner_join() only keeps rows whose country and year appear in both datasets. Any row without a match in the other dataset is excluded.

Task 4.4: Check for missing values in gap_joined. Are there any rows where lifeExp or gdpPercap is NA? If so, list them.

gap_joined %>% 
  filter(is.na(lifeExp) | is.na(gdpPercap))  # there is no NA data

## [1] country   year      lifeExp   gdpPercap
## <0 rows> (or 0-length row.names)
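An empty result here is what one would expect after inner_join(), since unmatched rows are dropped instead of being filled with NA. The sketch below shows one way the gaps could be made visible. It uses hypothetical miniature tables (life_toy and gdp_toy are invented for illustration, with made-up values) in place of the real files.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins for gap_life and gap_gdp (values are made up)
life_toy <- tribble(
  ~country, ~year, ~lifeExp,
  "A",      1952,  50,
  "A",      1957,  52,
  "B",      1952,  60
)
gdp_toy <- tribble(
  ~country, ~year, ~gdpPercap,
  "A",      1952,  100,
  "B",      1952,  200,
  "B",      1957,  230
)

# Rows with no partner in the other table: these are what inner_join() drops
only_life <- anti_join(life_toy, gdp_toy, by = c("country", "year"))
only_gdp  <- anti_join(gdp_toy, life_toy, by = c("country", "year"))

# full_join() keeps every country-year pair and fills the missing side with NA
full_toy <- full_join(life_toy, gdp_toy, by = c("country", "year"))
missing_rows <- filter(full_toy, is.na(lifeExp) | is.na(gdpPercap))

missing_rows
```

Applied to gap_life and gap_gdp, the same pattern would presumably list the country-year pairs that are missing from each file.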

⁠ Task 4.5: Propose one way an economist could handle these missing values. What are the trade-offs of your proposed method?

Your answer:

I looked into this and found several ways to handle missing data. Each method has its advantages but also potential drawbacks. The first step is to ask why the data is missing. If it is missing randomly, I might remove those rows, but this can have consequences—for example, a missing year could coincide with an important event like a pandemic. If the data is not missing randomly, I would consider using a proxy method, which involves estimating missing values based on another dataset that is strongly correlated with the original data.
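The trade-off between removing rows and estimating values can be sketched on a toy series. This is only an illustration under assumptions: the table toy and its values are made up, and linear interpolation with base R's approx() is a swapped-in, simpler estimation technique, not the proxy method described above.

```r
library(dplyr)
library(tibble)

# Hypothetical series with one missing survey year (values are made up)
toy <- tribble(
  ~country, ~year, ~lifeExp,
  "A",      1952,  50,
  "A",      1957,  NA,
  "A",      1962,  54
)

# Option 1: drop incomplete rows (simple, but the 1957 observation is lost)
dropped <- filter(toy, !is.na(lifeExp))

# Option 2: linear interpolation within each country
# (keeps the row, but assumes a smooth change between neighbouring years)
filled <- toy %>%
  group_by(country) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(lifeExp_filled = approx(year, lifeExp, xout = year)$y) %>%
  ungroup()

filled
```

Interpolation keeps the sample size but would smooth over exactly the kind of shock year mentioned above, so it is probably safer for slow-moving variables than for crisis periods.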

Part 5: Economic Interpretation (15 points)

Write a short paragraph (5‑8 sentences) addressing the following questions. Use evidence from your analysis in Parts 3 and 4 to support your claims.

Which continent has seen the most dramatic economic growth since 1952? (Look at the numbers – don’t just guess.)
Is there a clear relationship between GDP per capita and life expectancy across continents? Refer to your correlation results.
What are the main limitations of this analysis? Consider data quality, missing values, time period, and what the data can’t tell us.

Your paragraph:

growth_rate <- gap_tidy %>% 
  filter(year %in% c(1952, 2007)) %>%
  group_by(continent, year) %>% 
  summarize(avg_gdp = mean(gdpPercap, na.rm = TRUE),
            .groups = "drop") %>% 
  pivot_wider(names_from = year, values_from = avg_gdp, names_prefix = "gdp_") %>% 
  # Parentheses are required: without them R computes gdp_1952 / gdp_1952 first
  mutate(growth = (gdp_2007 - gdp_1952) / gdp_1952)
  
growth_rate

## # A tibble: 5 × 4
##   continent gdp_1952 gdp_2007 growth
##   <chr>        <dbl>    <dbl>  <dbl>
## 1 Africa       1253.    3089.   1.47
## 2 Americas     4079.   11003.   1.70
## 3 Asia         5195.   12473.   1.40
## 4 Europe       5661.   25054.   3.43
## 5 Oceania     10298.   29810.   1.89

# filter(year %in% c(1952, 2007)) keeps only rows for 1952 and 2007.
# group_by(continent, year) groups the data by continent and year.
# summarize(avg_gdp = mean(gdpPercap, na.rm = TRUE)) calculates the average GDP per capita for each group.
# na.rm = TRUE ignores missing values.
# .groups = "drop" removes the grouping after summarizing.
# pivot_wider(names_from = year, values_from = avg_gdp, names_prefix = "gdp_") reshapes the table to wide format.
# The column names come from the years and values come from avg_gdp.
# names_prefix = "gdp_" adds "gdp_" before each year in the column name.
# mutate(growth = (gdp_2007 - gdp_1952) / gdp_1952) creates a new column showing relative growth between 1952 and 2007 (1.47 means +147%).

I calculated the GDP growth rate with the help of ChatGPT. I wrote some of the code myself, and AI suggested using the pipe operator and modified the pivot_wider() part. Measured as relative growth in average GDP per capita, Europe shows the most dramatic increase: its average rose from about 5,661 to about 25,054, a gain of roughly 343%. Oceania comes second (about 189%) and has the largest absolute gain (about 19,512, compared with 19,393 for Europe) as well as the highest level in both years. Asia (about 140%) and Africa (about 147%) grew the least in relative terms. Oceania's strong position has several possible explanations. Between 1952 and 2007, many countries in Oceania gained independence, experienced high migration, and developed their service sectors and resource exports. Australia's population grew substantially after World War II, especially through its “populate or perish” policy. During the same period, Asian countries focused on industrialization and imported natural resources from countries like Australia and New Zealand. Oceania also invested in agriculture, health technologies, and tourism, particularly in the Pacific Islands.
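Relative and absolute growth can rank regions differently, and the relative version depends on where the parentheses go, because R evaluates division before subtraction. A minimal sketch with made-up numbers (gdp_start and gdp_end are hypothetical values, not figures from the dataset):

```r
# Made-up start and end values, for illustration only
gdp_start <- 1000
gdp_end   <- 3000

# Without parentheses the division happens first: 3000 - (1000 / 1000)
no_parens <- gdp_end - gdp_start / gdp_start

# Relative growth: change divided by the starting level
relative_growth <- (gdp_end - gdp_start) / gdp_start

# Absolute growth: change in levels
absolute_growth <- gdp_end - gdp_start

c(no_parens = no_parens,
  relative_growth = relative_growth,
  absolute_growth = absolute_growth)
```

A quick check is that an unparenthesized result always equals the end value minus one, which is a sign that the formula is not measuring growth.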

From the correlation I calculated earlier,…

continent_info <- gap_tidy %>% 
  select(country, continent) %>% 
  distinct()

corelation_by_continent <- gap_tidy |>
  group_by(continent) |>
  summarize(
    correlation = cor(gdpPercap, lifeExp, use = "complete.obs"),
    n_obs = n(),
    .groups = "drop"
  )



corelation_by_continent

## # A tibble: 5 × 3
##   continent correlation n_obs
##   <chr>           <dbl> <int>
## 1 Africa          0.426   624
## 2 Americas        0.558   300
## 3 Asia            0.382   396
## 4 Europe          0.781   360
## 5 Oceania         0.956    24

⁠Oceania has the highest correlation between GDP per capita and life expectancy, mainly due to Australia and New Zealand. When governments invest in public health, hygiene, and other social services, we can expect a strong relationship between income and life expectancy.

⁠However, there are several limitations to this analysis. The data only includes GDP per capita and life expectancy, and does not account for other important factors such as war, migration, health crises, or inequalities. The time range is limited to 1952–2007, and events like World War II, which ended just seven years before 1952, are not reflected. Using ⁠ inner_join() ⁠ also removed some observations. While we can see correlations, the data does not reveal the underlying causes. Additionally, the data is recorded in five-year intervals, which makes it difficult to observe short-term trends.

Part 6: Reproducibility (5 points)

Before submitting, check that your document meets these requirements:

Your Quarto document renders without errors (click “Render” one last time)
All file paths are relative (e.g., ⁠ data/gapminder_wide.csv ⁠)
Your code includes helpful comments explaining what each major step does
Your name appears in the YAML header

Academic Integrity Reminder

You are encouraged to discuss concepts with classmates, but your submitted work must be your own. If you use AI assistants (ChatGPT, Copilot, etc.), you must include an AI Use Log at the end of your document documenting:

Tool Used: Gemini
Prompt Given: what happened in oceania between 1952 and 2007

Tool Used: ChatGPT
Prompt Given: Which continent has seen the most dramatic economic growth since 1952? (Look at the numbers – don’t just guess.), how can I calculate the growth rate
How You Verified or Modified the Output:

growth_rate <- gap_tidy %>% 
  filter(year %in% c(1952, 2007)) %>% 
  group_by(continent, year) %>% 
  summarize(avg_gdp = mean(gdpPercap, na.rm = TRUE), .groups = "drop") %>% 
  pivot_wider(names_from = year, values_from = avg_gdp, names_prefix = "gdp_") %>% 
  mutate(
    absolute_growth = gdp_2007 - gdp_1952,
    growth_rate = ((gdp_2007 - gdp_1952))
  )

Submission Checklist

⁠ .qmd ⁠ file renders to HTML without errors
Your name appears in the YAML header
All code chunks run without errors
Code includes helpful comments
You have answered all questions in complete sentences
AI Use Log included (if AI was used)
⁠I completed the homework using RPubs, my ⁠ .qmd ⁠ file, and the notes I took in class. I used AI to help me understand the purpose and meaning of the R code, as well as to research information about countries, such as income inequality and Gini coefficients.

Glossary of Functions Used

Function	What it does
⁠ select() ⁠	Keeps only specified columns
⁠ filter() ⁠	Keeps rows that meet conditions
⁠ mutate() ⁠	Adds or modifies columns
⁠ pivot_longer() ⁠	Reshapes wide to long
⁠ group_by() ⁠	Groups data for subsequent operations
⁠ summarize() ⁠	Reduces grouped data to summary stats
⁠ inner_join() ⁠	Combines two tables, keeping matching rows
⁠ distinct() ⁠	Keeps unique rows
⁠ slice_max() ⁠	Keeps rows with highest values
⁠ arrange() ⁠	Sorts rows
⁠ contains() ⁠	Helper for selecting columns with a pattern
