The goal is to test your software installation and to demonstrate competency in Markdown and in the basics of ggplot.

R and RStudio installation

You should successfully install R and RStudio on your computer. We will do all of our work in this class with the open source (and free!) programming language R. However, we will use the RStudio application, an Integrated Development Environment (IDE), which allows us to seamlessly interact with R and write code in a pleasant environment.

Install tidyverse and gapminder packages

The basic installation of R is known as base R. If you haven’t already done so, you need to install a couple of packages, namely tidyverse and gapminder. Go to the packages panel in the bottom right of RStudio, click on “Install,” type tidyverse, and press enter. Once it finishes, install gapminder. You’ll see a bunch of output in the RStudio console as all the packages are installed.

You can also just paste and run these two commands

  • install.packages("tidyverse")
  • install.packages("gapminder")

in the console (bottom left in RStudio) instead of using the packages panel.
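
If you re-run your script often, you may prefer to install only what is missing. The sketch below is one way to do that; the helper name `missing_packages` is an illustrative assumption, not part of the assignment, and the install step only fires in an interactive session such as the RStudio console.

```r
# Hypothetical helper: returns the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("tidyverse", "gapminder"))

# Only install when run interactively, and only what is actually missing.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Running this a second time should install nothing, because both packages will already be present.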

You can find details on R packages here

Practice using Markdown

Written assignments will be submitted using Markdown. Markdown is a lightweight text formatting language that easily converts between file formats. It is integrated directly into R Markdown, which combines R code, output, and written text into a single document (.Rmd).

There is a very nice Markdown tutorial that I suggest you go through before working on your assignment. If you want to use a stand-alone Markdown editor, Typora is a lightweight Markdown editor that inherently supports pandoc-flavoured Markdown.

Pandoc

Pandoc is a program that converts Markdown files into basically anything else. It was created by John MacFarlane, a philosophy professor at the University of California, Berkeley, and is widely used as a writing tool and as a basis for publishing workflows. Kieran Healy’s Plain Text Social Science workflow describes how to use Markdown and then convert your Markdown document to HTML, PDF, Word, etc.

You should create a file whose name will be your Name_Surname.Rmd.

Task 1: Short biography written using markdown

You should write within this Rmd file a brief biography of yourself using markdown syntax. I know you have already achieved a lot, but a couple of paragraphs is more than enough.

To achieve full marks, you should include at least 4 of the following elements:

  • Headers
  • Emphasis (italics or bold)
  • Bullet points
  • Links
  • Embedding images

Please write your short biography after this blockquote.

#### An introduction to me, Henrik

My name is Henrik and I am a 25-year-old man from Stavanger, Norway. Growing up, I was extremely passionate about football. I even played competitively for several years, dreaming of playing in the Premier League one day. My favorite team is Manchester United. In addition to football, I have become increasingly passionate about cooking, and I have the most fun when I make food with my girlfriend, who went to cooking school in Copenhagen several years ago. My favorite cuisines are:

  • Italian cuisine
  • Japanese cuisine, and
  • Mexican cuisine

“Life is a combination of magic and pasta.” – Federico Fellini

Professionally, I have worked for a top-tier management consulting firm in Norway, Implement Consulting Group. At ICG, I specialised in Strategy and M&A, mostly working for our Private Equity clients on commercial due diligence and market studies. Before moving to London for my studies, I decided to try something new, so I went to Kukula Capital in Zambia to volunteer and gain more hands-on transaction experience. Now, however, I am reading for the Global Master in Management programme at London Business School, looking forward to two years of networking, learning, and adventures!

Task 2: gapminder country comparison

You have seen the gapminder dataset, which has data on life expectancy, population, and GDP per capita for 142 countries from 1952 to 2007. To get a glimpse of the dataframe, namely to see the variable names, variable types, etc., we use the glimpse function. We also want to have a look at the first 20 rows of data.

glimpse(gapminder)
## Rows: 1,704
## Columns: 6
## $ country   <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
## $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
## $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
## $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …
head(gapminder, 20) # look at the first 20 rows of the dataframe
## # A tibble: 20 × 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## 11 Afghanistan Asia       2002    42.1 25268405      727.
## 12 Afghanistan Asia       2007    43.8 31889923      975.
## 13 Albania     Europe     1952    55.2  1282697     1601.
## 14 Albania     Europe     1957    59.3  1476505     1942.
## 15 Albania     Europe     1962    64.8  1728137     2313.
## 16 Albania     Europe     1967    66.2  1984060     2760.
## 17 Albania     Europe     1972    67.7  2263554     3313.
## 18 Albania     Europe     1977    68.9  2509048     3533.
## 19 Albania     Europe     1982    70.4  2780097     3631.
## 20 Albania     Europe     1987    72    3075321     3739.
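
The coverage figures quoted above (142 countries, 1952 to 2007) can be checked directly from the data. This is a small sketch using dplyr's `n_distinct()` and base R's `range()`; the object names `n_countries` and `year_range` are chosen here for illustration.

```r
library(dplyr)
library(gapminder)

# Number of distinct countries in the dataset
n_countries <- n_distinct(gapminder$country)

# First and last year covered
year_range <- range(gapminder$year)

n_countries
year_range
```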

I have created the country_data and continent_data with the code below.

country_data <- gapminder %>% 
            filter(country == "Russia") # just choosing Russia, as this is where I come from

continent_data <- gapminder %>% 
            filter(continent == "Europe")

Your task is to produce two graphs of how life expectancy has changed over the years for the country and the continent you come from. First, create a plot of life expectancy over time for the single country you chose. You should use geom_point() to see the actual data points and geom_smooth(se = FALSE) to plot the underlying trendlines. You need to remove the comments # from the lines below for your code to run.

library(ggplot2)
library(dplyr)
library(gapminder)
Norway_data1 <- gapminder %>%
  filter(country == "Norway")
Europe_data1 <- gapminder %>%
  filter(continent == "Europe")

plot_1 <- ggplot(Norway_data1, aes(x = year, y = lifeExp)) +
  geom_point() +
  geom_smooth(se = FALSE)

print(plot_1)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Next we need to add a title. Create a new plot, or extend plot1, using the labs() function to add an informative title to the plot.

plot_2 <- ggplot(Norway_data1, aes(x = year, y = lifeExp)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  labs(
    title = "Life expectancy in Norway grew by ~7.5 years between 1952 and 2007",
    x = "Year", y = "Life Expectancy"
  )

print(plot_2)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
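
When a plot title quotes a number, it is worth computing that number from the data rather than reading it off the chart. A minimal sketch, assuming the same gapminder data as above; the object name `norway_gain` and the column names `start`, `end`, and `gain` are illustrative.

```r
library(dplyr)
library(gapminder)

# Life expectancy in the first and last recorded year, and the difference
norway_gain <- gapminder %>%
  filter(country == "Norway") %>%
  summarise(
    start = lifeExp[year == min(year)],
    end   = lifeExp[year == max(year)],
    gain  = end - start
  )

norway_gain
```

The `gain` column is the figure that belongs in the title.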

Secondly, produce a plot for all countries in the continent you come from. (Hint: map the country variable to the colour aesthetic).

plot_3 <- ggplot(Europe_data1, aes(x = year, y = lifeExp, colour = country)) +
  geom_line() +
  geom_smooth(se = FALSE) +
  labs(
    subtitle = "Development of Life Expectancy in Europe",
    x = "Year",
    y = "Life Expectancy"
  )
print(plot_3)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Finally, using the original gapminder data, produce a life expectancy over time graph, grouped (or faceted) by continent. We will remove all legends by adding theme(legend.position = "none") at the end of our ggplot.

plot_4 <- ggplot(gapminder, aes(x = year, y = lifeExp, colour = country)) +
  geom_line() +
  geom_smooth(se = FALSE) +
  facet_wrap(~continent) +
  theme(legend.position = "none") # remove all legends

print(plot_4)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
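
Because colour is mapped to country in the plot above, geom_smooth() fits one curve per country. If you would rather see a single trend per continent, one option is to map country to the group aesthetic of the lines only. This is a sketch of that alternative, not the required solution; the name `plot_facets` and the grey styling are choices made here.

```r
library(ggplot2)
library(gapminder)

plot_facets <- ggplot(gapminder, aes(x = year, y = lifeExp)) +
  # one faint line per country; grouping applies to this layer only
  geom_line(aes(group = country), colour = "grey70", alpha = 0.6) +
  # no grouping here, so there is one smooth per facet (continent)
  geom_smooth(se = FALSE, method = "loess", formula = y ~ x) +
  facet_wrap(~continent) +
  labs(
    title = "Life expectancy over time, by continent",
    x = "Year", y = "Life Expectancy"
  )

print(plot_facets)
```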

Given these trends, what can you say about life expectancy since 1952? Again, don’t just say what’s happening in the graph. Tell some sort of story and speculate about the differences in the patterns.

Type your answer after this blockquote.

#### The Global Story Since 1952

From the early 1950s to the 2000s, the world experienced a dramatic rise in life expectancy almost everywhere — but the pace and stability of improvement varied sharply by region.

##### Europe & Oceania: Early Leaders, Steady Gains

  • Europe and Oceania (Australia & New Zealand) already had relatively high life expectancies in 1952 (mid-60s to 70s).

  • Their trajectories are smoother and more consistent, presumably reflecting stable economies, established healthcare systems, lower infant mortality, and strong public health infrastructure.

  • Small but steady improvements — vaccines, antibiotics, etc. — added years gradually rather than dramatically.

##### Americas: Catching Up

  • The Americas started lower than Europe but rose quickly, converging somewhat by the 2000s.

  • Improvements in sanitation and declines in infectious diseases were probably the key drivers of this catch-up.

  • Inequalities persisted — some Latin American countries improved more slowly, reflecting political instability and uneven healthcare access.

##### Asia: Rapid Progress with Setbacks

  • Asia shows remarkable improvement, starting from a low baseline in the 1950s (~40–50 years).

  • Post-war recovery, widespread vaccination, and later economic growth in East Asia (Japan, South Korea, Singapore, China) drove big leaps.

  • But there are visible dips: wars (Vietnam, Cambodia, Afghanistan) and, later, the HIV/AIDS epidemic in parts of Asia most likely slowed improvement for specific countries.

##### Africa: Progress Interrupted

  • Africa shows the most volatility. Life expectancy climbed from the 1950s through the 1970s but stalled — and in some cases fell sharply in the 1980s–1990s.

  • The HIV/AIDS crisis devastated southern Africa, cutting decades off life expectancy in some nations during this period.

  • Political instability, poverty, and limited healthcare access were most likely key drivers of the setbacks. Still, post-2000, there’s a recovery trend as HIV treatment, malaria control, and international aid improve.

#### Conclusion: The gap is closing

Globally, the “health gap” narrowed — countries that started far behind made the largest absolute gains. But new regional disparities emerged: while Europe and Oceania pushed into the 80s, many African countries struggled to climb out of the 50s.

Life expectancy is not just biology — it’s shaped by economics, politics, epidemics, wars, and global cooperation.
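
A story like this is easier to defend with a small summary table next to the plots. The sketch below computes the median life expectancy per continent in the first and last year of the data; the object name `continent_summary` and the choice of the median (rather than the mean) are assumptions made here.

```r
library(dplyr)
library(gapminder)

# Median life expectancy per continent in 1952 and 2007
continent_summary <- gapminder %>%
  filter(year %in% c(1952, 2007)) %>%
  group_by(continent, year) %>%
  summarise(median_lifeExp = median(lifeExp), .groups = "drop")

continent_summary
```

Comparing the two rows for each continent shows both where it started and how much it gained.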

Task 3: Brexit voting

We will have a quick look at the results of the 2016 Brexit vote in the UK. First we read the data using read_csv() and have a quick glimpse at the data.

library(tidyverse)
getwd()
## [1] "C:/Users/henri/AppData/Local/Temp/5ac36198-0921-4292-902d-2603977ab767_pre_programme_assignment-1.zip.767/pre_programme_assignment"
brexit_results <- read_csv("Data/brexit_results.csv")
glimpse(brexit_results)
## Rows: 632
## Columns: 11
## $ Seat        <chr> "Aldershot", "Aldridge-Brownhills", "Altrincham and Sale W…
## $ con_2015    <dbl> 50.592, 52.050, 52.994, 43.979, 60.788, 22.418, 52.454, 22…
## $ lab_2015    <dbl> 18.333, 22.369, 26.686, 34.781, 11.197, 41.022, 18.441, 49…
## $ ld_2015     <dbl> 8.824, 3.367, 8.383, 2.975, 7.192, 14.828, 5.984, 2.423, 1…
## $ ukip_2015   <dbl> 17.867, 19.624, 8.011, 15.887, 14.438, 21.409, 18.821, 21.…
## $ leave_share <dbl> 57.89777, 67.79635, 38.58780, 65.29912, 49.70111, 70.47289…
## $ born_in_uk  <dbl> 83.10464, 96.12207, 90.48566, 97.30437, 93.33793, 96.96214…
## $ male        <dbl> 49.89896, 48.92951, 48.90621, 49.21657, 48.00189, 49.17185…
## $ unemployed  <dbl> 3.637000, 4.553607, 3.039963, 4.261173, 2.468100, 4.742731…
## $ degree      <dbl> 13.870661, 9.974114, 28.600135, 9.336294, 18.775591, 6.085…
## $ age_18to24  <dbl> 9.406093, 7.325850, 6.437453, 7.747801, 5.734730, 8.209863…

The data comes from Elliott Morris, who cleaned it and made it available through his DataCamp class on analysing election and polling data in R.

Our main outcome variable (or y) is leave_share, which is the percent of votes cast in favour of Brexit, or leaving the EU. Each row is a UK parliament constituency.

To get a sense of the spread of the data, plot a histogram and a density plot of the leave share in all constituencies.

ggplot(brexit_results, aes(x = leave_share)) +
  geom_histogram(binwidth = 2.5) +
  labs(title = "Distribution of the leave share across constituencies",
       subtitle = "Each bar counts the constituencies in a 2.5-point band of leave share",
       x = "Leave share (% of votes cast)", y = "Number of constituencies")

ggplot(brexit_results, aes(x = leave_share)) +
  geom_density() +
  labs(title = "Density of the leave share across constituencies",
       subtitle = "Line = estimated density",
       x = "Leave share (% of votes cast)", y = "Density")
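
The histogram and the density curve can also be drawn on one set of axes by putting the histogram on the density scale with after_stat(density). This is a sketch under the assumption that the data frame has a `leave_share` column, as brexit_results does; the function name `plot_share_distribution` is hypothetical.

```r
library(ggplot2)

# Hypothetical helper: histogram rescaled to density, with a density curve on top.
plot_share_distribution <- function(df, binwidth = 2.5) {
  ggplot(df, aes(x = leave_share)) +
    geom_histogram(aes(y = after_stat(density)), binwidth = binwidth,
                   fill = "grey80", colour = "white") +
    geom_density() +
    labs(title = "Distribution of the leave share across constituencies",
         subtitle = "Bars = histogram on the density scale; line = estimated density",
         x = "Leave share (% of votes cast)", y = "Density")
}

# Usage, once brexit_results has been read in:
# plot_share_distribution(brexit_results)
```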

One common explanation for the Brexit outcome was fear of immigration and opposition to the EU’s more open border policy. We can check the relationship between the proportion of native born residents (born_in_uk) in a constituency and its leave_share. To do this, let us get the correlation between the two variables:

brexit_results %>% 
  select(leave_share, born_in_uk) %>% 
  cor()
##             leave_share born_in_uk
## leave_share   1.0000000  0.4934295
## born_in_uk    0.4934295  1.0000000

The correlation is almost 0.5, which indicates a moderate positive association: constituencies with a larger share of UK-born residents tended to have a higher leave share.
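
To put the size of this correlation in context: in a linear regression with a single predictor, the share of variance explained (R-squared) equals the squared Pearson correlation. A quick calculation with the value printed above; the variable names are illustrative.

```r
# Correlation between leave_share and born_in_uk, copied from the output above
r <- 0.4934295

# Share of the variance in leave_share that a straight-line fit
# on born_in_uk alone would account for
r_squared <- r^2

round(r_squared, 3)
```

So born_in_uk on its own accounts for roughly a quarter of the variation in leave share, which leaves plenty of room for other explanations.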

We can also create a scatterplot between these two variables using geom_point. We also add the best fit line, using geom_smooth(method = "lm").

ggplot(brexit_results, aes(x = born_in_uk, y = leave_share)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm") +
  theme_bw() +
  labs(title = "Constituencies with more UK-born residents voted more strongly to leave",
       subtitle = "Line = linear trend; dots = constituencies",
       x = "Residents born in the UK (%)", y = "Leave share (% of votes cast)")
## `geom_smooth()` using formula = 'y ~ x'
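
The line drawn by geom_smooth(method = "lm") is an ordinary least-squares fit, so its slope can be read off with lm(). A sketch, assuming a data frame with `leave_share` and `born_in_uk` columns; the wrapper name `fit_leave_model` is hypothetical.

```r
# Hypothetical wrapper: the same straight-line fit that geom_smooth(method = "lm") draws.
fit_leave_model <- function(df) {
  lm(leave_share ~ born_in_uk, data = df)
}

# Usage, once brexit_results has been read in:
# fit <- fit_leave_model(brexit_results)
# coef(fit)     # intercept and slope of the trend line
# summary(fit)  # standard errors and R-squared
```

The slope is the change in leave share, in percentage points, associated with a one-point increase in the UK-born share.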

You have the code for the plots. I would like you to revisit all of them and use the labs() function to add an informative title, subtitle, and axis titles to every plot.

What can you say about the relationship shown above? Again, don’t just say what’s happening in the graph. Tell some sort of story and speculate about the differences in the patterns.

Type your answer after this blockquote.

The data from the 2016 referendum support the claim that constituencies with a larger share of UK-born residents voted more strongly for Brexit. The correlation between the two variables is ~0.5, and the trend line in the scatterplot is positive: the larger the share of UK-born residents in a constituency, the larger the leave share. One explanation may well be fear of immigration and opposition to the EU’s more open border policy. That is the easy explanation, at least.

One can assume that constituencies with a larger share of UK-born residents are more positive to Brexit because they believe that the number of immigrants in an area correlates with crime in that area. It could also be that the handling of immigrants coming to the UK has not worked well: immigrants may have a hard time integrating into UK culture and finding a job, and ghettos may form quickly.

Another possible explanation is the belief that immigrants “steal jobs from UK-born citizens”. This hostile mindset is very common in the US at least, and could explain why voters in constituencies with many UK-born residents are positive to Brexit, and therefore implicitly negative to immigration.

Submit the assignment

Knit the completed R Markdown file as an HTML or PDF document (use the “Knit” button at the top of the script editor window) and upload it to Canvas.
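
If you prefer the console to the Knit button, the rmarkdown package's render() function does the same job. A minimal sketch, assuming your file is saved in the working directory under the Name_Surname.Rmd naming convention described earlier.

```r
# Assumed file name; replace with your own Name_Surname.Rmd
rmd_file <- "Name_Surname.Rmd"

# Render to HTML only if the file and the rmarkdown package are available
if (file.exists(rmd_file) && requireNamespace("rmarkdown", quietly = TRUE)) {
  rmarkdown::render(rmd_file, output_format = "html_document")
}
```

Swapping `"html_document"` for `"pdf_document"` produces a PDF instead, provided a LaTeX installation is available.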

Rubric

Check minus: Name_Surname.Rmd just has a paragraph or two of plain text with no formatting, bullet points, links, etc. You used the hashtag #, which is a chapter, and the >, which is a blockquote, inappropriately.

Check: something in between.

Check plus: Name_Surname.Rmd provides a proper introduction of the student to the class. It also demonstrates experimentation with 4 or more aspects of the Markdown syntax. Examples: section headers, links, bold, italic, bullet points, image embed, etc. The student offers a few reflections on their experience with Markdown (e.g. did you manage to knit both to HTML and PDF?).

Details

If you want to, please answer the following

  • Who did you collaborate with: TYPE NAMES HERE
  • How much time did you spend on each of the 3 Datacamp chapters and on this preliminary assignment completion: ANSWER HERE
  • What, if anything, gave you the most trouble: ANSWER HERE