The goal is to test your software installation and to demonstrate
competency in Markdown and in the basics of ggplot.
You should successfully install R and RStudio on your computer. We will do all of our work in this class with the open source (and free!) programming language R. However, we will use the RStudio software application, an Integrated Development Environment (IDE), which allows us to seamlessly interact with R and write code in a pleasant environment.
tidyverse and gapminder packages
The basic installation of R is known as base R. If you haven’t
already done so, you need to install a couple of packages,
namely tidyverse and gapminder. Go to the
packages panel in the bottom right of RStudio, click on “Install,” type
tidyverse, and press enter. Once it finishes, install
gapminder. You’ll see a bunch of output in the RStudio
console as all the packages are installed.
You can also just paste and run these two commands in the console
(bottom left in RStudio) instead of using the packages panel:
install.packages("tidyverse")
install.packages("gapminder")
You can find details on R packages here
Written assignments will be submitted using Markdown.
Markdown is a lightweight text formatting language that easily converts
between file formats. It is integrated directly into R Markdown, which combines R
code, output, and written text into a single document
(.Rmd).
There is a very nice Markdown tutorial that I suggest you go through before working on your assignment. If you want to use a stand-alone Markdown editor, Typora is a lightweight option that natively supports pandoc-flavoured Markdown.
Pandoc is a program that converts Markdown files into basically anything else. It was created by John MacFarlane, a philosophy professor at the University of California, Berkeley, and is widely used as a writing tool and as a basis for publishing workflows. Kieran Healy’s Plain Text Social Science workflow describes how to use Markdown and then convert your Markdown document to HTML, PDF, Word, etc.
You should create a file named Name_Surname.Rmd, using your own
name and surname.
You should write within this Rmd file a brief biography of yourself using markdown syntax. I know you have already achieved a lot, but a couple of paragraphs is more than enough.
To achieve full marks, you should include at least 4 of the following elements:
Please write your short biography after this blockquote.
# Meet Aboli Mandage
Hi, I’m Aboli Mandage. I studied mechanical engineering and am now studying at London Business School.
gapminder country comparison
You have seen the gapminder dataset that has data on
life expectancy, population, and GDP per capita for 142 countries from
1952 to 2007. To get a glimpse of the dataframe, namely to see the
variable names, variable types, etc., we use the glimpse
function. We also want to have a look at the first 20 rows of data.
## Rows: 1,704
## Columns: 6
## $ country <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
## $ year <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
## $ lifeExp <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
## $ pop <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …
## # A tibble: 20 × 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
## 7 Afghanistan Asia 1982 39.9 12881816 978.
## 8 Afghanistan Asia 1987 40.8 13867957 852.
## 9 Afghanistan Asia 1992 41.7 16317921 649.
## 10 Afghanistan Asia 1997 41.8 22227415 635.
## 11 Afghanistan Asia 2002 42.1 25268405 727.
## 12 Afghanistan Asia 2007 43.8 31889923 975.
## 13 Albania Europe 1952 55.2 1282697 1601.
## 14 Albania Europe 1957 59.3 1476505 1942.
## 15 Albania Europe 1962 64.8 1728137 2313.
## 16 Albania Europe 1967 66.2 1984060 2760.
## 17 Albania Europe 1972 67.7 2263554 3313.
## 18 Albania Europe 1977 68.9 2509048 3533.
## 19 Albania Europe 1982 70.4 2780097 3631.
## 20 Albania Europe 1987 72 3075321 3739.
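The code that produced the output above is not shown in the knitted document. A minimal sketch that would produce this kind of output, assuming the tidyverse and gapminder packages are installed (first_20 is a name introduced here for illustration):

```r
library(dplyr)      # provides glimpse() and the pipe
library(gapminder)  # provides the gapminder data frame

# Variable names, variable types, and the first few values of each column
glimpse(gapminder)

# First 20 rows of the data; n = 20 asks the tibble printer to show them all
first_20 <- head(gapminder, 20)
print(first_20, n = 20)
```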
I have created the country_data and
continent_data with the code below.
country_data <- gapminder %>%
  filter(country == "India") # just choosing India, as this is where I come from
continent_data <- gapminder %>%
  filter(continent == "Asia")
Your task is to produce two graphs of how life expectancy has changed
over the years for the country and the
continent you come from. First, create a plot of life
expectancy over time for the single country you chose. You should use
geom_point() to see the actual data points and
geom_smooth(se = FALSE) to plot the underlying trendlines.
You need to remove the comments # from the lines below
for your code to run.
plot_1 <- ggplot(data = country_data, mapping = aes(x = year, y = lifeExp)) +
  geom_point() +
  geom_smooth(se = FALSE)
print(plot_1)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Next we need to add a title. Create a new plot, or extend plot_1,
using the labs() function to add an informative title to
the plot.
plot_1 <- ggplot(data = country_data, mapping = aes(x = year, y = lifeExp)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  labs(title = "Life Expectancy in India (1952-2007)")
print(plot_1)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
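Because a ggplot is stored as an object, an alternative to rebuilding the whole plot is to add a labs() layer to the existing one. A minimal sketch, assuming the tidyverse and gapminder packages are installed; plot_1_titled is a name introduced here for illustration:

```r
library(dplyr)
library(ggplot2)
library(gapminder)

country_data <- gapminder %>%
  filter(country == "India")

# Base plot without a title
plot_1 <- ggplot(country_data, aes(x = year, y = lifeExp)) +
  geom_point() +
  geom_smooth(se = FALSE)

# Adding a layer returns a new plot object; plot_1 itself is left unchanged
plot_1_titled <- plot_1 +
  labs(title = "Life Expectancy in India (1952-2007)")

print(plot_1_titled)
```

This keeps the untitled version available in case you want to try several titles or themes on the same base plot.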
Secondly, produce a plot for all countries in the continent
you come from. (Hint: map the country variable to the
colour aesthetic).
ggplot(data = continent_data, mapping = aes(x = year, y = lifeExp, colour = country)) +
  geom_point() +
  geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Finally, using the original gapminder data, produce a
life expectancy over time graph, grouped (or faceted) by continent. We
will remove all legends by adding
theme(legend.position = "none") at the end of our
ggplot call.
ggplot(data = gapminder, mapping = aes(x = year, y = lifeExp, colour = country)) +
  geom_point() +
  geom_smooth(se = FALSE) +
  facet_wrap(~continent) +
  theme(legend.position = "none") # remove all legends
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Given these trends, what can you say about life expectancy since 1952? Again, don’t just say what’s happening in the graph. Tell some sort of story and speculate about the differences in the patterns.
Type your answer after this blockquote.
Life expectancy has improved almost everywhere since 1952, but not at the same pace. Africa started lowest and made progress, but growth slowed in the 1980s, possibly due to diseases like HIV/AIDS. Asia and the Americas improved rapidly, especially after the 1970s, thanks to better healthcare and living conditions. Europe stayed high with steady gains, while Oceania remained the most stable and consistently high. The gap between continents has narrowed, but it hasn’t disappeared.
We will have a quick look at the results of the 2016 Brexit vote in
the UK. First we read the data using read_csv() and have a
quick glimpse at the data.
## Rows: 632
## Columns: 11
## $ Seat <chr> "Aldershot", "Aldridge-Brownhills", "Altrincham and Sale W…
## $ con_2015 <dbl> 50.592, 52.050, 52.994, 43.979, 60.788, 22.418, 52.454, 22…
## $ lab_2015 <dbl> 18.333, 22.369, 26.686, 34.781, 11.197, 41.022, 18.441, 49…
## $ ld_2015 <dbl> 8.824, 3.367, 8.383, 2.975, 7.192, 14.828, 5.984, 2.423, 1…
## $ ukip_2015 <dbl> 17.867, 19.624, 8.011, 15.887, 14.438, 21.409, 18.821, 21.…
## $ leave_share <dbl> 57.89777, 67.79635, 38.58780, 65.29912, 49.70111, 70.47289…
## $ born_in_uk <dbl> 83.10464, 96.12207, 90.48566, 97.30437, 93.33793, 96.96214…
## $ male <dbl> 49.89896, 48.92951, 48.90621, 49.21657, 48.00189, 49.17185…
## $ unemployed <dbl> 3.637000, 4.553607, 3.039963, 4.261173, 2.468100, 4.742731…
## $ degree <dbl> 13.870661, 9.974114, 28.600135, 9.336294, 18.775591, 6.085…
## $ age_18to24 <dbl> 9.406093, 7.325850, 6.437453, 7.747801, 5.734730, 8.209863…
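The reading step is not shown in the knitted document. A minimal sketch of it follows. Because the location of the course data file is not given here, the example writes two rows (values copied from the output above) to a temporary file and reads that back; in the assignment you would pass the path to the real CSV file instead. The names tmp_file and brexit_sample are assumptions made for illustration.

```r
library(readr)
library(dplyr)

# Stand-in for the real data file: two constituencies, three of the eleven columns
tmp_file <- tempfile(fileext = ".csv")
writeLines(
  c("Seat,leave_share,born_in_uk",
    "Aldershot,57.89777,83.10464",
    "Aldridge-Brownhills,67.79635,96.12207"),
  tmp_file
)

# read_csv() guesses column types: Seat becomes <chr>, the shares become <dbl>
brexit_sample <- read_csv(tmp_file, show_col_types = FALSE)
glimpse(brexit_sample)
```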
The data comes from Elliott Morris, who cleaned it and made it available through his DataCamp class on analysing election and polling data in R.
Our main outcome variable (or y) is leave_share, which
is the percent of votes cast in favour of Brexit, or leaving the EU.
Each row is a UK parliament
constituency.
To get a sense of the spread of the data, plot a histogram and a density plot of the leave share in all constituencies.
ggplot(brexit_results, aes(x = leave_share)) +
  geom_histogram(binwidth = 2.5) +
  labs(
    title = "Distribution of Brexit Leave Vote Share",
    subtitle = "Percentage of leave votes across UK parliamentary constituencies",
    x = "Leave vote share (%)",
    y = "Number of constituencies"
  )
ggplot(brexit_results, aes(x = leave_share)) +
  geom_density() +
  labs(
    title = "Density of Brexit Leave Vote Share",
    subtitle = "Smooth curve showing the leave vote share density across UK constituencies",
    x = "Leave vote share (%)",
    y = "Density"
  )
One common explanation for the Brexit outcome was fear of immigration
and opposition to the EU’s more open border policy. We can check the
relationship between the proportion of native born residents
(born_in_uk) in a constituency and its
leave_share. To do this, let us get the correlation between
the two variables:
## leave_share born_in_uk
## leave_share 1.0000000 0.4934295
## born_in_uk 0.4934295 1.0000000
The correlation is almost 0.5, which indicates a moderate positive relationship between the two variables.
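The code behind the correlation matrix is not shown above. One way to compute it is to select the two columns and pass them to cor(). The sketch below does this on only the first six constituencies from the glimpse() output, so its value will differ from the full-data result; with the full data, brexit_results would take the place of brexit_subset, a name introduced here for illustration.

```r
library(dplyr)

# First six values of each variable, copied from the glimpse() output
brexit_subset <- tibble(
  leave_share = c(57.89777, 67.79635, 38.58780, 65.29912, 49.70111, 70.47289),
  born_in_uk  = c(83.10464, 96.12207, 90.48566, 97.30437, 93.33793, 96.96214)
)

# cor() on a two-column data frame returns a 2 x 2 correlation matrix
cor_matrix <- brexit_subset %>%
  select(leave_share, born_in_uk) %>%
  cor()
print(cor_matrix)
```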
We can also create a scatterplot between these two variables using
geom_point. We also add the best fit line, using
geom_smooth(method = "lm").
ggplot(brexit_results, aes(x = born_in_uk, y = leave_share)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm") +
  theme_bw() +
  labs(
    title = "Correlation of UK-born Residents and Brexit Leave Vote Share",
    subtitle = "Scatterplot with linear trendline",
    x = "Residents Born in the UK (%)",
    y = "Leave Vote Share (%)"
  )
## `geom_smooth()` using formula = 'y ~ x'
You have the code for the plots; I would like you to revisit all of
them and use the labs() function to add an informative
title, subtitle, and axis titles to all plots.
What can you say about the relationship shown above? Again, don’t just say what’s happening in the graph. Tell some sort of story and speculate about the differences in the patterns.
Type your answer after this blockquote.
Knit the completed R Markdown file as an HTML or PDF document (use the “Knit” button at the top of the script editor window) and upload it to Canvas.
Check minus: Name_Surname.Rmd just has a paragraph or
two of plain text with no formatting, bullet points, links, etc. You
used the hashtag #, which is a chapter, and the
>, which is a blockquote, inappropriately.
Check: something in between.
Check plus: Name_Surname.Rmd provides a proper
introduction of the student to the class. It also demonstrates
experimentation with 4 or more aspects of the Markdown syntax. Examples:
section headers, links, bold, italic, bullet points, image embed, etc.
The student offers a few reflections on their experience with Markdown
(e.g. did you manage to knit to both HTML and PDF?).
If you want to, please answer the following