Answer the following questions about RMarkdown from the short lessons
For this problem set, you’ll have three possible things to do.
“### Question #” / “xxxx” = Write your response to each question by replacing the “xxxx”. 1-2 sentences is fine.
“### Code #” = Write R code in the chunk corresponding to the instructions.
For this problem set, there are 15 questions and 10 code parts. If you get one wrong, it will be -3 points.
Additional points will be deducted for problems with the formatting as outlined by the problem set 1 instructions (e.g., do not provide .html output, put data into data folder, etc.).
1- Code chunks to run: R code chunks surrounded by ```s.
2- Text to display: text mixed with simple text formatting.
3- YAML metadata to guide the R Markdown build process: a YAML header surrounded by ---s.
What does echo = FALSE do?
echo = FALSE prevents the code, but not its results, from appearing in the finished file. This is a useful way to embed figures.
Set the output_format argument of render() to render an .Rmd file into any of R Markdown’s supported formats. For example, output_format = "word_document" renders 1-example.Rmd to a Microsoft Word document.
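As a hedged sketch of such a call: it assumes the rmarkdown package and pandoc are installed, and the small 1-example.Rmd written to a temporary folder is only an illustrative stand-in for the real file.

```r
library(rmarkdown)

# Illustrative stand-in for 1-example.Rmd, written to a temporary folder
rmd_path <- file.path(tempdir(), "1-example.Rmd")
writeLines(c("---", "title: \"Example\"", "---", "", "Some text."), rmd_path)

# output_format picks the target format; "word_document" asks for a .docx file
if (pandoc_available()) {
  out_file <- render(rmd_path, output_format = "word_document", quiet = TRUE)
  print(basename(out_file))
}
```

Other format names, such as "html_document" or "pdf_document", can be passed the same way.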
Load the tidyverse package.
# install.packages('tidyverse',repos="http://cran.us.r-project.org")
library(tidyverse)
library(ggplot2)
Which packages are in the tidyverse?
tidyverse_packages()
## [1] "broom" "cli" "crayon" "dbplyr" "dplyr"
## [6] "forcats" "ggplot2" "haven" "hms" "httr"
## [11] "jsonlite" "lubridate" "magrittr" "modelr" "pillar"
## [16] "purrr" "readr" "readxl" "reprex" "rlang"
## [21] "rstudioapi" "rvest" "stringr" "tibble" "tidyr"
## [26] "xml2" "tidyverse"
v ggplot2 3.3.2     v purrr   0.3.4
v tibble  3.0.3     v dplyr   1.0.2
v tidyr   1.1.2     v stringr 1.4.0
v readr   1.3.1     v forcats 0.5.0
Read in the corrupt.csv file and assign it to corrupt.
corrupt<-read.csv(file='data/corrupt.csv')
head(corrupt)
## country region year cpi hdi
## 1 Denmark Europe and Central Asia 2015 91 0.925
## 2 New Zealand Asia Pacific 2015 91 0.915
## 3 Finland Europe and Central Asia 2015 90 0.895
## 4 Sweden Europe and Central Asia 2015 89 0.913
## 5 Switzerland Europe and Central Asia 2015 86 0.939
## 6 Norway Europe and Central Asia 2015 88 0.949
Run the glimpse() function on the data to explore the column formats.
glimpse(corrupt)
## Rows: 704
## Columns: 5
## $ country <chr> "Denmark", "New Zealand", "Finland", "Sweden", "Switzerland...
## $ region <chr> "Europe and Central Asia", "Asia Pacific", "Europe and Cent...
## $ year <int> 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015, 2015,...
## $ cpi <int> 91, 91, 90, 89, 86, 88, 85, 84, 83, 81, 85, 81, 79, 79, 77,...
## $ hdi <dbl> 0.925, 0.915, 0.895, 0.913, 0.939, 0.949, 0.925, 0.924, 0.9...
dim(corrupt)
## [1] 704 5
Rows (observation): 704 Columns (variables):5
nrow(corrupt)
## [1] 704
ncol(corrupt)
## [1] 5
summary(corrupt)
## country region year cpi
## Length:704 Length:704 Min. :2012 Min. : 8.00
## Class :character Class :character 1st Qu.:2013 1st Qu.:28.00
## Mode :character Mode :character Median :2014 Median :38.00
## Mean :2014 Mean :42.88
## 3rd Qu.:2014 3rd Qu.:55.00
## Max. :2015 Max. :92.00
## NA's :20
## hdi
## Min. :0.3410
## 1st Qu.:0.5507
## Median :0.7320
## Mean :0.6947
## 3rd Qu.:0.8263
## Max. :0.9490
## NA's :84
However, it’s not clear whether records are repeated across years (i.e., whether this is panel data, organized by record and time).
Run the count() function on corrupt and use year as the 2nd parameter. This counts the records in each unique category of year (that is, in each year).
corrupt%>%
group_by(year)%>%
summarise(count=n())
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 4 x 2
## year count
## <int> <int>
## 1 2012 176
## 2 2013 176
## 3 2014 176
## 4 2015 176
corrupt %>% as_tibble() %>% count(year)
## # A tibble: 4 x 2
## year n
## <int> <int>
## 1 2012 176
## 2 2013 176
## 3 2014 176
## 4 2015 176
There are 4 different years (2012, 2013, 2014, 2015).
unique(corrupt$year)
## [1] 2015 2014 2013 2012
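The two counting approaches above, group_by() with summarise() and count(), should give the same numbers. A minimal sketch on made-up data (the toy data frame is an assumption for illustration, not part of corrupt.csv):

```r
library(dplyr)

# Made-up panel-style data: countries repeated across years
toy <- data.frame(
  country = c("A", "B", "A", "B", "C"),
  year    = c(2014, 2014, 2015, 2015, 2015)
)

# Long form: group, then count the rows in each group
by_summarise <- toy %>%
  group_by(year) %>%
  summarise(count = n())

# Short form: count() groups and tallies in one step (its column is named n)
by_count <- toy %>% count(year)

by_summarise
by_count
```

The only visible difference is the name of the count column (count versus n).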
For simplicity, let’s only keep 2015 records.
corrupt <- corrupt %>%filter(year == 2015) %>%
na.omit()
corrupt
## country region year cpi hdi
## 1 Denmark Europe and Central Asia 2015 91 0.925
## 2 New Zealand Asia Pacific 2015 91 0.915
## 3 Finland Europe and Central Asia 2015 90 0.895
## 4 Sweden Europe and Central Asia 2015 89 0.913
## 5 Switzerland Europe and Central Asia 2015 86 0.939
## 6 Norway Europe and Central Asia 2015 88 0.949
## 7 Singapore Asia Pacific 2015 85 0.925
## 8 Netherlands Europe and Central Asia 2015 84 0.924
## 9 Canada Americas 2015 83 0.920
## 10 Germany Europe and Central Asia 2015 81 0.926
## 11 Luxembourg Europe and Central Asia 2015 85 0.898
## 12 United Kingdom Europe and Central Asia 2015 81 0.910
## 13 Australia Asia Pacific 2015 79 0.939
## 14 Iceland Europe and Central Asia 2015 79 0.921
## 15 Belgium Europe and Central Asia 2015 77 0.896
## 17 Austria Europe and Central Asia 2015 76 0.893
## 18 United States Americas 2015 76 0.920
## 19 Ireland Europe and Central Asia 2015 75 0.923
## 20 Japan Asia Pacific 2015 75 0.903
## 21 Uruguay Americas 2015 74 0.795
## 22 Estonia Europe and Central Asia 2015 70 0.865
## 23 France Europe and Central Asia 2015 70 0.897
## 25 Chile Americas 2015 70 0.847
## 26 United Arab Emirates Middle East and North Africa 2015 70 0.840
## 27 Bhutan Asia Pacific 2015 65 0.607
## 28 Israel Middle East and North Africa 2015 61 0.899
## 29 Poland Europe and Central Asia 2015 63 0.855
## 30 Portugal Europe and Central Asia 2015 64 0.843
## 32 Qatar Middle East and North Africa 2015 71 0.856
## 33 Slovenia Europe and Central Asia 2015 60 0.890
## 35 Botswana Sub Saharan Africa 2015 63 0.698
## 40 Lithuania Europe and Central Asia 2015 59 0.848
## 42 Costa Rica Americas 2015 55 0.776
## 43 Spain Europe and Central Asia 2015 58 0.884
## 44 Georgia Europe and Central Asia 2015 52 0.769
## 45 Latvia Europe and Central Asia 2015 56 0.830
## 47 Cyprus Europe and Central Asia 2015 61 0.856
## 48 Czech Republic Europe and Central Asia 2015 56 0.878
## 49 Malta Europe and Central Asia 2015 60 0.856
## 50 Mauritius Sub Saharan Africa 2015 53 0.781
## 51 Rwanda Sub Saharan Africa 2015 54 0.498
## 53 Namibia Sub Saharan Africa 2015 53 0.640
## 54 Slovakia Europe and Central Asia 2015 51 0.845
## 55 Croatia Europe and Central Asia 2015 51 0.827
## 56 Malaysia Asia Pacific 2015 50 0.789
## 57 Hungary Europe and Central Asia 2015 51 0.836
## 58 Jordan Middle East and North Africa 2015 53 0.742
## 59 Romania Europe and Central Asia 2015 46 0.802
## 60 Cuba Americas 2015 47 0.775
## 61 Italy Europe and Central Asia 2015 44 0.887
## 62 Sao Tome and Principe Sub Saharan Africa 2015 42 0.574
## 63 Saudi Arabia Middle East and North Africa 2015 52 0.847
## 64 Montenegro Europe and Central Asia 2015 44 0.807
## 65 Oman Middle East and North Africa 2015 45 0.796
## 66 Senegal Sub Saharan Africa 2015 44 0.494
## 67 South Africa Sub Saharan Africa 2015 44 0.666
## 68 Suriname Americas 2015 36 0.725
## 69 Greece Europe and Central Asia 2015 46 0.866
## 70 Bahrain Middle East and North Africa 2015 51 0.824
## 71 Ghana Sub Saharan Africa 2015 47 0.579
## 72 Burkina Faso Sub Saharan Africa 2015 38 0.402
## 73 Serbia Europe and Central Asia 2015 40 0.776
## 75 Bulgaria Europe and Central Asia 2015 41 0.794
## 76 Kuwait Middle East and North Africa 2015 49 0.800
## 77 Tunisia Middle East and North Africa 2015 38 0.725
## 78 Turkey Europe and Central Asia 2015 42 0.767
## 79 Belarus Europe and Central Asia 2015 32 0.796
## 80 Brazil Americas 2015 38 0.754
## 81 China Asia Pacific 2015 37 0.738
## 82 India Asia Pacific 2015 38 0.624
## 83 Albania Europe and Central Asia 2015 36 0.764
## 84 Bosnia and Herzegovina Europe and Central Asia 2015 38 0.750
## 85 Jamaica Americas 2015 41 0.730
## 86 Lesotho Sub Saharan Africa 2015 44 0.497
## 87 Mongolia Asia Pacific 2015 39 0.735
## 88 Panama Americas 2015 39 0.788
## 89 Zambia Sub Saharan Africa 2015 38 0.579
## 90 Colombia Americas 2015 37 0.727
## 91 Indonesia Asia Pacific 2015 36 0.689
## 92 Liberia Sub Saharan Africa 2015 37 0.427
## 93 Morocco Middle East and North Africa 2015 36 0.647
## 95 Argentina Americas 2015 32 0.827
## 96 Benin Sub Saharan Africa 2015 37 0.485
## 97 El Salvador Americas 2015 39 0.680
## 100 Sri Lanka Asia Pacific 2015 37 0.766
## 101 Gabon Sub Saharan Africa 2015 34 0.697
## 102 Niger Sub Saharan Africa 2015 34 0.353
## 103 Peru Americas 2015 36 0.740
## 104 Philippines Asia Pacific 2015 35 0.682
## 105 Thailand Asia Pacific 2015 38 0.740
## 106 Timor-Leste Asia Pacific 2015 28 0.606
## 107 Trinidad and Tobago Americas 2015 39 0.780
## 108 Algeria Middle East and North Africa 2015 36 0.745
## 110 Egypt Middle East and North Africa 2015 36 0.691
## 111 Ethiopia Sub Saharan Africa 2015 33 0.448
## 112 Guyana Americas 2015 29 0.638
## 113 Armenia Europe and Central Asia 2015 35 0.743
## 116 Mali Sub Saharan Africa 2015 35 0.442
## 117 Pakistan Asia Pacific 2015 30 0.550
## 119 Togo Sub Saharan Africa 2015 32 0.487
## 120 Dominican Republic Americas 2015 33 0.722
## 121 Ecuador Americas 2015 32 0.739
## 122 Malawi Sub Saharan Africa 2015 31 0.476
## 123 Azerbaijan Europe and Central Asia 2015 29 0.759
## 124 Djibouti Sub Saharan Africa 2015 34 0.473
## 125 Honduras Americas 2015 31 0.625
## 127 Mexico Americas 2015 31 0.762
## 129 Paraguay Americas 2015 27 0.693
## 130 Sierra Leone Sub Saharan Africa 2015 29 0.420
## 132 Kazakhstan Europe and Central Asia 2015 28 0.794
## 133 Nepal Asia Pacific 2015 27 0.558
## 135 Ukraine Europe and Central Asia 2015 27 0.743
## 136 Guatemala Americas 2015 28 0.640
## 137 Kyrgyzstan Europe and Central Asia 2015 28 0.664
## 138 Lebanon Middle East and North Africa 2015 28 0.763
## 139 Myanmar Asia Pacific 2015 22 0.556
## 140 Nigeria Sub Saharan Africa 2015 26 0.527
## 141 Papua New Guinea Asia Pacific 2015 25 0.516
## 142 Guinea Sub Saharan Africa 2015 25 0.414
## 143 Mauritania Middle East and North Africa 2015 31 0.513
## 144 Mozambique Sub Saharan Africa 2015 31 0.418
## 145 Bangladesh Asia Pacific 2015 25 0.579
## 146 Cameroon Sub Saharan Africa 2015 27 0.518
## 147 Gambia Sub Saharan Africa 2015 28 0.452
## 148 Kenya Sub Saharan Africa 2015 25 0.555
## 149 Madagascar Sub Saharan Africa 2015 28 0.512
## 150 Nicaragua Americas 2015 27 0.645
## 151 Tajikistan Europe and Central Asia 2015 26 0.627
## 152 Uganda Sub Saharan Africa 2015 25 0.493
## 153 Comoros Sub Saharan Africa 2015 26 0.498
## 154 Turkmenistan Europe and Central Asia 2015 18 0.692
## 155 Zimbabwe Sub Saharan Africa 2015 21 0.516
## 156 Cambodia Asia Pacific 2015 21 0.563
## 158 Uzbekistan Europe and Central Asia 2015 19 0.701
## 159 Burundi Sub Saharan Africa 2015 21 0.404
## 160 Central African Republic Sub Saharan Africa 2015 24 0.352
## 161 Chad Sub Saharan Africa 2015 22 0.396
## 162 Haiti Americas 2015 17 0.493
## 164 Angola Sub Saharan Africa 2015 15 0.533
## 165 Eritrea Sub Saharan Africa 2015 18 0.420
## 166 Iraq Middle East and North Africa 2015 16 0.649
## 168 Guinea-Bissau Sub Saharan Africa 2015 17 0.424
## 169 Afghanistan Asia Pacific 2015 11 0.479
## 170 Libya Middle East and North Africa 2015 16 0.716
## 171 Sudan Middle East and North Africa 2015 12 0.490
## 172 Yemen Middle East and North Africa 2015 18 0.482
## 175 South Sudan Sub Saharan Africa 2015 15 0.418
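The gaps in the row numbers above come from na.omit(), which drops every row containing an NA after the year filter. A small sketch on made-up data (the toy values are assumptions chosen only to show the behavior):

```r
library(dplyr)

# Made-up data with missing cpi and hdi values
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  year    = c(2014, 2015, 2015, 2015),
  cpi     = c(50, 60, NA, 70),
  hdi     = c(0.5, 0.6, 0.7, NA)
)

# Keep 2015 only, then drop any row that still has a missing value
toy_2015 <- toy %>%
  filter(year == 2015) %>%
  na.omit()

toy_2015
```

Only country B survives here: A is from 2014, while C and D each have a missing value.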
Let’s revise our existing region field. This will help us later on.
corrupt <- corrupt %>%
mutate(region = case_when(
region == "Middle East and North Africa" ~ "Middle East\nand North Africa",
region == "Europe and Central Asia" ~ "Europe and\nCentral Asia",
region == "Sub Saharan Africa" ~ "Sub-Saharan\nAfrica",
TRUE ~ region))
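To see how case_when() treats values that match none of the conditions, here is a minimal sketch on a made-up two-row data frame (toy is an assumption for illustration). The final TRUE ~ region line acts as the fallback that keeps the original value.

```r
library(dplyr)

# Made-up data: one region to relabel, one to leave alone
toy <- data.frame(region = c("Sub Saharan Africa", "Americas"),
                  stringsAsFactors = FALSE)

toy <- toy %>%
  mutate(region = case_when(
    region == "Sub Saharan Africa" ~ "Sub-Saharan\nAfrica",  # relabel with a line break
    TRUE ~ region                                            # fallback: keep as-is
  ))

toy$region
```

The embedded \n is what later lets long region names wrap onto two lines in the plot legend.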
Let’s now see how many countries we have for each region.
Using dplyr and piping (%>%), count the number of countries by region and assign it to the dataframe region_count. After running it, print it to the console by simply writing the name of the data frame.
region_count<-corrupt%>%
group_by(region)%>%
summarise(count=n())
## `summarise()` ungrouping output (override with `.groups` argument)
region_count
## # A tibble: 5 x 2
## region count
## <chr> <int>
## 1 "Americas" 24
## 2 "Asia Pacific" 21
## 3 "Europe and\nCentral Asia" 46
## 4 "Middle East\nand North Africa" 18
## 5 "Sub-Saharan\nAfrica" 38
Based on the output above, Asia Pacific has 21 countries.
Create a scatterplot with the dataframe corrupt in which cpi is on the x axis, hdi is on the y axis, and the color of the points is region:
ggplot(data = corrupt)+
geom_point(mapping = aes(x=cpi, y=hdi, colour=region))
ggplot(data = corrupt)+
geom_point(mapping = aes(x=cpi, y=hdi, colour=region),position='jitter')
ggplot(data = corrupt)+
geom_point(mapping = aes(x=cpi, y=hdi, colour=region, shape=region),position='jitter')
3- We can use the smooth geom, which fits a smooth line to each region’s data.
ggplot(data = corrupt)+
geom_smooth (mapping = aes(x=cpi, y=hdi, color=region))
4- We can represent it using facet wrap.
ggplot(corrupt) +
aes(x = cpi,
y = hdi,
color = region) +
geom_line(
aes(group = region),
color = "grey75"
) +
geom_point(size = 0.25) +
geom_smooth() +
scale_x_continuous(breaks =
seq(0, 100, 15)
) +
facet_wrap(~ region) +
guides(color = "none")
5- We can use a world map.
Now, let’s modify our points.
First, let’s reshape each point.
Within your geom_point function, add in the following fixed parameters:
size to 2.5
alpha to 0.5
shape to 21
hint: since these three values are fixed, should they be inside or outside the aesthetics (aes()) function?
All three go outside aes(): they are fixed values applied to every point, not mappings from a data column.
ggplot(data=corrupt)+
geom_point(mapping = aes(x=cpi,y=hdi, color=region), size=2.5, alpha=0.5, shape=21)
What does the alpha parameter do?
Most geoms have an “alpha” parameter that controls transparency. Legal alpha values are any numbers from 0 (transparent) to 1 (opaque), and the default is usually 1. To set alpha to a constant value, pass it as a geom parameter outside aes(): e.g., geom_point(data=d, mapping=aes(x=x, y=y), alpha=0.5) sets the alpha of all points in the layer to 0.5.
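The difference between setting a constant and mapping it can be inspected on the plot object itself. The sketch below uses a made-up data frame (toy and its columns are assumptions), and it relies on the layer fields aes_params and mapping, which are ggplot2 internals that may change between versions.

```r
library(ggplot2)

# Made-up data for illustration
toy <- data.frame(x = c(1, 2, 3), y = c(2, 4, 3), grp = c("a", "b", "a"))

# Constants outside aes(): applied to every point, no legend entry
p_set <- ggplot(toy) +
  geom_point(aes(x = x, y = y, colour = grp),
             size = 2.5, alpha = 0.5, shape = 21)

# Constant inside aes(): treated like a data variable and given a legend
p_map <- ggplot(toy) +
  geom_point(aes(x = x, y = y, colour = grp, alpha = 0.5))

names(p_set$layers[[1]]$aes_params)  # fixed parameters of the layer
names(p_map$layers[[1]]$mapping)     # aesthetics mapped from data
```

In p_set, alpha appears among the fixed parameters; in p_map, it appears among the mapped aesthetics, which is why an unwanted legend shows up.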
The plot is too transparent. The issue is that, for shape 21, the color parameter encodes the color of the point’s border, not its interior.
That’s where we’ll need the fill parameter.
Put these two parameters explicitly in the aes() function of the geom_point():
color = region
fill = region
Also, make sure to remove any mention of color or fill in the aes() of your main ggplot() function.
ggplot(corrupt,aes(x=cpi,y=hdi))+
geom_point(mapping=aes(color=region,fill=region),size=2.5,alpha=0.5,shape=21)
Last, let’s temporarily save this graph as an object g. We can use the same <- (gets arrow) assignment operator. This will enable us to view the object or we can use it to build additional layers (Task 2).
Assign the ggplot from the previous part to g and then run g on its own within the chunk.
g<-ggplot(corrupt,aes(x=cpi,y=hdi))+
geom_point(mapping=aes(color=region,fill=region),size=2.5,alpha=0.5,shape=21)
g
In this part, you’ll add additional layers to our plot to re-design it.
This part is much more complicated, so your job will be easier:
Remove the eval=F parameter from each chunk so that each chunk runs when you knit your output.
Answer questions on interpreting what’s going on.
For this, we’ll use the same g object you created in the last chunk and slowly add more layers to the plot.
Before starting, we’ll need two packages: cowplot and colorspace. You can install colorspace from CRAN (remember how to?). For cowplot, you need the most recent version which is on GitHub.
Installing packages from GitHub is relatively straightforward, but you need an additional package: devtools. You can then run the line below to install it.
Install cowplot and colorspace and call these libraries. Also, remove the eval=F parameter from each chunk so that each chunk runs when you knit your output. (hint: you can do this for all parts via Edit > Find and Replace or CTRL + F)
# install.packages('colorspace',repos="http://cran.us.r-project.org")
library(colorspace)
# install.packages('devtools')
library(devtools)
# devtools::install_github("wilkelab/cowplot")
library(cowplot)
What do warning=F and message=F do within the code chunk?
message = FALSE prevents messages that are generated by code from appearing in the finished file.
warning = FALSE prevents warnings that are generated by code from appearing in the finished file.
Modifying themes is very common in ggplot. There is a range of packages for changing plot themes, such as ggthemes.
For this plot, we’ll use a theme built within the cowplot package that is a minimal background with a horizontal grid.
g <- g +
cowplot::theme_minimal_hgrid(12, rel_small = 1)
g
What does the cowplot:: prefix for theme_minimal_hgrid() mean? When would it be necessary?
theme_minimal_hgrid() is a minimal theme that draws only horizontal grid lines, without axis lines. The cowplot:: prefix tells R to take the function from the cowplot package.
package ‘dplyr’ successfully unpacked and MD5 sums checked.
After installation, a package is on disk but not loaded into memory. If you are going to use the cowplot functions frequently, load the package with library(cowplot). Otherwise you have to preface each function with the package name and double colons, e.g. cowplot::theme_minimal_hgrid(). The prefix is also necessary when two loaded packages export a function with the same name.
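A small base-R sketch of why the :: prefix can matter. It uses stats::median() rather than cowplot so that it runs without extra packages; the user-defined median below is a deliberately silly stand-in for a name clash.

```r
x <- c(1, 2, 3, 4)

# Explicit prefix: always calls the function from the stats package
explicit <- stats::median(x)

# A user-defined function with the same name now masks stats::median
median <- function(x) "my own median"
masked <- median(x)

# The prefix still reaches the original function
still_ok <- stats::median(x)

explicit
masked
still_ok
```

The same pattern applies to cowplot::theme_minimal_hgrid(): the prefix states exactly which package the function comes from, whether or not the package is attached.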
Next, let’s modify the color scheme. Colors can be represented by hex codes.
Sometimes, color palettes come in as R packages (e.g., RColorBrewer). However, for this plot we’ll manually load up the colors.
# Okabe Ito colors
region_cols <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#999999")
g <- g +
scale_fill_manual(values = region_cols)
g
Alternative: modify the color scheme with colorblindr
# Alternative solution
# remotes::install_github("clauswilke/colorblindr")
library(colorblindr)
# Okabe Ito colors via colorblindr
g<-g+scale_fill_OkabeIto()
g
We can also darken the color scheme automatically through colorspace’s darken() function.
g<-g+
scale_color_manual(values=colorspace::darken(region_cols,0.3))
g
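To see what darken() does to the palette before it reaches the plot, here is a short sketch. It assumes the colorspace package is installed and uses only the first three of the Okabe Ito colors listed earlier.

```r
library(colorspace)

# First three of the Okabe Ito hex codes used for the fill scale
region_cols <- c("#E69F00", "#56B4E9", "#009E73")

# darken() returns one darker hex code per input color; 0.3 is the amount
dark_cols <- darken(region_cols, 0.3)

data.frame(original = region_cols, darkened = dark_cols)
```

Using the darkened version for the border (color) and the original for the interior (fill) keeps each point's outline visible against its own fill.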
Let’s now overlay a basic regression, using the geom_smooth() function.
For this, we’ll fit y against a log transformation of x.
g<-g+
geom_smooth(
aes(color='y~log(x)',fill='y~log(x)'),
method = 'lm', formula = y~log(x),se=FALSE,fullrange=TRUE)
g
y~x instead of y~log(x)?
In many situations, the relationship between x and y is non-linear. To simplify the underlying model, we can transform x, y, or both so that the relationship becomes more linear. Common transformations include the logarithm and the reciprocal. Including higher-order terms in x may also help to linearize the relationship between x and y.
g1<-g+
geom_smooth(
aes(color='y~log(x)',fill='y~log(x)'),
method = 'lm', formula = y~x,se=FALSE,fullrange=TRUE)
g1
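The effect of the transformation can also be checked numerically. The sketch below uses simulated data in which y really does depend on log(x); the coefficients and noise level are assumptions chosen only for illustration, not estimates from corrupt.csv.

```r
set.seed(1)

# Simulated data: y depends on log(x), plus a little noise
x <- 1:50
y <- 0.2 + 0.15 * log(x) + rnorm(50, sd = 0.01)

# Fit both specifications
fit_lin <- lm(y ~ x)
fit_log <- lm(y ~ log(x))

# Compare how much variance each one explains
r2_lin <- summary(fit_lin)$r.squared
r2_log <- summary(fit_log)$r.squared
c(linear = r2_lin, log = r2_log)
```

When the true relationship is logarithmic, the y ~ log(x) fit explains more of the variance than the straight-line fit.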
Let’s now modify our scales, add scale labels, and modify the legend.
g<-g+
scale_x_continuous(
name='Corruption Perception Index, 2015 (100=least corrupt)',
limits = c(10,95),
breaks = c(20,40,60,80,100),
expand=c(0,0))+
scale_y_continuous(
name='Human Development Index, 2015\n(1.0=most developed)',
limits = c(0.3,1.05),
breaks=c(0.2,0.4,0.6,0.8,1.0),
expand=c(0,0))+
theme(legend.position = 'top',
legend.justification = 'right',
legend.text = element_text(size=9),
legend.box.spacing = unit(0,'pt'))+
guides(fill=guide_legend(
nrow = 1,
override.aes = list(
linetype=c(rep(0,5),1),
shape=c(rep(21,5),NA))))
g
Last, let’s add labels to highlight the countries.
We can use the ggrepel package, whose geom_text_repel() function keeps labels from overlapping.
# install.packages("ggrepel")
library(ggrepel)
# don't assign this to g
# if you do, then simply recreate g by running the "Run All Chunks Above" button
g2<-g+
geom_text_repel(
aes(label=country),
color='black',
size=9/.pt,
point.padding=0.1,
box.padding=.6,
min.segment.length=0,
seed=7654)
g2
Obviously, this is too busy. We have too many labels.
Let’s instead create a vector of countries we want to plot. We can then add a new column that holds the country name only if we want to plot it and nothing ("") otherwise.
country_highlight <- c("Germany", "Norway", "United States", "Greece", "Singapore", "Rwanda", "Russia", "Venezuela", "Sudan", "Iraq", "Ghana", "Niger", "Chad", "Kuwait", "Qatar", "Myanmar", "Nepal", "Chile", "Argentina", "Japan", "China")
corrupt<-corrupt%>%
mutate(
label=if_else (country %in% country_highlight, country, ""))
# wow: %+%
# https://stackoverflow.com/questions/29336964/changing-the-dataset-of-a-ggplot-object
g <- g %+%
corrupt +
geom_text_repel(
aes(label = label),
color = "black",
size = 9/.pt, # font size 9 pt
point.padding = 0.1,
box.padding = .6,
min.segment.length = 0,
seed = 7654)
g
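What does the %+% operator do (see the StackOverflow link)? Why is it necessary in this context?
In ggplot2, %+% replaces the data frame of an existing plot object. It is necessary here because g was built from the earlier version of corrupt, which has no label column; %+% swaps in the updated data so that geom_text_repel() can map label.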
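A minimal sketch of swapping the data behind an existing plot, on made-up data frames (old and new are assumptions for illustration). It follows the %+% usage of the ggplot2 version in this document (3.3.2); newer releases may steer users toward a different spelling.

```r
library(ggplot2)

# Plot built from the original data, which has no label column
old <- data.frame(x = 1:3, y = c(2, 4, 3))
p <- ggplot(old, aes(x, y)) + geom_point()

# Updated data with an extra label column
new <- transform(old, label = c("a", "", "c"))

# %+% keeps the layers and scales but replaces the data
p2 <- p %+% new

names(p$data)
names(p2$data)
```

Layers added to p2 afterwards can map the label column, which is not available in p.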
ggsave('corrupt.pdf', plot = g, width = 8, height = 5)
ggsave("corrupt.pdf", plot = g, width = 10, height = 5)
You now have this plot saved as a PDF. Setting the width and height explicitly will make your life much easier if you need to reproduce this plot (very likely).
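A self-contained sketch of saving a plot with explicit dimensions. The toy plot and the temporary file name are assumptions; passing the plot through the plot argument avoids depending on whichever plot was displayed last.

```r
library(ggplot2)

# Made-up plot for illustration
toy <- data.frame(x = 1:3, y = c(2, 4, 3))
p <- ggplot(toy, aes(x, y)) + geom_point()

# Save to a temporary folder; width and height are in inches by default
out <- file.path(tempdir(), "toy-plot.pdf")
ggsave(out, plot = p, width = 8, height = 5)

file.exists(out)
```

Fixing width and height in the call means the saved figure no longer depends on the size of the plotting window.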
fit.RP<-lm(hdi~cpi,data=corrupt)
summary(fit.RP)
##
## Call:
## lm(formula = hdi ~ cpi, data = corrupt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.28519 -0.06561 0.01051 0.08636 0.20099
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.4311409 0.0213609 20.18 <2e-16 ***
## cpi 0.0060898 0.0004425 13.76 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1085 on 145 degrees of freedom
## Multiple R-squared: 0.5664, Adjusted R-squared: 0.5634
## F-statistic: 189.4 on 1 and 145 DF, p-value: < 2.2e-16
1- There is a cluster below the line, especially for the Sub-Saharan Africa region.
2- Sub-Saharan Africa seems to be the dominant region lowering R^2 and raising SSE.
3- Countries far from the regression line have larger residuals (errors). Countries below the line tend to have the largest ones.
4- Countries above the regression line have smaller residuals, since they lie closer to it.
5- The majority of the Americas and almost all of the Europe and Central Asia countries have small residuals and tend to improve the fit (by increasing R^2).
6- Rather than forming a cluster, as the countries below the regression line do, the countries above the line show a roughly linear spread.
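Observations like these can be checked numerically by averaging the residuals within each region. The sketch below uses a made-up six-row data frame (the values and the region labels A and B are assumptions), not the real corrupt data.

```r
# Made-up data: region A sits low, region B sits high
toy <- data.frame(
  cpi    = c(20, 25, 30, 60, 70, 80),
  hdi    = c(0.40, 0.45, 0.42, 0.80, 0.85, 0.90),
  region = c("A", "A", "A", "B", "B", "B")
)

# Same model form as fit.RP above
fit <- lm(hdi ~ cpi, data = toy)
toy$resid <- resid(fit)

# Mean residual per region: the sign shows above or below the line
mean_resid <- tapply(toy$resid, toy$region, mean)

# Mean absolute residual per region: the size shows distance from the line
mean_abs_resid <- tapply(abs(toy$resid), toy$region, mean)

mean_resid
mean_abs_resid
```

Applied to corrupt with fit.RP, the same two tapply() calls would show which regions sit below the line and how far from it they are on average.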
1- The Sub-Saharan Africa region has the lowest human development index, whereas Europe and Central Asia seems to have the highest.
2- The most corrupt region is Sub-Saharan Africa, whereas corruption is least prevalent in Europe and Central Asia.
3- Although very few Europe and Central Asia countries (2) show significant corruption, even they have a human development index above 0.6. Perhaps corruption there benefits the development of a “selected group”. Since reliable data is hard to obtain in corrupt countries, their human development index data could also be unreliable.
4- Apart from the Sub-Saharan Africa region, most countries have smaller errors and lie close to the regression line.
5- The Americas have a human development index between 0.6 and 0.8.
6- If corruption can be reduced in the Sub-Saharan Africa region, a better human development index is expected.
7- For better linearity (the ideal condition), a higher human development index should go with lower corruption. In the ideal condition, cpi and hdi would be 100 and 1.0, respectively.
8- Europe and Central Asia has the most clearly linear data overall.
# install.packages('tinytex')
# tinytex::install_tinytex()
library(tinytex)