At the end of 2019, Venezuela had a population of 28,435,940. Its five most populous cities are Caracas (the capital), Maracaibo, Valencia, Barquisimeto, and Maracay. A striking 88.3% of the total population lives in urban areas, and 11.7% lives in rural areas.
knitr::include_url("https://rpubs.com/jowens91/700218")
myurl <- "https://docs.google.com/spreadsheets/d/e/2PACX-1vSLjElrCkDSEspbMxViKRLBG9QWvASVDlJNHfPUad63sRj22QLOZloMN163DX2mz0KiNEJjegO6iUut/pub?gid=0&single=true&output=csv"
pops <- read.csv(url(myurl))
myColors <- c("#fcba03", "#032cfc", "#fc1c03", "#a4c478", "#d5aae3")
class(pops$City)
## [1] "character"
pops$City <- as.factor(pops$City)
class(pops$City)
## [1] "factor"
library(ggplot2)
ggplot(pops, aes(x = Year, y = GrowthRate, color = City)) +
  geom_line(lwd = 1.5) +
  # horizontal reference line at a 5% growth rate
  geom_hline(yintercept = 5, lwd = 0.5, color = 'black') +
  labs(title = "Growth Rate Percentage in Venezuela Over Time",
       subtitle = "How does population growth differ by city?") +
  ylab('Growth Rate (%)') +
  xlab('Year') +
  # theme_fivethirtyeight() +   # optional; requires the ggthemes package
  theme(
    axis.title = element_text(size = 20, face = 'bold'),
    axis.text = element_text(size = 14, color = 'black'),
    legend.text = element_text(size = 14, color = 'black'),
    legend.title = element_text(size = 14, face = 'bold')) +
  scale_color_manual(values = myColors)
Tanzanians elected a president last week. He is only the fifth person to hold this office. The first president of Tanzania was a man named Julius Nyerere. During Nyerere’s 21 years as president, he instituted socialism and nationalized most private industries. The economy was in shambles and dependent on foreign aid. Since Nyerere left office in 1985, new presidents have come to power and ‘opened Tanzania for business’. The economy has grown a lot since socialism ended, but Tanzania is still a poor country.
Last week, Tanzanians elected John Magufuli. His nickname is “The Bulldozer”. He is known for his aggressive efforts to halt corruption, to build infrastructure, and for his take-no-prisoners approach to defeating political opponents. Laws were passed during his first term in office that curtailed freedom of the press. The election itself took place during a tense time for Tanzanians. There was some deadly street violence against opposition groups and many regional and district level candidates from opposition parties were removed from voter rolls.
President Magufuli earned an overwhelming majority of the votes – more than 84% according to government sources. The Tanzanian citizens have accepted the election results. The main opposition candidate to Magufuli was Tundu Lissu. In 2017, Mr. Lissu was the victim of an assassination attempt and was shot several times in downtown Dodoma (the nation’s capital). He survived and then went into exile. He returned to Tanzania to stand in the 2020 election. He calls his vote total (13%) a sham.
Many Tanzanians take pride in the massive infrastructure projects that have been executed by Magufuli, who has kept true to his nickname “The Bulldozer”. The radical changes that Magufuli has instituted to reduce corruption have also been widely praised by Tanzanians. But because Tanzania is a relatively poor country, a very pressing issue for citizens of the country is employment, income, and economic growth.
Tanzania’s GDP per capita, expressed in US dollar equivalent, is 1,122 USD per person (for comparison, the US GDP per capita is 65,118 USD). The GDP per capita in Tanzania has just reached a level that qualifies the country to be categorized as a “lower middle-income status” country, departing from its prior status of a “low income country”. This leads us to wonder whether Magufuli’s years in office have been exceptional, by the measure of common economic indicators. In this report we will look at GDP and examine whether GDP growth has been higher during Magufuli’s years in office.
First we need to acquire some data. The data below are provided by the World Bank. Just like the UN, the World Bank has a terrible habit of providing poorly formatted data.
data <- read.csv("API_NY.GDP.PCAP.CD_DS2_en_csv_v2_1593914.csv", stringsAsFactors = FALSE)
# Country.Name must match the World Bank spelling exactly; a string that does not match returns zero rows.
tz_data <- data[data$Country.Name == "Tanzania", ]
head(tz_data)
You’ll notice that these data are organized with columns for individual years. These data are thus not ‘tidy’. In tidy data, the columns are variable names (e.g. ‘GDP’, ‘year’) and the row entries should specify the specific values that each variable takes on. We need to reshape these data, and while these steps are a bit tedious, these processes are essential to wrangling data, something you inevitably must do when using data collected by others.
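To make the wide-versus-tidy distinction concrete, here is a minimal sketch using base R's `stack()`. The data frame `wide` and all of its values are invented for illustration; they are not World Bank figures, and this is only one of several ways to reshape a table.

```r
# Toy 'wide' table: one column per year (values are made up for illustration)
wide <- data.frame(
  country = c("A", "B"),
  X2017 = c(10, 20),
  X2018 = c(11, 22)
)

# stack() collapses the year columns into two columns: 'values' and 'ind'
long <- stack(wide[, c("X2017", "X2018")])

# Rebuild a tidy table: one row per country-year combination
tidy <- data.frame(
  country = rep(wide$country, times = 2),
  year = as.numeric(sub("^X", "", as.character(long$ind))),
  gdp_per_capita = long$values
)
print(tidy)
```

Each variable (`country`, `year`, `gdp_per_capita`) now has its own column, which is the layout that plotting and summary functions generally expect.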
#how many columns are there in the dataset?
ncol(tz_data)
## [1] 65
In this dataset, there are 65 columns. The name of each column is listed below:
print(names(tz_data))
## [1] "Country.Name" "Country.Code" "Indicator.Name" "Indicator.Code"
## [5] "X1960" "X1961" "X1962" "X1963"
## [9] "X1964" "X1965" "X1966" "X1967"
## [13] "X1968" "X1969" "X1970" "X1971"
## [17] "X1972" "X1973" "X1974" "X1975"
## [21] "X1976" "X1977" "X1978" "X1979"
## [25] "X1980" "X1981" "X1982" "X1983"
## [29] "X1984" "X1985" "X1986" "X1987"
## [33] "X1988" "X1989" "X1990" "X1991"
## [37] "X1992" "X1993" "X1994" "X1995"
## [41] "X1996" "X1997" "X1998" "X1999"
## [45] "X2000" "X2001" "X2002" "X2003"
## [49] "X2004" "X2005" "X2006" "X2007"
## [53] "X2008" "X2009" "X2010" "X2011"
## [57] "X2012" "X2013" "X2014" "X2015"
## [61] "X2016" "X2017" "X2018" "X2019"
## [65] "X2020"
The first four column names are relatively straightforward. The next 61 column names all begin with X. The reason is that R has automatically prepended an “X” to each column name in the underlying .csv document that starts with a digit. R does not like column names in a dataframe to begin with a number. This is just a feature of R that you have to learn to live with.
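If you want to see this renaming in isolation, the short sketch below reproduces it on a tiny in-memory CSV. The CSV text is invented for the example; `make.names()` and the `check.names` argument of `read.csv()` are standard base R.

```r
# make.names() turns arbitrary strings into syntactically valid R names
fixed <- make.names(c("1960", "Country Name"))
print(fixed)

# A tiny in-memory CSV (invented data) read with and without name checking
csv_text <- "Country,1960,1961\nA,1,2"
checked_names <- names(read.csv(text = csv_text))
raw_names <- names(read.csv(text = csv_text, check.names = FALSE))
print(checked_names)
print(raw_names)
```

Reading with `check.names = FALSE` keeps the original headers, but columns named like numbers then have to be quoted with backticks whenever you use `$`, so stripping the X afterwards is often the more comfortable route.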
Our general strategy will be to create new empty vectors to hold the data we want, and then assemble these into the dataframe that we will use to make the analysis and plot. We would like to have a dataframe in which there is a column for “gdp_per_capita” and a column for “year”, and the values stored accordingly.
This is a common kind of data processing task that you will surely encounter when working with R. The first thing we will do is extract the years. Then we will extract the GDP per capita values.
#create empty vectors with one slot per year column (columns 5 through 65 = 61 years)
year_entries <- rep(NA, 61)
gdp_entries <- rep(NA, 61)
#this line is telling R to report the column names from the raw dataframe
current_column_names <- colnames(tz_data)
#this grabs only the column names for columns 5 through 65 (the years)
selected_column_names <- current_column_names[5:65]
#this function 'substr' is used to extract a 'sub-string', that is, a subset of the total word,
#defined by start and end character numbers.
#This eliminates the annoying 'X' from the start of each year.
selected_column_names <- substr(selected_column_names, start = 2,stop = 5)
year_entries <- selected_column_names
#this next step is to force R to treat these entries (years) as numeric values, rather than text or factors.
year_entries <- as.numeric(year_entries)
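As a side note, here is a small sketch comparing the `substr()` approach with a regular-expression alternative, `sub()`. The vector `column_names` is a made-up stand-in for the real column names.

```r
column_names <- c("X1960", "X1961", "X2020")   # example names in the World Bank style

# Option 1: keep characters 2 through 5, as in the code above
years_substr <- as.numeric(substr(column_names, start = 2, stop = 5))

# Option 2: delete a leading "X", however many characters follow it
years_sub <- as.numeric(sub("^X", "", column_names))

print(identical(years_substr, years_sub))
```

Both give the same numeric years here. The `sub()` version does not depend on every year being exactly four characters long.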
The next task is to extract the values for GDP per capita that were provided by the World Bank. In order to grab those values, we need to access the values that are in row 1, columns 5 through 65. With R, it is easy to grab chunks of data like this from a dataframe, using index values for the rows and the columns. The only tricky part is remembering which comes first. Do I list the numbers for the ROWS first, or the COLUMNS first? My mnemonic for remembering this is that in the language R, Rows come first. Get it? R? Rows?
# When using index values to access data in an R dataframe, you must specify
# the ROWS and then the COLUMNS, separated by a comma. Each can be a single value or a series of values.
# If you leave the row index empty, that means "give me all the rows";
# if you leave the column index empty, that means "give me all the columns".
# The code here says "give me all the rows, and columns 5 through 65".
# Since there is only one row in the dataset (TZ data), this is the same as asking for row 1, columns 5 through 65.
gdp_entries <- tz_data[,5:65]
#the next line 'strips away' the fact that data frame column names are still attached to this row of data.
# This turns the data into a simple vector of values, like we want.
gdp_entries <- as.numeric(gdp_entries)
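Here is a minimal sketch of the rows-then-columns rule on a toy data frame. The data frame `toy` and its values are invented for illustration, and `unlist()` is shown as one alternative way to flatten a single row into a vector.

```r
toy <- data.frame(name = c("a", "b"), X2018 = c(1.5, 2.5), X2019 = c(3.5, 4.5))

one_row <- toy[1, 2:3]    # row 1, columns 2 and 3: still a one-row data frame
all_rows <- toy[, 2:3]    # row index left blank: all rows, columns 2 and 3
all_cols <- toy[2, ]      # column index left blank: row 2, all columns

# unlist() flattens a one-row data frame into a plain vector
row_values <- unlist(toy[1, 2:3], use.names = FALSE)
print(row_values)
```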
Now that we have those data, we can assemble them into a new data frame useful for plotting:
tz_data_2 <- data.frame("gdp_per_capita"=gdp_entries, "year"=year_entries)
head(tz_data_2)
## gdp_per_capita year
## 1 NA 1960
## 2 NA 1961
## 3 NA 1962
## 4 NA 1963
## 5 NA 1964
## 6 NA 1965
Inspecting the data, I see that GDP per capita values are not provided prior to 1988. Tanzania was a socialist country until 1985 and the relevant data were probably not being collected prior to that period. So we need to filter out the data that are prior to 1988.
tz_data_2 <- tz_data_2[tz_data_2$year>=1988,]
We also need to remove the year 2020, because those data are not yet available.
tz_data_2 <- tz_data_2[tz_data_2$year!=2020,]
Let’s plot the data across time, and just “eyeball” GDP growth during Magufuli’s years relative to former presidents of Tanzania. Tanzanian elections occur every 5 years, and the new president is sworn in at the start of the new year. In the lines below, I hard-code the start of each presidential term since 1988.
plot(x=tz_data_2$year, y=tz_data_2$gdp_per_capita, type="l", xlab="year", ylab="GDP per person in USD", ylim=c(0,1200))
min_gdp <- min(tz_data_2$gdp_per_capita)
max_gdp <- max(tz_data_2$gdp_per_capita)
mwinyi_reelect <- 1991
mkapa_start <- 1996
mkapa_reelect <- 2001
kikwete_start <- 2006
kikwete_reelect <- 2011
magufuli_start <- 2016
lines(x=c(mwinyi_reelect,mwinyi_reelect), y=c(min_gdp, max_gdp), col="orange", lty=2)
lines(x=c(mkapa_start,mkapa_start), y=c(min_gdp, max_gdp), col="red", lty=2)
lines(x=c(mkapa_reelect,mkapa_reelect), y=c(min_gdp, max_gdp), col="red", lty=2)
lines(x=c(kikwete_start,kikwete_start), y=c(min_gdp, max_gdp), col="yellow", lty=2)
lines(x=c(kikwete_reelect,kikwete_reelect), y=c(min_gdp, max_gdp), col="yellow", lty=2)
lines(x=c(magufuli_start,magufuli_start), y=c(min_gdp, max_gdp), col="green", lty=2)
text(x=mwinyi_reelect, y=1180, "Mwinyi 2")
text(x=mkapa_start, y=1180, "Mkapa 1")
text(x=mkapa_reelect, y=1180, "Mkapa 2")
text(x=kikwete_start, y=1180, "Kikwete 1")
text(x=kikwete_reelect, y=1180, "Kikwete 2")
text(x=magufuli_start, y=1180, "Magufuli 1")
Here we look at the data.
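The vertical marker lines can also be drawn in one call with `abline()`. The sketch below uses a made-up series rather than the real GDP data, and the vector name `term_starts` is introduced here only for illustration.

```r
years <- 1988:2019
values <- seq(200, 1100, length.out = length(years))   # made-up series, not real GDP data
term_starts <- c(1991, 1996, 2001, 2006, 2011, 2016)   # the term-start years hard-coded above

plot(years, values, type = "l", xlab = "year", ylab = "GDP per person in USD", ylim = c(0, 1200))
# v = draws one vertical line per value; col is matched to the lines in order
abline(v = term_starts, lty = 2, col = c("orange", "red", "red", "yellow", "yellow", "green"))
```

One difference to keep in mind: `abline()` spans the full height of the plot, whereas the `lines()` calls above stop at the minimum and maximum GDP values.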
Now let’s calculate the change in GDP that has occurred each year in this data series, and then see how the years under Magufuli stack up.
tz_data_2$change_in_gdp <- NA #start with empty values
tz_data_2$percent_change_gdp <- NA #start with empty values here too
#you can't have a change value for the first year in the series, and so row 1 stays NA.
#after the first year, we can calculate the change relative to the prior year.
#We do this by subtracting the values in rows 1 through 31 (the years 1988 through 2018) from the values of rows 2 through 32 (the years 1989 through 2019).
tz_data_2$change_in_gdp[2:32] <- tz_data_2$gdp_per_capita[2:32]-tz_data_2$gdp_per_capita[1:31]
#dividing the absolute change by the prior year's GDP (rows 1 through 31, the years 1988 through 2018) gives the change as a proportion; multiply by 100 to express it as a percentage.
tz_data_2$percent_change_gdp[2:32] <- tz_data_2$change_in_gdp[2:32]/tz_data_2$gdp_per_capita[1:31]
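The same year-over-year calculation can be sketched with base R's `diff()`, which avoids hard-coding row numbers. The vector `gdp` below holds made-up values, not the Tanzanian series.

```r
gdp <- c(100, 110, 99, 120)   # made-up values for illustration

# diff() returns each value minus the one before it; pad with NA for the first year
change <- c(NA, diff(gdp))

# divide by the prior year's value (head(gdp, -1) drops the last element)
proportional_change <- change / c(NA, head(gdp, -1))
print(proportional_change)
```

Because nothing here depends on the series having exactly 32 rows, this version keeps working if the year filter changes.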
For our summary exercise, we’d like to address the question of whether GDP growth has been exceptional under Magufuli or not. We have already calculated the values of GDP change per year. Now let’s code each year in the series as being during Magufuli’s presidency or not:
tz_data_2$is_magufuli <- tz_data_2$year >= 2016
gdp_percent_change_magufuli <- tz_data_2$percent_change_gdp[tz_data_2$is_magufuli]
gdp_percent_change_not_magufuli <- tz_data_2$percent_change_gdp[!tz_data_2$is_magufuli]
gdp_percent_change_not_magufuli <-gdp_percent_change_not_magufuli[!is.na(gdp_percent_change_not_magufuli)]
You should now be ready to address the summary exercises.
Please insert below in this Rmarkdown document a boxplot displaying side-by-side, the distribution of yearly changes in GDP during Magufuli’s years and in non-Magufuli years.
Then write text (one or two sentences) that reports, using inline R calculated values:
Then answer, using your eyeball appraisal (not a formal statistical test): has the yearly change in GDP under Magufuli been exceptional? If so, in what way?
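If you have not drawn a side-by-side boxplot before, here is a generic sketch of the mechanics. The numbers and the names `changes`, `is_recent`, and `groups` are all invented; swap in your own vectors for the exercise.

```r
# Made-up yearly changes, for illustration only
changes <- c(0.02, 0.05, -0.01, 0.04, 0.03, 0.06, 0.01, 0.02)
is_recent <- c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)

# split() groups the values by label; boxplot() draws one box per group
groups <- split(changes, ifelse(is_recent, "recent", "earlier"))
boxplot(groups, ylab = "Yearly change (proportion)")

# per-group medians, the kind of value you might report in your text
group_medians <- sapply(groups, median)
print(group_medians)
```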
Try ‘knitting’ this RMarkdown document using different themes. Look at the header of this document where you should see
theme: united
Try changing that to cerulean and re-Knitting the document. Then try some of the other themes, such as cosmo or lumen.