Today we are going to get familiar with using data from various sources. Since we have not covered time series models yet, we are going to use level or growth data from a single year for each available country. The databases and the variables that we are going to use are presented below.

Outline

Importing Data from a .csv File

Data Source: World Bank

https://databank.worldbank.org/source/world-development-indicators#

Variables: GDP (constant 2010 US$), Government expenditure on education, total (% of GDP), Gross fixed capital formation (% of GDP), Gross domestic savings (% of GDP), Current account balance (% of GDP), ICT goods imports (% total goods imports)

Importing Data from an .xlsx File

Data Source: International Monetary Fund

https://data.imf.org/?sk=7A51304B-6426-40C0-83DD-CA473CA1FD52&sId=1542633711584

Variables: Net International Investment Position

The first dataset I am going to import is a .csv file that I obtained from the World Bank's World Development Indicators database. I chose 2017 as the year of analysis for convenience, together with the variables listed above. You may see these variables as arbitrary indicators; however, we are going to come up with models that can be related to the characteristics of an economy.

Options for downloading World Bank data

The screenshot above shows some of the file formats available when downloading data from the World Bank. CSV is a simple file format used to store tabular data, such as a spreadsheet or database table. It stands for "comma-separated values". The Excel choice produces an .xlsx file. A tab-delimited text file uses tabs to separate fields, with one record per line; it is often used to upload data to a system, and its extension is .txt. Both in this exercise and in my own projects I find .csv the most convenient format, hence I prefer it.

When we import a dataset into R we have two options. The first is the Import Dataset option in the File menu of RStudio. Its location can be seen below.

Import Dataset function

Among these options, readr refers to R's Read Rectangular Text Data package, which provides a fast and friendly way to read rectangular data (like 'csv', 'tsv', and 'fwf'). I use that choice for .csv files, and when we click on that option it produces the box below.

From text(readr)

While importing the data into R it is crucial to specify the type of each column we are going to work with. The last column of the dataset holds the actual data we are going to deal with, hence it is wise to import it as a numeric vector. Once we press the Import button in the box, RStudio generates the corresponding code and executes it in the console. Running that code ourselves is the second way to import a .csv file into the R environment. The command is presented below.

  library(readr)  # provides read_csv()
  WorldBank_Data <- read_csv("Downloads/Data_Extract_From_World_Development_Indicators (3)/c7634d3f-2c5f-4223-a6ba-f6368f86bbfc_Data.csv")
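The point about column types can also be handled directly in the command rather than in the import box. The sketch below is only an illustration: it writes a tiny made-up file to a temporary location, and it assumes that the export has four label columns followed by a `2017 [YR2017]` column and that missing values are marked with `..`. Check both assumptions against your own download.

```r
library(readr)

# Write a tiny stand-in for a World Bank extract to a temporary file.
# Country names, codes and numbers are invented for illustration.
sample_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Country Name,Country Code,Series Name,Series Code,2017 [YR2017]",
  "Country A,AAA,GDP (constant 2010 US$),CODE1,1500.5",
  "Country A,AAA,Gross domestic savings (% of GDP),CODE2,..",
  "Country B,BBB,GDP (constant 2010 US$),CODE1,2750"
), sample_path)

# Read every column as text except the year column, which is read as numeric.
# Treating ".." as missing is an assumption about the export format.
sample_data <- read_csv(
  sample_path,
  col_types = cols(
    .default = col_character(),
    `2017 [YR2017]` = col_double()
  ),
  na = c("", "NA", "..")
)

str(sample_data$`2017 [YR2017]`)
```

Declaring the type up front means a stray text marker in the data column turns into `NA` instead of silently converting the whole column to character.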

Importing from an Excel file is almost identical to importing a .csv file. To demonstrate this we are going to work with the International Investment Position (IIP) data from the IMF website. After importing both datasets we can condense them into a single dataset. Let the name of the new dataset be country_level_indicators.
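Before building that dataset, here is a minimal sketch of what the Excel import might look like with `read_excel()` from the readxl package. The helper name `read_iip`, the object name `IIP_Data` and the file path are all hypothetical placeholders, since the actual file name of the IMF download is not given here.

```r
# Hypothetical helper: read one sheet of an Excel workbook if the file exists.
read_iip <- function(path, sheet = 1) {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(NULL)
  }
  readxl::read_excel(path, sheet = sheet)
}

# "Downloads/IIP_data.xlsx" is a placeholder, not the actual file name.
IIP_Data <- read_iip("Downloads/IIP_data.xlsx")
```

If the sheet begins with title rows above the column headers, the `skip` argument of `read_excel()` can be used to drop them. The code that follows builds country_level_indicators from the World Bank data.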

# The extract has 7 rows per country, one per indicator. Column 1 holds the
# country name and column 5 holds the 2017 value.
country_level_indicators <- data.frame()
for (i in 1:217) {
  country_level_indicators[i, 1] <- WorldBank_Data[7 * i - 6, 1]  # country name
  country_level_indicators[i, 2] <- WorldBank_Data[7 * i - 6, 5]  # 1st indicator
  country_level_indicators[i, 3] <- WorldBank_Data[7 * i - 5, 5]
  country_level_indicators[i, 4] <- WorldBank_Data[7 * i - 4, 5]
  country_level_indicators[i, 5] <- WorldBank_Data[7 * i - 3, 5]
  country_level_indicators[i, 6] <- WorldBank_Data[7 * i - 2, 5]
  country_level_indicators[i, 7] <- WorldBank_Data[7 * i - 1, 5]
  country_level_indicators[i, 8] <- WorldBank_Data[7 * i, 5]      # 7th indicator
}
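The index arithmetic in the loop above can be checked on a toy dataset. The sketch below uses made-up data with 2 countries and 3 indicators instead of 217 and 7, so row `(i - 1) * 3 + j` plays the role that rows `7 * i - 6` through `7 * i` play in the real extract. It shows that an inner loop over the indicator index can replace the repeated assignments, and that filling a matrix row by row gives the same result without any loop. All object names here are hypothetical.

```r
# Toy long-format data: 2 countries x 3 indicators, one row per pair.
# All names and numbers are made up for illustration.
long <- data.frame(
  country = rep(c("A", "B"), each = 3),
  series  = rep(c("x", "y", "z"), times = 2),
  value   = c(1, 2, 3, 10, 20, 30),
  stringsAsFactors = FALSE
)
n_countries <- 2
n_series <- 3

# Nested loop: row (i - 1) * n_series + j holds series j of country i.
wide_loop <- data.frame(
  country = character(n_countries),
  matrix(NA_real_, n_countries, n_series),
  stringsAsFactors = FALSE
)
for (i in seq_len(n_countries)) {
  wide_loop$country[i] <- long$country[(i - 1) * n_series + 1]
  for (j in seq_len(n_series)) {
    wide_loop[i, j + 1] <- long$value[(i - 1) * n_series + j]
  }
}

# Vectorized equivalent: fill a matrix row by row.
wide_vec <- data.frame(
  country = long$country[seq(1, by = n_series, length.out = n_countries)],
  matrix(long$value, ncol = n_series, byrow = TRUE),
  stringsAsFactors = FALSE
)

# Give both results the same descriptive column names.
names(wide_loop) <- c("country", "x", "y", "z")
names(wide_vec)  <- c("country", "x", "y", "z")

all.equal(wide_loop, wide_vec)
```

Both versions rely on every country having exactly the same number of rows in the same order, which is worth verifying before applying either one to a real extract.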

names(country_level_indicators)[names(country_level_indicators) == "2017 [YR2017]" ] <- "Real_GDP"

names(country_level_indicators)[names(country_level_indicators) == "2017 [YR2017].1" ] <- "Government Expenditure"

names(country_level_indicators)[names(country_level_indicators) == "2017 [YR2017].2" ] <- "Education Expenditure"

names(country_level_indicators)[names(country_level_indicators) == "2017 [YR2017].3" ] <- "Gross Fixed Capital Formation"

names(country_level_indicators)[names(country_level_indicators) == "2017 [YR2017].4" ] <- "Gross Domestic Savings"

names(country_level_indicators)[names(country_level_indicators) == "2017 [YR2017].5" ] <- "Current Account Balance"

names(country_level_indicators)[names(country_level_indicators) == "2017 [YR2017].6" ] <- "ICT goods imports"

The chunk above extracts each variable individually and condenses them into one data frame. The commands that start with names change the column names accordingly. We used a for loop to pick the variables apart. A for loop executes the same commands for each value in the range specified in the loop. Written out by hand, this task would take 217 × 8 = 1,736 separate assignments; the loop trims it down to eight. If we added a second, nested for loop over the column index, we would be able to pack these eight assignments into a single line.

We have dealt with structuring our dataset so far. Now we can proceed with the construction of a few models on this dataset, choosing the variables ourselves. You can also try to come up with your own analysis.

arbitrary_model2 <- lm(`Gross Fixed Capital Formation` ~ `Gross Domestic Savings`, data = country_level_indicators)

arbitrary_model3 <- lm(`Gross Fixed Capital Formation` ~ `Gross Domestic Savings` - 1, data = country_level_indicators)

arbitrary_model4 <- lm(`Gross Fixed Capital Formation` ~ `Gross Domestic Savings` + `Current Account Balance`, data = country_level_indicators)

arbitrary_model5 <- lm(`Gross Fixed Capital Formation` ~ `Gross Domestic Savings` + `Current Account Balance` - 1, data = country_level_indicators)

arbitrary_model6 <- lm(`Education Expenditure` ~ `ICT goods imports`, data = country_level_indicators)

arbitrary_model7 <- lm(`Education Expenditure` ~ `ICT goods imports` - 1, data = country_level_indicators)
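The models above come in pairs that differ only by the `- 1` at the end of the formula, which removes the intercept so that the fitted line is forced through the origin. A minimal sketch of the difference, using the `mtcars` dataset that ships with R as a stand-in for the country-level data:

```r
# mtcars ships with R; it stands in for the country-level data here.
with_intercept <- lm(mpg ~ wt, data = mtcars)
through_origin <- lm(mpg ~ wt - 1, data = mtcars)

coef(with_intercept)  # two coefficients: (Intercept) and wt
coef(through_origin)  # one coefficient: wt only
```

Dropping the intercept changes the slope estimate as well, so the two versions of each model should be read as different specifications rather than as the same model with one term hidden.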

We can add another variable to our dataset, which is $\frac{\text{Government Expenditure}}{\text{Real GDP}}$.

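# Government expenditure divided by real GDP, stored under the name used in the
# regressions below. Division is vectorized, so no loop is needed.
country_level_indicators$`Gov. Exp/Real_GDP` <-
  country_level_indicators$`Government Expenditure` / country_level_indicators$Real_GDP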

Since we have added our new variable, government expenditure as a share of GDP, we can incorporate it into our new regressions.

arbitrary_model8 <- lm(`Education Expenditure` ~ `Gov. Exp/Real_GDP` + I(`Gross Fixed Capital Formation` / Real_GDP), data = country_level_indicators)

arbitrary_model9 <- lm(`Education Expenditure` ~ `Gov. Exp/Real_GDP` + I(`Gross Fixed Capital Formation` / Real_GDP) - 1, data = country_level_indicators)

So far we have structured a dataset and created eight different models (arbitrary_model2 through arbitrary_model9). Please check out the summaries of the models and comment on whichever you liked, or on what other regressions we could set up.
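As a starting point for that comparison, the sketch below shows how the pieces of a model summary can be pulled out individually. It again uses the built-in `mtcars` data rather than the country-level dataset, and the names `fit` and `fit_summary` are only illustrative.

```r
fit <- lm(mpg ~ wt, data = mtcars)
fit_summary <- summary(fit)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
fit_summary$coefficients

# Goodness-of-fit measures
fit_summary$r.squared
fit_summary$adj.r.squared
```

When comparing the models above, keep in mind that for a model without an intercept R computes R-squared against an uncentered total sum of squares, so the R-squared of a `- 1` model is not directly comparable with that of its counterpart with an intercept.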

An Exercise on Data Import

Arda Yalcin

10/11/2019