# load data

Research question

You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.

Perhaps the only thing all practitioners of macroeconomics agree on is that macroeconomics is hard. In particular, Friedman's and Keynes's views on the government's role in productivity and other economic outputs define the battle between the right and the left, respectively.

Using data from the OECD Factbook 2013, we want to investigate how the average worker’s tax rate is associated with macroeconomic output variables such as revenue, productivity, inequality, and unemployment.

Very reductive versions of the opposing camps' theories are:

Left wing: Increasing the average tax rate allows for spending on infrastructure, which in turn can employ people and provide for basic needs, which directly affects income inequality. Forced liquidity means more money is circulating rather than sitting in savings accounts.

Right wing: Increasing the tax burden gives people an incentive to remain unemployed. Decreasing it puts more money directly into the economy where it is needed, increasing economic efficiency. Improved efficiency means that a given marginal tax rate results in more government revenue.

Cases

What are the cases, and how many are there? The cases are the 34 individual countries measured in the Factbook. Each data point is a country.

Data collection

Describe the method of data collection.

Type of study

What type of study is this (observational/experiment)?

This is an observational study; we only report observed correlations.

Data Source

If you collected the data, state self-collected. If not, provide a citation/link.

The data are from the OECD Factbook here:

http://www.oecd-ilibrary.org/economics/oecd-factbook_18147364

You can view the data nicely here:

Inequality: https://www.google.com/publicdata/explore?ds=ltjib1m1uf3pf_&ctype=l&met_y=taxapw_t1#!ctype=b&strail=true&bcs=d&nselm=s&met_x=taxapw_t1&scale_x=lin&ind_x=false&met_y=incinequal_t1a&scale_y=lin&ind_y=false&met_s=taxapw_t1&scale_s=lin&ind_s=false&idim=country:DNK:EST:GBR:USA:JPN:LUX:NLD&ifdim=country:country_group:oecd&pit=1318564800000&hl=en_US&dl=en_US&ind=false

Unemployment: https://www.google.com/publicdata/explore?ds=ltjib1m1uf3pf_&ctype=l&met_y=taxapw_t1#!ctype=b&strail=false&bcs=d&nselm=s&met_x=taxapw_t1&scale_x=lin&ind_x=false&met_s=taxapw_t1&scale_s=lin&ind_s=false&met_y=ltunemp_t1&scale_y=lin&ind_y=false&idim=country:DNK:EST:GBR:USA:JPN:LUX:NLD&ifdim=country:country_group:oecd&pit=1318564800000&hl=en_US&dl=en_US&ind=false

Productivity: https://www.google.com/publicdata/explore?ds=ltjib1m1uf3pf_&ctype=l&met_y=taxapw_t1#!ctype=b&strail=false&bcs=d&nselm=s&met_x=taxapw_t1&scale_x=lin&ind_x=false&met_s=taxapw_t1&scale_s=lin&ind_s=false&met_y=prodincom_g1&scale_y=lin&ind_y=false&idim=country:DNK:EST:GBR:USA:JPN:LUX:NLD&ifdim=country:country_group:oecd&pit=1318564800000&hl=en_US&dl=en_US&ind=false

Government Revenue: https://www.google.com/publicdata/explore?ds=ltjib1m1uf3pf_&ctype=l&met_y=taxapw_t1#!ctype=b&strail=false&bcs=d&nselm=s&met_x=taxapw_t1&scale_x=lin&ind_x=false&met_s=taxapw_t1&scale_s=lin&ind_s=false&met_y=govdefct_t2&scale_y=lin&ind_y=false&idim=country:DNK:EST:GBR:USA:JPN:LUX:NLD&ifdim=country:country_group:oecd&pit=1318564800000&hl=en_US&dl=en_US&ind=false

Response

What is the response variable, and what type is it (numerical/categorical)?

The response variables are economic indicators: GDP per hour worked, the Gini income inequality coefficient, government revenue per capita, and the long-term unemployment rate. They are all numerical.
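As a rough sketch of how each response could be related to the tax rate, the snippet below computes pairwise Pearson correlations. To stay self-contained it rebuilds only the six rows printed by `head(tax)` later in this document, so `tax_head` is an illustrative stand-in rather than the full 34-country data set, and the resulting values say nothing reliable about the real relationship.

```r
# Illustrative subset: the six rows shown in the head(tax) output.
tax_head <- data.frame(
  Country = c("Chile", "New Zealand", "Mexico", "Israel", "Korea", "Switzerland"),
  Avg_tax = c(7.000, 15.869, 16.155, 19.821, 20.344, 20.998),
  Prod    = c(26.02053, 39.79061, 20.22543, 37.49444, 31.86053, 60.78066),
  Rev     = c(37.7, 39.7, 24.5, 37.7, 33.2, 33.5),
  Gini    = c(0.503, 0.333, 0.457, 0.371, 0.307, 0.285),
  Unemp   = c(NA, 8.9943343, 2.0353722, 20.2281920, 0.3860101, 38.8070303)
)

# Correlate the explanatory variable with each response,
# dropping country pairs where either value is missing.
responses <- c("Prod", "Rev", "Gini", "Unemp")
cors <- sapply(responses, function(v) {
  cor(tax_head$Avg_tax, tax_head[[v]], use = "complete.obs")
})
print(round(cors, 2))
```

With the full data, the same call on `tax` instead of `tax_head` would give one correlation per response variable.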

Explanatory

What is the explanatory variable, and what type is it (numerical/categorical)?

The explanatory variable is the taxation rate on the average worker. Naively this is a continuous numerical variable, and I think for many applications it is. But in such a complex and noisy field, there may not be a simple linear, continuous effect. Instead, differences in tax policy may be easier to detect by partitioning the independent variable into two or more ranges or categories and comparing means among the groups.
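The partitioning idea can be sketched with `cut()` and `tapply()`. This is only an illustration under stated assumptions: `tax_head` holds the six rows printed by `head(tax)` in this document, and the single cut point at 18% is a hypothetical choice, not one taken from the analysis.

```r
# Illustrative subset: tax rate and productivity from the head(tax) output.
tax_head <- data.frame(
  Country = c("Chile", "New Zealand", "Mexico", "Israel", "Korea", "Switzerland"),
  Avg_tax = c(7.000, 15.869, 16.155, 19.821, 20.344, 20.998),
  Prod    = c(26.02053, 39.79061, 20.22543, 37.49444, 31.86053, 60.78066)
)

# Hypothetical cut point: split countries into "low" and "high" tax groups.
breaks <- c(0, 18, Inf)
tax_head$Tax_group <- cut(tax_head$Avg_tax, breaks = breaks,
                          labels = c("low", "high"))

# Mean, SD, and sample size of productivity within each group.
group_mean <- tapply(tax_head$Prod, tax_head$Tax_group, mean, na.rm = TRUE)
group_sd   <- tapply(tax_head$Prod, tax_head$Tax_group, sd, na.rm = TRUE)
group_n    <- tapply(tax_head$Prod, tax_head$Tax_group,
                     function(x) sum(!is.na(x)))

group_stats <- data.frame(group = names(group_mean),
                          mean = as.numeric(group_mean),
                          sd = as.numeric(group_sd),
                          n = as.integer(group_n))
print(group_stats)
```

With two groups, the means could then be compared with `t.test()`; with three or more, `aov()` is one option.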

Relevant summary statistics

Provide summary statistics relevant to your research question. For example, if you’re comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

I cleaned the data and put it into a CSV file for analysis. I’ll read it from GitHub:

# Read the cleaned OECD data directly from GitHub
tax <- read.csv("https://raw.githubusercontent.com/scottogden10/606/master/606Proj.csv")
head(tax)
##       Country Avg_tax     Prod  Rev  Gini      Unemp
## 1       Chile   7.000 26.02053 37.7 0.503         NA
## 2 New Zealand  15.869 39.79061 39.7 0.333  8.9943343
## 3      Mexico  16.155 20.22543 24.5 0.457  2.0353722
## 4      Israel  19.821 37.49444 37.7 0.371 20.2281920
## 5       Korea  20.344 31.86053 33.2 0.307  0.3860101
## 6 Switzerland  20.998 60.78066 33.5 0.285 38.8070303
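Since Chile's `Unemp` value is `NA` in the rows above, it may be worth counting missing values per column before summarising. The sketch below does this on a data frame rebuilt from those six printed rows (`tax_head` is an illustrative stand-in for the full `tax` data frame).

```r
# Illustrative subset: the six rows printed by head(tax) above.
tax_head <- data.frame(
  Country = c("Chile", "New Zealand", "Mexico", "Israel", "Korea", "Switzerland"),
  Avg_tax = c(7.000, 15.869, 16.155, 19.821, 20.344, 20.998),
  Prod    = c(26.02053, 39.79061, 20.22543, 37.49444, 31.86053, 60.78066),
  Rev     = c(37.7, 39.7, 24.5, 37.7, 33.2, 33.5),
  Gini    = c(0.503, 0.333, 0.457, 0.371, 0.307, 0.285),
  Unemp   = c(NA, 8.9943343, 2.0353722, 20.2281920, 0.3860101, 38.8070303)
)

# Number of missing values in each column.
na_counts <- colSums(is.na(tax_head))
print(na_counts)

# Keep only countries with no missing values in any column.
complete_rows <- tax_head[complete.cases(tax_head), ]
print(nrow(complete_rows))
```

Dropping incomplete rows is only one choice; passing `na.rm = TRUE` per variable, as the summaries below do, keeps more countries in each individual statistic.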

Let's look at some histograms and summary statistics to see how the variables are distributed, since the distributions can affect our tests:

Independent Variable

Taxes

library(fBasics)
## Loading required package: timeDate
## Loading required package: timeSeries
## 
## Rmetrics Package fBasics
## Analysing Markets and calculating Basic Statistics
## Copyright (C) 2005-2014 Rmetrics Association Zurich
## Educational Software for Financial Engineering and Computational Science
## Rmetrics is free software and comes with ABSOLUTELY NO WARRANTY.
## https://www.rmetrics.org --- Mail to: info@rmetrics.org
hist(tax$Avg_tax)
summary(tax$Avg_tax)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.00   29.84   37.58   35.20   42.61   55.55
sd(tax$Avg_tax)
## [1] 11.14545
skewness(tax$Avg_tax,na.rm=TRUE)
## [1] -0.5180147
## attr(,"method")
## [1] "moment"

Relatively well behaved, a slight left skew.

Productivity

hist(tax$Prod)
summary(tax$Prod)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   20.23   35.33   50.51   49.23   63.26   92.50       1
sd(tax$Prod,na.rm=TRUE)
## [1] 16.9115
skewness(tax$Prod,na.rm=TRUE)
## [1] 0.4867807
## attr(,"method")
## [1] "moment"

Not quite normal, with a right skew.

Long Term Unemployment

hist(tax$Unemp)
summary(tax$Unemp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.386  22.640  33.450  32.880  44.210  63.900       1
sd(tax$Unemp,na.rm=TRUE)
## [1] 16.11931
skewness(tax$Unemp,na.rm=TRUE)
## [1] -0.1257576
## attr(,"method")
## [1] "moment"

Pretty nicely behaved!

Gini Income Inequality

hist(tax$Gini)
summary(tax$Gini)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2490  0.2745  0.3050  0.3146  0.3372  0.5030
sd(tax$Gini,na.rm=TRUE)
## [1] 0.05822181
skewness(tax$Gini,na.rm=TRUE)
## [1] 1.345334
## attr(,"method")
## [1] "moment"

Large right skew, but nicely behaved (no obvious discontinuities).

Revenue

hist(tax$Rev)
summary(tax$Rev)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   24.50   37.70   39.70   42.01   47.40   58.40       1
sd(tax$Rev,na.rm=TRUE)
## [1] 7.786662
skewness(tax$Rev,na.rm=TRUE)
## [1] 0.2067908
## attr(,"method")
## [1] "moment"

Not bad! Slight right skew.
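The per-variable summaries above could also be collected into a single table. The sketch below is one way to do that in base R, again on a data frame rebuilt from the six rows printed by `head(tax)`, so its numbers will not match the full-data output. The helpers `moment_skewness()` and `describe()` are hypothetical names; the skewness here uses population moments (dividing by n), which is an assumption and may differ slightly from what `fBasics::skewness()` reports.

```r
# Illustrative subset: the six rows printed by head(tax).
tax_head <- data.frame(
  Avg_tax = c(7.000, 15.869, 16.155, 19.821, 20.344, 20.998),
  Prod    = c(26.02053, 39.79061, 20.22543, 37.49444, 31.86053, 60.78066),
  Rev     = c(37.7, 39.7, 24.5, 37.7, 33.2, 33.5),
  Gini    = c(0.503, 0.333, 0.457, 0.371, 0.307, 0.285),
  Unemp   = c(NA, 8.9943343, 2.0353722, 20.2281920, 0.3860101, 38.8070303)
)

# Moment-based skewness: third central moment over the
# second central moment raised to the power 3/2 (NAs dropped).
moment_skewness <- function(x) {
  x <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)
  m3 <- mean((x - mean(x))^3)
  m3 / m2^1.5
}

# Sample size, mean, SD, and skewness for one numeric vector.
describe <- function(x) {
  c(n    = sum(!is.na(x)),
    mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE),
    skew = moment_skewness(x))
}

# One row per variable, one column per statistic.
stats_table <- t(sapply(tax_head, describe))
print(round(stats_table, 3))
```

Applied to the numeric columns of the full `tax` data frame, this would put the sample sizes, means, SDs, and skewness values side by side for comparison.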