# load data
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
Perhaps the only thing all practitioners of the field in question agree on is that macroeconomics is hard. In particular, Friedman's and Keynes's views on the government's role in productivity and other economic outputs define the battle between the right and the left, respectively.
Using data from the OECD Factbook 2013, we want to investigate how the average worker’s tax rate is associated with macroeconomic output variables such as revenue, productivity, inequality, and unemployment.
The opposing camps' theories, in admittedly very reductive form, are:
Left wing: Increasing the average tax rate allows for spending on infrastructure, which in turn can employ people and provide for basic needs, which directly affects income inequality. Forced liquidity means more money is moving through the economy than if it sat in savings accounts.
Right wing: Increasing the tax burden encourages people to remain unemployed. Decreasing it puts more money directly into the economy where it is needed, increasing economic efficiency. Improved efficiency means that a given marginal tax rate results in more government revenue.
What are the cases, and how many are there? The cases are the 34 individual countries measured in the Factbook. Each data point is a country.
Describe the method of data collection.
What type of study is this (observational/experiment)?
This is an observational study; it only reports observed correlations.
If you collected the data, state self-collected. If not, provide a citation/link.
The data are from the OECD Factbook here:
http://www.oecd-ilibrary.org/economics/oecd-factbook_18147364
You can view the data nicely here:
What is the response variable, and what type is it (numerical/categorical)?
The response variables are economic indicators such as GDP per hour worked, the Gini income inequality coefficient, government revenue per capita, and long-term unemployment rates. They are all numerical.
What is the explanatory variable, and what type is it (numerical/categorical)?
The explanatory variable is the taxation rate on the average worker. Naively, this is a continuous numerical variable, and I think that treatment works for many applications. But in such a complex and noisy field, the effect may not be simple, linear, and continuous. Instead, differences in tax policy may be easier to detect by partitioning the independent variable into two or more ranges or categories and comparing means among the groups.
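The partitioning idea can be sketched as follows. This is only an illustration: it uses the six rows shown in the `head(tax)` output of this report rather than all 34 countries, the cut point of 18 is a hypothetical value chosen so that both groups are non-empty, and the names `tax_head`, `Tax_group`, and `group_summary` are assumptions introduced for the example.

```r
# First six rows of the dataset, copied from the head(tax) output in this report.
tax_head <- data.frame(
  Country = c("Chile", "New Zealand", "Mexico", "Israel", "Korea", "Switzerland"),
  Avg_tax = c(7.000, 15.869, 16.155, 19.821, 20.344, 20.998),
  Gini    = c(0.503, 0.333, 0.457, 0.371, 0.307, 0.285)
)

# Hypothetical cut point: split countries at an average tax rate of 18.
# It is not derived from theory, only picked to give two non-empty groups.
tax_head$Tax_group <- cut(tax_head$Avg_tax,
                          breaks = c(-Inf, 18, Inf),
                          labels = c("lower", "higher"))

# Sample size, mean, and SD of the Gini coefficient within each tax group.
by_group <- split(tax_head$Gini, tax_head$Tax_group)
group_summary <- t(sapply(by_group, function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x))
}))
print(group_summary)
```

With the full dataset, the same table supplies the group means, SDs, and sample sizes that a comparison of means needs; the choice of cut points would have to be justified separately.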
Provide summary statistics relevant to your research question. For example, if you’re comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
I cleaned the data and put it into a tidy CSV for analysis. I’ll read it from GitHub:
tax <- read.csv("https://raw.githubusercontent.com/scottogden10/606/master/606Proj.csv")
head(tax)
## Country Avg_tax Prod Rev Gini Unemp
## 1 Chile 7.000 26.02053 37.7 0.503 NA
## 2 New Zealand 15.869 39.79061 39.7 0.333 8.9943343
## 3 Mexico 16.155 20.22543 24.5 0.457 2.0353722
## 4 Israel 19.821 37.49444 37.7 0.371 20.2281920
## 5 Korea 20.344 31.86053 33.2 0.307 0.3860101
## 6 Switzerland 20.998 60.78066 33.5 0.285 38.8070303
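Since the research question is about association, one possible first look is the correlation between `Avg_tax` and each response column. The sketch below is an assumption about how that might be done, not part of the original analysis: it rebuilds only the six rows printed above (names `tax_head`, `responses`, and `cors` are introduced here), and it uses `use = "complete.obs"` so the missing `Unemp` value for Chile is dropped rather than turning the result into `NA`.

```r
# First six rows of the dataset, copied from the head(tax) output above.
tax_head <- data.frame(
  Country = c("Chile", "New Zealand", "Mexico", "Israel", "Korea", "Switzerland"),
  Avg_tax = c(7.000, 15.869, 16.155, 19.821, 20.344, 20.998),
  Prod    = c(26.02053, 39.79061, 20.22543, 37.49444, 31.86053, 60.78066),
  Rev     = c(37.7, 39.7, 24.5, 37.7, 33.2, 33.5),
  Gini    = c(0.503, 0.333, 0.457, 0.371, 0.307, 0.285),
  Unemp   = c(NA, 8.9943343, 2.0353722, 20.2281920, 0.3860101, 38.8070303)
)

# Pearson correlation of the tax rate with each response variable.
# "complete.obs" drops any country with a missing value in the pair.
responses <- c("Prod", "Rev", "Gini", "Unemp")
cors <- sapply(responses, function(v) {
  cor(tax_head$Avg_tax, tax_head[[v]], use = "complete.obs")
})
print(round(cors, 3))
```

Six countries are far too few to support any conclusion; the point is only the handling of `NA` and the shape of the call, which carries over unchanged to the full `tax` data frame.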
Independent Variable
Taxes
library(fBasics)
## Loading required package: timeDate
## Loading required package: timeSeries
##
## Rmetrics Package fBasics
## Analysing Markets and calculating Basic Statistics
## Copyright (C) 2005-2014 Rmetrics Association Zurich
## Educational Software for Financial Engineering and Computational Science
## Rmetrics is free software and comes with ABSOLUTELY NO WARRANTY.
## https://www.rmetrics.org --- Mail to: info@rmetrics.org
hist(tax$Avg_tax)
summary(tax$Avg_tax)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 7.00 29.84 37.58 35.20 42.61 55.55
sd(tax$Avg_tax)
## [1] 11.14545
skewness(tax$Avg_tax,na.rm=TRUE)
## [1] -0.5180147
## attr(,"method")
## [1] "moment"
Relatively well behaved, with a slight left skew.
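To make the skewness figure less of a black box, here is a minimal base-R sketch of a moment-based skewness: the mean cubed deviation divided by the cubed sample standard deviation. The function name `moment_skewness` is hypothetical, and the assumption that this matches the `"moment"` method reported by `skewness()` is mine; the normalisation used by the package may differ slightly. The input is the six `Avg_tax` values from the `head(tax)` output, not the full column.

```r
# Moment-based skewness: average cubed deviation from the mean,
# scaled by the cubed sample standard deviation (n - 1 denominator).
moment_skewness <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  n <- length(x)
  sum((x - mean(x))^3) / (n * sd(x)^3)
}

# Avg_tax for the six countries shown in the head(tax) output.
avg_tax_head <- c(7.000, 15.869, 16.155, 19.821, 20.344, 20.998)

skew_head <- moment_skewness(avg_tax_head)
print(skew_head)

# A negative value indicates a longer left tail, a positive value a longer right tail.
print(if (skew_head < 0) "left skew" else "right skew or symmetric")
```

Because the deviations are cubed, a single country far below the mean contributes a large negative term, so even one low-tax outlier can pull the statistic negative.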
Productivity
hist(tax$Prod)
summary(tax$Prod)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 20.23 35.33 50.51 49.23 63.26 92.50 1
sd(tax$Prod,na.rm=TRUE)
## [1] 16.9115
skewness(tax$Prod,na.rm=TRUE)
## [1] 0.4867807
## attr(,"method")
## [1] "moment"
Not quite normal, with a right skew.
Long Term Unemployment
hist(tax$Unemp)
summary(tax$Unemp)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.386 22.640 33.450 32.880 44.210 63.900 1
sd(tax$Unemp,na.rm=TRUE)
## [1] 16.11931
skewness(tax$Unemp,na.rm=TRUE)
## [1] -0.1257576
## attr(,"method")
## [1] "moment"
Pretty nicely behaved!
Gini Income Inequality
hist(tax$Gini)
summary(tax$Gini)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2490 0.2745 0.3050 0.3146 0.3372 0.5030
sd(tax$Gini,na.rm=TRUE)
## [1] 0.05822181
skewness(tax$Gini,na.rm=TRUE)
## [1] 1.345334
## attr(,"method")
## [1] "moment"
Large right skew, but nicely behaved (no obvious discontinuities).
Revenue
hist(tax$Rev)
summary(tax$Rev)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 24.50 37.70 39.70 42.01 47.40 58.40 1
sd(tax$Rev,na.rm=TRUE)
## [1] 7.786662
skewness(tax$Rev,na.rm=TRUE)
## [1] 0.2067908
## attr(,"method")
## [1] "moment"
Not bad! Slight right skew.
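The per-variable summaries above repeat the same calls for each column. One way to collect them in a single table is sketched below, under the assumption that a compact overview is wanted; the helper `summarise_column` and the object names are hypothetical, and the data are only the six rows from the `head(tax)` output, so the numbers will not match the full-sample figures reported above.

```r
# First six rows of the dataset, copied from the head(tax) output in this report.
tax_head <- data.frame(
  Avg_tax = c(7.000, 15.869, 16.155, 19.821, 20.344, 20.998),
  Prod    = c(26.02053, 39.79061, 20.22543, 37.49444, 31.86053, 60.78066),
  Rev     = c(37.7, 39.7, 24.5, 37.7, 33.2, 33.5),
  Gini    = c(0.503, 0.333, 0.457, 0.371, 0.307, 0.285),
  Unemp   = c(NA, 8.9943343, 2.0353722, 20.2281920, 0.3860101, 38.8070303)
)

# Count of usable and missing values plus basic location and spread,
# always ignoring NA so every column is treated the same way.
summarise_column <- function(x) {
  c(n       = sum(!is.na(x)),
    missing = sum(is.na(x)),
    mean    = mean(x, na.rm = TRUE),
    sd      = sd(x, na.rm = TRUE),
    median  = median(x, na.rm = TRUE))
}

# One row per variable, one column per statistic.
summary_table <- t(sapply(tax_head, summarise_column))
print(round(summary_table, 3))
```

Reporting `n` and `missing` next to each mean makes it explicit which variables lose a country to `NA`, which matters when comparing results across response variables.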