library(AER)
library(dplyr)
library(ggplot2)
library(lmtest)
library(sandwich)
library(haven)
library(stargazer)
Answer: The intercept is -124.84. Taken literally, it says that a family with zero income is predicted to consume -124.84 dollars, which is not meaningful because consumption cannot be negative. The intercept should therefore not be interpreted on its own: the model is only informative over the range of positive incomes observed in the sample, not at an income of zero. In absolute value, 124.84 is also very small relative to average family income in the data.
predicted_cons <- -124.84 + 0.853 * 30000
predicted_cons
## [1] 25465.16
Answer: When family income is $30,000, the predicted consumption is $25,465.16.
income <- seq(5000, 100000, by = 1000)
MPC <- rep(0.853, length(income))
APC <- (-124.84 / income) + 0.853
plot_data <- data.frame(income, MPC, APC)
ggplot(plot_data, aes(x = income)) +
geom_line(aes(y = MPC, color = "MPC"), linewidth = 1.2) +
geom_line(aes(y = APC, color = "APC"), linewidth = 1.2) +
labs(x = "Income (USD)", y = "Propensity", color = "", title="The Estimated MPC and APC") +
ylim(0.82, 0.86) +
theme_minimal()
Graph Interpretation: The MPC is constant at 0.853, so it appears as a horizontal line: every additional dollar of income raises predicted consumption by 85.3 cents. The APC lies below the MPC at every income level and rises toward it as income grows, because the negative intercept term (-124.84 / income) shrinks toward zero as income increases.
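As a quick numerical check of this interpretation, the sketch below evaluates the APC at three income levels using the coefficients reported above. The income values are chosen only for illustration; they are not taken from the data.

```r
# Coefficients from the estimated consumption function above
b0 <- -124.84   # intercept
b1 <- 0.853     # slope, i.e. the MPC

# Illustrative income levels (assumed for this check, not from the data)
income <- c(5000, 30000, 100000)

apc <- b0 / income + b1   # APC = predicted consumption / income
gap <- b1 - apc           # equals 124.84 / income, so it shrinks as income rises

data.frame(income, apc = round(apc, 4), gap = round(gap, 4))
```

The gap between the two propensities is exactly 124.84 / income, which is why the APC curve flattens out just below the MPC line.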
data(Fatalities)
fatalities_1988 <- subset(Fatalities, year == 1988)
summary(fatalities_1988[, c("beertax", "jail", "service", "breath",
"youngdrivers", "fatal", "pop")])
## beertax jail service breath youngdrivers
## Min. :0.04331 no :33 no :37 no :24 Min. :0.07314
## 1st Qu.:0.19440 yes :14 yes :10 yes:24 1st Qu.:0.15543
## Median :0.34649 NA's: 1 NA's: 1 Median :0.16198
## Mean :0.47982 Mean :0.16200
## 3rd Qu.:0.59806 3rd Qu.:0.17104
## Max. :2.19442 Max. :0.22072
## fatal pop
## Min. : 104.0 Min. : 479000
## 1st Qu.: 294.2 1st Qu.: 1578252
## Median : 723.5 Median : 3479496
## Mean : 974.8 Mean : 5074334
## 3rd Qu.:1101.5 3rd Qu.: 5920494
## Max. :5390.0 Max. :28314028
The 1988 data cover traffic fatalities in 48 US states. Beer taxes vary widely across states, ranging from 0.04 to 2.19 dollars per case, with a mean of 0.48 and a median of 0.35. About 30 percent of the states with non-missing data (14 of 47) have a mandatory jail sentence for drunk driving, about 21 percent (10 of 47) require community service, and half (24 of 48) have breath test laws. Drivers aged 15-24 make up about 16.2 percent of drivers on average, ranging from 7.3 to 22.1 percent. Fatalities range from 104 to 5,390 per state, with a mean of 974.8, and state populations range from 479,000 to more than 28 million. This variation in policies and outcomes offers a good opportunity to examine whether drunk-driving laws actually reduce traffic deaths.
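The shares of states with each law can be recomputed from the counts in the summary() output above. This is a minimal sketch using those printed counts; the helper name `share` is introduced here only for illustration. It reports each share twice, once over the non-missing states and once over all 48, because the choice of denominator shifts the percentages slightly.

```r
# Counts read off the summary() output above
jail_yes    <- 14; jail_no    <- 33   # 1 state missing
service_yes <- 10; service_no <- 37   # 1 state missing
breath_yes  <- 24; breath_no  <- 24

# Hypothetical helper: percentage rounded to one decimal place
share <- function(yes, n) round(100 * yes / n, 1)

shares <- data.frame(
  law            = c("jail", "service", "breath"),
  pct_nonmissing = c(share(jail_yes, jail_yes + jail_no),
                     share(service_yes, service_yes + service_no),
                     share(breath_yes, breath_yes + breath_no)),
  pct_all48      = c(share(jail_yes, 48),
                     share(service_yes, 48),
                     share(breath_yes, 48))
)
shares
```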
fatalities_1988$VFR <- (fatalities_1988$fatal / fatalities_1988$pop) * 10000
mean(fatalities_1988$VFR)
## [1] 2.069594
Answer: The average vehicle fatality rate is 2.07 deaths per 10,000 people.
\(VFR = \beta_0 + \beta_1 jail + \beta_2 beertax + \beta_3 service + \beta_4 breath + \beta_5 youngdrivers + u\)
I expect β₁ to be negative. Mandatory jail sentences should deter drunk driving, so states with jail laws should record fewer traffic deaths than states without them.
model <- lm(VFR ~ jail + beertax + service + breath + youngdrivers,
data = fatalities_1988)
summary(model)
##
## Call:
## lm(formula = VFR ~ jail + beertax + service + breath + youngdrivers,
## data = fatalities_1988)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.7297 -0.3283 -0.1218 0.3028 1.0466
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.3329 0.5339 2.497 0.0166 *
## jailyes 0.4170 0.2030 2.054 0.0464 *
## beertax 0.3860 0.1785 2.163 0.0364 *
## serviceyes -0.1500 0.2098 -0.715 0.4788
## breathyes -0.1238 0.1546 -0.801 0.4280
## youngdrivers 3.2209 3.3735 0.955 0.3453
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.471 on 41 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.2859, Adjusted R-squared: 0.1988
## F-statistic: 3.283 on 5 and 41 DF, p-value: 0.01388
Interpretation: The coefficient on jail is 0.4170. Other things held constant, a state with a mandatory jail sentence has 0.417 more traffic deaths per 10,000 people than a state without one. The p-value is 0.0464, which is below 0.05, so the coefficient is statistically significant at the 5% level. The positive sign is not what I expected: I expected jail laws to reduce deaths, but the estimate points the other way. One possible explanation is reverse causality, where states with worse drunk-driving problems adopted stricter laws. Another is that jail sentences are not effective without other policies.
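The significance statement can be checked by hand from the regression output. The sketch below recomputes the t-statistic and the two-sided p-value for jailyes from the printed (rounded) estimate and standard error, so small rounding differences from summary() are expected.

```r
est <- 0.4170   # coefficient on jailyes, from the output above
se  <- 0.2030   # its standard error
df  <- 41       # residual degrees of freedom

t_stat  <- est / se
p_value <- 2 * pt(-abs(t_stat), df = df)   # two-sided p-value

round(c(t = t_stat, p = p_value), 4)
```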
The beertax coefficient is 0.3860. Other things being equal, a one-dollar increase in the beer tax per case is associated with 0.386 more deaths per 10,000 people, and the estimate is statistically significant at the 5% level (p = 0.0364). This is surprising, since higher prices should reduce drinking and drunk driving. One possibility is reverse causality: states with more accidents may have raised beer taxes in response. Another is that beer taxes have little effect on behavior.
The youngdrivers coefficient is 3.2209. Because youngdrivers is a proportion between 0 and 1, a one-unit change corresponds to 100 percentage points, which would raise the fatality rate by 3.22 deaths per 10,000 people. In more realistic terms, a 1 percentage point rise in the share of young drivers is associated with 0.0322 more deaths per 10,000 people. The positive sign is plausible, since young drivers are less experienced and tend to take more risks, but the estimate is not statistically significant (p = 0.3453).
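The rescaling of the youngdrivers coefficient is simple arithmetic, sketched below. As an additional illustration that goes beyond the original answer, it also scales the coefficient by the interquartile range of youngdrivers taken from the summary() output, which gives a sense of the effect over a typical between-state difference.

```r
b_young <- 3.2209              # coefficient on youngdrivers (a 0-1 proportion)

# 1 percentage point equals 0.01 on the 0-1 scale
effect_1pp <- b_young * 0.01

# Interquartile range from the summary() output above (3rd Qu. - 1st Qu.)
iqr_young  <- 0.17104 - 0.15543
effect_iqr <- b_young * iqr_young

round(c(one_pp = effect_1pp, iqr = effect_iqr), 4)
```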
ci_results <- confint(model, parm = c("jailyes", "beertax", "youngdrivers"), level = 0.95)
ci_results
## 2.5 % 97.5 %
## jailyes 0.00705388 0.8269059
## beertax 0.02560153 0.7464246
## youngdrivers -3.59210932 10.0339191
For jail, the 95% CI is [0.0071, 0.8269]. The interval does not include zero, so the effect is statistically distinguishable from zero at the 5% level and positive. However, the interval is fairly wide and its lower endpoint is very close to zero, so there is considerable uncertainty about the actual magnitude of the effect. For beertax, the 95% CI is [0.0256, 0.7464]. This interval also excludes zero, indicating that higher beer taxes are associated with higher fatality rates, although again the lower endpoint is close to zero. For youngdrivers, the 95% CI is [-3.5921, 10.0339]. This interval is very wide and contains zero, so we cannot tell whether a larger share of young drivers raises or lowers the fatality rate. The low precision may be due to limited variation in this variable across states or to correlation with other regressors.
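These intervals follow the usual formula, estimate ± t critical value × standard error. The sketch below rebuilds them from the rounded estimates and standard errors in the regression output, so the endpoints may differ from confint() in the fourth decimal place. The width column is added here to make the precision comparison explicit.

```r
# Rounded estimates and standard errors from the summary(model) output
est <- c(jailyes = 0.4170, beertax = 0.3860, youngdrivers = 3.2209)
se  <- c(jailyes = 0.2030, beertax = 0.1785, youngdrivers = 3.3735)
df  <- 41                      # residual degrees of freedom

t_crit <- qt(0.975, df = df)   # critical value for a 95% interval

ci <- cbind(lower = est - t_crit * se,
            upper = est + t_crit * se,
            width = 2 * t_crit * se)
round(ci, 4)
```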
primitive <- read_dta("C:/Users/User/Desktop/RM 2 Data/primitive_aejmacro2.dta")
head(primitive)
## # A tibble: 6 × 78
## wbname country colony mineuro majeuro ly2002 a1 a2 a3 a4 c1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 "" French… 1 0 0 NA 1 1 0.667 NA 1.49e-8
## 2 "" French… 1 0 0 NA NA NA 0.667 NA NA
## 3 "" Lesser… 0 0 0 NA NA NA 0.667 NA NA
## 4 "" Micron… 1 0 0 NA NA NA 0.667 NA NA
## 5 "" New Ca… 1 0 0 NA NA NA 0.667 NA NA
## 6 "" Surina… 1 0 0 NA 1 1 NA NA 1.49e-8
## # ℹ 67 more variables: c2 <dbl>, c3 <dbl>, c4 <dbl>, t1 <dbl>, t2 <dbl>,
## # t3 <dbl>, t4 <dbl>, mt1 <dbl>, mt2 <dbl>, mt3 <dbl>, mw1 <dbl>, mw2 <dbl>,
## # mw3 <dbl>, mw4 <dbl>, tr1 <dbl>, tr2 <dbl>, tr3 <dbl>, tr4 <dbl>,
## # tr1500cc <dbl>, mt1500c <dbl>, c1500c <dbl>, u1000 <dbl>, u0 <dbl>,
## # pop1000 <dbl>, lpopu4 <dbl>, larable <dbl>, latitude <dbl>,
## # distequat <dbl>, tropical <dbl>, eu <dbl>, af <dbl>, as <dbl>, am <dbl>,
## # oc <dbl>, weu <dbl>, chi <dbl>, india <dbl>, arab <dbl>, year <dbl>, …
Table 7A examines the persistence of technology adoption from 0 AD to 1500 AD.
# Regression Models for Table 7A
m1 <- lm(tr1500cc~tr2,data=primitive)
m2 <- lm(tr1500cc~tr1,data=primitive)
m3 <- lm(tr1500cc~tr2+eu+af+as+am,data=primitive)
m4 <- lm(tr1500cc~tr1+eu+af+as+am,data=primitive)
m5 <- lm(tr1500cc~tr2+distequat+dist2,data=primitive)
m6 <- lm(tr1500cc~tr1+distequat+dist2,data=primitive)
m7 <- lm(tr1500cc~tr2+distequat+dist2+landlocked,data=primitive)
m8 <- lm(tr1500cc~tr1+distequat+dist2+landlocked,data=primitive)
m9 <- lm(tr1500cc~tr2+landlocked+tropical,data=primitive)
m10 <- lm(tr1500cc~tr1+landlocked+tropical,data=primitive)
# Clustered standard errors
c1 <- coeftest(m1, vcov = vcovCL(m1, cluster = ~clus1000))
c2 <- coeftest(m2, vcov = vcovCL(m2, cluster = ~clus1000))
c3 <- coeftest(m3, vcov = vcovCL(m3, cluster = ~clus1000))
c4 <- coeftest(m4, vcov = vcovCL(m4, cluster = ~clus1000))
c5 <- coeftest(m5, vcov = vcovCL(m5, cluster = ~clus1000))
c6 <- coeftest(m6, vcov = vcovCL(m6, cluster = ~clus1000))
c7 <- coeftest(m7, vcov = vcovCL(m7, cluster = ~clus1000))
c8 <- coeftest(m8, vcov = vcovCL(m8, cluster = ~clus1000))
c9 <- coeftest(m9, vcov = vcovCL(m9, cluster = ~clus1000))
c10 <- coeftest(m10, vcov = vcovCL(m10, cluster = ~clus1000))
# Extract t-statistics and p-values
models <- list(m1, m2, m3, m4, m5, m6, m7, m8, m9, m10)
t_stats_list <- list(
round(c1[,3], 2), round(c2[,3], 2), round(c3[,3], 2), round(c4[,3], 2), round(c5[,3], 2),
round(c6[,3], 2), round(c7[,3], 2), round(c8[,3], 2), round(c9[,3], 2), round(c10[,3], 2)
)
p_value_list <- list(c1[,4], c2[,4], c3[,4], c4[,4], c5[,4], c6[,4], c7[,4], c8[,4], c9[,4], c10[,4])
# R-squared
rsq_rounded <- sapply(models, function(m) round(summary(m)$r.squared, 2))
# Create table
stargazer(models,
type = "text",
title = "Table 7A",
dep.var.labels = "Technology in 1500 AD",
se = t_stats_list,
p = p_value_list,
omit.stat = c("f", "ser", "adj.rsq", "rsq"),
add.lines = list(c("R-squared", rsq_rounded)))
##
## Table 7A
## ========================================================================================================
## Dependent variable:
## -------------------------------------------------------------------------------------------
## Technology in 1500 AD
## (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)
## --------------------------------------------------------------------------------------------------------
## tr2 0.782*** 0.356*** 0.614*** 0.635*** 0.691***
## (8.360) (5.750) (7.140) (7.580) (6.150)
##
## tr1 0.784*** 0.242*** 0.582*** 0.584*** 0.637***
## (5.550) (4.920) (3.500) (3.430) (3.250)
##
## eu 0.520*** 0.689***
## (11.070) (17.080)
##
## af 0.005 0.193***
## (0.100) (4.410)
##
## as 0.368*** 0.537***
## (5.110) (9.110)
##
## am -0.004 0.068*
## (-0.340) (1.900)
##
## distequat 0.348 -0.097 0.424 -0.118
## (0.570) (-0.270) (0.820) (-0.300)
##
## dist2 0.590 1.280*** 0.438 1.310***
## (0.820) (2.750) (0.720) (2.710)
##
## landlocked -0.098*** 0.016 -0.101** 0.010
## (-2.790) (0.380) (-2.500) (0.220)
##
## tropical -0.249*** -0.215**
## (-3.090) (-2.260)
##
## Constant -0.062 0.128* 0.030*** 0.018 -0.129 0.088* -0.129* 0.086 0.146 0.299**
## (-0.900) (1.700) (2.850) (0.560) (-1.640) (1.710) (-1.790) (1.620) (1.460) (2.130)
##
## --------------------------------------------------------------------------------------------------------
## R-squared 0.47 0.47 0.86 0.87 0.67 0.63 0.69 0.63 0.64 0.57
## Observations 116 104 116 104 108 97 108 97 116 104
## ========================================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
The coefficient on tr2 in Column (1) is 0.782. A one-unit increase in technology adoption in 0 AD is associated with a 0.782-unit increase in technology adoption in 1500 AD. Since both are indices from 0 to 1, this shows very strong persistence over 1,500 years. Countries that were more advanced in 0 AD tended to remain more advanced in 1500 AD.
The coefficient on tr2 in Column (3) is 0.356. This means that, holding the country’s continent constant, a one-unit increase in overall technology adoption in 0 AD is associated with about a 0.356-unit increase in technology adoption in 1500 AD.
About 47% of the cross-country variation in technology adoption in 1500 AD is explained by their technology levels in 0 AD.
# Get Africa's average from Table 4
avg_tr2_africa <- 0.77
# Use Column 3 model (m3) which has continent dummies
pred_africa <- coef(m3)["(Intercept)"] +
coef(m3)["tr2"] * avg_tr2_africa +
coef(m3)["af"] * 1
cat("Predicted technology in 1500 AD for Africa:", round(pred_africa, 3))
## Predicted technology in 1500 AD for Africa: 0.308
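Using Column (3) from Table 7A, which includes continent dummies, the prediction for an African country with the average 0 AD technology level of 0.77 (from Table 4) is intercept + tr2 coefficient × 0.77 + af coefficient. With the rounded coefficients printed in the table, this is 0.030 + 0.356(0.77) + 0.005 ≈ 0.309; the code above uses the unrounded coefficients and gives 0.308. This is the predicted overall level of technology adoption in 1500 AD.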
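The same prediction can be reproduced by hand from the rounded Column (3) coefficients printed in Table 7A. This is a sketch using those rounded values, so it differs slightly from the 0.308 obtained with the unrounded coef(m3) values.

```r
# Rounded Column (3) coefficients as printed in Table 7A
b <- c(intercept = 0.030, tr2 = 0.356, af = 0.005)

# Regressor values for an average African country
# (tr2 average of 0.77 taken from Table 4; af dummy switched on)
x <- c(intercept = 1, tr2 = 0.77, af = 1)

pred <- sum(b * x)
round(pred, 3)
```

With the fitted model object available, predict(m3, newdata = data.frame(tr2 = 0.77, eu = 0, af = 1, as = 0, am = 0)) should give the unrounded figure directly and avoids indexing the coefficients by hand.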
# Figure 1: Technology 1500 AD vs Log Income 2002
ggplot(primitive, aes(x = tr3, y = ly2002)) +
geom_point(alpha = 0.6, size = 2, color = "red") +
geom_smooth(method = "lm", se = FALSE, color = "purple", linewidth = 1) +
geom_text(aes(label = country), size = 2.5, check_overlap = TRUE, vjust = -0.5) +
labs(
title = "Figure 1",
x = "Overall technology adoption level in 1500 AD",
y = "Log per capita income in 2002"
) +
theme_minimal()
# Figure 2: Migration-adjusted Technology 1500 AD vs Log Income 2002
ggplot(primitive, aes(x = tr3mig, y = ly2002)) +
geom_point(alpha = 0.6, size = 2, color = "orange") +
geom_smooth(method = "lm", se = FALSE, color = "brown", linewidth = 1) +
geom_text(aes(label = country), size = 2.5, check_overlap = TRUE, vjust = -0.5) +
labs(
title = "Figure 2",
x = "Migration-adjusted technology level in 1500 AD",
y = "Log per capita income in 2002"
) +
theme_minimal()
The most significant challenge was working out which variables in the dataset correspond to the concepts in the paper. Variable names such as tr1, tr2 and tr3 did not mean anything to me when I first saw the data. I dealt with this by carefully reading the variable definitions document that accompanies the dataset and cross-checking it against the methodology section of the paper. I also used the str() and names() commands in R to review the structure of the dataset and make sure I was referring to the right variables. Running initial test regressions on individual variables was a good way of checking that I was on the right track before attempting the complete Table 7A replication.
My replication results do not exactly match those in the original paper. For Column (1), I obtained a coefficient of 0.782 with a t-statistic of 8.36. For Column (3), I obtained a coefficient of 0.356 with a t-statistic of 5.75. Several factors could explain the discrepancies. First, the dataset I used may not match the authors' working dataset in terms of which observations are included and how missing values are handled; the authors may have applied data cleaning steps or exclusion criteria that are not fully described in the published article. Second, statistical software packages and versions differ in how they implement clustered standard errors. Although I used vcovCL in R with clus1000 as the clustering variable, the implementation in Stata (apparently what the authors used) may apply different small-sample corrections and so yield slightly different values. Third, the publicly available dataset may have been modified or revised after the paper was written, which would change the sample composition or the values of some variables.
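To illustrate how the choice of small-sample correction alone can move clustered standard errors, the sketch below fits one model on simulated data and computes cluster-robust standard errors twice with vcovCL from the sandwich package. The data, variable names and cluster structure are hypothetical and unrelated to the paper's dataset; the point is only that the coefficients stay fixed while the standard errors, and hence the t-statistics, change.

```r
library(sandwich)

set.seed(1)
# Simulated data (hypothetical): 20 clusters of 10 observations each
g <- rep(1:20, each = 10)
x <- rnorm(200) + rnorm(20)[g]             # regressor with a cluster component
y <- 0.5 * x + rnorm(20)[g] + rnorm(200)   # errors correlated within clusters
d <- data.frame(y, x, g)

fit <- lm(y ~ x, data = d)

# Same model and clusters, different finite-sample corrections
se_hc0 <- sqrt(diag(vcovCL(fit, cluster = ~g, type = "HC0", cadjust = FALSE)))
se_hc1 <- sqrt(diag(vcovCL(fit, cluster = ~g, type = "HC1", cadjust = TRUE)))

rbind(no_correction = se_hc0, with_correction = se_hc1)

# t-statistics under each choice; the coefficients themselves are identical
rbind(t_hc0 = coef(fit) / se_hc0, t_hc1 = coef(fit) / se_hc1)
```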
Comment visually on the fit of the OLS line for these two figures. What stands out to you?
In both figures, historical technology and current income are clearly positively correlated. In Figure 1 the points are more dispersed around the line, with countries such as the US, Canada and Australia lying well above it even though they were less technologically developed in 1500 AD. Once migration is adjusted for, Figure 2 shows a much tighter fit. The points lie closer to the line, particularly for the countries that were previously outliers. This suggests that accounting for migration matters a lot: people brought their technology with them when they moved, which is why migration-adjusted technology is a better predictor of current income than the technology that was geographically located in a country in 1500 AD.