Today you will:
You may estimate only a simple linear regression of the form: \[Y_i = \beta_0 + \beta_1 X_i + u_i\]
No multiple regression (yet!). Please also avoid using variables
listed as factors or categorical variables in your
regressions; we have only covered continuous variables in class so
far.
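For reference, the OLS estimates of the two coefficients in this model have a closed form. The notation \(\bar{X}\) and \(\bar{Y}\) for the sample means and \(n\) for the sample size is assumed here; match it to the notation used in class if that differs.

\[\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\]

The slope is the sample covariance of \(X\) and \(Y\) divided by the sample variance of \(X\), so it is undefined when \(X\) takes the same value for every observation.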
Please select one of the following datasets. For each dataset, the R code to load it and a brief description of its variables are provided below:
Cross-section data originating from the 1977–1978 Australian Health Survey.
Variables:
library(pacman)
p_load(AER)
data("DoctorVisits")
head(DoctorVisits)
## visits gender age income illness reduced health private freepoor freerepat
## 1 1 female 0.19 0.55 1 4 1 yes no no
## 2 1 female 0.19 0.45 1 2 1 yes no no
## 3 1 male 0.19 0.90 3 0 0 no no no
## 4 1 male 0.19 0.15 1 0 0 no no no
## 5 1 male 0.19 0.45 2 5 1 no no no
## 6 1 female 0.19 0.35 5 1 9 no no no
## nchronic lchronic
## 1 no no
## 2 no no
## 3 no no
## 4 no no
## 5 yes no
## 6 yes no
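The `head()` output above already hints at which columns are categorical (the `female`/`male` and `yes`/`no` entries). As a quick check before choosing regressors, the sketch below lists the numeric columns and the factor columns of `DoctorVisits`. The object names `numeric_vars` and `factor_vars` are illustrative, and the same approach should carry over to the other datasets, assuming they load as ordinary data frames.

```r
library(pacman)
p_load(AER)
data("DoctorVisits")

# Columns stored as numbers (candidates for X or Y in a simple regression)
numeric_vars <- names(DoctorVisits)[sapply(DoctorVisits, is.numeric)]

# Columns stored as factors (categorical; avoid these for now)
factor_vars <- names(DoctorVisits)[sapply(DoctorVisits, is.factor)]

numeric_vars
factor_vars
```

A column stored as numeric is not necessarily continuous (counts are numeric too), so treat this as a first screen rather than a final answer.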
Cross-section data, at the firm level, on electric power generation.
Variables:
library(pacman)
p_load(AER)
data("Electricity1970")
head(Electricity1970)
## cost output labor laborshare capital capitalshare fuel fuelshare
## 1 0.2130 8 6869.47 0.3291 64.945 0.4197 18.000 0.2512
## 4 3.0427 869 8372.96 0.1030 68.227 0.2913 21.067 0.6057
## 5 9.4059 1412 7960.90 0.0891 40.692 0.1567 41.530 0.7542
## 14 0.7606 65 8971.89 0.2802 41.243 0.1282 28.539 0.5916
## 15 2.2587 295 8218.40 0.1772 71.940 0.1623 39.200 0.6606
## 16 1.3422 183 5063.49 0.0960 74.430 0.2629 35.510 0.6411
Cross-section data originating from the 1986 Medicaid Consumer Survey.
Variables:
library(pacman)
p_load(AER)
data("Medicaid1986")
head(Medicaid1986)
## visits exposure children age income health1 health2 access married gender
## 1 0 100 1 24 14.500 0.495 -0.854 0.50 no female
## 2 1 90 3 19 6.000 0.520 -0.969 0.17 no female
## 3 0 106 4 17 8.377 -1.227 0.317 0.42 no female
## 4 0 114 2 29 6.000 -1.524 0.457 0.33 no female
## 5 11 115 1 26 8.500 0.173 -0.599 0.67 no female
## 6 3 102 1 22 6.000 -0.905 0.062 0.25 no female
## ethnicity school enroll program
## 1 cauc 13 yes afdc
## 2 cauc 11 yes afdc
## 3 cauc 12 yes afdc
## 4 cauc 12 yes afdc
## 5 cauc 16 yes afdc
## 6 other 12 yes afdc
Cross-section data on states in 1950.
Variables:
library(pacman)
p_load(AER)
data("MurderRates")
head(MurderRates)
## rate convictions executions time income lfp noncauc southern
## 1 19.25 0.204 0.035 47 1.10 51.2 0.321 yes
## 2 7.53 0.327 0.081 58 0.92 48.5 0.224 yes
## 3 5.66 0.401 0.012 82 1.72 50.8 0.127 no
## 4 3.21 0.318 0.070 100 2.18 54.4 0.063 no
## 5 2.80 0.350 0.062 222 1.75 52.4 0.021 no
## 6 1.41 0.283 0.100 164 2.26 56.7 0.027 no
Write your research question clearly: “Is there a linear relationship between ______ and ______?”
Define:
Examine the scatterplot of your data. Does the relationship look linear? Are there outliers?
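One way to draw the scatterplot is with base R's `plot()`. The sketch below uses `Electricity1970` with `output` as X and `cost` as Y purely as an illustration; substitute your own dataset and variables. The dashed line is an optional overlay of the fitted OLS line, which can make departures from linearity easier to see.

```r
library(pacman)
p_load(AER)
data("Electricity1970")

# Illustrative choice only: output as X, cost as Y
plot(cost ~ output, data = Electricity1970,
     xlab = "output", ylab = "cost",
     main = "cost against output")

# Optional: overlay the fitted OLS line to judge linearity by eye
abline(lm(cost ~ output, data = Electricity1970), lty = 2)
```

Points far from the bulk of the data, in either direction, are the ones to flag as possible outliers.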
Estimate your regression model. Present the results in a table that reports:
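A minimal sketch of the estimation step, again using `cost` and `output` from `Electricity1970` only as an illustration. `lm()` fits the model by OLS, and `coef(summary(fit))` returns the coefficient table as a matrix, which is a convenient starting point for building your own table. The names `fit` and `results` are placeholders.

```r
library(pacman)
p_load(AER)
data("Electricity1970")

# Fit Y_i = beta_0 + beta_1 X_i + u_i by OLS
fit <- lm(cost ~ output, data = Electricity1970)

# Coefficient table: one row per coefficient, with columns for the
# estimate, standard error, t value, and p-value
results <- coef(summary(fit))
results

# Number of observations used and the R-squared of the fit
nobs(fit)
summary(fit)$r.squared
```

Calling `summary(fit)` on its own prints the same information in one block if you prefer to read everything at once.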
In complete sentences:
Even though we are using simple OLS, we must think carefully.
Consider the following:
Before you leave today, please submit your written worksheet to me and upload your R script to the in-class Canvas assignment for today.