Today you will:
Please select one of the following datasets. For each dataset, the R code to load the data and a brief description of the variables are provided below:
Cross-section data originating from the 1977–1978 Australian Health Survey.
Variables:
library(pacman)
p_load(AER)
data("DoctorVisits")
head(DoctorVisits)
## visits gender age income illness reduced health private freepoor freerepat
## 1 1 female 0.19 0.55 1 4 1 yes no no
## 2 1 female 0.19 0.45 1 2 1 yes no no
## 3 1 male 0.19 0.90 3 0 0 no no no
## 4 1 male 0.19 0.15 1 0 0 no no no
## 5 1 male 0.19 0.45 2 5 1 no no no
## 6 1 female 0.19 0.35 5 1 9 no no no
## nchronic lchronic
## 1 no no
## 2 no no
## 3 no no
## 4 no no
## 5 yes no
## 6 yes no
Cross-section data, at the firm level, on electric power generation.
Variables:
library(pacman)
p_load(AER)
data("Electricity1970")
head(Electricity1970)
## cost output labor laborshare capital capitalshare fuel fuelshare
## 1 0.2130 8 6869.47 0.3291 64.945 0.4197 18.000 0.2512
## 4 3.0427 869 8372.96 0.1030 68.227 0.2913 21.067 0.6057
## 5 9.4059 1412 7960.90 0.0891 40.692 0.1567 41.530 0.7542
## 14 0.7606 65 8971.89 0.2802 41.243 0.1282 28.539 0.5916
## 15 2.2587 295 8218.40 0.1772 71.940 0.1623 39.200 0.6606
## 16 1.3422 183 5063.49 0.0960 74.430 0.2629 35.510 0.6411
Cross-section data originating from the 1986 Medicaid Consumer Survey.
Variables:
library(pacman)
p_load(AER)
data("Medicaid1986")
head(Medicaid1986)
## visits exposure children age income health1 health2 access married gender
## 1 0 100 1 24 14.500 0.495 -0.854 0.50 no female
## 2 1 90 3 19 6.000 0.520 -0.969 0.17 no female
## 3 0 106 4 17 8.377 -1.227 0.317 0.42 no female
## 4 0 114 2 29 6.000 -1.524 0.457 0.33 no female
## 5 11 115 1 26 8.500 0.173 -0.599 0.67 no female
## 6 3 102 1 22 6.000 -0.905 0.062 0.25 no female
## ethnicity school enroll program
## 1 cauc 13 yes afdc
## 2 cauc 11 yes afdc
## 3 cauc 12 yes afdc
## 4 cauc 12 yes afdc
## 5 cauc 16 yes afdc
## 6 other 12 yes afdc
Cross-section data on states in 1950.
Variables:
library(pacman)
p_load(AER)
data("MurderRates")
head(MurderRates)
## rate convictions executions time income lfp noncauc southern
## 1 19.25 0.204 0.035 47 1.10 51.2 0.321 yes
## 2 7.53 0.327 0.081 58 0.92 48.5 0.224 yes
## 3 5.66 0.401 0.012 82 1.72 50.8 0.127 no
## 4 3.21 0.318 0.070 100 2.18 54.4 0.063 no
## 5 2.80 0.350 0.062 222 1.75 52.4 0.021 no
## 6 1.41 0.283 0.100 164 2.26 56.7 0.027 no
Infidelity data, known as Fair’s Affairs. Cross-section data from a survey conducted by Psychology Today in 1969.
library(pacman)
p_load(AER)
data("Affairs")
head(Affairs)
## affairs gender age yearsmarried children religiousness education occupation
## 4 0 male 37 10.00 no 3 18 7
## 5 0 female 27 4.00 no 4 14 6
## 11 0 female 32 15.00 yes 1 12 1
## 16 0 male 57 15.00 yes 5 18 6
## 23 0 male 22 0.75 no 2 17 6
## 29 0 female 32 1.50 no 2 17 5
## rating
## 4 4
## 5 4
## 11 4
## 16 5
## 23 3
## 29 5
Variables:
* affairs: How often engaged in extramarital sexual intercourse during the past year? 0 = none, 1 = once, 2 = twice, 3 = 3 times, 7 = 4–10 times, 12 = monthly, 12 = weekly, 12 = daily.
* gender: factor indicating gender.
* age: numeric variable coding age in years: 17.5 = under 20, 22 = 20–24, 27 = 25–29, 32 = 30–34, 37 = 35–39, 42 = 40–44, 47 = 45–49, 52 = 50–54, 57 = 55 or over.
* yearsmarried: numeric variable coding number of years married: 0.125 = 3 months or less, 0.417 = 4–6 months, 0.75 = 6 months–1 year, 1.5 = 1–2 years, 4 = 3–5 years, 7 = 6–8 years, 10 = 9–11 years, 15 = 12 or more years.
* children: factor. Are there children in the marriage?
* religiousness: numeric variable coding religiousness: 1 = anti, 2 = not at all, 3 = slightly, 4 = somewhat, 5 = very.
* education: numeric variable coding level of education: 9 = grade school, 12 = high school graduate, 14 = some college, 16 = college graduate, 17 = some graduate work, 18 = master’s degree, 20 = Ph.D., M.D., or other advanced degree.
Cross-section data from the High School and Beyond survey conducted by the Department of Education in 1980, with a follow-up in 1986. The survey included students from approximately 1,100 high schools.
library(pacman)
p_load(AER)
data("CollegeDistance")
head(CollegeDistance)
## gender ethnicity score fcollege mcollege home urban unemp wage distance
## 1 male other 39.15 yes no yes yes 6.2 8.09 0.2
## 2 female other 48.87 no no yes yes 6.2 8.09 0.2
## 3 male other 48.74 no no yes yes 6.2 8.09 0.2
## 4 male afam 40.40 no no yes yes 6.2 8.09 0.2
## 5 female other 40.48 no no no yes 5.6 8.09 0.4
## 6 male other 54.71 no no yes yes 5.6 8.09 0.4
## tuition education income region
## 1 0.88915 12 high other
## 2 0.88915 12 low other
## 3 0.88915 12 low other
## 4 0.88915 12 low other
## 5 0.88915 13 low other
## 6 0.88915 12 low other
Variables:
Cross-section data originating from the May 1985 Current Population Survey by the US Census Bureau (random sample drawn for Berndt 1991).
library(pacman)
p_load(AER)
data("CPS1985")
head(CPS1985)
## wage education experience age ethnicity region gender occupation
## 1 5.10 8 21 35 hispanic other female worker
## 1100 4.95 9 42 57 cauc other female worker
## 2 6.67 12 1 19 cauc other male worker
## 3 4.00 12 4 22 cauc other male worker
## 4 7.50 12 17 35 cauc other male worker
## 5 13.07 13 9 28 cauc other male worker
## sector union married
## 1 manufacturing no yes
## 1100 manufacturing no yes
## 2 manufacturing no no
## 3 other no no
## 4 other no yes
## 5 other yes no
Variables:
Guns is a balanced panel of data on 50 US states, plus the District of Columbia (for a total of 51 states), by year for 1977–1999.
library(pacman)
p_load(AER)
data("Guns")
head(Guns)
## year violent murder robbery prisoners afam cauc male population
## 1 1977 414.4 14.2 96.8 83 8.384873 55.12291 18.17441 3.780403
## 2 1978 419.1 13.3 99.1 94 8.352101 55.14367 17.99408 3.831838
## 3 1979 413.3 13.2 109.5 144 8.329575 55.13586 17.83934 3.866248
## 4 1980 448.5 13.2 132.1 141 8.408386 54.91259 17.73420 3.900368
## 5 1981 470.5 11.9 126.5 149 8.483435 54.92513 17.67372 3.918531
## 6 1982 447.7 10.6 112.0 183 8.514000 54.89621 17.51052 3.925229
## income density state law
## 1 9563.148 0.0745524 Alabama no
## 2 9932.000 0.0755667 Alabama no
## 3 9877.028 0.0762453 Alabama no
## 4 9541.428 0.0768288 Alabama no
## 5 9548.351 0.0771866 Alabama no
## 6 9478.919 0.0773185 Alabama no
Variables:
Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.
library(pacman)
p_load(AER)
data("HousePrices")
head(HousePrices)
## price lotsize bedrooms bathrooms stories driveway recreation fullbase gasheat
## 1 42000 5850 3 1 2 yes no yes no
## 2 38500 4000 2 1 1 yes no no no
## 3 49500 3060 3 1 1 yes no no no
## 4 60500 6650 3 1 2 yes yes no no
## 5 61000 6360 2 1 1 yes no no no
## 6 66000 4160 3 1 1 yes yes yes no
## aircon garage prefer
## 1 no 1 no
## 2 no 0 no
## 3 no 0 no
## 4 no 0 no
## 5 no 0 no
## 6 yes 0 no
Variables:
Balanced panel data for the years 1983–1997 from 50 US States, plus the District of Columbia, for assessing traffic fatalities and seat belt usage.
library(pacman)
p_load(AER)
data("USSeatBelts")
head(USSeatBelts)
## state year miles fatalities seatbelt speed65 speed70 drinkage alcohol income
## 1 AK 1983 3358 0.04466945 NA no no yes no 17973
## 2 AK 1984 3589 0.03733630 NA no no yes no 18093
## 3 AK 1985 3840 0.03307291 NA no no yes no 18925
## 4 AK 1986 4008 0.02519960 NA no no yes no 18466
## 5 AK 1987 3900 0.01948718 NA no no yes no 18021
## 6 AK 1988 3841 0.02525384 NA no no yes no 18447
## age enforce
## 1 28.23497 no
## 2 28.34354 no
## 3 28.37282 no
## 4 28.39665 no
## 5 28.45325 no
## 6 28.85142 no
Variables:
Write your research question clearly: “Is there a linear relationship between ______ and ______?” Carefully consider whether you should incorporate non-linearities, such as logs or quadratic terms, in your regression.
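As a rough illustration of how logs and quadratic terms enter a regression formula in R, here is a minimal sketch. It assumes you picked CPS1985 and are interested in wages; the choice of dataset, variables, and model names (m_linear, m_log, m_quad) is only an example, not the required specification.

```r
library(pacman)
p_load(AER)
data("CPS1985")

# Linear specification: hourly wage on years of education
m_linear <- lm(wage ~ education, data = CPS1985)

# Log-linear specification: log() can be applied directly inside the formula
m_log <- lm(log(wage) ~ education, data = CPS1985)

# Quadratic term: wrap the square in I() so that ^ is treated as arithmetic
# rather than as formula syntax
m_quad <- lm(log(wage) ~ education + experience + I(experience^2),
             data = CPS1985)

# In the log-linear model, 100 * coefficient is the approximate percent
# change in wage associated with one more year of education
print(100 * coef(m_log)["education"])

# Coefficients of the quadratic specification
print(coef(m_quad))
```

Whether a log or a quadratic term is appropriate depends on your data. A scatter plot of the outcome against the regressor, or a plot of residuals against fitted values, can help you decide.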
Define:
Estimate your regression model. Present the results in a table that reports:
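One possible way to estimate a model and collect the results in a table is sketched below. It assumes the HousePrices data and an illustrative model of price on lotsize and bedrooms. The columns shown (estimates, standard errors, t statistics, p-values, and 95% confidence intervals) are an assumption about what a typical table contains, so adjust them to the items required for this worksheet.

```r
library(pacman)
p_load(AER)
data("HousePrices")

# Illustrative model only: sale price on lot size and number of bedrooms
model <- lm(price ~ lotsize + bedrooms, data = HousePrices)

# coef(summary()) returns a matrix with the estimate, standard error,
# t statistic, and p-value for each coefficient
results <- as.data.frame(coef(summary(model)))

# Add 95% confidence intervals for each coefficient
ci <- confint(model, level = 0.95)
results$ci_lower <- ci[, 1]
results$ci_upper <- ci[, 2]

print(round(results, 3))

# Model-level summaries that often accompany a regression table
cat("Observations:", nobs(model), "\n")
cat("R-squared:", round(summary(model)$r.squared, 3), "\n")
```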
In complete sentences:
Before you leave today, please submit your written worksheet to me (with your name on it) and upload your R script to the in-class Canvas assignment for today.