Objective

Today you will:

Step 1: Dataset Selection

Please select one of the following datasets. The R code to load the data into R, and a brief description of the variables is provided below for each dataset:

Australian Health Service Utilization Data

Cross-section data originating from the 1977–1978 Australian Health Survey.

Variables:

  • visits Number of doctor visits in past 2 weeks.
  • gender Factor indicating gender.
  • age Age in years divided by 100.
  • income Annual income in tens of thousands of dollars.
  • illness Number of illnesses in past 2 weeks.
  • reduced Number of days of reduced activity in past 2 weeks due to illness or injury.
  • health General health questionnaire score using Goldberg’s method.
  • private Factor. Does the individual have private health insurance?
  • freepoor Factor. Does the individual have free government health insurance due to low income?
  • freerepat Factor. Does the individual have free government health insurance due to old age, disability or veteran status?
  • nchronic Factor. Is there a chronic condition not limiting activity?
  • lchronic Factor. Is there a chronic condition limiting activity?
library(pacman)
p_load(AER)
data("DoctorVisits")
head(DoctorVisits)
##   visits gender  age income illness reduced health private freepoor freerepat
## 1      1 female 0.19   0.55       1       4      1     yes       no        no
## 2      1 female 0.19   0.45       1       2      1     yes       no        no
## 3      1   male 0.19   0.90       3       0      0      no       no        no
## 4      1   male 0.19   0.15       1       0      0      no       no        no
## 5      1   male 0.19   0.45       2       5      1      no       no        no
## 6      1 female 0.19   0.35       5       1      9      no       no        no
##   nchronic lchronic
## 1       no       no
## 2       no       no
## 3       no       no
## 4       no       no
## 5      yes       no
## 6      yes       no

Cost Function of Electricity Producers 1970

Cross-section data, at the firm level, on electric power generation.

Variables:

  • cost total cost.
  • output total output.
  • labor wage rate.
  • laborshare cost share for labor.
  • capital capital price index.
  • capitalshare cost share for capital.
  • fuel fuel price.
  • fuelshare cost share for fuel.
library(pacman)
p_load(AER)
data("Electricity1970")
head(Electricity1970)
##      cost output   labor laborshare capital capitalshare   fuel fuelshare
## 1  0.2130      8 6869.47     0.3291  64.945       0.4197 18.000    0.2512
## 4  3.0427    869 8372.96     0.1030  68.227       0.2913 21.067    0.6057
## 5  9.4059   1412 7960.90     0.0891  40.692       0.1567 41.530    0.7542
## 14 0.7606     65 8971.89     0.2802  41.243       0.1282 28.539    0.5916
## 15 2.2587    295 8218.40     0.1772  71.940       0.1623 39.200    0.6606
## 16 1.3422    183 5063.49     0.0960  74.430       0.2629 35.510    0.6411

US General Social Survey 1974–2002

Cross-section data for 9120 women taken from every fourth year of the US General Social Survey between 1974 and 2002 to investigate the determinants of fertility.

Variables:

  • kids Number of children. This is coded as a numerical variable but note that the value 8 actually encompasses 8 or more children.
  • age Age of respondent.
  • education Highest year of school completed.
  • year GSS year for respondent.
  • siblings Number of brothers and sisters.
  • agefirstbirth Woman’s age at birth of first child.
  • ethnicity Factor indicating ethnicity. Is the individual Caucasian (“cauc”) or not (“other”)?
  • city16 Factor. Did the respondent live in a city (with population > 50,000) at age 16?
  • lowincome16 Factor. Was the income below average at age 16?
  • immigrant Factor. Was the respondent (or both parents) born abroad?
library(pacman)
p_load(AER)
data("GSS7402")
head(GSS7402)
##   kids age education year siblings agefirstbirth ethnicity city16 lowincome16
## 1    0  25        14 2002        1            NA      cauc     no          no
## 2    1  30        13 2002        4            19      cauc    yes          no
## 3    1  55         2 2002        1            27      cauc     no          no
## 4    2  57        16 2002        1            22      cauc     no          no
## 5    2  71        12 2002        6            29      cauc    yes          no
## 6    0  19        13 2002        1            NA     other    yes          no
##   immigrant
## 1        no
## 2        no
## 3       yes
## 4        no
## 5        no
## 6        no

Medicaid Utilization Data

Cross-section data originating from the 1986 Medicaid Consumer Survey.

Variables:

  • visits Number of doctor visits.
  • exposure Length of observation period for ambulatory care (days).
  • children Total number of children in the household.
  • age Age of the respondent.
  • income Annual household income (average of income range in million USD).
  • health1 The first principal component (divided by 1000) of three health-status variables: functional limitations, acute conditions, and chronic conditions.
  • health2 The second principal component (divided by 1000) of three health-status variables: functional limitations, acute conditions, and chronic conditions.
  • access Availability of health services (0 = low access, 1 = high access).
  • married Factor. Is the individual married?
  • gender Factor indicating gender.
  • ethnicity Factor indicating ethnicity (“cauc” or “other”).
  • school Number of years completed in school.
  • enroll Factor. Is the individual enrolled in a demonstration program?
  • program Factor indicating the managed care demonstration program: Aid to Families with Dependent Children (“afdc”) or non-institutionalized Supplementary Security Income (“ssi”).
library(pacman)
p_load(AER)
data("Medicaid1986")
head(Medicaid1986)
##   visits exposure children age income health1 health2 access married gender
## 1      0      100        1  24 14.500   0.495  -0.854   0.50      no female
## 2      1       90        3  19  6.000   0.520  -0.969   0.17      no female
## 3      0      106        4  17  8.377  -1.227   0.317   0.42      no female
## 4      0      114        2  29  6.000  -1.524   0.457   0.33      no female
## 5     11      115        1  26  8.500   0.173  -0.599   0.67      no female
## 6      3      102        1  22  6.000  -0.905   0.062   0.25      no female
##   ethnicity school enroll program
## 1      cauc     13    yes    afdc
## 2      cauc     11    yes    afdc
## 3      cauc     12    yes    afdc
## 4      cauc     12    yes    afdc
## 5      cauc     16    yes    afdc
## 6     other     12    yes    afdc

Determinants of Murder Rates in the United States

Cross-section data on states in 1950.

Variables:

  • rate Murder rate per 100,000 (FBI estimate, 1950).
  • convictions Number of convictions divided by number of murders in 1950.
  • executions Average number of executions during 1946–1950 divided by convictions in 1950.
  • time Median time served (in months) of convicted murderers released in 1951.
  • income Median family income in 1949 (in 1,000 USD).
  • lfp Labor force participation rate in 1950 (in percent).
  • noncauc Proportion of population that is non-Caucasian in 1950.
  • southern Factor indicating region.
library(pacman)
p_load(AER)
data("MurderRates")
head(MurderRates)
##    rate convictions executions time income  lfp noncauc southern
## 1 19.25       0.204      0.035   47   1.10 51.2   0.321      yes
## 2  7.53       0.327      0.081   58   0.92 48.5   0.224      yes
## 3  5.66       0.401      0.012   82   1.72 50.8   0.127       no
## 4  3.21       0.318      0.070  100   2.18 54.4   0.063       no
## 5  2.80       0.350      0.062  222   1.75 52.4   0.021       no
## 6  1.41       0.283      0.100  164   2.26 56.7   0.027       no

Extramarital Affairs Data

Infidelity data, known as Fair’s Affairs. Cross-section data from a survey conducted by Psychology Today in 1969.

library(pacman)
p_load(AER)
data("Affairs")
head(Affairs)
##    affairs gender age yearsmarried children religiousness education occupation
## 4        0   male  37        10.00       no             3        18          7
## 5        0 female  27         4.00       no             4        14          6
## 11       0 female  32        15.00      yes             1        12          1
## 16       0   male  57        15.00      yes             5        18          6
## 23       0   male  22         0.75       no             2        17          6
## 29       0 female  32         1.50       no             2        17          5
##    rating
## 4       4
## 5       4
## 11      4
## 16      5
## 23      3
## 29      5

Variables:

  • affairs How often engaged in extramarital sexual intercourse during the past year? 0 = none, 1 = once, 2 = twice, 3 = 3 times, 7 = 4–10 times, 12 = monthly, 12 = weekly, 12 = daily.
  • gender Factor indicating gender.
  • age Numeric variable coding age in years: 17.5 = under 20, 22 = 20–24, 27 = 25–29, 32 = 30–34, 37 = 35–39, 42 = 40–44, 47 = 45–49, 52 = 50–54, 57 = 55 or over.
  • yearsmarried Numeric variable coding number of years married: 0.125 = 3 months or less, 0.417 = 4–6 months, 0.75 = 6 months–1 year, 1.5 = 1–2 years, 4 = 3–5 years, 7 = 6–8 years, 10 = 9–11 years, 15 = 12 or more years.
  • children Factor. Are there children in the marriage?
  • religiousness Numeric variable coding religiousness: 1 = anti, 2 = not at all, 3 = slightly, 4 = somewhat, 5 = very.
  • education Numeric variable coding level of education: 9 = grade school, 12 = high school graduate, 14 = some college, 16 = college graduate, 17 = some graduate work, 18 = master’s degree, 20 = Ph.D., M.D., or other advanced degree.

College Distance Data

Cross-section data from the High School and Beyond survey conducted by the Department of Education in 1980, with a follow-up in 1986. The survey included students from approximately 1,100 high schools.

library(pacman)
p_load(AER)
data("CollegeDistance")
head(CollegeDistance)
##   gender ethnicity score fcollege mcollege home urban unemp wage distance
## 1   male     other 39.15      yes       no  yes   yes   6.2 8.09      0.2
## 2 female     other 48.87       no       no  yes   yes   6.2 8.09      0.2
## 3   male     other 48.74       no       no  yes   yes   6.2 8.09      0.2
## 4   male      afam 40.40       no       no  yes   yes   6.2 8.09      0.2
## 5 female     other 40.48       no       no   no   yes   5.6 8.09      0.4
## 6   male     other 54.71       no       no  yes   yes   5.6 8.09      0.4
##   tuition education income region
## 1 0.88915        12   high  other
## 2 0.88915        12    low  other
## 3 0.88915        12    low  other
## 4 0.88915        12    low  other
## 5 0.88915        13    low  other
## 6 0.88915        12    low  other

Variables:

  • gender factor indicating gender.
  • ethnicity factor indicating ethnicity (African-American, Hispanic or other).
  • score base year composite test score. These are achievement tests given to high school seniors in the sample.
  • fcollege factor. Is the father a college graduate?
  • mcollege factor. Is the mother a college graduate?
  • home factor. Does the family own their home?
  • urban factor. Is the school in an urban area?
  • unemp county unemployment rate in 1980.
  • wage state hourly wage in manufacturing in 1980.
  • distance distance from 4-year college (in 10 miles).
  • tuition average state 4-year college tuition (in 1000 USD).
  • education number of years of education.
  • income factor. Is the family income above USD 25,000 per year?
  • region factor indicating region (West or other).

Determinants of Wage Data

Cross-section data originating from the May 1985 Current Population Survey by the US Census Bureau (random sample drawn for Berndt 1991).

library(pacman)
p_load(AER)
data("CPS1985")
head(CPS1985)
##       wage education experience age ethnicity region gender occupation
## 1     5.10         8         21  35  hispanic  other female     worker
## 1100  4.95         9         42  57      cauc  other female     worker
## 2     6.67        12          1  19      cauc  other   male     worker
## 3     4.00        12          4  22      cauc  other   male     worker
## 4     7.50        12         17  35      cauc  other   male     worker
## 5    13.07        13          9  28      cauc  other   male     worker
##             sector union married
## 1    manufacturing    no     yes
## 1100 manufacturing    no     yes
## 2    manufacturing    no      no
## 3            other    no      no
## 4            other    no     yes
## 5            other   yes      no

Variables:

  • wage Wage (in dollars per hour).
  • education Number of years of education.
  • experience Number of years of potential work experience (age - education - 6).
  • age Age in years.
  • ethnicity Factor with levels “cauc”, “hispanic”, “other”.
  • region Factor. Does the individual live in the South?
  • gender Factor indicating gender.
  • occupation Factor with levels “worker” (tradesperson or assembly line worker), “technical” (technical or professional worker), “services” (service worker), “office” (office and clerical worker), “sales” (sales worker), “management” (management and administration).
  • sector Factor with levels “manufacturing” (manufacturing or mining), “construction”, “other”.
  • union Factor. Does the individual work on a union job?
  • married Factor. Is the individual married?
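The definition of experience above (age - education - 6) can be checked against the rows printed by head(CPS1985). The sketch below retypes only those six rows by hand; the data frame name cps is an assumption for illustration, and the check says nothing about rows not shown.

```r
# Six rows copied by hand from the head(CPS1985) output above.
# `cps` is a hypothetical name used only for this illustration.
cps <- data.frame(
  education  = c(8, 9, 12, 12, 12, 13),
  experience = c(21, 42, 1, 4, 17, 9),
  age        = c(35, 57, 19, 22, 35, 28)
)

# Potential experience as defined in the variable list.
implied <- cps$age - cps$education - 6

# Compare the implied values with the recorded ones, row by row.
print(data.frame(recorded = cps$experience, implied = implied))
print(all(implied == cps$experience))
```

If the identity holds exactly throughout the sample, then age, education, and experience are perfectly collinear once an intercept is included, and lm() would report NA for one of the three coefficients. In that case it makes sense to include at most two of them in the same regression.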

More Guns, Less Crime?

Guns is a balanced panel of data on 50 US states, plus the District of Columbia (for a total of 51 states), by year for 1977–1999.

library(pacman)
p_load(AER)
data("Guns")
head(Guns)
##   year violent murder robbery prisoners     afam     cauc     male population
## 1 1977   414.4   14.2    96.8        83 8.384873 55.12291 18.17441   3.780403
## 2 1978   419.1   13.3    99.1        94 8.352101 55.14367 17.99408   3.831838
## 3 1979   413.3   13.2   109.5       144 8.329575 55.13586 17.83934   3.866248
## 4 1980   448.5   13.2   132.1       141 8.408386 54.91259 17.73420   3.900368
## 5 1981   470.5   11.9   126.5       149 8.483435 54.92513 17.67372   3.918531
## 6 1982   447.7   10.6   112.0       183 8.514000 54.89621 17.51052   3.925229
##     income   density   state law
## 1 9563.148 0.0745524 Alabama  no
## 2 9932.000 0.0755667 Alabama  no
## 3 9877.028 0.0762453 Alabama  no
## 4 9541.428 0.0768288 Alabama  no
## 5 9548.351 0.0771866 Alabama  no
## 6 9478.919 0.0773185 Alabama  no

Variables:

  • state factor indicating state.
  • year factor indicating year.
  • violent violent crime rate (incidents per 100,000 members of the population).
  • murder murder rate (incidents per 100,000).
  • robbery robbery rate (incidents per 100,000).
  • prisoners incarceration rate in the state in the previous year (sentenced prisoners per 100,000 residents).
  • afam percent of state population that is African-American, ages 10 to 64.
  • cauc percent of state population that is Caucasian, ages 10 to 64.
  • male percent of state population that is male, ages 10 to 29.
  • population state population, in millions of people.
  • income real per capita personal income in the state (US dollars).
  • density population per square mile of land area, divided by 1,000.
  • law factor. Does the state have a shall carry law in effect in that year?

House Prices in the City of Windsor, Canada

Sales prices of houses sold in the city of Windsor, Canada, during July, August and September, 1987.

library(pacman)
p_load(AER)
data("HousePrices")
head(HousePrices)
##   price lotsize bedrooms bathrooms stories driveway recreation fullbase gasheat
## 1 42000    5850        3         1       2      yes         no      yes      no
## 2 38500    4000        2         1       1      yes         no       no      no
## 3 49500    3060        3         1       1      yes         no       no      no
## 4 60500    6650        3         1       2      yes        yes       no      no
## 5 61000    6360        2         1       1      yes         no       no      no
## 6 66000    4160        3         1       1      yes        yes      yes      no
##   aircon garage prefer
## 1     no      1     no
## 2     no      0     no
## 3     no      0     no
## 4     no      0     no
## 5     no      0     no
## 6    yes      0     no

Variables:

  • price Sale price of a house.
  • lotsize Lot size of a property in square feet.
  • bedrooms Number of bedrooms.
  • bathrooms Number of full bathrooms.
  • stories Number of stories excluding basement.
  • driveway Factor. Does the house have a driveway?
  • recreation Factor. Does the house have a recreational room?
  • fullbase Factor. Does the house have a full finished basement?
  • gasheat Factor. Does the house use gas for hot water heating?
  • aircon Factor. Is there central air conditioning?
  • garage Number of garage places.
  • prefer Factor. Is the house located in the preferred neighborhood of the city?

Effects of Mandatory Seat Belt Laws in the US

Balanced panel data for the years 1983–1997 from 50 US States, plus the District of Columbia, for assessing traffic fatalities and seat belt usage.

library(pacman)
p_load(AER)
data("USSeatBelts")
head(USSeatBelts)
##   state year miles fatalities seatbelt speed65 speed70 drinkage alcohol income
## 1    AK 1983  3358 0.04466945       NA      no      no      yes      no  17973
## 2    AK 1984  3589 0.03733630       NA      no      no      yes      no  18093
## 3    AK 1985  3840 0.03307291       NA      no      no      yes      no  18925
## 4    AK 1986  4008 0.02519960       NA      no      no      yes      no  18466
## 5    AK 1987  3900 0.01948718       NA      no      no      yes      no  18021
## 6    AK 1988  3841 0.02525384       NA      no      no      yes      no  18447
##        age enforce
## 1 28.23497      no
## 2 28.34354      no
## 3 28.37282      no
## 4 28.39665      no
## 5 28.45325      no
## 6 28.85142      no

Variables:

  • state factor indicating US state (abbreviation).
  • year factor indicating year.
  • miles millions of traffic miles per year.
  • fatalities number of fatalities per million of traffic miles (absolute frequencies of fatalities = fatalities times miles).
  • seatbelt seat belt usage rate, as self-reported by state population surveyed.
  • speed65 factor. Is there a 65 mile per hour speed limit?
  • speed70 factor. Is there a 70 (or higher) mile per hour speed limit?
  • drinkage factor. Is there a minimum drinking age of 21 years?
  • alcohol factor. Is there a maximum of 0.08 blood alcohol content?
  • income median per capita income (in current US dollars).
  • age mean age.
  • enforce factor indicating seat belt law enforcement (“no”, “primary”, “secondary”).
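The note on fatalities above (absolute frequencies = fatalities times miles) can be illustrated with the six Alaska rows printed by head(USSeatBelts). This is a minimal sketch with the values retyped by hand; the names sb and deaths are assumptions for illustration, and year is stored here as an integer rather than a factor.

```r
# Six rows copied by hand from the head(USSeatBelts) output above (AK, 1983-1988).
# `sb` and `deaths` are hypothetical names used only for this illustration.
sb <- data.frame(
  year       = 1983:1988,
  miles      = c(3358, 3589, 3840, 4008, 3900, 3841),
  fatalities = c(0.04466945, 0.03733630, 0.03307291,
                 0.02519960, 0.01948718, 0.02525384)
)

# Absolute number of fatalities = rate per million miles * millions of miles.
sb$deaths <- sb$fatalities * sb$miles

# The products come out very close to whole numbers, as counts should.
print(round(sb$deaths, 3))
```

The same multiplication applied to the full data set would give a count outcome, which may be easier to interpret than the rate when describing the data.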

Step 2: Formulate a Research Question

Write your research question clearly: “Is there a linear relationship between ______ and ______?” Consider carefully whether you should incorporate non-linearities, such as logs or quadratic terms, in your regression.
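As a rough sketch of how logs and quadratic terms enter an lm() formula, the example below uses simulated data only. The variable names x and y, the sample size, and the data-generating values are all assumptions for illustration and do not come from any of the datasets above.

```r
# Simulated data only; x and y are placeholder names, not worksheet variables.
set.seed(1)
n <- 200
x <- runif(n, min = 1, max = 10)
y <- exp(0.5 + 0.3 * log(x) + rnorm(n, sd = 0.2))
dat <- data.frame(x = x, y = y)

# Level-level: the slope is the change in y per one-unit change in x.
fit_linear <- lm(y ~ x, data = dat)

# Log-log: the slope is approximately the percent change in y per 1% change in x.
fit_loglog <- lm(log(y) ~ log(x), data = dat)

# Quadratic: I() keeps ^ as arithmetic instead of formula syntax.
fit_quad <- lm(y ~ x + I(x^2), data = dat)

print(coef(fit_linear))
print(coef(fit_loglog))
print(coef(fit_quad))
```

Logs require strictly positive values, so a variable that contains zeros (a count of visits, for example) cannot be logged directly. A scatter plot of the two variables is often a reasonable first check before choosing among these forms.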

Define:

Step 3: Estimate the Model

Estimate your regression model. Present the results in a table that reports:
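One possible way to estimate a model and collect the output in a single table is sketched below, using simulated data and base R only. The names dat, x1, x2, and y are assumptions for illustration, and the columns shown (estimate, standard error, t value, p-value, 95% confidence interval) are one reasonable choice rather than a required layout.

```r
# Simulated data only; replace dat, y, x1, and x2 with your chosen dataset and variables.
set.seed(42)
n <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - 0.5 * dat$x2 + rnorm(n)

# Estimate the regression by ordinary least squares.
fit <- lm(y ~ x1 + x2, data = dat)

# Combine the coefficient table with 95% confidence intervals.
results <- cbind(coef(summary(fit)), confint(fit))
print(round(results, 3))

# Sample size and R-squared for the table footer.
cat("n =", nobs(fit), " R-squared =", round(summary(fit)$r.squared, 3), "\n")
```

The matrix results can be copied onto the written worksheet as is, or passed to a table-formatting package if one is available.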

Step 4: Interpret Your Results

In complete sentences:

Step 5: Share

Share your results with a classmate.

Step 6: Submit Work

Before you leave today, please submit your written worksheet to me (with your name on it) and upload your R script to the in-class Canvas assignment for today.