Discussion_2

Author

Aritra Ray

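
The chunks in this report call functions from several packages. A setup chunk along the following lines would load them without printing startup messages. This is only a sketch: the original setup chunk is not shown, and `knitr` as the source of `kable()` is an assumption.

```r
# Packages used by the chunks in this report.
# Assumption: kable() comes from knitr; the original setup chunk is not shown.
pkgs <- c("AER", "psych", "ggplot2", "plotly", "dplyr", "knitr")

# Identify packages that are not installed, so a failure is explicit
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!is_installed]
if (length(not_installed) > 0) {
  message("Install first: ", paste(not_installed, collapse = ", "))
}

# Attach the installed packages quietly, in the listed order
for (p in pkgs[is_installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```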

Dataset 1

# Load the dataset (from the AER package)
data("CigarettesSW")
# Convert nominal values to real terms by deflating with the CPI
CigarettesSW <- transform(CigarettesSW,
  rprice  = price/cpi,
  rincome = income/population/cpi,
  rtax    = tax/cpi,
  rtdiff  = (taxs - tax)/cpi
)
df <- CigarettesSW
kable(head(df, 10), 
      format = "html")
state year cpi population packs income tax price taxs rprice rincome rtax rtdiff
AL 1985 1.076 3973000 116.4863 46014968 32.5 102.18167 33.34834 94.96438 10.76387 30.20447 0.7884121
AR 1985 1.076 2327000 128.5346 26210736 37.0 101.47500 37.00000 94.30762 10.46817 34.38662 0.0000000
AZ 1985 1.076 3184000 104.5226 43956936 31.0 108.57875 36.17042 100.90962 12.83046 28.81041 4.8052211
CA 1985 1.076 26444000 100.3630 447102816 26.0 107.83734 32.10400 100.22058 15.71332 24.16357 5.6728627
CO 1985 1.076 3209000 112.9635 49466672 31.0 94.26666 31.00000 87.60842 14.32619 28.81041 0.0000000
CT 1985 1.076 3201000 109.2784 60063368 42.0 128.02499 51.48333 118.98234 17.43861 39.03346 8.8135073
DE 1985 1.076 618000 143.8511 9927301 30.0 102.49166 30.00000 95.25248 14.92899 27.88104 0.0000000
FL 1985 1.076 11352000 122.1811 166919248 37.0 115.29000 42.49000 107.14684 13.66538 34.38662 5.1022322
GA 1985 1.076 5963000 127.2346 78364336 28.0 97.02517 28.84183 90.17209 12.21354 26.02231 0.7823728
IA 1985 1.076 2830000 113.7456 37902896 34.0 101.84200 37.91700 94.64870 12.44726 31.59851 3.6403345

Describe the data

This is a panel dataset on cigarette consumption for the 48 continental US states, observed in two years: 1985 and 1995.

  1. state - Factor indicating state.

  2. year - Factor indicating year.

  3. cpi - Consumer price index.

  4. population - State population.

  5. packs - Number of packs per capita.

  6. income - State personal income (total, nominal).

  7. tax - Average state, federal and average local excise taxes for fiscal year.

  8. price - Average price during fiscal year, including sales tax.

  9. taxs - Average excise taxes for fiscal year, including sales tax.

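  10. rprice - Real price of cigarettes (price / cpi).

  11. rincome - Real per capita income (income / population / cpi).

  12. rtax - Real excise tax (tax / cpi).

  13. rtdiff - Real difference between taxes including sales tax and excise taxes alone ((taxs - tax) / cpi).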
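
As a quick sanity check on the derived variables, the sketch below recomputes them for the first row of the table (AL, 1985) in base R and compares the results with the printed values. The numbers are copied from the output above; the object names (`al_1985`, `real`, `printed`) are illustrative only.

```r
# First row of the table (AL, 1985), copied from the printed output
al_1985 <- data.frame(
  cpi = 1.076, population = 3973000, income = 46014968,
  tax = 32.5, price = 102.18167, taxs = 33.34834
)

# Same formulas as the transform() call on CigarettesSW
real <- transform(al_1985,
  rprice  = price / cpi,
  rincome = income / population / cpi,
  rtax    = tax / cpi,
  rtdiff  = (taxs - tax) / cpi
)

# Values printed in the table for the same row
printed <- c(rprice = 94.96438, rincome = 10.76387,
             rtax = 30.20447, rtdiff = 0.7884121)

# Compare, allowing for rounding in the printed inputs
computed <- unlist(real[names(printed)])
matches <- abs(computed - printed) < 1e-3
print(round(computed, 5))
```

All four entries of `matches` should be `TRUE` if the formulas are applied as described.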

Type of data

This is a panel dataset: the same states are each observed in two time periods (1985 and 1995).
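
One way to confirm the panel structure is to count observations per state and year. The helper below, `is_balanced_panel()`, is a hypothetical name introduced here for illustration, and the toy data frame stands in for the real data so the sketch runs without AER installed.

```r
# Hypothetical helper: TRUE if every unit appears exactly once in every period
is_balanced_panel <- function(data, id, time) {
  counts <- table(data[[id]], data[[time]])
  all(counts == 1)
}

# Toy panel with three states and the two years present in the data
toy <- data.frame(
  state = rep(c("AL", "AR", "AZ"), each = 2),
  year  = factor(rep(c(1985, 1995), times = 3))
)

balanced  <- is_balanced_panel(toy, "state", "year")
n_periods <- nlevels(toy$year)

# With AER loaded, the same check would be:
# is_balanced_panel(CigarettesSW, "state", "year")
```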

Graph

# A two-way table summarizing the average real price (rprice) by state and year
# .groups = "drop" returns an ungrouped table and avoids the grouping message
avg_rprice_table <- CigarettesSW %>%
  group_by(state, year) %>%
  summarise(avg_rprice = mean(rprice, na.rm = TRUE), .groups = "drop")
# The summarized table
kable(head(avg_rprice_table, 10), 
      format = "html", 
      caption = "Average real price by state and year")
Average real price by state and year
state year avg_rprice
AL 1985 94.96438
AL 1995 103.91821
AR 1985 94.30762
AR 1995 115.18538
AZ 1985 100.90962
AZ 1995 130.31989
CA 1985 100.22058
CA 1995 138.12643
CO 1985 87.60842
CO 1995 109.80972
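
Because each state has exactly one observation per year, the table also gives the change in real price between 1985 and 1995 directly. The sketch below computes it for the five states shown, using values copied from the table; the column names `change` and `pct_change` are illustrative.

```r
# Real prices for the five states shown in the table (values as printed)
prices <- data.frame(
  state  = rep(c("AL", "AR", "AZ", "CA", "CO"), each = 2),
  year   = rep(c(1985, 1995), times = 5),
  rprice = c(94.96438, 103.91821, 94.30762, 115.18538,
             100.90962, 130.31989, 100.22058, 138.12643,
             87.60842, 109.80972)
)

# One row per state, with columns rprice.1985 and rprice.1995
wide <- reshape(prices, idvar = "state", timevar = "year", direction = "wide")

# Absolute and percentage change in the real price
wide$change     <- wide$rprice.1995 - wide$rprice.1985
wide$pct_change <- 100 * wide$change / wide$rprice.1985

print(wide[order(-wide$change), c("state", "change", "pct_change")])
```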
# year is a factor, so group by state to draw one line per state
ggplot(df, aes(x = year, y = rprice, group = state)) +
  geom_line() +
  labs(title = "Real Cigarette Prices Over Time",
       x = "Year", 
       y = "Real Price (Adjusted)") +
  theme_minimal()

p <- ggplot(CigarettesSW, aes(x = year, y = rprice, group = state, color = factor(state))) +
  geom_line() +
  labs(title = "Real Cigarette Prices Over Time by State",
       x = "Year", 
       y = "Real Price (Adjusted for Inflation)") +
  theme_minimal()
# Convert to an interactive plot
interactive_plot <- ggplotly(p, tooltip = c("x", "y", "group"))

# Display
interactive_plot

Dataset 2

data("MASchools")
df2 <- MASchools
kable(head(df2, 10), 
      format = "html")
district municipality expreg expspecial expbil expocc exptot scratio special lunch stratio income score4 score8 salary english
1 Abington 4201 7375.69 0 0 4646 16.6 14.6 11.8 19.0 16.379 714 691 34.3600 0.0000000
2 Acton 4129 8573.99 0 0 4930 5.7 17.4 2.5 22.6 25.792 731 NA 38.0630 1.2461059
3 Acushnet 3627 8081.72 0 0 4281 7.5 12.1 14.1 19.3 14.040 704 693 32.4910 0.0000000
5 Agawam 4015 8181.37 0 0 4826 8.6 21.1 12.1 17.9 16.111 704 691 33.1060 0.3225806
7 Amesbury 4273 7037.22 0 0 4824 6.1 16.8 17.4 17.5 15.423 701 699 34.4365 0.0000000
8 Amherst 5183 10595.80 6235 0 6454 7.7 17.2 26.8 15.7 11.144 714 NA NA 3.9215686
9 Andover 4685 12279.58 0 0 5537 5.4 11.3 3.3 17.1 26.327 725 728 41.6150 0.0000000
10 Arlington 5518 10055.05 0 0 6405 7.1 20.4 11.2 16.8 21.449 717 715 36.9940 2.7027028
14 Ashland 5009 8840.86 0 0 5649 10.6 13.9 8.6 17.3 21.912 702 705 34.4215 0.0000000
16 Attleboro 3823 9547.39 12943 11519 4814 6.7 13.2 20.7 20.5 14.970 701 688 33.8790 0.3752345

Describe

The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in Massachusetts.

  1. district - District code.

  2. municipality - Municipality name.

  3. expreg - Expenditures per pupil, regular.

  4. expspecial - Expenditures per pupil, special needs.

  5. expbil - Expenditures per pupil, bilingual.

  6. expocc - Expenditures per pupil, occupational.

  7. exptot - Expenditures per pupil, total.

  8. scratio - Students per computer.

  9. special - Special education students (per cent).

  10. lunch - Percent qualifying for reduced-price lunch.

  11. stratio - Student-teacher ratio.

  12. income - Per capita income.

  13. score4 - 4th grade score (math + English + science).

  14. score8 - 8th grade score (math + English + science).

  15. salary - Average teacher salary.

  16. english - Percent of English learners.

Type of dataset

This is a cross-sectional dataset: each district is observed once, at a single point in time.

Graphs

# Scatter plot for score8
# 40 rows lack score8 or stratio; drop them explicitly instead of relying on ggplot2's warnings
df2_score8 <- dplyr::filter(df2, !is.na(score8), !is.na(stratio))
ggplot(df2_score8, aes(x = stratio, y = score8)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Student-Teacher Ratio vs. Score8",
       x = "Student-Teacher Ratio",
       y = "Score8") +
  theme_minimal()
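
The fitted line in the scatter plot is an ordinary least-squares regression of score8 on stratio. The sketch below fits the same kind of model on only the ten districts shown in the table above, to illustrate how rows with a missing score8 drop out of the fit. It is not the full-sample estimate, and `sample_rows` is an illustrative name.

```r
# The ten districts shown in the table, with stratio and score8 as printed
sample_rows <- data.frame(
  municipality = c("Abington", "Acton", "Acushnet", "Agawam", "Amesbury",
                   "Amherst", "Andover", "Arlington", "Ashland", "Attleboro"),
  stratio = c(19.0, 22.6, 19.3, 17.9, 17.5, 15.7, 17.1, 16.8, 17.3, 20.5),
  score8  = c(691, NA, 693, 691, 699, NA, 728, 715, 705, 688)
)

# lm() drops incomplete rows by default (na.action = na.omit)
fit <- lm(score8 ~ stratio, data = sample_rows)

# Rows used in the fit versus rows dropped for missing values
n_used    <- nobs(fit)
n_dropped <- sum(!complete.cases(sample_rows[, c("stratio", "score8")]))

# Estimated change in score8 for one more student per teacher
slope <- coef(fit)[["stratio"]]
print(c(n_used = n_used, n_dropped = n_dropped))
```

On the full data, the equivalent call would be `lm(score8 ~ stratio, data = MASchools)`, assuming AER is loaded.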

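# Scatter plot of regular per-pupil expenditure against per capita income
# Points are colored by district; the legend is hidden because every district is its own level
p <- ggplot(df2, aes(x = expreg, y = income, color = factor(district))) +
  geom_point() +
  labs(title = "Relationship between Regular Expenditure and Income",
       x = "Regular Expenditure",
       y = "Income",
       color = "District") +
  theme_minimal() +
  theme(legend.position = "none")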

# Convert ggplot object to an interactive plot
p_interactive <- ggplotly(p)

# Display
p_interactive