df <- CigarettesSW
kable(head(df, 10), format = "html")
state  year    cpi  population     packs     income   tax      price      taxs     rprice   rincome      rtax     rtdiff
AL     1985  1.076     3973000  116.4863   46014968  32.5  102.18167  33.34834   94.96438  10.76387  30.20447  0.7884121
AR     1985  1.076     2327000  128.5346   26210736  37.0  101.47500  37.00000   94.30762  10.46817  34.38662  0.0000000
AZ     1985  1.076     3184000  104.5226   43956936  31.0  108.57875  36.17042  100.90962  12.83046  28.81041  4.8052211
CA     1985  1.076    26444000  100.3630  447102816  26.0  107.83734  32.10400  100.22058  15.71332  24.16357  5.6728627
CO     1985  1.076     3209000  112.9635   49466672  31.0   94.26666  31.00000   87.60842  14.32619  28.81041  0.0000000
CT     1985  1.076     3201000  109.2784   60063368  42.0  128.02499  51.48333  118.98234  17.43861  39.03346  8.8135073
DE     1985  1.076      618000  143.8511    9927301  30.0  102.49166  30.00000   95.25248  14.92899  27.88104  0.0000000
FL     1985  1.076    11352000  122.1811  166919248  37.0  115.29000  42.49000  107.14684  13.66538  34.38662  5.1022322
GA     1985  1.076     5963000  127.2346   78364336  28.0   97.02517  28.84183   90.17209  12.21354  26.02231  0.7823728
IA     1985  1.076     2830000  113.7456   37902896  34.0  101.84200  37.91700   94.64870  12.44726  31.59851  3.6403345
Describe the data
It is a panel dataset on cigarette consumption for the 48 continental US states, covering the two years 1985 and 1995.
state - Factor indicating state.
year - Factor indicating year.
cpi - Consumer price index.
population - State population.
packs - Number of packs per capita.
income - State personal income (total, nominal).
tax - Average state, federal and average local excise taxes for fiscal year.
price - Average price during fiscal year, including sales tax.
taxs - Average excise taxes for fiscal year, including sales tax.
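rprice - Real price of cigarettes (price / cpi).
rincome - Real per capita income (income / population / cpi).
rtax - Real tax (tax / cpi).
rtdiff - Real sales-tax component, the difference between taxs and tax ((taxs - tax) / cpi).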
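The code that created the real columns is not part of this section. The printed values are consistent with deflating the nominal columns by cpi, with rincome additionally divided by population and rtdiff isolating the sales-tax part of taxs. A minimal sketch under that assumption, checked against the Alabama 1985 row of the table; the helper name `add_real_vars` is hypothetical:

```r
# Hypothetical helper (an assumption, not code from this report): derives the
# real-valued columns from the nominal ones by deflating with cpi.
add_real_vars <- function(d) {
  d$rprice  <- d$price / d$cpi                  # real price
  d$rincome <- d$income / d$population / d$cpi  # real per capita income
  d$rtax    <- d$tax / d$cpi                    # real excise tax
  d$rtdiff  <- (d$taxs - d$tax) / d$cpi         # real sales-tax component
  d
}

# Nominal values for Alabama, 1985, copied from the table above
al_1985 <- data.frame(cpi = 1.076, population = 3973000, income = 46014968,
                      tax = 32.5, price = 102.18167, taxs = 33.34834)

real <- add_real_vars(al_1985)
round(real[, c("rprice", "rincome", "rtax", "rtdiff")], 5)
```

Applied to the full data frame, the same helper would add all four columns in one call, provided the nominal columns carry the names used above.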
Type of data
This is a panel dataset: the same states are observed repeatedly, once in 1985 and once in 1995.
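A panel is balanced when every unit appears exactly once in every period. A minimal sketch of that check, assuming a hypothetical helper `is_balanced_panel` and using the state-year pairs for the first five states shown in the summary table:

```r
# Hypothetical helper (not part of the original analysis): TRUE when every
# unit (id) is observed exactly once in every period (time).
is_balanced_panel <- function(data, id, time) {
  counts <- table(data[[id]], data[[time]])  # units x periods count table
  all(counts == 1)
}

# State-year pairs for the first five states (two years each)
panel_sample <- data.frame(
  state = rep(c("AL", "AR", "AZ", "CA", "CO"), each = 2),
  year  = rep(c(1985, 1995), times = 5)
)

is_balanced_panel(panel_sample, "state", "year")        # every state in both years
is_balanced_panel(panel_sample[-1, ], "state", "year")  # AL 1985 dropped

# On the full data (assumes the AER package is installed):
# data("CigarettesSW", package = "AER")
# is_balanced_panel(CigarettesSW, "state", "year")
```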
Graph
# Summary table of the average real price (rprice) by state and year.
# Each state-year cell holds a single observation, so the mean equals that value.
# .groups = "drop" returns an ungrouped result and silences the regrouping message.
avg_rprice_table <- CigarettesSW %>%
  group_by(state, year) %>%
  summarise(avg_rprice = mean(rprice, na.rm = TRUE), .groups = "drop")
# The summarized table
kable(head(avg_rprice_table, 10), format = "html",
      caption = "Average real price by state and year")
Average real price by state and year
state  year  avg_rprice
AL     1985    94.96438
AL     1995   103.91821
AR     1985    94.30762
AR     1995   115.18538
AZ     1985   100.90962
AZ     1995   130.31989
CA     1985   100.22058
CA     1995   138.12643
CO     1985    87.60842
CO     1995   109.80972
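The summary above is in long format (one row per state-year). Because each cell holds one observation, it can be reshaped into a genuine two-way layout with states as rows and years as columns. A minimal base-R sketch using `tapply()`, on the ten rows printed above (so only five states):

```r
# Long-format rows copied from the summary table above (first five states)
long <- data.frame(
  state  = rep(c("AL", "AR", "AZ", "CA", "CO"), each = 2),
  year   = rep(c("1985", "1995"), times = 5),
  rprice = c(94.96438, 103.91821, 94.30762, 115.18538, 100.90962,
             130.31989, 100.22058, 138.12643, 87.60842, 109.80972)
)

# tapply() cross-classifies rprice by state (rows) and year (columns);
# with one observation per cell, mean() simply returns that observation.
wide <- with(long, tapply(rprice, list(state, year), mean))
wide

# Percentage change in the real price between 1985 and 1995, by state
pct_change <- round(100 * (wide[, "1995"] / wide[, "1985"] - 1), 1)
pct_change
```

With the full data and tidyr installed, `tidyr::pivot_wider(avg_rprice_table, names_from = year, values_from = avg_rprice)` should give the equivalent state-by-year layout as a tibble.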
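# Average real price across states in each year. year is a factor, so without
# group = 1 geom_line() would treat each year as its own group and draw no line.
ggplot(df, aes(x = year, y = rprice, group = 1)) +
  stat_summary(fun = mean, geom = "line") +
  labs(title = "Average Real Cigarette Price Over Time",
       x = "Year", y = "Real Price (Adjusted)") +
  theme_minimal()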
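A single line of rprice against year has to summarise the states in some way, and the yearly mean is the natural choice. Those yearly means can also be computed directly. The sketch uses only the five states printed in the summary table, so its numbers are illustrative and are not the 48-state averages:

```r
# rprice values for the first five states, copied from the summary table above
long <- data.frame(
  state  = rep(c("AL", "AR", "AZ", "CA", "CO"), each = 2),
  year   = rep(c("1985", "1995"), times = 5),
  rprice = c(94.96438, 103.91821, 94.30762, 115.18538, 100.90962,
             130.31989, 100.22058, 138.12643, 87.60842, 109.80972)
)

# aggregate() with a formula: mean of rprice within each year, across states
yearly <- aggregate(rprice ~ year, data = long, FUN = mean)
yearly
```

With the full data, the same call would be `aggregate(rprice ~ year, data = df, FUN = mean)`, assuming df contains the rprice column.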
p <- ggplot(CigarettesSW, aes(x = year, y = rprice, group = state, color = factor(state))) +
  geom_line() +
  labs(title = "Real Cigarette Prices Over Time by State",
       x = "Year", y = "Real Price (Adjusted for Inflation)") +
  theme_minimal()
# Convert to an interactive plot
interactive_plot <- ggplotly(p, tooltip = c("x", "y", "group"))
# Display
interactive_plot
Dataset 2
data("MASchools")
df2 <- MASchools
kable(head(df2, 10), format = "html")
district  municipality  expreg  expspecial  expbil  expocc  exptot  scratio  special  lunch  stratio  income  score4  score8   salary    english
       1  Abington        4201     7375.69       0       0    4646     16.6     14.6   11.8     19.0  16.379     714     691  34.3600  0.0000000
       2  Acton           4129     8573.99       0       0    4930      5.7     17.4    2.5     22.6  25.792     731      NA  38.0630  1.2461059
       3  Acushnet        3627     8081.72       0       0    4281      7.5     12.1   14.1     19.3  14.040     704     693  32.4910  0.0000000
       5  Agawam          4015     8181.37       0       0    4826      8.6     21.1   12.1     17.9  16.111     704     691  33.1060  0.3225806
       7  Amesbury        4273     7037.22       0       0    4824      6.1     16.8   17.4     17.5  15.423     701     699  34.4365  0.0000000
       8  Amherst         5183    10595.80    6235       0    6454      7.7     17.2   26.8     15.7  11.144     714      NA       NA  3.9215686
       9  Andover         4685    12279.58       0       0    5537      5.4     11.3    3.3     17.1  26.327     725     728  41.6150  0.0000000
      10  Arlington       5518    10055.05       0       0    6405      7.1     20.4   11.2     16.8  21.449     717     715  36.9940  2.7027028
      14  Ashland         5009     8840.86       0       0    5649     10.6     13.9    8.6     17.3  21.912     702     705  34.4215  0.0000000
      16  Attleboro       3823     9547.39   12943   11519    4814      6.7     13.2   20.7     20.5  14.970     701     688  33.8790  0.3752345
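NA marks missing values in this table (score8 for Acton and Amherst, salary for Amherst). A minimal base-R sketch that counts them per column and keeps only complete rows; the data frame below copies a few columns of the ten printed districts and is illustrative only:

```r
# A subset of the columns shown above (first ten districts), copied from the table
sample_schools <- data.frame(
  municipality = c("Abington", "Acton", "Acushnet", "Agawam", "Amesbury",
                   "Amherst", "Andover", "Arlington", "Ashland", "Attleboro"),
  score4 = c(714, 731, 704, 704, 701, 714, 725, 717, 702, 701),
  score8 = c(691, NA, 693, 691, 699, NA, 728, 715, 705, 688),
  salary = c(34.3600, 38.0630, 32.4910, 33.1060, 34.4365, NA, 41.6150,
             36.9940, 34.4215, 33.8790)
)

# Number of missing values in each column
na_counts <- colSums(is.na(sample_schools))
na_counts

# Rows with no missing value in any column
complete <- sample_schools[complete.cases(sample_schools), ]
nrow(complete)
```

On the full data, `colSums(is.na(MASchools))` gives the same per-column count for every variable.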
Describe
The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in Massachusetts.
district - District code.
municipality - Municipality name.
expreg - Expenditures per pupil, regular.
expspecial - Expenditures per pupil, special needs.
expbil - Expenditures per pupil, bilingual.
expocc - Expenditures per pupil, occupational.
exptot - Expenditures per pupil, total.
scratio - Students per computer.
special - Special education students (percent).
lunch - Percent qualifying for reduced-price lunch.
stratio - Student-teacher ratio.
income - Per capita income.
score4 - 4th grade score (math + English + science).
score8 - 8th grade score (math + English + science).
salary - Average teacher salary.
english - Percent of English learners.
Type of dataset
This is a cross-sectional dataset: each school district is observed once, at a single point in time.
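In a cross-section, each unit identifier appears only once. A minimal sketch of that check, assuming a hypothetical helper `is_cross_section` and using the district codes from the ten printed rows:

```r
# Hypothetical helper: TRUE when no unit identifier appears more than once
is_cross_section <- function(ids) anyDuplicated(ids) == 0

# District codes from the first ten rows printed above
district <- c(1, 2, 3, 5, 7, 8, 9, 10, 14, 16)
is_cross_section(district)

# With the full data (assumes the AER package is installed):
# data("MASchools", package = "AER")
# is_cross_section(MASchools$district)
```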
Graphs
# Scatter plot for score8
ggplot(df2, aes(x = stratio, y = score8)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Student-Teacher Ratio vs. Score8",
       x = "Student-Teacher Ratio",
       y = "Score8") +
  theme_minimal()
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 40 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 40 rows containing missing values or values outside the scale range
(`geom_point()`).
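`geom_smooth(method = "lm")` draws an ordinary least squares line of score8 on stratio, fitted only to rows where both values are present; the warnings report that 40 rows were dropped. The sketch below fits the same kind of line to the ten districts printed above, so its coefficients are illustrative and are not the full-sample estimates. It also checks the slope against the textbook formula cov(x, y) / var(x):

```r
# Columns copied from the table above (first ten districts)
stratio <- c(19.0, 22.6, 19.3, 17.9, 17.5, 15.7, 17.1, 16.8, 17.3, 20.5)
score8  <- c(691, NA, 693, 691, 699, NA, 728, 715, 705, 688)

# lm() drops rows containing NA by default (na.action = na.omit)
fit <- lm(score8 ~ stratio)
coef(fit)

# Same slope from cov(x, y) / var(x) on the complete cases
ok <- complete.cases(stratio, score8)
slope <- cov(stratio[ok], score8[ok]) / var(stratio[ok])
slope
```

On the full data, `lm(score8 ~ stratio, data = df2)` would give the coefficients of the line drawn in the plot.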
# Plotting
p <- ggplot(df2, aes(x = expreg, y = income, color = factor(district))) +
  geom_point() +
  labs(title = "Relationship between Regular Expenditure and Income",
       x = "Regular Expenditure", y = "Income", color = "District") +
  theme_minimal() +
  theme(legend.position = "none")
# Convert ggplot object to an interactive plot
p_interactive <- ggplotly(p)
# Display
p_interactive