The link between cigarette smoking and lung cancer was first discovered by the British scientist Sir Richard Doll in 1957 (BBC). In the five and a half decades since that announcement, public health organizations and government agencies have taken measures to counter the tobacco industry and reduce smoking rates. In 1964, the acting Surgeon General of the United States released the first official report warning of the health risks of smoking (CDC). Shortly afterward, in 1965, Congress enacted the Federal Cigarette Labeling and Advertising Act, which required printed health warnings on cigarette packs and banned broadcast cigarette advertising (CDC).
The anti-tobacco campaign of the mid-to-late 20th century is widely considered one of the most successful public health campaigns in American history: nearly half of living adults who ever smoked have quit (CDC). Even so, according to the CDC, smoking remains the leading cause of preventable death and disease in the United States, causing roughly 480,000 deaths per year. According to NIH estimates, the United States spends $13.4 billion annually on health care associated with lung cancer. Smoking continues to place immense social, economic, and medical burdens on the United States and on countries around the world.
This exploration of 2013 smoking and demographic data investigates the relationships among smoking prevalence, regulation, and outcomes in the United States and addresses the following questions: How do smoking trends in the United States vary by region? What demographic variables correlate most strongly with lung cancer death rates? Do high cigarette taxes correlate with lower smoking or lung cancer death rates? Ultimately, by answering these questions, this project aims to illuminate the long-term legacy of the anti-tobacco campaign in the United States.
I aggregated four data sets, two from Social Explorer and two from the CDC, to unite data on state demographics, lung cancer deaths, smoking prevalence, and tobacco taxes for 2013. After selecting the variables I wished to examine, I merged the data with shapefile data from Social Explorer and polygon data from the USAboundaries package so that I could later display relevant variables on choropleth maps of the United States. I began my analysis by exploring each variable: identifying summary statistics, the states with the five highest and five lowest values, and regional variation. After noting key trends, I analyzed the relationship between each variable and the lung cancer death rate by performing simple linear regressions, presenting correlation coefficients, and assessing the patterns of residuals.
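The merge step can be sketched in R as follows. This is a minimal illustration, not the original script: the data frame names `deaths` and `smokers` are hypothetical stand-ins for two of the four source files, and only six states are included, with values copied from the tables in this report.

```r
# Two small per-state tables. Values are copied from the tables in this report;
# the object names are hypothetical.
deaths <- data.frame(
  State = c("Kentucky", "West Virginia", "Arkansas",
            "Tennessee", "California", "Utah"),
  LC_Death_Rate = c(81.0, 77.5, 73.3, 66.5, 32.5, 15.1)
)
smokers <- data.frame(
  State = c("West Virginia", "Kentucky", "Arkansas",
            "Tennessee", "California", "Utah"),
  Current_Smokers_All = c(27.3, 26.5, 25.9, 24.3, 12.5, 10.3)
)

# Join on the shared State key; rows are matched by name, not by position.
merged <- merge(deaths, smokers, by = "State")

# Confirm that the join did not silently drop any state.
stopifnot(nrow(merged) == nrow(deaths))
print(merged)
```

Matching on the state name rather than on row order matters here because the source tables are sorted differently. Checking the row count after each merge is a cheap way to catch misspelled or missing state names.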

| Statistic | LC_Death_Rate |
|---|---|
| Min. | 15.10 |
| 1st Qu. | 45.10 |
| Median | 50.35 |
| Mean | 52.14 |
| 3rd Qu. | 60.65 |
| Max. | 81.00 |

| State | LC_Death_Rate |
|---|---|
| Kentucky | 81.0 |
| West Virginia | 77.5 |
| Maine | 77.3 |
| Arkansas | 73.3 |
| Tennessee | 66.5 |

| State | LC_Death_Rate |
|---|---|
| Texas | 35.7 |
| New Mexico | 35.3 |
| California | 32.5 |
| Colorado | 29.6 |
| Utah | 15.1 |

| Region | Avg_Death_Rate |
|---|---|
| East_South_Central | 69.75000 |
| West_South_Central | 58.12500 |
| New_England | 57.50000 |
| East_North_Central | 56.82000 |
| South_Atlantic | 56.46250 |
| West_North_Central | 52.02857 |
| Middle_Atlantic | 50.10000 |
| Pacific | 40.72000 |
| Mountain | 37.08750 |

Lung cancer death rate is defined as the number of lung cancer deaths per 100,000 people. The minimum lung cancer death rate is 15.1 in Utah, while the maximum is 81.0 in Kentucky. The mean is 52.14. Death rates are highest in the East and West South Central regions and New England, and lowest in the Pacific and Mountain regions. Apart from the three regions with the highest death rates, death rates are fairly evenly spread across the nation. Utah's very low death rate is likely associated with the influence of the Mormon church, which discourages tobacco use.
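The per-variable exploration (summary statistics, highest and lowest states, regional averages) can be sketched in base R. The example below is an assumption-laden illustration: it uses only the ten states listed in the tables above, paired with their Census divisions, so its regional means will not match the 50-state averages in the regional table. The object names (`lc`, `ranked`, `by_region`) are hypothetical.

```r
# Ten states from the highest-five and lowest-five tables, with Census divisions.
# This is a subset of the full data, used only to show the steps.
lc <- data.frame(
  State = c("Kentucky", "West Virginia", "Maine", "Arkansas", "Tennessee",
            "Texas", "New Mexico", "California", "Colorado", "Utah"),
  Region = c("East_South_Central", "South_Atlantic", "New_England",
             "West_South_Central", "East_South_Central",
             "West_South_Central", "Mountain", "Pacific",
             "Mountain", "Mountain"),
  LC_Death_Rate = c(81.0, 77.5, 77.3, 73.3, 66.5,
                    35.7, 35.3, 32.5, 29.6, 15.1)
)

# Minimum, quartiles, median, mean, and maximum.
print(summary(lc$LC_Death_Rate))

# Sort in descending order, then take rows from each end.
ranked <- lc[order(-lc$LC_Death_Rate), ]
top3 <- head(ranked, 3)
bottom3 <- tail(ranked, 3)

# Mean death rate per region, highest first.
by_region <- aggregate(LC_Death_Rate ~ Region, data = lc, FUN = mean)
by_region <- by_region[order(-by_region$LC_Death_Rate), ]
print(by_region)
```

Running the same three steps on each column of the merged state table would yield the four kinds of table shown for every variable in this report.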

| Statistic | Current_Smokers_All |
|---|---|
| Min. | 10.30 |
| 1st Qu. | 16.65 |
| Median | 19.05 |
| Mean | 19.32 |
| 3rd Qu. | 21.48 |
| Max. | 27.30 |

| State | Current_Smokers_All |
|---|---|
| West Virginia | 27.3 |
| Kentucky | 26.5 |
| Arkansas | 25.9 |
| Mississippi | 24.8 |
| Tennessee | 24.3 |

| State | Current_Smokers_All |
|---|---|
| New Jersey | 15.7 |
| Connecticut | 15.5 |
| Hawaii | 13.3 |
| California | 12.5 |
| Utah | 10.3 |

| Region | Avg_Percent_Smokers |
|---|---|
| East_South_Central | 24.27500 |
| West_South_Central | 22.25000 |
| East_North_Central | 20.68000 |
| South_Atlantic | 20.02500 |
| West_North_Central | 19.84286 |
| Middle_Atlantic | 17.76667 |
| Mountain | 17.45000 |
| New_England | 17.08333 |
| Pacific | 16.36000 |

Current smoking is measured as the percentage of the state population that currently smokes. The minimum is 10.3% in Utah, while the maximum is 27.3% in West Virginia. The mean is 19.32%. Smoking rates are lowest in the Pacific, New England, and Mountain regions and highest in the East and West South Central regions. It is important to note that Utah's very low smoking rate is likely due to the strong influence in the state of the Mormon church, which explicitly condemns the use of tobacco and alcohol.

| Statistic | Consumption |
|---|---|
| Min. | 16.60 |
| 1st Qu. | 35.52 |
| Median | 44.30 |
| Mean | 49.54 |
| 3rd Qu. | 63.42 |
| Max. | 103.10 |

| State | Consumption |
|---|---|
| West Virginia | 103.1 |
| Kentucky | 93.5 |
| New Hampshire | 89.6 |
| Missouri | 87.4 |
| Delaware | 77.1 |

| State | Consumption |
|---|---|
| Arizona | 24.4 |
| California | 23.9 |
| Utah | 21.4 |
| Washington | 19.5 |
| New York | 16.6 |

| Region | Avg_Consumption |
|---|---|
| East_South_Central | 72.05000 |
| South_Atlantic | 61.98750 |
| West_South_Central | 58.55000 |
| West_North_Central | 55.20000 |
| East_North_Central | 47.68000 |
| New_England | 46.51667 |
| Mountain | 37.73750 |
| Middle_Atlantic | 33.33333 |
| Pacific | 30.54000 |

Cigarette consumption is defined as per capita cigarette pack sales. The minimum is 16.6 packs in New York, while the maximum is 103.1 packs in West Virginia. The mean is 49.54 packs. Notably, consumption varies widely across the states, with a roughly six-fold difference between the minimum and maximum. Consumption is lowest in the Pacific and Middle Atlantic regions and highest in the East South Central and South Atlantic regions. Consumption conveys smoking prevalence differently than the percentage of smokers does, because it indicates the quantity of cigarettes purchased in each state rather than the number of smokers. This variable can therefore speak to the distribution of heavy versus casual smokers.
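One rough way to see how consumption separates heavy from casual smoking is to divide per capita pack sales by the share of smokers. The sketch below does this for the four states that appear in both the consumption and smoking tables. The `packs_per_smoker` measure is a hypothetical derived quantity, not a variable from the original analysis, and it assumes that both inputs describe the same population and that packs sold in a state are smoked there.

```r
# Values copied from the Consumption and Current_Smokers_All tables in this report.
states <- data.frame(
  State = c("West Virginia", "Kentucky", "California", "Utah"),
  Consumption = c(103.1, 93.5, 23.9, 21.4),        # packs sold per capita
  Current_Smokers_All = c(27.3, 26.5, 12.5, 10.3)  # percent who currently smoke
)

# Hypothetical derived measure: packs sold per smoker.
# Dividing by 100 converts the percentage into a proportion.
states$packs_per_smoker <-
  states$Consumption / (states$Current_Smokers_All / 100)

print(states[order(-states$packs_per_smoker), ])
```

Under these assumptions, West Virginia comes out at roughly 378 packs per smoker against about 191 in California, which suggests that high-prevalence states may also have heavier smokers on average. Sales to non-residents would inflate this figure, so it is best read as an upper-level indicator rather than a measurement.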

| Statistic | median_hs_income |
|---|---|
| Min. | 37963 |
| 1st Qu. | 46820 |
| Median | 51335 |
| Mean | 52884 |
| 3rd Qu. | 58665 |
| Max. | 72483 |

| State | median_hs_income |
|---|---|
| Maryland | 72483 |
| Alaska | 72237 |
| New Jersey | 70165 |
| Hawaii | 68020 |
| Connecticut | 67098 |

| State | median_hs_income |
|---|---|
| Kentucky | 43399 |
| Alabama | 42849 |
| West Virginia | 41253 |
| Arkansas | 40511 |
| Mississippi | 37963 |

| Region | Avg_Income |
|---|---|
| Pacific | 61820.60 |
| Middle_Atlantic | 59847.00 |
| New_England | 58925.00 |
| West_North_Central | 52425.71 |
| South_Atlantic | 52272.75 |
| Mountain | 51839.00 |
| East_North_Central | 50312.00 |
| West_South_Central | 45517.25 |
| East_South_Central | 42127.00 |

Income is measured as the median household income in each state. The minimum is $37,963 in Mississippi, while the maximum is $72,483 in Maryland. The mean of the state medians is $52,884. Median incomes are lowest in the South, namely the East and West South Central regions, and highest in the Pacific, Middle Atlantic, and New England regions. However, as the map shows, a fair amount of regional variability exists with regard to median income.

| Statistic | Total_Tax_percent |
|---|---|
| Min. | 26.50 |
| 1st Qu. | 33.05 |
| Median | 40.65 |
| Mean | 40.24 |
| 3rd Qu. | 46.27 |
| Max. | 56.90 |

| State | Total_Tax_percent |
|---|---|
| Minnesota | 56.9 |
| Rhode Island | 54.9 |
| Connecticut | 53.7 |
| New York | 53.4 |
| Massachusetts | 52.1 |
| Washington | 52.1 |

| State | Total_Tax_percent |
|---|---|
| Georgia | 30.2 |
| Louisiana | 29.6 |
| Alabama | 29.0 |
| Missouri | 26.9 |
| Virginia | 26.5 |

| Region | Avg_Tax |
|---|---|
| New_England | 50.18333 |
| Middle_Atlantic | 49.20000 |
| East_North_Central | 42.82000 |
| Pacific | 41.58000 |
| Mountain | 38.60000 |
| West_North_Central | 38.18571 |
| West_South_Central | 36.92500 |
| South_Atlantic | 35.92500 |
| East_South_Central | 32.57500 |

Tobacco tax is measured as the percentage of the retail price of a cigarette pack that represents state or federal taxes. The minimum is 26.5% in Virginia, while the maximum is 56.9% in Minnesota. The mean is 40.24%. Tobacco taxes are lowest in the South, namely the East South Central and South Atlantic regions, and highest in New England and the Middle Atlantic, with high outliers elsewhere in Washington and Minnesota.
## [1] 0.7558119
##
## Call:
## lm(formula = LC_Death_Rate ~ Current_Smokers_All, data = Complete_State_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.2047 -4.0579 0.3892 4.4979 22.6916
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.7515 6.8481 -0.256 0.799
## Current_Smokers_All 2.7901 0.3489 7.997 2.22e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.604 on 48 degrees of freedom
## Multiple R-squared: 0.5713, Adjusted R-squared: 0.5623
## F-statistic: 63.95 on 1 and 48 DF, p-value: 2.224e-10
As one would expect from medical research dating back to the 1950s, there is a strong positive correlation between the percentage of current smokers in a state and its lung cancer death rate. The correlation coefficient is 0.756. A linear regression found the positive relationship to be significant at the 0.1% level.
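The coefficients in the output above can be read as a fitted line. Writing $x$ for `Current_Smokers_All` (in percent) and $\hat{y}$ for the predicted `LC_Death_Rate` (the symbols are introduced here for illustration), the regression gives:

$$
\hat{y} = -1.7515 + 2.7901\,x
$$

Each additional percentage point of smokers is therefore associated with about 2.8 more lung cancer deaths on the death-rate scale. As a check, plugging in the mean smoking rate returns approximately the mean death rate, as it should for a least-squares line:

$$
\hat{y}(19.32) = -1.7515 + 2.7901 \times 19.32 \approx 52.15
$$

In a simple regression the squared correlation equals the reported R-squared, so smoking prevalence accounts for a little over half of the state-to-state variation in death rates:

$$
r^2 = 0.7558^2 \approx 0.571
$$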
## [1] -0.5743452
##
## Call:
## lm(formula = LC_Death_Rate ~ median_hs_income, data = Complete_State_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31.117 -4.386 1.219 6.516 20.697
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 97.6462934 9.4841309 10.296 9.67e-14 ***
## median_hs_income -0.0008605 0.0001770 -4.861 1.29e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.76 on 48 degrees of freedom
## Multiple R-squared: 0.3299, Adjusted R-squared: 0.3159
## F-statistic: 23.63 on 1 and 48 DF, p-value: 1.293e-05
There is a moderately strong negative correlation between median household income and lung cancer death rate. The correlation coefficient is -0.574. A linear regression found the negative relationship to be significant at the 0.1% level.
## [1] 0.6653948
##
## Call:
## lm(formula = LC_Death_Rate ~ Consumption, data = Complete_State_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -24.895 -5.058 1.338 6.519 25.605
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.75542 3.73062 8.244 9.44e-11 ***
## Consumption 0.43174 0.06991 6.176 1.35e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.809 on 48 degrees of freedom
## Multiple R-squared: 0.4428, Adjusted R-squared: 0.4311
## F-statistic: 38.14 on 1 and 48 DF, p-value: 1.352e-07
As one would assume, there is a strong positive correlation between cigarette consumption in a state and its lung cancer death rate. The correlation coefficient is 0.665. A linear regression found the positive relationship to be significant at the 0.1% level.
## [1] -0.1606215
##
## Call:
## lm(formula = LC_Death_Rate ~ Total_Tax_percent, data = Complete_State_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -36.327 -6.507 -1.189 8.669 27.187
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.5778 9.4360 6.632 2.7e-08 ***
## Total_Tax_percent -0.2593 0.2300 -1.127 0.265
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.97 on 48 degrees of freedom
## Multiple R-squared: 0.0258, Adjusted R-squared: 0.005503
## F-statistic: 1.271 on 1 and 48 DF, p-value: 0.2652
There is a weak negative correlation between the percent tobacco tax and the lung cancer death rate. The correlation coefficient is -0.161. A linear regression found that the negative relationship is not statistically significant (p = 0.265). The small size of the correlation calls into question the efficacy of the taxes in deterring smokers and reducing lung cancer deaths.
## [1] -0.4943496
##
## Call:
## lm(formula = Current_Smokers_All ~ Total_Tax_percent, data = Complete_State_Data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.4202 -1.6993 -0.1269 2.0510 6.3367
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.01664 2.25119 12.45 < 2e-16 ***
## Total_Tax_percent -0.21620 0.05487 -3.94 0.000263 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.094 on 48 degrees of freedom
## Multiple R-squared: 0.2444, Adjusted R-squared: 0.2286
## F-statistic: 15.52 on 1 and 48 DF, p-value: 0.0002634
There is a negative correlation between tobacco tax and the percentage of current smokers. Notably, the correlation coefficient, -0.494, is much larger in magnitude than the coefficient between tobacco tax and lung cancer death rate. A linear regression found the negative relationship to be significant at the 0.1% level.
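The significance level quoted for this regression can be checked by hand from the coefficient table. The sketch below recomputes the t statistic and two-sided p-value for the `Total_Tax_percent` slope, using the estimate, standard error, and residual degrees of freedom copied from the output above. The variable names are illustrative.

```r
# Values copied from the regression output above
# (Current_Smokers_All ~ Total_Tax_percent).
estimate  <- -0.21620
std_error <- 0.05487
df_resid  <- 48  # residual degrees of freedom reported in the output

# t statistic: the estimate expressed in units of its standard error.
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with df_resid degrees of freedom.
p_value <- 2 * pt(-abs(t_value), df = df_resid)

cat(sprintf("t = %.3f, p = %.6f\n", t_value, p_value))
```

A p-value below 0.001 is what the three-star code in the output denotes. The same two lines applied to any other coefficient row show whether it clears that threshold.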
## [1] 0.2887825
A multivariable regression on both median household income and percentage of smokers, two of the strongest predictors of lung cancer death rate, allows us to identify trends in the variation left unexplained by those variables. There is a positive correlation of 0.289 between tobacco tax and the residuals from the multivariable regression. That is, as the cigarette tax goes up, the residual for a state tends to become more positive, indicating that the state's lung cancer death rate is higher than would be predicted from median income and percentage of smokers alone. This result appears counterintuitive at first but is consistent with the earlier findings. The positive correlation indicates that, once income and percentage of smokers are accounted for, there is no longer a negative correlation between lung cancer death rates and tobacco tax. If taxes lower death rates mainly by reducing the number of smokers, then holding the percentage of smokers fixed removes that pathway, and no further negative association should remain. As suggested by the negative correlation between tobacco tax and percentage of smokers (-0.494), tobacco taxes likely contribute to lower lung cancer death rates by deterring people from smoking.
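The residual analysis can be sketched in R as below. This assumes the residuals came from an `lm` fit with both predictors, which the output above does not show. Because the full state table is not reproduced in this report, the example runs on synthetic data generated with `set.seed`; the numbers it produces are purely illustrative and say nothing about the real states. Column names mirror those in the regression output.

```r
# SYNTHETIC data for illustration only; these are not the report's values.
set.seed(1)
n <- 50
sim <- data.frame(
  median_hs_income    = rnorm(n, mean = 52000, sd = 8000),
  Current_Smokers_All = rnorm(n, mean = 19, sd = 4),
  Total_Tax_percent   = rnorm(n, mean = 40, sd = 8)
)
sim$LC_Death_Rate <- 60 - 0.0005 * sim$median_hs_income +
  2 * sim$Current_Smokers_All + rnorm(n, sd = 8)

# Step 1: regress the death rate on income and smoking prevalence together.
fit <- lm(LC_Death_Rate ~ median_hs_income + Current_Smokers_All, data = sim)

# Step 2: residual = observed minus fitted; a positive residual means the
# death rate is higher than the two predictors alone would suggest.
res <- resid(fit)

# Step 3: correlate the residuals with the variable left out of the model.
r_tax <- cor(sim$Total_Tax_percent, res)

# Least-squares residuals are uncorrelated with the predictors in the model,
# so this value is zero up to floating-point error.
r_smokers <- cor(sim$Current_Smokers_All, res)

print(c(tax = r_tax, smokers = r_smokers))
```

The last check is the reason the approach works: any association between tax and the residuals cannot be routed through income or smoking prevalence, because the fit has already removed their linear contribution.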
My investigation of 2013 smoking data suggests the existence of strong regional trends in smoking prevalence, regulations, and outcomes across the United States. The southern regions, namely the South Atlantic, East South Central, and West South Central, showed the most consistent pattern across variables. For example, the southern regions, and the East and West South Central regions in particular, have among the lowest cigarette taxes and median household incomes and the highest smoking and lung cancer death rates.
As one would expect, lung cancer death rates correlate positively and strongly with the current percentage of smokers and with cigarette consumption. These relationships are consistent with the established causal relationship between tobacco consumption and lung cancer. Meanwhile, lung cancer death rates correlate negatively with median household income, suggesting that, generally, smoking remains more prevalent in lower-income states. While there is a negative correlation between lung cancer death rate and cigarette tax, which could suggest the efficacy of such taxes, the correlation is very small and not statistically significant. Upon controlling for median income and percentage of smokers, the negative correlation reverses, resulting in a positive correlation between the regression residuals and tobacco tax. This result indicates that, independent of income and percentage of smokers, tobacco taxes do not correlate with lower lung cancer death rates.
Further exploration of the relationship between anti-tobacco regulations and adverse health outcomes should track changes in outcomes over time, paying special attention to the first few years after the implementation of new anti-tobacco regulations or changes in tobacco taxes. Moreover, one can analyze the breakdown of smoking prevalence across different demographic groups in order to gain deeper insights into the possible impacts of tobacco regulations.
Ultimately, my findings indicate that there is much more work to be done in the fight against smoking in the United States. Smoking rates remain as high as 27% in some states and lung cancer death rates as high as 81 per 100,000. Public health groups should target the regions with the highest smoking prevalence, namely the South, and try to better understand the links between certain demographic factors and smoking behaviors in those areas in order to create efficient and effective anti-smoking campaigns.