1. Introduction

The link between cigarette smoking and lung cancer was first discovered by a British scientist named Sir Richard Doll in 1957 (BBC). In the five and a half decades since the announcement, public health organizations and government agencies have undertaken measures to counter the tobacco industry and decrease smoking rates. In 1964, the Surgeon General of the United States released the first official report warning of the health risks of smoking (CDC). Shortly after, in 1965, Congress enacted the Federal Cigarette Labeling and Advertising Act, which required printed health warnings on cigarette packs; a ban on broadcast cigarette advertising followed under the Public Health Cigarette Smoking Act of 1969 (CDC).

The anti-tobacco campaign of the mid to late 20th century is widely considered one of the most successful public health campaigns in American history: nearly half of living adults who ever smoked have quit (CDC). Even so, according to the CDC, smoking remains the number one cause of preventable death and disease in the United States, causing roughly 480,000 deaths per year. According to estimates from the NIH, the United States spends $13.4 billion annually on health care costs associated with lung cancer. Smoking continues to place immense social, economic, and medical burdens on the United States and countries around the world.

This exploration of 2013 smoking and demographic data investigates the relationships between smoking prevalence, regulation, and outcomes in the United States and addresses the following questions: How do smoking trends in the United States vary by region? What demographic variables correlate most strongly with lung cancer death rates? Do high cigarette taxes correlate with lower smoking or lung cancer death rates? Ultimately, by answering these questions, this project aims to illuminate the long-term legacy of the anti-tobacco campaign in the United States.

2. Methodology

I aggregated four data sets, two from Social Explorer and two from the CDC, in order to unite data on state demographics, lung cancer deaths, smoking prevalence, and tobacco taxes for 2013. After selecting the variables I wished to examine, I merged the data with shapefile data from Social Explorer and polygon data from the USAboundaries package in order to later represent relevant variables on choropleth maps of the United States. I began my analysis with an exploration of each variable, identifying summary statistics, the states with the top five and bottom five values, and the regional variation of each variable. After noting key trends, I analyzed the relationship between each variable and lung cancer death rates by performing simple linear regressions, presenting correlation coefficients, and assessing the patterns of residuals.
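The merge and ranking steps described above can be sketched in base R. This is only a minimal illustration, not the code used for the report: the two small data frames hold a handful of values copied from the tables in Section 3, and the object names `deaths`, `smokers`, `combined`, and `ranked` are hypothetical.

```r
# Excerpts of two state-level tables, using values reported in Section 3.
# (Hypothetical object names; the full data sets are not reproduced here.)
deaths <- data.frame(
  State = c("Kentucky", "West Virginia", "Tennessee", "California", "Utah"),
  LC_Death_Rate = c(81.0, 77.5, 66.5, 32.5, 15.1)
)
smokers <- data.frame(
  State = c("Utah", "California", "Tennessee", "Kentucky", "West Virginia"),
  Current_Smokers_All = c(10.3, 12.5, 24.3, 26.5, 27.3)
)

# Join the two tables on the shared state name.
combined <- merge(deaths, smokers, by = "State")

# Summary statistics for one variable (for these five states only).
print(summary(combined$LC_Death_Rate))

# Rank states from highest to lowest death rate, then inspect both ends.
ranked <- combined[order(-combined$LC_Death_Rate), ]
print(head(ranked, 2))  # highest values
print(tail(ranked, 2))  # lowest values
```

Because `merge` matches rows by the `State` key, the two tables do not need to list states in the same order.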

3. Data Exploration


Regions



Lung Cancer Death Rate

Summary of LC_Death_Rate
Min.      1st Qu.   Median    Mean      3rd Qu.   Max.
15.10     45.10     50.35     52.14     60.65     81.00

Five highest states
State             LC_Death_Rate
Kentucky          81.0
West Virginia     77.5
Maine             77.3
Arkansas          73.3
Tennessee         66.5

Five lowest states
State             LC_Death_Rate
Texas             35.7
New Mexico        35.3
California        32.5
Colorado          29.6
Utah              15.1

Regional averages
Region                Avg_Death_Rate
East_South_Central    69.75000
West_South_Central    58.12500
New_England           57.50000
East_North_Central    56.82000
South_Atlantic        56.46250
West_North_Central    52.02857
Middle_Atlantic       50.10000
Pacific               40.72000
Mountain              37.08750

Lung cancer death rate is reported as the number of lung cancer deaths per 100,000 people. The minimum lung cancer death rate is 15.1 in Utah, while the maximum is 81.0 in Kentucky. The mean lung cancer death rate is 52.14. Death rates are highest in the East and West South Central regions and New England, and lowest in the Pacific and Mountain regions. Apart from East South Central at the top and the Pacific and Mountain regions at the bottom, regional averages fall within a fairly narrow band of roughly 50 to 58. Utah's very low death rate is likely associated with the influence of the Mormon church.

Percent of Current Smokers

Summary of Current_Smokers_All
Min.      1st Qu.   Median    Mean      3rd Qu.   Max.
10.30     16.65     19.05     19.32     21.48     27.30

Five highest states
State             Current_Smokers_All
West Virginia     27.3
Kentucky          26.5
Arkansas          25.9
Mississippi       24.8
Tennessee         24.3

Five lowest states
State             Current_Smokers_All
New Jersey        15.7
Connecticut       15.5
Hawaii            13.3
California        12.5
Utah              10.3

Regional averages
Region                Avg_Percent_Smokers
East_South_Central    24.27500
West_South_Central    22.25000
East_North_Central    20.68000
South_Atlantic        20.02500
West_North_Central    19.84286
Middle_Atlantic       17.76667
Mountain              17.45000
New_England           17.08333
Pacific               16.36000

Current smokers is assessed as the percentage of the state population that currently smokes. The minimum percentage of current smokers is 10.3% in Utah, while the maximum is 27.3% in West Virginia. The mean percentage of smokers is 19.32%. Percentages of smokers are lowest in the Pacific, New England, and Mountain regions and highest in the East and West South Central regions. It is important to note that Utah's very low smoking rate is likely due to the strong influence in the state of the Mormon church, which explicitly condemns the use of tobacco and alcohol.

Cigarette Consumption

Summary of Consumption
Min.      1st Qu.   Median    Mean      3rd Qu.   Max.
16.60     35.52     44.30     49.54     63.42     103.10

Five highest states
State             Consumption
West Virginia     103.1
Kentucky          93.5
New Hampshire     89.6
Missouri          87.4
Delaware          77.1

Five lowest states
State             Consumption
Arizona           24.4
California        23.9
Utah              21.4
Washington        19.5
New York          16.6

Regional averages
Region                Avg_Consumption
East_South_Central    72.05000
South_Atlantic        61.98750
West_South_Central    58.55000
West_North_Central    55.20000
East_North_Central    47.68000
New_England           46.51667
Mountain              37.73750
Middle_Atlantic       33.33333
Pacific               30.54000

Cigarette consumption is defined as per capita cigarette pack sales. The minimum cigarette consumption is 16.6 in New York, while the maximum is 103.1 in West Virginia. The mean cigarette consumption is 49.54. Notably, cigarette consumption displays a wide spread across the states, with a roughly 6-fold difference between the minimum and maximum. Consumption is lowest in the Pacific and Middle Atlantic regions and highest in the East South Central and South Atlantic regions. Consumption conveys smoking prevalence differently than the percentage of smokers, as it indicates the number of cigarettes sold in each state rather than the number of smokers. Therefore, this variable can speak to the distribution of heavy versus casual smokers.

Median Household Income

Summary of median_hs_income
Min.      1st Qu.   Median    Mean      3rd Qu.   Max.
37963     46820     51335     52884     58665     72483

Five highest states
State             median_hs_income
Maryland          72483
Alaska            72237
New Jersey        70165
Hawaii            68020
Connecticut       67098

Five lowest states
State             median_hs_income
Kentucky          43399
Alabama           42849
West Virginia     41253
Arkansas          40511
Mississippi       37963

Regional averages
Region                Avg_Income
Pacific               61820.60
Middle_Atlantic       59847.00
New_England           58925.00
West_North_Central    52425.71
South_Atlantic        52272.75
Mountain              51839.00
East_North_Central    50312.00
West_South_Central    45517.25
East_South_Central    42127.00

Income is assessed as the median household income per state. The minimum median household income is 37,963 in Mississippi, while the maximum is 72,483 in Maryland. The mean of the state median household incomes is 52,884. Median incomes are lowest in the South, namely the East and West South Central regions, and highest in the Pacific, Middle Atlantic, and New England regions. However, as the map shows, a fair amount of variability exists within regions with regard to median income.

Tobacco Tax

Summary of Total_Tax_percent
Min.      1st Qu.   Median    Mean      3rd Qu.   Max.
26.50     33.05     40.65     40.24     46.27     56.90

Highest states (two states tie for fifth)
State             Total_Tax_percent
Minnesota         56.9
Rhode Island      54.9
Connecticut       53.7
New York          53.4
Massachusetts     52.1
Washington        52.1

Five lowest states
State             Total_Tax_percent
Georgia           30.2
Louisiana         29.6
Alabama           29.0
Missouri          26.9
Virginia          26.5

Regional averages
Region                Avg_Tax
New_England           50.18333
Middle_Atlantic       49.20000
East_North_Central    42.82000
Pacific               41.58000
Mountain              38.60000
West_North_Central    38.18571
West_South_Central    36.92500
South_Atlantic        35.92500
East_South_Central    32.57500

Tobacco tax is assessed as the percentage of the retail price of a cigarette pack that represents state and federal taxes. The minimum tax percentage is 26.5% in Virginia, while the maximum is 56.9% in Minnesota. The mean tax percentage is 40.24%. Tobacco taxes are lowest in the South, namely the East South Central and South Atlantic regions, and highest in New England, with high outliers in Washington and Minnesota.

4. Analysis of Relationships

Lung Cancer Death Rate and Current Smokers

## [1] 0.7558119
## 
## Call:
## lm(formula = LC_Death_Rate ~ Current_Smokers_All, data = Complete_State_Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -24.2047  -4.0579   0.3892   4.4979  22.6916 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -1.7515     6.8481  -0.256    0.799    
## Current_Smokers_All   2.7901     0.3489   7.997 2.22e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.604 on 48 degrees of freedom
## Multiple R-squared:  0.5713, Adjusted R-squared:  0.5623 
## F-statistic: 63.95 on 1 and 48 DF,  p-value: 2.224e-10

As one would expect from medical research dating back to the 1950s, there is a strong positive correlation between the percentage of current smokers in a state and the lung cancer death rate. The correlation coefficient is 0.756. A linear regression found the positive relationship to be significant at the 0.1% level.
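As a quick consistency check, the pieces of the regression output above can be tied together by hand. The sketch below only reuses numbers printed in the output and in Section 3; the variable names are my own and are not part of the original analysis.

```r
# Values copied from the correlation and regression output above.
r         <- 0.7558119   # correlation coefficient
intercept <- -1.7515     # estimated intercept
slope     <- 2.7901      # estimated slope for Current_Smokers_All
slope_se  <- 0.3489      # standard error of the slope

# In a one-predictor regression, R-squared equals the squared correlation.
r_squared <- r^2                      # about 0.571, matching 0.5713

# The t value is the estimate divided by its standard error.
t_value <- slope / slope_se           # about 8.0, matching 7.997

# Fitted death rate for a state at the mean smoking percentage (19.32).
# A least-squares line passes through the point of means, so this should
# land close to the mean death rate of 52.14 (up to rounding).
fitted_at_mean <- intercept + slope * 19.32

print(c(r_squared = r_squared, t_value = t_value,
        fitted_at_mean = fitted_at_mean))
```

Read this way, the slope says that each additional percentage point of current smokers is associated with roughly 2.8 more lung cancer deaths per 100,000.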

Lung Cancer Death Rate and Median Household Income

## [1] -0.5743452
## 
## Call:
## lm(formula = LC_Death_Rate ~ median_hs_income, data = Complete_State_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -31.117  -4.386   1.219   6.516  20.697 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      97.6462934  9.4841309  10.296 9.67e-14 ***
## median_hs_income -0.0008605  0.0001770  -4.861 1.29e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.76 on 48 degrees of freedom
## Multiple R-squared:  0.3299, Adjusted R-squared:  0.3159 
## F-statistic: 23.63 on 1 and 48 DF,  p-value: 1.293e-05

There is a moderately strong negative correlation between median household income and lung cancer death rate. The correlation coefficient is -0.574. A linear regression found the negative relationship to be significant at the 0.1% level.

Lung Cancer Death Rate and Cigarette Consumption

## [1] 0.6653948
## 
## Call:
## lm(formula = LC_Death_Rate ~ Consumption, data = Complete_State_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -24.895  -5.058   1.338   6.519  25.605 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.75542    3.73062   8.244 9.44e-11 ***
## Consumption  0.43174    0.06991   6.176 1.35e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.809 on 48 degrees of freedom
## Multiple R-squared:  0.4428, Adjusted R-squared:  0.4311 
## F-statistic: 38.14 on 1 and 48 DF,  p-value: 1.352e-07

As one would assume, there is a strong positive correlation between cigarette consumption in a state and the lung cancer death rate. The correlation coefficient is 0.665. A linear regression found the positive relationship to be significant at the 0.1% level.

Lung Cancer Death Rate and Tobacco Tax

## [1] -0.1606215
## 
## Call:
## lm(formula = LC_Death_Rate ~ Total_Tax_percent, data = Complete_State_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -36.327  -6.507  -1.189   8.669  27.187 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        62.5778     9.4360   6.632  2.7e-08 ***
## Total_Tax_percent  -0.2593     0.2300  -1.127    0.265    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.97 on 48 degrees of freedom
## Multiple R-squared:  0.0258, Adjusted R-squared:  0.005503 
## F-statistic: 1.271 on 1 and 48 DF,  p-value: 0.2652

There is a small negative correlation between the percent tobacco tax and lung cancer death rate. The correlation coefficient is -0.161. A linear regression found that this relationship is not statistically significant (p = 0.265), and tobacco tax alone explains only about 2.6% of the variation in death rates. The weakness of the relationship calls into question the efficacy of the taxes in deterring smokers and reducing lung cancer deaths.
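The p-value for the tax coefficient can be reproduced from the estimate and standard error in the output above. This is a minimal sketch with my own variable names, and the 5% threshold is a conventional choice that I am assuming rather than one stated in the analysis.

```r
# Values copied from the Total_Tax_percent row of the output above.
estimate <- -0.2593
std_err  <- 0.2300
df       <- 48      # residual degrees of freedom

# t statistic and two-sided p-value from the t distribution.
t_value <- estimate / std_err
p_value <- 2 * pt(-abs(t_value), df)

# Assumed conventional significance threshold.
alpha <- 0.05
significant <- p_value < alpha

cat("t =", round(t_value, 3),
    " p =", round(p_value, 3),
    " significant at 5%:", significant, "\n")
```

The p-value is well above the threshold, so the data do not let us distinguish this slope from zero.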

Percentage of Current Smokers and Tobacco Tax

## [1] -0.4943496
## 
## Call:
## lm(formula = Current_Smokers_All ~ Total_Tax_percent, data = Complete_State_Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.4202 -1.6993 -0.1269  2.0510  6.3367 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       28.01664    2.25119   12.45  < 2e-16 ***
## Total_Tax_percent -0.21620    0.05487   -3.94 0.000263 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.094 on 48 degrees of freedom
## Multiple R-squared:  0.2444, Adjusted R-squared:  0.2286 
## F-statistic: 15.52 on 1 and 48 DF,  p-value: 0.0002634

There is a moderate negative correlation between tobacco tax and the percentage of current smokers. Notably, the correlation coefficient, -0.494, is much larger in magnitude than the coefficient between tobacco tax and lung cancer death rate. A linear regression found the negative relationship to be significant at the 0.1% level.

A Closer Look at Lung Cancer and Tobacco Tax

## [1] 0.2887825

A multivariable regression controlling for both median household income and percentage of smokers, two of the strongest predictor variables for lung cancer death rate, allows us to identify trends in the variation left unexplained by those variables. There is a positive correlation of 0.289 between tobacco tax and the residuals from the multivariable regression. That is, as cigarette tax goes up, the residual associated with a state tends to become more positive, indicating that the state's lung cancer death rate is higher than would be predicted by median income and percentage of smokers alone. This result appears counterintuitive at first, but it is consistent with the earlier findings. The positive correlation indicates that, independent of income and percentage of smokers, there is no longer a negative correlation between lung cancer death rates and tobacco tax. In other words, any benefit of tobacco taxes appears to operate through smoking prevalence rather than apart from it: as evidenced by the negative correlation between tobacco tax and percentage of smokers, tobacco taxes likely contribute to lower lung cancer death rates by deterring people from smoking.
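The residual approach described above can be sketched as follows. Because the full state data set is not reproduced in this report, the example runs on simulated, purely illustrative data: every number in the `toy` data frame is made up, the column names are hypothetical, and the output says nothing about the real states.

```r
# Simulated, purely illustrative data: NOT the data set used in this report.
set.seed(1)
n       <- 50
tax     <- runif(n, 25, 60)                    # hypothetical tax percentages
income  <- rnorm(n, mean = 52000, sd = 8000)   # hypothetical median incomes
smokers <- 28 - 0.2 * tax + rnorm(n, sd = 3)   # tax lowers smoking prevalence
death   <- 20 + 2.5 * smokers - 0.0003 * income + rnorm(n, sd = 8)
toy     <- data.frame(death, smokers, income, tax)

# Step 1: regress the outcome on the two control variables.
fit <- lm(death ~ smokers + income, data = toy)

# Step 2: correlate what the controls leave unexplained with tobacco tax.
res <- resid(fit)
print(cor(res, toy$tax))

# Sanity check: least-squares residuals are uncorrelated with the
# predictors that were included in the model.
print(cor(res, toy$smokers))
```

In this simulation the tax variable affects the death rate only through smoking prevalence, which is the mechanism the paragraph above proposes for the real data.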

5. Conclusions

My investigation of 2013 smoking data suggests the existence of strong regional trends in smoking prevalence, regulations, and outcomes across the United States. The southern regions, namely South Atlantic, East South Central, and West South Central, showed the most consistent regional patterns across variables. For example, the southern regions have the lowest cigarette taxes and median household incomes and the highest smoking and lung cancer death rates.

As one would expect, lung cancer death rates correlate positively and strongly with the current percentage of smokers and with cigarette consumption. These relationships are consistent with the established causal relationship between tobacco consumption and lung cancer. Meanwhile, lung cancer death rates correlate negatively with median household income, suggesting that, generally, smoking remains more prevalent in lower income states. While there is a negative correlation between lung cancer death rate and cigarette tax, which could suggest the efficacy of such taxes, the correlation is very small and not statistically significant. Upon controlling for median income and percentage of smokers, the negative correlation reverses, resulting in a positive correlation between the regression residuals and tobacco tax. This result indicates that, independent of income and percentage of smokers, tobacco taxes do not correlate with lower lung cancer death rates.

Further exploration into the relationship between anti-tobacco regulations and adverse health outcomes should track the changes in outcomes over time, paying special attention to the few years after the implementation of new anti-tobacco regulations or changes in tobacco taxes. Moreover, one can analyze the breakdown of smoking prevalence across different demographic groups in order to gain deeper insights into the possible impacts of tobacco regulations.

Ultimately, my findings indicate that there is much more work to be done in the fight against smoking in the United States. Smoking rates remain as high as 27% in some states and lung cancer death rates as high as 81 per 100,000. Public health groups should target the regions with the highest smoking prevalence, namely the South, and try to better understand the links between certain demographic factors and smoking behaviors in those areas in order to create efficient and effective anti-smoking campaigns.