This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
# This analysis examines how various continuous and categorical variables influence current alcohol use and alcohol dependence across states
# Load the data; the column specification and str() output below list its variables
library(readr)
Alcohol_policy <- read_csv("~/Documents/Harshitha_analysis/Alcohol_policy.csv")
## Rows: 31 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): state
## dbl (16): SaleTimings, MinLegalDrinkingAge, HealthIndex, PercentageBPL, Drin...
## num (1): PercapitaIncome
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Set variable types: the state name and the policy indicators are categorical, the remaining columns numeric
Alcohol_policy$state <- as.factor(Alcohol_policy$state)
Alcohol_policy$SaleTimings <- as.numeric(Alcohol_policy$SaleTimings)
Alcohol_policy$MinLegalDrinkingAge <- as.numeric(Alcohol_policy$MinLegalDrinkingAge)
Alcohol_policy$PercapitaIncome <- as.numeric(Alcohol_policy$PercapitaIncome)
Alcohol_policy$HealthIndex <- as.numeric(Alcohol_policy$HealthIndex)
Alcohol_policy$PercentageBPL <- as.numeric(Alcohol_policy$PercentageBPL)
Alcohol_policy$DrinkDrive <- as.numeric(Alcohol_policy$DrinkDrive)
Alcohol_policy$SDI2017 <- as.numeric(Alcohol_policy$SDI2017)
Alcohol_policy$PercentageIPV <- as.numeric(Alcohol_policy$PercentageIPV)
Alcohol_policy$banofsalepublicplaces <- as.factor(Alcohol_policy$banofsalepublicplaces)
Alcohol_policy$minimumsaleprice <- as.factor(Alcohol_policy$minimumsaleprice)
Alcohol_policy$policycontroldensity <- as.factor(Alcohol_policy$policycontroldensity)
Alcohol_policy$quotaforretailsale <- as.factor(Alcohol_policy$quotaforretailsale)
Alcohol_policy$warninquality <- as.factor(Alcohol_policy$warninquality)
Alcohol_policy$distributionsystem <- as.factor(Alcohol_policy$distributionsystem)
Alcohol_policy$pointofsale <- as.factor(Alcohol_policy$pointofsale)
Alcohol_policy$CurrentAlcoholUse <- as.numeric(Alcohol_policy$CurrentAlcoholUse)
Alcohol_policy$AlcoholDependence <- as.numeric(Alcohol_policy$AlcoholDependence)
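The column specification above shows that every column except `state` was already read as a double, so the `as.numeric()` calls leave those columns unchanged; only the factor conversions alter the data. Below is a minimal base-R sketch of doing all the factor conversions in one step. It runs on a small hypothetical data frame `policy` (not the real data) and assumes the categorical column names used in this notebook. Applied to `Alcohol_policy`, the same `lapply()` line would stand in for the individual `as.factor()` calls.

```r
# Hypothetical three-row stand-in for Alcohol_policy (illustrative values only)
policy <- data.frame(
  state            = c("State A", "State B", "State C"),
  HealthIndex      = c(45.4, 65.1, 46.1),
  minimumsaleprice = c(1, 1, 2),
  pointofsale      = c(2, 2, 1)
)

# Categorical columns used in this notebook; intersect() keeps only those present
factor_cols <- intersect(
  c("state", "banofsalepublicplaces", "minimumsaleprice", "policycontroldensity",
    "quotaforretailsale", "warninquality", "distributionsystem", "pointofsale"),
  names(policy)
)

# Convert all categorical columns at once; numeric columns are left as read
policy[factor_cols] <- lapply(policy[factor_cols], as.factor)
str(policy)
```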
str(Alcohol_policy)
## spc_tbl_ [31 × 18] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ state : Factor w/ 31 levels "Andaman and Nic",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ SaleTimings : num [1:31] 12 9 12 11 0 14 9 12 12 12 ...
## $ MinLegalDrinkingAge : num [1:31] 18 21 21 21 21 25 21 21 18 25 ...
## $ PercapitaIncome : num [1:31] 218649 168480 169742 86801 46292 ...
## $ HealthIndex : num [1:31] 45.4 65.1 46.1 48.5 32.1 ...
## $ PercentageBPL : num [1:31] 1 12.3 34.7 32 33.7 ...
## $ DrinkDrive : num [1:31] 20 1345 55 377 10 ...
## $ SDI2017 : num [1:31] 0.65 0.54 0.56 0.53 0.43 0.65 0.51 0.65 0.65 0.72 ...
## $ PercentageIPV : num [1:31] 20 45 35 27 45 23 38 36 29 30 ...
## $ banofsalepublicplaces: Factor w/ 2 levels "1","2": 1 1 1 1 1 2 1 1 1 1 ...
## $ minimumsaleprice : Factor w/ 2 levels "1","2": 1 1 2 2 1 1 1 1 1 2 ...
## $ policycontroldensity : Factor w/ 3 levels "0","1","2": 2 2 2 3 3 3 3 2 2 3 ...
## $ quotaforretailsale : Factor w/ 2 levels "1","2": 1 1 1 1 1 2 2 1 1 2 ...
## $ warninquality : Factor w/ 2 levels "1","2": 2 1 2 2 1 1 2 1 1 1 ...
## $ distributionsystem : Factor w/ 4 levels "1","2","3","4": 4 1 2 2 1 4 1 4 4 4 ...
## $ pointofsale : Factor w/ 3 levels "1","2","3": 2 2 2 2 1 2 2 2 2 2 ...
## $ CurrentAlcoholUse : num [1:31] 25.4 13.7 28 8.8 0.9 17.5 35.6 11.6 18.3 21.3 ...
## $ AlcoholDependence : num [1:31] 7.1 13.7 10.2 1.3 0.15 1.1 6.2 0.5 3.3 2.4 ...
## - attr(*, "spec")=
## .. cols(
## .. state = col_character(),
## .. SaleTimings = col_double(),
## .. MinLegalDrinkingAge = col_double(),
## .. PercapitaIncome = col_number(),
## .. HealthIndex = col_double(),
## .. PercentageBPL = col_double(),
## .. DrinkDrive = col_double(),
## .. SDI2017 = col_double(),
## .. PercentageIPV = col_double(),
## .. banofsalepublicplaces = col_double(),
## .. minimumsaleprice = col_double(),
## .. policycontroldensity = col_double(),
## .. quotaforretailsale = col_double(),
## .. warninquality = col_double(),
## .. distributionsystem = col_double(),
## .. pointofsale = col_double(),
## .. CurrentAlcoholUse = col_double(),
## .. AlcoholDependence = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
summary(Alcohol_policy)
## state SaleTimings MinLegalDrinkingAge PercapitaIncome
## Andaman and Nic: 1 Min. : 0.00 Min. :18.00 Min. : 46292
## Andra pradesh : 1 1st Qu.:10.00 1st Qu.:21.00 1st Qu.:107360
## Arunachal Prade: 1 Median :12.00 Median :21.00 Median :190407
## Assam : 1 Mean :11.52 Mean :21.06 Mean :192662
## Bihar : 1 3rd Qu.:14.00 3rd Qu.:21.00 3rd Qu.:228250
## Chandigarh : 1 Max. :15.00 Max. :25.00 Max. :435959
## (Other) :25
## HealthIndex PercentageBPL DrinkDrive SDI2017
## Min. :28.61 Min. : 0.71 Min. : 1.0 Min. :0.4300
## 1st Qu.:43.30 1st Qu.: 9.80 1st Qu.: 20.0 1st Qu.:0.5350
## Median :51.90 Median :14.88 Median : 139.0 Median :0.5900
## Mean :50.67 Mean :18.79 Mean : 378.7 Mean :0.5848
## 3rd Qu.:59.70 3rd Qu.:31.80 3rd Qu.: 355.0 3rd Qu.:0.6400
## Max. :74.01 Max. :39.90 Max. :3595.0 Max. :0.7400
##
## PercentageIPV banofsalepublicplaces minimumsaleprice policycontroldensity
## Min. : 3.50 1:30 1:22 0: 1
## 1st Qu.:22.00 2: 1 2: 9 1:20
## Median :30.00 2:10
## Mean :28.92
## 3rd Qu.:36.00
## Max. :46.00
##
## quotaforretailsale warninquality distributionsystem pointofsale
## 1:26 1:22 1:10 1: 5
## 2: 5 2: 9 2: 4 2:22
## 3: 3 3: 4
## 4:14
##
##
##
## CurrentAlcoholUse AlcoholDependence
## Min. : 0.90 Min. : 0.150
## 1st Qu.: 8.85 1st Qu.: 1.000
## Median :16.40 Median : 2.400
## Mean :15.83 Mean : 3.776
## 3rd Qu.:21.45 3rd Qu.: 4.500
## Max. :35.60 Max. :14.200
##
# Only one variable is available for licensing of places of consumption and licensing of hours
# This leaves 8 continuous and 7 categorical predictors, plus the two continuous outcomes
Multiple linear regression generalizes simple linear regression: it relates one outcome variable to several predictor variables through a function that is linear in its parameters.
Multiple linear regression assesses the relationship between an outcome and one predictor while accounting for the other predictors in the model. Each coefficient estimates the expected change in the outcome for a one-unit change in its predictor with the other predictors held constant, which separates that relationship from the effects of the other variables as far as a linear model allows. This adjustment is the main difference from simple linear regression. Before fitting the models, the scatter plot below shows alcohol dependence against SDI, with IPV shown as colour and BPL as point size.
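To see what adjusting for other variables means in practice, the sketch below simulates hypothetical data (not the alcohol data) in which a second predictor `x2` is correlated with `x1` and both affect `y`. A simple regression of `y` on `x1` alone absorbs part of `x2`'s effect, whereas the multiple regression recovers coefficients close to the values used to generate the data. The variable names, sample size and coefficients are arbitrary illustrative choices.

```r
# Hypothetical simulated data, not the Alcohol_policy data set
set.seed(42)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)             # x2 is correlated with x1
y  <- 1 + 2 * x1 - 3 * x2 + rnorm(n, sd = 0.5)  # true effects: x1 = 2, x2 = -3
sim <- data.frame(y, x1, x2)

simple_fit   <- lm(y ~ x1, data = sim)       # ignores x2
multiple_fit <- lm(y ~ x1 + x2, data = sim)  # adjusts for x2

# The simple slope mixes in x2's effect and should land near 2 + (-3 * 0.8) = -0.4;
# the multiple-regression slope for x1 should be near the true value of 2
coef(simple_fit)[["x1"]]
coef(multiple_fit)[["x1"]]
```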
library(ggplot2)
ggplot(Alcohol_policy) +
aes(x = SDI2017, y = AlcoholDependence, colour = PercentageIPV, size = PercentageBPL) +
geom_point() +
scale_color_gradient() +
labs(
y = "Alcohol Dependence",
x = "SDI (2017)",
color = "IPV (%)",
size = "BPL (%)"
) +
theme_minimal()
We fit a multiple linear regression for alcohol dependence below.
Dependence <- lm(Alcohol_policy$AlcoholDependence ~ Alcohol_policy$SaleTimings + Alcohol_policy$MinLegalDrinkingAge + Alcohol_policy$PercapitaIncome + Alcohol_policy$PercapitaIncome + Alcohol_policy$HealthIndex +Alcohol_policy$PercentageBPL + Alcohol_policy$banofsalepublicplaces +Alcohol_policy$DrinkDrive +Alcohol_policy$SDI2017+ Alcohol_policy$PercentageIPV +Alcohol_policy$minimumsaleprice + Alcohol_policy$policycontroldensity +Alcohol_policy$quotaforretailsale +Alcohol_policy$warninquality +Alcohol_policy$distributionsystem +Alcohol_policy$pointofsale,
data = Alcohol_policy
)
summary(Dependence)
##
## Call:
## lm(formula = Alcohol_policy$AlcoholDependence ~ Alcohol_policy$SaleTimings +
## Alcohol_policy$MinLegalDrinkingAge + Alcohol_policy$PercapitaIncome +
## Alcohol_policy$PercapitaIncome + Alcohol_policy$HealthIndex +
## Alcohol_policy$PercentageBPL + Alcohol_policy$banofsalepublicplaces +
## Alcohol_policy$DrinkDrive + Alcohol_policy$SDI2017 + Alcohol_policy$PercentageIPV +
## Alcohol_policy$minimumsaleprice + Alcohol_policy$policycontroldensity +
## Alcohol_policy$quotaforretailsale + Alcohol_policy$warninquality +
## Alcohol_policy$distributionsystem + Alcohol_policy$pointofsale,
## data = Alcohol_policy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8866 -0.9813 0.0000 1.3511 4.2172
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.843e+01 1.471e+01 1.253 0.23622
## Alcohol_policy$SaleTimings 5.482e-02 3.175e-01 0.173 0.86606
## Alcohol_policy$MinLegalDrinkingAge 2.901e-01 4.404e-01 0.659 0.52357
## Alcohol_policy$PercapitaIncome 2.602e-05 1.521e-05 1.711 0.11513
## Alcohol_policy$HealthIndex 2.874e-02 1.179e-01 0.244 0.81192
## Alcohol_policy$PercentageBPL -2.288e-01 1.031e-01 -2.219 0.04846 *
## Alcohol_policy$banofsalepublicplaces2 -1.218e+00 4.375e+00 -0.279 0.78579
## Alcohol_policy$DrinkDrive -1.650e-03 1.224e-03 -1.348 0.20472
## Alcohol_policy$SDI2017 -8.340e+01 3.016e+01 -2.765 0.01838 *
## Alcohol_policy$PercentageIPV 2.761e-01 8.364e-02 3.301 0.00706 **
## Alcohol_policy$minimumsaleprice2 8.471e-01 1.736e+00 0.488 0.63526
## Alcohol_policy$policycontroldensity1 1.321e+01 4.678e+00 2.823 0.01657 *
## Alcohol_policy$policycontroldensity2 5.123e+00 4.519e+00 1.134 0.28107
## Alcohol_policy$quotaforretailsale2 6.443e+00 2.873e+00 2.242 0.04650 *
## Alcohol_policy$warninquality2 6.538e-01 1.976e+00 0.331 0.74696
## Alcohol_policy$distributionsystem2 3.225e+00 3.114e+00 1.036 0.32261
## Alcohol_policy$distributionsystem3 1.666e+00 3.667e+00 0.454 0.65839
## Alcohol_policy$distributionsystem4 1.351e+00 2.803e+00 0.482 0.63916
## Alcohol_policy$pointofsale2 5.726e+00 2.301e+00 2.489 0.03010 *
## Alcohol_policy$pointofsale3 7.635e+00 4.025e+00 1.897 0.08437 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.4 on 11 degrees of freedom
## Multiple R-squared: 0.7408, Adjusted R-squared: 0.293
## F-statistic: 1.654 on 19 and 11 DF, p-value: 0.1973
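The summary reports a multiple R² of 0.7408 but an adjusted R² of only 0.293, and an overall F-test p-value of 0.1973. Both follow from standard formulas once the number of states (n = 31) and the number of estimated predictor coefficients (p = 19, counting each non-reference factor level separately, hence 11 residual degrees of freedom) are plugged in. The base-R sketch below reproduces them from the printed R² values, which shows how heavily 19 coefficients estimated from 31 observations are penalized. The helper names `adjusted_r2` and `overall_f` are hypothetical.

```r
# Standard formulas for adjusted R-squared and the overall F-statistic,
# with n observations and p predictor coefficients (intercept excluded)
adjusted_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)
overall_f   <- function(r2, n, p) (r2 / p) / ((1 - r2) / (n - p - 1))

n <- 31  # states in the data
p <- 19  # non-intercept coefficients in each model

adjusted_r2(0.7408, n, p)   # dependence model: about 0.293
f_stat <- overall_f(0.7408, n, p)
f_stat                      # about 1.65
pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)  # about 0.197

adjusted_r2(0.751, n, p)    # current-use model: about 0.321
```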
We fit a multiple linear regression for current alcohol use below, using the same predictors.
Use_current <- lm(Alcohol_policy$CurrentAlcoholUse ~ Alcohol_policy$SaleTimings + Alcohol_policy$MinLegalDrinkingAge + Alcohol_policy$PercapitaIncome + Alcohol_policy$PercapitaIncome + Alcohol_policy$HealthIndex +Alcohol_policy$PercentageBPL + Alcohol_policy$banofsalepublicplaces +Alcohol_policy$DrinkDrive +Alcohol_policy$SDI2017+ Alcohol_policy$PercentageIPV +Alcohol_policy$minimumsaleprice + Alcohol_policy$policycontroldensity +Alcohol_policy$quotaforretailsale +Alcohol_policy$warninquality +Alcohol_policy$distributionsystem +Alcohol_policy$pointofsale,
data = Alcohol_policy
)
summary(Use_current)
##
## Call:
## lm(formula = Alcohol_policy$CurrentAlcoholUse ~ Alcohol_policy$SaleTimings +
## Alcohol_policy$MinLegalDrinkingAge + Alcohol_policy$PercapitaIncome +
## Alcohol_policy$PercapitaIncome + Alcohol_policy$HealthIndex +
## Alcohol_policy$PercentageBPL + Alcohol_policy$banofsalepublicplaces +
## Alcohol_policy$DrinkDrive + Alcohol_policy$SDI2017 + Alcohol_policy$PercentageIPV +
## Alcohol_policy$minimumsaleprice + Alcohol_policy$policycontroldensity +
## Alcohol_policy$quotaforretailsale + Alcohol_policy$warninquality +
## Alcohol_policy$distributionsystem + Alcohol_policy$pointofsale,
## data = Alcohol_policy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.781 -2.307 0.000 2.076 9.445
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.952e+01 3.307e+01 1.497 0.1624
## Alcohol_policy$SaleTimings 1.058e+00 7.137e-01 1.483 0.1661
## Alcohol_policy$MinLegalDrinkingAge 2.546e+00 9.899e-01 2.572 0.0260 *
## Alcohol_policy$PercapitaIncome 4.286e-05 3.419e-05 1.254 0.2359
## Alcohol_policy$HealthIndex -7.900e-01 2.650e-01 -2.981 0.0125 *
## Alcohol_policy$PercentageBPL -6.643e-01 2.318e-01 -2.867 0.0153 *
## Alcohol_policy$banofsalepublicplaces2 -7.656e+00 9.833e+00 -0.779 0.4526
## Alcohol_policy$DrinkDrive -3.405e-03 2.752e-03 -1.238 0.2417
## Alcohol_policy$SDI2017 -1.402e+02 6.779e+01 -2.069 0.0629 .
## Alcohol_policy$PercentageIPV 2.357e-01 1.880e-01 1.254 0.2360
## Alcohol_policy$minimumsaleprice2 1.392e-01 3.903e+00 0.036 0.9722
## Alcohol_policy$policycontroldensity1 7.951e+00 1.051e+01 0.756 0.4654
## Alcohol_policy$policycontroldensity2 -4.580e+00 1.016e+01 -0.451 0.6608
## Alcohol_policy$quotaforretailsale2 1.870e+01 6.458e+00 2.895 0.0146 *
## Alcohol_policy$warninquality2 1.055e+01 4.441e+00 2.377 0.0367 *
## Alcohol_policy$distributionsystem2 -6.972e-01 7.000e+00 -0.100 0.9225
## Alcohol_policy$distributionsystem3 5.155e+00 8.241e+00 0.626 0.5444
## Alcohol_policy$distributionsystem4 -6.522e+00 6.299e+00 -1.035 0.3228
## Alcohol_policy$pointofsale2 1.698e+01 5.171e+00 3.283 0.0073 **
## Alcohol_policy$pointofsale3 1.701e+01 9.046e+00 1.880 0.0868 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.642 on 11 degrees of freedom
## Multiple R-squared: 0.751, Adjusted R-squared: 0.3209
## F-statistic: 1.746 on 19 and 11 DF, p-value: 0.1724
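Both model calls above list `PercapitaIncome` twice (`lm()` keeps a single copy of a repeated term, so the estimates are unaffected) and prefix every variable with `Alcohol_policy$` even though `data = Alcohol_policy` is supplied. A sketch of an alternative, assuming the column names shown by `str()` above: list the predictors once and build both formulas with base R's `reformulate()`. Bare column names also give shorter coefficient labels such as `HealthIndex`, and they are what `predict()` needs in order to use a `newdata` argument. The object names `predictors`, `dependence_formula` and `current_use_formula` are hypothetical.

```r
# Predictor names as they appear in Alcohol_policy; each is listed once,
# so a term cannot be repeated by accident
predictors <- c(
  "SaleTimings", "MinLegalDrinkingAge", "PercapitaIncome", "HealthIndex",
  "PercentageBPL", "banofsalepublicplaces", "DrinkDrive", "SDI2017",
  "PercentageIPV", "minimumsaleprice", "policycontroldensity",
  "quotaforretailsale", "warninquality", "distributionsystem", "pointofsale"
)

# Build "response ~ predictor1 + predictor2 + ..." for each outcome
dependence_formula  <- reformulate(predictors, response = "AlcoholDependence")
current_use_formula <- reformulate(predictors, response = "CurrentAlcoholUse")

# With the data loaded as above, the models would then be fitted as:
# Dependence  <- lm(dependence_formula,  data = Alcohol_policy)
# Use_current <- lm(current_use_formula, data = Alcohol_policy)
dependence_formula
```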
Based on the current-use model above:
1. There is a significant positive association between the minimum legal drinking age and current alcohol use (estimate 2.546, p = 0.026). This model is to be plotted below; the other predictors are still to be described.
2. There is a significant negative association between the health index and current alcohol use (p = 0.0125): states with a higher health index have a lower prevalence of alcohol use. Holding the other predictors constant, each one-unit increase in the health index is associated with a decrease of 0.79 units in the prevalence of current alcohol use (the output prints this estimate in scientific notation as -7.900e-01, i.e. -7.900 × 10^-1).
The remaining predictors are still to be interpreted for both current use and dependence.
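Below is a short sketch of reading the health index coefficient: it converts the scientific notation, scales the effect to a 10-point difference, and computes a 95% confidence interval. The estimate, standard error and residual degrees of freedom are copied from the current-use summary above. The interval uses the t-based formula that `confint()` applies to `lm` fits; calling `confint(Use_current)` on the fitted model should give the same interval for that row. The variable names are hypothetical.

```r
# Values copied from the current-use model summary above
estimate  <- -7.900e-01  # HealthIndex coefficient: -7.900 x 10^-1 = -0.79
std_error <-  2.650e-01
df_resid  <- 11          # residual degrees of freedom

# Expected difference in current alcohol use for a 10-point higher health index,
# holding the other predictors constant
10 * estimate            # -7.9

# 95% confidence interval for the coefficient
t_crit <- qt(0.975, df = df_resid)
ci <- c(lower = estimate - t_crit * std_error, upper = estimate + t_crit * std_error)
ci
```

The interval excludes zero, consistent with p = 0.0125, but it is wide (roughly -1.37 to -0.21), as expected with only 11 residual degrees of freedom.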
library(visreg)
library(ggstatsplot)
## You can cite this package as:
## Patil, I. (2021). Visualizations with statistical details: The 'ggstatsplot' approach.
## Journal of Open Source Software, 6(61), 3167, doi:10.21105/joss.03167