library(purrr)
data_path <- "C:/Users/shanata/Downloads/smoking_driking_dataset_Ver01.csv"
Here I have created a new column for hypertension. If a patient has Systolic Blood pressure greater than 140 or Diastolic pressure greater than 90 he/she is considered to be hypertensive.
data$Hypertension[data$SBP > 140] <- "YES"
data$Hypertension[data$SBP < 140] <- "NO"
data$Hypertension[data$DBP > 90] <-"YES"
data$Hypertension[data$SBP < 90] <- "NO"
After creating the hypertensive column, I will create separate columns for hypotensive patients.If a patient has Systolic Blood pressure less than 90 or Diastolic pressure less than 60 he/she is considered to be hypotensive.
data$Hypotension[data$SBP <90 ] <- "YES"
data$Hypotension[data$SBP > 90] <- "NO"
data$Hypotension[data$DBP < 60] <-"YES"
data$Hypotension[data$DBP > 60] <- "NO"
Now we have separate columns for hypertensive and hypotensive patients. Now let’s find out, how many people with drinking habits have hyper and hypo tension.
summary(data$DRK_YN)
## Length Class Mode
## 991346 character character
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Here, I have grouped people based on their drinking habits and their blood pressure levels. I have stripped the null values before computing.
f1 <- data %>%
group_by(DRK_YN) %>%
summarize(Hypertension = sum(Hypertension == "YES", na.rm = TRUE),
#Normal_sytolic = sum(Hypertension == "NO", na.rm = TRUE),
#Normal_diastolic=sum(Hypotension=="NO",na.rm=TRUE),
Hypotension = sum(Hypotension == "YES", na.rm = TRUE)
)
f1
## # A tibble: 2 × 3
## DRK_YN Hypertension Hypotension
## <chr> <int> <int>
## 1 N 49005 15537
## 2 Y 50988 10242
count_yes <- sum(data$DRK_YN == "Y")
count_no <- sum(data$DRK_YN == "N")
# Display the counts
cat("Count of YES:", count_yes, "\n")
## Count of YES: 495488
cat("Count of NO:", count_no, "\n")
## Count of NO: 495858
Now, I have the total number of alcoholic drinkers and non-alcoholic drinkers, I also have the Hypertensive and Hypotensive people among the drinkers and non drinkers. Now, it is time to calculate the expected probability of each case.
total_drinkers <- 495488
total_non_drinkers <- 495858
total_hypertensive_non_drinkers <- 49005
total_hypotensive_non_drinkers <- 15537
total_hypertensive_drinkers <- 50988
total_hypotensive_drinkers <- 10242
# Calculate the probabilities
p_drinker <- total_drinkers / (total_drinkers + total_non_drinkers)
p_hypertensive_given_drinker <- total_hypertensive_drinkers / total_drinkers
p_hypotensive_given_drinker <- total_hypotensive_drinkers / total_drinkers
p_non_drinker <- total_non_drinkers / (total_drinkers + total_non_drinkers)
p_hypertensive_given_non_drinker <- total_hypertensive_non_drinkers / total_non_drinkers
p_hypotensive_given_non_drinker <- total_hypotensive_non_drinkers / total_non_drinkers
# Calculate the expected probability
expected_probability_dh <- p_drinker * p_hypertensive_given_drinker
expected_probability_di <- p_drinker * p_hypotensive_given_drinker
expected_probability_dj <- p_non_drinker * p_hypertensive_given_non_drinker
expected_probability_dk <- p_non_drinker * p_hypotensive_given_non_drinker
# Display the result
cat("Expected Probability of Drinkers who are Hypertensive:", expected_probability_dh, "\n")
## Expected Probability of Drinkers who are Hypertensive: 0.0514331
cat("Expected Probability of Drinkers who are Hypotensive:", expected_probability_di, "\n")
## Expected Probability of Drinkers who are Hypotensive: 0.01033141
cat("Expected Probability of Non Drinkers who are Hypertensive:", expected_probability_dj, "\n")
## Expected Probability of Non Drinkers who are Hypertensive: 0.04943279
cat("Expected Probability of NonDrinkers who are Hypotensive:", expected_probability_dk, "\n")
## Expected Probability of NonDrinkers who are Hypotensive: 0.01567263
The probability of being a drinker and hypertensive is equal to the probability of being a drinker and hypotensive.
The probability of being a drinker and hypertensive is not equal to the probability of being a drinker and hypotensive.
To test the hypothesis, I have used chi-squared test.
contingency_table <- matrix(c(total_hypertensive_drinkers,total_hypertensive_non_drinkers,total_hypotensive_drinkers,total_hypotensive_non_drinkers), nrow = 2)
# Perform a chi-squared test
chi_squared_test_result <- chisq.test(contingency_table)
# Display the test result
print(chi_squared_test_result)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: contingency_table
## X-squared = 1040, df = 1, p-value < 2.2e-16
library(ggplot2)
v1 <- data.frame(
Drinking_Status = c("Drinkers", "Drinkers", "Non Drinkers", "Non Drinkers"),
Tension_Status = c("Hypertensive", "Hypotensive", "Hypertensive", "Hypotensive"),
Expected_Probability = c(0.0514331, 0.01033141, 0.04943279, 0.01567263)
)
ggplot(v1, aes(x = Drinking_Status, y = Expected_Probability, fill = Tension_Status)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.7) +
labs(
title = "Expected Probabilities of Tension by Drinking Status",
x = "Drinking Status",
y = "Expected Probability"
)
Are there any other confounding variables or interactions that may affect the relationship between drinking status and hypertension/hypotension ??
Grouping data based on smokers and non-smokers. Here 1 indicate people who don’t smoke, 2 indicate people who used to smoke but quit smoking now and 3 indicate people who are active smokers.
f <- data %>%
group_by(SMK_stat_type_cd) %>%
summarise(Hypertension = sum(Hypertension == "YES", na.rm = TRUE),
#Normal_sytolic = sum(Hypertension == "NO", na.rm = TRUE),
#Normal_diastolic=sum(Hypotension=="NO",na.rm=TRUE),
Hypotension = sum(Hypotension == "YES", na.rm = TRUE)
)
f
## # A tibble: 3 × 3
## SMK_stat_type_cd Hypertension Hypotension
## <dbl> <int> <int>
## 1 1 56357 19900
## 2 2 22044 2541
## 3 3 21592 3338
Count of people in each category :
count_1 <- sum(data$SMK_stat_type_cd == 1)
count_2 <- sum(data$SMK_stat_type_cd == 2)
count_3 <- sum(data$SMK_stat_type_cd == 3)
# Display the counts
cat("Count of people who don't smoke:", count_1, "\n")
## Count of people who don't smoke: 602441
cat("Count of people who used to smoke but quit:", count_2, "\n")
## Count of people who used to smoke but quit: 174951
cat("Count of people who used to still smoke:", count_3, "\n")
## Count of people who used to still smoke: 213954
total_non_smokers <- 602441
#total_smokers_but_quit <- 174951
total_smokers<- 213954
total_hypertensive_non_smokers <- 56357
total_hypotensive_non_smokers <- 19900
#total_hypertensive_used_to_smoke <- 22044
#total_hypotensive_used_to_smoke <- 2541
total_hypertensive_smokers <- 21592
total_hypotensive_smokers <- 3338
# Calculate the probabilities
p_smoker <- total_smokers/ (total_non_smokers + total_smokers )
p_hypertensive_given_smoker <- total_hypertensive_smokers / total_smokers
p_hypotensive_given_smoker <- total_hypotensive_smokers / total_smokers
p_non_smokers <- total_non_smokers / (total_smokers + total_non_smokers)
p_hypertensive_given_non_smoker <- total_hypertensive_non_smokers / total_non_smokers
p_hypotensive_given_non_smoker <- total_hypotensive_non_smokers / total_non_smokers
# Calculate the expected probability for drinkers who are hypersensitive
expected_probability_dh <- p_smoker * p_hypertensive_given_smoker
expected_probability_di <- p_smoker * p_hypotensive_given_smoker
expected_probability_dj <- p_non_smokers * p_hypertensive_given_non_smoker
expected_probability_dk <- p_non_smokers * p_hypotensive_given_non_smoker
# Display the result
cat("Expected Probability of Smokers who are Hypertensive:", expected_probability_dh, "\n")
## Expected Probability of Smokers who are Hypertensive: 0.02644798
cat("Expected Probability of Smokers who are Hypotensive:", expected_probability_di, "\n")
## Expected Probability of Smokers who are Hypotensive: 0.004088707
cat("Expected Probability of Non Smokers who are Hypertensive:", expected_probability_dj, "\n")
## Expected Probability of Non Smokers who are Hypertensive: 0.06903153
cat("Expected Probability of Non Smokers who are Hypotensive:", expected_probability_dk, "\n")
## Expected Probability of Non Smokers who are Hypotensive: 0.02437546
The probability of being a smoker and hypertensive is equal to the probability of being a smoker and hypotensive.
The probability of being a smoker and hypertensive is not equal to the probability of being a smoker and hypotensive.
contingency_table <- matrix(c(21592,56357,3338,19900), nrow = 2)
# Perform a chi-squared test
chi_squared_test_result <- chisq.test(contingency_table)
# Display the test result
print(chi_squared_test_result)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: contingency_table
## X-squared = 1713.9, df = 1, p-value < 2.2e-16
v2 <- data.frame(
Smoking_Status = c("Smokers", "Smokers", "Non Smokers", "Non Smokers"),
Tension_Status = c("Hypertensive", "Hypotensive", "Hypertensive", "Hypotensive"),
Expected_Probability = c(0.02644798, 0.004088707 , 0.06903153, 0.02437546)
)
ggplot(v2, aes(x = Smoking_Status, y = Expected_Probability, fill = Tension_Status)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.7) +
labs(
title = "Expected Probabilities of Tension by Smoking Status",
x = "Smoking Status",
y = "Expected Probability"
)
### Group-by dataframe 3
f3 <- data %>%
group_by(sex) %>%
summarise(Hypertension = sum(Hypertension == "YES", na.rm = TRUE),
#Normal_sytolic = sum(Hypertension == "NO", na.rm = TRUE),
#Normal_diastolic=sum(Hypotension=="NO",na.rm=TRUE),
Hypotension = sum(Hypotension == "YES", na.rm = TRUE)
)
f3
## # A tibble: 2 × 3
## sex Hypertension Hypotension
## <chr> <int> <int>
## 1 Female 39746 18804
## 2 Male 60247 6975
count_1 <- sum(data$sex=="Female")
count_2 <- sum(data$sex=="Male")
# Display the counts
cat("Count of Female:", count_1, "\n")
## Count of Female: 464931
cat("Count of Male:", count_2, "\n")
## Count of Male: 526415
total_female <- 464931
#total_smokers_but_quit <- 174951
total_male<- 526415
total_hypertensive_male <- 60247
total_hypotensive_male<- 6975
#total_hypertensive_used_to_smoke <- 22044
#total_hypotensive_used_to_smoke <- 2541
total_hypertensive_female <- 39746
total_hypotensive_female <- 18804
# Calculate the probabilities
p_female <- total_female/ (total_female + total_male )
p_hypertensive_given_female <- total_hypertensive_female / total_female
p_hypotensive_given_female <- total_hypotensive_female / total_female
p_male <- total_male / (total_male + total_female)
p_hypertensive_given_male <- total_hypertensive_male / total_male
p_hypotensive_given_male <- total_hypotensive_male / total_male
# Calculate the expected probability for drinkers who are hypersensitive
expected_probability_dh <- p_male * p_hypertensive_given_male
expected_probability_di <- p_male * p_hypotensive_given_male
expected_probability_dj <- p_female * p_hypertensive_given_female
expected_probability_dk <- p_female * p_hypotensive_given_female
# Display the result
cat("Expected Probability of Male who are Hypertensive:", expected_probability_dh, "\n")
## Expected Probability of Male who are Hypertensive: 0.06077293
cat("Expected Probability of Male who are Hypotensive:", expected_probability_di, "\n")
## Expected Probability of Male who are Hypotensive: 0.007035889
cat("Expected Probability of Female who are Hypertensive:", expected_probability_dj, "\n")
## Expected Probability of Female who are Hypertensive: 0.04009296
cat("Expected Probability of Female who are Hypotensive:", expected_probability_dk, "\n")
## Expected Probability of Female who are Hypotensive: 0.01896815
The probability of being a male and hypertensive is equal to the probability of being a male and hypotensive.
The probability of being a male and hypertensive is not equal to the probability of being a male and hypotensive.
contingency_table <- matrix(c(60247,39746,6975,18804), nrow = 2)
# Perform a chi-squared test
chi_squared_test_result <- chisq.test(contingency_table)
# Display the test result
print(chi_squared_test_result)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: contingency_table
## X-squared = 9075, df = 1, p-value < 2.2e-16
Males seem to have a higher expected probability of both hypertension and hypotension compared to females.
v3 <- data.frame(
Gender = c("Male", "Male", "Female", "Female"),
Tension_Status = c("Hypertensive", "Hypotensive", "Hypertensive", "Hypotensive"),
Expected_Probability = c(0.06077293 ,0.007035889,0.04009296,0.01896815)
)
ggplot(v3, aes(x = Gender, y = Expected_Probability, fill = Tension_Status)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.9), width = 0.7) +
labs(
title = "Expected Probabilities of Tension by Gender",
x = "Gender",
y = "Expected Probability"
)
cross_table <- table(data$sex, data$DRK_YN, data$Hypertension,data$Hypotension,data$SMK_stat_type_cd)
cross_table_df <- as.data.frame(cross_table)
cross_table_sum <- rowSums(cross_table)
cross_table_normalized <- scale(cross_table_sum)
cross_table_df
## Var1 Var2 Var3 Var4 Var5 Freq
## 1 Female N NO NO 1 262236
## 2 Male N NO NO 1 64507
## 3 Female Y NO NO 1 109298
## 4 Male Y NO NO 1 75682
## 5 Female N YES NO 1 30686
## 6 Male N YES NO 1 8195
## 7 Female Y YES NO 1 7207
## 8 Male Y YES NO 1 10172
## 9 Female N NO YES 1 11645
## 10 Male N NO YES 1 1301
## 11 Female Y NO YES 1 5700
## 12 Male Y NO YES 1 1148
## 13 Female N YES YES 1 70
## 14 Male N YES YES 1 17
## 15 Female Y YES YES 1 5
## 16 Male Y YES YES 1 5
## 17 Female N NO NO 2 3762
## 18 Male N NO NO 2 42140
## 19 Female Y NO NO 2 5728
## 20 Male Y NO NO 2 94889
## 21 Female N YES NO 2 321
## 22 Male N YES NO 2 5848
## 23 Female Y YES NO 2 362
## 24 Male Y YES NO 2 15496
## 25 Female N NO YES 2 252
## 26 Male N NO YES 2 866
## 27 Female Y NO YES 2 307
## 28 Male Y NO YES 2 1097
## 29 Female N YES YES 2 0
## 30 Male N YES YES 2 9
## 31 Female Y YES YES 2 0
## 32 Male Y YES YES 2 8
## 33 Female N NO NO 3 5237
## 34 Male N NO NO 3 40947
## 35 Female Y NO NO 3 8795
## 36 Male Y NO NO 3 130116
## 37 Female N YES NO 3 395
## 38 Male N YES NO 3 3458
## 39 Female Y YES NO 3 699
## 40 Male Y YES NO 3 17029
## 41 Female N NO YES 3 386
## 42 Male N NO YES 3 974
## 43 Female Y NO YES 3 430
## 44 Male Y NO YES 3 1536
## 45 Female N YES YES 3 0
## 46 Male N YES YES 3 6
## 47 Female Y YES YES 3 1
## 48 Male Y YES YES 3 4
I have created a Cross-table with Sex, Drinking State, Smoking State, Hypertension and Hypotension as variables. We can note that,
This may be due to :
Rare occurrences: Females who don’t drink and smoke but have both hypertension and hypo tension is a rare occurrence.
The data which I have used may be not large enough to cover all cases.
The data I have used, have no female who quit smoking.
1)Females: Drinking, smoking, Hypertension, Hypotension 2)Male : Drinking, smoking, Hypertension , Hypotension 3)Females: Drinking, no smoking, Hypertensive, Hypotensive