Introduction

In this blog post, I document my process of exploring the association between smoking levels and nicotine dependence, using randomly generated data.

Step 1: Generate Random Data

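Let’s create some random data for illustration purposes. Each column is drawn independently, so the simulated variables have no built-in relationship with one another; any pattern in the plots that follow reflects random noise.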

# Set a seed for reproducibility
set.seed(123)

# Generate random data
nesarc_data <- data.frame(
  nicotine_dependence = rnorm(100, mean = 50, sd = 10),
  smoking_quantity = rnorm(100, mean = 20, sd = 5),
  smoking_frequency = runif(100, min = 0, max = 1),
  gad_symptoms = rnorm(100, mean = 5, sd = 2),
  panic_symptoms = rnorm(100, mean = 3, sd = 1)
)

# Display the first few rows of the generated data
head(nesarc_data)
##   nicotine_dependence smoking_quantity smoking_frequency gad_symptoms
## 1            44.39524         16.44797         0.9860543     4.248794
## 2            47.69823         21.28442         0.1370675     3.876247
## 3            65.58708         18.76654         0.9053096     4.312166
## 4            50.70508         18.26229         0.5763018     5.180993
## 5            51.29288         15.24191         0.3954489     8.197018
## 6            67.15065         19.77486         0.4498025     4.822870
##   panic_symptoms
## 1       4.014943
## 2       1.007252
## 3       2.572721
## 4       3.116637
## 5       2.106792
## 6       3.333903
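Before moving on, a quick sanity check can confirm that the simulation behaved as intended: each column's sample mean and standard deviation should sit near the parameters passed to rnorm() and runif(). This is a sketch I'm adding for illustration, not part of the original workflow. It re-creates the Step 1 data (same seed, same call order) so it runs on its own, and sim_check is just a name I chose.

```r
# Recreate the Step 1 data (same seed and call order) so this snippet runs on its own
set.seed(123)
nesarc_data <- data.frame(
  nicotine_dependence = rnorm(100, mean = 50, sd = 10),
  smoking_quantity = rnorm(100, mean = 20, sd = 5),
  smoking_frequency = runif(100, min = 0, max = 1),
  gad_symptoms = rnorm(100, mean = 5, sd = 2),
  panic_symptoms = rnorm(100, mean = 3, sd = 1)
)

# Compare each column's sample mean and SD with the parameters used to draw it.
# For runif(0, 1) the theoretical mean is 0.5 and the SD is sqrt(1/12), about 0.29.
sim_check <- data.frame(
  sample_mean = sapply(nesarc_data, mean),
  sample_sd = sapply(nesarc_data, sd),
  target_mean = c(50, 20, 0.5, 5, 3),
  target_sd = c(10, 5, sqrt(1 / 12), 2, 1)
)
round(sim_check, 2)
```

With 100 draws per column, the sample means should typically land within a fraction of a standard deviation of their targets.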

Step 2: Identify a Specific Topic of Interest

My primary focus is understanding the relationship between smoking levels (quantity and frequency) and nicotine dependence.

Step 3: Personal Codebook

I have created a personal codebook with the variables relevant to my research question:

- Nicotine dependence (dependent variable)
- Smoking quantity
- Smoking frequency
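So far the codebook exists only as a list. One lightweight option is to keep it next to the code as a small data frame. This is a hypothetical sketch: the variable names come from nesarc_data, but the role labels and description wording are my own, not taken from an official codebook.

```r
# Hypothetical codebook: variable names match the simulated data frame;
# the "role" and "description" text is illustrative wording only
codebook <- data.frame(
  variable = c("nicotine_dependence", "smoking_quantity", "smoking_frequency"),
  role = c("dependent", "independent", "independent"),
  description = c(
    "Simulated score, drawn from Normal(mean = 50, sd = 10)",
    "Simulated amount, drawn from Normal(mean = 20, sd = 5)",
    "Simulated proportion, drawn from Uniform(0, 1)"
  ),
  stringsAsFactors = FALSE
)

# Left-align text columns so the descriptions are easier to read
print(codebook, right = FALSE)
```

Keeping the names in codebook$variable means the structure check that follows could select its columns with nesarc_data[, codebook$variable] instead of retyping them.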

# Code chunk to show the structure of the dataset
str(nesarc_data[, c("nicotine_dependence", "smoking_quantity", "smoking_frequency")])
## 'data.frame':    100 obs. of  3 variables:
##  $ nicotine_dependence: num  44.4 47.7 65.6 50.7 51.3 ...
##  $ smoking_quantity   : num  16.4 21.3 18.8 18.3 15.2 ...
##  $ smoking_frequency  : num  0.986 0.137 0.905 0.576 0.395 ...

Step 4: Identify a Second Topic

During a second review, I found another interesting topic related to mental health: the association between nicotine dependence and symptoms of anxiety.

Step 5: Expand Personal Codebook

I added the following anxiety-related variables to my personal codebook:

- Generalized Anxiety Disorder (GAD) symptoms
- Panic disorder symptoms

# Code chunk to show the structure of the expanded dataset
str(nesarc_data[, c("nicotine_dependence", "smoking_quantity", "smoking_frequency", "gad_symptoms", "panic_symptoms")])
## 'data.frame':    100 obs. of  5 variables:
##  $ nicotine_dependence: num  44.4 47.7 65.6 50.7 51.3 ...
##  $ smoking_quantity   : num  16.4 21.3 18.8 18.3 15.2 ...
##  $ smoking_frequency  : num  0.986 0.137 0.905 0.576 0.395 ...
##  $ gad_symptoms       : num  4.25 3.88 4.31 5.18 8.2 ...
##  $ panic_symptoms     : num  4.01 1.01 2.57 3.12 2.11 ...
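To take a first numeric look at the second topic, the sketch below computes Pearson correlations between nicotine dependence and both anxiety measures, then runs cor.test() on one pair. It re-creates the Step 1 data so it runs on its own; anxiety_vars and anxiety_cor are names I chose. Because every column was drawn independently, there is no true association to find, so whatever estimates come out reflect sampling noise and serve as a baseline rather than a finding.

```r
# Recreate the Step 1 data (same seed and call order) so this snippet runs on its own
set.seed(123)
nesarc_data <- data.frame(
  nicotine_dependence = rnorm(100, mean = 50, sd = 10),
  smoking_quantity = rnorm(100, mean = 20, sd = 5),
  smoking_frequency = runif(100, min = 0, max = 1),
  gad_symptoms = rnorm(100, mean = 5, sd = 2),
  panic_symptoms = rnorm(100, mean = 3, sd = 1)
)

# Pearson correlation of each anxiety measure with nicotine dependence
# (returns a 2 x 1 matrix, one row per anxiety variable)
anxiety_vars <- c("gad_symptoms", "panic_symptoms")
anxiety_cor <- cor(nesarc_data[, anxiety_vars], nesarc_data$nicotine_dependence)
colnames(anxiety_cor) <- "r_with_nicotine_dependence"
round(anxiety_cor, 3)

# Formal test for one pair; with independently simulated columns,
# any non-zero estimate here is sampling noise rather than a real effect
cor.test(nesarc_data$gad_symptoms, nesarc_data$nicotine_dependence)
```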

Step 6: Visualizing Smoking Levels

Let’s create a histogram to visualize the distribution of smoking quantity, one of the two smoking-level measures.

# Code chunk to create a histogram of smoking quantity
hist(nesarc_data$smoking_quantity, main = "Distribution of Smoking Quantity", xlab = "Smoking Quantity", col = "skyblue", border = "black")
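The histogram does not show its exact bin edges or counts. Calling hist() with plot = FALSE returns them without drawing, which is handy for checking what the chart displays. A minimal sketch, where h and bins are illustrative names and the data are re-created from Step 1 so the snippet is self-contained:

```r
# Recreate the Step 1 data (same seed and call order) so this snippet runs on its own
set.seed(123)
nesarc_data <- data.frame(
  nicotine_dependence = rnorm(100, mean = 50, sd = 10),
  smoking_quantity = rnorm(100, mean = 20, sd = 5),
  smoking_frequency = runif(100, min = 0, max = 1),
  gad_symptoms = rnorm(100, mean = 5, sd = 2),
  panic_symptoms = rnorm(100, mean = 3, sd = 1)
)

# With plot = FALSE, hist() computes the bins but draws nothing
h <- hist(nesarc_data$smoking_quantity, plot = FALSE)

# One row per bin. By default bins are right-closed, (lower, upper],
# and the first bin also includes its lower edge
bins <- data.frame(
  lower = head(h$breaks, -1),
  upper = tail(h$breaks, -1),
  count = h$counts
)
bins
```

Because the breaks come from the same default rule the plotted histogram uses, the counts in bins should match the bar heights in the chart above.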

Step 7: Visualizing Relationships

Next, let’s create scatter plots of nicotine dependence against each smoking measure.

# Code chunk to create a scatter plot between smoking quantity and nicotine dependence
plot(nesarc_data$smoking_quantity, nesarc_data$nicotine_dependence, main = "Scatter Plot: Smoking Quantity vs Nicotine Dependence", xlab = "Smoking Quantity", ylab = "Nicotine Dependence", col = "darkgreen")

# Code chunk to create a scatter plot between smoking frequency and nicotine dependence
plot(nesarc_data$smoking_frequency, nesarc_data$nicotine_dependence, main = "Scatter Plot: Smoking Frequency vs Nicotine Dependence", xlab = "Smoking Frequency", ylab = "Nicotine Dependence", col = "darkblue")
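Scatter plots are easier to read alongside a summary number and a fitted line. As an illustrative sketch (not part of the original workflow), the code below computes the correlation of nicotine dependence with each smoking measure and overlays a least-squares line from lm() on the smoking-quantity plot. The names predictors, r_values, and fit_quantity are my own. Given how the data were generated, there is no built-in relationship, so whatever slope the line shows is sampling noise.

```r
# Recreate the Step 1 data (same seed and call order) so this snippet runs on its own
set.seed(123)
nesarc_data <- data.frame(
  nicotine_dependence = rnorm(100, mean = 50, sd = 10),
  smoking_quantity = rnorm(100, mean = 20, sd = 5),
  smoking_frequency = runif(100, min = 0, max = 1),
  gad_symptoms = rnorm(100, mean = 5, sd = 2),
  panic_symptoms = rnorm(100, mean = 3, sd = 1)
)

# Pearson correlation of nicotine dependence with each smoking measure
predictors <- c("smoking_quantity", "smoking_frequency")
r_values <- sapply(predictors, function(v) {
  cor(nesarc_data[[v]], nesarc_data$nicotine_dependence)
})
round(r_values, 3)

# Simple linear regression, then redraw the quantity scatter plot with the fitted line
fit_quantity <- lm(nicotine_dependence ~ smoking_quantity, data = nesarc_data)
plot(nicotine_dependence ~ smoking_quantity, data = nesarc_data,
     main = "Smoking Quantity vs Nicotine Dependence",
     xlab = "Smoking Quantity", ylab = "Nicotine Dependence", col = "darkgreen")
abline(fit_quantity, col = "red", lwd = 2)

# Intercept and slope estimates with standard errors and p-values
summary(fit_quantity)$coefficients
```

The same two lines of lm() and abline() work for the frequency plot by swapping in smoking_frequency as the predictor.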