In this blog post, I document how I explored the association between smoking levels and nicotine dependence, using randomly generated data.
Let’s start by creating some random data for illustration. The data frame is named nesarc_data, but every value in it is simulated.
# Set a seed for reproducibility
set.seed(123)
# Generate random data
nesarc_data <- data.frame(
  nicotine_dependence = rnorm(100, mean = 50, sd = 10),
  smoking_quantity = rnorm(100, mean = 20, sd = 5),
  smoking_frequency = runif(100, min = 0, max = 1),
  gad_symptoms = rnorm(100, mean = 5, sd = 2),
  panic_symptoms = rnorm(100, mean = 3, sd = 1)
)
# Display the first few rows of the generated data
head(nesarc_data)
## nicotine_dependence smoking_quantity smoking_frequency gad_symptoms
## 1 44.39524 16.44797 0.9860543 4.248794
## 2 47.69823 21.28442 0.1370675 3.876247
## 3 65.58708 18.76654 0.9053096 4.312166
## 4 50.70508 18.26229 0.5763018 5.180993
## 5 51.29288 15.24191 0.3954489 8.197018
## 6 67.15065 19.77486 0.4498025 4.822870
## panic_symptoms
## 1 4.014943
## 2 1.007252
## 3 2.572721
## 4 3.116637
## 5 2.106792
## 6 3.333903
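Before moving on, it can help to confirm that the simulated columns look the way the generating code says they should. The following is a minimal sketch, not part of the original analysis; the name sim_check is hypothetical and introduced only for this illustration.

```r
# Recreate the simulated data exactly as generated above
set.seed(123)
nesarc_data <- data.frame(
  nicotine_dependence = rnorm(100, mean = 50, sd = 10),
  smoking_quantity = rnorm(100, mean = 20, sd = 5),
  smoking_frequency = runif(100, min = 0, max = 1),
  gad_symptoms = rnorm(100, mean = 5, sd = 2),
  panic_symptoms = rnorm(100, mean = 3, sd = 1)
)

# Sample mean and standard deviation of every column
# (sim_check is a hypothetical helper name)
sim_check <- data.frame(
  mean = sapply(nesarc_data, mean),
  sd = sapply(nesarc_data, sd)
)
print(round(sim_check, 2))

# smoking_frequency is drawn with runif(), so it must stay within [0, 1]
print(range(nesarc_data$smoking_frequency))
```

With only 100 draws per column, the sample means and standard deviations should land close to, but not exactly on, the values passed to rnorm().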
Step 2: Identify a Specific Topic of Interest
My primary focus is on understanding the relationship between smoking levels and nicotine dependence.
Step 3: Personal Codebook
I have created a personal codebook with the variables relevant to my research question. Here are the selected variables:
- Nicotine dependence (dependent variable)
- Smoking quantity
- Smoking frequency
# Show the structure of the three codebook variables
str(nesarc_data[, c("nicotine_dependence", "smoking_quantity", "smoking_frequency")])
## 'data.frame': 100 obs. of 3 variables:
## $ nicotine_dependence: num 44.4 47.7 65.6 50.7 51.3 ...
## $ smoking_quantity : num 16.4 21.3 18.8 18.3 15.2 ...
## $ smoking_frequency : num 0.986 0.137 0.905 0.576 0.395 ...
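One way to keep the codebook next to the data is to store it as a small data frame. This is only a sketch: the name codebook, the column names, and the label "explanatory" for the two smoking variables are assumptions made for illustration, while the simulated_as entries are copied from the data-generation code above.

```r
# A hypothetical in-code version of the personal codebook.
# Column names and the "explanatory" label are assumptions for illustration.
codebook <- data.frame(
  variable = c("nicotine_dependence", "smoking_quantity", "smoking_frequency"),
  role = c("dependent", "explanatory", "explanatory"),
  simulated_as = c(
    "rnorm(100, mean = 50, sd = 10)",
    "rnorm(100, mean = 20, sd = 5)",
    "runif(100, min = 0, max = 1)"
  ),
  stringsAsFactors = FALSE
)
print(codebook)
```

The variable column can then be used to subset the data, for example nesarc_data[, codebook$variable].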
Step 4: Identify a Second Topic
On a second review, I found another topic of interest, this one related to mental health: the association between nicotine dependence and symptoms of anxiety.
Step 5: Expand Personal Codebook
I’ve added variables related to symptoms of anxiety to my personal codebook:
- Generalized Anxiety Disorder (GAD) symptoms
- Panic disorder symptoms
# Code chunk to show the structure of the expanded dataset
str(nesarc_data[, c("nicotine_dependence", "smoking_quantity", "smoking_frequency", "gad_symptoms", "panic_symptoms")])
## 'data.frame': 100 obs. of 5 variables:
## $ nicotine_dependence: num 44.4 47.7 65.6 50.7 51.3 ...
## $ smoking_quantity : num 16.4 21.3 18.8 18.3 15.2 ...
## $ smoking_frequency : num 0.986 0.137 0.905 0.576 0.395 ...
## $ gad_symptoms : num 4.25 3.88 4.31 5.18 8.2 ...
## $ panic_symptoms : num 4.01 1.01 2.57 3.12 2.11 ...
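A first numeric look at the second topic could be a correlation matrix between nicotine dependence and the two anxiety variables. This is a sketch rather than part of the original workflow; the names anxiety_vars and cor_matrix are hypothetical. Because every column here is drawn independently, the off-diagonal correlations are expected to be close to zero.

```r
# Recreate the simulated data exactly as generated above
set.seed(123)
nesarc_data <- data.frame(
  nicotine_dependence = rnorm(100, mean = 50, sd = 10),
  smoking_quantity = rnorm(100, mean = 20, sd = 5),
  smoking_frequency = runif(100, min = 0, max = 1),
  gad_symptoms = rnorm(100, mean = 5, sd = 2),
  panic_symptoms = rnorm(100, mean = 3, sd = 1)
)

# Pearson correlations between nicotine dependence and the anxiety variables
# (anxiety_vars and cor_matrix are hypothetical names)
anxiety_vars <- c("nicotine_dependence", "gad_symptoms", "panic_symptoms")
cor_matrix <- cor(nesarc_data[, anxiety_vars])
print(round(cor_matrix, 2))
```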
Step 6: Visualizing Smoking Levels
Let’s create a histogram to visualize the distribution of smoking quantity.
# Histogram of smoking quantity
hist(
  nesarc_data$smoking_quantity,
  main = "Distribution of Smoking Quantity",
  xlab = "Smoking Quantity",
  col = "skyblue",
  border = "black"
)
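The histogram can also be read as a table. The sketch below calls hist() with plot = FALSE, which returns the bin breaks and counts without drawing anything; the names h and bins are hypothetical and used only for this illustration.

```r
# Recreate the simulated data exactly as generated above
set.seed(123)
nesarc_data <- data.frame(
  nicotine_dependence = rnorm(100, mean = 50, sd = 10),
  smoking_quantity = rnorm(100, mean = 20, sd = 5),
  smoking_frequency = runif(100, min = 0, max = 1),
  gad_symptoms = rnorm(100, mean = 5, sd = 2),
  panic_symptoms = rnorm(100, mean = 3, sd = 1)
)

# Compute the histogram bins without plotting
h <- hist(nesarc_data$smoking_quantity, plot = FALSE)

# One row per bin: lower edge, upper edge, number of observations
bins <- data.frame(
  lower = head(h$breaks, -1),
  upper = tail(h$breaks, -1),
  count = h$counts
)
print(bins)
```

The counts should add up to the 100 simulated observations, with most of them falling near the generating mean of 20.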
Step 7: Visualizing Relationships
Now, let’s create scatter plots to explore relationships between variables.
# Scatter plot of smoking quantity against nicotine dependence
plot(
  nesarc_data$smoking_quantity, nesarc_data$nicotine_dependence,
  main = "Scatter Plot: Smoking Quantity vs Nicotine Dependence",
  xlab = "Smoking Quantity", ylab = "Nicotine Dependence",
  col = "darkgreen"
)
# Scatter plot of smoking frequency against nicotine dependence
plot(
  nesarc_data$smoking_frequency, nesarc_data$nicotine_dependence,
  main = "Scatter Plot: Smoking Frequency vs Nicotine Dependence",
  xlab = "Smoking Frequency", ylab = "Nicotine Dependence",
  col = "darkblue"
)
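To put a number on what the scatter plots show, one option is a Pearson correlation together with a simple linear regression. This is a minimal sketch under the assumption that a straight-line fit is an adequate summary; the names fit_quantity and r_quantity are hypothetical. Since the columns were generated independently of each other, the slope and correlation should come out near zero, so any apparent pattern in these plots is noise.

```r
# Recreate the simulated data exactly as generated above
set.seed(123)
nesarc_data <- data.frame(
  nicotine_dependence = rnorm(100, mean = 50, sd = 10),
  smoking_quantity = rnorm(100, mean = 20, sd = 5),
  smoking_frequency = runif(100, min = 0, max = 1),
  gad_symptoms = rnorm(100, mean = 5, sd = 2),
  panic_symptoms = rnorm(100, mean = 3, sd = 1)
)

# Simple linear regression of nicotine dependence on smoking quantity
fit_quantity <- lm(nicotine_dependence ~ smoking_quantity, data = nesarc_data)
print(coef(fit_quantity))

# Pearson correlation between the same two variables
r_quantity <- cor(nesarc_data$smoking_quantity, nesarc_data$nicotine_dependence)
print(r_quantity)
```

Calling abline(fit_quantity) right after the first plot() call would overlay the fitted line on that scatter plot.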