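## Introduction to Data

In this assignment, we use the FiveThirtyEight data set “How Baby Boomers Get High”, available at “https://github.com/fivethirtyeight/data/blob/master/drug-use-by-age/drug-use-by-age.csv”. The data frame covers 13 drugs across 17 age groups and has 28 variables. These are the age group (`age`), the number of people surveyed (`n`), and two columns for each drug: `_use`, the percentage of the age group that used the drug in the past 12 months, and `_frequency`, the median number of times a user in that age group used it. Together, these variables make it possible to analyse how drug use varies with age and, from that, the potential health consequences for each group.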
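The column count follows from that layout: `age` and `n`, plus one `_use`/`_frequency` pair for each of the 13 drugs, gives 2 + 13 × 2 = 28. A minimal sketch in base R recovers the drug list from the column names. The names are typed in by hand from the missing-value check printed later in this section, so the snippet runs offline; the variable names `cols` and `drugs` are introduced here for illustration only.

```r
# Column names as printed by the missing-value check in this report
# (note: "pain_releiver" is spelled this way in the source CSV)
cols <- c("age", "n",
          "alcohol_use", "alcohol_frequency",
          "marijuana_use", "marijuana_frequency",
          "cocaine_use", "cocaine_frequency",
          "crack_use", "crack_frequency",
          "heroin_use", "heroin_frequency",
          "hallucinogen_use", "hallucinogen_frequency",
          "inhalant_use", "inhalant_frequency",
          "pain_releiver_use", "pain_releiver_frequency",
          "oxycontin_use", "oxycontin_frequency",
          "tranquilizer_use", "tranquilizer_frequency",
          "stimulant_use", "stimulant_frequency",
          "meth_use", "meth_frequency",
          "sedative_use", "sedative_frequency")

# Each drug contributes exactly one "_use" column; strip the suffix to get its name
drugs <- sub("_use$", "", grep("_use$", cols, value = TRUE))

length(drugs)              # 13 drugs
2 + 2 * length(drugs)      # age + n + (use, frequency) per drug = 28
length(cols)               # 28 columns

# Every drug should also have a matching "_frequency" column
all(paste0(drugs, "_frequency") %in% cols)   # TRUE
```

On the loaded data, `names(drug_use_by_age)` could be used in place of the hand-typed `cols` vector.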
# Load the required libraries
library(readr)
library(ggplot2)
library(dplyr)
library(RCurl)
# URL of the raw CSV file on GitHub
csv_url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/drug-use-by-age/drug-use-by-age.csv"
# Fetch the raw CSV text from the URL
csv_text <- getURL(csv_url)
# Parse the CSV text into a data frame (tibble)
drug_use_by_age <- read_csv(csv_text)
## Rows: 17 Columns: 28
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): age, cocaine_frequency, crack_frequency, heroin_frequency, inhalan...
## dbl (21): n, alcohol_use, alcohol_frequency, marijuana_use, marijuana_freque...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## age n alcohol_use
## 0 0 0
## alcohol_frequency marijuana_use marijuana_frequency
## 0 0 0
## cocaine_use cocaine_frequency crack_use
## 0 0 0
## crack_frequency heroin_use heroin_frequency
## 0 0 0
## hallucinogen_use hallucinogen_frequency inhalant_use
## 0 0 0
## inhalant_frequency pain_releiver_use pain_releiver_frequency
## 0 0 0
## oxycontin_use oxycontin_frequency tranquilizer_use
## 0 0 0
## tranquilizer_frequency stimulant_use stimulant_frequency
## 0 0 0
## meth_use meth_frequency sedative_use
## 0 0 0
## sedative_frequency
## 0
## numeric(0)
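The zero counts above come from `is.na()`, which only counts true `NA` values. It does not see a text placeholder such as "-" inside a character column, which is why the `_frequency` columns read as `chr` can still hide missing values. A minimal sketch, assuming "-" is the only placeholder in the file, counts both kinds of missing entry and then converts the affected columns to numeric with `dplyr::na_if()`. The toy data frame below is hypothetical (illustrative values only) so the snippet runs offline; on the real data the same calls would be applied to `drug_use_by_age`.

```r
library(dplyr)

# Toy data (illustrative values only): "-" stands for a missing frequency,
# which forces heroin_frequency to be stored as character
toy <- data.frame(
  age = c("12", "13", "14"),
  heroin_use = c(0.1, 0.0, 0.2),
  heroin_frequency = c("1.0", "-", "2.0"),
  alcohol_frequency = c(2, 4, 6),
  stringsAsFactors = FALSE
)

colSums(is.na(toy))   # all zero: is.na() does not see "-"

# Count true NAs plus the "-" placeholder in each column
# (%in% never returns NA, so each cell gives a clean TRUE/FALSE)
missing_counts <- sapply(toy, function(col) sum(is.na(col) | col %in% "-"))
missing_counts        # heroin_frequency: 1

# Convert only the character _frequency columns, turning "-" into NA;
# numeric _frequency columns are left untouched by the where() guard
toy_clean <- toy %>%
  mutate(across(ends_with("_frequency") & where(is.character),
                ~ as.numeric(na_if(.x, "-"))))

colSums(is.na(toy_clean))   # heroin_frequency now has 1 NA
```

The `where(is.character)` guard matters because recent versions of `na_if()` cast the replacement value to the column's type, so passing "-" against an already numeric column would raise an error.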
# Box plot to check marijuana_use for outliers
boxplot(drug_use_by_age$marijuana_use,
        main = "Marijuana Use",
        ylab = "Marijuana use (% of age group)")
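The box plot only checks `marijuana_use`. `boxplot()` flags a point as an outlier when it lies more than 1.5 times the box length (the distance between the hinges) beyond either end of the box, and `boxplot.stats()` returns those points in `$out`. A minimal sketch applies the same rule to every `_use` column at once. The helper `use_outliers()` and the toy data frame are hypothetical, introduced only so the snippet runs offline; on the real data it would be called as `use_outliers(drug_use_by_age)`.

```r
# Hypothetical helper: list boxplot-rule outliers for every "_use" column
use_outliers <- function(df) {
  use_cols <- grep("_use$", names(df), value = TRUE)
  # boxplot.stats()$out holds points more than 1.5 x box length beyond the box
  lapply(df[use_cols], function(x) boxplot.stats(x)$out)
}

# Toy data (illustrative values only)
toy <- data.frame(
  age = c("12", "13", "14", "15", "16"),
  marijuana_use = c(1, 2, 3, 4, 5),
  crack_use = c(1, 2, 3, 4, 100),
  stringsAsFactors = FALSE
)

res <- use_outliers(toy)
res$marijuana_use   # numeric(0): no outliers
res$crack_use       # 100
```

An empty `numeric(0)` result for a column means the rule found no outliers there.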
# Bar plot of marijuana use by age group
# factor(levels = unique(age)) keeps the age groups in the file's row order
ggplot(data = drug_use_by_age,
       aes(x = factor(age, levels = unique(age)), y = marijuana_use)) +
  geom_col(fill = "grey") +
  labs(
    title = "Marijuana Use by Age Group",
    x = "Age group",
    y = "Marijuana use (% of age group)"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
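The missing-value check reports no `NA` values in any column, and the box plot shows no outliers in `marijuana_use`. This does not mean the data is fully clean. The column specification above shows that seven columns were read as character: apart from `age`, whose labels such as "22-23" and "65+" are text, these are `_frequency` columns that contain non-numeric entries. In the source CSV, missing frequencies are written as "-", which `is.na()` does not count, so these columns should be converted to numeric, with "-" treated as missing, before they are analysed.

The bar plot shows that marijuana use is highest in the 18 to 21 age groups. This may reflect social norms around, or easier access to, marijuana in this population. If a particular age group shows concerning trends in marijuana use, targeted prevention strategies and public education may be needed to support that group's health.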
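The claim about the 18 to 21 age groups can also be checked numerically rather than read off the chart. A minimal sketch, assuming the `age` and `marijuana_use` columns as loaded above, ranks the age groups by marijuana use with dplyr. The toy data frame is hypothetical (illustrative values only) and stands in for `drug_use_by_age` so the snippet runs offline.

```r
library(dplyr)

# Toy data (illustrative values only), using the report's column names
toy <- data.frame(
  age = c("16", "17", "18", "19", "20", "21", "22-23"),
  marijuana_use = c(10, 20, 30, 35, 40, 38, 25),
  stringsAsFactors = FALSE
)

# Rank age groups from highest to lowest marijuana use
ranked <- toy %>%
  arrange(desc(marijuana_use)) %>%
  select(age, marijuana_use)

head(ranked, 3)   # the three age groups with the highest use
ranked$age[1]     # "20" for this toy data
```

Replacing `toy` with `drug_use_by_age` would list the real top age groups, which can then be compared with the 18 to 21 range described above.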