Introduction:

This dataset was suggested by Michael D’acampora in our week-5 discussion on untidy data. I selected it as one of the three datasets required for Project 2. It records, for each age group, the percentage of people using each type of drug and the frequency of use over a 12-month period.

The data was downloaded as a CSV file and then tidied from its raw form so that it could be analyzed in R. The goal of the analysis was to see whether the data gives insight into drug use and frequency of use by age and by drug type.

Load necessary libraries:

library(stringr)
library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

Load data:

drugDataRaw <- read.csv("C:\\Temp\\druguse.csv", sep = ",", stringsAsFactors = FALSE, 
    fill = TRUE)
str(drugDataRaw)
## 'data.frame':    17 obs. of  28 variables:
##  $ age                    : chr  "12" "13" "14" "15" ...
##  $ n                      : int  2798 2757 2792 2956 3058 3038 2469 2223 2271 2354 ...
##  $ alcohol.use            : num  3.9 8.5 18.1 29.2 40.1 49.3 58.7 64.6 69.7 83.2 ...
##  $ alcohol.frequency      : num  3 6 5 6 10 13 24 36 48 52 ...
##  $ marijuana.use          : num  1.1 3.4 8.7 14.5 22.5 28 33.7 33.4 34 33 ...
##  $ marijuana.frequency    : num  4 15 24 25 30 36 52 60 60 52 ...
##  $ cocaine.use            : num  0.1 0.1 0.1 0.5 1 2 3.2 4.1 4.9 4.8 ...
##  $ cocaine.frequency      : chr  "5.0" "1.0" "5.5" "4.0" ...
##  $ crack.use              : num  0 0 0 0.1 0 0.1 0.4 0.5 0.6 0.5 ...
##  $ crack.frequency        : chr  "-" "3.0" "-" "9.5" ...
##  $ heroin.use             : num  0.1 0 0.1 0.2 0.1 0.1 0.4 0.5 0.9 0.6 ...
##  $ heroin.frequency       : chr  "35.5" "-" "2.0" "1.0" ...
##  $ hallucinogen.use       : num  0.2 0.6 1.6 2.1 3.4 4.8 7 8.6 7.4 6.3 ...
##  $ hallucinogen.frequency : num  52 6 3 4 3 3 4 3 2 4 ...
##  $ inhalant.use           : num  1.6 2.5 2.6 2.5 3 2 1.8 1.4 1.5 1.4 ...
##  $ inhalant.frequency     : chr  "19.0" "12.0" "5.0" "5.5" ...
##  $ pain.releiver.use      : num  2 2.4 3.9 5.5 6.2 8.5 9.2 9.4 10 9 ...
##  $ pain.releiver.frequency: num  36 14 12 10 7 9 12 12 10 15 ...
##  $ oxycontin.use          : num  0.1 0.1 0.4 0.8 1.1 1.4 1.7 1.5 1.7 1.3 ...
##  $ oxycontin.frequency    : chr  "24.5" "41.0" "4.5" "3.0" ...
##  $ tranquilizer.use       : num  0.2 0.3 0.9 2 2.4 3.5 4.9 4.2 5.4 3.9 ...
##  $ tranquilizer.frequency : num  52 25.5 5 4.5 11 7 12 4.5 10 7 ...
##  $ stimulant.use          : num  0.2 0.3 0.8 1.5 1.8 2.8 3 3.3 4 4.1 ...
##  $ stimulant.frequency    : num  2 4 12 6 9.5 9 8 6 12 10 ...
##  $ meth.use               : num  0 0.1 0.1 0.3 0.3 0.6 0.5 0.4 0.9 0.6 ...
##  $ meth.frequency         : chr  "-" "5.0" "24.0" "10.5" ...
##  $ sedative.use           : num  0.2 0.1 0.2 0.4 0.2 0.5 0.4 0.3 0.5 0.3 ...
##  $ sedative.frequency     : num  13 19 16.5 30 3 6.5 10 6 4 9 ...
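
Several frequency columns in the str() output above (crack.frequency, heroin.frequency and others) are read as chr because the file uses "-" where no frequency is available. One way to avoid this, sketched here on a small inline CSV rather than the real file, is to pass na.strings to read.csv() so those entries become NA and the columns stay numeric. The sample reuses the first three crack and heroin values shown above; the object names are only for illustration.

```r
# Small inline sample that mimics the "-" placeholder in the raw file
csv_text <- "age,crack.frequency,heroin.frequency
12,-,35.5
13,3.0,-
14,-,2.0"

# Treat "-" (and the usual "NA") as missing so the columns are parsed as numbers
sample_raw <- read.csv(text = csv_text, na.strings = c("-", "NA"),
                       stringsAsFactors = FALSE)

str(sample_raw)

# With numeric columns, max() compares numbers instead of strings
max(sample_raw$heroin.frequency, na.rm = TRUE)
```

With the columns numeric, summaries such as max() no longer depend on how the values sort as text.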

The drug types and their frequencies can each be collected under a single column, which changes the dataset from wide to long format. In the raw data the use and frequency columns alternate, so the positions of each kind of column were collected separately; this makes it easy to refer to all use columns or all frequency columns in the gather function. Because both the use percentages and the frequencies needed to be reshaped, two datasets were created to keep the tidying steps simple.

drugDataRaw <- mutate(drugDataRaw, age_group = ifelse(age %in% (12:21), 
    "12-21", age))
col_order_use <- grep("use", colnames(drugDataRaw))
col_order_frequency <- grep("frequency", colnames(drugDataRaw))

select(drugDataRaw, col_order_use)
##    alcohol.use marijuana.use cocaine.use crack.use heroin.use
## 1          3.9           1.1         0.1       0.0        0.1
## 2          8.5           3.4         0.1       0.0        0.0
## 3         18.1           8.7         0.1       0.0        0.1
## 4         29.2          14.5         0.5       0.1        0.2
## 5         40.1          22.5         1.0       0.0        0.1
## 6         49.3          28.0         2.0       0.1        0.1
## 7         58.7          33.7         3.2       0.4        0.4
## 8         64.6          33.4         4.1       0.5        0.5
## 9         69.7          34.0         4.9       0.6        0.9
## 10        83.2          33.0         4.8       0.5        0.6
## 11        84.2          28.4         4.5       0.5        1.1
## 12        83.1          24.9         4.0       0.5        0.7
## 13        80.7          20.8         3.2       0.4        0.6
## 14        77.5          16.4         2.1       0.5        0.4
## 15        75.0          10.4         1.5       0.5        0.1
## 16        67.2           7.3         0.9       0.4        0.1
## 17        49.3           1.2         0.0       0.0        0.0
##    hallucinogen.use inhalant.use pain.releiver.use oxycontin.use
## 1               0.2          1.6               2.0           0.1
## 2               0.6          2.5               2.4           0.1
## 3               1.6          2.6               3.9           0.4
## 4               2.1          2.5               5.5           0.8
## 5               3.4          3.0               6.2           1.1
## 6               4.8          2.0               8.5           1.4
## 7               7.0          1.8               9.2           1.7
## 8               8.6          1.4               9.4           1.5
## 9               7.4          1.5              10.0           1.7
## 10              6.3          1.4               9.0           1.3
## 11              5.2          1.0              10.0           1.7
## 12              4.5          0.8               9.0           1.3
## 13              3.2          0.6               8.3           1.2
## 14              1.8          0.4               5.9           0.9
## 15              0.6          0.3               4.2           0.3
## 16              0.3          0.2               2.5           0.4
## 17              0.1          0.0               0.6           0.0
##    tranquilizer.use stimulant.use meth.use sedative.use
## 1               0.2           0.2      0.0          0.2
## 2               0.3           0.3      0.1          0.1
## 3               0.9           0.8      0.1          0.2
## 4               2.0           1.5      0.3          0.4
## 5               2.4           1.8      0.3          0.2
## 6               3.5           2.8      0.6          0.5
## 7               4.9           3.0      0.5          0.4
## 8               4.2           3.3      0.4          0.3
## 9               5.4           4.0      0.9          0.5
## 10              3.9           4.1      0.6          0.3
## 11              4.4           3.6      0.6          0.2
## 12              4.3           2.6      0.7          0.2
## 13              4.2           2.3      0.6          0.4
## 14              3.6           1.4      0.4          0.4
## 15              1.9           0.6      0.2          0.3
## 16              1.4           0.3      0.2          0.2
## 17              0.2           0.0      0.0          0.0
select(drugDataRaw, col_order_frequency)
##    alcohol.frequency marijuana.frequency cocaine.frequency crack.frequency
## 1                  3                   4               5.0               -
## 2                  6                  15               1.0             3.0
## 3                  5                  24               5.5               -
## 4                  6                  25               4.0             9.5
## 5                 10                  30               7.0             1.0
## 6                 13                  36               5.0            21.0
## 7                 24                  52               5.0            10.0
## 8                 36                  60               5.5             2.0
## 9                 48                  60               8.0             5.0
## 10                52                  52               5.0            17.0
## 11                52                  52               5.0             5.0
## 12                52                  60               6.0             6.0
## 13                52                  52               5.0             6.0
## 14                52                  72               8.0            15.0
## 15                52                  48              15.0            48.0
## 16                52                  52              36.0            62.0
## 17                52                  36                 -               -
##    heroin.frequency hallucinogen.frequency inhalant.frequency
## 1              35.5                     52               19.0
## 2                 -                      6               12.0
## 3               2.0                      3                5.0
## 4               1.0                      4                5.5
## 5              66.5                      3                3.0
## 6              64.0                      3                4.0
## 7              46.0                      4                4.0
## 8             180.0                      3                3.0
## 9              45.0                      2                4.0
## 10             30.0                      4                2.0
## 11             57.5                      3                4.0
## 12             88.0                      2                2.0
## 13             50.0                      3                4.0
## 14             66.0                      2                3.5
## 15            280.0                      3               10.0
## 16             41.0                     44               13.5
## 17            120.0                      2                  -
##    pain.releiver.frequency oxycontin.frequency tranquilizer.frequency
## 1                       36                24.5                   52.0
## 2                       14                41.0                   25.5
## 3                       12                 4.5                    5.0
## 4                       10                 3.0                    4.5
## 5                        7                 4.0                   11.0
## 6                        9                 6.0                    7.0
## 7                       12                 7.0                   12.0
## 8                       12                 7.5                    4.5
## 9                       10                12.0                   10.0
## 10                      15                13.5                    7.0
## 11                      15                17.5                   12.0
## 12                      15                20.0                   10.0
## 13                      13                13.5                   10.0
## 14                      22                46.0                    8.0
## 15                      12                12.0                    6.0
## 16                      12                 5.0                   10.0
## 17                      24                   -                    5.0
##    stimulant.frequency meth.frequency sedative.frequency
## 1                  2.0              -               13.0
## 2                  4.0            5.0               19.0
## 3                 12.0           24.0               16.5
## 4                  6.0           10.5               30.0
## 5                  9.5           36.0                3.0
## 6                  9.0           48.0                6.5
## 7                  8.0           12.0               10.0
## 8                  6.0          105.0                6.0
## 9                 12.0           12.0                4.0
## 10                10.0            2.0                9.0
## 11                10.0           46.0               52.0
## 12                10.0           21.0               17.5
## 13                 7.0           30.0                4.0
## 14                12.0           54.0               10.0
## 15                24.0          104.0               10.0
## 16                24.0           30.0              104.0
## 17               364.0              -               15.0
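# strip the ".use" suffix; the dot is escaped and the match anchored at the end,
# so only a literal trailing ".use" is removed
colnames(drugDataRaw)[col_order_use] <- sub("\\.use$", "", colnames(drugDataRaw)[col_order_use])
# keep age, n, age_group (column 29) and the use-percentage columns
drugType <- select(drugDataRaw, c(1, 2, 29, col_order_use))
drugFreq <- select(drugDataRaw, c(col_order_frequency))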

drugType <- gather(drugType, drug_types, use_percent, 4:16)
drugFreq <- gather(drugFreq, frequent, frequency, 1:13)
# drugData <-
# cbind(select(drugDataRaw,col_order_use),select(drugDataRaw,col_order_frequency))

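# both long tables are ordered drug by drug with the ages in the same order,
# so they can be bound side by side
drugData <- cbind(drugType, drugFreq)

# drop column 6 ("frequent"), which only repeats the drug name
drugData <- drugData[, -6]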
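
The two gather() calls followed by cbind() work because both long tables come out in the same row order. As a sketch of an alternative, assuming tidyr 1.0.0 or later, pivot_longer() can split each column name into a drug name and a measure in one step, so use and frequency stay paired by construction. The toy table below copies the first two rows of four columns from the raw data; toy_wide and toy_long are illustrative names.

```r
library(tidyr)

# Toy wide table with the same naming scheme as the raw data
toy_wide <- data.frame(
  age = c("12", "13"),
  alcohol.use = c(3.9, 8.5),
  alcohol.frequency = c(3, 6),
  pain.releiver.use = c(2.0, 2.4),
  pain.releiver.frequency = c(36, 14),
  stringsAsFactors = FALSE
)

# Split each column name into a drug name and a measure ("use" or "frequency");
# ".value" tells pivot_longer() to turn the measure part into separate columns
toy_long <- pivot_longer(
  toy_wide,
  cols = -age,
  names_to = c("drug_types", ".value"),
  names_pattern = "^(.*)\\.(use|frequency)$"
)

print(toy_long)
```

A pattern is used instead of a plain separator because names such as pain.releiver.use contain more than one dot. On the real data the frequency columns that hold "-" would most likely need converting to numeric first, since character and numeric columns cannot be combined into one value column.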

Some statistics:

library(ggplot2)

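# frequency is character after gather() because "-" marks missing values in some
# columns; convert it so that max() compares numbers rather than strings
drugStats <- mutate(drugData, frequency = as.numeric(ifelse(frequency == "-", NA, frequency)))

# index the grouped columns directly so the age (or drug) comes from the same group as the maximum
drug_stat_type <- drugStats %>% group_by(drug_types) %>% summarise(highest_use = max(use_percent), 
    highest_frequency = max(frequency, na.rm = TRUE), min_use = min(use_percent), 
    age = age[which.max(use_percent)])
drug_stat_age <- drugStats %>% group_by(age_group) %>% summarise(highest_use = max(use_percent), 
    highest_frequency = max(frequency, na.rm = TRUE), min_use = min(use_percent), 
    drug_types = drug_types[which.max(use_percent)])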
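
The summaries above pair each group's highest use with the age (or drug) at which it occurs. As a sketch of another way to get that pairing, assuming dplyr 1.0.0 or later, slice_max() keeps the whole row that holds the largest use_percent in each group, so the matching age comes along with it. The toy values are rows 8 to 10 of the alcohol and marijuana columns printed earlier, and the ages 19 to 21 are assumed from the row order.

```r
library(dplyr)

# Toy long table (illustrative subset of the tidy data)
toy <- data.frame(
  age = c("19", "20", "21", "19", "20", "21"),
  drug_types = rep(c("alcohol", "marijuana"), each = 3),
  use_percent = c(64.6, 69.7, 83.2, 33.4, 34.0, 33.0),
  stringsAsFactors = FALSE
)

# Keep, for every drug, the single row with the highest percentage of use
peak_use <- toy %>%
  group_by(drug_types) %>%
  slice_max(use_percent, n = 1, with_ties = FALSE) %>%
  ungroup()

print(peak_use)
```

Setting with_ties = FALSE returns exactly one row per group even when two ages share the same maximum.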

Stat 1:

View(drug_stat_type)

Stat 2:

View(drug_stat_age)

Figure 1:

ggplot(drug_stat_type, aes(x = drug_types, y = highest_frequency)) + 
    geom_point(aes(color = age, size = highest_use)) + labs(title = "Drug types and percentage of uses in different ages") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

Both the statistics and Figure 1 show that alcohol is the most widely abused substance, but the frequency of alcohol use is considerably lower than that of other, more harmful substances. People aged 22 to 23 have the highest rates of alcohol and heroin use. For heroin the percentage of users is low but the frequency of use is very high, i.e. whoever is addicted to heroin uses it heavily. Closer attention needs to be paid to 20-year-olds: they are the biggest users of marijuana, pain relievers and cocaine. Their use is spread across several drugs, but the combined use is large.

Figure 2:

ggplot(drugData, aes(age_group, use_percent)) + geom_bar(aes(fill = drug_types), 
    stat = "identity", position = "dodge") + labs(title = "Drug types and their use in various age groups", 
    y = "percentage of use")

Figure 2 shows that alcohol is the most abused substance in every age group. The 12-21 group covers a wider range of ages than the other groups, but it is still bad news that this group has such a high percentage of use.
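
In Figure 2 the 12-21 group holds ten single-year rows per drug while every other group holds one, so its bars end up sharing the same position. If a single value per group and drug is wanted, one option (an assumption for illustration, not part of the analysis above) is to pool the rows with a mean weighted by the sample size n before plotting. In the toy data the n and use_percent values for ages 12 and 13 come from the raw data, and the sample size for 22-23 is a made-up placeholder.

```r
library(dplyr)

# Toy rows: two single-year ages inside "12-21" and one row for "22-23"
toy <- data.frame(
  age_group = c("12-21", "12-21", "22-23"),
  drug_types = "alcohol",
  n = c(2798, 2757, 4000),          # 4000 is a hypothetical sample size
  use_percent = c(3.9, 8.5, 84.2),
  stringsAsFactors = FALSE
)

# One pooled percentage per age group and drug, weighting each row by its sample size
pooled <- toy %>%
  group_by(age_group, drug_types) %>%
  summarise(use_percent = weighted.mean(use_percent, n), .groups = "drop")

print(pooled)
```

Weighting by n keeps a small single-year sample from counting as much as a large one; a plain mean would be a simpler but rougher choice.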

Figure 3:

ggplot(drugData, aes(x = age_group, y = use_percent, group = drug_types, 
    color = drug_types)) + geom_line() + labs(title = "Percentage of use of drugs by various age groups", 
    y = "Percentage")

Figure 3 shows that alcohol and marijuana are the two most abused substances. Use of both declines as the population grows older, although the rate of decline for alcohol is not encouraging. The percentage of use of both drugs in the 12-21 age group is very alarming.