Step 1: Load required packages

library(mosaic)
library(mosaicData)
library(dplyr)
library(kableExtra)

Step 2: Print the first 5 rows and first 5 columns of the data

Note: I am using one of the several table formats available at https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html

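# Keep the first 5 rows and first 5 columns
dt <- HELPrct[1:5, 1:5]

# Render as an HTML table with the classic kableExtra theme
dt %>%
  kbl(caption = "Booktabs style table for HELPrct") %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")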
Booktabs style table for HELPrct
age anysubstatus anysub cesd d1
37 1 yes 49 3
37 1 yes 30 22
26 1 yes 39 0
39 1 yes 15 2
32 1 yes 39 12

Step 3: Add a column called drinks_binary (a secondary variable created from a primary variable) based on the avg_drinks column. Anyone with avg_drinks >= 5 is labeled "Heavy Drinker"; everyone else is labeled "Not a Heavy Drinker". Use tally to count how many subjects fall in each category.

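# Label each subject from avg_drinks: 5 or more is a heavy drinker
HELPrct$drinks_binary[HELPrct$avg_drinks >= 5] <- "Heavy Drinker"
HELPrct$drinks_binary[HELPrct$avg_drinks < 5] <- "Not a Heavy Drinker"
# Count the subjects in each category
count.drinker <- tally(~ drinks_binary, data = HELPrct)
kbl(count.drinker)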
drinks_binary Freq
Heavy Drinker 313
Not a Heavy Drinker 140
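
The same two-category split can also be written in a single step. The sketch below is one possible alternative, not the approach used above: it runs on a small invented vector (the values are hypothetical, not HELPrct data) and uses base R's ifelse() and table() in place of the two subset assignments and tally().

```r
# Hypothetical drink counts, invented for illustration (not taken from HELPrct)
avg_drinks <- c(0, 4, 5, 12, NA)

# ifelse() assigns both labels in one pass; a missing input stays NA
drinks_binary <- ifelse(avg_drinks >= 5, "Heavy Drinker", "Not a Heavy Drinker")

# table() counts each category; useNA = "ifany" adds a count of missing values
drinker_counts <- table(drinks_binary, useNA = "ifany")
print(drinker_counts)
```

Because both labels come from one condition, the cutoff only has to be written once, and any missing values show up in the counts instead of going unnoticed.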

Step 4: Create an additional race column that merges the racegrp categories into only three: white, black, and other (the original other plus hispanic).

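mydata <- HELPrct
# Fold "hispanic" into "other"; the remaining levels are unchanged
race.1 <- recode(mydata$racegrp, "hispanic" = "other")
mydata.1 <- mutate(mydata, race.1 = race.1)
# Select columns by name; positions shift whenever a column is added
dt1 <- mydata.1[1:5, c("age", "racegrp", "race.1")]
dt1 %>%
  kbl(caption = "HELPrct with new race column") %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")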
HELPrct with new race column
age racegrp race.1
37 black black
37 white white
26 black black
39 white white
32 black black
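
The first five rows happen to contain only black and white subjects, so they do not show whether the merge worked. A cross-tabulation of the old column against the new one is a quick check. The sketch below uses an invented factor (hypothetical values, not HELPrct data) and, as a swapped-in base R alternative to recode(), merges levels by renaming one level to match another.

```r
# Hypothetical race groups, invented for illustration (not taken from HELPrct)
racegrp <- factor(c("black", "white", "hispanic", "other", "black"))

# Renaming "hispanic" to the existing level "other" merges the two levels
race3 <- racegrp
levels(race3)[levels(race3) == "hispanic"] <- "other"

# Rows are the original groups, columns the merged groups
check <- table(original = racegrp, merged = race3)
print(check)
```

Every original group should land in exactly one merged column, with hispanic and other sharing the other column.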

Step 5: Insert an R chunk to run each of the following commands and describe in a sentence whether the command produces a descriptive statistic (numerical summary vs. graphical summary) or an inferential statistic. Also comment on the statistic obtained.

One Categorical Variable: Counts by Category

dt3 <- tally(~ sex, data = HELPrct)
kbl(dt3)
sex Freq
female 107
male 346
The table shows how many adult inpatients of each sex were recruited from a detoxification unit: 346 males and 107 females.

Two Categorical Variables: Contingency Tables with margins

dt4 <- tally(~ substance + sex, margins = TRUE, data = HELPrct)
kbl(dt4)
substance female male Total
alcohol 36 141 177
cocaine 41 111 152
heroin 30 94 124
Total 107 346 453
This table breaks down the tally of subjects by substance and sex. For example, there are 30 females in this data set whose substance is heroin.

Two Quantitative Variables

dt5 <- cor(cesd ~ mcs, data = HELPrct)
kbl(round(dt5,3))
x
-0.682
The correlation coefficient is -0.682, which describes a negative linear relationship between cesd (a depression measure where higher scores mean more depressive symptoms) and mcs (a mental component score where lower scores indicate worse status). In other words, subjects with more depressive symptoms tend to have lower mental component scores.
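
To make the number less of a black box, the sketch below computes a Pearson correlation directly from its definition and compares it with base R's cor(). The two vectors are invented for illustration (hypothetical values, not cesd or mcs), and the variable names are my own.

```r
# Hypothetical paired measurements, invented for illustration
x <- c(10, 20, 30, 40, 50)
y <- c(48, 41, 35, 30, 18)

# Deviations of each value from its own mean
dx <- x - mean(x)
dy <- y - mean(y)

# Pearson r: sum of cross-products over the root of the product of sums of squares
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in version for comparison
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
```

The sign comes from the cross-products: when above-average x values pair with below-average y values, the products are negative and so is r.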

Quantitative Response and Categorical Predictor

dt6 <- favstats(~ cesd | sex, data = HELPrct)
dt6 %>%
  kbl(caption = "Fav Stats of HELPrct's cesd by gender") %>%
  kable_classic(full_width = FALSE, html_font = "Cambria")
Fav Stats of HELPrct’s cesd by gender
sex min Q1 median Q3 max mean sd n missing
female 3 29 38.0 46.5 60 36.88785 13.01764 107 0
male 1 24 32.5 40.0 58 31.59827 12.10332 346 0
These statistics summarize the depression measure (cesd) separately for each sex. On average, females have higher depression scores (more depressive symptoms) than males: the mean is 36.89 for females vs. 31.60 for males, and the median is 38.0 vs. 32.5.

Quantitative Response and Categorical Predictor

bwplot(age ~ sex, data = HELPrct)

For an in-depth guide to interpreting boxplots, see https://www.simplypsychology.org/boxplots.html
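
As a companion to the boxplot, the sketch below shows the numbers a box-and-whisker plot is drawn from. It uses base R's fivenum() and boxplot.stats() on an invented vector (hypothetical values, not HELPrct ages). I am assuming here that bwplot() applies the same 1.5 x IQR whisker rule as boxplot.stats(), which is worth confirming in the lattice documentation.

```r
# Hypothetical values with one extreme point, invented for illustration
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 50)

# Five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(x)

# Boxplot statistics: whisker ends replace min and max when outliers exist
bs <- boxplot.stats(x)

print(five)      # includes the extreme value as the maximum
print(bs$stats)  # upper whisker stops at the largest non-outlying value
print(bs$out)    # points drawn individually as outliers
```

With this rule, a point more than 1.5 times the interquartile range beyond a hinge is plotted on its own, so the whisker ends can differ from the true minimum and maximum.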

  • The tally commands find descriptive statistics, i.e., the frequency of observations in each category.

  • The cor command finds the correlation coefficient, a descriptive numerical summary of the strength and direction of the linear association between two quantitative variables. It becomes inferential only when it is used in a procedure such as a hypothesis test.

  • The favstats command finds descriptive statistics, more specifically, the minimum, Q1, median, Q3, max, mean, standard deviation, number of observations, and number of missing values.

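  • The bwplot command produces a graphical summary: a box-and-whisker plot built around the five-number summary (min, Q1, median, Q3, and max). Points far from the box may be drawn individually as outliers instead of being covered by the whiskers.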
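
To illustrate the difference between a descriptive and an inferential use of correlation, the sketch below runs both on the same invented data (hypothetical values, not HELPrct). cor() only describes the sample, while cor.test() from base R's stats package tests the null hypothesis that the population correlation is zero. None of the commands in Step 5 performs such a test; this is an added example.

```r
# Hypothetical paired measurements, invented for illustration
x <- c(2, 4, 5, 7, 9, 11, 12, 15)
y <- c(50, 47, 41, 40, 33, 30, 24, 20)

# Descriptive: the sample correlation coefficient
r <- cor(x, y)

# Inferential: test of H0 "true correlation is 0", with a confidence interval
ct <- cor.test(x, y)

print(r)
print(ct$p.value)   # evidence against H0
print(ct$conf.int)  # plausible range for the population correlation
```

The estimate stored in ct$estimate is the same number that cor() returns; what cor.test() adds is the p-value and the confidence interval, which are statements about the population rather than the sample.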