Problem 2

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(tinytex)
library(readr)
library(summarytools)

## 
## Attaching package: 'summarytools'
## 
## The following object is masked from 'package:tibble':
## 
##     view

The dataset is imported from an Excel file.

Activated-protein-C (APC) resistance is a serum marker that has been associated with thrombosis (the formation of blood clots often leading to heart attacks) among adults. A study assessed this risk factor among adolescents. To assess the reproducibility of the assay, a split-sample technique was used in which a blood sample was provided by 10 people; each sample was split into two aliquots (sub-samples), and each aliquot was assessed separately. The following table gives the results.

library(readxl)
q2_table1 <- read_excel("~/utmb/classes_fall_2025/biostatistics/assignments/lab_09_10_2025/q2_table1.xlsx")
df <- q2_table1
rm(q2_table1)

To assess the variability of the assay, the investigators need to compute the coefficient of variation. Compute the coefficient of variation (CV) for each subject by obtaining the mean and standard deviation over the 2 replicates for each subject. (Round your answers to one decimal place.)
The coefficient of variation, expressed as a percent, is

CV = (standard deviation / mean) * 100
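
As a quick sanity check of the formula, here is a minimal sketch for a single pair of replicates. The values 2.0 and 3.0 are made up for illustration and are not from the assignment table. With exactly two replicates, the sample standard deviation reduces to |a - b| / sqrt(2), which the last line confirms.

```r
# Hypothetical replicate values for one subject (not from the dataset)
replicates <- c(2.0, 3.0)

m  <- mean(replicates)     # arithmetic mean of the two aliquots
s  <- sd(replicates)       # sample SD (n - 1 denominator)
cv <- s / m * 100          # CV as a percent

# For exactly two values, sd() equals |a - b| / sqrt(2)
s_shortcut <- abs(diff(replicates)) / sqrt(2)

round(cv, 1)               # 28.3
all.equal(s, s_shortcut)   # TRUE
```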

# First attempt: compute the row-wise mean of columns A and B with mutate().
# Because the new column is not named, it is labeled by its expression.
mean_ab <- df |> 
  rowwise() |> 
  mutate(mean(c(A, B)))

# This returns a new tibble with the mean added as an extra column.
# Try again with a single mutate() call that names each new column.

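df_mean_sd_cv <- df |> 
  rowwise() |> 
  mutate(
    mean_ab = mean(c(A, B)),        # mean of the two replicates
    sd_ab = sd(c(A, B)),            # sample SD of the two replicates
    cv_ab = sd_ab / mean_ab * 100,  # CV as a percent
    cv_ab = round(cv_ab, 1)         # round to one decimal place
  ) |> 
  ungroup()                         # drop the row-wise grouping when done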
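
The same per-subject calculation can also be written without rowwise(). This is a sketch under the assumption that there are exactly two replicate columns named A and B, as in the chunk above. The `toy` tibble is hypothetical stand-in data, not the assignment table.

```r
library(dplyr)

# Hypothetical stand-in for the assignment table; A and B hold the two aliquots
toy <- tibble(A = c(2.0, 3.1, 1.8), B = c(3.0, 2.9, 2.2))

# Vectorized version: no rowwise() needed when there are exactly two replicates
toy_cv <- toy |>
  mutate(
    mean_ab = (A + B) / 2,
    sd_ab   = abs(A - B) / sqrt(2),   # equals sd(c(A, B)) for two values
    cv_ab   = round(sd_ab / mean_ab * 100, 1)
  )

# Cross-check against the row-wise version
rowwise_cv <- toy |>
  rowwise() |>
  mutate(cv_ab = round(sd(c(A, B)) / mean(c(A, B)) * 100, 1)) |>
  ungroup()

all.equal(toy_cv$cv_ab, rowwise_cv$cv_ab)
```

The vectorized form avoids per-row function calls, but it only works for two replicates. With three or more, the rowwise() approach (or reshaping to long format) is the safer choice.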

Compute the average CV (as a percent) over the 10 subjects as an overall measure of variability of the assay. (Round your answer to one decimal place.)

# Average the per-subject CVs and round to one decimal place, as the question asks
mean_cv <- round(mean(df_mean_sd_cv$cv_ab), 1)
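
One thing to watch: averaging CVs that were already rounded can give a slightly different answer than rounding only at the end. The sketch below uses made-up CV values chosen to show the effect. Which convention the assignment expects is an assumption to check against the answer key.

```r
# Hypothetical unrounded CVs (percent), chosen to show the effect
cv_raw <- c(1.16, 1.16, 1.12)

mean_of_rounded <- round(mean(round(cv_raw, 1)), 1)  # round each CV, then average
rounded_mean    <- round(mean(cv_raw), 1)            # average first, round once

mean_of_rounded  # 1.2
rounded_mean     # 1.1
```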

Problem 3

The Left Ventricular Mass Index (LVMI) is a measure of the enlargement of the left side of the heart and is expressed in units of gm/ht(m)^2.7. High values may predict subsequent cardiovascular disease in children as they get older.† A study is performed to relate the level of LVMI to blood pressure category in children and adolescents age 10–18. The blood pressure (bp) level of children was categorized as either Normal (bpcat = 1 or bp percentile < 80% for a given age, gender, and height), Pre-Hypertensive (bpcat = 2 or bp percentile ≥ 80% and bp percentile < 90%), or Hypertensive (bpcat = 3 or bp percentile ≥ 90%). A description of the variables in the data is given below.

Let's import that dataset.

library(readxl)
lvm_dat <- read_excel("~/utmb/classes_fall_2025/biostatistics/assignments/lab_09_10_2025/lvm_dat.xlsx")
View(lvm_dat)

Next, rename some variables.

# Assign the result back to lvm_dat so the new names are kept, then print it
lvm_dat <- lvm_dat |> 
  rename(
    id = ID,
    bmi = BMI
  )
lvm_dat

## # A tibble: 224 × 6
##       id lvmht27 bpcat gender   age   bmi
##    <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>
##  1     1    31.3     1      1  17.6  21.4
##  2     2    36.8     1      2  16.1  19.8
##  3     6    20.7     1      2  17.0  20.6
##  4    10    44.2     1      2  11.5  25.3
##  5    16    23.3     1      1  11.9  17.3
##  6    20    27.7     1      2  10.5  19.2
##  7    24    21.2     1      1  12.9  16.8
##  8    25    16.9     1      1  11.8  14.5
##  9    26    19.3     1      2  11.6  20  
## 10    29    35.8     1      2  17.0  25.2
## # ℹ 214 more rows

What is the arithmetic mean of LVMI by blood pressure group? (Round your answers to four decimal places.)

The blood pressure categories (bpcat) are coded 1, 2, and 3.

We want to group by bpcat and take the mean of lvmht27 within each group.

library(knitr)
library(kableExtra)

## 
## Attaching package: 'kableExtra'

## The following object is masked from 'package:dplyr':
## 
##     group_rows

lvm_dat |> 
  group_by(bpcat) |> 
  summarize(
    mean_lvm = mean(lvmht27)
  ) |> 
  kable(digits = 4) |> 
  kable_styling()

bpcat	mean_lvm
1	29.3427
2	33.7910
3	34.1157
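
As a cross-check on the grouped summary, the same per-group mean can be sketched in base R with tapply(). The data frame below is hypothetical and only reuses the column names bpcat and lvmht27. Its values are not from lvm_dat.

```r
# Hypothetical mini dataset reusing the column names bpcat and lvmht27
toy <- data.frame(
  bpcat   = c(1, 1, 2, 2, 3, 3),
  lvmht27 = c(20, 30, 32, 36, 30, 40)
)

# Mean of lvmht27 within each level of bpcat
group_means <- tapply(toy$lvmht27, toy$bpcat, mean)
round(group_means, 4)
```

Running the same call on the real data, tapply(lvm_dat$lvmht27, lvm_dat$bpcat, mean), should reproduce the table above if the import worked as expected.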

What is the geometric mean of LVMI by blood pressure group? (Round your answers to four decimal places.)
Base R has no built-in geometric mean function, so we install the “psych” package, which provides one called geometric.mean().

library(psych)

## 
## Attaching package: 'psych'

## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

lvm_dat |> 
  group_by(bpcat) |> 
  summarize(
    geom_mean_lvm = geometric.mean(lvmht27)
  ) |> 
  kable(digits = 4) |> 
  kable_styling()

bpcat	geom_mean_lvm
1	28.6059
2	33.3481
3	32.8894
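
For reference, the geometric mean can also be computed by hand as the exponential of the mean of the logs. The helper `geo_mean()` below is a hypothetical name introduced for this sketch, and it assumes every value is positive. The numbers are made up for illustration.

```r
# Geometric mean as exp(mean(log(x))); assumes all values are positive
geo_mean <- function(x) exp(mean(log(x)))

x <- c(2, 8)        # hypothetical values
geo_mean(x)         # 4
mean(x)             # 5

# Grouped version on a hypothetical data frame
toy <- data.frame(
  bpcat   = c(1, 1, 2, 2),
  lvmht27 = c(20, 45, 25, 36)
)
gm_by_group <- tapply(toy$lvmht27, toy$bpcat, geo_mean)
gm_by_group
```

The geometric mean is never larger than the arithmetic mean for positive data, which matches the pattern in the two tables above (for example, 28.6059 versus 29.3427 for bpcat 1).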

Provide a box plot of LVMI by blood pressure group.
We need to draw a box plot with ggplot2.

# factor() treats the codes 1, 2, and 3 as a categorical (discrete) grouping
# variable instead of a continuous numeric axis, giving one box per group.
lvm_dat |> 
  ggplot(aes(x = factor(bpcat), y = lvmht27)) +
    geom_boxplot()
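
A possible refinement is to label the groups and axes so the plot can be read without the codebook. The sketch below uses a hypothetical data frame with the same column names. The level labels are taken from the category definitions in the problem statement, and the axis titles are my own wording.

```r
library(ggplot2)

# Hypothetical data reusing the column names bpcat and lvmht27
toy <- data.frame(
  bpcat   = rep(1:3, each = 4),
  lvmht27 = c(22, 27, 30, 35, 28, 32, 35, 40, 25, 33, 36, 45)
)

# Turn the numeric codes into a labeled factor
toy$bp_group <- factor(
  toy$bpcat,
  levels = 1:3,
  labels = c("Normal", "Pre-Hypertensive", "Hypertensive")
)

p <- ggplot(toy, aes(x = bp_group, y = lvmht27)) +
  geom_boxplot() +
  labs(x = "Blood pressure category", y = "LVMI (lvmht27)")

p
```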

Lab Biostatistics 09.10.2025

Alex Tan

2025-09-10
