library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tinytex)
library(readr)
library(summarytools)
##
## Attaching package: 'summarytools'
##
## The following object is masked from 'package:tibble':
##
## view
The dataset is imported from an Excel file.
Activated-protein-C (APC) resistance is a serum marker that has been associated with thrombosis (the formation of blood clots often leading to heart attacks) among adults. A study assessed this risk factor among adolescents. To assess the reproducibility of the assay, a split-sample technique was used in which a blood sample was provided by 10 people; each sample was split into two aliquots (sub-samples), and each aliquot was assessed separately. The following table gives the results.
library(readxl)
q2_table1 <- read_excel("~/utmb/classes_fall_2025/biostatistics/assignments/lab_09_10_2025/q2_table1.xlsx")
df <- q2_table1
rm(q2_table1)
# Add the row-wise mean of columns A and B with mutate()
mean_ab <- df |>
  rowwise() |>
  mutate(mean(c(A, B)))
# This returns a copy of df with the mean added, but the new column gets the default name `mean(c(A, B))`
# Try again, computing every statistic in one mutate() call and naming each new column
df_mean_sd_cv <- df |>
  rowwise() |>
  mutate(
    mean_ab = mean(c(A, B)),
    sd_ab = sd(c(A, B)),
    cv_ab = sd_ab / mean_ab * 100,
    cv_ab = round(cv_ab, 1)
  )
mean_cv <- mean(df_mean_sd_cv$cv_ab)
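For each split sample, the coefficient of variation is the standard deviation of the two aliquots divided by their mean, times 100. The sketch below repeats that calculation in base R on made-up numbers (the `toy` data frame and its values are hypothetical, not the assay data) and checks the shortcut that, for two values, the standard deviation equals |A - B| / sqrt(2).

```r
# Hypothetical split-sample values; NOT the APC resistance data from the lab.
toy <- data.frame(A = c(2.0, 3.0, 4.0), B = c(2.2, 2.8, 4.0))

# Row-wise mean and SD of the two aliquots.
toy$mean_ab <- (toy$A + toy$B) / 2
toy$sd_ab   <- mapply(function(a, b) sd(c(a, b)), toy$A, toy$B)

# For n = 2, sd(c(a, b)) reduces to |a - b| / sqrt(2).
toy$sd_check <- abs(toy$A - toy$B) / sqrt(2)

# Coefficient of variation in percent, then the average across samples.
toy$cv_ab   <- toy$sd_ab / toy$mean_ab * 100
mean_cv_toy <- mean(toy$cv_ab)
```

Rounding each CV before averaging, as in the chunk above, can shift the mean slightly. Rounding only the final value avoids that.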
The Left Ventricular Mass Index (LVMI) is a measure of the enlargement of the left side of the heart and is expressed in units of g/m^2.7 (left ventricular mass in grams divided by height in meters raised to the power 2.7). High values may predict subsequent cardiovascular disease in children as they get older. A study is performed to relate the level of LVMI to blood pressure (bp) category in children and adolescents aged 10–18. The bp level of each child was categorized as Normal (bpcat = 1: bp percentile < 80% for a given age, gender, and height), Pre-Hypertensive (bpcat = 2: bp percentile ≥ 80% and < 90%), or Hypertensive (bpcat = 3: bp percentile ≥ 90%). A description of the variables in the data is given below.
Let's import that dataset.
library(readxl)
lvm_dat <- read_excel("~/utmb/classes_fall_2025/biostatistics/assignments/lab_09_10_2025/lvm_dat.xlsx")
View(lvm_dat)
Next, rename some variables. The result is printed but not assigned, so lvm_dat itself keeps its original column names.
lvm_dat |>
  rename(
    id = ID,
    bmi = BMI
  )
## # A tibble: 224 × 6
## id lvmht27 bpcat gender age bmi
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 31.3 1 1 17.6 21.4
## 2 2 36.8 1 2 16.1 19.8
## 3 6 20.7 1 2 17.0 20.6
## 4 10 44.2 1 2 11.5 25.3
## 5 16 23.3 1 1 11.9 17.3
## 6 20 27.7 1 2 10.5 19.2
## 7 24 21.2 1 1 12.9 16.8
## 8 25 16.9 1 1 11.8 14.5
## 9 26 19.3 1 2 11.6 20
## 10 29 35.8 1 2 17.0 25.2
## # ℹ 214 more rows
Next, we group by bpcat and take the mean LVMI within each category (1, 2, 3).
library(knitr)
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
lvm_dat |>
  group_by(bpcat) |>
  summarize(
    mean_lvm = mean(lvmht27)
  ) |>
  kable(digits = 4) |>
  kable_styling()
bpcat | mean_lvm |
---|---|
1 | 29.3427 |
2 | 33.7910 |
3 | 34.1157 |
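As a cross-check on a grouped summary like the one above, the same per-category means can be computed in base R, along with the group sizes that a mean alone does not show. This is only a sketch: `toy_lvm` is a hypothetical stand-in for lvm_dat, and its values are invented.

```r
# Hypothetical stand-in for lvm_dat; values are invented for illustration.
toy_lvm <- data.frame(
  bpcat   = c(1, 1, 2, 2, 3, 3),
  lvmht27 = c(20, 30, 30, 40, 25, 45)
)

# Mean of lvmht27 within each blood pressure category.
group_means <- aggregate(lvmht27 ~ bpcat, data = toy_lvm, FUN = mean)

# Number of observations in each category.
group_n <- table(toy_lvm$bpcat)
```

Reporting the group sizes next to the means helps when categories are unbalanced, since a mean based on few observations is less stable.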
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
lvm_dat |>
  group_by(bpcat) |>
  summarize(
    geom_mean_lvm = geometric.mean(lvmht27)
  ) |>
  kable(digits = 4) |>
  kable_styling()
bpcat | geom_mean_lvm |
---|---|
1 | 28.6059 |
2 | 33.3481 |
3 | 32.8894 |
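The geometric means are lower than the arithmetic means in every category. That is expected: by the standard definition, the geometric mean is the exponentiated mean of the log values, and for positive data it never exceeds the arithmetic mean. A minimal base R sketch on hypothetical values (not taken from lvm_dat):

```r
# Hypothetical positive values; not taken from lvm_dat.
x <- c(2, 8, 32)

# Geometric mean: exponentiate the mean of the logs.
geo_mean <- exp(mean(log(x)))

# Arithmetic mean for comparison.
arith_mean <- mean(x)

# The geometric mean is never larger than the arithmetic mean for positive data.
geo_mean <= arith_mean
```

The gap between the two means grows with the spread of the data, so a large difference within a category hints at skewed or widely dispersed values.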
# factor() treats bpcat (1, 2, 3) as a categorical variable, so the x-axis is discrete instead of a continuous numeric scale
lvm_dat |>
  ggplot(aes(x = factor(bpcat), y = lvmht27)) +
  geom_boxplot()
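Wrapping bpcat in factor() is what gives one box per category. A small sketch of the same conversion with readable labels, using the category names from the study description (the `bpcat_toy` codes are hypothetical):

```r
# Hypothetical category codes, as they might appear in a bpcat column.
bpcat_toy <- c(1, 2, 3, 2, 1)

# Convert the numeric codes to a labeled factor.
bp_label <- factor(
  bpcat_toy,
  levels = c(1, 2, 3),
  labels = c("Normal", "Pre-Hypertensive", "Hypertensive")
)

levels(bp_label)  # category names, in level order
table(bp_label)   # counts per category
```

Mapping a labeled factor like this to `x` in `aes()` would show the category names on the boxplot axis instead of 1, 2, 3.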