Data Summary in R

Run Date: 2024-10-09

Summarizing data involves using various methods to explore and understand a data set. In base R, we can use functions like summary(), str(), and head() to get quick insights into the data and its key statistics. Other packages, such as psych and crosstable, create nicer summary tables. Packages from the tidyverse, such as dplyr, help the analyst clean and manage data sets. Visualization is also a crucial part of summarizing and understanding data: the ggplot2 package lets us create many types of figures. Together, these techniques (tables, figures, listings, etc.) form a comprehensive approach to data analysis.
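A quick first look with the base R functions named above. This is a sketch that uses the built-in mtcars data set purely for illustration; it is not part of the original analysis, and mpg_summary is a hypothetical object name.

```r
# mtcars ships with base R (datasets package), so no extra packages are needed
head(mtcars, n = 3)                       # first three rows
str(mtcars)                               # dimensions, column types, preview of values
summary(mtcars[, c("mpg", "hp", "wt")])   # min, quartiles, median, mean, max per column

# summary() of a single numeric vector returns a named numeric vector,
# so individual statistics can be extracted by name
mpg_summary <- summary(mtcars$mpg)
mpg_summary[["Median"]]
```

The same three calls work on any data frame, which makes them a convenient first step before building formatted tables.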

1 Library

Load some useful libraries for data management and data analysis.
In the chunk options, use results='hide' to hide the printed output and message=FALSE to suppress package startup messages, for example: {r lib, results='hide', message=FALSE}

# Data management 
library(tidyverse) # core data management packages: dplyr, tidyr, ggplot2, readr, etc.
library(tidyr) # organize tabular data (already attached by tidyverse)
library(janitor)  # data cleaning, e.g. clean_names()
library(data.table) # rbindlist(): "makes one data.table from a list of many"


# Data summary 
library(psych) # numeric data summary 
library(crosstable) # cross-tabulation 
library(summarytools)  # data summary tools 
library(epitools)  # epidemiology tools 
library(ggsci) # scientific-journal color palettes for ggplot2


# Data analysis 
library(Hmisc)  # functions for analysis, graphics, sample size, simulation, data import and annotation, imputation
library(stats) # statistical tests (attached by default in R; loading it explicitly is harmless)
library(pROC) # ROC analysis 

# Power 
library(pwr) # Power calculations  
library(WebPower) # basic and advanced statistical power analysis 
library(rpact) # adaptive trial design 


# Data sets 
library(medicaldata)
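Because library() stops with an error at the first package that is not installed, it can help to check the whole list up front. A minimal base R sketch, assuming the package list simply mirrors the chunk above; pkgs, is_installed, and missing_pkgs are hypothetical names.

```r
# Hypothetical check: which of the packages loaded above are not installed?
pkgs <- c("tidyverse", "tidyr", "janitor", "data.table",
          "psych", "crosstable", "summarytools", "epitools", "ggsci",
          "Hmisc", "pROC", "pwr", "WebPower", "rpact", "medicaldata")

# requireNamespace() returns TRUE/FALSE instead of throwing an error
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to try installing them
}
```

requireNamespace() loads a package's namespace without attaching it, so the check does not change the search path used by the library() calls.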

2 Summary

To summarize numeric data by group, we can use the describeBy() function from the psych package. To create a summary table of several variables by group, we can use the crosstable() function from the crosstable package.
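A sketch of both functions, assuming psych and crosstable are installed (both are loaded in the Library section). The built-in mtcars data set stands in for real study data, and num_by_cyl, mtcars2, and ct are hypothetical names.

```r
library(psych)
library(crosstable)

# Numeric summary of mpg and hp for each number of cylinders.
# mat = TRUE returns one data frame instead of a list of per-group tables.
num_by_cyl <- describeBy(mtcars[, c("mpg", "hp")], group = mtcars$cyl, mat = TRUE)
num_by_cyl[, c("group1", "n", "mean", "sd", "median")]

# Summary table of mpg and hp by transmission type.
# am is recoded as a labelled factor so crosstable treats it as a grouping variable.
mtcars2 <- mtcars
mtcars2$am <- factor(mtcars2$am, levels = c(0, 1), labels = c("automatic", "manual"))
ct <- crosstable(mtcars2, c(mpg, hp), by = am)
ct
```

In an R Markdown report, passing ct to as_flextable() renders it as a formatted table instead of a plain data frame.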

3 Plots