library(knitr)
library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(broom)
library(ggplot2)
library(DataExplorer)


merged.DTI.MD <- read.csv("/projects/neda/FINAL.ANALYSIS.DTI/QCd.ALL/All.MD.csv")
merged.DTI.MD$EduCateg <- as.factor(merged.DTI.MD$EduCateg)
merged.DTI.MD$Sex <- as.factor(merged.DTI.MD$Sex)

Missing values

##   rows columns discrete_columns continuous_columns all_missing_columns
## 1  310      70                6                 64                   0
##   total_missing_values complete_rows total_observations memory_usage
## 1                   91           219              21700       206504
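The table above has the shape of the output of `DataExplorer::introduce()`. The chunk that produced it is not shown, so the following is only a sketch of how the same counts can be derived, using a made-up stand-in for `merged.DTI.MD` (the `toy` data frame and its values are hypothetical).

```r
# Hypothetical stand-in for merged.DTI.MD; the values are made up.
toy <- data.frame(
  ID       = c("s1", "s2", "s3", "s4"),
  Sex      = factor(c("F", "M", NA, "F")),
  UNC.L_MD = c(0.00078, NA, 0.00081, 0.00080),
  UNC.R_MD = c(0.00079, 0.00082, 0.00080, NA)
)

# Base R equivalents of the counts reported in the summary table.
summary_counts <- list(
  rows                 = nrow(toy),
  columns              = ncol(toy),
  total_missing_values = sum(is.na(toy)),
  complete_rows        = sum(complete.cases(toy)),
  total_observations   = nrow(toy) * ncol(toy)
)
str(summary_counts)

# The same summary through DataExplorer, if the package is installed.
if (requireNamespace("DataExplorer", quietly = TRUE)) {
  print(DataExplorer::introduce(toy))
  DataExplorer::plot_missing(toy)  # share of missing values per column
}
```

In the reported table, 310 rows by 70 columns gives the 21700 total observations. Since 310 - 219 = 91 rows are incomplete and there are 91 missing values in total, each incomplete row is missing exactly one value.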

To visualize frequency distributions for all discrete features

## 1 columns ignored with more than 50 categories.
## ID: 310 categories
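The message above is what `DataExplorer::plot_bar()` prints when a discrete column exceeds its `maxcat` limit. The original chunk is hidden, so this is a sketch on hypothetical data showing why an identifier column such as `ID` is skipped.

```r
# Hypothetical data: one identifier column plus two low-cardinality factors.
toy <- data.frame(
  ID       = sprintf("sub%03d", 1:60),
  Sex      = factor(rep(c("F", "M"), 30)),
  EduCateg = factor(rep(1:6, 10)),
  stringsAsFactors = FALSE
)

# Count distinct values per column; columns above the limit are not plotted.
max_categories <- 50
n_categories <- sapply(toy, function(x) length(unique(x)))
skipped <- names(n_categories)[n_categories > max_categories]
print(skipped)

# Bar charts for the remaining discrete columns, if DataExplorer is installed.
if (requireNamespace("DataExplorer", quietly = TRUE)) {
  DataExplorer::plot_bar(toy, maxcat = max_categories)
}
```

An identifier with one category per subject carries no distributional information, so skipping it is the desired behaviour rather than a problem to fix.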

To visualize distributions for all continuous features
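The plotting chunk for this step is not echoed. A minimal sketch, assuming the histograms came from `DataExplorer::plot_histogram()`, with a base R fallback; the `toy` data frame and its `Age` column are hypothetical.

```r
set.seed(1)

# Hypothetical data with two continuous columns and one discrete column.
toy <- data.frame(
  Age      = rnorm(100, mean = 40, sd = 10),
  UNC.L_MD = rnorm(100, mean = 8e-4, sd = 3e-5),
  Sex      = factor(sample(c("F", "M"), 100, replace = TRUE))
)

# Only numeric columns are treated as continuous.
continuous_cols <- names(toy)[sapply(toy, is.numeric)]
print(continuous_cols)

if (requireNamespace("DataExplorer", quietly = TRUE)) {
  DataExplorer::plot_histogram(toy)  # one panel per continuous column
} else {
  # Base R fallback: one histogram per continuous column.
  for (col in continuous_cols) {
    hist(toy[[col]], main = col, xlab = col)
  }
}
```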

QQ plot

# From the chart, "UNC.L_MD" and "UNC.R_MD" seem skewed on both tails, so apply a simple log transformation and plot them again. The transformation did not change the result.

log_qq_data <- update_columns(qq_data, 5:6, function(x) log(x + 1))
plot_qq(log_qq_data[, 5:6], sampled_rows = 1000L)

plot_qq(qq_data[, 5:6], sampled_rows = 1000L)
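One plausible reason the `log(x + 1)` transformation left the QQ plots unchanged: if the MD values are very small (on the order of 1e-3, which is an assumption, since the source does not state their scale), then `log(x + 1)` is almost exactly `x`, and a near-linear transformation cannot change the shape of a QQ plot. The sketch below checks this on simulated values; `md` is hypothetical data, not the study data.

```r
set.seed(42)

# Hypothetical MD-like values: small, positive, right-skewed.
md <- 8e-4 + rexp(500, rate = 1 / 5e-5)
log_md <- log(md + 1)

# For small x, log(1 + x) is approximately x, so the transform is nearly linear.
max_rel_diff <- max(abs(log_md - md) / md)
linear_cor <- cor(md, log_md)
print(c(max_rel_diff = max_rel_diff, linear_cor = linear_cor))

# Sample skewness before and after each transformation.
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3
skew_values <- c(
  raw   = skew(md),
  log1p = skew(log_md),    # log(x + 1): skewness essentially unchanged
  log   = skew(log(md))    # log(x): skewness actually changes
)
print(round(skew_values, 3))
```

If the real columns are on this scale, a transformation that acts on the values themselves, such as plain `log(x)` for strictly positive data, would be needed before the QQ plot could look any different.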

QQ plot by Dx group
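The chunk for the grouped QQ plot is not shown. A sketch, assuming it used the `by` argument of `DataExplorer::plot_qq()` with a diagnosis column named `Dx`; the group labels and values below are hypothetical.

```r
set.seed(7)

# Hypothetical data: one MD column and a two-level diagnosis factor.
toy <- data.frame(
  Dx       = factor(rep(c("CTRL", "SSD"), each = 50)),
  UNC.L_MD = c(rnorm(50, 8.0e-4, 3e-5), rnorm(50, 8.3e-4, 4e-5))
)

# Split the values by group; each group gets its own QQ comparison.
by_group <- split(toy$UNC.L_MD, toy$Dx)

if (requireNamespace("DataExplorer", quietly = TRUE)) {
  DataExplorer::plot_qq(toy, by = "Dx")
} else {
  # Base R fallback: one normal QQ plot per group.
  old_par <- par(mfrow = c(1, length(by_group)))
  for (g in names(by_group)) {
    qqnorm(by_group[[g]], main = paste("UNC.L_MD,", g))
    qqline(by_group[[g]])
  }
  par(old_par)
}
```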

To visualize correlation heatmap for all non-missing features

## 2 features with more than 5 categories ignored!
## ID: 219 categories
## EduCateg: 7 categories
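The message above (219 `ID` categories, matching the 219 complete rows, and a 5-category limit) is consistent with a call like `plot_correlation(na.omit(data), maxcat = 5L)`, but the actual chunk is hidden, so treat the following as a sketch on hypothetical data.

```r
# Hypothetical data with a few missing values.
toy <- data.frame(
  Sex      = factor(c("F", "M", "F", "M", "F", "M")),
  Age      = c(25, 31, NA, 44, 52, 38),
  UNC.L_MD = c(7.9e-4, 8.1e-4, 8.0e-4, NA, 8.4e-4, 8.2e-4)
)

# na.omit() keeps only rows with no missing value in any column.
complete <- na.omit(toy)
print(nrow(complete))

# Discrete columns with at most `maxcat` categories are dummy-coded and
# included in the heatmap; columns above the limit are dropped with a message.
if (requireNamespace("DataExplorer", quietly = TRUE)) {
  DataExplorer::plot_correlation(complete, maxcat = 5L)
}
```

With `maxcat = 5L`, a 7-level factor such as `EduCateg` is excluded along with `ID`, which is what the message reports.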

To visualize correlation heatmap for only continuous features
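The chunk for the continuous-only heatmap is not echoed. A sketch, assuming `DataExplorer::plot_correlation()` with `type = "continuous"`; the data and the `Age` column are hypothetical. The base R matrix is computed alongside it to show what the heatmap displays.

```r
set.seed(3)
n <- 50
age <- rnorm(n, mean = 40, sd = 10)

# Hypothetical data: one factor and three continuous columns.
toy <- data.frame(
  Sex      = factor(sample(c("F", "M"), n, replace = TRUE)),
  Age      = age,
  UNC.L_MD = 8e-4 + 1e-6 * age + rnorm(n, 0, 2e-5),
  UNC.R_MD = 8e-4 + 1e-6 * age + rnorm(n, 0, 2e-5)
)
toy$UNC.R_MD[c(5, 17)] <- NA  # a couple of missing values

# Pearson correlations among numeric columns, using all available pairs.
num <- toy[sapply(toy, is.numeric)]
cor_mat <- cor(num, use = "pairwise.complete.obs")
print(round(cor_mat, 2))

if (requireNamespace("DataExplorer", quietly = TRUE)) {
  DataExplorer::plot_correlation(
    toy,
    type = "continuous",
    cor_args = list(use = "pairwise.complete.obs")
  )
}
```

Passing `use = "pairwise.complete.obs"` is one way to handle missing values; dropping incomplete rows with `na.omit()` first is the other common choice and gives slightly different estimates.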

To perform and visualize PCA on some selected features - variance explained by PC
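The PCA chunk and the selected features are not shown. The sketch below illustrates how the variance explained by each principal component is computed, using `prcomp()` on hypothetical columns (`CST.L_MD` and `CST.R_MD` are invented names), and then the corresponding `DataExplorer::plot_prcomp()` call.

```r
set.seed(11)
n <- 100
shared <- rnorm(n)  # common factor so the columns are correlated

# Hypothetical tract-wise MD columns; names and values are made up.
toy <- data.frame(
  UNC.L_MD = 8.0e-4 + 2e-5 * shared + rnorm(n, 0, 1e-5),
  UNC.R_MD = 8.1e-4 + 2e-5 * shared + rnorm(n, 0, 1e-5),
  CST.L_MD = 7.5e-4 + 1e-5 * shared + rnorm(n, 0, 2e-5),
  CST.R_MD = 7.6e-4 + 1e-5 * shared + rnorm(n, 0, 2e-5)
)

# PCA on centred, scaled columns.
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component, and its running total.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))
print(round(cumsum(var_explained), 3))

# plot_prcomp() needs data without missing values.
if (requireNamespace("DataExplorer", quietly = TRUE)) {
  DataExplorer::plot_prcomp(toy, variance_cap = 0.9)
}
```

On the real data, incomplete rows would have to be removed (for example with `na.omit()`) before either call.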

Boxplots data visualization
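The boxplot chunk is hidden. A sketch, assuming the plots were grouped by the `Dx` column through `DataExplorer::plot_boxplot()`; the group labels and values are hypothetical.

```r
set.seed(5)

# Hypothetical data: two MD columns and a two-level diagnosis factor.
toy <- data.frame(
  Dx       = factor(rep(c("CTRL", "SSD"), each = 40)),
  UNC.L_MD = c(rnorm(40, 8.0e-4, 3e-5), rnorm(40, 8.3e-4, 3e-5)),
  UNC.R_MD = c(rnorm(40, 8.1e-4, 3e-5), rnorm(40, 8.2e-4, 3e-5))
)

# Group medians: the centre line of each box.
group_medians <- aggregate(cbind(UNC.L_MD, UNC.R_MD) ~ Dx, data = toy, FUN = median)
print(group_medians)

if (requireNamespace("DataExplorer", quietly = TRUE)) {
  DataExplorer::plot_boxplot(toy, by = "Dx")  # one panel per continuous column
} else {
  boxplot(UNC.L_MD ~ Dx, data = toy, main = "UNC.L_MD by Dx")
}
```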

Scatterplots data visualization
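The scatterplot chunk is also hidden, and the variable on the shared axis is not stated. The sketch below assumes `DataExplorer::plot_scatterplot()` with a hypothetical `Age` column on that axis.

```r
set.seed(9)
n <- 80
age <- runif(n, min = 18, max = 80)

# Hypothetical data: MD values that drift upward with age.
toy <- data.frame(
  Age      = age,
  UNC.L_MD = 7.5e-4 + 1e-6 * age + rnorm(n, 0, 2e-5),
  UNC.R_MD = 7.6e-4 + 1e-6 * age + rnorm(n, 0, 2e-5)
)

if (requireNamespace("DataExplorer", quietly = TRUE)) {
  # Every other column is plotted against the `by` column.
  DataExplorer::plot_scatterplot(toy, by = "Age")
} else {
  plot(toy$Age, toy$UNC.L_MD, xlab = "Age", ylab = "UNC.L_MD")
}

# A simple linear fit quantifies the trend visible in the scatterplot.
slope <- coef(lm(UNC.L_MD ~ Age, data = toy))[["Age"]]
print(slope)
```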

#create_report(All.subcort.Vol)
