Part I. Overview

The data used in this project is provided by the Kaggle competition “RSNA-MICCAI Brain Tumor Radiogenomic Classification”.

The goal of this project is to improve the diagnosis of glioblastoma by identifying its genetic subtype from MRI scans. The data consists of a training set (N = 585) and a testing set (N = 87). The subject-level MGMT promoter methylation value for the training set is provided in the csv file train_labels.csv. Model performance will be evaluated by predicting the hidden labels of the testing set. The total data size is 136.85 GB.

The training and testing imaging data are stored in DICOM format. Each DICOM folder holds the MRI scans for one subject and has four sub-folders: FLAIR, T1W, T1GD and T2. There are 585 DICOM folders for the training data set and 87 DICOM folders for the testing data set. The task is to predict the MGMT value of the test set.
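To make the folder layout concrete, here is a minimal sketch that collects the files of one subject per modality. The function names, the `*.dcm` file extension, and the assumption that each file name ends in its slice number are illustrative and not taken from the data description; the sub-folder names are the ones written above and may be spelled differently on disk.

```python
import re
from pathlib import Path

# Sub-folder names as written in the text above; adjust to the actual download.
MODALITIES = ("FLAIR", "T1W", "T1GD", "T2")


def slice_number(path):
    """Trailing integer of the file stem (assumed naming), or -1 if there is none."""
    match = re.search(r"(\d+)$", path.stem)
    return int(match.group(1)) if match else -1


def list_modality_files(subject_dir, modalities=MODALITIES):
    """Map each modality name to its files, ordered by slice number."""
    subject_dir = Path(subject_dir)
    files = {}
    for name in modalities:
        folder = subject_dir / name
        # A missing sub-folder yields an empty list instead of an error.
        found = folder.glob("*.dcm") if folder.is_dir() else []
        files[name] = sorted(found, key=slice_number)
    return files
```

Sorting by the numeric suffix rather than by name keeps "the 5th image" and "the 10th image" in their natural order; a plain alphabetical sort would place a file numbered 10 before a file numbered 2.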

The purpose of this markdown file is to explore the data using the FLAIR images as an example. We want to visualize the difference between a methylated and an unmethylated subject. I use one methylated subject, 359, and one unmethylated subject, 308, as examples. What we learn from these visualizations could help us decide which features to derive for the classification analysis.

Part II. Overview of train_labels.csv

The csv file train_labels.csv contains two columns and 585 rows. The first column holds the subject ID of each of the 585 subjects in the training set. The second column holds the MGMT promoter methylation value for the subject in the same row. This value encodes the methylation status and is binary: 1 means that the subject in that row is methylated, 0 means unmethylated.
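The class counts shown in the figure below can be reproduced with a few lines of code. This is a sketch only: the column names `BraTS21ID` and `MGMT_value` are assumptions about the file header, the function name is made up for illustration, and the three sample rows are invented and are not competition data.

```python
import csv
import io
from collections import Counter


def count_labels(csv_file, label_column="MGMT_value"):
    """Count how many rows carry each label value in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(int(row[label_column]) for row in reader)


# Tiny made-up sample in the assumed layout; not real competition data.
sample = io.StringIO("BraTS21ID,MGMT_value\n00000,1\n00002,1\n00003,0\n")
counts = count_labels(sample)
print(counts[0], counts[1])  # 1 2
```

For the real file, the same function could be called as `count_labels(open("train_labels.csv", newline=""))`, ideally inside a `with` block.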

Count of subjects under the binary MGMT promoter methylation status in training set

Part III. Ad-hoc visualization of FLAIR for training subject 359 who is methylated

The FLAIR image folder of subject 359 has 60 images. Each image file is a data frame with 256 rows and 256 columns.

The figure below shows selected FLAIR images for subject 359: images 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55 and 60, arranged row by row. The 55th image is uniformly grey, which means that all values in the data frame used to plot it are 0. Images of this type carry no information for the analysis, so we need to exclude them.

Meanwhile, the images clearly show large white regions, which may indicate methylation.

Selected FLAIR images for subject 359

Images 5 through 53 are not entirely 0. Here is a quick look at the first and last 12 FLAIR images for subject 359 that are not all 0.
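The filtering step described here can be sketched as follows, assuming each image has already been read into a 2-D grid of numbers (a list of rows). The function names and the toy 2x2 images are made up for illustration.

```python
def is_blank(image):
    """True when every pixel in a 2-D grid of numbers is 0."""
    return all(value == 0 for row in image for value in row)


def non_blank_positions(images):
    """1-based positions of the images with at least one non-zero pixel."""
    return [pos for pos, img in enumerate(images, start=1) if not is_blank(img)]


def first_and_last(positions, n=12):
    """First n and last n entries of a list of positions."""
    return positions[:n], positions[-n:]


# Toy stack of 2x2 images: positions 1 and 4 are blank.
stack = [
    [[0, 0], [0, 0]],
    [[0, 3], [0, 0]],
    [[1, 1], [2, 0]],
    [[0, 0], [0, 0]],
]
kept = non_blank_positions(stack)
print(kept)  # [2, 3]
print(first_and_last(kept, n=1))  # ([2], [3])
```

When fewer than 2n non-blank images remain, the two returned lists overlap, which is harmless for a quick visual check.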

First 12 FLAIR images for subject 359 that are not all 0

Last 12 FLAIR images for subject 359 that are not all 0

Part IV. Ad-hoc visualization of FLAIR for training subject 308 who is unmethylated

The FLAIR image folder of subject 308 also has 60 images. Each image file is a data frame with 256 rows and 192 columns, so the image dimensions are NOT the same as those of subject 359.
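Because image dimensions can differ between subjects, it may be worth checking them before any pixel-wise comparison. The sketch below assumes each image is a rectangular 2-D grid of numbers (a list of rows). The function names are illustrative, and the zero-filled placeholders only mimic the dimensions reported in the text.

```python
def image_shape(image):
    """Return (rows, columns) of a rectangular 2-D pixel grid."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    return rows, cols


def shapes_by_subject(images_by_subject):
    """Map each subject ID to the set of image shapes found for it."""
    return {
        subject: {image_shape(img) for img in images}
        for subject, images in images_by_subject.items()
    }


# Zero-filled placeholders with the dimensions reported in the text.
demo = {
    "359": [[[0] * 256 for _ in range(256)]],
    "308": [[[0] * 192 for _ in range(256)]],
}
shapes = shapes_by_subject(demo)
print(shapes["359"] == shapes["308"])  # False
```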

The figure below shows selected FLAIR images for subject 308: images 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55 and 60, arranged row by row. The 55th and 60th images are uniformly grey, which means that all values in the data frames used to plot them are 0. Images of this type carry no information for the analysis, so we need to exclude them. Compared with subject 359, this subject without methylation seems to have fewer white regions. This helps us understand the data and plan an adequate analysis.

Selected FLAIR images for subject 308

Images 7 through 51 are not entirely 0. Here is a quick look at the first and last 12 FLAIR images for subject 308 that are not all 0.

Compared with the images of subject 359, these images do not contain the large white regions, which appears to be typical of an unmethylated subject.

First 12 FLAIR images for subject 308 that are not all 0

Last 12 FLAIR images for subject 308 that are not all 0

Part V. Compare the aggregated average heatmap of subject 359 and subject 308

With the non-informative images removed, we can quickly visualize the aggregated average heatmap of subject 359 and of subject 308 to explore whether there is any difference related to methylation status.

The heatmap of subject 308 is more evenly colored than that of subject 359, while the heatmap of subject 359 is lighter in color. This might suggest features that we can use in future analysis.
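The text does not spell out how the images are aggregated. The sketch below assumes the heatmap is the pixel-wise mean over a subject's non-blank images, with every image read into a 2-D grid of numbers of the same size. The function name and the toy values are made up for illustration.

```python
def mean_image(images):
    """Pixel-wise average of equally sized 2-D grids of numbers."""
    if not images:
        raise ValueError("need at least one image")
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [sum(img[r][c] for img in images) / len(images) for c in range(cols)]
        for r in range(rows)
    ]


# Two toy 2x2 images standing in for non-blank slices.
slices = [
    [[0, 2], [4, 6]],
    [[2, 2], [0, 2]],
]
print(mean_image(slices))  # [[1.0, 2.0], [2.0, 4.0]]
```

The resulting grid has the same dimensions as a single image and can be passed to any heatmap plotting routine.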

Heat map of the aggregated average of the images for subject 308 (images with 0 value only were excluded)

Heat map of the aggregated average of the images for subject 308 (images with 0 value only were excluded)
