Polycystic Ovarian Syndrome (PCOS)

Background:

Polycystic ovarian syndrome (PCOS) is a reproductive hormonal disorder affecting as many as 5 million women in the US alone. It’s believed that genetic and environmental factors can cause PCOS, which affects women physically and emotionally, along with their metabolism, overall health, and appearance. PCOS disrupts the ovaries, making it hard for women to have a healthy menstrual cycle and leading to the development of cysts and infertility. PCOS is very common in women of reproductive age; it may begin shortly after puberty but can also develop during the later teenage years and early adulthood.

Data Source:

The data sets I will be using are from Kaggle and were collected from 10 different hospitals across Kerala, India:

The first data set, pcos, includes 541 entries and a total of 6 columns. The second data set, pcos_infertility, includes 541 entries and a total of 45 columns.

Things to consider in order to understand the data sets:

  • Height was converted from feet to cm
  • For Yes | No questions
    • Yes = 1
    • No = 0
  • Blood Groups:
    • A+ = 11
    • A- = 12
    • B+ = 13
    • B- = 14
    • O+ = 15
    • O- = 16
    • AB+ = 17
    • AB- = 18
  • RBS means random blood sugar (glucose) test
  • Beta-HCG cases are mentioned as Case I and II.
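
The codes listed above can be turned into readable labels before plotting. The following is a minimal sketch in base R, assuming the column names shown by glimpse(); the toy values and the new columns Blood.Group.Label and PCOS.Label are hypothetical and only for illustration.

```r
# Toy rows using column names from the data set; values are illustrative only.
toy <- data.frame(
  PCOS..Y.N.  = c(0, 1, 0),
  Blood.Group = c(15, 11, 18)
)

# Code-to-label lookup taken from the blood group list above.
blood_levels <- 11:18
blood_labels <- c("A+", "A-", "B+", "B-", "O+", "O-", "AB+", "AB-")

# Hypothetical helper columns holding the decoded labels.
toy$Blood.Group.Label <- factor(toy$Blood.Group,
                                levels = blood_levels,
                                labels = blood_labels)
toy$PCOS.Label <- factor(toy$PCOS..Y.N.,
                         levels = c(0, 1),
                         labels = c("No", "Yes"))

print(toy)
```

Keeping the original numeric columns alongside the labels means the 0/1 coding is still available for correlations, while the labeled factors can be used for plot legends.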

Libraries I intend to use:

library(tidyverse)
library(corrplot)
library(dplyr)
library(ggplot2)
library(plotly)
library(shiny)

Load data:

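# Note: despite its name, pcos holds the 6-column PCOS_infertility.csv file
pcos <- read.csv("https://raw.githubusercontent.com/letisalba/DATA_608/master/Final%20Project/csv/PCOS_infertility.csv")
# pcos_infertility holds the 45-column PCOSData_without_infertility.csv file
pcos_infertility <- read.csv("https://raw.githubusercontent.com/letisalba/DATA_608/master/Final%20Project/csv/PCOSData_without_infertility.csv")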

Data Exploration:

Overview of the datasets I am working with

glimpse(pcos)
## Rows: 541
## Columns: 6
## $ Sl..No                 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
## $ Patient.File.No.       <int> 10001, 10002, 10003, 10004, 10005, 10006, 10007…
## $ PCOS..Y.N.             <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,…
## $ I...beta.HCG.mIU.mL.   <dbl> 1.99, 60.80, 494.08, 1.99, 801.45, 237.97, 1.99…
## $ II....beta.HCG.mIU.mL. <dbl> 1.99, 1.99, 494.08, 1.99, 801.45, 1.99, 1.99, 1…
## $ AMH.ng.mL.             <chr> "2.07", "1.53", "6.63", "1.22", "2.26", "6.74",…
glimpse(pcos_infertility)
## Rows: 541
## Columns: 45
## $ Sl..No                 <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
## $ Patient.File.No.       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
## $ PCOS..Y.N.             <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,…
## $ Age..yrs.              <int> 28, 36, 33, 37, 25, 36, 34, 33, 32, 36, 20, 26,…
## $ Weight..Kg.            <dbl> 44.6, 65.0, 68.8, 65.0, 52.0, 74.1, 64.0, 58.5,…
## $ Height.Cm.             <dbl> 152.0, 161.5, 165.0, 148.0, 161.0, 165.0, 156.0…
## $ BMI                    <chr> "19.3", "#NAME?", "#NAME?", "#NAME?", "#NAME?",…
## $ Blood.Group            <int> 15, 15, 11, 13, 11, 15, 11, 13, 11, 15, 15, 13,…
## $ Pulse.rate.bpm.        <int> 78, 74, 72, 72, 72, 78, 72, 72, 72, 80, 80, 72,…
## $ RR..breaths.min.       <int> 22, 20, 18, 20, 18, 28, 18, 20, 18, 20, 20, 20,…
## $ Hb.g.dl.               <dbl> 10.48, 11.70, 11.80, 12.00, 10.00, 11.20, 10.90…
## $ Cycle.R.I.             <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 2, 2, 4, 2, 2, 2,…
## $ Cycle.length.days.     <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 2, 5, 5, 2, 5, 5, 5,…
## $ Marraige.Status..Yrs.  <dbl> 7, 11, 10, 4, 1, 8, 2, 13, 8, 4, 4, 3, 7, 15, 9…
## $ Pregnant.Y.N.          <int> 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0,…
## $ No..of.aborptions      <int> 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 2, 1, 0, 0, 0, 0,…
## $ I...beta.HCG.mIU.mL.   <dbl> 1.99, 60.80, 494.08, 1.99, 801.45, 237.97, 1.99…
## $ II....beta.HCG.mIU.mL. <chr> "1.99", "1.99", "494.08", "1.99", "801.45", "1.…
## $ FSH.mIU.mL.            <dbl> 7.95, 6.73, 5.54, 8.06, 3.98, 3.24, 2.85, 4.86,…
## $ LH.mIU.mL.             <dbl> 3.68, 1.09, 0.88, 2.36, 0.90, 1.07, 0.31, 3.07,…
## $ FSH.LH                 <chr> "#NAME?", "#NAME?", "#NAME?", "#NAME?", "#NAME?…
## $ Hip.inch.              <int> 36, 38, 40, 42, 37, 44, 39, 44, 39, 40, 39, 39,…
## $ Waist.inch.            <int> 30, 32, 36, 36, 30, 38, 33, 38, 35, 38, 35, 33,…
## $ Waist.Hip.Ratio        <chr> "#NAME?", "#NAME?", "#NAME?", "#NAME?", "#NAME?…
## $ TSH..mIU.L.            <dbl> 0.68, 3.16, 2.54, 16.41, 3.57, 1.60, 1.51, 12.1…
## $ AMH.ng.mL.             <chr> "2.07", "1.53", "6.63", "1.22", "2.26", "6.74",…
## $ PRL.ng.mL.             <dbl> 45.16, 20.09, 10.52, 36.90, 30.09, 16.18, 26.41…
## $ Vit.D3..ng.mL.         <dbl> 17.10, 61.30, 49.70, 33.40, 43.80, 52.40, 42.70…
## $ PRG.ng.mL.             <dbl> 0.57, 0.97, 0.36, 0.36, 0.38, 0.30, 0.46, 0.26,…
## $ RBS.mg.dl.             <dbl> 92, 92, 84, 76, 84, 76, 93, 91, 116, 125, 108, …
## $ Weight.gain.Y.N.       <int> 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1,…
## $ hair.growth.Y.N.       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,…
## $ Skin.darkening..Y.N.   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,…
## $ Hair.loss.Y.N.         <int> 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,…
## $ Pimples.Y.N.           <int> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,…
## $ Fast.food..Y.N.        <int> 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0,…
## $ Reg.Exercise.Y.N.      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,…
## $ BP._Systolic..mmHg.    <int> 110, 120, 120, 120, 120, 110, 120, 120, 120, 11…
## $ BP._Diastolic..mmHg.   <int> 80, 70, 80, 70, 80, 70, 80, 80, 80, 80, 80, 80,…
## $ Follicle.No...L.       <int> 3, 3, 13, 2, 3, 9, 6, 7, 5, 1, 7, 4, 15, 3, 4, …
## $ Follicle.No...R.       <int> 3, 5, 15, 2, 4, 6, 6, 6, 7, 1, 15, 2, 8, 3, 1, …
## $ Avg..F.size..L...mm.   <dbl> 18, 15, 18, 15, 16, 16, 15, 15, 17, 14, 17, 18,…
## $ Avg..F.size..R...mm.   <dbl> 18, 14, 20, 14, 14, 20, 16, 18, 17, 17, 20, 19,…
## $ Endometrium..mm.       <dbl> 8.5, 3.7, 10.0, 7.5, 7.0, 8.0, 6.8, 7.1, 4.2, 2…
## $ X                      <chr> "", "", "", "", "", "", "", "", "", "", "", "",…
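
The glimpse output shows several character columns containing "#NAME?" where a number is expected. One way to see how widespread this is would be to count those placeholders per column. The sketch below uses a small toy data frame as a stand-in for pcos_infertility (the values are illustrative, not real counts); the name name_errors is hypothetical.

```r
# Toy stand-in for pcos_infertility; values are illustrative only.
toy <- data.frame(
  BMI        = c("19.3", "#NAME?", "#NAME?"),
  FSH.LH     = c("#NAME?", "#NAME?", "#NAME?"),
  AMH.ng.mL. = c("2.07", "1.53", "6.63"),
  stringsAsFactors = FALSE
)

# Count "#NAME?" placeholders in each column.
name_errors <- vapply(
  toy,
  function(col) sum(col == "#NAME?", na.rm = TRUE),
  integer(1)
)

print(name_errors)
```

Running the same vapply() call on the real data frame would show which columns need to be repaired before any numeric analysis.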

Relevancy:

Having been diagnosed with PCOS almost 2 years ago, I was intrigued to learn more about it and explore the data that was available. Although not much data was easily accessible, and this was not the data set I had in mind, it’s a start for exploring the differences and similarities women share in their physical characteristics and blood work. It’s also important to be aware of the symptoms of PCOS, because doctors can too easily dismiss them as “too much stress” or “just lose weight”. After advocating for myself with multiple doctors over the span of 6 years, I saw the true value in listening to my body and sharing my experience with others.

Project Plan and Visualizations:

Data preparation will consist of merging the two csv files into one, but before attempting this I will have to address the missing values (shown as "#NAME?") in the pcos_infertility data set, in columns such as BMI, FSH.LH and Waist.Hip.Ratio. I will most likely build a Shiny app with interactive Plotly visualizations to go through each category and identify any patterns that women from India with PCOS may share. Ultimately, these commonalities could help women everywhere learn to recognize symptoms not only through the physical changes PCOS brings but also through routine lab work.
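
As a rough sketch of this preparation step, the three affected columns could be recomputed from their component columns rather than parsed, and the two data sets could then be joined. This assumes that BMI is weight in kg divided by height in metres squared, that FSH.LH is the ratio of FSH to LH, that Waist.Hip.Ratio is waist divided by hip, and that Sl..No identifies the same patient in both files. The toy data frames pcos_toy and infertility_toy are hypothetical stand-ins with illustrative values.

```r
# Toy stand-ins for the two data sets; values are illustrative only.
pcos_toy <- data.frame(
  Sl..No           = 1:2,
  Patient.File.No. = c(10001, 10002),
  AMH.ng.mL.       = c("2.07", "1.53"),
  stringsAsFactors = FALSE
)
infertility_toy <- data.frame(
  Sl..No          = 1:2,
  Weight..Kg.     = c(44.6, 65.0),
  Height.Cm.      = c(152.0, 161.5),
  BMI             = c("19.3", "#NAME?"),
  FSH.mIU.mL.     = c(7.95, 6.73),
  LH.mIU.mL.      = c(3.68, 1.09),
  FSH.LH          = c("#NAME?", "#NAME?"),
  Hip.inch.       = c(36, 38),
  Waist.inch.     = c(30, 32),
  Waist.Hip.Ratio = c("#NAME?", "#NAME?"),
  stringsAsFactors = FALSE
)

# Recompute derived columns from their components (assumed formulas).
infertility_toy$BMI <-
  infertility_toy$Weight..Kg. / (infertility_toy$Height.Cm. / 100)^2
infertility_toy$FSH.LH <-
  infertility_toy$FSH.mIU.mL. / infertility_toy$LH.mIU.mL.
infertility_toy$Waist.Hip.Ratio <-
  infertility_toy$Waist.inch. / infertility_toy$Hip.inch.

# Convert a character lab value to numeric; unparseable entries become NA.
pcos_toy$AMH.ng.mL. <- suppressWarnings(as.numeric(pcos_toy$AMH.ng.mL.))

# Join on Sl..No (assumed to be the shared row identifier).
merged <- merge(infertility_toy, pcos_toy, by = "Sl..No")

print(round(merged$BMI, 1))
```

On the real data, columns that appear in both files (such as PCOS..Y.N. and the beta-HCG columns) would need to be dropped from one side before merging, otherwise merge() adds .x and .y suffixes to the duplicates.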

References:

  • Kottarathil, P. (2020, October 11). Polycystic ovary syndrome (PCOS). Kaggle. Retrieved October 9, 2022, from https://www.kaggle.com/datasets/prasoonkottarathil/polycystic-ovary-syndrome-pcos

  • Stewart, M. M., & Foster, S. (2012). PCOS awareness association. PCOS Awareness Association. Retrieved October 9, 2022, from https://www.pcosaa.org/

  • Bartlett, E., & Erlich, L. (2015). Feed your fertility: Your guide to cultivating a healthy pregnancy with traditional Chinese medicine, real food, and holistic living. Fair Winds Press.