WiDS-Data

Chenxi Zhao

WiDS Dataset Description

Overview

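We will analyze diagnostic, sociodemographic, emotional, and parenting data, together with functional magnetic resonance imaging (fMRI) data, from the Healthy Brain Network (HBN). HBN recruits participants through a community-referral model.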

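The fMRI data are used to extract an activity time series for each brain region. These regional time series are then correlated with one another to obtain the fMRI connectome matrix.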
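As a rough illustration of that correlation step (not the HBN preprocessing pipeline itself), the sketch below builds a connectome from synthetic time series with NumPy. The function name `connectome_from_timeseries`, the array shape, and the random data are assumptions made for this example.

```python
import numpy as np

def connectome_from_timeseries(ts: np.ndarray) -> np.ndarray:
    """Return the region-by-region Pearson correlation matrix.

    ts has shape (n_timepoints, n_regions): one column per brain region.
    """
    # np.corrcoef treats rows as variables by default, so use rowvar=False
    return np.corrcoef(ts, rowvar=False)

# Synthetic stand-in for real fMRI signals: 200 time points, 5 regions
rng = np.random.default_rng(0)
ts = rng.standard_normal((200, 5))
conn = connectome_from_timeseries(ts)

# The matrix is symmetric, so the upper triangle (diagonal excluded) holds
# every unique region pair and can be flattened into a feature vector.
iu = np.triu_indices_from(conn, k=1)
features = conn[iu]
print(conn.shape, features.shape)  # (5, 5) (10,)
```

Flattening only the upper triangle avoids feeding each region pair to a model twice.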

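Goal:

Build a model that predicts an individual's sex and ADHD diagnosis from the functional brain imaging data of children and adolescents, together with their sociodemographic, emotional, and parenting information.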

Dataset Folders

Training Folder

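The training folder train_tsv contains three types of information on more than 1,200 subjects:

  1. Targets (ADHD diagnosis and sex)
  2. fMRI connectome matrices
  3. Sociodemographic information, such as the subject's "handedness" or "parent's education level", emotions ("Strengths and Difficulties Questionnaire, SDQ"), and parenting information ("Alabama Parenting Questionnaire, APQ"). This information includes both quantitative and categorical metadata.

Tip: participants need to process the categorical data (possibly by creating dummy variables) and then combine the processed dataset with the functional connectome dataset to create the final training dataset for their models.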
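The tip above can be sketched with pandas as follows. The toy rows and the connectome feature names `conn_0` and `conn_1` are hypothetical; only `participant_id` and the two categorical column names come from the dataset.

```python
import pandas as pd

# Toy stand-ins for the real tables (values are made up for illustration)
categorical = pd.DataFrame({
    "participant_id": ["a", "b", "c"],
    "Basic_Demos_Study_Site": [1, 3, 1],
    "MRI_Track_Scan_Location": [1, 3, 3],
})
connectome = pd.DataFrame({
    "participant_id": ["a", "b", "c"],
    "conn_0": [0.12, -0.05, 0.30],   # hypothetical connectome feature names
    "conn_1": [0.44, 0.10, -0.21],
})

# One 0/1 indicator column per observed category code
cat_cols = ["Basic_Demos_Study_Site", "MRI_Track_Scan_Location"]
dummies = pd.get_dummies(categorical, columns=cat_cols, dtype=int)

# Join on the subject key; validate guards against duplicated IDs
train = dummies.merge(
    connectome, on="participant_id", how="inner", validate="one_to_one"
)
print(train.columns.tolist())
```

If the test set contains category codes that never occur in training (or the reverse), the two dummy-encoded frames will need their columns aligned before modeling.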

Testing Folder

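The testing folder test_tsv contains unseen dataframes for more than 300 subjects. These dataframes include:
a) fMRI connectome matrices
b) Sociodemographic, emotional, and parenting information

You need to submit a solution file containing the ADHD diagnosis and sex for each row of the test dataset.
You will also be given a sample solution file prepared for submission.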
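A minimal sketch of assembling such a solution file is shown below. It assumes the submission uses the same three columns as the training solutions (`participant_id`, `ADHD_Outcome`, `Sex_F`); the sample solution file is the authority on the exact layout. The IDs, probabilities, and the 0.5 threshold are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical test IDs and model outputs; replace with real predictions
test_ids = ["id_001", "id_002", "id_003"]
p_adhd = np.array([0.81, 0.35, 0.50])    # predicted P(ADHD_Outcome = 1)
p_female = np.array([0.20, 0.64, 0.49])  # predicted P(Sex_F = 1)

# Threshold the probabilities at 0.5 to get 0/1 labels (an assumed cutoff)
submission = pd.DataFrame({
    "participant_id": test_ids,
    "ADHD_Outcome": (p_adhd >= 0.5).astype(int),
    "Sex_F": (p_female >= 0.5).astype(int),
})

# to_csv() without a path returns the CSV text; pass a filename to write it
csv_text = submission.to_csv(index=False)
print(csv_text)
```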

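Use of External Data

The task in this data competition can be completed successfully without external data.

In fact, the data have been anonymized to such a degree that combining additional data with the competition data would be difficult.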

Submit

Target Variables

ADHD_Outcome: Type of Diagnosis (0=Other/None, 1=ADHD)
Sex_F: Sex of participant (0=Male, 1=Female)

You will also be provided with a full data dictionary.

Data Screen

online

https://www.kaggle.com/competitions/widsdatathon2025/data

EDA

Data Dictionary

The Data Dictionary.xlsx file contains five worksheets:

EDA2

Sheet Dictionary has 37 rows and 6 columns
-------------------------------------------------
Sheet Instrument_Description has 8 rows and 2 columns
-------------------------------------------------
Sheet EHQ has 16 rows and 2 columns
-------------------------------------------------
Sheet APQ has 43 rows and 2 columns
-------------------------------------------------
Sheet SDQ has 34 rows and 2 columns
-------------------------------------------------
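Shape summaries like the ones above can be produced along the following lines. This is a sketch, not the code used for the slides: the helper `sheet_shapes` is hypothetical, and the toy sheets stand in for the real workbook so the example runs without the file.

```python
import pandas as pd

def sheet_shapes(sheets: dict) -> dict:
    """Map each sheet name to its (rows, columns) shape."""
    return {name: df.shape for name, df in sheets.items()}

# With the real workbook, sheet_name=None returns {sheet name: DataFrame}:
# sheets = pd.read_excel("Data Dictionary.xlsx", sheet_name=None)

# Toy stand-in so the sketch runs without the file
sheets = {
    "EHQ": pd.DataFrame({"Field": ["a", "b"], "Label": ["x", "y"]}),
    "APQ": pd.DataFrame({"Field": ["c"], "Label": ["z"]}),
}
for name, (n_rows, n_cols) in sheet_shapes(sheets).items():
    print(f"Sheet {name} has {n_rows} rows and {n_cols} columns")
```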

Categorical Data

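The TRAIN_CATEGORICAL_METADATA.xlsx file contains the categorical metadata. Note that the preview below shows the quantitative columns (EHQ, ColorVision, APQ, SDQ, and MRI_Track_Age_at_Scan); the categorical columns themselves appear in the column index that follows it.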

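+----------------+---------------+----------------------+----------------+----------------+-----------------+-----------------+----------------+----------------+
| participant_id | EHQ_EHQ_Total | ColorVision_CV_Score | APQ_P_APQ_P_CP | APQ_P_APQ_P_ID | APQ_P_APQ_P_INV | APQ_P_APQ_P_OPD | APQ_P_APQ_P_PM | APQ_P_APQ_P_PP |
+----------------+---------------+----------------------+----------------+----------------+-----------------+-----------------+----------------+----------------+
| UmrK0vMLopoR   | 40.00         | 13                   | 3              | 10             | 47              | 13              | 11             | 28             |
| CPaeQkhcjg7d   | -94.47        | 14                   | 3              | 13             | 34              | 18              | 23             | 30             |
| Nb4EetVPm3gs   | -46.67        | 14                   | 4              | 10             | 35              | 16              | 10             | 29             |
| p4vPhVu91o4b   | -26.68        | 10                   | 5              | 12             | 39              | 19              | 16             | 28             |
| M09PXs7arQ5E   | 0.00          | 14                   | 5              | 15             | 40              | 20              | 24             | 28             |
| ...            | ...           | ...                  | ...            | ...            | ...             | ...             | ...            | ...            |
| Atx7oub96GXS   | 87.80         | 14                   | 5              | 14             | 39              | 20              | 15             | 21             |
| groSbUfkQngM   | 77.80         | 14                   | 3              | 10             | 32              | 20              | 11             | 23             |
| zmxGvIrOD0bt   | 16.68         | 14                   | 3              | 16             | 28              | 15              | 19             | 27             |
| rOmWFuJCud5G   | 53.40         | 14                   | 3              | 14             | 34              | 18              | 23             | 23             |
| k8HhHnnu2wmt   | -57.80        | 14                   | 5              | 11             | 42              | 18              | 16             | 24             |
+----------------+---------------+----------------------+----------------+----------------+-----------------+-----------------+----------------+----------------+
+----------------+--------------------------+----------------------------+----------------------------+-----------------------+---------------------------+
| participant_id | SDQ_SDQ_Conduct_Problems | SDQ_SDQ_Difficulties_Total | SDQ_SDQ_Emotional_Problems | SDQ_SDQ_Externalizing | SDQ_SDQ_Generating_Impact |
+----------------+--------------------------+----------------------------+----------------------------+-----------------------+---------------------------+
| UmrK0vMLopoR   | 0                        | 6                          | 1                          | 5                     | 0                         |
| CPaeQkhcjg7d   | 0                        | 18                         | 6                          | 8                     | 7                         |
| Nb4EetVPm3gs   | 1                        | 14                         | 2                          | 8                     | 5                         |
| p4vPhVu91o4b   | 6                        | 24                         | 4                          | 16                    | 9                         |
| M09PXs7arQ5E   | 1                        | 18                         | 4                          | 11                    | 4                         |
| ...            | ...                      | ...                        | ...                        | ...                   | ...                       |
| Atx7oub96GXS   | 1                        | 9                          | 2                          | 7                     | 3                         |
| groSbUfkQngM   | 6                        | 18                         | 3                          | 11                    | 9                         |
| zmxGvIrOD0bt   | 3                        | 4                          | 1                          | 3                     | 0                         |
| rOmWFuJCud5G   | 4                        | 9                          | 0                          | 9                     | 3                         |
| k8HhHnnu2wmt   | 3                        | 12                         | 0                          | 12                    | 6                         |
+----------------+--------------------------+----------------------------+----------------------------+-----------------------+---------------------------+
+----------------+-----------------------+-----------------------+-----------------------+-------------------+-----------------------+
| participant_id | SDQ_SDQ_Hyperactivity | SDQ_SDQ_Internalizing | SDQ_SDQ_Peer_Problems | SDQ_SDQ_Prosocial | MRI_Track_Age_at_Scan |
+----------------+-----------------------+-----------------------+-----------------------+-------------------+-----------------------+
| UmrK0vMLopoR   | 5                     | 1                     | 0                     | 10                | NaN                   |
| CPaeQkhcjg7d   | 8                     | 10                    | 4                     | 5                 | NaN                   |
| Nb4EetVPm3gs   | 7                     | 6                     | 4                     | 9                 | 8.239904              |
| p4vPhVu91o4b   | 10                    | 8                     | 4                     | 6                 | NaN                   |
| M09PXs7arQ5E   | 10                    | 7                     | 3                     | 9                 | 8.940679              |
| ...            | ...                   | ...                   | ...                   | ...               | ...                   |
| Atx7oub96GXS   | 6                     | 2                     | 0                     | 9                 | 10.697923             |
| groSbUfkQngM   | 5                     | 7                     | 4                     | 7                 | 13.964750             |
| zmxGvIrOD0bt   | 0                     | 1                     | 0                     | 10                | NaN                   |
| rOmWFuJCud5G   | 5                     | 0                     | 0                     | 9                 | 12.089094             |
| k8HhHnnu2wmt   | 9                     | 0                     | 0                     | 8                 | 12.595710             |
+----------------+-----------------------+-----------------------+-----------------------+-------------------+-----------------------+

1213 rows × 18 columns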

The categorical table covers 1,213 subjects and 10 columns (participant_id plus nine categorical variables):
-----------------------------------------------------------------------
Index(['participant_id', 'Basic_Demos_Enroll_Year', 'Basic_Demos_Study_Site',
       'PreInt_Demos_Fam_Child_Ethnicity', 'PreInt_Demos_Fam_Child_Race',
       'MRI_Track_Scan_Location', 'Barratt_Barratt_P1_Edu',
       'Barratt_Barratt_P1_Occ', 'Barratt_Barratt_P2_Edu',
       'Barratt_Barratt_P2_Occ'],
      dtype='object')

Categorical Data – Basic_Demos

Basic_Demos_Study_Site: 1=Staten Island, 2=MRV, 3=Midtown, 4=Harlem, 5=SI RUMC

+---+-------------------------+------------------------+
|   | Basic_Demos_Enroll_Year | Basic_Demos_Study_Site |
+---+-------------------------+------------------------+
| 0 |          2016           |           1            |
| 1 |          2019           |           3            |
| 2 |          2016           |           1            |
| 3 |          2018           |           3            |
| 4 |          2019           |           3            |
+---+-------------------------+------------------------+
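For readable plots and summaries, the integer site codes can be mapped back to their labels. The sketch below uses the five sample rows shown above; the dictionary name `SITE_LABELS` and the new column `Study_Site_Label` are assumptions for this example.

```python
import pandas as pd

# Code-to-label mapping taken from the data dictionary entry above
SITE_LABELS = {1: "Staten Island", 2: "MRV", 3: "Midtown", 4: "Harlem", 5: "SI RUMC"}

# The five sample rows from the preview table
df = pd.DataFrame({
    "Basic_Demos_Enroll_Year": [2016, 2019, 2016, 2018, 2019],
    "Basic_Demos_Study_Site": [1, 3, 1, 3, 3],
})

# Codes missing from the mapping would become NaN, which makes them easy to spot
df["Study_Site_Label"] = df["Basic_Demos_Study_Site"].map(SITE_LABELS)
print(df["Study_Site_Label"].value_counts())
```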

Categorical Data – Pre-Interview Demographics/Family

  • PreInt_Demos_Fam_Child_Ethnicity: 0=Not Hispanic or Latino, 1=Hispanic or Latino, 2=Decline to specify, 3=Unknown

  • PreInt_Demos_Fam_Child_Race: 0=White/Caucasian, 1=Black/African American, 2=Hispanic, 3=Asian, 4=Indian, 5=Native American Indian, 6=American Indian/Alaskan Native, 7=Native Hawaiian/Other Pacific Islander, 8=Two or more, 9=Other race, 10=Unknown, 11=Choose not to specify

+---+----------------------------------+-----------------------------+
|   | PreInt_Demos_Fam_Child_Ethnicity | PreInt_Demos_Fam_Child_Race |
+---+----------------------------------+-----------------------------+
| 0 |               0.0                |             0.0             |
| 1 |               1.0                |             2.0             |
| 2 |               1.0                |             8.0             |
| 3 |               0.0                |             8.0             |
| 4 |               0.0                |             1.0             |
+---+----------------------------------+-----------------------------+
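The codes in this preview are displayed as floats (0.0, 1.0). In pandas that usually means the column contains missing values, although this is an inference and not something the source states. A minimal sketch, under that assumption, of keeping "missing" as its own category when creating dummy variables is given below; the NaN in the toy data is made up.

```python
import numpy as np
import pandas as pd

# Toy column: the last value is a made-up missing entry
eth = pd.Series(
    [0.0, 1.0, 1.0, 0.0, np.nan], name="PreInt_Demos_Fam_Child_Ethnicity"
)
n_missing = int(eth.isna().sum())

# Turn float codes into string labels, with an explicit "missing" level
labels = eth.map(lambda v: "missing" if pd.isna(v) else str(int(v)))

# One indicator column per level, including the "missing" level
dummies = pd.get_dummies(labels, prefix="Ethnicity", dtype=int)
print(dummies)
```

Whether a separate "missing" level is preferable to imputation depends on the model and is left open here.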

Categorical Data – MRI_Track_Scan_Location

MRI_Track_Scan_Location: 1=Staten Island, 2=RUBIC, 3=CBIC, 4=CUNY

+---+-------------------------+
|   | MRI_Track_Scan_Location |
+---+-------------------------+
| 0 |            1            |
| 1 |            3            |
| 2 |            1            |
| 3 |            3            |
| 4 |            3            |
+---+-------------------------+

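Categorical Data – Barratt Simplified Measure of Social Status

Hollingshead devised a simple measure of social status based on marital status, retired/employed status (retirees use their last occupation), educational attainment, and occupational prestige; the Barratt measure builds on it. The score can serve as a proxy for socioeconomic status.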

+----+------------------------+------------------------------------------------------------------------+
|    | Field                  | Labels                                                                 |
+====+========================+========================================================================+
| 28 | Barratt_Barratt_P1_Edu | 3=Less than 7th grade 6=Junior high/Middle school (9th grade)          |
|    |                        | 9=Partial high school (10th or 11th grade) 12=High school graduate     |
|    |                        | 15=Partial college (at least one year) 18=College education            |
|    |                        | 21=Graduate degree                                                     |
+----+------------------------+------------------------------------------------------------------------+
| 29 | Barratt_Barratt_P1_Occ | 0=Homemaker, stay at home parent. 5=Day laborer, janitor, house        |
|    |                        | cleaner, farm worker, food counter sales, food preparation worker,     |
|    |                        | busboy. 10=Garbage collector, short-order cook, cab driver, shoe       |
|    |                        | sales, assembly line workers, masons, baggage porter. 15=Painter,      |
|    |                        | skilled construction trade, sales clerk, truck driver, cook, sales     |
|    |                        | counter or general office clerk. 20=Automobile mechanic, typist,       |
|    |                        | locksmith, farmer, carpenter, receptionist, construction laborer,      |
|    |                        | hairdresser. 25=Machinist, musician, bookkeeper, secretary, insurance  |
|    |                        | sales, cabinet maker, personnel specialist, welder. 30=Supervisor,     |
|    |                        | librarian, aircraft mechanic, artist and artisan, electrician,         |
|    |                        | administrator, military enlisted personnel, buyer. 35=Nurse, skilled   |
|    |                        | technician, medical technician, counselor, manager, police and fire    |
|    |                        | personnel, financial manager, physical, occupational, speech           |
|    |                        | therapist. 40=Mechanical, nuclear, and electrical engineer,            |
|    |                        | educational administrator, veterinarian, military officer, elementary, |
|    |                        | high school and special education teacher. 45=Physician, attorney,     |
|    |                        | professor, chemical and aerospace engineer, judge, CEO, senior         |
|    |                        | manager, public official, psychologist, pharmacist, accountant.        |
+----+------------------------+------------------------------------------------------------------------+
| 30 | Barratt_Barratt_P2_Edu | 3=Less than 7th grade 6=Junior high/Middle school (9th grade)          |
|    |                        | 9=Partial high school (10th or 11th grade) 12=High school graduate     |
|    |                        | 15=Partial college (at least one year) 18=College education            |
|    |                        | 21=Graduate degree                                                     |
+----+------------------------+------------------------------------------------------------------------+
| 31 | Barratt_Barratt_P2_Occ | 0=Homemaker, stay at home parent. 5=Day laborer, janitor, house        |
|    |                        | cleaner, farm worker, food counter sales, food preparation worker,     |
|    |                        | busboy. 10=Garbage collector, short-order cook, cab driver, shoe       |
|    |                        | sales, assembly line workers, masons, baggage porter. 15=Painter,      |
|    |                        | skilled construction trade, sales clerk, truck driver, cook, sales     |
|    |                        | counter or general office clerk. 20=Automobile mechanic, typist,       |
|    |                        | locksmith, farmer, carpenter, receptionist, construction laborer,      |
|    |                        | hairdresser. 25=Machinist, musician, bookkeeper, secretary, insurance  |
|    |                        | sales, cabinet maker, personnel specialist, welder. 30=Supervisor,     |
|    |                        | librarian, aircraft mechanic, artist and artisan, electrician,         |
|    |                        | administrator, military enlisted personnel, buyer. 35=Nurse, skilled   |
|    |                        | technician, medical technician, counselor, manager, police and fire    |
|    |                        | personnel, financial manager, physical, occupational, speech           |
|    |                        | therapist. 40=Mechanical, nuclear, and electrical engineer,            |
|    |                        | educational administrator, veterinarian, military officer, elementary, |
|    |                        | high school and special education teacher. 45=Physician, attorney,     |
|    |                        | professor, chemical and aerospace engineer, judge, CEO, senior         |
|    |                        | manager, public official, psychologist, pharmacist, accountant.        |
+----+------------------------+------------------------------------------------------------------------+

Categorical Data – Barratt Simplified Measure of Social Status

+---+------------------------+------------------------+------------------------+------------------------+
|   | Barratt_Barratt_P1_Edu | Barratt_Barratt_P1_Occ | Barratt_Barratt_P2_Edu | Barratt_Barratt_P2_Occ |
+---+------------------------+------------------------+------------------------+------------------------+
| 0 |           21           |           45           |           21           |           45           |
| 1 |           15           |           15           |           0            |           0            |
| 2 |           18           |           40           |           0            |           0            |
| 3 |           15           |           30           |           18           |           0            |
| 4 |           15           |           20           |           0            |           0            |
+---+------------------------+------------------------+------------------------+------------------------+
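In these sample rows, Barratt_Barratt_P2_Edu takes the value 0, which is not among the education labels listed in the data dictionary (3 to 21); the source does not say what it denotes. A small sketch for flagging such out-of-codebook values follows; the helper `unexpected_codes` is hypothetical.

```python
import pandas as pd

# Allowed codes as listed in the data dictionary
EDU_CODES = {3, 6, 9, 12, 15, 18, 21}
OCC_CODES = set(range(0, 50, 5))  # 0, 5, ..., 45

# The five sample rows from the preview table
df = pd.DataFrame({
    "Barratt_Barratt_P1_Edu": [21, 15, 18, 15, 15],
    "Barratt_Barratt_P1_Occ": [45, 15, 40, 30, 20],
    "Barratt_Barratt_P2_Edu": [21, 0, 0, 18, 0],
    "Barratt_Barratt_P2_Occ": [45, 0, 0, 0, 0],
})

def unexpected_codes(series: pd.Series, allowed: set) -> list:
    """Return the sorted observed values that are not in the codebook."""
    observed = {int(v) for v in series.dropna().unique()}
    return sorted(observed - allowed)

report = {
    col: unexpected_codes(df[col], EDU_CODES if col.endswith("_Edu") else OCC_CODES)
    for col in df.columns
}
print(report)
```

How to treat the flagged values (for example, as missing) is a modeling decision that the data dictionary does not settle.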

Categorical Data – Diagnosis: ADHD_type

Stored in: TRAIN/TRAINING_SOLUTIONS.xlsx

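+----------------+--------------+-------+
| participant_id | ADHD_Outcome | Sex_F |
+----------------+--------------+-------+
|  UmrK0vMLopoR  |      1       |   1   |
|  CPaeQkhcjg7d  |      1       |   0   |
|  Nb4EetVPm3gs  |      1       |   0   |
|  p4vPhVu91o4b  |      1       |   1   |
|  M09PXs7arQ5E  |      1       |   1   |
|      ...       |     ...      |  ...  |
|  Atx7oub96GXS  |      0       |   0   |
|  groSbUfkQngM  |      0       |   1   |
|  zmxGvIrOD0bt  |      0       |   1   |
|  rOmWFuJCud5G  |      0       |   0   |
|  k8HhHnnu2wmt  |      0       |   0   |
+----------------+--------------+-------+

1213 rows × 2 columns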

  • ADHD_Outcome: 0= Other/None, 1=ADHD
  • Sex_F: 0=Male, 1=Female
+---+----------------+--------------+-------+
|   | participant_id | ADHD_Outcome | Sex_F |
+---+----------------+--------------+-------+
| 0 |  UmrK0vMLopoR  |      1       |   1   |
| 1 |  CPaeQkhcjg7d  |      1       |   0   |
| 2 |  Nb4EetVPm3gs  |      1       |   0   |
| 3 |  p4vPhVu91o4b  |      1       |   1   |
| 4 |  M09PXs7arQ5E  |      1       |   1   |
+---+----------------+--------------+-------+
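Because there are two binary targets, it can help to look at them jointly. The sketch below cross-tabulates the five sample rows and builds a combined four-level label; the `joint` column and its use for stratified splitting are assumptions, not part of the competition data.

```python
import pandas as pd

# The five sample rows from the table above
targets = pd.DataFrame({
    "participant_id": ["UmrK0vMLopoR", "CPaeQkhcjg7d", "Nb4EetVPm3gs",
                       "p4vPhVu91o4b", "M09PXs7arQ5E"],
    "ADHD_Outcome": [1, 1, 1, 1, 1],
    "Sex_F": [1, 0, 0, 1, 1],
})

# Joint counts of diagnosis (rows) by sex (columns)
counts = pd.crosstab(targets["ADHD_Outcome"], targets["Sex_F"])

# Combined label: 0 = other/male, 1 = other/female, 2 = ADHD/male, 3 = ADHD/female
targets["joint"] = 2 * targets["ADHD_Outcome"] + targets["Sex_F"]
print(counts)
print(targets["joint"].tolist())
```

On the full table, the same cross-tabulation shows how balanced the four diagnosis-by-sex groups are.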

Continuous Data – Overview

Basic information for df_quantitative:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1213 entries, 0 to 1212
Data columns (total 19 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   participant_id              1213 non-null   object 
 1   EHQ_EHQ_Total               1213 non-null   float64
 2   ColorVision_CV_Score        1213 non-null   int64  
 3   APQ_P_APQ_P_CP              1213 non-null   int64  
 4   APQ_P_APQ_P_ID              1213 non-null   int64  
 5   APQ_P_APQ_P_INV             1213 non-null   int64  
 6   APQ_P_APQ_P_OPD             1213 non-null   int64  
 7   APQ_P_APQ_P_PM              1213 non-null   int64  
 8   APQ_P_APQ_P_PP              1213 non-null   int64  
 9   SDQ_SDQ_Conduct_Problems    1213 non-null   int64  
 10  SDQ_SDQ_Difficulties_Total  1213 non-null   int64  
 11  SDQ_SDQ_Emotional_Problems  1213 non-null   int64  
 12  SDQ_SDQ_Externalizing       1213 non-null   int64  
 13  SDQ_SDQ_Generating_Impact   1213 non-null   int64  
 14  SDQ_SDQ_Hyperactivity       1213 non-null   int64  
 15  SDQ_SDQ_Internalizing       1213 non-null   int64  
 16  SDQ_SDQ_Peer_Problems       1213 non-null   int64  
 17  SDQ_SDQ_Prosocial           1213 non-null   int64  
 18  MRI_Track_Age_at_Scan       853 non-null    float64
dtypes: float64(2), int64(16), object(1)
memory usage: 180.2+ KB
The table contains 1,213 subjects and 19 columns (participant_id plus 18 measurement scores).
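The only column with missing values in this summary is MRI_Track_Age_at_Scan (853 non-null out of 1,213). The sketch below computes the missing share from those two counts and shows one possible treatment, a missing-value indicator plus median imputation. That treatment is an assumption for illustration, not the author's method; the ages are the first five values from the preview table.

```python
import numpy as np
import pandas as pd

# Counts taken from the df_quantitative summary above
n_total = 1213
n_non_null = 853
n_missing = n_total - n_non_null
missing_share = n_missing / n_total
print(f"{n_missing} missing ({missing_share:.1%})")

# First five ages from the preview table
age = pd.Series(
    [np.nan, np.nan, 8.239904, np.nan, 8.940679], name="MRI_Track_Age_at_Scan"
)

# Keep the missingness as a feature, then fill gaps with the observed median
features = pd.DataFrame({
    "age_missing": age.isna().astype(int),
    "age_filled": age.fillna(age.median()),
})
print(features)
```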

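Continuous Data – EHQ

EHQ_EHQ_Total

Edinburgh Handedness Questionnaire

Scoring: -100 = 10th left; −28 ≤ LI < 48 = middle; 100 = 10th right, where LI is the laterality index.

The Edinburgh Handedness Questionnaire is a measurement scale used to assess the dominance of a person's right or left hand in everyday activities.

The scale can be completed by an observer assessing the person, or by the person self-reporting their hand use.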
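The cut-offs above can be turned into a three-level handedness feature. The source only defines the middle band (−28 ≤ LI < 48) and the two extremes, so treating every score below −28 as "left" and every score of 48 or more as "right" is an assumption of this sketch. The scores are sample values from the preview table.

```python
import numpy as np
import pandas as pd

# Sample EHQ_EHQ_Total values from the preview table
ehq = pd.Series([40.00, -94.47, -46.67, -26.68, 0.00, 87.80], name="EHQ_EHQ_Total")

# right=False makes each bin closed on the left: [-inf, -28), [-28, 48), [48, inf)
bins = [-np.inf, -28, 48, np.inf]
handedness = pd.cut(ehq, bins=bins, right=False, labels=["left", "middle", "right"])
print(handedness.tolist())
```

Using infinite outer edges keeps the extreme scores -100 and 100 inside the outer bins.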

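Continuous Data – ColorVision

ColorVision_CV_Score: Ishihara Color Vision Test

The Ishihara test is a color vision test for red-green color deficiencies. It consists of 24 plates, each showing a circle of dots of varying sizes and colors. The dots in the pattern form a number that people with normal color vision can see, whereas people with red-green color blindness cannot see it or see it only with difficulty.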

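Continuous Data – APQ

APQ_Parenting_Score: Alabama Parenting Questionnaire - Parent Report

The Alabama Parenting Questionnaire is a 42-item questionnaire that measures five parenting domains relevant to the etiology and treatment of child externalizing problems: positive involvement, supervision/monitoring, use of positive discipline, consistency in the use of such discipline, and use of corporal punishment.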

APQ_P_APQ_P_CP: Corporal Punishment Score

APQ_P_APQ_P_ID: Inconsistent Discipline Score

APQ_P_APQ_P_INV: Involvement Score

APQ_P_APQ_P_OPD: Other Discipline Practices Score (not factored into the total score, but provides item-level information)

APQ_P_APQ_P_PM: Poor Monitoring/Supervision Score

APQ_P_APQ_P_PP: Positive Parenting Score

Continuous Data – APQ2

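+-------+----------------+----------------+-----------------+-----------------+----------------+----------------+
|       | APQ_P_APQ_P_CP | APQ_P_APQ_P_ID | APQ_P_APQ_P_INV | APQ_P_APQ_P_OPD | APQ_P_APQ_P_PM | APQ_P_APQ_P_PP |
+-------+----------------+----------------+-----------------+-----------------+----------------+----------------+
| count | 1213.000000    | 1213.000000    | 1213.000000     | 1213.000000     | 1213.000000    | 1213.000000    |
| mean  | 3.781533       | 13.205276      | 39.374279       | 17.785655       | 16.393240      | 25.246496      |
| std   | 1.376700       | 3.811772       | 6.245928        | 3.764112        | 5.376994       | 3.950529       |
| min   | 0.000000       | 0.000000       | 0.000000        | 0.000000        | 0.000000       | 0.000000       |
| 25%   | 3.000000       | 11.000000      | 36.000000       | 16.000000       | 13.000000      | 23.000000      |
| 50%   | 3.000000       | 13.000000      | 40.000000       | 18.000000       | 16.000000      | 26.000000      |
| 75%   | 4.000000       | 16.000000      | 43.000000       | 20.000000       | 19.000000      | 28.000000      |
| max   | 12.000000      | 28.000000      | 50.000000       | 28.000000       | 37.000000      | 30.000000      |
+-------+----------------+----------------+-----------------+-----------------+----------------+----------------+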

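Continuous Data – SDQ

SDQ*: Strengths and Difficulties Questionnaire

The Strengths and Difficulties Questionnaire is a brief behavioral screening questionnaire with 25 items on psychological attributes, divided into 5 scales. The extended version adds an impact supplement that asks the caregiver whether the subject has experienced problems and, if so, about their chronicity, the distress and social impairment they cause, and the burden they place on others.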

SDQ_SDQ_Conduct_Problems: Conduct Problems Scale

SDQ_SDQ_Difficulties_Total: Total Difficulties Score

SDQ_SDQ_Emotional_Problems: Emotional Problems Scale

SDQ_SDQ_Externalizing: Externalizing Score

SDQ_SDQ_Generating_Impact: Generating Impact Scores

SDQ_SDQ_Hyperactivity: Hyperactivity Scale

SDQ_SDQ_Internalizing: Internalizing Score

SDQ_SDQ_Peer_Problems: Peer Problems Scale

SDQ_SDQ_Prosocial: Prosocial Scale
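In standard SDQ scoring, the externalizing score is the sum of the conduct and hyperactivity scales, the internalizing score is the sum of the emotional and peer scales, and the total difficulties score is the sum of those four scales. The sketch below checks these identities on the first five rows of the quantitative preview shown earlier; whether they hold for every row of the full table is not verified here.

```python
import pandas as pd

# First five rows of the quantitative preview (SDQ columns only)
sdq = pd.DataFrame({
    "SDQ_SDQ_Conduct_Problems":   [0, 0, 1, 6, 1],
    "SDQ_SDQ_Hyperactivity":      [5, 8, 7, 10, 10],
    "SDQ_SDQ_Emotional_Problems": [1, 6, 2, 4, 4],
    "SDQ_SDQ_Peer_Problems":      [0, 4, 4, 4, 3],
    "SDQ_SDQ_Externalizing":      [5, 8, 8, 16, 11],
    "SDQ_SDQ_Internalizing":      [1, 10, 6, 8, 7],
    "SDQ_SDQ_Difficulties_Total": [6, 18, 14, 24, 18],
})

# Recompute each composite from its component scales
ext = sdq["SDQ_SDQ_Conduct_Problems"] + sdq["SDQ_SDQ_Hyperactivity"]
internal = sdq["SDQ_SDQ_Emotional_Problems"] + sdq["SDQ_SDQ_Peer_Problems"]

ext_ok = bool(ext.eq(sdq["SDQ_SDQ_Externalizing"]).all())
int_ok = bool(internal.eq(sdq["SDQ_SDQ_Internalizing"]).all())
total_ok = bool((ext + internal).eq(sdq["SDQ_SDQ_Difficulties_Total"]).all())
print(ext_ok, int_ok, total_ok)
```

If the identities hold throughout, the three composite columns are exact linear combinations of the four scales, which matters for linear models that are sensitive to collinear features.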

Continuous Data – SDQ2

+-------+--------------------------+----------------------------+----------------------------+-----------------------+
|       | SDQ_SDQ_Conduct_Problems | SDQ_SDQ_Difficulties_Total | SDQ_SDQ_Emotional_Problems | SDQ_SDQ_Externalizing |
+-------+--------------------------+----------------------------+----------------------------+-----------------------+
| count |          1213.0          |           1213.0           |           1213.0           |        1213.0         |
| mean  |          2.059           |           12.123           |           2.308            |         7.557         |
|  std  |          2.023           |           6.577            |           2.168            |         4.167         |
|  min  |           0.0            |            0.0             |            0.0             |          0.0          |
|  25%  |           0.0            |            7.0             |            1.0             |          4.0          |
|  50%  |           2.0            |            12.0            |            2.0             |          7.0          |
|  75%  |           3.0            |            17.0            |            4.0             |         10.0          |
|  max  |           10.0           |            34.0            |            10.0            |         20.0          |
+-------+--------------------------+----------------------------+----------------------------+-----------------------+
+-------+---------------------------+-----------------------+-----------------------+-----------------------+-------------------+
|       | SDQ_SDQ_Generating_Impact | SDQ_SDQ_Hyperactivity | SDQ_SDQ_Internalizing | SDQ_SDQ_Peer_Problems | SDQ_SDQ_Prosocial |
+-------+---------------------------+-----------------------+-----------------------+-----------------------+-------------------+
| count |          1213.0           |        1213.0         |        1213.0         |        1213.0         |      1213.0       |
| mean  |           4.073           |         5.498         |         4.566         |         2.258         |       7.683       |
|  std  |           2.82            |         2.837         |         3.52          |         2.09          |       2.19        |
|  min  |            0.0            |          0.0          |          0.0          |          0.0          |        0.0        |
|  25%  |            2.0            |          4.0          |          2.0          |          0.0          |        6.0        |
|  50%  |            4.0            |          6.0          |          4.0          |          2.0          |        8.0        |
|  75%  |            6.0            |          8.0          |          7.0          |          4.0          |       10.0        |
|  max  |           10.0            |         10.0          |         17.0          |          9.0          |       10.0        |
+-------+---------------------------+-----------------------+-----------------------+-----------------------+-------------------+