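We will analyze diagnostic, sociodemographic, emotional, and parenting data from the Healthy Brain Network (HBN), together with functional magnetic resonance imaging (fMRI) data. HBN recruits participants through a community-referred model.
The fMRI data are used to extract an activity time series for each brain region. Correlating these regional time series with one another yields the fMRI connectome matrix.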
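The correlation step described above can be sketched as follows. This is a minimal illustration on synthetic data, assuming Pearson correlation and a (time points × regions) layout; the function names `connectome_from_timeseries` and `upper_triangle_features` are hypothetical, and the actual preprocessing behind the competition's connectomes may differ.

```python
import numpy as np

def connectome_from_timeseries(ts: np.ndarray) -> np.ndarray:
    """Return the region-by-region Pearson correlation matrix.

    ts has shape (n_timepoints, n_regions): one column per brain region.
    """
    # np.corrcoef treats each row as a variable, so regions must be rows.
    return np.corrcoef(ts.T)

def upper_triangle_features(conn: np.ndarray) -> np.ndarray:
    """Flatten the strictly upper triangle into a 1-D feature vector."""
    rows, cols = np.triu_indices_from(conn, k=1)
    return conn[rows, cols]

# Synthetic stand-in: 120 time points for 5 regions (not HBN data).
rng = np.random.default_rng(0)
timeseries = rng.standard_normal((120, 5))

conn = connectome_from_timeseries(timeseries)
features = upper_triangle_features(conn)
```

Because the matrix is symmetric with ones on the diagonal, only the n(n-1)/2 entries above the diagonal carry information, which is why the sketch keeps just the upper triangle.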
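Goal:
Build a model that predicts an individual's sex and ADHD diagnosis from the functional brain imaging data of children and adolescents, together with their sociodemographic, emotional, and parenting information.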
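Training folder train_tsv
It contains three types of information on more than 1,200 subjects.
Note: participants need to process the categorical data (which may require creating dummy variables) and then combine the processed dataset with the functional connectome dataset to create the final training dataset for their model.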
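The note above (dummy-encode the categorical variables, then join them with the connectome features) might look roughly like this. The sketch uses tiny synthetic frames; the participant IDs and the `conn_*` column names are made up, and only the two categorical column names come from the metadata.

```python
import pandas as pd

# Synthetic stand-ins for the categorical metadata and connectome features.
categorical = pd.DataFrame({
    "participant_id": ["a1", "b2", "c3"],
    "Basic_Demos_Study_Site": [1, 3, 1],
    "MRI_Track_Scan_Location": [1, 3, 3],
})
connectome = pd.DataFrame({
    "participant_id": ["a1", "b2", "c3"],
    "conn_0_1": [0.12, -0.30, 0.45],   # hypothetical feature names
    "conn_0_2": [0.05, 0.22, -0.10],
})

# One-hot encode the integer-coded categories (one column per observed code).
cat_cols = ["Basic_Demos_Study_Site", "MRI_Track_Scan_Location"]
encoded = pd.get_dummies(categorical, columns=cat_cols, dtype=int)

# Join on participant_id; validate= raises if an ID is duplicated on either side.
train = encoded.merge(connectome, on="participant_id",
                      how="inner", validate="one_to_one")
```

One caveat with this approach: `get_dummies` only creates columns for codes it sees, so the test set may need to be reindexed to the training columns before prediction.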
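Test folder test_tsv
It contains unseen data frames for more than 300 subjects. These data frames include:
a) fMRI connectome matrices
b) Sociodemographic, emotional, and parenting information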
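You need to submit a solution file containing the ADHD diagnosis type and sex for every row of the test dataset.
You will also receive a sample solution file prepared for submission.
The task in this data competition can be completed successfully without external data.
In fact, the data have been anonymized to a degree that makes it difficult to combine additional data with the competition data.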
ADHD_Outcome: Type of Diagnosis (0=Other/None, 1=ADHD)
Sex_F: Sex of participant (0=Male, 1=Female)
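As a rough illustration of the two target columns, the sketch below assembles a submission-style frame and checks that both labels are coded 0/1. The helper `make_submission`, the example IDs, and the output path are hypothetical, and the column layout is assumed to mirror the two target fields plus `participant_id`; the sample solution file remains the authoritative format.

```python
import pandas as pd

def make_submission(participant_ids, adhd_pred, sex_pred) -> pd.DataFrame:
    """Assemble one row per test participant with both binary labels."""
    submission = pd.DataFrame({
        "participant_id": list(participant_ids),
        "ADHD_Outcome": list(adhd_pred),   # 0=Other/None, 1=ADHD
        "Sex_F": list(sex_pred),           # 0=Male, 1=Female
    })
    labels = submission[["ADHD_Outcome", "Sex_F"]]
    if not labels.isin([0, 1]).all().all():
        raise ValueError("labels must be coded as 0 or 1")
    if submission["participant_id"].duplicated().any():
        raise ValueError("each participant must appear exactly once")
    return submission

submission = make_submission(["id_a", "id_b", "id_c"], [1, 0, 1], [0, 0, 1])
# submission.to_csv("submission.csv", index=False)  # hypothetical output path
```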
You will also be provided a full data dictionary, Data Dictionary.xlsx.
The file contains five worksheets:
Sheet Dictionary: 37 rows, 6 columns
-------------------------------------------------
Sheet Instrument_Description: 8 rows, 2 columns
-------------------------------------------------
Sheet EHQ: 16 rows, 2 columns
-------------------------------------------------
Sheet APQ: 43 rows, 2 columns
-------------------------------------------------
Sheet SDQ: 34 rows, 2 columns
-------------------------------------------------
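A per-sheet summary like the one above can be produced by loading every worksheet at once. In pandas, `pd.read_excel(path, sheet_name=None)` returns a dict mapping sheet names to DataFrames; the sketch below builds an equivalent dict of empty placeholder frames so it runs without the file. The helper `sheet_shapes` and the placeholder column names are hypothetical.

```python
import pandas as pd

def sheet_shapes(sheets: dict) -> dict:
    """Map each sheet name to its (rows, columns) shape."""
    return {name: df.shape for name, df in sheets.items()}

# With the real workbook one would use:
#   sheets = pd.read_excel("Data Dictionary.xlsx", sheet_name=None)
# Placeholder frames with the shapes reported above stand in for it here.
reported = {"Dictionary": (37, 6), "Instrument_Description": (8, 2),
            "EHQ": (16, 2), "APQ": (43, 2), "SDQ": (34, 2)}
sheets = {
    name: pd.DataFrame(index=range(n_rows),
                       columns=[f"col_{i}" for i in range(n_cols)])
    for name, (n_rows, n_cols) in reported.items()
}

for name, (n_rows, n_cols) in sheet_shapes(sheets).items():
    print(f"Sheet {name}: {n_rows} rows, {n_cols} columns")
```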
The TRAIN_CATEGORICAL_METADATA.xlsx file contains the metadata for the categorical data, as shown below:
participant_id | EHQ_EHQ_Total | ColorVision_CV_Score | APQ_P_APQ_P_CP | APQ_P_APQ_P_ID | APQ_P_APQ_P_INV | APQ_P_APQ_P_OPD | APQ_P_APQ_P_PM | APQ_P_APQ_P_PP | SDQ_SDQ_Conduct_Problems | SDQ_SDQ_Difficulties_Total | SDQ_SDQ_Emotional_Problems | SDQ_SDQ_Externalizing | SDQ_SDQ_Generating_Impact | SDQ_SDQ_Hyperactivity | SDQ_SDQ_Internalizing | SDQ_SDQ_Peer_Problems | SDQ_SDQ_Prosocial | MRI_Track_Age_at_Scan |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
UmrK0vMLopoR | 40.00 | 13 | 3 | 10 | 47 | 13 | 11 | 28 | 0 | 6 | 1 | 5 | 0 | 5 | 1 | 0 | 10 | NaN |
CPaeQkhcjg7d | -94.47 | 14 | 3 | 13 | 34 | 18 | 23 | 30 | 0 | 18 | 6 | 8 | 7 | 8 | 10 | 4 | 5 | NaN |
Nb4EetVPm3gs | -46.67 | 14 | 4 | 10 | 35 | 16 | 10 | 29 | 1 | 14 | 2 | 8 | 5 | 7 | 6 | 4 | 9 | 8.239904 |
p4vPhVu91o4b | -26.68 | 10 | 5 | 12 | 39 | 19 | 16 | 28 | 6 | 24 | 4 | 16 | 9 | 10 | 8 | 4 | 6 | NaN |
M09PXs7arQ5E | 0.00 | 14 | 5 | 15 | 40 | 20 | 24 | 28 | 1 | 18 | 4 | 11 | 4 | 10 | 7 | 3 | 9 | 8.940679 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
Atx7oub96GXS | 87.80 | 14 | 5 | 14 | 39 | 20 | 15 | 21 | 1 | 9 | 2 | 7 | 3 | 6 | 2 | 0 | 9 | 10.697923 |
groSbUfkQngM | 77.80 | 14 | 3 | 10 | 32 | 20 | 11 | 23 | 6 | 18 | 3 | 11 | 9 | 5 | 7 | 4 | 7 | 13.964750 |
zmxGvIrOD0bt | 16.68 | 14 | 3 | 16 | 28 | 15 | 19 | 27 | 3 | 4 | 1 | 3 | 0 | 0 | 1 | 0 | 10 | NaN |
rOmWFuJCud5G | 53.40 | 14 | 3 | 14 | 34 | 18 | 23 | 23 | 4 | 9 | 0 | 9 | 3 | 5 | 0 | 0 | 9 | 12.089094 |
k8HhHnnu2wmt | -57.80 | 14 | 5 | 11 | 42 | 18 | 16 | 24 | 3 | 12 | 0 | 12 | 6 | 9 | 0 | 0 | 8 | 12.595710 |
1213 rows × 18 columns
The table contains 1213 participants and 10 columns (participant_id plus 9 categorical variables):
-----------------------------------------------------------------------
Index(['participant_id', 'Basic_Demos_Enroll_Year', 'Basic_Demos_Study_Site',
'PreInt_Demos_Fam_Child_Ethnicity', 'PreInt_Demos_Fam_Child_Race',
'MRI_Track_Scan_Location', 'Barratt_Barratt_P1_Edu',
'Barratt_Barratt_P1_Occ', 'Barratt_Barratt_P2_Edu',
'Barratt_Barratt_P2_Occ'],
dtype='object')
Basic_Demos_Study_Site: 1=Staten Island, 2=MRV, 3=Midtown, 4=Harlem, 5=SI RUMC
+---+-------------------------+------------------------+
| | Basic_Demos_Enroll_Year | Basic_Demos_Study_Site |
+---+-------------------------+------------------------+
| 0 | 2016 | 1 |
| 1 | 2019 | 3 |
| 2 | 2016 | 1 |
| 3 | 2018 | 3 |
| 4 | 2019 | 3 |
+---+-------------------------+------------------------+
PreInt_Demos_Fam_Child_Ethnicity: 0=Not Hispanic or Latino, 1=Hispanic or Latino, 2=Decline to specify, 3=Unknown
PreInt_Demos_Fam_Child_Race: 0=White/Caucasian, 1=Black/African American, 2=Hispanic, 3=Asian, 4=Indian, 5=Native American Indian, 6=American Indian/Alaskan Native, 7=Native Hawaiian/Other Pacific Islander, 8=Two or more, 9=Other race, 10=Unknown, 11=Choose not to specify
+---+----------------------------------+-----------------------------+
| | PreInt_Demos_Fam_Child_Ethnicity | PreInt_Demos_Fam_Child_Race |
+---+----------------------------------+-----------------------------+
| 0 | 0.0 | 0.0 |
| 1 | 1.0 | 2.0 |
| 2 | 1.0 | 8.0 |
| 3 | 0.0 | 8.0 |
| 4 | 0.0 | 1.0 |
+---+----------------------------------+-----------------------------+
MRI_Track_Scan_Location: 1=Staten Island, 2=RUBIC, 3=CBIC, 4=CUNY
+---+-------------------------+
| | MRI_Track_Scan_Location |
+---+-------------------------+
| 0 | 1 |
| 1 | 3 |
| 2 | 1 |
| 3 | 3 |
| 4 | 3 |
+---+-------------------------+
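Hollingshead designed a simple measure of social status based on marital status, retired/employed status (retirees use their last occupation), educational attainment, and occupational prestige. It can serve as a proxy for socioeconomic status.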
+----+------------------------+------------------------------------------------------------------------+
| | Field | Labels |
+====+========================+========================================================================+
| 28 | Barratt_Barratt_P1_Edu | 3=Less than 7th grade 6=Junior high/Middle school (9th grade) |
| | | 9=Partial high school (10th or 11th grade) 12=High school graduate |
| | | 15=Partial college (at least one year) 18=College education |
| | | 21=Graduate degree |
+----+------------------------+------------------------------------------------------------------------+
| 29 | Barratt_Barratt_P1_Occ | 0=Homemaker, stay at home parent. 5=Day laborer, janitor, house |
| | | cleaner, farm worker, food counter sales, food preparation worker, |
| | | busboy. 10=Garbage collector, short-order cook, cab driver, shoe |
| | | sales, assembly line workers, masons, baggage porter. 15=Painter, |
| | | skilled construction trade, sales clerk, truck driver, cook, sales |
| | | counter or general office clerk. 20=Automobile mechanic, typist, |
| | | locksmith, farmer, carpenter, receptionist, construction laborer, |
| | | hairdresser. 25=Machinist, musician, bookkeeper, secretary, insurance |
| | | sales, cabinet maker, personnel specialist, welder. 30=Supervisor, |
| | | librarian, aircraft mechanic, artist and artisan, electrician, |
| | | administrator, military enlisted personnel, buyer. 35=Nurse, skilled |
| | | technician, medical technician, counselor, manager, police and fire |
| | | personnel, financial manager, physical, occupational, speech |
| | | therapist. 40=Mechanical, nuclear, and electrical engineer, |
| | | educational administrator, veterinarian, military officer, elementary, |
| | | high school and special education teacher. 45=Physician, attorney, |
| | | professor, chemical and aerospace engineer, judge, CEO, senior |
| | | manager, public official, psychologist, pharmacist, accountant. |
+----+------------------------+------------------------------------------------------------------------+
| 30 | Barratt_Barratt_P2_Edu | 3=Less than 7th grade 6=Junior high/Middle school (9th grade) |
| | | 9=Partial high school (10th or 11th grade) 12=High school graduate |
| | | 15=Partial college (at least one year) 18=College education |
| | | 21=Graduate degree |
+----+------------------------+------------------------------------------------------------------------+
| 31 | Barratt_Barratt_P2_Occ | 0=Homemaker, stay at home parent. 5=Day laborer, janitor, house |
| | | cleaner, farm worker, food counter sales, food preparation worker, |
| | | busboy. 10=Garbage collector, short-order cook, cab driver, shoe |
| | | sales, assembly line workers, masons, baggage porter. 15=Painter, |
| | | skilled construction trade, sales clerk, truck driver, cook, sales |
| | | counter or general office clerk. 20=Automobile mechanic, typist, |
| | | locksmith, farmer, carpenter, receptionist, construction laborer, |
| | | hairdresser. 25=Machinist, musician, bookkeeper, secretary, insurance |
| | | sales, cabinet maker, personnel specialist, welder. 30=Supervisor, |
| | | librarian, aircraft mechanic, artist and artisan, electrician, |
| | | administrator, military enlisted personnel, buyer. 35=Nurse, skilled |
| | | technician, medical technician, counselor, manager, police and fire |
| | | personnel, financial manager, physical, occupational, speech |
| | | therapist. 40=Mechanical, nuclear, and electrical engineer, |
| | | educational administrator, veterinarian, military officer, elementary, |
| | | high school and special education teacher. 45=Physician, attorney, |
| | | professor, chemical and aerospace engineer, judge, CEO, senior |
| | | manager, public official, psychologist, pharmacist, accountant. |
+----+------------------------+------------------------------------------------------------------------+
+---+------------------------+------------------------+------------------------+------------------------+
| | Barratt_Barratt_P1_Edu | Barratt_Barratt_P1_Occ | Barratt_Barratt_P2_Edu | Barratt_Barratt_P2_Occ |
+---+------------------------+------------------------+------------------------+------------------------+
| 0 | 21 | 45 | 21 | 45 |
| 1 | 15 | 15 | 0 | 0 |
| 2 | 18 | 40 | 0 | 0 |
| 3 | 15 | 30 | 18 | 0 |
| 4 | 15 | 20 | 0 | 0 |
+---+------------------------+------------------------+------------------------+------------------------+
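The first rows above include the value 0 in Barratt_Barratt_P2_Edu, which does not appear among the documented education codes (3 to 21). A quick check for undocumented codes can be sketched as below; the helper `undocumented_codes` is hypothetical, and what such values mean (for example, a missing second parent) is not stated in the data dictionary excerpt and would need to be confirmed before recoding.

```python
import pandas as pd

# Code sets copied from the label table above.
EDU_CODES = {3, 6, 9, 12, 15, 18, 21}
OCC_CODES = set(range(0, 50, 5))  # 0, 5, ..., 45

# First five rows of the Barratt columns as printed above.
head = pd.DataFrame({
    "Barratt_Barratt_P1_Edu": [21, 15, 18, 15, 15],
    "Barratt_Barratt_P1_Occ": [45, 15, 40, 30, 20],
    "Barratt_Barratt_P2_Edu": [21, 0, 0, 18, 0],
    "Barratt_Barratt_P2_Occ": [45, 0, 0, 0, 0],
})

def undocumented_codes(df: pd.DataFrame) -> dict:
    """Return, per column, the observed values missing from the code list."""
    found = {}
    for col in df.columns:
        allowed = EDU_CODES if col.endswith("_Edu") else OCC_CODES
        observed = set(df[col].dropna().astype(int).tolist())
        extra = sorted(observed - allowed)
        if extra:
            found[col] = extra
    return found

print(undocumented_codes(head))
```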
The labels are stored in the file TRAIN/TRAINING_SOLUTIONS.xlsx:
participant_id | ADHD_Outcome | Sex_F |
---|---|---|
UmrK0vMLopoR | 1 | 1 |
CPaeQkhcjg7d | 1 | 0 |
Nb4EetVPm3gs | 1 | 0 |
p4vPhVu91o4b | 1 | 1 |
M09PXs7arQ5E | 1 | 1 |
... | ... | ... |
Atx7oub96GXS | 0 | 0 |
groSbUfkQngM | 0 | 1 |
zmxGvIrOD0bt | 0 | 1 |
rOmWFuJCud5G | 0 | 0 |
k8HhHnnu2wmt | 0 | 0 |
1213 rows × 2 columns
+---+----------------+--------------+-------+
| | participant_id | ADHD_Outcome | Sex_F |
+---+----------------+--------------+-------+
| 0 | UmrK0vMLopoR | 1 | 1 |
| 1 | CPaeQkhcjg7d | 1 | 0 |
| 2 | Nb4EetVPm3gs | 1 | 0 |
| 3 | p4vPhVu91o4b | 1 | 1 |
| 4 | M09PXs7arQ5E | 1 | 1 |
+---+----------------+--------------+-------+
Basic information for df_quantitative:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1213 entries, 0 to 1212
Data columns (total 19 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 participant_id 1213 non-null object
1 EHQ_EHQ_Total 1213 non-null float64
2 ColorVision_CV_Score 1213 non-null int64
3 APQ_P_APQ_P_CP 1213 non-null int64
4 APQ_P_APQ_P_ID 1213 non-null int64
5 APQ_P_APQ_P_INV 1213 non-null int64
6 APQ_P_APQ_P_OPD 1213 non-null int64
7 APQ_P_APQ_P_PM 1213 non-null int64
8 APQ_P_APQ_P_PP 1213 non-null int64
9 SDQ_SDQ_Conduct_Problems 1213 non-null int64
10 SDQ_SDQ_Difficulties_Total 1213 non-null int64
11 SDQ_SDQ_Emotional_Problems 1213 non-null int64
12 SDQ_SDQ_Externalizing 1213 non-null int64
13 SDQ_SDQ_Generating_Impact 1213 non-null int64
14 SDQ_SDQ_Hyperactivity 1213 non-null int64
15 SDQ_SDQ_Internalizing 1213 non-null int64
16 SDQ_SDQ_Peer_Problems 1213 non-null int64
17 SDQ_SDQ_Prosocial 1213 non-null int64
18 MRI_Track_Age_at_Scan 853 non-null float64
dtypes: float64(2), int64(16), object(1)
memory usage: 180.2+ KB
None
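The table contains 1213 participants and 19 columns (participant_id plus 18 quantitative measures). MRI_Track_Age_at_Scan is the only column with missing values (853 of 1213 non-null).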
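According to the info output above, MRI_Track_Age_at_Scan has 853 non-null values out of 1213, so roughly 30% of ages are missing. One possible way to handle this, shown here only as a sketch, is median imputation plus an indicator column; the helper `impute_with_flag` is hypothetical, and the choice of median imputation is an assumption rather than something the source prescribes.

```python
import pandas as pd

# Share of missing ages implied by the info() output above.
non_null, total = 853, 1213
missing_share = (total - non_null) / total

def impute_with_flag(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Fill NaNs in `col` with its median and record where values were missing."""
    out = df.copy()
    out[f"{col}_missing"] = out[col].isna().astype(int)
    out[col] = out[col].fillna(out[col].median())
    return out

# Ages from the first five rows shown earlier (NaN where not recorded).
nan = float("nan")
sample = pd.DataFrame({"MRI_Track_Age_at_Scan": [nan, nan, 8.239904, nan, 8.940679]})
filled = impute_with_flag(sample, "MRI_Track_Age_at_Scan")
```

Keeping the indicator column lets a model learn whether missingness itself is informative instead of silently treating imputed ages as observed.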
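EHQ_EHQ_Total Edinburgh Handedness Questionnaire
-100 = 10th left; −28 ≤ LI < 48 = middle; 100 = 10th right
The Edinburgh Handedness Questionnaire is a measurement scale used to assess the dominance of a person's right or left hand in everyday activities.
It can be completed by an observer assessing the person or by the person self-reporting their hand use.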
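The scoring line above gives −28 ≤ LI < 48 as the middle band. A small sketch of turning EHQ_EHQ_Total into a three-level group follows; labelling everything below −28 as "left" and everything from 48 upward as "right" is an assumption made for illustration, and the function name `handedness_group` is hypothetical.

```python
def handedness_group(li: float) -> str:
    """Bin a laterality index in [-100, 100] into left / middle / right."""
    if not -100 <= li <= 100:
        raise ValueError(f"laterality index out of range: {li}")
    if li < -28:
        return "left"      # assumed label for scores below the middle band
    if li < 48:
        return "middle"    # documented band: -28 <= LI < 48
    return "right"         # assumed label for scores at or above 48

# EHQ_EHQ_Total values taken from rows of the table shown earlier.
for score in (40.00, -94.47, 87.80):
    print(score, handedness_group(score))
```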
ColorVision_CV_Score Ishihara Color Vision Test
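The Ishihara test is a color vision test for red-green color deficiencies. It consists of 24 plates, each showing a circle of dots of varying size and color. The dots in the pattern form a number that people with normal color vision can see, while people with red-green color blindness cannot see it or have difficulty seeing it.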
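APQ_Parenting_Score Alabama Parenting Questionnaire - Parent Report
The Alabama Parenting Questionnaire is a 42-item questionnaire that measures five parenting dimensions relevant to the etiology and treatment of child externalizing problems: positive involvement, supervision/monitoring, use of positive discipline techniques, consistency in the use of such discipline, and use of corporal punishment.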
APQ_P_APQ_P_CP Corporal Punishment Score
APQ_P_APQ_P_ID Inconsistent Discipline Score
APQ_P_APQ_P_INV Involvement Score
APQ_P_APQ_P_OPD Other Discipline Practices Score (Not factored into total score but provides item level information)
APQ_P_APQ_P_PM Poor Monitoring/Supervision Score
APQ_P_APQ_P_PP Positive Parenting Score
statistic | APQ_P_APQ_P_CP | APQ_P_APQ_P_ID | APQ_P_APQ_P_INV | APQ_P_APQ_P_OPD | APQ_P_APQ_P_PM | APQ_P_APQ_P_PP |
---|---|---|---|---|---|---|
count | 1213.000000 | 1213.000000 | 1213.000000 | 1213.000000 | 1213.000000 | 1213.000000 |
mean | 3.781533 | 13.205276 | 39.374279 | 17.785655 | 16.393240 | 25.246496 |
std | 1.376700 | 3.811772 | 6.245928 | 3.764112 | 5.376994 | 3.950529 |
min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 3.000000 | 11.000000 | 36.000000 | 16.000000 | 13.000000 | 23.000000 |
50% | 3.000000 | 13.000000 | 40.000000 | 18.000000 | 16.000000 | 26.000000 |
75% | 4.000000 | 16.000000 | 43.000000 | 20.000000 | 19.000000 | 28.000000 |
max | 12.000000 | 28.000000 | 50.000000 | 28.000000 | 37.000000 | 30.000000 |
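SDQ* Strengths and Difficulties Questionnaire
The Strengths and Difficulties Questionnaire is a brief behavioral screening questionnaire with 25 items on psychological attributes, divided into 5 scales. The extended version includes an impact supplement that asks the caregiver whether the participant has a problem and, if so, about its chronicity, the distress it causes, social impairment, and burden to others.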
SDQ_SDQ_Conduct_Problems Conduct problems scale
SDQ_SDQ_Difficulties_Total Total Difficulties Score
SDQ_SDQ_Emotional_Problems Emotional Problems Scale
SDQ_SDQ_Externalizing Externalizing Score
SDQ_SDQ_Generating_Impact Generating Impact Scores
SDQ_SDQ_Hyperactivity Hyperactivity Scale
SDQ_SDQ_Internalizing Internalizing Score
SDQ_SDQ_Peer_Problems Peer Problems Scale
SDQ_SDQ_Prosocial Prosocial Scale
+-------+--------------------------+----------------------------+----------------------------+-----------------------+
| | SDQ_SDQ_Conduct_Problems | SDQ_SDQ_Difficulties_Total | SDQ_SDQ_Emotional_Problems | SDQ_SDQ_Externalizing |
+-------+--------------------------+----------------------------+----------------------------+-----------------------+
| count | 1213.0 | 1213.0 | 1213.0 | 1213.0 |
| mean | 2.059 | 12.123 | 2.308 | 7.557 |
| std | 2.023 | 6.577 | 2.168 | 4.167 |
| min | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 0.0 | 7.0 | 1.0 | 4.0 |
| 50% | 2.0 | 12.0 | 2.0 | 7.0 |
| 75% | 3.0 | 17.0 | 4.0 | 10.0 |
| max | 10.0 | 34.0 | 10.0 | 20.0 |
+-------+--------------------------+----------------------------+----------------------------+-----------------------+
+-------+---------------------------+-----------------------+-----------------------+-----------------------+-------------------+
| | SDQ_SDQ_Generating_Impact | SDQ_SDQ_Hyperactivity | SDQ_SDQ_Internalizing | SDQ_SDQ_Peer_Problems | SDQ_SDQ_Prosocial |
+-------+---------------------------+-----------------------+-----------------------+-----------------------+-------------------+
| count | 1213.0 | 1213.0 | 1213.0 | 1213.0 | 1213.0 |
| mean | 4.073 | 5.498 | 4.566 | 2.258 | 7.683 |
| std | 2.82 | 2.837 | 3.52 | 2.09 | 2.19 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 2.0 | 4.0 | 2.0 | 0.0 | 6.0 |
| 50% | 4.0 | 6.0 | 4.0 | 2.0 | 8.0 |
| 75% | 6.0 | 8.0 | 7.0 | 4.0 | 10.0 |
| max | 10.0 | 10.0 | 17.0 | 9.0 | 10.0 |
+-------+---------------------------+-----------------------+-----------------------+-----------------------+-------------------+
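Under standard SDQ scoring, three of the nine SDQ columns are sums of the others: Externalizing = Conduct Problems + Hyperactivity, Internalizing = Emotional Problems + Peer Problems, and Total Difficulties = the sum of those four subscales. The sketch below checks these identities on the first five participants from the table shown earlier; it is a sanity check on a handful of rows, not a claim about every row in the dataset.

```python
import pandas as pd

# SDQ values for the first five participants, copied from the table above.
sdq = pd.DataFrame({
    "SDQ_SDQ_Conduct_Problems":   [0, 0, 1, 6, 1],
    "SDQ_SDQ_Difficulties_Total": [6, 18, 14, 24, 18],
    "SDQ_SDQ_Emotional_Problems": [1, 6, 2, 4, 4],
    "SDQ_SDQ_Externalizing":      [5, 8, 8, 16, 11],
    "SDQ_SDQ_Hyperactivity":      [5, 8, 7, 10, 10],
    "SDQ_SDQ_Internalizing":      [1, 10, 6, 8, 7],
    "SDQ_SDQ_Peer_Problems":      [0, 4, 4, 4, 3],
})

# Recompute each composite from its subscales.
externalizing = sdq["SDQ_SDQ_Conduct_Problems"] + sdq["SDQ_SDQ_Hyperactivity"]
internalizing = sdq["SDQ_SDQ_Emotional_Problems"] + sdq["SDQ_SDQ_Peer_Problems"]
total = externalizing + internalizing

checks = {
    "externalizing": bool((externalizing == sdq["SDQ_SDQ_Externalizing"]).all()),
    "internalizing": bool((internalizing == sdq["SDQ_SDQ_Internalizing"]).all()),
    "total": bool((total == sdq["SDQ_SDQ_Difficulties_Total"]).all()),
}
print(checks)
```

If the identities hold across the full dataset, the composite columns are linearly dependent on the subscales, which matters for linear models and for interpreting feature importances.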