Before proceeding, make sure you have the two VCF files generated according to the instructions in scripts/get_fto_vcf.md. Also set the working directory to work/larylab/Mao_Ding/Van_Andel_epigenetics.
1. Use scripts/extract_snp.R to filter the information for the 10 SNPs:
1.1 Look up the coordinates of the 10 SNPs in the UCSC Genome Browser (use the GRCh37/hg19 assembly).
1.2 Read the chromosome 16 marker file /work/larylab/dbgap/data3/phg000835.v5.FHS_SHARE_imputed_HRC1.marker-info.MULTI/chr16.info and write the filtered output to results/chr16_snp_filter.rds.
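As a rough illustration of step 1, the filter can be sketched in base R as follows. The column names (SNP, POS, REF, ALT), the toy marker table, and the target positions are all hypothetical placeholders; the real chr16.info layout may differ, and this is not the content of scripts/extract_snp.R.

```r
# Toy stand-in for the chromosome 16 marker file. The real file would be read
# from disk, e.g. with read.table(path, header = TRUE); its layout is assumed.
marker_info <- data.frame(
  SNP = c("16:1000", "16:2000", "16:3000", "16:4000"),
  POS = c(1000L, 2000L, 3000L, 4000L),
  REF = c("A", "C", "G", "T"),
  ALT = c("G", "T", "A", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical hg19 positions of the SNPs of interest (placeholders, not real
# FTO coordinates).
target_pos <- c(2000L, 4000L)

# Keep only the markers whose position matches one of the target SNPs.
snp_filter <- marker_info[marker_info$POS %in% target_pos, ]

# Persist the filtered table as a single R object (a temp file is used here
# instead of results/chr16_snp_filter.rds).
out_file <- tempfile(fileext = ".rds")
saveRDS(snp_filter, out_file)
```

Matching on position rather than on rsID is a design choice for this sketch; it only makes sense if the marker file and the looked-up coordinates use the same genome assembly.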
2. Use scripts/extract_genotype.R to extract genotypes for individuals and their parents:
2.1 Read the two VCF files results/fto_c1.recode.vcf and results/fto_c2.recode.vcf.
2.2 Read results/chr16_snp_filter.rds and generate results/fto_c1_vcfr_tidy.RData, results/fto_c2_vcfr_tidy.RData, results/fto_c1_vcfr_tidy_filtered.RData, results/fto_c2_vcfr_tidy_filtered.RData, results/fto_c1_vcfr_tidy_filtered_genotype.RData, and results/fto_c2_vcfr_tidy_filtered_genotype.RData.
2.3 Read the two filtered genotype files generated above (results/fto_c1_vcfr_tidy_filtered_genotype.RData and results/fto_c2_vcfr_tidy_filtered_genotype.RData) and create the individual genotype file results/genotype.rds.
2.4 Read the pedigree file data/share_ped_052517.csv, then merge in the father's and mother's genotypes to generate the comprehensive genotype file results/geno_pedi_f_m.rds.
2.5 Save the current workspace as results/geno_pedi_f_m.RData.
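The parental merge in step 2.4 can be sketched with base R merge() as below. This is a minimal sketch under assumptions: the column names (id, father_id, mother_id, snp1) and the toy data are hypothetical, and the real pedigree and genotype files will have different layouts.

```r
# Toy genotype table: one row per person, one column per SNP.
genotype <- data.frame(
  id   = c(1, 2, 3),
  snp1 = c("0|1", "0|0", "1|1"),
  stringsAsFactors = FALSE
)

# Toy pedigree: person 3 is the child of father 1 and mother 2;
# persons 1 and 2 have no parents in the data.
pedigree <- data.frame(
  id        = c(1, 2, 3),
  father_id = c(NA, NA, 1),
  mother_id = c(NA, NA, 2)
)

# Start from the pedigree plus each person's own genotype.
geno_pedi <- merge(pedigree, genotype, by = "id", all.x = TRUE)

# Attach the father's genotype by renaming the genotype table's columns so
# that its key lines up with father_id.
father <- setNames(genotype, c("father_id", "snp1_father"))
geno_pedi <- merge(geno_pedi, father, by = "father_id", all.x = TRUE)

# Attach the mother's genotype the same way.
mother <- setNames(genotype, c("mother_id", "snp1_mother"))
geno_pedi_f_m <- merge(geno_pedi, mother, by = "mother_id", all.x = TRUE)
```

Using all.x = TRUE keeps individuals whose parents were not genotyped; their parental columns are simply NA.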
3. Use scripts/extract_phenotype.R to extract phenotypes for individuals and their parents:
3.1 From dbgap/data9, read the 4 wkthru files: ex09, ex03, ex32, and ex04.
3.2 Merge the phenotypes to create the phenotype file for all individuals, results/phenotype.rds.
3.3 Merge the pedigree file data/share_ped_052517.csv with the phenotype file generated above (results/phenotype.rds) to create the phenotype file for individuals and their parents, results/pheno_pedi_f_m.rds.
3.4 Save the current workspace as results/pheno_pedi_f_m.RData.
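One possible shape for step 3.2 is sketched below, assuming each input file covers a different set of participants and shares the variables of interest, so the tables are stacked rather than joined. That assumption, the column names, and the toy values are hypothetical; if the files instead described the same participants, a key-based merge() on the participant id would be the closer analogue.

```r
# Toy per-file tables; note the columns are not in the same order.
file_a <- data.frame(id = c(1, 2), bmi = c(22.5, 27.1), age = c(40, 55))
file_b <- data.frame(id = c(3, 4), age = c(62, 35), bmi = c(31.4, 24.0))

tables <- list(a = file_a, b = file_b)

# Keep only the columns present in every table, in one fixed order.
shared <- Reduce(intersect, lapply(tables, names))

# Tag each row with the table it came from, then stack all tables.
stacked <- do.call(rbind, lapply(names(tables), function(nm) {
  data.frame(source = nm, tables[[nm]][, shared], stringsAsFactors = FALSE)
}))
```

Selecting the shared columns before rbind() avoids errors when a variable exists in only some of the files.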
4. Use scripts/merge_geno_pheno_snp.R to merge the genotype, phenotype, and SNP information for all individuals and their parents:
4.1 Read the vr_dates files from dbgap/data9 and save them as results/vr_dates.rds.
4.2 Merge results/geno_pedi_f_m.rds and results/pheno_pedi_f_m.rds to form results/geno_pheno_pedi_f_m.rds.
4.3 Combine the result with results/chr16_snp_filter.rds to generate a comprehensive summary including genotype, phenotype, and information on the 10 SNPs for all individuals and their parents, saved as results/geno_pheno_pedi_f_m_snp.rds.
4.4 Save the cohort as results/cohort.rds.
4.5 Save the current workspace as results/geno_pheno_pedi_f_m_snp.RData.
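Several steps write both an .rds file and an .RData workspace. The difference between the two formats can be sketched as follows; the object name snp_table and the temp files are illustrative only.

```r
snp_table <- data.frame(snp = c("snpA", "snpB"), pos = c(100L, 200L))

rds_file   <- tempfile(fileext = ".rds")
rdata_file <- tempfile(fileext = ".RData")

# An .rds file stores exactly one object without its name; whoever reads it
# decides what to call it.
saveRDS(snp_table, rds_file)
restored <- readRDS(rds_file)

# An .RData file stores one or more named objects; load() recreates them
# under their original names and returns those names. save.image() is the
# same mechanism applied to the whole global workspace.
save(snp_table, file = rdata_file)
env <- new.env()
loaded_names <- load(rdata_file, envir = env)
```

Loading into a fresh environment, as done here, avoids silently overwriting objects that already exist in the session.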
5. Use scripts/standard_association_test.R to fit linear models without considering the parent-of-origin effect:
5.1 Read results/geno_pheno_pedi_f_m.rds, extract the individual observations (excluding parental information), encode the categorical variables, apply a log transformation to BMI, and create a ready-to-use file named results/fto_geno_pheno.rds.
5.2 Generate a list of data frames named fto_geno_pheno, where each data frame holds the data for one SNP. Specify the reference allele for each SNP by referring to results/chr16_snp_filter.rds. Finally, save the list as results/fto_geno_pheno_list.rds.
5.3 Iterate through the list to conduct a standard association test by fitting a linear model and performing a Tukey HSD test for each SNP.
5.4 This procedure generates three files: results/anova_df_indiv.rds, results/summary_df_indiv.rds, and results/tukey_df_indiv.rds.
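For a single SNP, the test in step 5.3 might look roughly like the sketch below. The data are simulated, the genotype labels and the choice of "TT" as the reference group are made up, and no covariates are included, so this shows only the general lm / anova / TukeyHSD pattern, not the actual model in scripts/standard_association_test.R.

```r
set.seed(1)

# Simulated data: three genotype groups with log-normally distributed BMI.
n <- 60
dat <- data.frame(
  genotype = factor(rep(c("TT", "TA", "AA"), each = n / 3)),
  bmi      = exp(rnorm(n, mean = log(27), sd = 0.15))
)
dat$log_bmi <- log(dat$bmi)

# Make the reference-homozygous group the baseline level of the factor.
dat$genotype <- relevel(dat$genotype, ref = "TT")

# Linear model of log(BMI) on genotype.
fit <- lm(log_bmi ~ genotype, data = dat)
summary_df <- as.data.frame(coef(summary(fit)))  # per-coefficient estimates
anova_df   <- as.data.frame(anova(fit))          # overall genotype effect

# TukeyHSD() needs an aov fit; it compares every pair of genotype groups.
tukey_fit <- TukeyHSD(aov(log_bmi ~ genotype, data = dat))
tukey_df  <- as.data.frame(tukey_fit$genotype)
```

Wrapping these lines in a function and applying it with lapply() over the list of per-SNP data frames would give one set of tables per SNP.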
6. Use scripts/paternal_association_test.R to conduct association tests for the paternal and maternal alleles separately:
6.1 Read results/geno_pheno_pedi_f_m.rds and confirm that the first allele comes from the father and the second allele comes from the mother.
6.2 Read results/fto_geno_pheno.rds and perform the parental association tests by fitting linear models.
6.3 This process generates one summary table and one ANOVA table for the paternal tests (results/summary_df_pa.rds and results/anova_df_pa.rds, respectively) and, similarly, one summary table and one ANOVA table for the maternal tests (results/summary_df_ma.rds and results/anova_df_ma.rds, respectively).
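The idea behind step 6 can be sketched as follows, assuming genotypes are stored as phased strings of the form "paternal|maternal" with alleles coded 0 and 1. The string format, the column names, and the simulated outcome are assumptions for illustration; the real script may encode alleles differently and include covariates.

```r
set.seed(2)

# Simulated phased genotypes ("paternal|maternal") and a simulated outcome.
n <- 80
dat <- data.frame(
  gt      = rep(c("0|0", "0|1", "1|0", "1|1"), times = n / 4),
  log_bmi = rnorm(n, mean = log(27), sd = 0.15),
  stringsAsFactors = FALSE
)

# Split each phased genotype into its two alleles.
alleles <- do.call(rbind, strsplit(dat$gt, "|", fixed = TRUE))
dat$paternal <- as.integer(alleles[, 1])  # first allele: from the father
dat$maternal <- as.integer(alleles[, 2])  # second allele: from the mother

# Fit one model per parental allele.
fit_pa <- lm(log_bmi ~ paternal, data = dat)
fit_ma <- lm(log_bmi ~ maternal, data = dat)

# Coefficient and ANOVA tables, one pair per parent of origin.
summary_pa <- coef(summary(fit_pa))
anova_pa   <- anova(fit_pa)
summary_ma <- coef(summary(fit_ma))
anova_ma   <- anova(fit_ma)
```

Note the fixed = TRUE in strsplit(): without it, "|" would be read as a regular-expression alternation rather than a literal separator.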
7. Use scripts/Framingham_FTO_Analysis.Rmd to present the results:
7.1 Read results/cohort.rds to display histograms of BMI and log_BMI.
7.2 Print the characteristics table and save it as results/cohort_characteristic.docx.
7.3 Read results/anova_df_indiv.rds, results/summary_df_indiv.rds, and results/tukey_df_indiv.rds to present the standard association test results.
7.4 Read results/summary_df_pa.rds and results/anova_df_pa.rds to present the paternal association test results.
7.5 Read results/summary_df_ma.rds and results/anova_df_ma.rds to present the maternal association test results.
7.6 Generate the HTML file scripts/Framingham_FTO_Analysis.html.