Analyzing Parental Impact on FTO Gene Variation in the Framingham Study

This document outlines the steps for examining parental influence on FTO gene variation within the Framingham cohort.

Before proceeding, make sure the two VCF files generated by following the instructions in scripts/get_fto_vcf.md are available. Also set the working directory to /work/larylab/Mao_Ding/Van_Andel_epigenetics.

Execute scripts/extract_snp.R to extract the marker information for the 10 SNPs:

1.1 Look up the coordinates of the 10 SNPs in the UCSC Genome Browser (use the GRCh37/hg19 assembly).

1.2 Read the chromosome 16 marker file /work/larylab/dbgap/data3/phg000835.v5.FHS_SHARE_imputed_HRC1.marker-info.MULTI/chr16.info and write the filtered result to results/chr16_snp_filter.rds.
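The filtering in step 1.2 can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy table stands in for chr16.info, the column names (SNP, REF, ALT) and the "chromosome:position" marker format are guesses about the marker file layout, and the positions are placeholders, not the real coordinates from step 1.1.

```r
# Placeholder positions; replace with the hg19 coordinates found in step 1.1.
snp_positions <- c(2000L, 3000L)

# Toy stand-in for chr16.info (column names and marker format are assumptions).
marker_info <- data.frame(
  SNP = c("16:1000", "16:2000", "16:3000", "16:4000"),
  REF = c("A", "C", "T", "G"),
  ALT = c("G", "T", "A", "C"),
  stringsAsFactors = FALSE
)

# Derive the base-pair position from the marker name, then keep the SNPs of interest.
marker_info$POS <- as.integer(sub("^16:", "", marker_info$SNP))
chr16_snp_filter <- marker_info[marker_info$POS %in% snp_positions, ]

# In the real script the result would then be written with saveRDS(), e.g.
# saveRDS(chr16_snp_filter, "results/chr16_snp_filter.rds")
```

Matching on position rather than on rsID is a design choice for this sketch; it fits the case where the marker file names variants by coordinate.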

Execute scripts/extract_genotype.R to extract genotypes for individuals and their parents:

2.1 Read the two VCF files results/fto_c1.recode.vcf and results/fto_c2.recode.vcf.

2.2 Read results/chr16_snp_filter.rds and generate results/fto_c1_vcfr_tidy.RData, results/fto_c2_vcfr_tidy.RData, results/fto_c1_vcfr_tidy_filtered.RData, results/fto_c2_vcfr_tidy_filtered.RData, results/fto_c1_vcfr_tidy_filtered_genotype.RData, and results/fto_c2_vcfr_tidy_filtered_genotype.RData.

2.3 Read results/fto_c1_vcfr_tidy_filtered_genotype.RData and results/fto_c2_vcfr_tidy_filtered_genotype.RData generated above and create the individual genotype file results/genotype.rds.

2.4 Read the pedigree file data/share_ped_052517.csv, then merge in the father's and mother's genotypes to generate a comprehensive genotype file, results/geno_pedi_f_m.rds.

2.5 Save the current workspace as results/geno_pedi_f_m.RData.
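The pedigree merge in step 2.4 might look roughly like the sketch below. It uses toy data, and the column names (id, fa_id, mo_id, snp_a) and the _f/_m suffixes are hypothetical; the actual layout of data/share_ped_052517.csv and results/genotype.rds may differ.

```r
# Toy genotype table: one row per individual, one column per SNP.
genotype <- data.frame(
  id = c(1, 2, 3),
  snp_a = c("0|1", "1|1", "0|0"),
  stringsAsFactors = FALSE
)

# Toy pedigree: individual 3 is the child of father 1 and mother 2.
pedigree <- data.frame(id = 3, fa_id = 1, mo_id = 2)

# Copies of the genotype table keyed by father ID and mother ID.
father <- setNames(genotype, c("fa_id", "snp_a_f"))
mother <- setNames(genotype, c("mo_id", "snp_a_m"))

# Left joins keep every individual, including those without genotyped parents.
geno_pedi <- merge(genotype, pedigree, by = "id", all.x = TRUE)
geno_pedi_f_m <- merge(geno_pedi, father, by = "fa_id", all.x = TRUE)
geno_pedi_f_m <- merge(geno_pedi_f_m, mother, by = "mo_id", all.x = TRUE)
```

Individuals whose parents are absent from the genotype table end up with NA in the parental columns, which makes missing parental data easy to spot later.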

Execute scripts/extract_phenotype.R to extract phenotypes for individuals and their parents:

3.1 From dbgap/data9, read the four wkthru files: ex09, ex03, ex32, ex04.

3.2 Merge the phenotypes to create the phenotype file for all individuals, results/phenotype.rds.

3.3 Merge the pedigree file data/share_ped_052517.csv with the phenotype file results/phenotype.rds generated above to create the phenotype file for individuals and their parents, results/pheno_pedi_f_m.rds.

3.4 Save the current workspace as results/pheno_pedi_f_m.RData.

Execute scripts/merge_geno_pheno_snp.R to merge the genotype, phenotype, and SNP information for all individuals and their parents:

4.1 Read the vr_dates files from dbgap/data9 and save them as results/vr_dates.rds.

4.2 Merge results/geno_pedi_f_m.rds and results/pheno_pedi_f_m.rds to form results/geno_pheno_pedi_f_m.rds.

4.3 Merge in results/chr16_snp_filter.rds to generate a comprehensive summary including the genotype, phenotype, and information on the 10 SNPs for all individuals and their parents, saved as results/geno_pheno_pedi_f_m_snp.rds.

4.4 Save the cohort as results/cohort.rds

4.5 Save the current workspace as results/geno_pheno_pedi_f_m_snp.RData

Execute scripts/standard_association_test.R to fit linear models that do not account for the parent-of-origin effect:

5.1 Read the file results/geno_pheno_pedi_f_m.rds, extract individual observations (excluding parental information), encode categorical variables, apply a log transformation to BMI, and create a ready-to-use file named results/fto_geno_pheno.rds.

5.2 Generate a list of dataframes named fto_geno_pheno, where each dataframe represents the data from one SNP. Specify the reference allele for each SNP referring to results/chr16_snp_filter.rds. Finally, save the list as results/fto_geno_pheno_list.rds.

5.3 Iterate through the list to conduct a standard association test by fitting a linear model and performing a Tukey HSD test for each SNP.

5.4 This procedure generates three files: results/anova_df_indiv.rds, results/summary_df_indiv.rds, results/tukey_df_indiv.rds.
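The per-SNP loop in steps 5.2 and 5.3 could be sketched as follows. The data are simulated, the column names (genotype, log_bmi), the SNP name snp_a, and the genotype labels are hypothetical, and the model here contains only the genotype term; any covariates used in the actual script are not shown because the steps above do not list them.

```r
set.seed(1)
n <- 90

# Simulated data for one SNP; the first factor level acts as the reference genotype.
toy <- data.frame(
  genotype = factor(rep(c("T/T", "A/T", "A/A"), each = n / 3),
                    levels = c("T/T", "A/T", "A/A"))
)
toy$log_bmi <- log(25) + 0.02 * (as.integer(toy$genotype) - 1) + rnorm(n, sd = 0.15)

# One data frame per SNP, mirroring the list described in step 5.2.
fto_geno_pheno_list <- list(snp_a = toy)

# Fit the linear model, then collect the coefficient, ANOVA, and Tukey HSD tables.
fit_one <- function(df) {
  fit <- lm(log_bmi ~ genotype, data = df)
  list(
    summary = coef(summary(fit)),
    anova = anova(fit),
    tukey = TukeyHSD(aov(log_bmi ~ genotype, data = df), which = "genotype")$genotype
  )
}

results <- lapply(fto_geno_pheno_list, fit_one)
```

Setting the factor levels explicitly is what fixes the reference genotype, so every coefficient in the summary table is a contrast against that level.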

Execute scripts/paternal_association_test.R to conduct association tests for the paternal and maternal alleles separately:

6.1 Read results/geno_pheno_pedi_f_m.rds and confirm that the first allele comes from the father and the second allele comes from the mother.

6.2 Read results/fto_geno_pheno.rds and perform the parental association tests by fitting linear models.

6.3 This process generates one summary table and one ANOVA table for paternal tests (results/summary_df_pa.rds and results/anova_df_pa.rds, respectively), and similarly, one summary table and one ANOVA table for maternal tests (results/summary_df_ma.rds and results/anova_df_ma.rds, respectively).
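A minimal sketch of the parent-of-origin split follows. It assumes phased genotypes written as "paternal|maternal" (the allele order that step 6.1 is meant to confirm) with alleles coded 0/1; the toy values and the column names gt, log_bmi, paternal, and maternal are hypothetical.

```r
# Toy phased genotypes, first allele paternal, second allele maternal (assumed).
toy <- data.frame(
  gt = c("0|1", "1|0", "1|1", "0|0", "0|1", "1|0"),
  log_bmi = log(c(24, 27, 30, 22, 25, 28)),
  stringsAsFactors = FALSE
)

# Split each genotype on the literal "|" separator into two allele columns.
alleles <- do.call(rbind, strsplit(toy$gt, "|", fixed = TRUE))
toy$paternal <- as.integer(alleles[, 1])
toy$maternal <- as.integer(alleles[, 2])

# Separate linear models for the paternally and maternally inherited allele.
fit_pa <- lm(log_bmi ~ paternal, data = toy)
fit_ma <- lm(log_bmi ~ maternal, data = toy)

summary_pa <- coef(summary(fit_pa))
anova_pa <- anova(fit_pa)
summary_ma <- coef(summary(fit_ma))
anova_ma <- anova(fit_ma)
```

Passing fixed = TRUE to strsplit() matters here, because "|" would otherwise be interpreted as a regular-expression alternation.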

Execute scripts/Framingham_FTO_Analysis.Rmd to present the results:

7.1 Read results/cohort.rds and display histograms of BMI and log_BMI.

7.2 Print and save the characteristics table as results/cohort_characteristic.docx.

7.3 Read results/anova_df_indiv.rds, results/summary_df_indiv.rds and results/tukey_df_indiv.rds to present standard association test results.

7.4 Read results/summary_df_pa.rds and results/anova_df_pa.rds to present the paternal association test results.

7.5 Read results/summary_df_ma.rds and results/anova_df_ma.rds to present the maternal association test results.

7.6 Generate an HTML file, scripts/Framingham_FTO_Analysis.html.