Type ID Version Number of files Size in bytes (total) Summary
samocha_enrichment_background enrichment/samocha_background 0 4 1.38 MB ## Samocha's Enrichment Background Model From Samocha KE, et al. A framework for the interpretation of de novo mutation in human disease. Nat Genet. 2014 Sep;46(9):944-50. doi: 10.1038/ng.3050. Epub 2014 Aug 3. PMID: 25086666; PMCID: PMC4222185.
gene_score gene_properties/gene_scores/Iossifov_Wigler_PNAS_2015 0 6 7.8 MB Probability of a gene's association with autism, computed from the gene's vulnerability to damaging coding mutations and the load of damaging de novo mutations in individuals with autism.
gene_score gene_properties/gene_scores/LGD 0 7 533.06 KB Gene vulnerability/intolerance score based on rare LGD variants.
gene_score gene_properties/gene_scores/RVIS 0 7 451.29 KB The Residual Variation Intolerance Score (RVIS), a gene intolerance score.
gene_score gene_properties/gene_scores/SFARI_gene_score 0 6 124.38 KB
gene_score gene_properties/gene_scores/Satterstrom_Buxbaum_Cell_2020 0 6 613.41 KB TADA derived gene autism association score.
gene_score gene_properties/gene_scores/pLI 0 7 893.56 KB
gene_score gene_properties/gene_scores/pRec 0 7 895.39 KB
gene_set gene_properties/gene_sets/GO 0 4 12.07 MB
gene_set gene_properties/gene_sets/MSigDB_curated 0 3 3.12 MB
gene_set gene_properties/gene_sets/autism 0 7 4.37 KB
gene_set gene_properties/gene_sets/disease 0 3 66.76 KB
gene_set gene_properties/gene_sets/domain 0 3 391.19 KB
gene_set gene_properties/gene_sets/main 0 17 103.89 KB
gene_set gene_properties/gene_sets/miRNA 0 4 1.6 MB
gene_set gene_properties/gene_sets/miRNA_Darnell 0 3 95.15 KB
gene_set gene_properties/gene_sets/relevant 0 15 101.75 KB
gene_set gene_properties/gene_sets/sfari 0 11 11.53 KB
gene_set gene_properties/gene_sets/spark 0 5 1.22 KB
gene_score hg19/enrichment/coding_length_in_target_ref_gene_v20190211 0 5 106.76 KB Coding length in target enrichment background using refGene gene models for HG19 from 20190211. Target regions are from the SSC WES study.
gene_score hg19/enrichment/coding_length_ref_gene_v20190211 0 5 157.76 KB Coding length enrichment background using refGene gene models for HG19 from 20190211.
gene_models hg19/gene_models/ccds_v201309 0 5 2.56 MB
gene_models hg19/gene_models/knownGene_v201304 0 5 5.49 MB
gene_models hg19/gene_models/refGeneMito_v201309 0 5 3.98 MB
gene_models hg19/gene_models/refGene_v201309 0 5 3.97 MB
gene_models hg19/gene_models/refGene_v20190211 0 5 5.47 MB
genome hg19/genomes/GATK_ResourceBundle_5777_b37_phiX174 0 93 2.94 GB ## HG19 Reference Genome Default HG19 reference genome used by GPF
np_score hg19/scores/CADD 0 10 79.37 GB CADD score for functional prediction of a SNP. Please refer to Kircher et al. (2014) Nature Genetics 46(3):310-5 for details. The larger the score, the more likely the SNP has a damaging effect.
position_score hg19/scores/FitCons-i6-merged 0 8 105.17 MB The fitCons score predicts the fraction of genomic positions belonging to a specific function class (defined by an epigenomic "fingerprint") that are under selective pressure. Scores range from 0 to 1; a larger score indicates that a higher proportion of nucleotide sites in the functional class to which the genomic position belongs are under selective pressure, so the position is more likely to be functionally important. Integrated (i6) scores are integrated across three cell types (GM12878, H1-hESC and HUVEC). More details can be found in doi:10.1038/ng.3196.
position_score hg19/scores/FitCons2_E035 0 8 291.52 MB FitCons2 score computed for the Primary haematopoietic stem cells (HSCs) (E035).
position_score hg19/scores/FitCons2_E067 0 8 260.97 MB FitCons2 score computed for the Brain Angular Gyrus (E067) tissue.
position_score hg19/scores/FitCons2_E068 0 8 270.23 MB FitCons2 score computed for the Brain Anterior Caudate (E068) tissue.
position_score hg19/scores/FitCons2_E069 0 8 262.13 MB FitCons2 score computed for the Brain Cingulate Gyrus (E069) tissue.
position_score hg19/scores/FitCons2_E070 0 8 262.32 MB FitCons2 score computed for the Brain Germinal Matrix (E070) tissue.
position_score hg19/scores/FitCons2_E071 0 8 255.46 MB FitCons2 score computed for the Brain Hippocampus Middle (E071) tissue.
position_score hg19/scores/FitCons2_E072 0 8 257.61 MB FitCons2 score computed for the Brain Inferior Temporal Lobe (E072) tissue.
position_score hg19/scores/FitCons2_E073 0 8 266.95 MB FitCons2 score computed for the Brain Dorsolateral Prefrontal Cortex (E073) tissue.
position_score hg19/scores/FitCons2_E074 0 8 262.12 MB FitCons2 score computed for the Brain Substantia Nigra (E074) tissue.
position_score hg19/scores/FitCons2_E081 0 8 276.04 MB FitCons2 score computed for the Fetal Brain Male (E081) tissue.
position_score hg19/scores/FitCons2_E082 0 8 278.88 MB FitCons2 score computed for the Fetal Brain Female (E082) tissue.
position_score hg19/scores/Linsight 0 8 1.35 GB LINSIGHT improves the prediction of noncoding nucleotide sites at which mutations are likely to have deleterious fitness consequences, and which, therefore, are likely to be phenotypically important. LINSIGHT combines a generalized linear model for functional genomic data with a probabilistic model of molecular evolution. Huang, YF., Gulko, B. & Siepel, A. Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data. *Nat Genet* **49**, 618-624 (2017). https://doi.org/10.1038/ng.3810
np_score hg19/scores/MPC 0 9 2.26 GB A deleteriousness prediction score for missense variants based on regional missense constraint. The range of MPC score is 0 to 5. The larger the score, the more likely the variant is pathogenic. Given increasing numbers of patients who are undergoing exome or genome sequencing, it is critical to establish tools and methods to interpret the impact of genetic variation. While the ability to predict deleteriousness for any given variant is limited, missense variants remain a particularly challenging class of variation to interpret, since they can have drastically different effects depending on both the precise location and specific amino acid substitution of the variant. In order to better evaluate missense variation, we leveraged the exome sequencing data of 60,706 individuals from the Exome Aggregation Consortium (ExAC) dataset to identify sub-genic regions that are depleted of missense variation. We further used this depletion as part of a novel missense deleteriousness metric named MPC. We applied MPC to de novo missense variants and identified a category of de novo missense variants with the same impact on neurodevelopmental disorders as truncating mutations in intolerant genes, supporting the value of incorporating regional missense constraint in variant interpretation. For details, see doi: http://dx.doi.org/10.1101/148353.
position_score hg19/scores/phastCons46_placentals 0 8 10.55 GB phastCons46_placentals is a conservation score based on the placental mammal subset of species. The larger the score, the more conserved the site. Scores range from 0 to 1.
position_score hg19/scores/phastCons46_primates 0 8 14.02 GB phastCons46_primates is a conservation score based on the primates subset of species. The larger the score, the more conserved the site. Scores range from 0 to 1.
position_score hg19/scores/phastCons46_vertebrates 0 8 10.81 GB phastCons46_vertebrates is a conservation score based on multiple alignments of 45 vertebrate genomes to the human genome. The larger the score, the more conserved the site. Scores range from 0 to 1.
position_score hg19/scores/phyloP46_placentals 0 8 14.62 GB phyloP (phylogenetic p-values) conservation score based on the multiple alignments of the placental mammal species in the phyloP46way alignment. The higher the score, the more conserved the site.
position_score hg19/scores/phyloP46_primates 0 8 10.81 GB phyloP (phylogenetic p-values) conservation score based on the multiple alignments of the primate species in the phyloP46way alignment. The higher the score, the more conserved the site.
position_score hg19/scores/phyloP46_vertebrates 0 8 14.72 GB phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 45 vertebrate genomes to the human genome. The higher the score, the more conserved the site.
allele_score hg19/variant_frequencies/gnomAD_v2.1.1/exomes 0 32 959.44 MB gnomAD exomes v2.1.1 variants built from ~260,000 whole exome samples published by the Broad Institute.
allele_score hg19/variant_frequencies/gnomAD_v2.1.1/genomes 0 31 12.27 GB gnomAD genomes v2.1.1 variants built from ~32,000 whole genome sequencing samples published by the Broad Institute.
gene_score hg38/enrichment/coding_length_ref_gene_v20170601 0 5 154.79 KB Coding length enrichment background using refGene gene models for HG38 from 20170601.
gene_score hg38/enrichment/ur_synonymous_AGRE_WG38_859 0 5 168.8 KB Ultra rare synonymous enrichment background built from AGRE WGS CSHL.
gene_score hg38/enrichment/ur_synonymous_SFARI_SSC_WGS_2 0 5 180.76 KB Ultra rare synonymous enrichment background built from SFARI SSC WGS NYGC.
gene_score hg38/enrichment/ur_synonymous_SFARI_SSC_WGS_CSHL 0 5 180.24 KB Ultra rare synonymous enrichment background built from SFARI SSC WGS CSHL.
gene_score hg38/enrichment/ur_synonymous_iWES_v1_1 0 5 186.89 KB Ultra rare synonymous enrichment background built from SPARK iWES v1.1.
gene_score hg38/enrichment/ur_synonymous_iWES_v2 0 5 190.01 KB Ultra rare synonymous enrichment background built from SPARK iWES v2.
gene_score hg38/enrichment/ur_synonymous_iWGS_v1_1 0 5 187.1 KB Ultra rare synonymous enrichment background built from SPARK iWGS v1.1.
gene_score hg38/enrichment/ur_synonymous_w1202s766e611_liftover 0 5 172.46 KB Ultra rare synonymous enrichment background built from SFARI SSC WES CSHL liftover.
gene_models hg38/gene_models/refGene_v20170601 0 5 5.42 MB ## refSeq gene models for HG38 from 20170601
gene_models hg38/gene_models/refSeq_v20200330 0 5 4.15 MB ## refSeq gene models for HG38 from 2020-03 Default gene models used by GPF for HG38.
genome hg38/genomes/GRCh38-hg38 0 3375 3.04 GB ## HG38 Reference Genome Default HG38 reference genome used by GPF
np_score hg38/scores/CADD_v1.4 0 18 79.42 GB CADD score for functional prediction of a SNP. Please refer to Kircher et al. (2014) Nature Genetics 46(3):310-5 for details. The larger the score, the more likely the SNP has a damaging effect.
np_score hg38/scores/CADD_v1.6 0 12 80.65 GB CADD score for functional prediction of a SNP. Please refer to Kircher et al. (2014) Nature Genetics 46(3):310-5 for details. The larger the score, the more likely the SNP has a damaging effect.
allele_score hg38/scores/clinvar_20221105 0 44 115.55 MB ClinVar resource downloaded on 2022-11-05. Chromosome names are remapped to have `chr` prefix. ClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation. ClinVar processes submissions reporting variants found in patient samples, assertions made regarding their clinical significance, information about the submitter, and other supporting data. The alleles described in submissions are mapped to reference sequences, and reported according to the HGVS standard. ClinVar then presents the data for interactive users as well as those wishing to use ClinVar in daily workflows and other local applications. ClinVar works in collaboration with interested organizations to meet the needs of the medical genetics community as efficiently and effectively as possible
position_score hg38/scores/phastCons100way 0 8 10.15 GB phastCons100way is a conservation score based on multiple alignments of 99 vertebrate genomes to the human genome. The larger the score, the more conserved the site. Scores range from 0 to 1.
position_score hg38/scores/phastCons20way 0 8 13.6 GB phastCons20way is a conservation score based on multiple alignments of 19 vertebrate genomes to the human genome. The larger the score, the more conserved the site. Scores range from 0 to 1.
position_score hg38/scores/phastCons30way 0 8 12.97 GB phastCons30way is a conservation score based on multiple alignments of 29 genomes to the human genome. The larger the score, the more conserved the site. Scores range from 0 to 1.
position_score hg38/scores/phastCons7way 0 11 14.64 GB phastCons7way is a conservation score based on multiple alignments of 6 vertebrate genomes to the human genome. The larger the score, the more conserved the site. Scores range from 0 to 1.
position_score hg38/scores/phyloP100way 0 8 16.1 GB phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 100 vertebrate genomes (including human).
position_score hg38/scores/phyloP20way 0 8 14.56 GB phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 19 genome sequences to the human genome. The higher the score, the more conserved the site. Scores range from -14.191 to 1.199.
position_score hg38/scores/phyloP30way 0 8 14.72 GB phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 29 genome sequences to the human genome. The higher the score, the more conserved the site. Scores range from -20.000 to 1.312.
position_score hg38/scores/phyloP7way 0 8 12.54 GB phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 6 vertebrate genomes to the human genome. The higher the score, the more conserved the site. Scores range from -5.220 to 1.062.
allele_score hg38/variant_frequencies/SSC_WG38_CSHL_2380 0 12 783.71 MB TODO exported from SFARI_SSC_WGS_CSHL using `gpf_validation_data/data_hg38/exports/SFARI_SSC_WGS_CSHL_frequency` scripts
allele_score hg38/variant_frequencies/gnomAD_v2.1.1_liftover/exomes 0 35 928.26 MB Liftover of gnomAD exomes v2.1.1 to hg38 published by the Broad Institute.
allele_score hg38/variant_frequencies/gnomAD_v2.1.1_liftover/genomes 0 35 11.53 GB ## gnomAD genomes v2.1.1 liftover The original gnomAD genomes v2.1.1 liftover was downloaded on October 19, 2020 from https://gnomad.broadinstitute.org/. `tabix -s 1 -b 2 -e 2 -f gnomad..r2.1.1.extract.tsv.gz`
allele_score hg38/variant_frequencies/gnomAD_v3/genomes 0 18 17.64 GB gnomAD v3.0 variants built from ~150,000 samples with whole genome sequence data.
liftover_chain liftover/hg19ToHg38 0 6 450.19 KB ## Liftover Chain Hg19 to Hg38
liftover_chain liftover/hg38ToHg19 0 6 2.4 MB
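
Each row of the listing above follows the same column order: type, ID, version, number of files, total size, and an optional summary. As a minimal sketch for working with a plain-text copy of this listing, the Python below splits a row into those fields and converts the human-readable size to bytes. The names `Resource`, `parse_row` and `UNITS` are hypothetical helpers introduced for illustration, not part of any GPF API, and the sketch assumes the sizes use decimal units (1 KB = 1000 bytes), which the listing itself does not state.

```python
import re
from typing import NamedTuple

# Assumption: sizes are reported in decimal units (1 KB = 1000 bytes).
UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9}

# Columns: type, ID, version, number of files, size, unit, optional summary.
ROW = re.compile(
    r"^(?P<kind>\S+)\s+(?P<rid>\S+)\s+(?P<version>\d+)\s+(?P<files>\d+)\s+"
    r"(?P<size>\d+(?:\.\d+)?)\s+(?P<unit>KB|MB|GB|B)(?:\s+(?P<summary>.*))?$"
)


class Resource(NamedTuple):
    """One row of the listing (hypothetical record type for this sketch)."""
    kind: str
    resource_id: str
    version: int
    files: int
    size_bytes: int
    summary: str


def parse_row(line: str) -> Resource:
    """Parse a single flattened table row into a Resource record."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized row: {line!r}")
    size_bytes = round(float(match["size"]) * UNITS[match["unit"]])
    return Resource(
        kind=match["kind"],
        resource_id=match["rid"],
        version=int(match["version"]),
        files=int(match["files"]),
        size_bytes=size_bytes,
        summary=match["summary"] or "",  # rows without a summary yield ""
    )


if __name__ == "__main__":
    row = "liftover_chain liftover/hg19ToHg38 0 6 450.19 KB ## Liftover Chain Hg19 to Hg38"
    print(parse_row(row))
```

The header row does not match the pattern and raises `ValueError`, so skip it before parsing. Grouping the resulting records by `kind` gives a quick per-type total of files and size.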