Id: | gene_properties/gene_sets/PFAM_37.0_domains |
Type: | gene_set |
Version: | 0 |
Summary: |
PFAM 37.0 domains |
Description: |
Pfam (Protein Families) is a comprehensive database of protein families that includes their annotations and multiple sequence alignments generated using hidden Markov models (HMMs). Pfam domains are conserved regions within proteins that often correlate with specific structural or functional roles. Each domain in Pfam is represented by a curated alignment of related sequences and an HMM profile, which can be used to identify and analyze these domains in other proteins. This allows researchers to predict the function of uncharacterized proteins, study protein evolution, and understand protein domain architecture across different organisms.

Finn et al., Pfam: The protein families database, Nucleic Acids Research 2014
Mistry et al., Pfam: The protein families database in 2021, Nucleic Acids Research 2021

Processing Details
Downloaded the PFAM domain ID to gene name data from https://useast.ensembl.org/biomart/martview/ on 06/23/2024 and filtered it to include only genes with a PFAM ID (mart_export.xlsx).
Downloaded the PFAM domain ID to PFAM domain name data from https://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam37.0/Pfam-A.clans.tsv.gz on 06/23/2024 (Pfam-A.clans.xlsx).
PFAMdataprep.py merges the two files and prepares domain-map.txt, which lists the domains per gene, and domain-mapnames.txt, which lists the description of each domain. |
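The merge described in the processing details can be sketched roughly as follows. This is not the contents of PFAMdataprep.py, which is not reproduced on this page. The function name `build_domain_maps`, the input shapes (gene/domain pairs and an ID-to-description dictionary), and the tab-delimited line layout are all assumptions made for illustration.

```python
from collections import defaultdict


def build_domain_maps(gene_domain_rows, domain_names):
    """Hypothetical helper: group PFAM IDs per gene and collect descriptions.

    gene_domain_rows: iterable of (gene_name, pfam_id) pairs, e.g. rows read
        from the BioMart export (the column layout is an assumption).
    domain_names: dict mapping pfam_id -> description, e.g. built from the
        Pfam clans file (also an assumption).
    Returns (map_lines, name_lines) as lists of tab-delimited strings.
    """
    domains_per_gene = defaultdict(set)
    for gene, pfam_id in gene_domain_rows:
        # Mirror the "genes with PFAM id" filter: skip incomplete rows.
        if not gene or not pfam_id:
            continue
        domains_per_gene[gene].add(pfam_id)

    # One line per gene: the gene name followed by its sorted domain IDs.
    map_lines = [
        "\t".join([gene, *sorted(ids)])
        for gene, ids in sorted(domains_per_gene.items())
    ]

    # One line per domain that is actually used: ID, then its description.
    used_ids = sorted({i for ids in domains_per_gene.values() for i in ids})
    name_lines = [f"{i}\t{domain_names.get(i, '')}" for i in used_ids]
    return map_lines, name_lines
```

Writing each returned list to a file, one entry per line, would give files shaped like domain-map.txt and domain-mapnames.txt. The exact delimiter and ordering used by the real files may differ.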
Labels: |
Format: map
Web label: Protein domains
Web format: key| (|count|): |desc
Filename | Size | md5 |
---|---|---|
PFAMdataprep.py | 1.24 KB | 877182d5106bab82fd361d9112cac832 |
Pfam-A.clans.xlsx | 977.25 KB | a788714d5cf6b12ca575c78db5c248a1 |
domain-map.txt | 388.88 KB | 5a6886188467d28a85d4b5a91633169e |
domain-mapnames.txt | 934.78 KB | 61147b89e31e0b0dbf06da8fa99f6b75 |
genomic_resource.yaml | 1.78 KB | 63dde98a311f2228774533c649509f57 |
mart_export.xlsx | 603.91 KB | a28ced2cbc46699dbf66499fee45f534 |
statistics/ |
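The md5 values in the file listing can be used to check a downloaded copy. The sketch below uses only Python's standard `hashlib`. The helper names `md5_of_file` and `matches_listing` are hypothetical, and the local path passed in is whatever location the file was saved to.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, expected_md5):
    """Compare a local file against an md5 value copied from the listing."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `matches_listing("domain-map.txt", "5a6886188467d28a85d4b5a91633169e")` should return `True` for an intact copy of that file, assuming it was saved under that name in the working directory.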