Other Titles
- Genes Involved in Molecular Diseases
- Literature Curated Database of Genes and Disease Relationships
- Genetic Base of Disease
Keywords
- Toxicogenomics
- Gene Disease Association
- Gene Chemical Pathways
- Gene and Disease Relationship
- Heterogeneous Exposure Information
- Comparative Toxicogenomics Database
- Relationships Between Genes and Diseases
- Chemical and Disease Inferences
- Chemical Disease Hypotheses
Gene Disease Associations
This dataset contains relationships between genes and diseases. Each relationship is inferred because the gene and the disease independently share a relationship with the same chemical. The inferences are built from the curation of research publications, the construction of network diagrams, and statistical analysis.
Description
This dataset from the Comparative Toxicogenomics Database (CTD) provides several standardized identifiers for each gene and disease. These identifiers give cross-platform compatibility: they make it possible to look up the gene and the disease in major scientific databases and to locate the publications on which an inference is based. The dataset also provides an inference score that indicates how significant each inference is.
Chemicals are among the main environmental factors that influence health, and the way they cause disease is not fully understood. The purpose of the Comparative Toxicogenomics Database (CTD) is to provide a tool for generating new hypotheses about the role of chemicals in the development of disease. It does this by collecting curated data reported in the scientific literature on chemicals, genes and diseases, and by making inferences about the relationships among these three elements. The inferences are transitive: when, for example, a chemical and a disease both interact with one or more of the same genes, a relationship between the chemical and the disease is inferred, linked to a process or product of those genes. From this information one can hypothesize the mechanism by which the chemical acts on the gene to produce the disease, the genes linked to the disease, the pathophysiology of the disease, and other relationships. “For example, if chemical A interacts with gene B, and independently gene B is associated with disease C, then chemical A is inferred to have a relationship with disease C (via gene B).” (1)
Inferences can also run in other directions; for example, a gene and a disease can share the same group of chemicals. Some inferences have direct evidence, meaning that published research supports the relationship. Others have no direct evidence in the literature and can be used to create new testable hypotheses about the mechanism of disease, initiate new research on the relationship, and potentially predict disease treatment and prevention.
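The quoted chemical A, gene B, disease C example can be sketched in a few lines of Python. This is a minimal illustration of transitive inference only, not CTD's actual pipeline; the pair lists and the function name `infer_chemical_disease` are hypothetical and the records are invented for the example.

```python
from collections import defaultdict

# Hypothetical curated pairs, for illustration only (not real CTD records).
chemical_gene = [
    ("chemical A", "gene B"),
    ("chemical A", "gene D"),
    ("chemical X", "gene D"),
]
gene_disease = [
    ("gene B", "disease C"),
    ("gene D", "disease C"),
    ("gene D", "disease E"),
]


def infer_chemical_disease(chemical_gene, gene_disease):
    """Return {(chemical, disease): set of genes that support the inference}."""
    # Index diseases by gene so each chemical-gene pair is a single lookup.
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    inferred = defaultdict(set)
    for chemical, gene in chemical_gene:
        for disease in diseases_by_gene.get(gene, ()):
            # The chemical and the disease are linked "via" this shared gene.
            inferred[(chemical, disease)].add(gene)
    return dict(inferred)


inferences = infer_chemical_disease(chemical_gene, gene_disease)
for (chemical, disease), genes in sorted(inferences.items()):
    print(f"{chemical} -> {disease} via {sorted(genes)}")
```

Swapping the roles of the inputs (gene-chemical and chemical-disease pairs) would give gene-disease inferences through shared chemicals, which is the direction this dataset reports.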
The CTD datasets can be used to build a query tool that returns inferred relationships between genes, chemicals and diseases together with the significance of each inference. To prioritize inferences, CTD uses the inference score, which ranks how likely the inferred relationship is to be true. The score is derived from a network in which the chemicals, genes and diseases are nodes and the relationships between them (the inferences) are edges. The statistical analysis takes into account the number of nodes (genes, diseases or chemicals) that interact with the node of interest, the number of inferences with direct evidence, and the location of the node of interest in the network, using the hypergeometric clustering coefficient and common neighbor statistics. Inferences should then be ranked from highest to lowest inference score; those with the highest scores are the most significant.
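To make the network picture more concrete, the sketch below counts common neighbors for gene-disease pairs in a tiny graph and ranks the pairs by that count. It is a simplified stand-in under stated assumptions: the edges and node labels are hypothetical, and the count is not CTD's inference score, which also uses the hypergeometric clustering coefficient and is not reproduced here.

```python
# Hypothetical undirected edges; node labels are illustrative only.
edges = [
    ("gene G1", "chemical C1"),
    ("gene G1", "chemical C2"),
    ("gene G1", "chemical C3"),
    ("gene G2", "chemical C3"),
    ("disease D1", "chemical C1"),
    ("disease D1", "chemical C2"),
    ("disease D2", "chemical C3"),
]


def build_neighbors(edges):
    """Map each node to the set of nodes it shares an edge with."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def common_neighbors(neighbors, u, v):
    """Nodes adjacent to both u and v (here: chemicals shared by a gene and a disease)."""
    return neighbors.get(u, set()) & neighbors.get(v, set())


neighbors = build_neighbors(edges)
pairs = [(g, d) for g in ("gene G1", "gene G2") for d in ("disease D1", "disease D2")]

# Rank gene-disease pairs from most to fewest shared chemicals.
ranked = sorted(
    ((g, d, len(common_neighbors(neighbors, g, d))) for g, d in pairs),
    key=lambda item: item[2],
    reverse=True,
)
for gene, disease, shared in ranked:
    print(f"{gene} / {disease}: {shared} shared chemical(s)")
```

With the dataset itself, the same ranking step would simply sort records by the Inference_Score field in descending order.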
1. Davis AP, Grondin CJ, Johnson RJ, Sciaky D, King BL, McMorran R, Wiegers J, Wiegers TC, Mattingly CJ. The Comparative Toxicogenomics Database: update 2017. Nucleic Acids Res. 2016 Sep 19. [Epub ahead of print]
About this Dataset
Data Info
Date Created | 2004-01-20 |
---|---|
Last Modified | 2023-07-28 |
Version | 2023-07 |
Update Frequency | Monthly |
Temporal Coverage | N/A |
Spatial Coverage | N/A |
Source | John Snow Labs; Comparative Toxicogenomics Database |
Source License URL | |
Source License Requirements | Publicly available and free for research application but citation is required. Permission asked for commercial uses |
Source Citation | Publicly available and free for research application but citation is required. Permission asked for commercial uses |
Keywords | Toxicogenomics, Gene Disease Association, Gene Chemical Pathways, Gene and Disease Relationship, Heterogeneous Exposure Information, Comparative Toxicogenomics Database, Relationships Between Genes and Diseases, Chemical and Disease Inferences, Chemical Disease Hypotheses |
Other Titles | Genes Involved in Molecular Diseases, Literature Curated Database of Genes and Disease Relationships, Genetic Base of Disease |
Data Fields
Name | Description | Type | Constraints |
---|---|---|---|
Gene_Symbol | Short-form abbreviation of the name of the gene associated with the disease. The approved symbols for human genes are collected in the HUGO Gene Nomenclature Committee database; each name and symbol is unique to a gene and can also be applied to other species. | string | - |
Gene_ID | Unique identifier of the gene in the National Center for Biotechnology Information (NCBI) Entrez Gene database. This unique integer can be looked up in the Entrez system online to find the nomenclature, sequence, products and other details of the gene. The identifier is species-specific: the gene ID of a human gene cannot be applied to the same gene in a different species. | integer | level : Nominal; required : 1 |
Disease_Name | Name of the disease associated with the gene. | string | required : 1 |
Disease_ID | Unique identifier assigned to the disease by MeSH or OMIM, linked to the source record(s) for the disease. OMIM (Online Mendelian Inheritance in Man) is a database of human genes and genetic disorders that describes the type of genetic variation and expression; OMIM uses a six-digit identifier for each gene or genetic disorder. MeSH is a controlled vocabulary of thousands of biomedical terms (including diseases) that standardizes the terminology used in life sciences publications. Each MeSH term has a unique identifier, which can be 7 to 8 characters long. The MeSH unique identifier was changed to a length of 10 characters after November 2013. | string | required : 1 |
Direct_Evidence | Type of evidence for the association published in the scientific literature ('\|'-delimited list). Therapeutic association means that the gene's actions, its products, or modifications of the gene have been found to be a potential therapy for the disease. Marker or mechanism means that the gene has been found to play a role in the mechanism of disease development, or that a mutation in the gene serves as a marker for the disease. | string | - |
Inference_Chemical_Name | Name of the chemical through which the association between the gene and the disease was inferred. | string | - |
Inference_Score | Score for the likelihood of the inference. It is calculated using statistics that take into account the connectivity of the chemical with the disease, the number of genes used to make the inference, and the connectivity of each of those genes. The higher the score, the more likely the inference is true. | number | level : Ratio |
Omim_ID | Identification number(s) for the disease in the OMIM database ('\|'-delimited list). OMIM (Online Mendelian Inheritance in Man) is a database of human genes and genetic disorders that describes the type of genetic variation and expression; OMIM uses a six-digit identifier for each gene or genetic disorder. | string | - |
PubMed_ID | Identification number(s) of the publication(s) indexed in the PubMed database ('\|'-delimited list) that provide direct evidence of the chemical/gene association with the disease. PubMed is a US National Library of Medicine citation database that contains millions of abstracts, references and full-text links for biomedical literature from different trusted sources. | string | - |
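As a sketch of how these fields might be handled in practice, the snippet below parses one record taken from the Data Preview. It assumes every value arrives as a string (for example from a CSV export) and that Disease_ID carries a vocabulary prefix such as `MESH:`, as seen in the preview; the helper names `split_pipe_list`, `parse_disease_id` and `parse_record` are hypothetical.

```python
def split_pipe_list(value):
    """Split a '|'-delimited field into a list, dropping empty entries."""
    return [item for item in value.split("|") if item]


def parse_disease_id(disease_id):
    """Split 'MESH:D001724' into ('MESH', 'D001724').

    If there is no prefix, the vocabulary is returned as None (assumption).
    """
    vocabulary, separator, accession = disease_id.partition(":")
    if not separator:
        return None, disease_id
    return vocabulary, accession


def parse_record(record):
    """Convert the string values of one record into typed Python values."""
    return {
        "Gene_Symbol": record["Gene_Symbol"],
        "Gene_ID": int(record["Gene_ID"]),
        "Disease_Name": record["Disease_Name"],
        "Disease_ID": parse_disease_id(record["Disease_ID"]),
        "Direct_Evidence": split_pipe_list(record["Direct_Evidence"]),
        "Inference_Chemical_Name": record["Inference_Chemical_Name"],
        "Inference_Score": float(record["Inference_Score"]),
        "Omim_ID": split_pipe_list(record["Omim_ID"]),
        "PubMed_ID": split_pipe_list(record["PubMed_ID"]),
    }


# One record from the Data Preview, keyed by the field names listed above.
record = {
    "Gene_Symbol": "11-BETA-HSD3",
    "Gene_ID": "100174880",
    "Disease_Name": "Birth Weight",
    "Disease_ID": "MESH:D001724",
    "Direct_Evidence": "",
    "Inference_Chemical_Name": "Endocrine Disruptors",
    "Inference_Score": "12.27",
    "Omim_ID": "",
    "PubMed_ID": "27152464|29518214",
}

parsed = parse_record(record)
print(parsed["Disease_ID"], parsed["PubMed_ID"])
```

Empty Direct_Evidence and Omim_ID values become empty lists, so downstream code can treat the three '|'-delimited fields uniformly.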
Data Preview
Gene Symbol | Gene ID | Disease Name | Disease ID | Direct Evidence | Inference Chemical Name | Inference Score | Omim ID | PubMed ID |
---|---|---|---|---|---|---|---|---|
11-BETA-HSD3 | 100174880 | Abnormalities, Drug-Induced | MESH:D000014 | | Endocrine Disruptors | 5.22 | | 22659286 |
11-BETA-HSD3 | 100174880 | Amyotrophic Lateral Sclerosis | MESH:D000690 | | Water Pollutants, Chemical | 4.76 | | 33562464 |
11-BETA-HSD3 | 100174880 | Anemia | MESH:D000740 | | Water Pollutants, Chemical | 4.29 | | 26546277 |
11-BETA-HSD3 | 100174880 | Anemia, Hemolytic | MESH:D000743 | | Water Pollutants, Chemical | 4.59 | | 22425172 |
11-BETA-HSD3 | 100174880 | Asthenozoospermia | MESH:D053627 | | Water Pollutants, Chemical | 5.07 | | 25179371 |
11-BETA-HSD3 | 100174880 | Birth Weight | MESH:D001724 | | Endocrine Disruptors | 12.27 | | 27152464\|29518214 |
11-BETA-HSD3 | 100174880 | Birth Weight | MESH:D001724 | | Water Pollutants, Chemical | 12.27 | | 32321520 |
11-BETA-HSD3 | 100174880 | Body Weight | MESH:D001835 | | Water Pollutants, Chemical | 5.17 | | 32842613 |
11-BETA-HSD3 | 100174880 | Breast Neoplasms | MESH:D001943 | | Endocrine Disruptors | 8.8 | | 20646273 |
11-BETA-HSD3 | 100174880 | Breast Neoplasms | MESH:D001943 | | Water Pollutants, Chemical | 8.8 | | 20164002 |