Other titles
- Diseases and Genes Engaged After Chemical Exposure
- Diseases and Genes Affected by Chemical Exposure
- Chemical Disease Association Disorder Type
Keywords
- Toxicogenomics
- Gene Disease Association
- Gene Chemical Pathways
- Comparative Toxicogenomics Database
- Relationships Between Chemicals and Diseases
- Chemical and Disease Inferences
- Chemical Disease Hypotheses
Chemical Disease Associations
This dataset contains inferred relationships between chemicals and diseases. A relationship is inferred when the chemical and the disease each have an independent relationship with the same gene or group of genes. The inferences were made by curating research publications, building network diagrams, and applying statistical analysis.
Get The Data
- Research: Non-Commercial, Share-Alike, Attribution. Free Forever
- Commercial: Commercial Use, Remix & Adapt, White Label
Description
The dataset from the Comparative Toxicogenomics Database (CTD) provides several standardized identifiers for each chemical and disease. These identifiers give cross-platform compatibility: they make it possible to look up the chemical and the disease in major scientific databases and to locate the publications on which an inference was based. The dataset also provides an inference score, which indicates the relative importance of each inference.
Chemicals are among the main environmental factors that influence health, yet the ways in which they cause disease are not fully understood. The purpose of the Comparative Toxicogenomics Database (CTD) is to help generate new hypotheses about the role of chemicals in the development of diseases. It does so by collecting curated data reported in the scientific literature on chemicals, genes and diseases, and by inferring relationships among these three elements. The inferences are transitive: when a chemical and a disease share interactions with one or more genes, a relationship between the chemical and the disease is inferred, linked to a process or product of those genes. From this information one can form hypotheses about the mechanism of action of the chemical on the gene in producing the disease, the genes linked to the disease, and the pathophysiology of the disease. “For example, if chemical A interacts with gene B, and independently gene B is associated with disease C, then chemical A is inferred to have a relationship with disease C (via gene B).”
Inferences can also run in other directions; for example, a gene and a disease can share the same group of chemicals. Some inferences are also backed by direct evidence, meaning that published research documents the relationship. Others have no direct evidence in the literature and can be used to create new testable hypotheses about the mechanism of disease, initiate new research on the relationship, and potentially predict disease treatment and prevention.
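The transitive step described above can be illustrated with a minimal Python sketch. It is not CTD's implementation: the function name `infer_chemical_disease` and the toy pairs are hypothetical, and real CTD inferences are built from curated interactions rather than hand-written tuples.

```python
from collections import defaultdict


def infer_chemical_disease(chem_gene, gene_disease):
    """Join chemical-gene and gene-disease pairs on the shared gene.

    Returns a dict mapping (chemical, disease) to the set of genes
    that bridge the two. Hypothetical helper, for illustration only.
    """
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    inferred = defaultdict(set)
    for chemical, gene in chem_gene:
        for disease in diseases_by_gene.get(gene, ()):
            inferred[(chemical, disease)].add(gene)
    return dict(inferred)


# Toy data (assumed, not taken from CTD).
chem_gene = [("chemical A", "gene B"), ("chemical A", "gene D")]
gene_disease = [
    ("gene B", "disease C"),
    ("gene D", "disease C"),
    ("gene D", "disease E"),
]

inferences = infer_chemical_disease(chem_gene, gene_disease)
for (chemical, disease), genes in sorted(inferences.items()):
    print(chemical, "->", disease, "via", sorted(genes))
```

Keeping the bridging genes with each inferred pair mirrors the dataset, where every inferred row names the gene that supports it.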
The CTD datasets can be used to build a query tool that returns inferred relationships between genes, chemicals and diseases together with the significance of each inference. To prioritize inferences, CTD uses the inference score, which ranks how strongly an inferred relationship is supported. The score is computed on a network diagram in which the chemicals, genes and diseases are nodes and the relationships between them (inferences) are edges. The statistical analysis takes into account the number of nodes (genes, diseases or chemicals) that interact with the node of interest, the number of inferences with direct evidence, and the location of the node of interest in the network, using the hypergeometric clustering coefficient and common neighbor statistics. Inferences are then ranked from highest to lowest inference score; those with the highest scores are the most significant.
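The final ranking step can be sketched in a few lines of Python. This sketch only orders rows by a score that is already present; it does not reproduce how CTD computes the score. The function name `rank_by_inference_score` is hypothetical, and the sample values are copied from the Data Preview rows.

```python
def rank_by_inference_score(rows):
    """Return rows that carry a score, highest score first.

    Rows without a score (for example, curated-only associations)
    are skipped. Hypothetical helper, for illustration only.
    """
    scored = [r for r in rows if r.get("Inference_Score") not in (None, "")]
    return sorted(scored, key=lambda r: float(r["Inference_Score"]), reverse=True)


rows = [
    {"Disease_Name": "Adenocarcinoma", "Inference_Gene_Symbol": "MYC", "Inference_Score": "4.07"},
    {"Disease_Name": "Androgen-Insensitivity Syndrome", "Inference_Gene_Symbol": "AR", "Inference_Score": "6.87"},
    {"Disease_Name": "Breast Neoplasms", "Inference_Gene_Symbol": "AR", "Inference_Score": "3.42"},
    # Curated row with direct evidence only: no inference score.
    {"Disease_Name": "Precursor Cell Lymphoblastic Leukemia-Lymphoma", "Inference_Gene_Symbol": "", "Inference_Score": ""},
]

ranked = rank_by_inference_score(rows)
for row in ranked:
    print(row["Inference_Score"], row["Disease_Name"], row["Inference_Gene_Symbol"])
```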
1. Davis AP, Grondin CJ, Johnson RJ, Sciaky D, King BL, McMorran R, Wiegers J, Wiegers TC, Mattingly CJ. The Comparative Toxicogenomics Database: update 2017. Nucleic Acids Res. 2016 Sep 19;[Epub ahead of print]
About this Dataset
Data Info
Field | Value |
---|---|
Date Created | 2004-01-20 |
Last Modified | 2024-05-30 |
Version | 2024-05 |
Update Frequency | Monthly |
Temporal Coverage | N/A |
Spatial Coverage | N/A |
Source | John Snow Labs; Comparative Toxicogenomics Database |
Source License URL | |
Source License Requirements | Publicly available and free for research application but citation is required. Permission asked for commercial uses |
Source Citation | Publicly available and free for research application but citation is required. Permission asked for commercial uses |
Keywords | Toxicogenomics, Gene Disease Association, Gene Chemical Pathways, Comparative Toxicogenomics Database, Relationships Between Chemicals and Diseases, Chemical and Disease Inferences, Chemical Disease Hypotheses |
Other Titles | Diseases and Genes Engaged After Chemical Exposure, Diseases and Genes Affected by Chemical Exposure, Chemical Disease Association Disorder Type |
Data Fields
Name | Description | Type | Constraints |
---|---|---|---|
Chemical_Name | Name of the chemical associated with the disease. | string | required : 1 |
Chemical_ID | Identifier assigned to the chemical in the US National Library of Medicine’s Medical Subject Headings (MeSH). MeSH is a controlled vocabulary of thousands of biomedical terms that standardizes the terminology used in life sciences publications. Each MeSH term has a unique identifier that is 7 to 8 characters long. The MeSH unique identifier format was changed to 10 characters after November 2013. | string | required : 1 |
Cas_Registry_Number | Unique numeric identifier assigned by CAS to the chemical substance. The CAS Registry Number also serves as a reference for finding information on the specific chemical. CAS is a division of the American Chemical Society (ACS); the CAS Registry collects information on millions of chemical substances identified since the early 1900s. | string | - |
Disease_Name | Name of the disease associated with the chemical. | string | required : 1 |
Disease_ID | Unique identifier assigned to the disease by MeSH or OMIM, linked to the source record(s) for the disease. OMIM (Online Mendelian Inheritance in Man) is a database of human genes and genetic disorders that displays the type of genetic variation and expression; OMIM uses a six-digit identifier for each gene or genetic disorder. MeSH is a controlled vocabulary of thousands of biomedical terms (including diseases) that standardizes the terminology used in life sciences publications. Each MeSH term has a unique identifier that is 7 to 8 characters long. The MeSH unique identifier format was changed to 10 characters after November 2013. | string | required : 1 |
Direct_Evidence | Type of evidence for the association published in the scientific literature. Therapeutic association means that the research publication took a therapeutic approach or that the chemical was found to be a potential therapy for the disease. Marker or mechanism means that the research publication focused on a mechanism or a marker of the disease, or that the chemical was found to play a role in the mechanism of disease development. | string | - |
Inference_Gene_Symbol | Short-form abbreviation of the name of the gene that was inferred to link the chemical and the disease. The approved symbols for human genes are collected in the HUGO Gene Nomenclature Committee database; each name and symbol is unique to a gene and can be applied to other species. | string | - |
Inference_Score | Inference score, calculated using statistics that take into account the connectivity of the chemical with the disease, the number of genes used to infer the association, and the connectivity of each of those genes. The higher the score, the more likely the inference is true. | number | level : Ordinal |
Omim_ID | Identification number(s) for the disease in the OMIM database (pipe-delimited list). OMIM (Online Mendelian Inheritance in Man) is a database of human genes and genetic disorders that displays the type of genetic variation and expression; OMIM uses a six-digit identifier for each gene or genetic disorder. | string | - |
PubMed_ID | Identification number(s) of publications indexed in the PubMed database (pipe-delimited list) that serve as direct evidence for the association of the chemical or gene with the disease. PubMed is a US National Library of Medicine citation database that contains millions of abstracts, references and full-text links of biomedical literature from different trusted sources. | string | - |
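As a rough illustration of how these fields might be read, the sketch below parses one row with Python's standard `csv` module. It assumes a comma-separated export whose header uses the field names listed above; the actual download format may differ. The helper names `split_list` and `parse_row` are hypothetical, and the sample row is taken from the Data Preview.

```python
import csv
import io

# Assumed CSV layout: header names follow the Data Fields table.
SAMPLE = """Chemical_Name,Chemical_ID,Cas_Registry_Number,Disease_Name,Disease_ID,Direct_Evidence,Inference_Gene_Symbol,Inference_Score,Omim_ID,PubMed_ID
10074-G5,C534883,,Androgen-Insensitivity Syndrome,MESH:D013734,,AR,6.87,300068|312300,1303262|8281139
"""


def split_list(value):
    """Split a '|'-delimited field into a list, dropping empty items."""
    return [item for item in value.split("|") if item]


def parse_row(row):
    """Convert one raw row (dict of strings) into typed values."""
    # Disease_ID carries a vocabulary prefix such as "MESH:".
    vocabulary, _, accession = row["Disease_ID"].partition(":")
    score = row["Inference_Score"]
    return {
        "chemical": row["Chemical_Name"],
        "disease": row["Disease_Name"],
        "disease_vocabulary": vocabulary,
        "disease_accession": accession,
        # A non-empty Direct_Evidence marks a curated association.
        "is_curated": bool(row["Direct_Evidence"]),
        "gene": row["Inference_Gene_Symbol"] or None,
        "score": float(score) if score else None,
        "omim_ids": split_list(row["Omim_ID"]),
        "pubmed_ids": split_list(row["PubMed_ID"]),
    }


records = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
print(records[0])
```

Identifiers are kept as strings rather than converted to integers, since the fields are typed as strings in the table above.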
Data Preview
Chemical Name | Chemical ID | Cas Registry Number | Disease Name | Disease ID | Direct Evidence | Inference Gene Symbol | Inference Score | Omim ID | PubMed ID |
---|---|---|---|---|---|---|---|---|---|
06-Paris-LA-66 protocol | C046983 | | Precursor Cell Lymphoblastic Leukemia-Lymphoma | MESH:D054198 | therapeutic | | | | 4519131 |
10074-G5 | C534883 | | Adenocarcinoma | MESH:D000230 | | MYC | 4.07 | | 26432044 |
10074-G5 | C534883 | | Adenocarcinoma of Lung | MESH:D000077192 | | MYC | 4.3 | | 26656844\|27602772 |
10074-G5 | C534883 | | Alopecia | MESH:D000505 | | AR | 4.5 | | 15902657 |
10074-G5 | C534883 | | Androgen-Insensitivity Syndrome | MESH:D013734 | | AR | 6.87 | 300068\|312300 | 1303262\|8281139 |
10074-G5 | C534883 | | Astrocytoma | MESH:D001254 | | AR | 4.95 | | 24680642 |
10074-G5 | C534883 | | Autistic Disorder | MESH:D001321 | | AR | 3.93 | | 19167832 |
10074-G5 | C534883 | | Breast Neoplasms | MESH:D001943 | | AR | 3.42 | | 21633166\|22174584 |
10074-G5 | C534883 | | Breast Neoplasms, Male | MESH:D018567 | | AR | 5.9 | | 1303262\|8281139 |
10074-G5 | C534883 | | Bulbospinal neuronopathy, X-linked recessive | MESH:C537017 | | AR | 6.87 | 313200 | |