| title | hmdbQuery: working with Human Metabolome Database (hmdb.ca) | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| author | Vincent J. Carey, stvjc at channing.harvard.edu | ||||||||||
| date | `r format(Sys.time(), '%B %d, %Y')` | ||||||||||
| vignette | %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{hmdbQuery: working with Human Metabolome Database (hmdb.ca)} | ||||||||||
| output |
|
The human metabolomics database (HMDB, http://www.hmdb.ca) includes XML documents describing 114000 metabolites. We will show how to manipulate the metadata on metabolites fairly flexibly.
suppressMessages({ suppressPackageStartupMessages({ library(hmdbQuery) library(gwascat) }) }) The hmdbQuery package includes a function for querying HMDB directly over HTTP:
library(hmdbQuery) lk1 = HmdbEntry(prefix = "http://www.hmdb.ca/metabolites/", id = "HMDB0000001") The result is parsed and encapsulated in an S4 object
lk1 The size of the complete import of information about a single metabolite suggests that it would not be too convenient to have comprehensive information about all HMDB constituents in memory. The most effective approach to managing the metadata will depend upon use cases to be developed over the long run.
Note however that this package does provide snapshots of certain direct associations derived from all available information as of Sept. 23 2017. Information about direct associations reported in the database is present in tables hmdb_disease, hmdb_gene, hmdb_protein, hmdb_omim. For example
data(hmdb_disease) hmdb_disease Some HMDB metabolites have been mapped to diseases.
d1 = diseases(lk1) # data.frame pmids = unlist(d1["references", 5][[1]][2,]) library(annotate) pm = pubmed(pmids[1]) ab = buildPubMedAbst(xmlRoot(pm)[[1]]) ab Note that pre HMDB v 4.0, biospecimens were called biofluids.
There are arbitrarily many biospecimen and tissue associations provided for each HMDB entry. We have direct accessors, and by default we capture all metadata, available through the store method.
biospecimens(lk1) tissues(lk1) st = store(lk1) head(names(st)) length(names(st)) st$protein_assoc["name",] st$protein_assoc["gene_name",]