dc.contributor.author: Asgari, Ehsaneddin
dc.contributor.author: Garakani, Kiavash
dc.contributor.author: McHardy, Alice C
dc.contributor.author: Mofrad, Mohammad R K
dc.date.accessioned: 2019-01-10T09:24:03Z
dc.date.available: 2019-01-10T09:24:03Z
dc.date.issued: 2018-07-01
dc.identifier.citation: Bioinformatics. 2018 Jul 1;34(13):i32-i42. doi: 10.1093/bioinformatics/bty296. [en_US]
dc.identifier.issn: 1367-4811
dc.identifier.pmid: 29950008
dc.identifier.doi: 10.1093/bioinformatics/bty296
dc.identifier.uri: http://hdl.handle.net/10033/621639
dc.description.abstract: Microbial communities play important roles in the function and maintenance of various biosystems, ranging from the human body to the environment. A major challenge in microbiome research is the classification of microbial communities of different environments or host phenotypes. The most common and cost-effective approach for such studies to date is 16S rRNA gene sequencing. Recent falls in sequencing costs have increased the demand for simple, efficient and accurate methods for rapid detection or diagnosis with proven applications in medicine, agriculture and forensic science. We describe a reference- and alignment-free approach for predicting environments and host phenotypes from 16S rRNA gene sequencing based on k-mer representations that benefits from a bootstrapping framework for investigating the sufficiency of shallow sub-samples. Deep learning methods as well as classical approaches were explored for predicting environments and host phenotypes. A k-mer distribution of shallow sub-samples outperformed Operational Taxonomic Unit (OTU) features in the tasks of body-site identification and Crohn's disease prediction. Aside from being more accurate, using k-mer features in shallow sub-samples (i) allows skipping the computationally costly sequence alignments required in OTU-picking and (ii) provides a proof of concept for the sufficiency of shallow and short-length 16S rRNA sequencing for phenotype prediction. In addition, k-mer features predicted representative 16S rRNA gene sequences of 18 ecological environments and 5 organismal environments with high macro-F1 scores of 0.88 and 0.87, respectively. For large datasets, deep learning outperformed classical methods such as Random Forest and Support Vector Machine. The software and datasets are available at https://llp.berkeley.edu/micropheno. Supplementary data are available at Bioinformatics online. [en_US]
dc.publisher: Oxford University Press [en_US]
dc.relation.url: https://llp.berkeley.edu/micropheno [en_US]
dc.rights: Attribution-NonCommercial-ShareAlike 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [*]
dc.title: MicroPheno: predicting environments and host phenotypes from 16S rRNA gene sequencing using a k-mer based representation of shallow sub-samples. [en_US]
dc.type: Article [en_US]
dc.contributor.department: BRICS, Braunschweiger Zentrum für Systembiologie, Rebenring 56, 38106 Braunschweig, Germany. [en_US]
refterms.dateFOA: 2019-01-10T09:24:03Z
dc.source.journaltitle: Bioinformatics (Oxford, England)


Files in this item

Name: Asgari et al.pdf
Size: 904.0Kb
Format: PDF
Description: Open Access article

Name: asgari_sup.pdf
Size: 637.6Kb
Format: PDF
Description: supplementary data

Name: Corrigendum to Asgari et al.pdf
Size: 70.60Kb
Format: PDF
Description: Open Access article (Corrigendum)

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-ShareAlike 4.0 International