Frequently Asked Questions

Introduction to Microbiomes

What is a Microbiome?

A microbiota is the community of micro-organisms (bacteria, fungi, viruses, and so on) that live within a given area. A microbiome is either the totality of the DNA from all of the organisms living in a microbiota or the microbiota itself. Microbiomes can be very large: the number of micro-organisms living within one can range into the millions. Microbiomes also exist on and in humans, and they are increasingly understood to be a vital part of us.

How can we know which bacteria are in a microbiome?

We can quantify the different bacteria in a microbiome using the 16S ribosomal DNA sequence. Non-bacterial organisms like fungi, humans and viruses don’t have this DNA sequence, but every bacterium does, each with slight differences that are specific to it. However, since bacterial species also differ in their DNA well beyond this sequence, it may not be technically accurate to call the divisions based on 16S “species”, so they are instead called Operational Taxonomic Units, or OTUs. OTUs are a good approximation for species and are the foundational unit for the analysis of microbiomes. By targeting sequencing at this region, researchers can sequence hundreds to thousands of copies of this DNA and count the 16S sequences that match each OTU. These counts are called “community profiles”. The FMD uses a pipeline called UPARSE to assign the 16S sequences from a dataset to bacterial OTUs at a range of taxonomic depths, from species level to phylum level.
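The counting step can be illustrated with a minimal sketch. It assumes each read has already been assigned to an OTU; the OTU labels and the function names below are hypothetical, and this is not the FMD's own code.

```python
from collections import Counter


def community_profile(read_assignments):
    """Count how many reads were assigned to each OTU."""
    return Counter(read_assignments)


def relative_abundance(profile):
    """Convert raw OTU counts into fractions of the total read count."""
    total = sum(profile.values())
    if total == 0:
        return {}
    return {otu: count / total for otu, count in profile.items()}


# Hypothetical per-read OTU assignments for one sample.
reads = ["Otu1", "Otu2", "Otu1", "Otu3", "Otu1", "Otu2"]
profile = community_profile(reads)
fractions = relative_abundance(profile)

for otu in sorted(profile):
    print(otu, profile[otu], round(fractions[otu], 3))
```

Raw counts depend on how deeply a sample was sequenced, so profiles are often compared as relative abundances rather than as counts.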

Are all microbiomes the same?

No. OTU community profiles differ between environments, between people, and over time. For example, the micro-organisms living in a person’s stomach (the gut microbiome) and those living on their elbow (the skin microbiome) are composed of very different collections of bacteria. Likewise, microbiomes can differ between people, so the micro-organisms living in one person’s gut can differ significantly from those in their neighbor’s. Finally, microbiomes also change in the same place over time, with changes in behavior being a large factor. However, these differences are not all of the same magnitude: samples from the same person tend to be the most similar, followed by samples from the same body site, with samples from different body sites being the least similar.
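To make "more similar" and "less similar" concrete, two community profiles can be compared with a dissimilarity measure. The sketch below uses Bray-Curtis dissimilarity, a common choice in microbiome analysis. This FAQ does not say which measure, if any, the FMD uses, and the counts are invented purely for illustration.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity: 0.0 for identical profiles, 1.0 for no shared OTUs."""
    otus = set(profile_a) | set(profile_b)
    shared = sum(min(profile_a.get(o, 0), profile_b.get(o, 0)) for o in otus)
    total = sum(profile_a.values()) + sum(profile_b.values())
    if total == 0:
        return 0.0
    return 1.0 - 2.0 * shared / total


# Invented OTU counts (100 reads per sample) for illustration only.
gut_day_1 = {"Otu1": 50, "Otu2": 30, "Otu3": 20}
gut_day_2 = {"Otu1": 45, "Otu2": 35, "Otu3": 20}
skin = {"Otu3": 10, "Otu4": 60, "Otu5": 30}

same_site = bray_curtis(gut_day_1, gut_day_2)
different_site = bray_curtis(gut_day_1, skin)
print(round(same_site, 2), round(different_site, 2))  # 0.05 0.9
```

In this invented example the two gut samples are far closer to each other than either is to the skin sample, which mirrors the ordering described above.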

Data Analysis

Where does this data come from?

The FMD compiles publicly available 16S rRNA sequence data. In collaboration with co-Regional Investigators from multiple sites around the world (Hong Kong, Barbados, Chile, and two sites in South Africa), oral and stool samples will be collected from healthy females and processed (see SOP). Sequence data is also downloaded from public websites and then analyzed. All FMD sequence data is derived from samples taken from healthy adults (>= 18 years) across multiple body sites. A complete list of available data, including studies, counts, and metadata information, can be found on the data statistics page.

How was the data analyzed?

The FMD pre-computes the taxonomic population distribution for each sequence from the public dataset using the UPARSE pipeline (Edgar 2013). It then matches these distributions to their geographic location (treated as discrete categories rather than continuous values), using machine learning techniques to identify the bacterial taxa, at different taxonomic levels, that best discriminate between geographic locations. Both steps are described in more detail on the manual page.

What data should I upload into the database?

In the future, the FMD will accept next-generation 16S rRNA sequence data in FASTQ format (see SOP). Currently, the website accepts user-supplied mothur-formatted taxonomy and OTU files as input. Through the FMD analysis page, users can view the taxonomic composition of their data, compare it with existing FMD data, and predict its geographical source.
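For readers who want to inspect such files before uploading, here is a minimal sketch. It assumes the OTU file is a tab-separated mothur shared file (columns label, Group, numOtus, then one column per OTU) and the taxonomy file is a mothur cons.taxonomy file (columns OTU, Size, Taxonomy). Which mothur outputs the FMD expects is not stated here, so check the manual. The function names and sample data are hypothetical.

```python
import io
import re
from collections import defaultdict


def read_shared(handle):
    """Parse a mothur shared file into {sample: {otu: count}}."""
    lines = [line.rstrip() for line in handle if line.strip()]
    otu_names = lines[0].split("\t")[3:]  # skip label, Group, numOtus
    profiles = {}
    for line in lines[1:]:
        fields = line.split("\t")
        counts = (int(value) for value in fields[3:])
        profiles[fields[1]] = dict(zip(otu_names, counts))
    return profiles


def read_cons_taxonomy(handle):
    """Parse a mothur cons.taxonomy file into {otu: [rank names]}."""
    lines = [line.rstrip() for line in handle if line.strip()]
    taxonomy = {}
    for line in lines[1:]:  # skip the header row
        otu, _size, lineage = line.split("\t")[:3]
        # Drop the "(confidence)" suffix and any quotes around each rank.
        taxonomy[otu] = [re.sub(r"\(\d+\)$", "", rank).strip('"')
                         for rank in lineage.split(";") if rank]
    return taxonomy


def collapse_to_rank(profile, taxonomy, depth):
    """Sum OTU counts by the taxon at a given depth (0 = kingdom, 1 = phylum, ...)."""
    collapsed = defaultdict(int)
    for otu, count in profile.items():
        lineage = taxonomy.get(otu, [])
        name = lineage[depth] if depth < len(lineage) else "unclassified"
        collapsed[name] += count
    return dict(collapsed)


# Tiny invented files, held in memory for illustration.
shared_text = ("label\tGroup\tnumOtus\tOtu1\tOtu2\tOtu3\n"
               "0.03\tsampleA\t3\t10\t5\t2\n")
taxonomy_text = ("OTU\tSize\tTaxonomy\n"
                 "Otu1\t10\tBacteria(100);Firmicutes(100);Bacilli(98);\n"
                 "Otu2\t5\tBacteria(100);Firmicutes(100);Clostridia(95);\n"
                 "Otu3\t2\tBacteria(100);\"Bacteroidetes\"(100);Bacteroidia(100);\n")

profiles = read_shared(io.StringIO(shared_text))
taxonomy = read_cons_taxonomy(io.StringIO(taxonomy_text))
phyla = collapse_to_rank(profiles["sampleA"], taxonomy, depth=1)
print(phyla)  # {'Firmicutes': 15, 'Bacteroidetes': 2}
```

Collapsing OTU counts to a higher rank, as in the last step, is one way to view the same profile at different taxonomic depths.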

Why does the FMD pipeline use UPARSE, and not Mothur or Qiime?

There are multiple programs that can estimate the taxonomic distribution of a 16S sample, including Mothur and Qiime. Since UPARSE uses Mothur during the taxonomic assignment step, the programs differ only in how OTUs are built. In benchmarking of both methods, UPARSE was found to be quicker and to yield fewer OTUs (Edgar 2013). This is primarily due to UPARSE's removal of singletons before OTU clustering (also called OTU generation); a singleton is a read whose sequence is present exactly once, i.e. is unique among the reads. Removing singletons is a way of reducing the effect of sequencing errors. Additionally, UPARSE is sequencing-platform agnostic, so it does not favor datasets produced on a particular platform (e.g. 454 or Illumina). A more detailed description of the UPARSE pipeline can be found in the manual.
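The idea behind singleton removal can be sketched in a few lines. This illustrates the concept only and is not the UPARSE implementation; the reads and the function name are invented.

```python
from collections import Counter


def drop_singletons(reads):
    """Count identical reads and keep only sequences seen more than once."""
    abundance = Counter(reads)
    return {seq: count for seq, count in abundance.items() if count > 1}


# Invented reads; "ACGA" appears once, so it is treated as a singleton.
reads = ["ACGT", "ACGT", "ACGA", "TTGC", "TTGC", "TTGC"]
kept = drop_singletons(reads)
print(kept)  # {'ACGT': 2, 'TTGC': 3}
```

A read that differs from an abundant sequence by a single base and occurs only once, like "ACGA" here, is more likely to be a sequencing error than a real organism, which is why discarding such reads before clustering tends to reduce the number of spurious OTUs.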

What metadata is available on the FMD?

The database contains metadata such as body site, subject age and gender, and geographic information: country, sub-division (e.g. the state, department, or province), and city. For some datasets, these variables are unknown and are designated "NA". The more data the FMD is populated with, the more accurate the geolocation predictions will be.

Citation

How do I cite the FMD?

If you use the FMD to assist in research publications, abstracts, presentations, or proposals, the preferred way to cite the FMD is as follows: "Data was obtained from the Forensic Microbiome Database (FMD) through http://fmd.jcvi.org."