The study design and analyses were consistent with the Strengthening the Reporting of Observational studies in Epidemiology – Molecular Epidemiology (STROBE-ME) statement.
Following review and approval of the protocol by the National Cancer Institute (NCI) Special Studies Institutional Review Board, healthy volunteer employees of the NCI Division of Cancer Epidemiology and Genetics were recruited to assess the reproducibility of microbial measures in self-collected fecal specimens and associations with urine estrogens. Following face-to-face instructions and informed consent, participants were provided a toilet-attached pouch (Protocult, Rochester, MN), from which they collected aliquots of an early or mid-morning stool, as well as a simultaneous urine specimen. After specimen collection, they completed a brief self-administered questionnaire on demographics, broad dietary categories, ease of use of two different fecal collection devices, and factors potentially related to gut microbiota, specifically age, sex, height, waist size, current weight, weight change within 12 months, inflammatory bowel disease, gastrointestinal surgery, cancer and other serious disease, food allergy, and dietary restrictions (vegan or vegetarian, gluten, lactose, peanuts, pork or shellfish). Postmenopausal women were those over age 50 years with no menses, pregnancy or childbirth within the previous 12 months. Premenopausal women were those with menses, pregnancy or childbirth within the previous 12 months. None of the women had had a hysterectomy.
Urine (20-50 mL) was collected in a screw-top sterile container without preservative. Urine was chilled immediately on frozen gel packs (4°C) and frozen in liquid nitrogen within 3 hours. It was then thawed once to make 1 mL aliquots, which were re-frozen and kept at -80°C until use for analysis of estrogens and estrogen metabolites.
Participants collected 16 aliquots, half in RNAlater (QIAGEN Inc., Valencia, CA) and half in sterile phosphate-buffered saline (PBS), from various parts of a single stool. As with the urine, all fecal aliquots were chilled immediately on frozen gel packs (4°C) and frozen in liquid nitrogen within 3 hours. The fecal aliquots were stored at -80°C until used for DNA and protein extraction.
Approximately 0.5 g of thawed feces was transferred into a 10 mL conical tube containing 5 mL of extraction buffer (60 mM Na2HPO4, 40 mM NaH2PO4, 10 mM KCl, 1 mM MgSO4) and kept on ice. Fecal material was homogenized by heavy vortexing for 1 min, and bacterial cells were lysed by sonication using a Misonix XL2000 Ultrasonic Homogenizer (Fisher Scientific, Pittsburgh, PA) at maximum power for three 30-second intervals in an ice bath. Lysates were centrifuged at 21,000 × g (15,000 rpm) for 30 minutes at 4°C using an Eppendorf 5424 microcentrifuge. Supernatant containing extracted proteins was transferred to new tubes and used to measure protein concentration and enzymatic activity. Protein concentration was estimated for each lysate using the bicinchoninic acid method according to the manufacturer's instructions (Pierce, Rockford, IL).
Protein extraction and enzymatic activity assays were performed as described by Goldin and colleagues with slight modifications to optimize detection of the enzymatic activities. Activities of β-glucuronidase and β-glucosidase (the control enzyme) were measured in a 96-well microplate format using approximately 100 mg of input protein from fecal lysates (in 100 μL volume with PBS). The final reaction volume was 200 μL/well, composed of 100 μL sample and 100 μL of either 10 mM 4-nitrophenyl-β-D-glucuronide, pH 7.0 (for β-glucuronidase) or 10 mM 4-nitrophenyl-β-D-glucopyranoside, pH 7.0 (for β-glucosidase). The substrate was preincubated at 37°C and added immediately before starting the enzymatic reaction. Enzymatic activity was measured in triplicate by following the real-time kinetics of the product, 4-nitrophenol, at 37°C. Product accumulation was monitored at 405 nm for 60 minutes for fecal extracts with sufficient protein concentration, or for 5 hours for diluted fecal samples, on a Spectramax M5 (Molecular Devices, Sunnyvale, CA). Enzymatic concentrations were determined from standard curves of pure enzymes from Sigma-Aldrich (St. Louis, MO; G7646 for β-glucuronidase, G4511 for β-glucosidase), which served as controls, and were normalized by protein input. Enzymatic activity was reported as the mean value of triplicate runs in IU/100 mg protein.
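The conversion from a kinetic trace to a normalized activity can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: it assumes a linear standard curve through the origin, summarized by a hypothetical `std_slope_per_iu` (the A405 change per minute produced by 1 IU of pure enzyme), and the function and parameter names are invented for the example.

```python
def slope_per_min(times_min, absorbances):
    """Least-squares slope of A405 against time (absorbance units per minute)."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_a = sum(absorbances) / n
    sxy = sum((t - mean_t) * (a - mean_a)
              for t, a in zip(times_min, absorbances))
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    return sxy / sxx


def activity_per_100mg(replicate_slopes, std_slope_per_iu, protein_mg):
    """Mean activity of the replicate wells, in IU per 100 mg protein.

    std_slope_per_iu is an assumed calibration constant taken from a
    pure-enzyme standard curve (A405/min per IU); it is hypothetical here.
    """
    mean_slope = sum(replicate_slopes) / len(replicate_slopes)
    iu = mean_slope / std_slope_per_iu      # convert rate to enzyme units
    return iu / protein_mg * 100.0          # normalize by protein input


# Example: three replicate wells read every 10 minutes for 30 minutes.
times = [0, 10, 20, 30]
wells = [
    [0.10, 0.20, 0.30, 0.40],
    [0.10, 0.21, 0.30, 0.39],
    [0.11, 0.20, 0.31, 0.40],
]
slopes = [slope_per_min(times, w) for w in wells]
print(activity_per_100mg(slopes, std_slope_per_iu=0.5, protein_mg=0.1))
```

Fitting a slope over the whole read window assumes the reaction stays in its linear phase; for the 5-hour reads of diluted samples, restricting the fit to the initial linear segment would be a reasonable refinement.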
Estrogens in urine and feces
Liquid chromatography/tandem mass spectrometry (LC-MS/MS) was used to quantify estrogens in 1 mL of urine and fecal lysate in PBS [22, 23]. Parent estrogens detected included estrone and estradiol; estrogen metabolites (EM) included estriol, 2-hydroxyestrone, 2-methoxyestrone, 2-hydroxyestradiol, 2-methoxyestradiol, 2-hydroxyestrone-3-methyl ether, 4-hydroxyestrone, 4-methoxyestrone, 4-methoxyestradiol, 16α-hydroxyestrone, 17-epiestriol, 16-ketoestradiol, and 16-epiestriol. A 500 μL aliquot of urine was processed by enzymatic hydrolysis with glucuronidase/sulfatase-containing buffer, extraction, derivatization, and detection with stable isotope-labeled internal standards. Estrogens were quantified against calibration curves with 1000-fold linear ranges. Each batch included masked quality control samples and standards. Urine estrogens, which were determined for all participants, were expressed as pM/mg creatinine, with creatinine measured in the same urine samples. Fecal estrogens, expressed as pg/100 μL fecal lysate, were likewise determined for 7 postmenopausal women and 22 men. The 19 premenopausal women and 3 men (selected at random) were excluded for cost considerations. To estimate deconjugated versus conjugated estrogen levels, LC-MS/MS on the fecal lysates was repeated without enzymatic hydrolysis. Deconjugated estrogen was the level without hydrolysis, and conjugated estrogen was the level with hydrolysis minus the level without hydrolysis. The current analysis considered estradiol, estrone, the EM grouped into three major hydroxylation pathways (2-, 4- and 16-hydroxylation), and the sum of these, designated as total estrogens.
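The grouping and the hydrolysis comparison described above can be sketched as follows. This is an illustration under stated assumptions: the text does not list which metabolite belongs to which pathway, so the assignment below (by the 2- and 4- prefixes, with the remaining metabolites placed in the 16-pathway) is an assumption, and the function names are hypothetical.

```python
# Assumed pathway membership; the source names the three pathways
# but does not spell out the assignment of each metabolite.
PATHWAYS = {
    "2-hydroxylation": [
        "2-hydroxyestrone", "2-methoxyestrone", "2-hydroxyestradiol",
        "2-methoxyestradiol", "2-hydroxyestrone-3-methyl ether",
    ],
    "4-hydroxylation": [
        "4-hydroxyestrone", "4-methoxyestrone", "4-methoxyestradiol",
    ],
    "16-hydroxylation": [
        "estriol", "16α-hydroxyestrone", "17-epiestriol",
        "16-ketoestradiol", "16-epiestriol",
    ],
}
PARENTS = ["estrone", "estradiol"]


def summarize(levels):
    """Collapse per-compound levels into parents, pathway sums, and a total."""
    summary = {name: levels[name] for name in PARENTS}
    for pathway, members in PATHWAYS.items():
        summary[pathway] = sum(levels[m] for m in members)
    # Total estrogens: the two parents plus the three pathway sums.
    summary["total"] = sum(summary.values())
    return summary


def hydrolysis_split(with_hydrolysis, without_hydrolysis):
    """Return (level without hydrolysis, with-minus-without difference)."""
    return without_hydrolysis, with_hydrolysis - without_hydrolysis


# Example with made-up values (1.0 for every compound).
example = {name: 1.0 for name in PARENTS}
for members in PATHWAYS.values():
    example.update({m: 1.0 for m in members})
print(summarize(example))
print(hydrolysis_split(10.0, 4.0))
```

Measurement noise can make the with-minus-without difference slightly negative for low-abundance compounds; how such values were handled is not stated in the text, so the sketch leaves them unchanged.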
Fecal DNA extraction
Genomic DNA from stool samples preserved and transported in RNAlater was extracted with a modification of the QIAamp DNA Stool Mini Kit (QIAGEN, Valencia, CA). Briefly, 300 mg of stool sample was mixed with 350 μL of lysis buffer composed of 0.05 M potassium phosphate buffer containing 50 μL lysozyme (10 mg/mL), 6 μL of mutanolysin (25,000 U/mL; Sigma-Aldrich) and 3 μL of lysostaphin (4 U/mL in sodium acetate; Sigma-Aldrich). The mixture was incubated for 1 hour at 37°C, then 10 μL proteinase K (20 mg/mL), 100 μL 10% SDS, and 20 μL RNase A (20 mg/mL) were added, and the mixture was incubated for 1 hour at 55°C. Microbial cells were lysed by mechanical disruption (bead beating) using a FastPrep instrument (MP Biomedicals, Solon, OH) set at 6.0 m/s for 30 sec. The lysate was processed using the ZR Fecal DNA extraction kit (ZYMO Research, Irvine, CA) according to the manufacturer's recommendations, omitting the lysis steps (steps 1-3). The DNA was eluted into 100 μL of Tris EDTA (TE) buffer, pH 8.0.
454 Pyrosequencing of 16S rRNA genes
Universal primers 27F and 338R were used for PCR amplification of the V1-V2 hypervariable regions of 16S rRNA genes. The 338R primer included a unique sequence tag to barcode each sample. The primers were as follows: 27F-5′-GCCTTGCCAGCCCGCTCAGTCAGAGTTTGATCCTGGCTCAG-3′ and 338R-5′-GCCTCCCTCGCGCCATCAGNNNNNNNNCATGCTGCCTCCCGTAGGAGT-3′, where the underlined sequences are the 454 Life Sciences® FLX sequencing primers B and A in 27F and 338R, respectively, and the bold font denotes the universal 16S rRNA primers 27F and 338R. The 8 bp barcode within primer 338R is denoted by 8 Ns. Using 96 barcoded 338R primers, the V1-V2 regions of 16S rRNA genes were amplified in 96-well microtiter plates as follows: 5.0 μL 10X PCR buffer II (Applied Biosystems, Foster City, CA), 3.0 μL MgCl2 (25 mM; Applied Biosystems), 2.5 μL Triton X-100 (1%), 2.0 μL deoxyribonucleoside triphosphates (10 mM), 1.0 μL each of primer 27F and 338R (20 pmol/μL each), 0.5 μL AmpliTaq DNA polymerase (5 U/μL; Applied Biosystems), and 50 ng of template DNA in a total reaction volume of 50 μL. Reactions were run in a PTC-100 thermal controller (MJ Research Inc., Waltham, MA) using the following cycling parameters: 5 minutes of denaturation at 95°C, followed by 20 cycles of 30 seconds at 95°C (denaturing), 30 seconds at 56°C (annealing) and 90 seconds at 72°C (elongation), with a final extension at 72°C for 7 minutes. Non-template controls were used as negative controls for each set of barcoded primers. The presence of amplicons was confirmed by gel electrophoresis on a 2% agarose gel and staining with SYBR Green (Applied Biosystems, Foster City, CA). Equimolar amounts (~100 ng) of the PCR amplicons were mixed in a single tube, and amplification primers and reaction buffer were removed by processing the mixture with the Agencourt AMPure XP Kit (Beckman Coulter Genomics, Danvers, MA).
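As a quick check on the reaction setup, the listed reagent volumes can be totaled and the remainder computed for a given template concentration. This is only a sketch: the text gives the template mass (50 ng) but not its volume or the diluent, so the use of water to make up the balance and the `template_ng_per_ul` input are assumptions.

```python
# Fixed reagent volumes per 50 μL reaction, in μL, as listed in the text.
REAGENTS_UL = {
    "10X PCR buffer II": 5.0,
    "MgCl2 (25 mM)": 3.0,
    "Triton X-100 (1%)": 2.5,
    "dNTPs (10 mM)": 2.0,
    "primer 27F (20 pmol/μL)": 1.0,
    "primer 338R (20 pmol/μL)": 1.0,
    "AmpliTaq DNA polymerase (5 U/μL)": 0.5,
}
TOTAL_UL = 50.0
TEMPLATE_NG = 50.0


def reaction_balance(template_ng_per_ul):
    """Return (template volume, balance volume) in μL for one reaction.

    The balance is assumed to be water; the source does not state this.
    """
    template_ul = TEMPLATE_NG / template_ng_per_ul
    balance_ul = TOTAL_UL - sum(REAGENTS_UL.values()) - template_ul
    if balance_ul < 0:
        raise ValueError("template too dilute to fit in a 50 μL reaction")
    return template_ul, balance_ul


# Example: a DNA extract at 25 ng/μL.
print(reaction_balance(25.0))
```

The fixed reagents account for 15 μL, so extracts more dilute than about 1.43 ng/μL (50 ng in 35 μL) would not fit without concentrating the DNA first.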
The purified amplicon mixtures were sequenced by 454 FLX Titanium pyrosequencing (Roche Diagnostics Corp., Indianapolis, IN) with 454 Life Sciences® primer A by the Genomics Resource Center at the Institute for Genome Sciences, University of Maryland School of Medicine using protocols recommended by the manufacturer.
Classification of operational taxonomic units
Sequence read quality was assessed using the Institute for Genome Sciences bioinformatics pipeline, which complies with standard operating procedures of the National Institutes of Health Human Microbiome Project. Briefly, after trimming the primer and barcode sequences, raw sequence reads were filtered using the QIIME pipeline (http://qiime.sourceforge.net) with the following criteria to optimize the quality and integrity of the data: 1) minimum and maximum read length of 300 bp and 500 bp; 2) no ambiguous base calls; 3) no homopolymeric runs longer than 8 bp; 4) average quality value > Q25 within a sliding window of 50 bp; 5) 60% match to a previously determined 16S rRNA gene sequence; and 6) chimera-free using the UCHIME software (http://www.drive5.com/uchime/). Sequence reads with the same barcode were binned by sample. Operational taxonomic units (OTUs) were defined using QIIME as sequences with at least 97% identity, and sequences were classified at the genus level using the Ribosomal Database Project (RDP) naïve Bayesian classifier.
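Criteria 1 to 4 above, together with the binning by barcode, can be illustrated with a short stand-alone filter. This sketch is not QIIME's implementation: it rejects a read outright if any 50 bp window has a mean quality at or below 25, whereas a pipeline may instead truncate the read at that point, and the reference-match and chimera checks (criteria 5 and 6) are omitted. All function names are hypothetical.

```python
import re
from collections import defaultdict

# Nine or more identical bases in a row, i.e. a run longer than 8 bp.
HOMOPOLYMER = re.compile(r"(.)\1{8,}")


def passes_filters(seq, quals, min_len=300, max_len=500, window=50, min_q=25):
    """Apply the length, ambiguity, homopolymer, and quality-window criteria."""
    seq = seq.upper()
    if not (min_len <= len(seq) <= max_len):
        return False
    if set(seq) - set("ACGT"):          # any ambiguous base call
        return False
    if HOMOPOLYMER.search(seq):
        return False
    # Sliding-window mean quality, updated incrementally.
    window_sum = sum(quals[:window])
    if window_sum / window <= min_q:
        return False
    for i in range(window, len(quals)):
        window_sum += quals[i] - quals[i - window]
        if window_sum / window <= min_q:
            return False
    return True


def bin_by_sample(reads, barcode_to_sample):
    """Group (barcode, sequence) pairs by sample; unknown barcodes are dropped."""
    bins = defaultdict(list)
    for barcode, seq in reads:
        sample = barcode_to_sample.get(barcode)
        if sample is not None:
            bins[sample].append(seq)
    return dict(bins)


# Example: a clean 320 bp read with uniform quality 30 passes.
print(passes_filters("ACGT" * 80, [30] * 320))
```

The sketch assumes `quals` holds one Phred score per base of `seq`, already decoded to integers.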
Relative abundance of each OTU, alpha diversity, and beta diversity were computed using QIIME for each DNA sample and then averaged over four replicates for each study participant. Alpha diversity was estimated by the Shannon index, which adjusts the number of OTUs detected for their relative abundance (proportions). The Shannon index is calculated as minus the sum over OTUs of the proportion of a given OTU times the logarithm of that proportion in each sample. Beta diversity, which is a measure of separation of the phylogenetic structure of the OTUs in one participant compared to all other participants, was estimated by unweighted UniFrac distances. Taxa were classified by RDP with the Visualization and Analysis of Microbial Population Structures (VAMPS, Marine Biological Laboratory, Woods Hole, MA) pipeline. Pearson correlations were computed for each natural log-transformed (ln) estrogen level with each ln-transformed enzymatic activity level and with each microbiome alpha diversity estimate. Two-sided P-values were computed. For the 55 taxa with mean relative abundance of at least 0.1%, as well as the six phyla, ordinal levels were created [zero, low (below median of detected sequences), high] and compared to ln-transformed β-glucuronidase and β-glucosidase enzymatic activity levels by linear regression. Significance was based on two-sided tests with α=0.05. Analyses were conducted using the statistical software SAS version 9.2 (SAS Institute Inc., Cary, NC) and R version 2.13.0 (http://www.r-project.org/).
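The Shannon index and the ordinal abundance levels described above can be sketched in a few lines. This is an illustration only, not the QIIME, SAS, or R code used in the study. The logarithm base is not stated in the text, so the natural log default below is an assumption, as is the reading of "median of detected sequences" as the median across participants with nonzero abundance of the taxon.

```python
import math
import statistics


def shannon_index(counts, base=math.e):
    """H = -sum(p_i * log(p_i)) over OTUs with nonzero counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in props)


def participant_mean(replicate_values):
    """Average a per-sample measure over one participant's replicates."""
    return sum(replicate_values) / len(replicate_values)


def ordinal_levels(abundances):
    """Code one taxon across participants as 0 (zero), 1 (low), or 2 (high).

    'Low' means below the median of the nonzero (detected) abundances;
    this interpretation of the cutoff is an assumption.
    """
    detected = [a for a in abundances if a > 0]
    if not detected:
        return [0] * len(abundances)
    cutoff = statistics.median(detected)
    return [0 if a == 0 else (1 if a < cutoff else 2) for a in abundances]


# Example: four equally abundant OTUs give H = ln(4).
replicates = [[10, 10, 10, 10], [12, 8, 10, 10]]
print(participant_mean([shannon_index(r) for r in replicates]))
print(ordinal_levels([0, 0.1, 0.2, 0.3]))
```

The resulting 0/1/2 codes can then be entered as a single ordinal predictor in a linear regression on the ln-transformed enzymatic activity, which tests for a trend across the three levels.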
Role of the funding source
This research was supported by the Intramural Research Program of the NCI, NIH, which had no direct role in the data analysis, manuscript preparation, or decision to submit for publication.