Patient recruitment and stool collection
Patients with a diagnosis of chronic liver disease who were undergoing ultrasound elastography were recruited prospectively from the VA Greater Los Angeles Healthcare System (VA) from June 2017 to June 2018. Chronic liver disease included chronic hepatitis C virus (HCV) infection, chronic hepatitis B virus (HBV) infection, liver disease due to chronic alcohol use, primary biliary cholangitis (PBC), primary sclerosing cholangitis (PSC), Wilson’s disease, autoimmune hepatitis, hemochromatosis, and nonalcoholic fatty liver disease (NAFLD). Patients were excluded if they had been treated with antibiotics or probiotics within 3 months of enrollment, had only acute liver injury without any underlying chronic liver disease, had treated HCV infection with sustained virologic response and no other form of chronic liver disease, were on a specialized diet (e.g. gluten free, vegan, vegetarian, high protein), or had a personal history of GI surgery, irritable bowel syndrome, or inflammatory bowel disease. Stool was collected within 7 days of ultrasound elastography, placed into 95% ethanol, and stored at −80°C until processing. Patient information including age, gender, race/ethnicity, and comorbidities was also collected. Race and ethnicity were recorded in 5 categories, with Hispanic as a separate category (i.e. non-Hispanic white, non-Hispanic black, Hispanic, Asian, and other). Comorbidities were collected in order to calculate the Charlson comorbidity index, a validated score that assesses overall health and risk of 1-year all-cause mortality [19]. Stool samples from healthy control patients without any evidence of chronic liver disease were also collected. The study was approved by the Veterans Affairs Greater Los Angeles Healthcare System Institutional Review Board. All methods were performed in accordance with relevant guidelines and regulations. Verbal and written informed consent for study participation was obtained from all patients.
Liver ultrasound elastography
All patients with chronic liver disease underwent ultrasound elastography using the FibroScan touch 502 machine (Echosens, MA, USA). All examinations were performed by trained technicians, each with experience of more than 100 scans. Medium (M) and extra-large (XL) probes were used depending on the patient’s body habitus, according to the manufacturer’s protocol. Controlled attenuation parameter (CAP) score and liver stiffness were collected as non-invasive measurements of hepatic steatosis and fibrosis, respectively. At least 10 measurements were taken at the same spot, with a ratio of interquartile range to median of less than 30%, as per the manufacturer’s guidelines. Also as per the manufacturer’s guidelines, a CAP score between 238 and 260 was given a steatosis grade of S1 (11–33% fatty change in the liver), a score between 260 and 290 a grade of S2 (34–66% fatty change), and a score higher than 290 a grade of S3 (67% or more fatty change). Standard cutoffs of liver stiffness, measured in kilopascals and based on the etiology of liver disease, were used to determine the extent of liver fibrosis (F0/F1 to F4) [20]. Minimal fibrosis was defined as a score consistent with F0-F2 and advanced fibrosis as a score consistent with F3-F4, similar to prior published studies [17].
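The grading rules in this section can be written out as a small sketch. The function names are hypothetical, and three details are assumptions because the text does not specify them: scores of exactly 260 and 290 are assigned to S2, scores below 238 are labeled S0, and the interquartile range is computed with the "inclusive" quantile method. The etiology-specific stiffness cutoffs are not listed in the text, so the sketch starts from an already assigned fibrosis stage.

```python
from statistics import median, quantiles


def steatosis_grade(cap_score: float) -> str:
    """Map a CAP score to a steatosis grade using the cutoffs quoted in the text.

    Assumptions: boundary values 260 and 290 fall into S2, and scores below
    238 are labeled S0 (the text names only S1 to S3).
    """
    if cap_score > 290:
        return "S3"
    if cap_score >= 260:
        return "S2"
    if cap_score >= 238:
        return "S1"
    return "S0"


def is_reliable(measurements: list[float]) -> bool:
    """Check the quality rule: at least 10 readings and IQR/median below 30%."""
    if len(measurements) < 10:
        return False
    q1, _, q3 = quantiles(measurements, n=4, method="inclusive")
    mid = median(measurements)
    if mid <= 0:
        return False
    return (q3 - q1) / mid < 0.30


def fibrosis_group(stage: str) -> str:
    """Collapse a fibrosis stage (F0 to F4) into the binary grouping used here."""
    if stage in ("F3", "F4"):
        return "advanced"
    if stage in ("F0", "F1", "F2"):
        return "minimal"
    raise ValueError(f"unknown fibrosis stage: {stage}")
```

Keeping the quality check separate from the grading makes it easy to reject an unreliable examination before any grade is assigned.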
16S rRNA sequencing
DNA was extracted from ethanol-preserved stool using the PowerSoil kit (MO BIO, Carlsbad, CA, USA) as per the manufacturer’s instructions. The V4 region of the 16S ribosomal RNA gene was amplified and underwent paired-end sequencing on an Illumina HiSeq 2500 (San Diego, CA, USA) as previously described [21]. The 253 base-pair reads were processed using QIIME 1.9.1 (San Diego, CA, USA) with default parameters [22]. The average sequence depth per sample was 45,560. Operational taxonomic units (OTUs) were picked against the May 2013 version of the Greengenes database, prefiltered at 97% identity. After removing OTUs that were present in fewer than 10% of all samples, 1479 OTUs remained for analysis. Raw 16S rRNA sequence data were deposited under National Center for Biotechnology Information BioProject PRJNA542724.
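The prevalence filter described above (dropping OTUs present in fewer than 10% of samples) can be illustrated as follows. This is a minimal sketch, not the QIIME command the authors ran; the function name and the dictionary layout of the OTU table are assumptions, and "present" is taken to mean a count greater than zero.

```python
def filter_otus_by_prevalence(otu_table, min_fraction=0.10):
    """Keep OTUs detected in at least `min_fraction` of samples.

    otu_table: dict mapping an OTU identifier to a list of per-sample counts
    (hypothetical layout; every list has one entry per sample).
    """
    kept = {}
    for otu_id, counts in otu_table.items():
        n_present = sum(1 for count in counts if count > 0)
        # An OTU is removed only when it is present in FEWER than
        # min_fraction of samples, so equality keeps it.
        if n_present / len(counts) >= min_fraction:
            kept[otu_id] = counts
    return kept
```

Filtering on prevalence rather than total abundance removes OTUs that are abundant in one or two samples but absent elsewhere, which would otherwise add many sparse features to downstream testing.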
Statistical analysis
For demographic data, means are reported with their standard deviations, and means were compared using Student’s t-test. Categorical data were compared using Pearson’s chi-squared test.
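For readers who want to see what these two tests compute, the sketch below derives the test statistics from first principles using only the Python standard library. The function names are hypothetical. A closed-form p-value is given only for the chi-squared test with one degree of freedom (a 2×2 table); p-values for other cases need a statistics package and are left out.

```python
from math import erfc, sqrt
from statistics import mean, variance


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance, and its df."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t_stat, n_a + n_b - 2


def pearson_chi2(table):
    """Pearson chi-squared statistic and df for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_pvalue_df1(stat):
    """Upper-tail p-value for a chi-squared statistic with exactly 1 df."""
    return erfc(sqrt(stat / 2))
```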
For 16S rRNA sequencing data, alpha diversity metrics, namely Chao1 (a metric of species richness), Faith’s phylogenetic diversity, and the Shannon index (a metric that incorporates both species richness and species evenness), were computed using QIIME. The statistical significance of differences in alpha diversity metrics was calculated using a two-tailed t-test. Beta diversity, a measure of differences between samples, was calculated using the square root of the Jensen-Shannon divergence and visualized by principal coordinates analysis in R [23]. Univariate Adonis, a permutational analysis of variance, was performed using 10,000 permutations to test for differences in the square root of the Jensen-Shannon divergence across the following variables: age, gender, race/ethnicity, BMI, control/patient cohort, fibrosis as a binary categorical variable, steatosis grade, etiology of liver disease, and Charlson comorbidity index. Only variables with a p-value < 0.1 in univariate analysis were carried into the final multivariate analysis: steatosis grade, Charlson comorbidity index, and fibrosis. Differential abundance was tested using DESeq2 in R, which employs an empirical Bayesian approach to shrink dispersion and fits non-rarefied count data to a negative binomial model [24]. The multivariate DESeq2 models included the same variables as the multivariate Adonis analysis. P-values for differential abundance were converted to q-values to correct for multiple hypothesis testing (q < 0.05 for significance). All authors had access to the study data and reviewed and approved the final manuscript.
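The diversity metrics named above have short closed forms, sketched here in plain Python for illustration. This is not the QIIME or R code used in the study. The function names are hypothetical, base-2 logarithms are assumed throughout, and Chao1 is shown in its bias-corrected form; Faith’s phylogenetic diversity is omitted because it requires a phylogenetic tree.

```python
from math import log2, sqrt


def shannon_index(counts):
    """Shannon index in bits: -sum(p * log2(p)) over OTUs with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of OTUs seen exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def js_distance(counts_a, counts_b):
    """Square root of the Jensen-Shannon divergence between two samples.

    Both inputs are count vectors over the same OTUs in the same order.
    """
    total_a, total_b = sum(counts_a), sum(counts_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("sample has no reads")
    p = [c / total_a for c in counts_a]
    q = [c / total_b for c in counts_b]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i == 0 contribute nothing.
        return sum(xi * log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return sqrt(max(divergence, 0.0))
```

Taking the square root turns the Jensen-Shannon divergence into a proper distance metric, which suits ordination methods such as principal coordinates analysis that operate on a pairwise distance matrix.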
Random forests classifier
A random forest classifier to predict advanced fibrosis was created in R using the randomForest package (https://cran.r-project.org/web/packages/randomForest) with 1001 trees and mtry = 2 [25]. Features input into the classifier were those significantly associated with advanced fibrosis in the multivariate DESeq2 models. The accuracy of the classifier was estimated using 10-fold cross-validation.
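The cross-validation step can be sketched independently of the classifier. The code below is an illustration in Python rather than the R code used in the study: all function names are hypothetical, the folds are unstratified, accuracy is pooled over all held-out samples (the text does not say how it was aggregated), and a majority-class predictor stands in for the random forest purely so the example is self-contained.

```python
import random
from collections import Counter


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(features, labels, fit, predict, k=10, seed=0):
    """Estimate accuracy by training on k-1 folds and testing on the held-out fold."""
    correct = 0
    for held_out in kfold_indices(len(labels), k, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(labels)) if i not in held_out_set]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(
            1 for i in held_out if predict(model, features[i]) == labels[i]
        )
    return correct / len(labels)


# Placeholder model (NOT a random forest): always predicts the most common
# training label. Swap in any fit/predict pair with the same signatures.
def fit_majority(train_features, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, sample_features):
    return model
```

Because every sample is held out exactly once, the pooled accuracy uses all of the data for evaluation without ever scoring a sample with a model that saw it during training.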
Predicted metagenomics
The metagenomic content of each sample was inferred from 16S rRNA sequencing data using PICRUSt 1.1.3 (http://picrust.github.io/picrust), a well-validated tool designed to impute metagenomic data from 16S rRNA compositional data [26]. The 16S rRNA sequencing data were input into PICRUSt and normalized by copy number using default parameters. The predicted metagenes were then categorized by function using the KEGG database. Differences in predicted metagenes by advanced fibrosis status were identified using DESeq2, with p-values adjusted for multiple hypothesis testing.
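Conceptually, the copy-number normalization and metagene prediction steps reduce to a division followed by a weighted sum. The sketch below illustrates that arithmetic only; it is not PICRUSt’s code or interface, and the function names, OTU identifiers, and gene identifiers are all hypothetical.

```python
def normalize_by_copy_number(otu_counts, copy_numbers):
    """Divide each OTU count by that OTU's estimated 16S rRNA gene copy number."""
    return {otu: count / copy_numbers[otu] for otu, count in otu_counts.items()}


def predict_gene_counts(normalized_abundance, gene_content):
    """Sum, over OTUs, normalized abundance times the gene copies per genome.

    gene_content: dict mapping an OTU to a dict of gene identifier -> copies
    in that OTU's (predicted) genome.
    """
    predicted = {}
    for otu, abundance in normalized_abundance.items():
        for gene, copies in gene_content[otu].items():
            predicted[gene] = predicted.get(gene, 0.0) + abundance * copies
    return predicted


# Illustrative inputs for one sample (all values are made up).
example_counts = {"otu_a": 40, "otu_b": 10}
example_copy_numbers = {"otu_a": 4, "otu_b": 2}
example_gene_content = {
    "otu_a": {"gene_x": 1, "gene_y": 2},
    "otu_b": {"gene_y": 1},
}
```

Dividing by copy number first matters because organisms carrying several copies of the 16S rRNA gene would otherwise be over-counted, inflating the predicted abundance of every gene they carry.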
Validation cohort
The findings of the random forest classifier were validated in a separate cohort of NAFLD patients recruited at the VA from January 1, 2019 to October 1, 2019. Inclusion and exclusion criteria were the same as above. All patients underwent stool collection and liver ultrasound elastography as described above. Demographic data, race, ethnicity, and comorbidities were collected. In addition, all patients in this cohort completed a validated diet questionnaire, the NIH Diet History Questionnaire III (DHQIII), at the time of stool collection [27].
This is one of the few studies to examine the gut microbiome as a novel biomarker for advanced fibrosis. Unlike prior work that examined only patients with nonalcoholic fatty liver disease, this study included patients of various races and with various etiologies of liver disease. The study highlights how the gut microbiome may play a role in fibrosis progression.