Home Hepatitis Single-molecular real-time deep sequencing reveals the dynamics of multi-drug resistant haplotypes and structural variations in the hepatitis C virus genome

Single-molecular real-time deep sequencing reveals the dynamics of multi-drug resistant haplotypes and structural variations in the hepatitis C virus genome

Credits to the Source Link Daniel
Single-molecular real-time deep sequencing reveals the dynamics of multi-drug resistant haplotypes and structural variations in the hepatitis C virus genome

Single-molecule real-time sequencing of the region from the NS3 to the NS5A genes in the HCV genome

We conducted SMRT sequencing for the 35 HCV-samples and obtained a total of 9,342,760 raw reads with 25,251,529,303 bp (Fig. 1, Supplementary Table S1). To obtain more accurate reads, we executed a Consensus Circular Sequencing 2 (CCS2) algorithm on the error-prone raw reads using the pbsmrtpipe software (Pacific Biosciences). The CCS2 algorithm interlaces all raw reads (called 1-pass when only one raw read was sequenced from one adapter to another adapter, and called 2-pass when two raw reads were sequenced from one adapter to another adapter) derived from the same DNA template. The pbsmrtpipe and blastn software, and the in-house perl script generated a total of 284,565 ≥ 5-pass CCS2 reads with primer sequences at each end (888,923,070 bp). The CCS2 reads were equivalent to a 7,516× coverage, on average, of the region between the NS3 and NS5A genes (Supplementary Fig. S1). The mean quality value of the ≥5 pass-CCS2 reads was high, at 55.4–73.93, and the sequencing error rate was low, at 2.88 × 10−6–4.05 × 10−8%.

Figure 1

Sample collection and single-molecule real-time sequencing workflow. This flow consisted of sample collection, reverse transcription polymerase chain reaction (RT-PCR), single-molecule real-time (SMRT) sequencing, and Circular Consensus Sequencing2 (CCS2). For cases #1–#8, the samples were extracted prior to treatment and after treatment failure of the daclatasvir + asunaprevir (DCV/ASV). For cases #9–#12, the samples were extracted prior to treatment with simeprevir (SMV) combined with peginterferon and ribavirin, DCV/ASV and sofosbuvir + ledipasvir (SOF/LDV), and after the treatment failure of the SMV and DCV/ASV. RT-PCR amplified the NS3-NS5A gene regions of the cDNA transcribed from HCV-RNA. For SMRT sequencing, we constructed the library by binding the adapters to the cDNA by ligation. The library was sequenced with PacBio RS II. While the sequenced raw reads included sequencing errors, CCS2, the error correction algorithm, generated highly accurate reads by correcting the errors with the raw reads repetitively sequenced in single the molecules.

To assess the error rate in the whole procedure, we used two samples of HCV-containing plasmid as the control sequence. SMRT sequencing generated a total of 819,416 raw reads of the NS3-to-NS5A region with 2,177,357,653 bp. After error correction by the CCS2 algorithm, we obtained 38,364 ≥5-pass CCS2 reads (101,683,536 bp). These ≥5-pass CCS2 reads were aligned to the HCV1b reference (accession no. AB047639.1) using blastn version 2.2.29 with customised parameters (-dust no)26. Based on the blastn results, we found the mismatch rate to be 0.023% (SD = 0.039, 95% CI: 0.023–0.024) and the error rate to be 0.267% (SD = 0.298, 95% CI: 0.264–0.270) (Supplementary Table S2).

The sequence data suggested that we obtained more than 7,000 accurate long-reads per sample, derived from the region between the NS3 and NS5A genes.

Determination of RAS haplotypes of the HCV genome

To analyse the dynamics of the drug-resistant viral clones during the DAA treatment in a simple manner, we determined the linkage of seven RASs (Q80 and D168 in NS3, and R30, L31, P32, Q54, and Y93 in NS5A) from the NS3 to NS5A genes of each viral clone, and reduced the dimensions of the haplotype data (Fig. 2).

Figure 2
figure2

The flow of determining haplotype codes. The method for determining the haplotype code consists of 4 steps: (1) mapping CCS2 reads to the NS3/NS5A gene sequences using blastn, (2) extracting each codon from the 7 RAV-sites in the CCS2 reads, based on the blastn results, converting each codon of the 7 RAV-sites to three numbers (0, 1, and 2), and combining all codons to make a haplotype. The figure shows the example of the steps, from aligning the CCS2 reads to constructing a haplotype.

First, to determine the position of the NS3 and NS5A genes in each clone, we aligned the ≥5-pass CCS reads to the sequences of the NS3 (3,408–5,300) and NS5A genes (6,246–6,735) in the HCV reference genome (accession no. D90208.1) by blastn version 2.2.29 with customised parameters (-dust no)26. According to the results from the blastn, each codon from the CCS2 reads that was in one of the seven RASs, was extracted. Of the CCS2 reads, those possessing frameshift indels or not mapped to NS3 or NS5A were excluded from haplotype determination. Next, to convert the linkage information of the seven codon changes to low-dimensional digital data, we encoded the original codon (wild codon) as 0, a synonymous codon as 1, and a nonsynonymous codon as 2 (Fig. 2). The original codons and the synonymous codons for the seven-RAS loci are defined in Supplementary Table S3. After translating each codon to the code, we constructed a 7-digit code for 7-RAS haplotypes by combining these codes (Fig. 2). Before applying the coding procedure, the total number of HCV clones between the NS3-NS5A genes was 187,539, and the average number was as high as 5,358 (SD = 3460.05) haplotypes per sample (Table 1). In contrast, after the construction of the 7-digit coded haplotypes, the number of the viral haplotypes were summarised to 24 on average (SD = 20.45). Consequently, the coding procedure simplified thousands of HCV clone types into fewer than one hundred types, using the haplotype code for the 7-RAS loci, making the overall picture of the HCV quasispecies in each sample clearer and easier to understand.

Table 1 Sample information.

Dynamics of the 7-RAS haplotypes between the DAA pre- and post-treatment

To examine the dynamics of the 7-RAS haplotypes, we listed the haplotype codes at each timepoint and compared the relative frequency of the 7-RAS haplotypes between the pre- and post-treatment samples of a total of 15 DAA treatments (before and after DCV/ASV in cases #1–#12, and before and after SMV in cases #9, #10 and #12; Supplementary Table S4). Then, to exclude the haplotypes derived from artefacts, we set the mismatch rate + 2 SD as the threshold (0.103%) and listed the haplotype codes with over 0.103% frequency in the samples. In 5 of the 15 paired comparisons (#5-DCV/ASV-pre/post, #8-DCV/ASV-pre/post, #9-SMV-pre/post, #10-SMV-pre/post and #10-DCV/ASV-pre/post), the major haplotypes had multiple RASs present when the treatment failed, also existing in 0.14–1.26% of the CCS2 reads at pre-treatment. Meanwhile, in the other ten paired comparisons, the major haplotypes only had RASs present when the treatment failed. Of these 5 paired comparisons, the dynamics of the haplotypes in the 2 representative cases are demonstrated in Fig. 3 (#5-DSV/ASV-pre/post paired samples and #8-DSV/ASV-pre/post paired samples).

Figure 3
figure3

Dynamics of haplotype codes before treatment and after treatment failure. (A) Comparison of the haplotypes between #5-DCV/ASV-pre and #5-DCV/ASV-post. Before therapy, “0202012”, resistant to both the NS3 inhibitor and NS5A inhibitor, existed at a frequency of 0.32% (left table). In contrast, the relative abundance of “0202012” increased to 98.86%, with treatment failure (right table). The % in each table represents the percentage for the number of the CCS2 reads in the sample. (B) Comparison of haplotypes between #8-DCV/ASV-pre and #8-DCV/ASV-post. Although haplotype “0202022” was 1.26% and the minor haplotype at pre-treatment, this haplotype was 82.65%, and the major haplotype at treatment failure. The % in each table represents the percentage for the number of the CCS2 reads in the sample.

During the DCV/ASV treatment in case #5 (Fig. 3A), the haplotype code “0202012” was the ninth most frequent haplotypes in the pre-treatment samples. However, after treatment, haplotype “0202012” increased to the most frequent haplotype. Likewise, with the example of case #5 (Fig. 3B), its ninth most frequent haplotype before treatment, “0202022”, became the most frequent haplotype after treatment failure, and showed resistance to DCV and ASV due to the nonsynonymous changes of NS3-D168, NS5A-L31, and NS5A-Y93. These data suggest that low-abundance multidrug-resistant viral clones exist before the DAA treatment.

We also focused on the change in the number of haplotype codes from the pre-treatment of DAAs (DCV/ASV and SMV) to relapse (Fig. 4A). With DCV/ASV treatment, the number of haplotypes significantly decreased by 8.08 (SD = 18.77, 95% CI: 4.5–12.5) from 14.67 (SD = 9.12) with pre-treatment of DCV/ASV, and to 6.58 (SD = 7.1) at relapse (p = 0.00293). In contrast, comparing the means of the nonsynonymous codons of the 7-RAS loci per CCS2 read before and after DSV/ASV therapy, the number of nonsynonymous codons significantly increased by 2.14 (SD = 0.96, 95% CI: 1.35–2.91) from 1.50 (SD = 0.92) at the pre-treatment to 3.64 (SD = 0.75) at treatment failure (p = 0.0004883) (Fig. 4B). With SMV treatment, neither the number of haplotype codes (mean = 4, SD = 10.68, 95% CI: −5–19) nor the number of nonsynonymous codons (mean = 0.87, SD = 0.17, 95% CI: 0.63–1.00) showed significant differences, due to the small paired-sample size (Fig. 4C,D).

Figure 4
figure4

Differences in the haplotype profiles, both pre- and post-treatment on daclatasvir/asunaprevir and simeprevir. (A) The graph demonstrates the differences in the number of haplotype codes between pre-and post-treatment with daclatasvir/asunaprevir (DCV/ASV). The x-axis represents the timepoint of the sera collection, and the y-axis represents the number of haplotypes. The Wilcoxon signed-rank test was used for the statistical examination of the difference between the number of haplotype codes. (B) The graph demonstrates the difference in the number of nonsynonymous codons per clone from pre-treatment of DCV/ASV to post-treatment. The x-axis represents the timepoint of the sera collection and the y-axis represents the mean number of nonsynonymous codons on the 7-RAS loci, per HCV clone. The Wilcoxon signed-rank test was used for the statistical examination of the differences between the number of nonsynonymous codons. (C) The graph demonstrates the differences in the number of haplotype codes between pre-and post-treatment with simeprevir (DCV/ASV). The x-axis represents the timepoint of the sera collection, and the y-axis represents the number of haplotypes. (D) The graph demonstrates the difference in the number of nonsynonymous codons per clone from the pre-treatment of SMV to the post-treatment. The x-axis represents the timepoint of the sera collection and the y-axis represents the mean number of nonsynonymous codons on the 7 RAS loci, per HCV clone.

Although the coded haplotypes lacked the detailed information of each codon, the comparison of haplotype codes during DAA therapy suggests that rare HCV clones with multiple RASs at pre-treatment might be the cause of relapse as they became the major haplotype after treatment failure. In addition, the significant reduction in the number of haplotype codes and the significant gain in the nonsynonymous codons at RAS loci at treatment failure, indicates the clonal selection of viruses with survival benefit under anti-HCV treatment.

Structural variations detected by SMRT sequencing

To understand the viral clones at a deeper level, we analysed structural variation in the HCV-RNA genome at single-clone resolution. To call candidates with ≥30-bp SVs, we executed ngmlr and Sniffles for each sample. In all cases, a total of 6,512 CCS2 reads had SVs, corresponding to 2.29% of all ≥- 5-pass CCS2 reads. In particular, deletions were detected in 4,393 of CCS2 reads (1.54%), while duplications, insertions, inversions, and U-turns (INVDUP) were detected in 220 (0.08%), 61 (0.02%), 68 (0.02%), and 1,906 reads (0.67%), respectively (Fig. 5, Supplementary Table S5).

Figure 5
figure5

The definition of structural variations (SVs) by Sniffles. The figure is adapted from the article of Sniffles32 with focus on the relationship between the reference genome sequence and the sample genome sequence with each SV32. Sniffles is able to detect 8 types of SVs, such as deletions, duplications, insertions, inversion, translocation, nested variation (inversion + deletion and duplication + inversion), and U-Turns (INVDUP). The red regions indicate regions where an SV occurs. The red points stand for the breakpoint of deletions.

We focused on the 10 SVs detected with frequencies of ≥1% of the CCS2 reads, and examined the changes in their frequencies after the treatment (Supplementary Fig. S2, Supplementary Table S6). Two examples (SV7 and 10) of the 10 SVs are shown in Fig. 6A,B, respectively. In the case of SV7, the CCS2 reads lacked the sequence located at the region between 5,299–5,722 on the HCV reference genome (accession no. D90208.1), including parts of NS3, NS4A, and NS4B (Fig. 6A). In the validation, two of the Ion Proton reads were aligned to the region the SV occurred in. Also, in the case of SV10 (Fig. 6B), the CCS2 reads lacked the sequence between the 3,644–4,763, including part of NS3. In the validation, 925 of the Ion Proton reads were aligned to the region the SV occurred in. Of these 10 SVs, six (SV1, 2, 4, 5, 6, and 8) existed at the pre-treatment and vanished after the treatment. In contrast, two SVs (SV3 and 7) appeared for the first time after treatment failure. The remaining SV (SV9) was detected throughout the treatment period.

Figure 6
figure6

The detection of the genome structure harbouring SVs between the NS3 and NS5A gene regions. The mapping software ngmlr version 0.2.6 aligned ≥5-pass CCS reads to the HCV genome, and the structural variation (SV)-calling software, Sniffles, detected ≥30-bp SVs. The figure demonstrates that two deletions exist at a relative abundance of ≥1% in the samples. In each panel, the upper schema shows the HCV genome with SV from the NS3 to the NS5A gene regions. The red symbols stand for the region that the SV in question occurred in. In particular, the inverted letters in the schema mean “inversion,” and the boxes with dash lines mean “deletion.” The genome positions described here are based on the HCV reference sequence (accession no. D90208.1). The lower image demonstrates the alignment of the representative reads supporting the SVs by the MUMmer program. The x-axis shows the position of the HCV reference genome. The Y-axis shows the position of the reads harbouring SVs. The purple solid line shows the alignment for an HCV reference genome without SVs. The blue solid line shows the inverted alignment for an HCV reference genome. The red point shows the SV region on the CCS2 read, which was used for SV validation. (A) In sample #12-SMV-post, a 423-bp deletion was detected. (B) In #12-SOF/LDV-pre, 1,119-bp deletion was clonally detected.

As shown in these SVs, we identified HCV clones harbouring simple SVs and their various dynamics.

Source Link

Related Articles

Leave a Comment

This website uses cookies to improve your experience. We will assume you are ok with this, but you can opt-out if you wish. Accept Read More

%d bloggers like this: