Single-molecule real-time sequencing of the region from the NS3 to the NS5A genes in the HCV genome
We conducted SMRT sequencing for the 35 HCV-samples and obtained a total of 9,342,760 raw reads with 25,251,529,303 bp (Fig. 1, Supplementary Table S1). To obtain more accurate reads, we executed a Consensus Circular Sequencing 2 (CCS2) algorithm on the error-prone raw reads using the pbsmrtpipe software (Pacific Biosciences). The CCS2 algorithm interlaces all raw reads (called 1-pass when only one raw read was sequenced from one adapter to another adapter, and called 2-pass when two raw reads were sequenced from one adapter to another adapter) derived from the same DNA template. The pbsmrtpipe and blastn software, and the in-house perl script generated a total of 284,565 ≥ 5-pass CCS2 reads with primer sequences at each end (888,923,070 bp). The CCS2 reads were equivalent to a 7,516× coverage, on average, of the region between the NS3 and NS5A genes (Supplementary Fig. S1). The mean quality value of the ≥5 pass-CCS2 reads was high, at 55.4–73.93, and the sequencing error rate was low, at 2.88 × 10−6–4.05 × 10−8%.
To assess the error rate in the whole procedure, we used two samples of HCV-containing plasmid as the control sequence. SMRT sequencing generated a total of 819,416 raw reads of the NS3-to-NS5A region with 2,177,357,653 bp. After error correction by the CCS2 algorithm, we obtained 38,364 ≥5-pass CCS2 reads (101,683,536 bp). These ≥5-pass CCS2 reads were aligned to the HCV1b reference (accession no. AB047639.1) using blastn version 2.2.29 with customised parameters (-dust no)26. Based on the blastn results, we found the mismatch rate to be 0.023% (SD = 0.039, 95% CI: 0.023–0.024) and the error rate to be 0.267% (SD = 0.298, 95% CI: 0.264–0.270) (Supplementary Table S2).
The sequence data suggested that we obtained more than 7,000 accurate long-reads per sample, derived from the region between the NS3 and NS5A genes.
Determination of RAS haplotypes of the HCV genome
To analyse the dynamics of the drug-resistant viral clones during the DAA treatment in a simple manner, we determined the linkage of seven RASs (Q80 and D168 in NS3, and R30, L31, P32, Q54, and Y93 in NS5A) from the NS3 to NS5A genes of each viral clone, and reduced the dimensions of the haplotype data (Fig. 2).
First, to determine the position of the NS3 and NS5A genes in each clone, we aligned the ≥5-pass CCS reads to the sequences of the NS3 (3,408–5,300) and NS5A genes (6,246–6,735) in the HCV reference genome (accession no. D90208.1) by blastn version 2.2.29 with customised parameters (-dust no)26. According to the results from the blastn, each codon from the CCS2 reads that was in one of the seven RASs, was extracted. Of the CCS2 reads, those possessing frameshift indels or not mapped to NS3 or NS5A were excluded from haplotype determination. Next, to convert the linkage information of the seven codon changes to low-dimensional digital data, we encoded the original codon (wild codon) as 0, a synonymous codon as 1, and a nonsynonymous codon as 2 (Fig. 2). The original codons and the synonymous codons for the seven-RAS loci are defined in Supplementary Table S3. After translating each codon to the code, we constructed a 7-digit code for 7-RAS haplotypes by combining these codes (Fig. 2). Before applying the coding procedure, the total number of HCV clones between the NS3-NS5A genes was 187,539, and the average number was as high as 5,358 (SD = 3460.05) haplotypes per sample (Table 1). In contrast, after the construction of the 7-digit coded haplotypes, the number of the viral haplotypes were summarised to 24 on average (SD = 20.45). Consequently, the coding procedure simplified thousands of HCV clone types into fewer than one hundred types, using the haplotype code for the 7-RAS loci, making the overall picture of the HCV quasispecies in each sample clearer and easier to understand.
Dynamics of the 7-RAS haplotypes between the DAA pre- and post-treatment
To examine the dynamics of the 7-RAS haplotypes, we listed the haplotype codes at each timepoint and compared the relative frequency of the 7-RAS haplotypes between the pre- and post-treatment samples of a total of 15 DAA treatments (before and after DCV/ASV in cases #1–#12, and before and after SMV in cases #9, #10 and #12; Supplementary Table S4). Then, to exclude the haplotypes derived from artefacts, we set the mismatch rate + 2 SD as the threshold (0.103%) and listed the haplotype codes with over 0.103% frequency in the samples. In 5 of the 15 paired comparisons (#5-DCV/ASV-pre/post, #8-DCV/ASV-pre/post, #9-SMV-pre/post, #10-SMV-pre/post and #10-DCV/ASV-pre/post), the major haplotypes had multiple RASs present when the treatment failed, also existing in 0.14–1.26% of the CCS2 reads at pre-treatment. Meanwhile, in the other ten paired comparisons, the major haplotypes only had RASs present when the treatment failed. Of these 5 paired comparisons, the dynamics of the haplotypes in the 2 representative cases are demonstrated in Fig. 3 (#5-DSV/ASV-pre/post paired samples and #8-DSV/ASV-pre/post paired samples).
During the DCV/ASV treatment in case #5 (Fig. 3A), the haplotype code “0202012” was the ninth most frequent haplotypes in the pre-treatment samples. However, after treatment, haplotype “0202012” increased to the most frequent haplotype. Likewise, with the example of case #5 (Fig. 3B), its ninth most frequent haplotype before treatment, “0202022”, became the most frequent haplotype after treatment failure, and showed resistance to DCV and ASV due to the nonsynonymous changes of NS3-D168, NS5A-L31, and NS5A-Y93. These data suggest that low-abundance multidrug-resistant viral clones exist before the DAA treatment.
We also focused on the change in the number of haplotype codes from the pre-treatment of DAAs (DCV/ASV and SMV) to relapse (Fig. 4A). With DCV/ASV treatment, the number of haplotypes significantly decreased by 8.08 (SD = 18.77, 95% CI: 4.5–12.5) from 14.67 (SD = 9.12) with pre-treatment of DCV/ASV, and to 6.58 (SD = 7.1) at relapse (p = 0.00293). In contrast, comparing the means of the nonsynonymous codons of the 7-RAS loci per CCS2 read before and after DSV/ASV therapy, the number of nonsynonymous codons significantly increased by 2.14 (SD = 0.96, 95% CI: 1.35–2.91) from 1.50 (SD = 0.92) at the pre-treatment to 3.64 (SD = 0.75) at treatment failure (p = 0.0004883) (Fig. 4B). With SMV treatment, neither the number of haplotype codes (mean = 4, SD = 10.68, 95% CI: −5–19) nor the number of nonsynonymous codons (mean = 0.87, SD = 0.17, 95% CI: 0.63–1.00) showed significant differences, due to the small paired-sample size (Fig. 4C,D).
Although the coded haplotypes lacked the detailed information of each codon, the comparison of haplotype codes during DAA therapy suggests that rare HCV clones with multiple RASs at pre-treatment might be the cause of relapse as they became the major haplotype after treatment failure. In addition, the significant reduction in the number of haplotype codes and the significant gain in the nonsynonymous codons at RAS loci at treatment failure, indicates the clonal selection of viruses with survival benefit under anti-HCV treatment.
Structural variations detected by SMRT sequencing
To understand the viral clones at a deeper level, we analysed structural variation in the HCV-RNA genome at single-clone resolution. To call candidates with ≥30-bp SVs, we executed ngmlr and Sniffles for each sample. In all cases, a total of 6,512 CCS2 reads had SVs, corresponding to 2.29% of all ≥- 5-pass CCS2 reads. In particular, deletions were detected in 4,393 of CCS2 reads (1.54%), while duplications, insertions, inversions, and U-turns (INVDUP) were detected in 220 (0.08%), 61 (0.02%), 68 (0.02%), and 1,906 reads (0.67%), respectively (Fig. 5, Supplementary Table S5).
We focused on the 10 SVs detected with frequencies of ≥1% of the CCS2 reads, and examined the changes in their frequencies after the treatment (Supplementary Fig. S2, Supplementary Table S6). Two examples (SV7 and 10) of the 10 SVs are shown in Fig. 6A,B, respectively. In the case of SV7, the CCS2 reads lacked the sequence located at the region between 5,299–5,722 on the HCV reference genome (accession no. D90208.1), including parts of NS3, NS4A, and NS4B (Fig. 6A). In the validation, two of the Ion Proton reads were aligned to the region the SV occurred in. Also, in the case of SV10 (Fig. 6B), the CCS2 reads lacked the sequence between the 3,644–4,763, including part of NS3. In the validation, 925 of the Ion Proton reads were aligned to the region the SV occurred in. Of these 10 SVs, six (SV1, 2, 4, 5, 6, and 8) existed at the pre-treatment and vanished after the treatment. In contrast, two SVs (SV3 and 7) appeared for the first time after treatment failure. The remaining SV (SV9) was detected throughout the treatment period.
As shown in these SVs, we identified HCV clones harbouring simple SVs and their various dynamics.