Nuclear feature-based analysis results
First, we used the nuclear features output by CellProfiler to analyze the proportion of nuclei belonging to Group I in each case. In 16 of the 23 Group I cases, this proportion was >20%. In contrast, in 22 of the 38 Group III cases, <10% of the nuclei were characterized by early recurrence (Table 2).
ROI-based analysis results
A linear-kernel SVM trained to classify the ROIs of the HCC area into the three groups achieved an accuracy of 99.8% on the training set (Table 3a). An SVM trained on the ROIs of the non-HCC area achieved an accuracy of 100% (Table 3b). When the classifiers built on the training set were evaluated on the test set, the accuracies for ROIs in the HCC and non-HCC areas were 80.6% and 68.1%, respectively (Table 3c, d).
In addition, the ROI-level results within the HCC or non-HCC areas were aggregated per case, so that classification accuracy could be evaluated case by case rather than ROI by ROI. Each case was assigned to the group into which the largest number of its ROIs had been classified. The case-level accuracies for the HCC and non-HCC areas were 88.8% and 64.0%, respectively (Table 4a, b).
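The case-level assignment described above is a majority vote over ROI labels. A minimal sketch, assuming each ROI already carries a predicted group label ("I", "II" or "III"); the function name is hypothetical, and because the text does not say how ties are broken, ties here go to the label encountered first.

```python
from collections import Counter


def case_group_by_majority(roi_labels):
    """Return the group predicted for the largest number of a case's ROIs.

    roi_labels: iterable of predicted group labels, one per ROI.
    Ties (not addressed in the text) resolve to the label seen first,
    because Counter keeps insertion order and most_common(1) returns
    the first maximal entry.
    """
    counts = Counter(roi_labels)
    return counts.most_common(1)[0][0]


print(case_group_by_majority(["I", "II", "II", "III"]))  # II
```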
Aggregated case-based prediction results
Finally, three SVM models (the HCC-area ROI-based SVM, the non-HCC-area ROI-based SVM, and the nuclear feature-based SVM) were integrated to predict HCC recurrence.
A, B, and C were calculated as the average predicted probabilities of Groups I, II, and III, respectively, over the ROIs in the HCC area of a case. D, E, and F were calculated in the same way over the ROIs in the non-HCC area. For the nuclear features, G was defined as the percentage of the case's nuclei classified as Group I. The prediction algorithm is shown in Fig. 2.
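The per-case scores A to G can be sketched as below. This assumes each ROI's SVM output is a triple of class probabilities in the order (Group I, Group II, Group III) and that each nucleus carries a predicted group label; the function and argument names are hypothetical, not taken from the authors' code.

```python
from statistics import mean


def _column_means(prob_rows):
    """Average each of the three class-probability columns over ROIs."""
    return [mean(row[k] for row in prob_rows) for k in range(3)]


def case_scores(hcc_roi_probs, non_hcc_roi_probs, nucleus_groups):
    """Compute the case-level values A-G.

    hcc_roi_probs / non_hcc_roi_probs: lists of (pI, pII, pIII), one per ROI.
    nucleus_groups: predicted group label ("I", "II", "III") per nucleus.
    """
    A, B, C = _column_means(hcc_roi_probs)      # HCC-area ROIs
    D, E, F = _column_means(non_hcc_roi_probs)  # non-HCC-area ROIs
    # G: percentage of the case's nuclei classified as Group I.
    G = 100 * sum(g == "I" for g in nucleus_groups) / len(nucleus_groups)
    return {"A": A, "B": B, "C": C, "D": D, "E": E, "F": F, "G": G}
```

Averaging probabilities (rather than counting ROI votes) keeps A + B + C equal to 1 when every ROI's probabilities sum to 1, which matches the worked examples (e.g. A = 0.92 and B = 0.08 for Case 1).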
(1) If G was ≥20%, the case was assumed to belong to Group I or II. The ROI values of the HCC area were then compared: if A > B, the case was categorized as Group I; if A < B, it was categorized as Group II.
For example, Case 1 had a G value of 34.8%, which is >20%. Because A (0.92) > B (0.08), Case 1 was predicted to belong to Group I. Case 25 had a G value of 35.2%, also >20%. Because A (0.01) < B (0.70), this case was predicted to belong to Group II.
(2) If G was 10–19% and both A + B and D + E were ≤0.5, the case was predicted to be in Group III; if not, A, B, D, and E were compared. If A or D was the largest of these four values, the case was predicted to be in Group I; if B or E was the largest, the case was predicted to belong to Group II.
For example, Case 35 had a G value of 17.8. The values of A + B and D + E were 0.93 and 0.98, respectively, both of which were >0.5. Of A, B, D, and E, E was the largest, at 0.98; therefore, the case was predicted to belong to Group II. Similarly, the G value for Case 65 was 18.8. Since A + B was 0.47 and D + E was 0.18, both <0.5, the case was predicted to belong to Group III.
(3) When G was ≤10%, the case was predicted to belong to Group III if A + B was <0.5. If A + B was >0.5, the case was predicted to belong to Group I when A > B and to Group II when A < B. For example, Case 41 had a G value of 4.0%. A + B was >0.5 and A (0.02) was < B (0.75); therefore, Case 41 was predicted to belong to Group II.
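Rules (1) to (3) can be condensed into a single decision function. This is a sketch under stated assumptions, not the authors' implementation: the text's ranges "10–19%" and "≤10%" overlap at exactly 10 and leave values between 19 and 20 unassigned, so the sketch uses G ≥ 20 for rule (1), 10 ≤ G < 20 for rule (2) and G < 10 for rule (3). Ties (A = B, or two of A, B, D, E sharing the maximum) and A + B = 0.5 in rule (3) are also not specified and are resolved as noted in the comments. The function name is hypothetical.

```python
def predict_group(G, A, B, D=None, E=None):
    """Predict the recurrence group ("I", "II" or "III") for one case.

    G: percentage of the case's nuclei classified as Group I.
    A, B: mean probabilities of Groups I and II over HCC-area ROIs.
    D, E: mean probabilities of Groups I and II over non-HCC-area ROIs;
          only rule (2) uses them.
    """
    if G >= 20:
        # Rule (1): Group I or II, decided by the HCC-area ROIs.
        # A == B is not covered by the text; it falls to Group II here.
        return "I" if A > B else "II"
    if G >= 10:
        # Rule (2), assumed range 10 <= G < 20.
        if D is None or E is None:
            raise ValueError("rule (2) requires D and E")
        if A + B <= 0.5 and D + E <= 0.5:
            return "III"
        scores = {"A": A, "B": B, "D": D, "E": E}
        largest = max(scores, key=scores.get)  # first key wins on ties
        return "I" if largest in ("A", "D") else "II"
    # Rule (3), assumed range G < 10.
    # A + B == 0.5 is not covered by the text; it counts as Group III here,
    # mirroring the "<= 0.5" condition of rule (2).
    if A + B <= 0.5:
        return "III"
    return "I" if A > B else "II"
```

With the values reported for the worked examples, `predict_group(34.8, 0.92, 0.08)` returns `"I"` (Case 1), `predict_group(35.2, 0.01, 0.70)` returns `"II"` (Case 25), and `predict_group(4.0, 0.02, 0.75)` returns `"II"` (Case 41).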
With this algorithm, the integrated models showed an accuracy of 89.9% (80/89) (Table 5). Of the 24 cases predicted to be Group I, 23 were actually Group I and the remaining one was Group II. Of the 35 cases predicted to be Group II, 27 were actually Group II and the remaining 8 were Group III. All 30 cases predicted to be Group III were actually Group III.
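The overall accuracy can be checked from the counts given above. The sketch below encodes only the cells stated in the text as (predicted, actual) pairs; all other cells are zero, since the stated counts already add up to 89 cases.

```python
# (predicted group, actual group) -> number of cases, as stated in the text.
confusion = {
    ("I", "I"): 23, ("I", "II"): 1,
    ("II", "II"): 27, ("II", "III"): 8,
    ("III", "III"): 30,
}

total = sum(confusion.values())
correct = sum(n for (pred, actual), n in confusion.items() if pred == actual)
accuracy = correct / total


def totals(position):
    """Sum counts by predicted (position=0) or actual (position=1) group."""
    out = {}
    for key, n in confusion.items():
        out[key[position]] = out.get(key[position], 0) + n
    return out


predicted_totals = totals(0)
actual_totals = totals(1)
print(f"{correct}/{total} = {accuracy:.1%}")  # 80/89 = 89.9%
print(predicted_totals)  # {'I': 24, 'II': 35, 'III': 30}
print(actual_totals)     # {'I': 23, 'II': 28, 'III': 38}
```

The derived actual totals (23 Group I and 38 Group III cases) agree with the group sizes reported in the nuclear feature-based analysis.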
The prediction algorithm was developed on the training data set: one-third of the ROIs were held out as a validation set, and the SVM models were trained on the remaining ROIs. This process was repeated three times (Supplementary Table 2).
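A sketch of the hold-out scheme follows. The text does not say whether the three held-out thirds were disjoint or redrawn at random each time; this sketch assumes disjoint thirds (so every ROI is held out exactly once), and the function name, `n_folds` and `seed` parameters are hypothetical. The split is at the ROI level, as stated; model training itself is omitted.

```python
import random


def one_third_holdout_splits(roi_ids, n_folds=3, seed=0):
    """Yield (train_ids, validation_ids) pairs, holding out ~1/n_folds of ROIs.

    Assumption: the held-out parts are disjoint, so each ROI appears in
    exactly one validation set across the n_folds repetitions.
    """
    ids = list(roi_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for i, held_out in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, held_out


for train, val in one_third_holdout_splits(range(30)):
    print(len(train), len(val))  # 20 10, printed three times
```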
Case-level predictions were made using the average predicted probability over the ROIs belonging to each case. The numbers of ROI-level predictions are shown in Supplementary Table 3.