# CT-ORG, a new dataset for multiple organ segmentation in computed tomography

This section describes the process of annotating the CT images with organ masks. Because computation was used to accelerate the process, we first describe the mathematical background, then proceed to the specific morphological algorithms used to segment the lungs and bones, and finally the manual annotation process used for all organs.

### Morphological segmentation

We used morphological algorithms to generate training data for the bones and lungs. This concept is called weak supervision, an active area of machine learning research. In the medical domain, weak supervision was previously exploited for brain ventricle segmentation9.

In what follows, we describe the basics of n-dimensional image morphology, and how we accelerated these operations using Fourier transforms. Then we describe the specific algorithms used to segment the lungs and bones.

### Morphology basics, acceleration by Fourier transforms

Let \(f:\mathbb{Z}^{n}\to \mathbb{F}^{2}\) denote a binary image, and \(k:\mathbb{Z}^{n}\to \mathbb{F}^{2}\) the structuring element. Then the familiar operation of morphological dilation can be written as

$$D(f,k)(x)=\left\{\begin{array}{cc}0, & (f\ast k)(x)=0\\ 1, & \mathrm{otherwise}.\end{array}\right. \tag{1}$$

That is, we first convolve \(f\) with \(k\), treating the two as real-valued functions on \(\mathbb{Z}^{n}\). Then we convert back to a binary image by setting zero-valued pixels to black and all others to white. Erosion is computed similarly: let \(\bar{f}\) denote the binary complement of \(f\); then erosion is just \(E(f,k)(x)=\overline{D(\bar{f},\,k)}(x)\). Similarly, the opening and closing operations are compositions of erosion and dilation. Note that in an actual implementation, it is more numerically stable to take \(f\ast k < \epsilon\), for some small \(\epsilon\), as a proxy for \(f\ast k=0\). Also, for finite image volumes, the convolution must be suitably cropped so that \(f\ast k\) has the same dimensions as \(f\). This can be implemented by shifting the phase of the Fourier transforms and extrapolating the out-of-bounds values with 0 for \(\widehat{f}\) and 1 for \(\widehat{\bar{f}}\).

The advantage of writing dilation this way is that all of the basic operations in n-dimensional binary morphology reduce to a mixture of complements and convolutions. Complements are inexpensive to compute, while convolutions can be accelerated by fast Fourier transforms, due to the identity \(\widehat{f\ast k}=\widehat{f}\,\cdot \,\widehat{k}\), where \(\widehat{f}\) denotes the Fourier transform of \(f\). This allows us to quickly generate training labels by morphological operations, which is especially beneficial when the structuring elements are large. In what follows, we describe how morphology is used to generate labels for two organs of interest, the skeleton and lungs. These were chosen because their size and intensity contrast enable detection by simple thresholding.
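The dilation and erosion defined above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function names, the choice `eps=0.5` (safe because the exact convolution of two binary arrays is integer-valued), and the use of explicit padding in place of phase shifting are all assumptions made for the example. It also assumes a symmetric, odd-sized structuring element centered on its origin.

```python
import numpy as np


def fft_dilate(f, k, outside=False, eps=0.5):
    """Dilate boolean image f by boolean structuring element k via FFT convolution.

    Assumes k has odd size along every axis, with its origin at the center.
    `outside` is the value assumed for voxels beyond the image bounds.
    """
    f = np.asarray(f, dtype=bool)
    k = np.asarray(k, dtype=bool)
    half = [s // 2 for s in k.shape]
    padded = np.pad(f, [(h, h) for h in half], constant_values=outside)
    # Zero-pad both transforms to the full linear-convolution size so that
    # the circular convolution computed by the FFT does not wrap around.
    shape = [p + s - 1 for p, s in zip(padded.shape, k.shape)]
    axes = tuple(range(f.ndim))
    spectrum = np.fft.rfftn(padded.astype(float), s=shape, axes=axes)
    spectrum *= np.fft.rfftn(k.astype(float), s=shape, axes=axes)
    conv = np.fft.irfftn(spectrum, s=shape, axes=axes)
    # Crop back to the dimensions of f, then binarize: (f * k)(x) > eps.
    crop = tuple(slice(2 * h, 2 * h + n) for h, n in zip(half, f.shape))
    return conv[crop] > eps


def fft_erode(f, k, eps=0.5):
    """Erosion as the complement of the dilated complement.

    The complement is extended with 1 outside the image, i.e. f itself
    is treated as 0 out of bounds. Exact for symmetric k.
    """
    f = np.asarray(f, dtype=bool)
    return ~fft_dilate(~f, k, outside=True, eps=eps)


def fft_open(f, k):
    """Opening: erosion followed by dilation."""
    return fft_dilate(fft_erode(f, k), k)


def fft_close(f, k):
    """Closing: dilation followed by erosion."""
    return fft_erode(fft_dilate(f, k), k)
```

The cost of each operation is dominated by three FFTs of the padded volume, so it does not grow with the number of nonzero entries in the structuring element, which is where the benefit for large elements comes from.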

### Morphological detection and segmentation of CT lungs

The lungs were detected and segmented based on the simple observation that they are the two largest air pockets in the body. The morphological algorithm is as follows:

1. Extract the air pockets from the CT scan by removing all voxels greater than τ = −150 Hounsfield units (HU). The resulting mask is called the thresholded image, denoted \(f_{\tau}\).
2. Puncture the thin wall of the exam table by closing \(f_{\tau}\) with a rectangular prism-shaped structuring element of width 1 × 10/d × 1 voxels, where d is the pixel spacing in mm. This connects the air inside the hollow exam table to the air outside the patient, assuming the usual patient orientation.
3. Remove any mask component that is connected to the boundary of any axial slice. This removes air outside of the body, as well as the hollow interior of the exam table, while preserving the lungs.
4. Remove the chest wall and other small air pockets by opening the image using a spherical structuring element with a diameter of 1 cm.
5. From the remaining mask, take the two largest connected components, which are almost certainly the lungs.
6. Finally, undo the effect of erosion by taking the components of \(f_{\tau}\) which are connected to the two detected lungs.

Note that the final step is a form of morphological reconstruction, as elaborated in Vincent10.
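The six steps above can be sketched with `scipy.ndimage`. This is an illustrative reading of the algorithm, not the authors' code: the function and helper names are hypothetical, the volume is assumed to be indexed (z, y, x) with the table wall crossed along y, step 3 is interpreted as removing 3-D components that touch the in-plane border of any slice, and SciPy's spatial-domain morphology is used in place of the FFT-accelerated version.

```python
import numpy as np
from scipy import ndimage


def ball(diameter_mm, spacing):
    """Ellipsoidal structuring element approximating a sphere of given diameter."""
    radii = [max(int(round(diameter_mm / (2.0 * s))), 1) for s in spacing]
    grid = np.ogrid[tuple(slice(-r, r + 1) for r in radii)]
    return sum((g / r) ** 2 for g, r in zip(grid, radii)) <= 1.0


def largest_components(mask, n):
    """Keep the n largest connected components of a boolean mask."""
    labels, count = ndimage.label(mask)
    sizes = np.bincount(labels.ravel(), minlength=count + 1)
    sizes[0] = 0  # ignore the background label
    keep = [i for i in np.argsort(sizes)[::-1][:n] if sizes[i] > 0]
    return np.isin(labels, keep)


def segment_lungs(ct, spacing, tau=-150.0):
    """ct: 3-D array in HU, indexed (z, y, x); spacing: voxel size in mm, same order."""
    # Step 1: air mask f_tau (voxels above tau are removed).
    f_tau = ct <= tau
    # Step 2: close with a 1 x 10/d x 1 prism along y to puncture the table wall.
    length = max(int(round(10.0 / spacing[1])), 1)
    prism = np.ones((1, length, 1), dtype=bool)
    # The union with f_tau undoes SciPy's zero-border erosion at the image edge.
    mask = ndimage.binary_closing(f_tau, structure=prism) | f_tau
    # Step 3: drop components touching the in-plane border of any axial slice.
    labels, _ = ndimage.label(mask)
    border = np.zeros(mask.shape, dtype=bool)
    border[:, [0, -1], :] = True
    border[:, :, [0, -1]] = True
    touching = np.unique(labels[border])
    mask &= ~np.isin(labels, touching[touching > 0])
    # Step 4: open with a 1 cm sphere to remove small air pockets.
    mask = ndimage.binary_opening(mask, structure=ball(10.0, spacing))
    # Step 5: the two largest remaining components are taken as the lungs.
    seeds = largest_components(mask, 2)
    # Step 6: reconstruction, keeping components of f_tau that contain a seed.
    labels, _ = ndimage.label(f_tau)
    lung_ids = np.unique(labels[seeds])
    return np.isin(labels, lung_ids[lung_ids > 0])
```

Labeling \(f_{\tau}\) once and selecting the labels hit by the seeds is one simple way to perform the reconstruction in step 6; `ndimage.binary_propagation` with `mask=f_tau` would be an alternative.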

This algorithm is fairly robust on both abdominal and full-body CT exams showing the full width of the exam table. The only major drawback is that the lungs are not separated from the trachea, which is not necessarily a problem. See the Experimental results section for a quantitative evaluation. An example output is shown in Fig. 1. Similar algorithms have previously been used to segment the lungs, although not necessarily for the purpose of training a deep neural network.

### Morphological detection and segmentation of CT bones

Bone segmentation proceeds similarly to lung segmentation, by a combination of thresholding, morphology and selection of the largest connected components. This time we define two intensity thresholds, \(\tau_1\) = 0 and \(\tau_2\) = 200 HU. These were selected so that almost all bone tissue is greater than \(\tau_1\), while the hard exterior of each bone is usually greater than \(\tau_2\). The algorithm is as follows:

1. Threshold the image by \(\tau_2\). This extracts the hard exteriors of the bones, but also inevitably includes some unwanted tissues, such as the aorta, kidneys and intestines, especially in contrast-enhanced CTs.
2. Select only the largest connected component from the thresholded image, which is the skeleton. This removes unwanted hyper-intense tissues which are typically not connected to the skeleton. It does have the drawback of excluding the ribs on abdominal images, in which they are disconnected from the rest of the skeleton. However, this drawback is acceptable for the purposes of generating training data.
3. Close the mask using a spherical structuring element with a diameter of 2.5 cm. This fills gaps in the cortical bone, which may be too thin to be seen in digital CT images.
4. Apply the threshold \(\tau_1\) to remove most of the unwanted tissue between the bones, which might have been included by the closing in the previous step.
5. For each xy-plane (axial slice) in the image, fill any holes not connected to the boundary. This fills in the centers of large bones, such as the pelvis and femurs, assuming the usual patient orientation.
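The five steps above can be sketched as follows. As with any such illustration, this is a hedged reading and not the authors' implementation: the function and helper names are hypothetical, the volume is assumed to be indexed (z, y, x), and `scipy.ndimage` routines stand in for the FFT-accelerated morphology.

```python
import numpy as np
from scipy import ndimage


def ball(diameter_mm, spacing):
    """Ellipsoidal structuring element approximating a sphere of given diameter."""
    radii = [max(int(round(diameter_mm / (2.0 * s))), 1) for s in spacing]
    grid = np.ogrid[tuple(slice(-r, r + 1) for r in radii)]
    return sum((g / r) ** 2 for g, r in zip(grid, radii)) <= 1.0


def largest_component(mask):
    """Keep only the largest connected component of a boolean mask."""
    labels, count = ndimage.label(mask)
    if count == 0:
        return np.zeros(mask.shape, dtype=bool)
    sizes = np.bincount(labels.ravel(), minlength=count + 1)
    sizes[0] = 0  # ignore the background label
    return labels == np.argmax(sizes)


def segment_bones(ct, spacing, tau1=0.0, tau2=200.0):
    """ct: 3-D array in HU, indexed (z, y, x); spacing: voxel size in mm, same order."""
    # Step 1: hard bone exteriors (plus some contrast-enhanced tissue).
    hard = ct > tau2
    # Step 2: the largest connected component is taken as the skeleton.
    skeleton = largest_component(hard)
    # Step 3: close with a 2.5 cm sphere to fill gaps in thin cortical bone.
    # The union with the input undoes SciPy's zero-border erosion at the edge.
    closed = ndimage.binary_closing(skeleton, structure=ball(25.0, spacing)) | skeleton
    # Step 4: drop low-intensity tissue swept in by the closing.
    mask = closed & (ct > tau1)
    # Step 5: fill enclosed holes independently in each axial slice.
    for z in range(mask.shape[0]):
        mask[z] = ndimage.binary_fill_holes(mask[z])
    return mask
```

The 2.5 cm element is large in voxel terms (an 11-voxel-wide ball at 2.5 mm spacing), which is the kind of case where the Fourier-domain formulation is expected to pay off over spatial-domain morphology.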

This simple algorithm is sufficiently accurate to train a deep neural network, and serves as a useful basis for future manual refinement. The accuracy is evaluated quantitatively in the Experimental results section. See Fig. 2 for an example output, which omits some sections of the sacrum and pelvis that will need to be manually corrected in the testing set.

### Manual annotation

Morphological segmentation is sufficiently accurate to create training data, but manual correction is required for the more accurate testing set. Furthermore, segmentation based on thresholding is not suitable for soft tissues, which exhibit poor intensity contrast in CT. For these reasons, we annotated the remaining organs using the ITK-SNAP software11. The annotation process consisted of manual initialization, followed by active contour segmentation with user-adjusted parameters, and finally a manual correction using the 3D paintbrush tool. The liver, kidneys, bladder and brain were manually annotated in both the testing and training sets. The lungs were manually annotated in the test set only. The bones followed the same basic procedure as the lungs, but we saved time on the initialization stage by starting from the output of the morphological algorithm, followed by manual refinement. Refining the bone masks consisted mostly of including small, disconnected bones such as the ribs and scapulae, and removing the spinal cord.

All manual annotations were made or overseen by a graduate student with several years’ experience annotating CT images. Following the initial annotation, the 21 cases in the test set were reviewed by an experienced board-certified radiologist, after which 9 out of 21 cases were refined to a higher degree of accuracy. The radiologist noted that the annotations were of high quality, as subsequent corrections were minor.

In the majority of cases, approximately 130 out of the 140 total, liver annotations were taken from the existing LiTS Challenge dataset2. We added our own liver annotations where the original dataset was missing them, as well as for new images not in LiTS. We included LiTS liver annotations in both the test and training sets, as their methodology for manual annotation was very similar to ours.

### Human subjects

All imaging data was either publicly available, or collected from Stanford Healthcare. This study was approved by the relevant IRB.