Self-supervised deep learning model for COVID-19 lung CT image segmentation highlighting putative causal relationship among age, underlying disease and COVID-19

Background Coronavirus disease 2019 (COVID-19) is highly contagious, and in many countries new cases appear faster than the available polymerase chain reaction (PCR) test kits can cover. Recently, lung computerized tomography (CT) has been used as an auxiliary COVID-19 testing approach. Automatic analysis of lung CT images is needed to increase diagnostic efficiency and reduce the burden on human readers. Deep learning has been successful at automatically solving computer vision problems, so it can be introduced for automatic and rapid COVID-19 CT diagnosis. Many advanced deep learning-based computer vision techniques have been developed to improve model performance but have not yet been introduced to medical image analysis. Methods In this study, we propose a self-supervised two-stage deep learning model that segments COVID-19 lesions (ground-glass opacity and consolidation) from chest CT images to support rapid COVID-19 diagnosis. The proposed model integrates several advanced computer vision techniques, including generative adversarial image inpainting, focal loss, and the lookahead optimizer. Two real-life datasets were used to evaluate the model's performance against previous related works. To explore the clinical and biological mechanisms underlying the predicted lesion segments, we extracted engineered features from the predicted lung lesions and evaluated their mediation effects on the relationship of age with COVID-19 severity, as well as the relationship of underlying diseases with COVID-19 severity, using statistical mediation analysis. Results The best overall F1 score was achieved by the proposed self-supervised two-stage segmentation model (0.63), compared to the two related baseline models (0.55, 0.49). We also identified several CT image phenotypes that mediate the potential causal relationship between underlying diseases and COVID-19 severity, as well as the potential causal relationship between age and COVID-19 severity.
Conclusions This work contributes a promising COVID-19 lung CT image segmentation model and provides predicted lesion segments with potential clinical interpretability. The model can automatically segment the COVID-19 lesions from raw CT images with higher accuracy than related works. The features of these lesions are associated with COVID-19 severity through mediating the known causal factors of COVID-19 severity (age and underlying diseases). Supplementary Information The online version contains supplementary material available at 10.1186/s12967-021-02992-2.

The Med-Seg (medical segmentation) COVID-19 data set, shown in Figure 2D, is referred to as Data set 1.
Data sets 2 and 3 are detailed as follows. Data set 2: This is the original dataset used to evaluate SInfNet [2]. It contains 50 single-labeled and 48 multi-labeled CT lung images for the training set, and 48 single- and multi-labeled images for the testing set; there is no validation set. Data set 3: This dataset contains 750 CT images with available segmentation masks [3], drawn from 150 patients with novel-coronavirus pneumonia. The images were labeled by a panel of five senior radiologists, each with over 25 years of experience, using the labels healthy lung field, GGO, and consolidation. We used the labeled CT images to train a U-Net semantic segmentation model that segments the lung field present in each CT image. Using this model, together with opening and closing morphological transformations for noise reduction, we cropped the CT images so that they include only the lung field. Then, for efficiency, we kept only the middle-most slice of each CT scan and removed all others. This yields a data set with a similar amount of diversity to the original while being significantly smaller. Finally, we manually removed any CT images that did not show the lungs in full view or contained a significant amount of non-lung field.
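The mask clean-up and cropping steps described above can be sketched as follows. This is a minimal illustration assuming a binary lung mask has already been predicted (e.g., by the U-Net); the function name and the structuring-element size are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from scipy import ndimage

def crop_to_lung_field(ct_slice, lung_mask, structure_size=5):
    """Clean a predicted lung mask with morphological opening/closing
    and crop the CT slice to the bounding box of the remaining lung field."""
    structure = np.ones((structure_size, structure_size), dtype=bool)
    # Opening removes small false-positive specks outside the lungs.
    mask = ndimage.binary_opening(lung_mask.astype(bool), structure=structure)
    # Closing fills small holes inside the lung field.
    mask = ndimage.binary_closing(mask, structure=structure)
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]
    c0, c1 = np.where(cols)[0][[0, -1]]
    return ct_slice[r0:r1 + 1, c0:c1 + 1], mask

# Toy example: a 100x100 slice with a 40x40 "lung" region plus speck noise.
slice_ = np.random.rand(100, 100).astype(np.float32)
mask = np.zeros((100, 100), dtype=np.uint8)
mask[30:70, 20:60] = 1
mask[5, 5] = 1  # isolated false positive, removed by the opening
cropped, cleaned = crop_to_lung_field(slice_, mask)
print(cropped.shape)  # (40, 40)
```

The opening/closing pair is the standard order for removing speckle noise first and then filling interior holes, which matches the noise-reduction role described above.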

Ablation studies
We carried out ablation studies to compare the performance of different combinations of the techniques incorporated into the multi SSInfNet, in order to determine which ones contribute to the improved performance. We evaluated four ablations of the proposed Multi SSInfNet: the full Multi SSInfNet, Multi SSInfNet without focal loss, Multi SSInfNet without the lookahead optimizer, and Multi SSInfNet without both focal loss and the lookahead optimizer. All other parameters were kept the same, with focal loss alpha set to 1 and gamma to 2, and the lookahead optimizer k set to 5 and alpha to 0.5.
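For reference, the binary focal loss (Lin et al.) ablated above can be sketched in a few lines; the stated parameters (alpha = 1, gamma = 2) are the defaults below, and the function name is illustrative rather than the paper's actual code:

```python
import numpy as np

def focal_loss(p, y, alpha=1.0, gamma=2.0, eps=1e-7):
    """Binary focal loss: down-weights easy, well-classified pixels so
    that training focuses on hard pixels such as lesion boundaries."""
    p = np.clip(p, eps, 1 - eps)
    pt = np.where(y == 1, p, 1 - p)  # probability assigned to the true class
    return np.mean(-alpha * (1 - pt) ** gamma * np.log(pt))

y = np.array([1, 1, 0, 0])
easy = np.array([0.9, 0.95, 0.1, 0.05])  # confident, correct predictions
hard = np.array([0.6, 0.55, 0.4, 0.45])  # uncertain predictions
print(focal_loss(easy, y) < focal_loss(hard, y))  # True
```

With gamma = 0 the modulating factor (1 - pt)^gamma vanishes and the loss reduces to standard cross-entropy, which is why removing focal loss in the ablation is a meaningful comparison.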

Computation cost analysis
We performed a computation cost analysis to compare the computational efficiency of the different models.

Supplementary Figure 2.
A is the original architecture of SInfNet. B is the architecture of our self-supervised single SSInfNet model. The highlighted purple block marks the difference between the original single SInfNet and the single SSInfNet.

Supplementary Figure 3.
A is the architecture of the original multi SInfNet model. B is the architecture of our self-supervised multi SSInfNet model. The highlighted green block marks the difference between the original multi SInfNet and our self-supervised multi SSInfNet.

Area
The number of pixels in the mask.

Energy
The magnitude of voxel values in an image.
Here, c is an optional value shifting the intensities to prevent negative values.

Total Energy
Energy scaled by the volume of the voxel.

Entropy
The uncertainty/randomness in the image values.

Minimum
The minimum gray level intensity.

10th percentile
The 10th percentile of the gray level intensities.

90th percentile
The 90th percentile of the gray level intensities.

Maximum
The maximum gray level intensity.

Mean
The average gray level intensity.

Median
The median gray level intensity.

Mean Absolute Deviation
The mean distance of all intensity values from the mean value of the image array.

Interquartile Range
The difference between the 75th and 25th percentiles of the gray level intensities.

Robust Mean Absolute Deviation
The mean distance of all intensity values from the mean value, calculated on the subset of intensities between the 10th and 90th percentiles.

Root Mean Squared (RMS)
The square root of the mean of all the squared intensity values.

Skewness
The asymmetry of the distribution of values about the mean value.
Kurtosis
A higher value means that the mass of the distribution is concentrated towards the tail(s) rather than towards the mean. A lower value means that the mass of the distribution is concentrated near the mean value.

Variance
The mean of the squared distances of each intensity value from the mean value.

Uniformity
A higher value means a smaller range of discrete intensity values.
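A few of the first-order features above can be computed directly from the masked intensities. The sketch below is an illustrative assumption (including its 16-bin discretization, a common radiomics convention), not the paper's actual feature-extraction code:

```python
import numpy as np

def first_order_features(img, mask, bins=16):
    """Compute a handful of first-order radiomics-style features over a mask."""
    x = img[mask > 0].astype(np.float64)
    hist, _ = np.histogram(x, bins=bins)   # discretize intensities
    p = hist / hist.sum()
    p = p[p > 0]                           # drop empty bins before log
    mean = x.mean()
    return {
        "area": int((mask > 0).sum()),          # number of pixels in the mask
        "energy": float((x ** 2).sum()),        # magnitude of voxel values
        "entropy": float(-(p * np.log2(p)).sum()),
        "uniformity": float((p ** 2).sum()),    # higher = narrower intensity range
        "skewness": float(((x - mean) ** 3).mean() / (x.std() ** 3)),
    }

rng = np.random.default_rng(0)
img = rng.normal(size=(64, 64))
mask = np.zeros((64, 64), dtype=np.uint8)
mask[16:48, 16:48] = 1                     # a 32x32 lesion region
feats = first_order_features(img, mask)
print(feats["area"])  # 1024
```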
Gray Level Co-occurrence Matrix (GLCM) Features (28)

Autocorrelation
The magnitude of the fineness and coarseness of texture.

Joint Average
Returns the mean gray level intensity of the distribution.
Cluster Prominence
The skewness and asymmetry of the GLCM.

Cluster Shade
The skewness and uniformity of the GLCM.
Cluster Tendency
The grouping of voxels with similar gray-level values.

Contrast
The local intensity variation. A larger value is associated with a greater disparity among neighboring voxels.
Correlation
The linear dependency of gray level values to their respective voxels in the GLCM.

Difference Average
The difference between occurrences of pairs with similar intensity values and occurrences of pairs with differing intensity values.

Difference Entropy
The randomness/variability in neighborhood intensity value differences.

Difference Variance
The heterogeneity that places higher weights on differing intensity level pairs that deviate more from the mean.

Joint Energy
The homogeneous patterns in the image. A greater Energy implies that there are more instances of intensity value pairs in the image.

Joint Entropy
The randomness/variability in neighborhood intensity values.
Informational Measure of Correlation (IMC) 1
The correlation between the probability distributions of i and j (quantifying the complexity of the texture), using mutual information I(x, y).

Informational Measure of Correlation (IMC) 2
Also the correlation between the probability distributions of i and j, quantifying the complexity of the texture.

Inverse Difference Moment (IDM)
The local homogeneity of an image.

Inverse Difference Moment Normalized (IDMN)
The local homogeneity of an image.

Inverse Difference (ID)
The local homogeneity of an image. With more uniform gray levels, the denominator will remain low, resulting in a higher overall value.

Inverse Difference Normalized (IDN)
The local homogeneity of an image. IDN normalizes the difference between the neighboring intensity values by dividing over the total number of discrete intensity values.

Inverse Variance

Maximum Probability
The most predominant pair of neighboring intensity values, max(p(i, j)).

Sum Average
The relationship between occurrences of pairs with lower intensity values and occurrences of pairs with higher intensity values.
Sum Entropy
A sum of neighborhood intensity value differences.

Sum of Squares (Variance)
The distribution of neighboring intensity level pairs about the mean intensity level in the GLCM.
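To make the GLCM-based features concrete, here is a minimal pure-NumPy sketch of a symmetric, normalized GLCM and its Contrast feature (local intensity variation, as defined above). The single offset and the level count are illustrative assumptions:

```python
import numpy as np

def glcm(img, levels=4, offset=(0, 1)):
    """Symmetric, normalized gray level co-occurrence matrix for one offset."""
    dr, dc = offset
    P = np.zeros((levels, levels))
    rows, cols = img.shape
    for r in range(rows - dr):
        for c in range(cols - dc):
            i, j = img[r, c], img[r + dr, c + dc]
            P[i, j] += 1
            P[j, i] += 1  # count both directions for a symmetric matrix
    return P / P.sum()

def glcm_contrast(P):
    """Contrast: sum of p(i, j) * (i - j)^2, i.e. local intensity variation."""
    i, j = np.indices(P.shape)
    return float((P * (i - j) ** 2).sum())

flat = np.zeros((8, 8), dtype=int)       # perfectly uniform texture
noisy = np.arange(64).reshape(8, 8) % 4  # rapidly alternating gray levels
print(glcm_contrast(glcm(flat)))   # 0.0
```

A uniform image yields zero contrast, while the alternating-level image yields a positive value, matching the description that larger Contrast reflects greater disparity among neighboring voxels.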
Gray Level Dependence Matrix (GLDM) Features

Small Dependence Emphasis
The distribution of small dependencies. A larger value indicates less homogeneous textures.

Large Dependence Emphasis
The distribution of large dependencies. A larger value means more homogeneous textures.

Gray Level Non-Uniformity
The similarity of gray-level intensity values in the image.

Dependence Non-Uniformity
The similarity of dependence throughout the image.

Gray Level Variance
The variance in gray level in the image.

Dependence Variance
The variance in dependence size in the image.

Low Gray Level (LGL) Emphasis
The distribution of low gray-level values, with a higher value indicating a greater concentration of low gray-level values in the image.

High Gray Level (HGL) Emphasis
The distribution of the higher gray-level values, with a higher value indicating a greater concentration of high gray-level values in the image.
Small Dependence Low Gray Level (SDLGL) Emphasis
The joint distribution of small dependence with lower gray-level values.

Small Dependence High Gray Level (SDHGL) Emphasis
The joint distribution of small dependence with higher gray-level values.

Large Dependence Low Gray Level (LDLGL) Emphasis
The joint distribution of large dependence with lower gray-level values.
Large Dependence High Gray Level (LDHGL) Emphasis
The joint distribution of large dependence with higher gray-level values.

Neighbouring Gray Tone Difference Matrix (NGTDM) Features

Coarseness
An indicator of the spatial rate of change. A higher value indicates a lower spatial change rate and a locally more uniform texture.

Contrast
The spatial intensity change depending on the overall gray level dynamic range.

Busyness
The change from a pixel to its neighbor. A high value indicates rapid change.

Complexity
The non-uniformity and busyness of the image.

Strength
A greater value means slow change in intensity but larger coarse differences in gray level intensities.