Deep learning model for classifying endometrial lesions

Background Hysteroscopy is a commonly used technique for diagnosing endometrial lesions. It is essential to develop an objective model to aid clinicians in lesion diagnosis, as each type of lesion has a distinct treatment, and judgments of hysteroscopists are relatively subjective. This study constructs a convolutional neural network model that can automatically classify endometrial lesions using hysteroscopic images as input. Methods All histopathologically confirmed endometrial lesion images were obtained from the Shengjing Hospital of China Medical University, including endometrial hyperplasia without atypia, atypical hyperplasia, endometrial cancer, endometrial polyps, and submucous myomas. The study included 1851 images from 454 patients. After the images were preprocessed (histogram equalization, addition of noise, rotations, and flips), a training set of 6478 images was input into a tuned VGGNet-16 model; 250 images were used as the test set to evaluate the model’s performance. Thereafter, we compared the model’s results with the diagnosis of gynecologists. Results The overall accuracy of the VGGNet-16 model in classifying endometrial lesions is 80.8%. Its sensitivity to endometrial hyperplasia without atypia, atypical hyperplasia, endometrial cancer, endometrial polyp, and submucous myoma is 84.0%, 68.0%, 78.0%, 94.0%, and 80.0%, respectively; for these diagnoses, the model’s specificity is 92.5%, 95.5%, 96.5%, 95.0%, and 96.5%, respectively. When classifying lesions as benign or as premalignant/malignant, the VGGNet-16 model’s accuracy, sensitivity, and specificity are 90.8%, 83.0%, and 96.0%, respectively. The diagnostic performance of the VGGNet-16 model is slightly better than that of the three gynecologists in both classification tasks. With the aid of the model, the overall accuracy of the diagnosis of endometrial lesions by gynecologists can be improved. 
Conclusions The VGGNet-16 model performs well in classifying endometrial lesions from hysteroscopic images and can provide objective diagnostic evidence for hysteroscopists.


Background
At the clinic, patients are often diagnosed with suspected endometrial lesions due to symptoms such as abnormal uterine bleeding or infertility [1,2]. Transvaginal ultrasound and diagnostic hysteroscopy are common gynecological examinations used to diagnose endometrial lesions conclusively [3][4][5]. Transvaginal ultrasound is usually the first choice, but it has low diagnostic specificity and does not enable physicians to obtain pathological tissue specimens; in some cases, further hysteroscopy is required [3][4][5]. Diagnostic hysteroscopy is a minimally invasive examination through which the hysteroscopist can directly observe the endometrial lesions and normal endometrium in the patient's uterine cavity, so that the gynecologist can make a more accurate primary diagnosis [6]. These endometrial lesions include endometrial polyps, submucous myomas, intrauterine adhesions, endometrial hyperplasia, malignancies, intrauterine foreign bodies, placental remnants, and endometritis [6]. An accurate primary diagnosis helps gynecologists to explain the condition to patients and decide on a primary treatment. However, the diagnostic performance of hysteroscopy for these lesions depends on the experience of the hysteroscopist, resulting in a degree of subjectivity in the gynecologist's diagnosis [7]. A stable and objective computer-aided diagnosis (CAD) system could shorten the learning curve of inexperienced gynecologists and effectively reduce the subjectivity (interobserver error) of gynecologist diagnosis.
Deep learning is a discipline that has recently played a prominent role in fields such as computer vision, speech recognition, and natural language processing [8]. Many practices in the medical field have also benefited from the use of deep learning, including identifying potential depression patients in social networks and locating the cecum in surgical videos [9,10]. Convolutional neural networks (CNNs) are a class of deep learning algorithms that excel in image classification tasks, especially for classifying or detecting objects that can be directly observed [11]. It has been reported that CNNs can diagnose skin cancer at a level no less than that of experts [12]. The ability of CNNs to classify laryngoscopic images in most cases exceeds that of physicians [13]. There have been many other reports of endoscopic CAD systems based on deep learning, and excellent results have been achieved in cystoscopy, gastroscopy, enteroscopy, and colposcopy [14][15][16][17]. Deep learning has previously been applied in the field of hysteroscopy: Török reported the use of fully convolutional neural networks (FCNNs) to segment uterine myomas and normal uterine myometrium [18], and Burai used FCNNs to identify the uterine wall [19]. However, no CNN-based CAD system for hysteroscopy has yet been reported.
This study considers the five most common endometrial lesions: endometrial hyperplasia without atypia (EH), including simple and complex hyperplasia; atypical hyperplasia (AH); endometrial cancer (EC); endometrial polyps (EPs); and submucous myomas (SMs) [20]. This study aimed to construct a CNN-based CAD system that can classify endometrial lesion images obtained from hysteroscopy and to evaluate the diagnostic performance of this model. The results show that the CAD system slightly outperforms gynecologists in classifying endometrial lesion images. It provides evidence of the feasibility of using artificial intelligence to assist in clinical diagnosis of endometrial lesions.

Dataset
This study retrospectively collected images of patients who underwent hysteroscopic examination at the Shengjing Hospital of China Medical University from 2017 to 2019, which confirmed the presence of endometrial lesions. All images were taken using an Olympus OTV-S7 (Olympus, Tokyo, Japan) endoscopic camera system with a resolution of 720 × 576 pixels and were stored in JPEG format. Images meeting the following criteria were excluded: (a) poor quality or unclear images; (b) images with no lesions in the field of view; (c) images with a large amount of bleeding in the field of view; (d) images from patients with an intrauterine device or who were receiving hormone therapy; (e) images from patients with multiple uterine diseases; and (f) images from patients without histopathological results. The resulting dataset included 1851 images from 454 patients, including 509 EH, 222 AH, 280 EC, 615 EP, and 225 SM images. We randomly extracted 250 images (50 images for each category) from the dataset as the testing and validation set, and the remaining images were used as the original training set for data augmentation and model training. Table 1 shows the detailed dataset partition used in this study. Subsequently, the test set was randomly divided into two parts (125 images per part) to explore the role of the model in assisting gynecologists to diagnose endometrial lesions. This study was approved by the Ethics Committee of Shengjing Hospital (No. 2017PS292K).

Data preprocessing
All images were manually cropped by gynecologists to remove excessive non-lesion regions and retain the region of interest, thus preventing irrelevant features from disturbing the performance of the deep learning model. To improve the generalizability and robustness of the deep learning model, we performed data augmentation on the training set, including color histogram equalization, random addition of salt-and-pepper noise, 90° and 270° rotations, and vertical and horizontal flips (Fig. 1). The final training set was augmented from 1601 to 6478 images. The test set was not processed. Finally, all images were resized to 224 × 224 pixels and rescaled for training, validating, and testing.
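The augmentation operations listed above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' pipeline: it assumes a single-channel 8-bit image stored as a nested list, whereas the study used color images, and the function names, the `noise_amount` value, and the seed are hypothetical. The text does not specify how the transformations were combined to reach 6478 images, so the sketch simply returns one variant per operation.

```python
import random


def hflip(img):
    """Horizontal flip: reverse the pixels within each row."""
    return [row[::-1] for row in img]


def vflip(img):
    """Vertical flip: reverse the order of the rows."""
    return [row[:] for row in img[::-1]]


def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def rot270(img):
    """Rotate the image 270 degrees clockwise (90 degrees counter-clockwise)."""
    return [list(col) for col in zip(*img)][::-1]


def salt_and_pepper(img, amount, rng):
    """Replace roughly a fraction `amount` of pixels with 0 (pepper) or 255 (salt)."""
    out = [row[:] for row in img]
    for r in range(len(out)):
        for c in range(len(out[r])):
            if rng.random() < amount:
                out[r][c] = rng.choice((0, 255))
    return out


def equalize(img, levels=256):
    """Histogram equalization of one 8-bit channel via the cumulative histogram."""
    flat = [p for row in img for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:  # constant image: nothing to equalize
        return [row[:] for row in img]
    return [[round((cdf[p] - cdf_min) * (levels - 1) / (n - cdf_min)) for p in row]
            for row in img]


def augment(img, noise_amount=0.02, seed=0):
    """Return one augmented variant per operation (values here are assumptions)."""
    rng = random.Random(seed)
    return {
        "equalized": equalize(img),
        "noisy": salt_and_pepper(img, noise_amount, rng),
        "rot90": rot90(img),
        "rot270": rot270(img),
        "vflip": vflip(img),
        "hflip": hflip(img),
    }
```

For a color image, the same functions could be applied per channel, although whether equalization was done per channel or on luminance is not stated in the text.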

Convolutional neural network and transfer learning
We selected VGGNet [21] as the main structure of our deep learning model and tuned it to implement transfer learning [22]. VGGNet was developed by the Oxford Visual Geometry Group and won second place in the image classification task of the 2014 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [23]. It has a top-5 accuracy of 92.3% in classifying 1000 object categories. Compared to AlexNet [24], the winner of ILSVRC 2012, VGGNet uses a smaller convolution kernel and deepens the network to achieve better results [23]. VGGNet-16 and VGGNet-19 are commonly used versions of VGGNet. There is no significant difference in the effect of the two in application, but VGGNet-16 has fewer layers and parameters than VGGNet-19 [21]. This provides VGGNet-16 with shorter processing time and lower storage space usage than VGGNet-19, so we selected VGGNet-16 as our model network.
We employed the VGGNet-16 CNN, pretrained on ImageNet, and adjusted its 4096 neurons in the fully connected layer to 512 neurons and its 1000-category output layer to 5 categories. We added a batch normalization layer after each convolutional layer to improve the training speed of the model [25]. Important training parameters were set as follows: (a) the input shape was 224 × 224 pixels × 3 channels; (b) the batch size was 64; (c) the number of training epochs was 200; and (d) the optimizer used was stochastic gradient descent (SGD) with a learning rate of 0.00001 and a momentum of 0.9. The structure of our CNN is shown in Fig. 2, and a summary of the model is shown in Additional file 1: Table S1. The CNN for this research was built using the open source Keras neural network library [26]. Our fine-tuned VGGNet-16 CNN was used for transfer learning and the endometrial lesion classification task.
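To make the optimizer settings concrete, the following sketch shows what a single SGD-with-momentum update looks like with the stated learning rate and momentum, and how many weight updates the stated batch size and epoch count imply. It is a plain-Python illustration under assumptions: it uses the classical (non-Nesterov) momentum rule, the function names are hypothetical, and it assumes the final partial batch of each epoch is kept.

```python
import math


def sgd_momentum_step(weights, grads, velocity, lr=1e-5, momentum=0.9):
    """One classical momentum update: v <- momentum * v - lr * g, then w <- w + v."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


def steps_per_epoch(n_images, batch_size):
    """Number of mini-batches per epoch, assuming the last partial batch is kept."""
    return math.ceil(n_images / batch_size)


# With 6478 training images and a batch size of 64, each epoch has 102 updates,
# so 200 epochs correspond to 20,400 updates under the assumption above.
updates_per_epoch = steps_per_epoch(6478, 64)
total_updates = updates_per_epoch * 200
```

With a learning rate this small, each individual update changes the pretrained weights only slightly, which is a common choice when fine-tuning a network initialized from ImageNet weights.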

Performance evaluation metrics
To evaluate the diagnostic performance of the CNN model, one chief physician with more than 20 years of experience and two attending physicians with more than 10 years of experience in hysteroscopic examination and surgery diagnosed lesions using all images in the test set without knowing the histopathological results. These diagnoses were compared with the diagnostic results of the CNN model.
To explore the auxiliary role of the model in the diagnosis of endometrial lesions by gynecologists, three other licensed gynecologists performed direct diagnosis and model-aided diagnosis on two randomly divided test sets without knowing the histopathological results.
We present the results in two ways: five-category and two-category classification. In the first task, each lesion was classified as EH, AH, EC, EP, or SM. In the second task, lesions were categorized as premalignant/malignant (AH and EC) or benign (EH, EP, and SM).
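The relationship between the two tasks can be expressed as a simple label mapping. The sketch below is illustrative only; the dictionary and function names are hypothetical, and the text does not state whether the two-category result was obtained by mapping predicted labels in this way or by another procedure.

```python
from collections import Counter

# Grouping of the five lesion types into the two categories described in the text.
FIVE_TO_TWO = {
    "EH": "benign",
    "EP": "benign",
    "SM": "benign",
    "AH": "premalignant/malignant",
    "EC": "premalignant/malignant",
}


def to_two_category(labels):
    """Map five-category labels (EH, AH, EC, EP, SM) to the two-category task."""
    return [FIVE_TO_TWO[label] for label in labels]


# The test set holds 50 images per lesion type, so the two-category task
# has 150 benign and 100 premalignant/malignant images.
test_labels = [name for name in ("EH", "AH", "EC", "EP", "SM") for _ in range(50)]
two_category_counts = Counter(to_two_category(test_labels))
```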
The diagnostic performance of the model and that of the gynecologists is first presented using a confusion matrix, which records the samples in the test set according to their true and predicted categories; the matrix itself is not a direct evaluation metric. The evaluation metrics used in this study were derived from the confusion matrix, which gives the numbers of true positive (TP), false positive (FP), false negative (FN), and true negative (TN) classifications. The secondary evaluation metrics calculated from these primary counts are as follows:
Accuracy = (TP + TN)/(TP + FP + FN + TN);
Sensitivity (true positive rate, TPR) = TP/(TP + FN);
Specificity = TN/(TN + FP);
Precision (P) = TP/(TP + FP);
F1-Score = 2 × P × TPR/(P + TPR);
Area under the curve (AUC): the area under the receiver operating characteristic (ROC) curve.
All calculation and visualization operations were implemented in Python Version 3.7.0.

Results
During training, the model's accuracy changed with increase in epochs, as shown in Fig. 3. After 90 epochs, the validation accuracy plateaued.

Five-category classification task
For the five-category classification task, the VGGNet-16 model achieved an overall accuracy of 80.8%, with sensitivities of 84.0%, 68.0%, 78.0%, 94.0%, and 80.0% and specificities of 92.5%, 95.5%, 96.5%, 95.0%, and 96.5% for EH, AH, EC, EP, and SM, respectively.
To directly observe the clustering of the five types of lesions, we applied the t-distributed stochastic neighbor embedding (t-SNE) [27] dimension reduction algorithm. The 512-dimensional output of the last fully connected layer for all images in the test set was reduced to two dimensions and is displayed in Fig. 6. We can see from this figure that most of the images are mapped to their own distinct areas, but there is an area of overlap between EH, AH, and EC.
To deepen our understanding of the CNN's calculation process, we output the sum feature maps of an SM image in the test set at each convolutional layer, batch normalization layer, and MaxPool layer of the VGGNet-16 model and superimposed them on the original image after upsampling these sum feature maps. The superimposed heatmaps are shown in Fig. 7. Some examples of the model's classification are shown in Fig. 8.
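The heatmap procedure described above (summing a layer's feature maps, upsampling, and superimposing on the input) can be sketched as follows. This is a minimal plain-Python illustration rather than the authors' implementation: it assumes nested-list arrays and a single-channel image, and the min-max normalization, nearest-neighbor upsampling, blending weight `alpha`, and all function names are assumptions, since the text does not specify them.

```python
def sum_feature_maps(fmaps):
    """Sum a layer's activations over channels; fmaps is a list of H x W maps."""
    h, w = len(fmaps[0]), len(fmaps[0][0])
    return [[sum(ch[r][c] for ch in fmaps) for c in range(w)] for r in range(h)]


def normalize(m):
    """Min-max scale a map to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def upsample_nearest(m, out_h, out_w):
    """Nearest-neighbor upsampling of a map to the input image size."""
    h, w = len(m), len(m[0])
    return [[m[r * h // out_h][c * w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def overlay(image, heat, alpha=0.5):
    """Blend a [0, 1] heatmap (scaled to 0-255) onto a grayscale image."""
    return [[(1 - alpha) * p + alpha * 255 * hv for p, hv in zip(irow, hrow)]
            for irow, hrow in zip(image, heat)]


def layer_heatmap(image, fmaps, alpha=0.5):
    """Sum, normalize, upsample, and superimpose one layer's feature maps."""
    heat = normalize(sum_feature_maps(fmaps))
    heat = upsample_nearest(heat, len(image), len(image[0]))
    return overlay(image, heat, alpha)
```

In practice the feature maps would be read from the intermediate layers of the trained network and the overlay rendered with a color map; both steps are omitted here.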

Two-category classification task
When classifying premalignant/malignant and benign lesions, the accuracy, sensitivity, specificity, precision, F1-score, and AUC of the VGGNet-16 model were 90.8%, 83.0%, 96.0%, 93.3%, 87.8%, and 0.944, respectively. The accuracy of the three gynecologists was 86.8%, 82.4%, and 84.8%, and their AUCs were 0.863, 0.813, and 0.842, respectively. In this task, both the model and the gynecologists improved their performance significantly compared with the five-category classification task. Detailed two-category diagnostic performance evaluation metrics are shown in Table 3. The two-category ROC curve of the model and gynecologists is shown in Fig. 9.
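As a worked check, the reported two-category metrics can be recomputed from confusion-matrix counts. The counts below are not stated in the text; they are inferred from the reported sensitivity and specificity and from the test-set composition (100 premalignant/malignant and 150 benign images), so they should be read as an assumption. The function name is hypothetical.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity, specificity, precision, and F1 from TP/FP/FN/TN."""
    sensitivity = tp / (tp + fn)          # true positive rate (TPR)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)            # P
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Inferred counts (assumption): 83.0% sensitivity on 100 positive images gives
# TP = 83 and FN = 17; 96.0% specificity on 150 negative images gives
# TN = 144 and FP = 6.
model = binary_metrics(tp=83, fp=6, fn=17, tn=144)
as_percent = {name: round(100 * value, 1) for name, value in model.items()}
```

Under these inferred counts, the recomputed accuracy, precision, and F1-score agree with the reported 90.8%, 93.3%, and 87.8%, which suggests the reported figures are internally consistent.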

Comparison between model-aided diagnosis and direct diagnosis by gynecologists
After we split the test set equally at random, the test sets Part I and Part II were used for direct diagnosis and model-aided diagnosis by gynecologists. The accuracies of direct diagnosis of test set Part I by the three  Figure S1. The confusion matrices of the direct diagnoses and model-aided diagnoses by gynecologists are shown in Additional file 3: Figure S2.

Discussion
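This study explored the classification ability of the VGGNet-16 model for hysteroscopic images of endometrial lesions. The model achieved accuracies of 80.8% and 90.8% in our five-category and two-category classification tasks, respectively.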
A benefit of the CNN model is that its output provides the probability that a given hysteroscopy image belongs to each category. Even if the model makes a misclassification, the output contains a specific probability of the correct label. In contrast, it is difficult for hysteroscopists to give specific probabilities for their diagnoses. In most cases, gynecologists can only give two judgments: yes or no. The ability to harness probabilities is an important reason why the CNN model has a significantly higher AUC for each lesion type than the gynecologists. In this study, it has been confirmed that the model output probabilities can provide a convincing diagnostic reference for gynecologists and effectively reduce the subjectivity of gynecologists' diagnoses. Although the CNN model is difficult to interpret [28], visualizing its calculations and outputs helps us to understand its working process.
Fig. 6 Dimension-reduced scatter plot of the last fully connected layer of the VGGNet-16 model. We output the 512-dimensional data of all images in the test set at the last fully connected layer of the optimal model and applied the t-SNE algorithm to reduce the data to two dimensions and show them in a scatter plot, along with some example images. AH, atypical hyperplasia; EC, endometrial cancer; EH, endometrial hyperplasia without atypia; EP, endometrial polyps; SM, submucous myoma
In the absence of dynamic vision, diagnosis based only on static local hysteroscopy images led to lower sensitivity and specificity of the gynecologists' diagnoses in this study as compared to results reported in a meta-analysis [29]. Given the similar appearance of EH, AH, and EC endometrial lesions, it is relatively difficult for both the model and the gynecologists to distinguish between them. In actual clinical practice, hysteroscopists achieve better diagnostic performance through retrospective case data and dynamic vision. Gynecologists will give full consideration to the specific conditions of patients before performing hysteroscopy. For these difficult-to-distinguish endometrial lesions, gynecologists will actively advise patients to have pathological tissue specimens taken during hysteroscopy and submitted for examination to confirm the diagnosis and avoid over- or undertreatment. At this stage, the VGGNet-16 model in our study can only be used as an auxiliary diagnostic tool for gynecologists. Gynecologists can refer to the probability provided by the model and combine it with other clinical data to obtain a more accurate preliminary clinical diagnosis before the histopathological results are clear. In future research, we aim to implement a multimodal deep learning model that similarly combines case data and hysteroscopic images [30].
Machine learning and deep learning, important branches of artificial intelligence, have also made outstanding contributions in the medical field, such as in clinical prediction models and radiomics [31,32]. Regardless of the research direction, these artificial intelligence technologies have considerable clinical application value. We believe that each technology plays a different role in diagnosis, treatment, and the prediction of clinical outcomes. The integration of an artificial intelligence system into each medical subdiscipline, conforming to the clinical diagnosis and treatment process, is the ultimate goal.
The results of this study have demonstrated the feasibility of applying deep learning techniques to the diagnosis of endometrial lesions. Although there is a gap between the diagnostic performance of the model and the histopathological results in this study, under the experimental conditions of this study, the CNN model's ability to classify hysteroscopic images slightly exceeded that of the gynecologists and can provide gynecologists with objective references.
There are some limitations to our research. First, this study included only the five most common endometrial lesions, and lesions with low incidence were not included. Moreover, all images were collected from the same endoscopic camera system of the same hospital, thus the images may lack diversity. Finally, no prospective validation was performed in this study. We speculate that by expanding the dataset samples, the retrained model should achieve better diagnostic performance and generalization capability. Our group will collect more data at multiple centers to retrain the model and implement prospective validation. The model that obtains better diagnostic performance will be considered for application to clinical practice.

Conclusions
In this study, we developed the first CNN-based CAD system for diagnostic hysteroscopy image classification. The VGGNet-16 model used in our study shows comparable diagnostic performance to expert gynecologists in classifying five types of endometrial lesion images. The model can provide objective diagnostic evidence for hysteroscopists and has potential clinical application value.