Radiomics-based machine learning analysis and characterization of breast lesions with multiparametric diffusion-weighted MR

Background This study aimed to evaluate the utility of radiomics-based machine learning analysis with multiparametric DWI and to compare the diagnostic performance of radiomics features and mean diffusion metrics in the characterization of breast lesions. Methods This retrospective study included 542 lesions from February 2018 to November 2018. One hundred radiomics features were computed from mono-exponential (ME), biexponential (BE), stretched exponential (SE), and diffusion-kurtosis imaging (DKI). Radiomics-based analysis was performed by comparing four classifiers, including random forest (RF), principal component analysis (PCA), L1 regularization (L1R), and support vector machine (SVM). These four classifiers were trained on a training set with 271 patients via ten-fold cross-validation and tested on an independent testing set with 271 patients. The diagnostic performance of the mean diffusion metrics of ME (mADCall b, mADC0–1000), BE (mD, mD*, mf), SE (mDDC, mα), and DKI (mK, mD) were also calculated for comparison. The area under the receiver operating characteristic curve (AUC) was used to compare the diagnostic performance. Results RF attained higher AUCs than L1R, PCA and SVM. The AUCs of radiomics features for the differential diagnosis of breast lesions ranged from 0.80 (BE_D*) to 0.85 (BE_D). The AUCs of the mean diffusion metrics ranged from 0.54 (BE_mf) to 0.79 (ME_mADC0–1000). There were significant differences in the AUCs between the mean values of all diffusion metrics and radiomics features of AUCs (all P < 0.001) for the differentiation of benign and malignant breast lesions. Of the radiomics features computed, the most important sequence was BE_D (AUC: 0.85), and the most important feature was FO-10 percentile (Feature Importance: 0.04). Conclusions The radiomics-based analysis of multiparametric DWI by RF enables better differentiation of benign and malignant breast lesions than the mean diffusion metrics. Supplementary Information The online version contains supplementary material available at 10.1186/s12967-021-03117-5.

Page 2 of 10 Sun et al. J Transl Med (2021) 19:443 Diffusion-weighted imaging (DWI), a noninvasive contrast agent-free method, has been established for breast MR imaging and could improve the diagnostic specificity of lesions suspicious for breast cancer [4,5]. Conventional DWI (mono-exponential model) with 2 to 3 b-values for the measurement of the apparent diffusion coefficient (ADC) is the most commonly used diffusion fitting model for the characterization of breast lesions [6]. Furthermore, several studies have suggested that biexponential (BE), stretched-exponential (SE), or diffusion kurtosis imaging (DKI), fitting with multi-b-value sequences, could provide more accurate information about water diffusion [7][8][9][10]. Le Bihan et al. [11,12] proposed the intravoxel incoherent motion (IVIM) model, a kind of BE fitting, to separately calculate fast and slow diffusion components. The SE model was introduced by Bennett et al. [13] to depict the heterogeneity of intravoxel diffusion rates and the distributed diffusion effect. The DKI model was proposed by Jensen et al. [14] to reflect the complexity of the microenvironment. Since DWI with different fitting models may demonstrate different aspects of tissue properties [7,15,16], informative radiomics features could be derived from these models to better characterize breast lesions.
Radiomics-based analysis profiles lesions with extensive morphological and textural features for latter classification models to attain better differential diagnosis, prognosis prediction, and tumor subtype diagnosis, etc. [17][18][19][20]. Previous radiomics studies [21][22][23] of breast lesions focused more on the modalities of T2WI and DCE. Fewer studies [17,24,25] have considered the value of multiparametric DWI. However, to our knowledge, no study comparing the radiomics features of these different diffusion imaging approaches in the differentiation of breast lesions has been conducted. Further study of multiparametric DWI using more effective machine learning methods is needed to better understand their predictive value in breast cancer diagnosis.
We hypothesized that the diagnostic accuracy of breast lesions using multi-b-value sequences combined with ME, BE, SE and DKI can be improved by radiomicsbased analysis. The purpose of this study is to compare radiomics features and the mean values of diffusion metrics in the assessment of breast lesions with four machine learning methods, i.e., random forest (RF), L1 regularization combined with linear regression (L1R-LR), principal component analysis combined with linear regression (PCA-LR), and support vector machine (SVM).

Study design and patient selection
This retrospective study was performed with a prospectively acquired data set with institutional and governmental review board approval. The local Institutional Review Board (IRB) approved this study. Written informed consent was obtained from each participant. From February 2018 to November 2018, 622 women with lesions suspicious for breast cancer on mammography or ultrasonography (i.e., BI-RADS category 4 or 5) underwent MRI examinations with multi-b DWI. The exclusion criteria included the following: patients previously treated for a malignancy (N = 13), patients without histopathological results (N = 27), and patients with motion artifacts (N = 5), lesions that were not seen in DWI mappings (N = 25), and the mean value of the goodness-of-fit of the diffusion fitting model was less than 0.8 (N = 10). Ultimately, a total of 542 women (mean age, 51 years; age range, 24-84 years), with 542 lesions were enrolled in this study.

MR imaging
All breast MRI examinations were performed on a 1.5 T MR scanner (MAGNETOM Aera, Siemens Healthcare, Erlangen, Germany) with a dedicated 18-channel phasedarray breast coil. The breast MR examinations included fat-suppressed T2-weighted fast spin-echo imaging, T1-weighted imaging (T1WI), DWI, and DCE T1WI. All MR imaging examinations were performed before biopsy. The parameters of the above sequences are shown in Additional file 1: Appendix S1.

Image postprocessing and lesion segmentation
After data acquisition, all images were transferred to N4ITK for the data normalization. Then, all these data were assessed by KS and WC (with 8 years and 12 years of experience in breast imaging) to identify all lesions by using the DWI source images with b values of 1000 s/ mm 2 , T2-weighted images, and the first phase of postcontrast T1-weighted images. Clinical information and the X-ray and US images were provided to the radiologists. The lesions were manually segmented in the DW images (b 1000 ) on all visible sections, resulting in a threedimensional image of the lesion. Lesions were segmented by using the inner border of the lesion to minimize partial volume effects. All volumes of interest (VOIs) were manually segmented and labeled via a free open-source software package (ITK-SNAP, version 3.4.0, http:// www. itksn ap. org). An overview of our workflow is illustrated in Fig. 1.

Diffusion data analysis and processing
All diffusion parameter maps were generated using an in-house MATLAB software (MathWorks, Natick, MA, USA). The software first applied a Gaussian filter with a full width at a half maximum of 3 mm to suppress noise  19:443 in the diffusion images before the pixel-by-pixel fitting process. Four diffusion models are described as follows: 1. ME model ADC maps were generated according to the following equation: where S b represents signal intensity in the presence of diffusion sensitization, and S 0 represents signal intensity in the absence of diffusion sensitization. The ADC _all-b maps were generated by using all 13 b values. The ADC 0-1000 maps were generated by using b values of 0 and 1000.

BE_IVIM model
The IVIM parameters were fitted using the following IVIM model (proposed by Le Bihan et al. [11,12]: where D is the true diffusion as reflected by the pure molecular diffusion, f is the fractional perfusion related to microcirculation, and D * is the pseudodiffusion coefficient that represents perfusion-related diffusion or incoherent microcirculation.

SE model
The SE model was used to obtain the molecular water diffusion heterogeneity index (α) and the distributed diffusion coefficient (DDC) through the following equation: where α is related to the intravoxel molecular water diffusion heterogeneity, which ranges from 0 to 1. A numerically high α value represents low intravoxel diffusion heterogeneity (approaching mono-exponential decay). DDC represents the mean intravoxel diffusion rate. 4. DKI model Calculation of DKI parameters was performed by fitting the following nonlinear equation: where K is a unitless parameter that quantifies the deviation of water motion from the Gaussian distribution. K is zero for a perfect Gaussian diffusion, and a large K indicates considerable deviation of diffusion from a perfect Gaussian behaviour. D is a corrected ADC by removing non-Gaussian bias.  In total, 900 features were extracted. The interclass correlation coefficients (ICCs) were used to determine the interobserver reproducibility of the radiomics features [27].
The mean diffusion metrics of ME (mADC all-b , mADC 0-1000 ), BE (mD, mD*, mf ), SE (mDDC, mα), and DKI (mK, mD) were extracted from the radiomics set for separate analysis. Feature importance (FI) was calculated by using random forest. Feature importance was determined as the mean decrease in the impurity of the random forest as previously described [28].

RF, L1R, PCA, and SVM
The 542 subjects were randomly and equally divided into a training set containing 271 subjects and an independent testing set containing the remaining 271 subjects. The ratios of malignant and benign subjects in the training set and the testing set were equal to the ratio in the whole dataset. The RF, SVM, PCA-LR, and L1R-LR algorithms were all based on the most widely used machine learning Python package, i.e., Scikit-learn [29]. For RF, the parameters were set as the default values, the number of trees was 100, and the maximum depth of the tree was 3. For L1 regularization (L1R), the features were selected implicitly by the L1 regularization of the linear classifier. L1R enforced the coefficients of the linear model to be sparse, thus making a small subset of radiomics features contribute to the final results. For PCA, 100 features were selected based on their power to differentiate benign from malignant lesions in the training set by sorting the lowest P values. Then, the first 10 principal components were chosen for the linear model for prediction. The parameter settings of both PCA and L1R followed the widely-used strategies in other MRI-based radiomics studies for breast cancer [23]. For SVM, we used the radial basis function (RBF) kernel. The parameters were optimized with respect to the training set. The hyperparameters of the above four methods are shown in Additional file 1: Appendix S3. The classifiers were trained using the repeated tenfold cross-validation (CV) method (100 times) in the training cohort, and their prognostic performance was then evaluated in the validation cohort using the area under the receiver operating characteristic (ROC) curve. A more detailed description of the frequencies of the features of RF during 100 times of tenfold CV is shown in Additional file 1: Appendix S4.

Statistical analysis
A goodness-of fit evaluation was performed for fitting of the BE, SE and DKI models by using MATLAB (Math-Works). The R 2 value was calculated [9]. ROC curves were generated for the mean diffusion metrics (ME-mADC all b , ME-mADC 0-1000 , BE-mD, BE-mD * , BE-mf, SE_mDDC, SE_mα, DKI-mK, and DKI-mD), and the ROC curves of all the 9 DWI image sets of the RF, L1R, PCA, and SVM models were calculated for comparison. The ROC curves of the 9 diffusion-related image sets were calculated from the results obtained by the CV models in the independent testing set. To compare the AUCs of the mean diffusion metrics and radiomics features, the McNemar test was used for the paired cases. All these comparisons were run 100 times, and we obtained the mean P values. Bonferroni adjustment was performed to control for α error inflation [29]. A P value less than 0.05/23 (0.00217) was regarded as a significant difference. All statistical evaluations were performed by using software developed either with the Python programming language [30] or with MATLAB software.

Image quality of multi-b diffusion weighted imaging
The mean R 2 value for the BE model fit was 0.90 ± 0.06. The mean R 2 value for the SE model fit was 0.95 ± 0.03. The mean R 2 value for the DKI model fit was 0.99 ± 0.01.
The signal intensity of malignant lesions on the map of b 2500 was 113. 25 ± 31.53. The signal intensity of benign lesions on the map of b 2500 was 36.83 ± 10.73. The signal to noise ratio (SNR) of b 2500 was 30.01 ± 10.16. The contrast noise ratio (CNR) of b 2500 was 2.25 ± 0.67. The lesion contrast on the map of b 2500 was 3.20 ± 1.04. A case of 23 datasets is shown in Additional file 1: Appendix S5.

Patient demographic characteristics
There was significant difference in demographic characteristics between patients with malignant lesions and patients with benign lesions (55.0 ± 12.2 vs. 50.3 ± 11.6, P < 0.001). Details on the top five image sets with the highest mean AUCs by RF, SVM, L1R-LR, and PCA-LR are shown in Table 2. The comparisons between RF and L1R, and between PCA and SVM are shown in Additional file 1: Appendix S6.

Diagnostic performance comparison of radiomics features by RF and the mean values of diffusion metrics
The interobserver reproducibility of radiomics feature extraction was satisfactory, with ICCs greater than 0.80 for all extracted features. The AUCs of the radiomics features for the differential diagnosis of breast lesions ranged from 0.80 (BE_D*) to 0.85 (BE_D), with a sensitivity of 83% to 88%, and a specificity of 74% to 82%. The AUCs of the mean diffusion metrics ranged from 0.54 (BE_mf ) to 0.79 (ME_mADC 0-1000 ), with a sensitivity of 74% to 88%, and a specificity of 41% to 71%. The AUCs of the radiomics features for the differential diagnosis of breast lesions were higher than those of the corresponding mean diffusion metrics, and there were significant differences in the AUCs between the mean values of the diffusion metrics (ME_mADC all-b , ME_mADC 0-1000 , BE_mD, BE_mD * , BE_mf, SE_mα, and DKI_mK) and the corresponding radiomics features of AUCs (all P < 0.002)  Table 3.

Importance of diffusion-related radiomics features
Details of all radiomics feature importance were shown in Additional file 1:

Discussion
Based on experimental results of this study, the BE_ IVIM_D map (with the highest AUC by RF), and the FO-10 percentile feature (with the highest FI by RF) from the radiomics-based analysis of multiparametric DWI are recommended in the characterization of breast lesions. Furthermore, we also found that the diagnostic performance of multiparametric DWI-derived radiomics was superior to that of the mean diffusion metrics in differentiating between benign and malignant breast lesions. This finding suggests that the radiomics-based analysis for multiparametric DWI has a potentiallyimproved performance in the classifications of breast lesions.
The majority of radiomics-based analyses in breast MRI research utilize T2WI, contrast T1WI, and conventional DWI [17,18,31,32]. To the best of our knowledge, this is the first study that extensively explored radiomics from multi-b-value maps and its commonly used fitting models (ME, BE, SE, and DKI), which could reflect more details of both Gaussian and non-Gaussian water diffusion distributions in tumors. Bickelhaupt et al. [17] demonstrated that the radiomics features of DKI can help differentiate malignant breast lesions from benign lesions. However, they only used the fitting model of DKI, and their scan sequences contained both the single-shot echo planar imaging (ss-EPI) in 95 patients and readout-segmented echo-planar imaging (rs-EPI) in 127 patients. We used four clinically used diffusion fitting models, and we also enlarged the sample size (542 lesions) in our study. Moreover, all the patients in our study were scanned with rs-EPI, which has significantly higher image quality and lesion conspicuity than ss-EPI, as suggested by previous studies [33,34].
Many radiomics-based machine learning methods can be used for lesion classification [17,18,35,36]. In this study, we extensively explored four promising algorithms of RF, L1R-LR, PCA-LR and SVM, which have been demonstrated to have high effectiveness in the previous radiomics studies [23,37]. We found that the ADC 0-1000 feature attained the highest mean AUC with all four algorithms, indicating that the mono-exponential model had already provided enough diagnostic information for breast cancer. Furthermore, RF had the highest probability of achieving the highest AUCs (8/9). Accordingly, this finding further corroborates the robustness and strong generalization power of RF [28]. Thus, in our further analysis, both the calculation of feature importance and the comparison of AUCs were based on the results of RF. The most predictive image set by RF (i.e., with the highest AUC, sensitivity and specificity) was BE_IVIM_D. Of note, BE_IVIM_D can remove the influence of perfusion and therefore reflects the true diffusion coefficient, better reflecting water movement in the living tissues. This may be the reason why the radiomics features computed from BE_IVIM_D provide more accurate information on water diffusion in breast cancer classification. Furthermore, the most important radiomics feature of BE_IVIM_D is the FO-10-percentile. Unlike that in previously reported studies [38][39][40], our experimental results did not suggest that texture features can attain better performance than the FO features on BE_IVIM_D. Accordingly, the FO features may be more predictive of lesion malignancy. On the other hand, the FO-10-percentile was also shown to be the most important feature (FI = 0.043) for the differentiation of benign and malignant breast lesions, indicating that first-order features remain important cues in multiparametric DWI for the differential diagnosis of breast lesions.
Our study has several limitations. First, all lesions in this study were drawn manually, which was time-consuming. Thus, automated lesion segmentation will be implemented in our future study to improve the objectiveness of lesion boundaries and to expedite preprocessing. Second, our multi-b value sequences were acquired with a fixed protocol, whereas the choice of optimal b-values could vary across different institutions. Thus, there was a lack of an external independent verification dataset to verify the generalization ability of this study's findings. Finally, this study employed all extracted diffusion-related radiomics for breast cancer diagnosis. The feature selection strategy was not implemented in this study. In future studies, we will conduct feature selection to optimize the construction of radiomics models.

Conclusions
In conclusion, the BE_IVIM_D map, and of FO-10-percentile feature by RF enabled accurate differentiation between malignant and benign breast lesions. Radiomics features computed from multiparametric DWI performed better than the mean values in distinguishing benign and malignant breast lesions. Hence, our study may shed a light on the applicability of radiomics from the multiparametric DWI for the clinical diagnosis of breast lesions.