Bioinformatics-based discovery of the urinary BBOX1 mRNA as a potential biomarker of diabetic kidney disease

Background Diabetic kidney disease (DKD) is the leading cause of end-stage kidney disease (ESKD) in the world. Emerging evidence has shown that urinary mRNAs may serve as early diagnostic and prognostic biomarkers of DKD. In this article, we aimed to first establish a novel bioinformatics-based methodology for analyzing the “urinary kidney-specific mRNAs” and verify their potential clinical utility in DKD. Methods To select candidate mRNAs, a total of 127 Affymetrix microarray datasets of diabetic kidney tissues and other tissues from humans were compiled and analyzed using an integrative bioinformatics approach. Then, the urinary expression of candidate mRNAs in stage 1 study (n = 82) was verified, and the one with best performance moved on to stage 2 study (n = 80) for validation. To avoid potential detection bias, a one-step Taqman PCR assay was developed for quantification of the interested mRNA in stage 2 study. Lastly, the in situ expression of the selected mRNA was further confirmed using fluorescent in situ hybridization (FISH) assay and bioinformatics analysis. Results Our bioinformatics analysis identified sixteen mRNAs as candidates, of which urinary BBOX1 (uBBOX1) levels were significantly upregulated in the urine of patients with DKD. The expression of uBBOX1 was also increased in normoalbuminuric diabetes subjects, while remained unchanged in patients with urinary tract infection or bladder cancer. Besides, uBBOX1 levels correlated with glycemic control, albuminuria and urinary tubular injury marker levels. Similar results were obtained in stage 2 study. FISH assay further demonstrated that BBOX1 mRNA was predominantly located in renal tubular epithelial cells, while its expression in podocytes and urothelium was weak. Further bioinformatics analysis also suggested that tubular BBOX1 mRNA expression was quite stable in various types of kidney diseases. Conclusions Our study provided a novel methodology to identify and analyze urinary kidney-specific mRNAs. uBBOX1 might serve as a promising biomarker of DKD. The performance of the selected urinary mRNAs in monitoring disease progression needs further validation. Electronic supplementary material The online version of this article (10.1186/s12967-019-1818-2) contains supplementary material, which is available to authorized users.


Background
Diabetic kidney disease (DKD), one of the most common complications of diabetes mellitus (DM), is the leading cause of end-stage kidney disease (ESKD) [1]. As the prognosis for patients diagnosed with DKD is not optimistic, early detection and risk classification of DKD are of paramount importance for providing appropriate and timely therapies to slow the progression to ESKD.
Currently, the microalbuminuria (MA) test is recognized as the most important diagnostic measure of DKD. However, emerging evidence has shown that MA is not an ideal diagnostic tool. A recent autopsy study demonstrated that patients with diabetes mellitus (DM) may have advanced renal pathological changes before the onset of MA [2]. Over 20% of individuals with type 1 diabetes without MA or macroalbuminuria reached stage 3 chronic kidney disease (CKD) [3]. For these reasons, novel markers which can identify patients with DKD, as well as those with a high risk of developing DKD are urgently needed.
Recently, analysis of urinary mRNAs has emerged as a novel non-invasive method for monitoring a variety of kidney diseases, including CKD [4], renal-allograft rejection [5], renal fibrosis [6] and DKD [7]. In previous studies, the candidate mRNAs were selected from molecules reported to have a role in disease development. This literature-based screening approach, however, is often biased and low-throughput. Moreover, it does not account for the interference from other urinary cells (e.g. bladder cells, inflammatory cells and cancer cells), which imposes considerable limitations when interpreting the origin of urinary mRNAs.
To address the above challenges, a novel bioinformatics-based methodology for selecting urinary mRNAs that more accurately reflect the in situ molecular changes in kidney was established in this study. Then, based on this novel approach, a two-stage study to discover potential biomarkers for DKD was further performed.

Overall design for the bioinformatics analysis
The overall study design is shown in Fig. 1. Microarray datasets of the following tissues were collected: (1) diabetic kidney tissues (including glomerular and tubular compartments), (2) normal kidney tissues (including glomerular and tubular compartments), (3) normal bladder tissues, (4) bladder cancer (BC) tissues, and (5) inflammatory cells from patients with urinary tract infection (UTI). Next, the transcriptome profiles of these tissues were compared and two types of differentially expressed genes (DEGs) were identified as candidate biomarkers: (1) upregulated genes with (a) highest fold changes (FCs) in diabetic kidney tissues compared with normal kidney tissues and (b) low expression in normal bladder, BC tissues and UTI-derived inflammatory cells; (2) upregulated genes with (a) highest FCs in diabetic kidney tissues compared with normal bladder tissues and (b) low expression in BC tissues and UTI-derived inflammatory cells. Next, using a targeted polymerase chain reaction (PCR) array, the urinary expression of the candidate mRNAs was determined in stage 1 study, and the one with best performance moved on to stage 2 study for validation. To avoid potential detection bias, a one-step PCR assay based on Taqman probes was developed for quantification of the interested mRNA in stage 2 study. Lastly, the in situ expression of the interested mRNA was determined by fluorescent in situ hybridization (FISH) assay and bioinformatics analysis.

Datasets collection
The gene expression datasets used in this study were compiled using the gene expression omnibus (GEO; https ://www.ncbi.nlm.nih.gov/geo/). For datasets of normal human glomeruli, tubules and bladder tissues, the primary search criteria were set to "glomerulus OR glomeruli", "tubulus OR tubules" and "bladder" respectively. For DKD, the primary search criteria were set to "diabetic nephropathy OR diabetic kidney disease OR DKD". For UTI-derived inflammatory cells and BC tissues, the primary search criteria were set to "urinary tract infection OR UTI" and "bladder AND (cancer OR tumor)", respectively. To minimize platform variations, the platform filtering criterion was set to "Affymetrix U133″, which included three types of microarrays (Human Genome U133 Plus 2.0 Array, Human Genome U133A 2.0 Array, and Human Genome U133A&B). The searching criteria for other types of CKD can be found in our previous study [8].

Processing microarray datasets
Data processing was performed using Bioconductor (version 3.4) in R software (version × 64 3.3.2) according to the following steps [9]: 1. Before included in the analysis, the quality of each dataset was examined using the general quality control (QC) stats in the simpleaffy package and RNA degradation analysis in the affy package.  (10). The inclusion criteria were set as follows: at least 5 years from diagnosis of type 2 diabetes, the presence of diabetic retinopathy, and an elevated ACR. The exclusion criteria were as follows: infection, signs or symptoms of other systemic diseases, or suspected non-diabetic kidney disease. ESKD was defined as the onset of dialysis or estimated glomerular filtration rate (eGFR) < 15 ml/ min/1.73 m 2 . The eGFR was calculated according to modified modification of diet in renal disease (MDRD) equations [10].
The HCs were enrolled from the Zhong Da Hospital Health Care Center, all of whom met the following criteria: (1) no record of abnormal renal function (eGFR < 90 mL/min/1.73 m 2 ); (2) normal routine urinalysis, ACR, and 24 h urinary protein test results; (3) no record of hypertension, diabetes, hyperlipidaemia, or hyperuricaemia; and (4) no family history of kidney diseases. Moreover, all patients with UTI and BC have no evidence of kidney diseases or diabetes.

PCR assay
In the stage-1 study, a targeted PCR array was fabricated to detect differentially expressed mRNAs with highest FCs. The primer sets were designed using PRIMER 5 software according to optimized experimental conditions. Two housekeeping genes (B2M and RPL27) were used to normalize data. Positive PCR control (PPC) and reverse transcription control (RTC) were set into the array to monitor PCR reaction performance. Cycling conditions were set as follows: 95 °C for 10 min, followed by 40 cycles of 15 s at 95 °C and 60 °C for 1 min.
In the stage-2 study, a Taqman one-step PCR assay was developed for quantification of the BBOX1 mRNA. B2M was used as the housekeeping gene. Cycling conditions were set as follows: 95 °C for 10 min, followed by 45 cycles of 20 s at 95 °C and 60 °C for 45 s. All primer sequences can be found in the additional files (Additional file 1: Table S1). All PCR assays were performed using an ABI PRISM7700 system (Applied Biosystems).
In order to further test the reproducibility and sensitivity of the Taqman PCR assay, the cycle threshold (Ct) values of BBOX1 and B2M in different amounts of total urinary RNA (500 ng, 50 ng, 5 ng and 0.5 ng) were measured for three times. The reproducibility was measured by the coefficient of variation (CV) according to the following formula c v = σ µ , where σ and μ stand for the standard deviation(SD) and mean of repeated measurements, respectively.

Urinary mRNA samples collection and measurement
First morning urine samples were collected and centrifuged at 3000g for 30 min at 4 °C within 2 h of collection to obtain the urinary sediments. The sediments were then resuspended in 1.5 ml DEPC-treated PBS and centrifuged at 12,000g for 5 min at 4 °C. RNAiso Plus (Takara) was added to preserve total RNA, and the samples were stored at − 80 °C until use. Total RNA was extracted according to the manufacturer's protocol (Invitrogen). Then, RNA concentrations were measured using a NanoDrop 2000 (Thermo) based on the relative absorbance ratio at 260/280 nm. The qualified RNA samples were then reverse transcribed to cDNA according to the manufacturer's protocol (Takara), which were stored at − 20 °C until use.

Confirmation of in situ mRNA expression
The Ethical Committee of Zhong Da Hospital of Southeast University approved the use of human samples for the experiments outlined in this study. FISH assay was performed on 2-μm-thick sections of diabetic kidney tissues and normal urothelium to determine the in situ mRNA expression levels. Additionally, kidney tissues were co-stained with podocalyxin antibody to detect the expression of BBOX1 mRNA in podocytes.
Briefly, sections were first deparaffinized and dehydrated in dimethylbenzene and ethanol, followed by rinsing twice in distilled water for 5 min each. After pretreatment with pepsin and permeabilization, the sections were treated with a BBOX1 gene probe mix (Exiqon, sequence:5′-AGTAA TCCAC TCCAA TGTCT GT-3′) overnight at room temperature. To stain podocytes, the slides were additionally incubated with labeled antihuman podocalyxin monoclonal antibodies(Abcam) at a dilution of 1:100 overnight. Nuclei were counterstained with 4′,6-diamidino-2-phenylindole (DAPI) and coverslips were fixed with nail polish. Analysis of fluorescence signals was performed using a Nikon Eclipse C1 epifluorescence microscope with interference filters (AHF Analysentechnik AG).

Statistical analysis
R software (version ×64 3.3.2) and Graphpad Prism 7.0 were used for all other statistical analysis and figure construction, respectively. The Shapiro-Wilk test was used to determine the normality of the data. Numeric results with a normal distribution were presented as the mean ± SD. Non-normal numeric results were presented with the interquartile range (IQR). Analysis of variance (ANOVA) was applied to compare the means of normalized data. For skewed data, the Kruskal-Wallis test was applied. The frequencies were compared using the Chi squared test. The correlation between gene expression levels and clinical parameters were analyzed using Spearman's rank-order test. The discriminative power of the biomarker was evaluated by generating receiver operating characteristic (ROC) curves. The area under the curve (AUC) was used to assess the overall discriminatory power. An AUC of 0.6-0.7 was considered poor, 0.7-0.8 was considered moderate, 0.8-0.9 was considered good, and > 0.9 was considered excellent. Optimal cut-offs were determined by selecting the data points that maximized the sum of specificity and sensitivity on the ROC curve. Two-tailed p < 0.05 was considered statistically significant.

Identification of candidate mRNAs
As shown in Fig. 2, after comparing the gene expression patterns in glomeruli from diabetic kidney tissues and others, a total of 9203 upregulated DEGs were identified (DKD glomeruli vs. normal bladder, n = 6025; DKD glomeruli vs. normal glomeruli, n = 278; DKD glomeruli vs. BC, n = 5083; DKD glomeruli vs. UTI-derived polymorphonuclear leukocytes, n = 8036), of which seventy-eight were co-differentially expressed.
According to the rules described in the Methods section (Overall design for the bioinformatics analysis), sixteen mRNAs were finally selected as candidate biomarkers. Detailed lists of differentially expressed genes are shown in the additional files (Additional file 2:  Table S9).

Characteristics of the study population
A total of 82 participants were enrolled in stage 1 study (HC: n = 14; NA: n = 16; MA: n = 11; OA; n = 12, ESKD: n = 13; UTI: n = 8; BC: n = 8). Table 2 shows the demographic and clinical characteristics of included participants. The mean ages of participants were similar in the HC, NA and UTI group, but those in the MA, NA, ESKD and BC group were older. In addition, the renal function of patients in the MA, OA and ESKD group was significantly poorer than those of HCs. Patients in the UTI and BC group all had normal albuminuria, blood glucose levels and eGFR.
In the stage-2 study, another 80 participants were enrolled (NA: n = 20; MA: n = 20; OA: n = 20; HC: n = 20). As shown in Table 3, there was no significant difference in age among different groups. Patients in the OA group, but not the NA or MA group, had lower eGFR than HCs.

Urinary BBOX1 mRNA (uBBOX1) expression was upregulated in patients with DKD
To verify the differential expression of candidate mRNAs in the urine, the urinary mRNA profile of each participant of stage 1 study was measured using the self-assembly PCR array. As a result, four urinary

The discriminative performance of uBBOX1
Next, ROC analysis was performed to further test the overall accuracy of uBBOX1 in discriminating different groups of participants enrolled in the stage 1 study. As shown in Fig. 5, with HCs as the control group and patients with diabetes and non-ESKD DKD as the test group, uBBOX1 had an AUC of 0.762 (p = 0.0084). At its optimal cut-off value of 0.0082, uBBOX1 yielded a specificity of 63.6% and a sensitivity of 82.0%. When patients with UTI and BC were added to the control group, the AUC for uBBOX1 changed to 0.805 (p < 0.0001). Interestingly, the optimal cut-off value remained at 0.0082, where uBBOX1 yielded a specificity of 66.7% and a sensitivity of 82.1%.

Performance validation of uBBOX1 in stage 2 study
To avoid potential detection bias, the reproducibility and sensitivity of the one-step Taqman PCR assay were tested. As shown in Table 5, both BBOX1 and B2M mRNAs can be detected in the reaction systems containing as low as 0.5 ng total urinary RNA/10ul with low variance (average CVs for Ct values of BBOX1: 0.26%; average CVs for Ct values of B2M: 0.14%). Then using this one-step Taqman PCR assay, the uBBOX1 levels of all participants in study 2 were measured. As shown in Fig. 6 . However, significant difference of uBBOX1 levels within the NA, MA and OA group was not found. In correlation analysis (Fig. 7), uBBOX1 also positively correlated with urinary ACR levels (Spearman's r = 0.471, p < 0.0001), urinary NAG levels (Spearman's r = 0.488, p < 0.0001), blood glucose levels (Spearman's r = 0.293, p = 0.0084), HbA1c levels (Spearman's r = 0.263, p = 0.019). A weak negative correlation between eGFR and uBBOX1 was also found (Spearman's r = 0.263, p = 0.019).

BBOX1 mRNA in situ expression in diabetic kidney tissues
Next, the in situ expression of BBOX1 mRNA in human tissues was examined. As shown in Fig. 8, BBOX1 mRNA was highly expressed in the tubular compartments of diabetic kidney tissues, while its expression in glomerular compartments including podocytes was weak. Moreover, BBOX1 mRNA was poorly expressed in normal bladder tissues, indicating that detached tubular epithelial cells (TECs) primarily contributed to the elevated uBBOX1 levels. Next, we asked whether the tubular expression of BBOX1 mRNA would change in the context of DKD. As shown in Table 6, bioinformatics analysis involving 33 microarray datasets showed that tubular expression of BBOX1 mRNA was not significantly changed in diabetic kidney tissues (FC: 0.745, adjusted p = 0.136).

The differential expression of BBOX1 mRNA in other types of CKD
To expand the potential use of uBBOX1, the differential expression of BBOX1 mRNA in other types of CKD was further examined using an integrative bioinformatics Fig. 3 The differential expression of uBBOX1 among different populations in stage 1 study. a uBBOX1 levels were significantly elevated in the NA, MA and OA group, but not the ESKD group; b uBBOX1 levels of the UTI and BC group were significantly lower than those of the NA, MA and OA group. *p < 0.05; **p < 0.01 approach. The basic characteristics of the included datasets have been described in our previous study [8]. As shown in Table 6, compared with normal kidney tissues, tubular BBOX1 mRNA expression was either not significantly changed or slightly upregulated in other types of CKD including hypertensive nephropathy (relative expression: 0.929, adjusted p = 0.438), focal segmental glomerulosclerosis (relative expression: 1.17, adjusted p = 0.134), IgA nephropathy (relative expression: 0.965, adjusted p = 0.662) and membranous nephropathy (relative expression: 1.28, adjusted p = 0.0045).

Discussion
Although MA is the most common diagnostic measure for DKD, recent studies have suggested that it is suboptimal in many cases. Therefore, there is an urgent need to identify novel biomarkers to increase the accuracy of diagnostic tools.
Previous studies have applied a literature-based approach to screen candidate urinary mRNA markers for kidney diseases. This approach is inherently biased and lacks efficiency. In recent years, bioinformatics has emerged as a powerful tool for the high-throughput identification of potential biomarkers [18]. However, owing to the tremendous amount of data generated by highthroughput technology, bioinformatics analysis based on a few samples may be at a high risk of false positive results. In the present study, we developed a novel bioinformatics workflow based on over one hundred microarray datasets and found that uBBOX1, a TEC-specific mRNA, could be used as a potential biomarker of DKD.
The BBOX1 gene (also known as BBH) is located on chromosome 11 and encodes gamma-butyrobetaine hydroxylase 1, which catalyzes the formation of L-carnitine from gamma-butyrobetaine [19]. L-carnitine deficiency is associated with skeletal myopathies, poorer renal and cardiac function, and anemia in the context of CKD [20,21]. It also exerts antioxidant and anti-inflammatory effects on TECs in various kidney injury models [22,23]. The kidney, the liver and the brain are three major organs where carnitine biosynthesis is highly activated [19]. Using quantitative PCR, Rigault et al. found that BBOX1 mRNA is highly expressed in the kidney, with an increase in expression of over threefold and ninefold compared to the liver and brain, respectively [19]. Our result further demonstrated that BBOX1 mRNA is predominantly located in TECs, while its expression in podocytes and urothelium is weak. Therefore, detached TECs and fragments are the major origin of uBBOX1. In addition, BBOX1 mRNA expression, as revealed by our bioinformatics analysis, was not significantly changed in TECs. Hence, detecting uBBOX levels provides a novel and fast method for measuring the extent of TEC loss in the urine.
Previous studies have shown that urinary podocyte mRNAs, which reflect the extent of podocyturia, can be useful as a non-invasive marker for DKD [24,25]. In a cohort of 1143 patients, urinary podocyte mRNAs were markedly increased in patients with glomerular diseases including DKD and even higher in progressors [26]. Recently, tubulopathy is becoming increasingly recognized as a key culprit of DKD-not only during late stages, but also at the onset of disease [27]. As a consequence of tubular injury induced by DM-related pathophysiological disturbances including albuminuria and hyperglycemia, elevated TEC excretion can be found in the urine of patients with DM [28][29][30]. Hyperglycemia can trigger oxidative stress and lead to aggravated injury of TECs even in non-diabetic rats [31]. Consistent with these studies, we observed that uBBOX1 levels were augmented in normoalbuminuric patients with diabetes and patients with DKD, and positively correlated with albuminuria, hyperglycemia and urinary NAG excretion, indicating its potential diagnostic value for DKD. Its correlation with eGFR, however, was relatively weak. Besides, the differences of uBBOX1 levels among the NA, MA and OA group did not reach statistical significance. Aside from the fact that the relative small sample size limits the statistical power, the sharp increase of urinary mRNA upon kidney damage may also contribute to the insignificance [32]. Another interesting phenomenon is that patients with ESKD had a decreased level of uBBOX1 compared with those in earlier stages, which may have been owing to the lack of kidney intrinsic cells in fibrotic kidneys.
Taken together, these data suggest that uBBOX1 may act as an early biomarker for DKD. To expand the potential use of uBBOX1, we also analyzed its in situ expression in other types of CKD by an integrative bioinformatics method. We found that tubular BBOX1 mRNA expression is quite stable in hypertensive nephropathy and various types of glomerulonephritis, suggesting that elevated uBBOX1 may also act as an indicator of tubular injury in these kidney diseases. Further studies are still required to verify this hypothesis.
Other strengths of this study include: (1) we developed a convenient and effective detection method based on Taqman probes for large-scale clinical verification and transformation; (2) we pre-planned to minimize the potential interference from urinary cells in the context of UTI and BC. Unsurprisingly, uBBOX1 expression was not substantially increased in patients with UTI or BC, supporting the efficacy of this top-down study design. Notably, uBBOX1 was found to be highly effective in discriminating patients with diabetes from HCs and those with confounding diseases. Some limitations of our study should be considered. Firstly, the study design was cross-sectional with a relatively small sample size, which reduced the significance of the study. Additionally, the diagnosis of DKD in our study was based on laboratory parameters rather than pathological parameters. Large-scale prospective studies are still needed to further validate the prognostic role of uBBOX1 for DKD and other CKDs.

Conclusions
In conclusion, our study provided a novel methodology to identify and analyze urinary kidney-specific mRNAs. Urinary BBOX1 mRNA might serve as a promising biomarker of early detection of DKD. The performance of