Longitudinal analysis of symptom-based clustering in patients with primary Sjogren’s syndrome: a prospective cohort study with a 5-year follow-up period

Background: Sjogren’s syndrome (SS) is a heterogenous disease with various phenotypes. We aimed to provide a relevant subclassification based on symptom-based clustering for patients with primary (p) SS.
Methods: Data from patients in a prospective pSS cohort in Korea were analysed. Latent class analysis (LCA) was performed using patient-reported outcomes, including pain, fatigue, dryness, and anxiety/depression. Clinical and laboratory differences between the classes were analysed. Latent transition analysis (LTA) was applied to the longitudinal data (collected annually for up to 5 years) to assess the temporal stability of the classification.
Results: LCA identified three classes among 341 patients with pSS (i.e., ‘high symptom burden’, ‘dryness dominant’, ‘low symptom burden’). Each group had distinct laboratory and clinical phenotypes. LTA revealed that class membership remained stable over time. Baseline class predicted future salivary gland function and damage accrual, as represented by the Sjogren’s syndrome disease damage index.
Conclusion: Symptom-based clustering of heterogenous patients with primary Sjogren’s syndrome provided a relevant classification supported by temporal stability and distinct phenotypes between the classes. This clustering strategy may provide more homogenous groups of pSS patients for novel treatment development and predict future phenotypic evolution.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12967-021-03051-6.


Background
Sjogren's syndrome (SS) is a systemic autoimmune disease characterised by sicca symptoms associated with lymphocytic infiltrates of affected glands [1]. Some patients with SS suffer from various extraglandular manifestations (e.g., arthralgia/arthritis, Raynaud's phenomenon, peripheral neuropathy, interstitial lung disease) [2]. Patients with SS are therefore heterogenous in both phenotype and severity.
Given the lack of pathogenesis-targeted therapies, treatment for SS is mainly focused on symptom relief [3]. Novel biologics, such as rituximab, have failed to meet the primary endpoints of clinical trials [4]. This result appears to have occurred partly because the inclusion criteria did not select a group of patients homogenous enough to have similar responses. On the other hand, some trials (e.g., the abatacept trial) [5] used a high European League Against Rheumatism (EULAR) Sjogren's syndrome disease activity index (ESSDAI) score as an inclusion criterion and found that ESSDAI scores did not improve more with the study drug than with placebo. It appears that the enrolled patients shared high ESSDAI scores but had different phenotypes. These trials might have been successful if they had been performed using more homogenous groups of patients. Tarn et al. found that symptom-based stratification of patients with SS identified four distinct subgroups with unique pathobiological endotypes [6]. The authors reanalysed data from two large clinical trials, JOQUER [7] and TRACTISS [8], and found that hydroxychloroquine and rituximab, respectively, were efficacious in specific subgroups of patients. Stratification was performed based on baseline characteristics of an existing United Kingdom Primary Sjogren's Syndrome Registry cohort [9]. They also performed a validation study using external cohorts and found good performance of the classification system. Nevertheless, to apply this stratification clinically, it is important that class membership remains stable over time so that stratification can be performed at any point during the disease course. Clustering methods would also be more advantageous if stratification could predict future disease status.
Journal of Translational Medicine 19:394 (Open Access)
Based on these findings, we investigated whether symptom-based clustering performed well enough to provide relevant classes in a population of Korean patients with pSS. We also sought to determine whether class membership had temporal stability during a 5-year follow-up period. Finally, we examined whether the initial class predicted future disease status in terms of salivary flow rate (SFR) and Sjogren's syndrome disease damage index (SSDDI) results.

Study population
In this study, all enrolled patients with pSS were Korean Initiative Sjogren's Syndrome (KISS) participants who were recruited at Seoul St. Mary's Hospital. The KISS was founded in 2013 with the aims of establishing a nationwide prospective cohort database containing overall clinical data and samples from patients with pSS, and of developing diagnostic and treatment tools for pSS.

Statistical methods
Latent class analysis (LCA) was used for clustering. The variables selected for clustering were components of the EULAR SS Patient Reported Index (ESSPRI) and the EQ-5D. Visual analogue scale (VAS) scores for pain, fatigue, and dryness, each ranging from 0 to 10, were derived from the ESSPRI. Anxiety/depression was assessed using the 5-point Likert scale from the EQ-5D. Because LCA is usually applied to binary variables, each item was dichotomised. The cut-offs for pain (> 3), dryness (> 5), and fatigue (> 7) were set according to baseline median values, and a clinically meaningful value of ≥ 3 was used for anxiety/depression (Additional file 1: Table S1). Model fit was assessed using Akaike's information criterion, the Bayesian information criterion (BIC), G-squared, entropy, and log-likelihood [10] (Additional file 1: Table S2).
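The dichotomisation and class estimation described above can be illustrated with a minimal sketch. This is not the PROC LCA implementation used in the study: it is a plain expectation-maximisation fit of a latent class model with binary items, and the function names (`dichotomise`, `fit_lca`, `bic`), the random initialisation, and the fixed iteration count are assumptions made for illustration. The BIC here is written in its log-likelihood form, which may differ by a constant from the value a given package reports.

```python
import math
import random


def dichotomise(pain, fatigue, dryness, anxiety_depression):
    """Binary indicators from the cut-offs stated in the text."""
    return (
        int(pain > 3),
        int(fatigue > 7),
        int(dryness > 5),
        int(anxiety_depression >= 3),
    )


def _joint(row, weights, probs):
    """P(row, class c) for every class c, assuming local independence."""
    out = []
    for w, class_probs in zip(weights, probs):
        p = w
        for y, q in zip(row, class_probs):
            p *= q if y else 1.0 - q
        out.append(p)
    return out


def fit_lca(data, n_classes, n_iter=200, seed=0):
    """EM for a latent class model with binary items (illustrative only).

    Returns (class_weights, item_probs, log_likelihood), where
    item_probs[c][j] is P(item j = 1 | class c).
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    for _ in range(n_iter):
        # E-step: posterior class membership probabilities per patient.
        post = []
        for row in data:
            joint = _joint(row, weights, probs)
            total = sum(joint)
            post.append([p / total for p in joint])
        # M-step: update class prevalences and item-response probabilities.
        for c in range(n_classes):
            mass = max(sum(r[c] for r in post), 1e-12)
            weights[c] = mass / len(data)
            for j in range(n_items):
                hits = sum(r[c] for r, row in zip(post, data) if row[j])
                # Clip away from 0 and 1 to keep the likelihood finite.
                probs[c][j] = min(max(hits / mass, 1e-6), 1.0 - 1e-6)
    loglik = sum(math.log(sum(_joint(row, weights, probs))) for row in data)
    return weights, probs, loglik


def bic(loglik, n_classes, n_items, n_obs):
    """BIC = -2 logL + (number of free parameters) * ln(n)."""
    n_params = (n_classes - 1) + n_classes * n_items
    return -2.0 * loglik + n_params * math.log(n_obs)
```

In practice one would fit models with an increasing number of classes, use several random starts for each, and compare the resulting fit statistics.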
Latent transition analysis (LTA) is a longitudinal version of LCA. LTA provides class membership probabilities at each time point, and probabilities of transitioning to a different class over time [11]. Results for fit statistics for LTA are presented in Additional file 1: Table S3.
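For reference, the quantities that LTA estimates can be written in the standard textbook form. The notation below is an assumption for illustration and is not taken from the study: \delta denotes latent status prevalences at the first time point, \tau^{(t)} the probabilities of transitioning between statuses from time t-1 to time t, and \rho the item-response probabilities, for S latent statuses, T time points, and J items.

```latex
P(Y = y) \;=\; \sum_{s_1=1}^{S} \cdots \sum_{s_T=1}^{S}
  \delta_{s_1}
  \prod_{t=2}^{T} \tau^{(t)}_{s_t \mid s_{t-1}}
  \prod_{t=1}^{T} \prod_{j=1}^{J} \rho_{j,\, y_{jt} \mid s_t}
```

Under this form, temporal stability corresponds to values of \tau^{(t)}_{s \mid s} close to 1 for each status s.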
Clinical and laboratory parameters were compared between classes. Continuous variables were compared using Kruskal-Wallis tests with post-hoc analysis. Categorical variables were compared using chi-square tests. A value of P < 0.05 was considered significant. Statistical analysis was performed using SAS software (version 9.4; SAS Institute, Cary, NC, USA) with PROC LCA and PROC LTA downloaded from https://www.methodology.psu.edu/downloads/proclcalta/, and IBM SPSS Statistics for Windows (version 24; IBM Corp., Armonk, NY, USA).
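As an illustration of the between-class comparison of continuous variables, the following is a minimal standard-library sketch of the Kruskal-Wallis test with tie correction. It is not the SPSS procedure used in the study, and the function names are hypothetical. It is restricted to exactly three groups so that the chi-square reference distribution has 2 degrees of freedom, for which the upper-tail probability has the closed form exp(-H/2).

```python
import math
from itertools import chain


def average_ranks(values):
    """Rank values from 1 to N, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_3(groups):
    """Return (H, p) for exactly three groups of observations."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of squared rank sums, each divided by its group size.
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1.0 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction

    # Chi-square upper tail with 2 degrees of freedom.
    return h, math.exp(-h / 2.0)
```

For example, `kruskal_wallis_3([[1, 2, 3], [4, 5, 6], [7, 8, 9]])` gives H = 7.2, and the resulting P value would be judged against the 0.05 threshold stated above.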

Latent class analysis identifies three classes in patients with pSS
LCA was performed using components of the ESSPRI and EQ-5D from baseline to 5 years of follow-up. Supplementary Table 4 presents the number of subjects at each follow-up visit and their VAS results for the ESSPRI components and the anxiety/depression scale. The fit statistics indicated that three-class clustering performed best, as shown by its low BIC and high entropy values (Additional file 1: Table S2).
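The entropy criterion mentioned above can be made concrete with a short sketch. The function below computes a commonly used normalised entropy from posterior class membership probabilities, where values near 1 indicate that patients are assigned to classes with little ambiguity. Which entropy variant the study's software reported is not stated in the text, so this definition and the function name are assumptions.

```python
import math


def relative_entropy(posteriors):
    """Normalised entropy: 1 - sum_i sum_c(-p_ic ln p_ic) / (n ln C).

    posteriors[i][c] is the posterior probability that subject i
    belongs to class c; each row is expected to sum to 1.
    """
    n = len(posteriors)
    n_classes = len(posteriors[0])
    # Terms with p = 0 contribute nothing (limit of p ln p as p -> 0).
    raw = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - raw / (n * math.log(n_classes))
```

A model in which every patient has posterior probability 1 for a single class yields a value of 1, whereas posteriors spread evenly across classes yield a value of 0.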
Class 1 (66 of 321 patients) had 'dryness dominant' characteristics. Patients in class 1 had low levels of pain but suffered from dryness and fatigue comparable to those of patients in class 2 (Table 1). Class 2 (134 of 321 patients) was characterised as 'high symptom burden'. Patients

Temporal stability of classification determined using latent transition analysis
Next, we performed LTA to investigate whether class membership remained stable over time. Results for latent status and item response probabilities at all time points are presented in Table 2 and Fig. 1. High VAS values for pain, fatigue, and anxiety/depression conferred a high probability of being classified as class 2. Dryness was highly associated with class 1. The transition probabilities indicated temporal stability of membership (Fig. 2). Patients with high symptom burden (class 2) tended to remain in the same class, with annual transition probabilities greater than 0.9, except during the initial 1-year period. Similarly, patients with low symptom burden (class 3) hardly moved to other classes (i.e., the probability of remaining in class 3 was nearly 1.0). The dryness dominant population (class 1) showed a different trend from the other two groups. Patients in class 1 often transitioned to class 3 (low symptom burden), with transition probabilities ranging from 0.003 to 0.364. However, they did not move to the high symptom burden group (class 2) at any point during the follow-up period.
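To make the meaning of a transition probability concrete, the sketch below tabulates how often subjects assigned to one class at a given visit are assigned to each class at the next visit. This is a simplification and not the LTA estimator: LTA estimates transitions jointly with the latent statuses, whereas this sketch assumes each subject already has a single assigned class per visit. The function name and the toy labels in the usage note are hypothetical and are not study data.

```python
def transition_matrix(prev_classes, next_classes, n_classes=3):
    """Row-normalised counts of moves between consecutive visits.

    Classes are labelled 1..n_classes; entry [a - 1][b - 1] is the
    observed proportion of subjects in class a who are in class b
    at the next visit.
    """
    counts = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(prev_classes, next_classes):
        counts[a - 1][b - 1] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A class with no subjects at the earlier visit has no defined row.
        matrix.append([c / total if total else float("nan") for c in row])
    return matrix
```

With toy labels `[1, 1, 2, 3, 3]` at one visit and `[1, 3, 2, 3, 3]` at the next, the first row is `[0.5, 0.0, 0.5]`: half of the class 1 subjects stay and half move to class 3, while the class 2 and class 3 rows show no movement.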

Predictive ability of baseline classes
After verifying temporal stability of the classes, we examined whether initial class predicted deterioration associated with pSS. For each class, we compared the ESSPRI, SFR, and SSDDI values at the last visit with those at baseline. The median follow-up period was 4 years. We found significant differences in ESSPRI (P < 0.001), uSFR (P = 0.004), and SSDDI (P = 0.014) values (  [2][3]) were higher than that of patients in class 3. At baseline, uSFR did not differ between classes 1 and 2, but the follow-up uSFR was higher in patients in class 1. The SSDDI was higher in class 1 at baseline, but patients in class 2 had higher SSDDIs at the last follow-up. These results suggest that the initial symptom-based classification predicted disease status at follow-up and support the clinical relevance of the classification method.

Discussion
In this study, we used symptom-based clustering and LCA to subclassify patients with pSS. Clustering revealed three classes with distinct endotypes, and LTA revealed temporal stability of membership during up to 5 years of follow-up. Baseline membership predicted future SFR and SSDDI results. This result suggests that the initial class was associated with different disease evolution by the last follow-up.
The three latent classes identified were designated as the 'dryness dominant', 'high symptom burden', and 'low symptom burden' groups. This nomenclature was derived from the report by Tarn et al. [6], which was the first to suggest the use of four symptom-based clusters in patients with pSS. Those authors measured anxiety and depression with the hospital anxiety and depression scale (total score ranges from 0 to 42) instead of the EQ-5D scale. This difference may explain the difference in the number of classes between the two studies. In addition to the number of classes (three compared with four), ESSDAI and medications differed between the classes in our study, which was not the case in the previous study. In contrast to the higher lymphoma prevalence and higher β2-microglobulin and CXCL13 levels observed in the dryness dominant group in the previous study, we did not find any difference in β2-microglobulin level. However, the only patient with lymphoma in our cohort was classified into the dryness dominant group, which showed the highest cryoglobulin positivity, a risk factor for lymphoma in pSS; this is consistent with the previous report. Therefore, patients in the same class seemed to have broadly similar characteristics in both studies, which suggests that symptom-based clustering performs well regardless of ethnicity. A major strength of our classification compared with the previous study is that the LCA method used in the current study permits LTA, which showed the temporal stability of the clusters over time, in addition to cross-sectional clustering analysis. In addition, the questionnaire used for classification is simpler.
Each class had distinct clinical and laboratory parameters associated with pSS. Nevertheless, symptom variables might not be objective parameters associated with the pathogenic mechanism of pSS. Subclassification of patients with pSS has focused on molecular signatures that appear to be more closely associated with the pathogenesis of the disease [2, 12-15]. James et al. found that transcriptional modules identified three clusters with differences in interferon and inflammation modules and in molecules such as CXCL10, CXCL9, and BAFF [16]. This classification based on molecular features correlates well with systemic involvement represented by ESSDAI. However, other phenotypical differences do not seem to be affected by these molecular features. We found the same optimal performance of symptom-based clustering as the previous study, and verified its relevance and temporal stability. These results indicate that this approach may be valid to subclassify patients with pSS. We also found a significant difference in salivary siglec-5 results between the classes. We previously reported salivary siglec-5 as a biomarker for pSS diagnosis; it is negatively correlated with SFR and positively correlated with serum IgG [17]. Although baseline uSFR and IgG levels were not different, salivary siglec-5 was significantly higher in patients in class 1 than in patients in the comparison class (Table 1). This difference might result from 'latent' differences between the two classes.
Precision medicine is one of the most interesting topics in the current medical field. The value of appropriate patient classification for applying specific and efficient treatment strategies cannot be overemphasised. Accordingly, clustering has been proposed for many other diseases, including asthma [18-20], sepsis [21], and cardiovascular diseases [22].
With regard to rheumatologic diseases, studies have used different statistical methods for clustering systemic lupus erythematosus [23], systemic sclerosis [24], and IgG4-related disease [25]. These studies classified heterogenous groups of patients into more homogenous subgroups to better understand disease course and underlying mechanisms.
We used LCA for cluster analysis instead of hierarchical analysis, which has been widely used in previous studies. One advantage of hierarchical analysis is that it produces a dendrogram, which allows for easy visual presentation of results [26]. An advantage of LCA is that it has been used as a confirmatory analysis to reproduce results obtained using k-means clustering and has been evaluated for use in person-centered analysis [27]. Using LCA also allowed us to perform LTA of the longitudinal data.
From a different perspective, the temporal stability of this classification approach is useful for explaining the likely disease course to patients. For example, a patient with a low symptom burden (class 3) at baseline has a high chance of staying in that class during the follow-up period. The analysis of our longitudinal data indicated that such a patient would be unlikely to experience high disease activity. Therefore, physicians might interpret Sjogren's syndrome, which can affect all systems of the body, as a slowly evolving disease that largely results in mild clinical manifestations. To confirm this hypothesis, longer-term data are needed.
This study had some limitations. First, the number of patients in the study population was small and only included Korean patients with pSS from a single center. In these patients, systemic involvement was not frequent or severe. However, our results were similar to those of a previous study [6], which suggests that symptom-based clustering performs well in general. Second, symptom-based stratification is not based on variables associated with the pathogenesis itself. However, as previously mentioned, this method may identify the latent class of patients with pSS. Third, predictions of future ESSPRI, SFR, and SSDDI values were not derived from a model that adjusted for potential confounding variables. The ESSPRI results were expected because class membership remained stable over time, and the initial class with low symptom burden had low ESSPRI scores during the follow-up period. The finding of significant differences in SFRs and SSDDIs at the last follow-up according to initial class membership adds value to the clustering method used. Currently, we are developing a regression model to predict class membership in another patient cohort, with the aim of further validating the potential application of this classification method.

Conclusions
Symptom-based clustering of heterogenous pSS patients provided a relevant classification that is supported by temporal stability and clearly distinct phenotypes between classes. This clustering strategy may identify more homogenous subgroups of patients with pSS, to aid in novel treatment development and to predict future phenotypic evolution.