Study settings and participants
This cohort study included community-dwelling older adults enrolled from a sub-cohort of the National Center for Geriatrics and Gerontology-Study of Geriatric Syndromes (NCGG-SGS). NCGG-SGS is a Japanese national cohort study whose primary goal was to establish a screening system for geriatric syndromes and validate evidence-based preventive interventions [14]. Individuals aged ≥ 65 years living in Obu City who were not hospitalized, not in residential care, not certified by the national long-term care insurance (LTCI) system as having a functional disability, and not participating in another study (n = 14,313) were sent an invitation letter to participate in a baseline assessment. We assessed 5104 individuals at baseline (August 2011 to February 2012). We then excluded individuals with (i) a history of dementia (n = 140); (ii) suspected dementia at baseline, based on a Mini-Mental State Examination (MMSE) score < 21 [15] (n = 146); (iii) Parkinson’s disease (n = 21); (iv) depression (n = 130); (v) dependence on others for basic activities of daily living, such as eating, bathing, grooming, walking, and stair-climbing (n = 20); (vi) a functional disability based on the LTCI system (n = 74); or (vii) missing data for these criteria or questionnaires at the baseline assessment (n = 95). After these exclusions, 4478 cognitively intact participants remained. Participants whose public health insurance affiliation under the Japanese National Health Insurance or Later-Stage Medical Care System could not be confirmed during the follow-up period (n = 180) were also excluded from the analysis. Therefore, 4298 participants were included in the final longitudinal analysis.
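The participant flow can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in this section and assumes the exclusion categories are mutually exclusive (each excluded individual is counted once); the function and variable names are illustrative and not part of the study.

```python
# Counts as reported in the text; names are illustrative only.
assessed = 5104
exclusions = {
    "history of dementia": 140,
    "MMSE score < 21": 146,
    "Parkinson's disease": 21,
    "depression": 130,
    "dependent in basic activities of daily living": 20,
    "LTCI functional disability": 74,
    "missing baseline data": 95,
}
insurance_unconfirmed = 180


def participant_flow(n_assessed, excluded_counts, n_unconfirmed):
    """Return (cognitively intact at baseline, included in final analysis).

    Assumes each excluded person appears in exactly one category.
    """
    intact = n_assessed - sum(excluded_counts.values())
    return intact, intact - n_unconfirmed


intact, analysed = participant_flow(assessed, exclusions, insurance_unconfirmed)
print(intact, analysed)  # 4478 4298
```

Both values agree with the numbers stated in the text.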
All baseline assessments were performed as health check-ups by well-trained nurses and study assistants in community centers. Before the study began, all staff received training from the authors on the protocols for administering the assessments. During the follow-up period, we used medical records from the Japanese public health insurance system to identify incident dementia. These records were obtained from the local government on a monthly basis for 24 months.
Ethics approval
The study protocol was developed in accordance with the Declaration of Helsinki and was approved by the ethics committee of the National Center for Geriatrics and Gerontology. Prior to participation in the study, written informed consent was obtained from all participants.
Development of STAD and screening of dementia risk
STAD was developed based on a combination of literature review, statistical analysis, and expert opinion. First, based on a literature review, we created a dementia risk questionnaire consisting of 30 “yes”/“no” questions covering subjective memory complaints, depressive symptoms, functions in daily living, and lifestyle activities. We administered the questionnaire to participants at baseline and followed them for 24 months to detect dementia incidence. Second, we examined the relationship between each questionnaire item and dementia incidence using Cox proportional hazards regression analysis, and 23 significant predictors (after adjusting for age and sex) were identified as candidate items for STAD (Additional file 1: Table S1). Third, a panel of five experts (geriatrics and health science specialists) examined the content validity of the candidate items using the content validity index (CVI) [16]. Content validity was assessed in terms of clarity, concreteness, essentiality, and importance for the prediction of dementia risk using a 4-point Likert scale (e.g., 1, not clear; 2, not very clear; 3, somewhat clear; and 4, very clear). The CVI of each item was calculated as the number of experts giving a rating of 3 or 4 divided by the total number of experts [16]. The CVI ranges from 0 to 1 and was interpreted as follows: > 0.79, the item was relevant; 0.70–0.79, the item needed revision; and < 0.70, the item was eliminated [16] (Additional file 1: Table S2). Finally, 11 items were eliminated and the remaining 12 were included in the STAD: (1) Do you forget where you have left things more than you used to? [5], (2) Do other people find you forgetful? [5], (3) Do you find yourself not knowing today’s date? [17], (4) Have you dropped many of your activities and interests? [18], (5) Do you often get bored? [18], (6) Do you feel helpless? [18], (7) Do you prefer to stay at home, rather than going out and doing new things? [18], (8) In the last 2 weeks have you felt tired without a reason? [17], (9) Do you go out less frequently compared to last year? [17], (10) Do you engage in low levels of physical exercise aimed at health at least five times a week? [19], (11) Do you use maps to go to unfamiliar places? [20], and (12) Do you engage in cognitive stimulation, such as board games and learning? [20]. Participants were assessed for the risk of dementia using the STAD. The total score (0–12) was calculated by counting the items that indicated risk at baseline.
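The item-level CVI and the total score described above can be sketched as follows. This is a minimal illustration: the function names and example ratings are hypothetical, and the scoring helper takes one risk flag per item because the text does not state which response ("yes" or "no") counts as a risk for each question.

```python
def item_cvi(ratings):
    """Item-level CVI: share of experts rating the item 3 or 4 (4-point scale)."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    return sum(1 for r in ratings if r >= 3) / len(ratings)


def cvi_decision(cvi):
    """Apply the interpretation bands given in the text."""
    if cvi > 0.79:
        return "relevant"
    if cvi >= 0.70:
        return "needs revision"
    return "eliminated"


def stad_total(risk_flags):
    """Total STAD score (0-12): number of items flagged as indicating risk."""
    if len(risk_flags) != 12:
        raise ValueError("STAD has exactly 12 items")
    return sum(1 for flag in risk_flags if flag)


# Hypothetical ratings from a five-expert panel for two candidate items.
print(cvi_decision(item_cvi([4, 4, 3, 2, 3])))  # relevant
print(cvi_decision(item_cvi([4, 3, 2, 2, 3])))  # eliminated
```

With five raters, the item-level CVI can only take the values 0, 0.2, 0.4, 0.6, 0.8, and 1.0, so in this sketch an item needs at least four ratings of 3 or 4 to be retained, and the 0.70–0.79 band cannot occur.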
Observation of dementia incidence
In Japan, all individuals aged ≥ 65 years are covered by one of the following types of public health insurance: “Employees’ Health Insurance,” “Japanese National Health Insurance,” or “Later-Stage Medical Care System” [21, 22]. Individuals aged 65–74 years are enrolled in either the Employees’ Health Insurance (health insurance for employed individuals aged < 75 years) or the Japanese National Health Insurance (national health insurance for unemployed and self-employed individuals aged < 75 years). When they reach 75 years, they are automatically switched to the Later-Stage Medical Care System (health care for individuals aged ≥ 75 years). Every month, we checked the Japanese National Health Insurance and Later-Stage Medical Care System records for newly reported cases of dementia and the date of diagnosis. We defined “the incidence of dementia” as a new diagnosis of dementia recorded during the 24-month follow-up period in a participant without a diagnosis at baseline. The diagnosis of dementia was made by medical doctors in medical facilities according to the International Classification of Diseases-10 [23].
Assessments of potential confounding factors
As covariates, age, sex, educational attainment, and comorbidities (hypertension, hyperlipidemia, diabetes mellitus, and heart disease) were assessed through face-to-face interviews at baseline. We also included drinking and smoking habits (current vs. former/never), slow gait speed, physical inactivity, living arrangements (living alone or cohabiting), and global cognitive function at baseline as covariates. Gait speed was measured over a 2.4-m distance, and slow gait speed was defined as a mean speed across five trials of < 1.0 m/s [24]. Physical inactivity was evaluated by asking the following: (1) “Do you engage in more than moderate levels of physical exercise or sports aimed at health?” and (2) “Do you engage in low levels of physical exercise aimed at health?” Participants who responded “no” to both questions were classified as inactive [14]. Global cognitive function was assessed using the MMSE; MMSE scores range from 0 to 30, with higher scores indicating better cognitive performance [25].
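The two derived covariates, slow gait speed and physical inactivity, follow simple rules that can be written out explicitly. The sketch below is illustrative only; the function names and example values are assumptions, and trial speeds are taken to be already converted to m/s.

```python
from statistics import mean

SLOW_GAIT_THRESHOLD_M_PER_S = 1.0  # threshold stated in the text


def is_slow_gait(trial_speeds_m_per_s):
    """Slow gait: mean speed across the trials is below 1.0 m/s."""
    return mean(trial_speeds_m_per_s) < SLOW_GAIT_THRESHOLD_M_PER_S


def is_physically_inactive(does_moderate_exercise, does_low_exercise):
    """Inactive: answered 'no' to both exercise questions."""
    return not does_moderate_exercise and not does_low_exercise


# Hypothetical participant: five gait trials and two questionnaire answers.
print(is_slow_gait([0.9, 1.0, 0.95, 1.05, 0.9]))  # True
print(is_physically_inactive(False, True))        # False
```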
Statistical analyses
We compared baseline characteristics between participants with and without incident dementia using Student’s t-test for continuous variables and the χ2 test for categorical variables. We then randomly divided the dataset into training and test datasets in a 6:4 ratio for model construction and validation.
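A random 6:4 split of this kind can be sketched as below. The analyses in this study were run in SPSS, so this is only an illustration of the idea; the function name, the fixed seed, and the rounding rule for the training set size are assumptions.

```python
import random


def split_train_test(participant_ids, train_fraction=0.6, seed=0):
    """Shuffle the IDs reproducibly and split them into training and test sets."""
    shuffled = list(participant_ids)
    random.Random(seed).shuffle(shuffled)  # local RNG; global state is untouched
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


train_ids, test_ids = split_train_test(range(4298))
print(len(train_ids), len(test_ids))  # 2579 1719
```

The sizes printed here follow from the rounding rule in this sketch and are not taken from the study.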
First, in the training dataset, the optimal cut-off points of the STAD score that best discriminated between participants who did and did not develop dementia were identified using the Youden Index [26]. In the test dataset, the cumulative dementia-free survival rate during the 24-month follow-up was calculated for the groups defined by the cut-off points using Kaplan–Meier curves, and intergroup differences were tested using a log-rank test. Additionally, a Cox proportional hazards regression analysis was conducted to examine the predictive validity of the STAD cut-off points for dementia incidence. Hazard ratios (HRs) for the risk of dementia were calculated with 95% confidence intervals (CIs).
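The Youden Index is J = sensitivity + specificity − 1, and the optimal cut-off is the score that maximizes J. The sketch below illustrates this for a single cut-off on a binary outcome, ignoring time to event; the function name, the convention that scores at or above the cut-off are high risk, and the toy data are assumptions.

```python
def youden_cutoff(scores, events):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    A participant is classified as high risk when score >= cutoff.
    `events` holds 1 for incident dementia and 0 otherwise.
    """
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, e in zip(scores, events) if e and s >= cutoff)
        tn = sum(1 for s, e in zip(scores, events) if not e and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1
        if best is None or j > best[1]:  # ties keep the lowest cutoff
            best = (cutoff, j)
    return best


# Toy data: eight hypothetical participants.
toy_scores = [1, 2, 2, 3, 5, 6, 7, 8]
toy_events = [0, 0, 0, 0, 1, 0, 1, 1]
print(youden_cutoff(toy_scores, toy_events)[0])  # 5
```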
Second, we used a decision-tree model to enhance the predictive ability of STAD for dementia risk screening. We performed a decision-tree analysis using the CART algorithm to identify the optimal and minimum combination of STAD items for predicting the risk of dementia in the training dataset. The CART algorithm is based on recursive partitioning, and its aim is to develop prediction rules by constructing binary trees. For this analysis, the Gini index [27], which quantifies the impurity of a sample set, was used as the splitting criterion, and the maximum tree depth was set to 3. Additionally, the synthetic minority oversampling technique (SMOTE) [28] was applied to address the class imbalance in dementia status (the incidence of dementia was only 2.2%), because some supervised learning algorithms perform worse on imbalanced datasets.
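To make the splitting criterion concrete, the sketch below computes the Gini impurity of a binary outcome and selects the yes/no item whose split gives the lowest weighted impurity, which is the choice CART makes at each node. It covers one split only; the depth limit and SMOTE are not shown, and the function names, item names, and toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a binary (0/1) label list; 0.0 means a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def split_gini(item_answers, labels):
    """Weighted Gini impurity of the two child nodes after a yes/no split."""
    yes = [y for a, y in zip(item_answers, labels) if a]
    no = [y for a, y in zip(item_answers, labels) if not a]
    n = len(labels)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)


def best_item(items, labels):
    """Name of the item whose split yields the lowest weighted impurity."""
    return min(items, key=lambda name: split_gini(items[name], labels))


# Toy data: six hypothetical participants, two hypothetical items.
toy_labels = [1, 1, 0, 0, 0, 0]
toy_items = {
    "item_a": [1, 1, 0, 0, 0, 0],  # separates the classes perfectly
    "item_b": [1, 0, 1, 0, 1, 0],  # uninformative
}
print(best_item(toy_items, toy_labels))  # item_a
```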
Third, a logistic regression model using the cut-off points of the STAD score was created as a benchmark against which to evaluate the decision-tree model. In the test dataset, we evaluated the performance of the decision-tree and logistic regression models using the area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, and specificity. All analyses were performed using IBM SPSS Statistics 25 and IBM SPSS Modeler 18 (IBM Japan, Tokyo, Japan). The level of statistical significance was set at P < 0.05.
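The performance measures named above can be written out from their definitions. The sketch below is not the SPSS procedure used in the study; it computes accuracy, sensitivity, and specificity from predicted classes, and the AUC as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case (ties count as one half). Function names and toy data are hypothetical.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary (0/1) labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """AUC via pairwise comparison of case and non-case scores."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: five hypothetical test-set participants.
toy_true = [1, 1, 0, 0, 0]
toy_pred = [1, 0, 0, 0, 1]
toy_scores = [0.9, 0.4, 0.3, 0.2, 0.6]
print(confusion_metrics(toy_true, toy_pred))
print(auc(toy_true, toy_scores))
```

The pairwise AUC is quadratic in the sample size, which is acceptable for illustration but slower than rank-based implementations on large datasets.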