The discovery of biomarkers for AD is an increasingly important task – both for early diagnosis and for use in experimental medicine. However, it is a task complicated by at least three major intrinsic difficulties. First, the complexity of AD pathology means that identification of candidate markers, beyond the low-hanging fruit of Aβ and tau, is problematic. Second, collection of the optimal peripheral fluid for biomarker identification, CSF, is relatively invasive and unsuitable for repeated measures in elderly people. Third, AD has a prolonged prodrome during which apparently normal elderly people harbor a considerable pathological load, meaning that the conventional case–control design is confounded by pathology in clinically unaffected subjects. Previously, we and others have attempted to mitigate the third of these complications by using a biomarker discovery design in which the outcome variable is not clinical diagnosis but an endophenotype of disease, such as structural MRI evidence of atrophy or PET evidence of Aβ load. The second of the limitations in biomarkers for AD – the availability of CSF – has prompted many groups to seek markers in other fluids such as plasma. The first of the limitations, the identification of candidates, has previously been addressed by two broad categories of studies: those using candidates based on the researcher’s own understanding of disease and those using a data-driven, most often proteomic, approach. Here we combine the use of endophenotypes to complement diagnostic category as an outcome measure, and the use of plasma as a biomarker tissue, with an entirely novel approach to the identification of candidates. This innovation makes use of linguistic and textual analysis to interrogate the entire biomedical knowledge base, in the form of all the major publicly available databases, to identify candidate markers using a consensus-driven set of primary assertions.
We accessed the various data sources in 2006, and clearly, in a fast-moving field, data will have changed considerably in the intervening years. Indeed, some of the proteins identified in 2006 (such as Transthyretin and Clusterin) had not at that time been considered as biomarkers. However, by the time the in vitro analysis was carried out, our own proteomics studies had provided data showing that these proteins were in fact putative biomarkers. Thus, this time lag has inadvertently provided further substantiation of the proof of concept of the in silico approach that we discuss here.
The textual analysis of publicly available data sources suggested a total of 25 potential candidate biomarkers. Some of these have previously been identified as potential biomarkers in plasma. For example, using MRI measures of atrophy as an outcome endophenotype, we identified and confirmed plasma Clusterin and Transthyretin as measures of disease severity, and using PET measures of amyloid we identified apoE protein as the primary correlate in plasma. All these studies used gel-based proteomics as the discovery tool, and the fact that textual analysis identified the same proteins before these proteomic studies were performed is a strong indicator of the power of the method. Other promising candidates suggested by textual analysis, for which there are published data suggesting that the proteins are altered in either blood or CSF, include CRP [17–19], Complement factor 1 [20, 21], butyrylcholinesterase and BACE1 [23, 24]. In all but the case of butyrylcholinesterase, these biomarker data were published after the IN lock-down; hence these biomarker utility data are independent of the IN and act as independent proof of concept.
As the IN identified, as potential protein biomarkers, proteins previously identified in proteomic studies – without these data entering into this particular network – we were encouraged to attempt further validation in plasma. We chose two proteins – neither previously identified as potential plasma biomarkers to our knowledge – and measured these in over 200 subjects, most of whom had automated analysis of structural MRI data available as part of the European AddNeuroMed project. One of these proteins – PLAUR – was significantly decreased in AD relative to controls, with MCI being at an intermediate level. Both PLAUR and ChAT showed a correlation, inverse in the case of PLAUR, with imaging evidence of atrophy in control cases, and both showed a smaller, non-significant correlation in the same direction in AD cases. We used semi-quantitative immunoblotting as a screening method, as in previous studies, because this approach, in contrast to ELISA for example, yields information on degradation products and post-translational modifications. In fact, the data on these two chosen proteins suggested that it is the whole protein that correlates with disease state, so that future biomarker replication and qualification studies, beyond the scope of the present investigation, might progress rapidly to fully quantitative methods.
Urokinase plasminogen activator receptor (PLAUR) is a protein involved in many biological functions including cell signaling [25, 26]. By binding urokinase plasminogen activator (uPA), with which it forms an active complex (uPA-PLAUR), it catalyzes the transformation of the zymogen plasminogen into the active protein plasmin, a serine protease which degrades fibrin. The receptor is also involved in chemotaxis and controls cell adhesion. Increased levels of PLAUR have previously been reported in inflammatory disorders, and PLAUR has been implicated in chemotaxis leading to microglial accumulation in the core of amyloid plaques in the brain in transgenic rodent models of AD. Aβ induces PLAUR, and PLAUR is increased in microglial cells of human AD brains and in brains treated with amyloid β peptide [29, 30]. The inverse relationship we observe between soluble PLAUR and both AD and brain atrophy is noteworthy and might suggest an inverse relationship either between soluble PLAUR and functional, membrane-bound PLAUR or between central and peripheral PLAUR more generally. An inverse relationship between central amyloid load and peripheral (CSF) amyloid has previously been extensively noted. The other novel protein association with pathology that we identify, ChAT, is a key component of the cholinergic pathway, which is severely affected in AD and is the target of the first symptomatic therapies for AD. We observe a relationship between cerebral atrophy and ChAT protein, and this may reflect the loss of cholinergic neurons known to occur early in the disease process.
In summary, we show here that the extraction of data from huge volumes of biological datasets, including text-based information, is possible and that hypothesis- or assertion-driven analysis of these data yields potential biomarkers. As some of these markers have been independently generated using proteomics, and as we show here at least partial validation of the two markers tested, this finding offers strong support for a text mining approach to biomarker discovery using the ever-increasing publicly available datasets.