Clinical features in primary care electronic records before diagnosis of ankylosing spondylitis: a nested case-control study
BMC Family Practice volume 21, Article number: 78 (2020)
Ankylosing spondylitis (AS) often has a long period from first symptom presentation to diagnosis. We examined the occurrence of symptoms, prescriptions and diagnostic tests in primary care electronic records over time prior to a diagnosis of AS.
Nested case-control study using anonymised primary care electronic health records from Scotland. Cases were 74 adults with a first diagnosis of AS between 2000 and 2010. Two groups of controls were matched for age, sex and GP practice: (a) 296 randomly selected adults and (b) 169 adults whose records contained codes indicating spinal conditions or symptoms.
We extracted clinical features (symptoms, AS-related disorders, prescriptions and diagnostic tests). Conditional logistic regression was used to examine the association between clinical features (both individually and in combinations) and diagnosis of AS. We examined the associations between clinical features and diagnosis over time prior to diagnosis.
Several new composite pointers were predictive of AS, including distinct episodes of axial pain separated by more than 6 months (OR 12.7, 95% CI 4.7 to 34.6); the occurrence of axial pain and tendon symptoms within the same year (OR 21.7, 95% CI 2.6 to 181.5); and the co-occurrence (within 30 days) of axial pain and a prescription for a nonsteroidal anti-inflammatory drug (OR 10.4, 95% CI 4.9 to 22.1). Coded episodes of axial pain increased steadily over the 3 years before diagnosis. In contrast, large joint symptoms and enthesopathy showed little or no time trend prior to diagnosis.
We identified novel composite pointers to a diagnosis of AS in GP records. These may represent valuable targets for diagnostic support systems.
Ankylosing spondylitis (AS) is an uncommon rheumatological condition in primary care, for which there is often a long time between consultation and diagnosis [1, 2]. As with other conditions in which a long period between first consultation and diagnosis is often seen, symptoms of AS (such as spinal pain, stiffness and fatigue) are both non-specific and frequently occurring [2, 3]. This often leads to primary care doctors assigning the symptoms of AS to more common back pain conditions.
Most research on the clinical features of AS in primary care has focused on the characteristics of back pain. Inflammatory features, such as worsening in the second half of the night, stiffness on waking and relief by exercise [4, 5, 6, 7, 8], have high sensitivity but relatively low specificity for AS. We considered the possibility of using data in electronic records to support earlier diagnosis of AS. While records may include codes for back pain, they do not currently include searchable information about the characteristics of pain. However, other features in records may act as proxy markers for these pain characteristics. For instance, the association in time between back pain and prescription of non-steroidal anti-inflammatory drugs (NSAIDs) may suggest inflammatory back pain (which often responds well to NSAIDs). While such knowledge-derived features are not immediately present in electronic records, they can be constructed [10, 11]. One very recent study has used machine learning based on single items in a clinical care database but still found only low predictive values.
We aimed to (a) construct enriched datasets from electronic health records which contained conventional and composite features potentially predictive of AS; (b) examine the association of these features with a subsequent diagnosis of AS in a nested case-control study; (c) examine the relationship of these features to diagnosis at different time periods before the date of diagnosis.
We analysed data from the Practice Team Information (PTI) database, a subset of the Primary Care Clinical Informatics Unit Research database held by the University of Aberdeen. The PTI database comprises pseudonymised electronic health records which were collected between 1996 and 2010 from approximately 224,000 patients registered with a primary care physician in Scotland. It is broadly representative of the Scottish population with regard to age, sex, deprivation and geographical location (in terms of the ratio of urban to rural practices). Practices which contributed their data to the PTI project were expected to record every clinical encounter using Read codes for clinical diagnoses and/or main reasons for consultation. Diagnostic codes entered during and before the study dates were included in the data. All GP prescriptions were automatically recorded throughout the database period. Procedures for the routine recording and coding of diagnostic investigations changed over the database period as electronic linkages between laboratories and GP practices developed, so investigations were recorded more often in the later years of the database period. The study was approved by the Primary Care Clinical Informatics Unit (PCCIU) team in keeping with PCCIU and local ethical committee procedures.
We conducted a nested case-control study. Cases were patients whose first recorded diagnosis of AS was between 1/1/2000 and 31/12/2010 and who were aged between 18 and 50 years at the time of diagnosis. We excluded patients whose first recorded diagnosis occurred within 1 year of registering with their GP practice because (a) it was possible that this represented the coding of an earlier diagnosis for the purposes of updating record summaries, i.e. a prevalent rather than incident case of AS, and (b) it did not allow a sufficient period of time in which relevant data before diagnosis could be examined for features predictive of AS. We then excluded patients who had been prescribed a disease modifying anti-rheumatic drug (sulphasalazine or methotrexate) more than 1 month prior to their coded diagnosis. We did this in order to ensure that our analysis was limited to incident cases rather than prevalent cases in whom an earlier diagnosis had not been coded at the time it was made.
Population controls who did not have a diagnosis of AS at the index date were identified electronically from the database for each case. Controls were individually matched on age, sex and GP practice. Where more than four matched controls were available for a given case the computer randomly selected four. A second control group comprised patients with codes for other spinal diagnoses including degenerative, mechanical and intervertebral disc disorders or for a symptom of axial pain, but with no recorded diagnosis of AS. These controls were also electronically selected for each case and individually matched by age (within 2 years), sex and GP practice, with up to four symptomatic controls per case. We defined the index date for cases as the date of diagnosis of AS and for controls as the date of diagnosis of AS in the matched case.
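The matching step described above can be illustrated with a minimal Python sketch. It is not the authors' extraction code: the `Patient` record, its field names and the `select_controls` function are hypothetical, and the 2-year age tolerance (stated in the text for symptomatic controls) is assumed here for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    # Hypothetical record layout, for illustration only.
    patient_id: str
    birth_year: int
    sex: str
    practice: str
    has_as: bool  # True if an AS diagnosis is recorded by the index date


def select_controls(case, pool, max_controls=4, age_tolerance=2, rng=None):
    """Return up to max_controls randomly chosen matched controls for one case."""
    if rng is None:
        rng = random.Random()
    eligible = [
        p for p in pool
        if p.patient_id != case.patient_id
        and not p.has_as                                   # no AS diagnosis
        and p.sex == case.sex                              # same sex
        and p.practice == case.practice                    # same GP practice
        and abs(p.birth_year - case.birth_year) <= age_tolerance
    ]
    # When more than max_controls are available, pick at random.
    if len(eligible) <= max_controls:
        return eligible
    return rng.sample(eligible, max_controls)
```

Passing a seeded `random.Random` makes the selection reproducible, which is useful when the same control sets need to be regenerated.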
Data extraction and preparation
For all cases and controls, data were extracted which detailed the dates of consultation for particular symptoms, disorders, tests and procedures and drugs prescribed. Table 1 lists the key data extracted and the categories into which we grouped related items.
As well as examining various individual features e.g. axial pain, we enriched the data by calculating when a number of composite features had occurred e.g. axial pain occurring within 30 days of a prescription for a non-steroidal anti-inflammatory drug (NSAID). We specified composite features according to one of three relationships: proximity (where two features occurred within a given number of days of each other), separate (where two consecutive instances of the same features occurred more than a given number of days apart), and exclusive (where one code occurred and another was not present). We used the separate composite features in order to identify discrete episodes as opposed to a single episode comprising multiple instances of a feature.
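The three relationships can be sketched as simple predicates over lists of event dates. This is an illustrative Python sketch rather than the study's implementation; the function names are hypothetical, and treating "more than 6 months" as more than 183 days is an assumption made only for the example.

```python
from datetime import date


def proximity(dates_a, dates_b, max_days):
    """True if any event in dates_a falls within max_days of any event in dates_b."""
    return any(abs((a - b).days) <= max_days for a in dates_a for b in dates_b)


def separate(dates, min_gap_days):
    """True if two consecutive instances of one feature are more than min_gap_days apart."""
    ordered = sorted(dates)
    return any((later - earlier).days > min_gap_days
               for earlier, later in zip(ordered, ordered[1:]))


def exclusive(dates_a, dates_b):
    """True if feature A is recorded at least once and feature B is never recorded."""
    return bool(dates_a) and not dates_b


# Hypothetical patient: three axial pain codes and one NSAID prescription.
axial_pain = [date(2004, 1, 10), date(2004, 2, 1), date(2005, 3, 15)]
nsaid = [date(2004, 1, 25)]

pain_with_nsaid = proximity(axial_pain, nsaid, max_days=30)
distinct_episodes = separate(axial_pain, min_gap_days=183)  # assumed 6 months
```

Because `separate` compares only consecutive instances, a run of closely spaced codes counts as a single episode, and the predicate is true only when a later code starts a new episode after the gap.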
For each feature (single and composite) we ascertained its presence in the record of each individual at any time in the record, and during a series of overlapping three-year time windows set at different intervals from the index date (for diagnosis or matching). We defined the windows using intervals between the end of the window and the index date of 0, 3, 6, 12, 18, 24 and 36 months. We then examined the appearance of statistical associations between available information in the record and diagnosis over time by comparing the same measure in different windows. The purpose of this was to differentiate between features which were present long before diagnosis (and might thus indicate missed diagnostic opportunities) and those which appeared only shortly before diagnosis (and may thus have triggered referral).
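The window construction can be sketched as follows. This is a minimal Python illustration under stated assumptions: the gap values are those listed in the text, but the function names, the calendar-month arithmetic and the choice of a half-open window (start included, end excluded) are assumptions, not details taken from the study.

```python
import calendar
from datetime import date

GAPS_MONTHS = (0, 3, 6, 12, 18, 24, 36)  # gaps between window end and index date
WINDOW_MONTHS = 36                       # each window spans three years


def months_before(d, months):
    """Return the date `months` calendar months before d, clamping the day of month."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def windows(index_date):
    """Map each gap (in months) to its (start, end) three-year window."""
    result = {}
    for gap in GAPS_MONTHS:
        end = months_before(index_date, gap)
        start = months_before(end, WINDOW_MONTHS)
        result[gap] = (start, end)
    return result


def present_in_window(event_dates, window):
    """True if any event falls in the window (assumed start <= date < end)."""
    start, end = window
    return any(start <= d < end for d in event_dates)
```

For an index date of 15 June 2008, for example, the window with a 12-month gap runs from 15 June 2004 to 15 June 2007, so a feature first coded in 2008 would be absent from that window but present in the window ending at the index date.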
Analysis of association of features and patterns with diagnosis
We carried out conditional logistic regression to examine the association between each feature (conventional or composite) and the diagnosis of AS. Each feature was reported as either present or absent within the time period. Rather than use counts of how often a feature occurred, we used the “separated” composite variables to indicate multiple episodes. Analyses were reported as the odds ratio, OR (with 95% confidence intervals, CI). All analyses were conducted in R 3.6.
We conducted the analysis separately with population and symptomatic control groups. For the time window analysis, we limited the data to patients who had been registered with their practice for at least 1 year before the beginning of the relevant gap prior to diagnosis. We plotted the odds ratios for each feature at each of the six different time gaps in order to visualise the appearance of predictive features over time.
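For a single present/absent feature, the conditional likelihood of a matched set with one case is exp(βx_case) divided by the sum of exp(βx_j) over all members of the set, and the odds ratio is exp(β). The sketch below maximises that likelihood by Newton's method and reports a Wald 95% interval. It is a standard-library Python illustration only, not the authors' R analysis: the function names and input layout are hypothetical, and it does not handle features that perfectly separate cases from controls.

```python
import math


def _score_and_info(beta, strata):
    """Score and information of the conditional log-likelihood at beta."""
    score = 0.0
    info = 0.0
    for case_x, control_xs in strata:
        n = 1 + len(control_xs)        # size of the matched set
        k = case_x + sum(control_xs)   # members with the feature present
        w = k * math.exp(beta)
        p = w / (w + (n - k))          # expected exposure of the case, given the set
        score += case_x - p
        info += p * (1.0 - p)
    return score, info


def conditional_or(strata, max_iter=50, tol=1e-10):
    """Odds ratio and Wald 95% CI for one binary feature in 1:M matched sets.

    strata: list of (case_exposed, control_exposures), with 0/1 values.
    Returns (odds_ratio, ci_low, ci_high).
    """
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_info(beta, strata)
        if info == 0.0:
            # Every set is concordant, so the data carry no information on beta.
            raise ValueError("no informative matched sets")
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    else:
        raise RuntimeError("Newton iteration did not converge")
    _, info = _score_and_info(beta, strata)
    se = 1.0 / math.sqrt(info)
    z = 1.96
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

With 1:1 matching this estimate reduces to the ratio of discordant pairs (case exposed and control unexposed, divided by case unexposed and control exposed), which gives a quick hand check. Matched sets in which every member has the same feature value contribute nothing to the estimate.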
There were 74 newly diagnosed cases who met the study criteria. The annual number of diagnoses was broadly similar between 2000 and 2006 (representing an incidence of approximately 4/100,000 registered patients per year) but fell after 2006 – this coincided with a progressive reduction in the size of the database as GP practice computer systems were replaced.
Cases were matched to 296 population controls and 169 symptomatic controls. 53 cases (72%) were men and median age at diagnosis of AS was 37 years (interquartile range 31 to 43).
54 cases (73%) were registered with the same GP practice (and therefore had continuous records in the PTI database) for at least 6 years before diagnosis. Similar proportions were seen for population and symptomatic controls (81 and 72% respectively). A code for one or more prescriptions of an appropriate treatment (e.g. an NSAID) was present for 68 cases (92%). Diagnostic tests were coded less often: any relevant diagnostic code (such as for a full blood count, ESR or x-ray) was present in only 30 cases (41%).
Occurrence of diagnostic features
The numbers and proportions of patients with at least one instance of each feature, either in the 3 years prior to the index date or at any time are shown in Table 2 (vs. population controls) and Table 3 (vs. symptomatic controls). Tables 2 & 3 also show the ORs (with 95% CIs) for the two comparisons.
As expected, axial pain was more common in cases than in population controls (OR 9.8, 95% CI 5.1 to 18.9) but not more common than in symptomatic controls (OR 1.0, 95% CI 0.6 to 1.8) in the 3-year period before the index date. Tendon related disorders and iritis were both more common in cases than in population controls (OR 3.4, 95% CI 1.3 to 8.7 and OR 32.0, 95% CI 4.0 to 255.9) but were recorded in only 21 and 16% of cases respectively. Urethral symptoms were infrequently recorded in all groups. Fatigue was not significantly more common in cases than in population or symptomatic controls (OR 1.9, 95% CI 0.8 to 4.2 and OR 2.1, 95% CI 0.8 to 5.7 respectively). A history of inflammatory bowel disease was present in 16% of cases at any time before diagnosis. Codes indicating recording of x-rays and MRI scans were rare among cases and controls.
Occurrence of prescribed treatments
In both the population and the symptomatic group comparisons, both analgesics (OR 6.0, 95% CI 3.3 to 10.8 and OR 2.0, 95% CI 1.1 to 3.6) and NSAIDs (OR 12.9, 95% CI 6.3 to 26.8 and OR 3.6, 95% CI 1.8 to 7.1) were more commonly prescribed to cases than to controls in the 3-year period before the index date. Prescriptions of tricyclic antidepressants, typically prescribed for chronic pain, were more common in cases than in population controls, unlike prescriptions for other antidepressants.
Table 4 shows the number and proportion of patients with at least one instance of each of the composite features over the 3 years before the date of diagnosis or matching. Several composite features appeared relatively infrequently. Only three occurred in more than 15% of cases: distinct episodes of axial pain separated by more than 6 months (OR 12.7, 95% CI 4.5 to 34.6); the occurrence of axial pain and tendon symptoms within the same year (OR 21.7, 95% CI 2.6 to 181.5); and the co-occurrence (within 30 days) of axial pain and a prescription for a nonsteroidal anti-inflammatory drug (OR 10.4, 95% CI 4.9 to 22.1).
Occurrence of diagnostic features over the time prior to diagnosis
Figure 1 shows histograms of the number of years between first episode of back pain or NSAID prescription and diagnosis (or matching) for cases and symptomatic controls. The median time between first coded episode of back pain and diagnosis of AS was 4 years (interquartile range 2 to 7). For the same patients the median time between first prescription for an NSAID and diagnosis was 4 years (interquartile range 2 to 6). Figure 2 shows plots of eight diagnostic features, showing the ORs for three-year time windows with different intervals between the end of the three-year window and the diagnosis or matching date. Each plot compares cases with matched population controls (in blue) and matched symptomatic controls (in red). In all plots, 95% confidence intervals are indicated by dotted lines. The comparison with population controls demonstrates the development of features over time. The comparison with symptomatic controls indicates whether features have different predictive value in diagnosing symptomatic patients at different stages.
The plot for axial pain shows that, when compared to population controls, the odds ratio for coded episodes of axial pain rose steadily from the 3-year period ending 3 years before diagnosis to the 3-year period ending at the time of diagnosis. On the other hand, the plots for large joint symptoms and (in the 2 years prior to diagnosis) enthesopathy suggest little or no time trend. The combination of axial pain and large joint symptoms, while relatively infrequent, shows a strong signal beginning at least 2 years before diagnosis.
Summary of main findings
This study demonstrates new and potentially useful composite features within electronic health records, such as two or more distinct episodes of axial pain, which appear to have predictive value and may in turn lead to earlier diagnosis.
Strengths and limitations
Our choice of features as pointers used principles of selection based on expert input and methods of data consolidation and aggregation which have been developed for use with clinical data sources other than GP records [10, 16]. This sequence of steps is broadly comparable with other recent approaches to the summarisation of clinical data [16, 17]. We used an established anonymised GP record set which contained both diagnostic and symptom codes in the Read code format as well as prescribing data, which means that the method is transferable to other datasets of primary care data and potentially into clinical use.
There were limitations relating to the data. The first was the small number of incident cases of AS. This meant that confidence intervals were wide and it is possible that we lacked statistical power to detect some potentially meaningful associations. The data were from stand-alone primary care records with no linkage to secondary care records, so we could not assess the reliability of GPs’ diagnosis of AS. However, in our experience GP practices tend not to code such diagnoses without specialist opinion, and in a recent US study over 80% of a sample of coded diagnoses of AS were confirmed on chart review. It should be noted that the annual incidence (approximately 4 per 100,000 per year in adults aged 18–60 years) is compatible with the lifetime prevalence of approximately 15 per 10,000 observed in other studies [19, 20]. The diagnostic criteria for AS and the wider spondyloarthropathies evolved during the exposure time period, and it is increasingly recognised that disorders in the spondyloarthritis spectrum are much more common than full AS.
The data on symptoms and investigations were sparser than anticipated, with only around half of cases having back pain coded in the 3 years prior to diagnosis. This probably reflects the limited use of symptom codes by GPs, even in this database where a reason for consultation was meant to be given for each attendance. For those cases where a code for axial pain was entered, GPs had not been issuing NSAID prescriptions for long periods before the symptom code appeared. The use of diagnostic tests was under-reported in the database, particularly until around 2006 when the direct importing of laboratory tests into electronic records came into wide use in contributing practices. While tests such as inflammatory markers have low predictive value, the fact that they were being carried out suggests GPs may have had a raised index of suspicion for AS in at least some patients.
Comparison with existing literature
Previous studies of the clinical features of AS have used patient self-report rather than entries coded in GP records [4, 5, 24]. One recent machine learning study used a broadly similar range of single items from a different clinical dataset but did not explicitly code composites which were clinically intuitive. We are not aware of studies which have looked for combinations of features. Our analysis of the emergence of clinical features over time confirms that in some cases there is an observably long time to diagnosis but also shows that the predictive value of clinical features does increase with proximity to diagnosis.
Implications for research and practice
The ultimate purpose of this research is to identify clinically useful predictors of a diagnosis of AS in order to facilitate early diagnosis. None of the features examined here is sufficient on its own, but they merit examination in a larger dataset. However, some features (such as multiple episodes of axial pain) may be useful triggers for clinicians to ask more specific questions about inflammatory back pain. Diagnostic support prompts are more effective at the beginning of a consultation (influencing the clinician’s prior probabilities and triggering specific questions) than at the end, so computation of composite indicators of the kind we found might not need to be carried out in real time, but could run during quiet periods for the database and be used to inform future consultations.
The next step in research is to repeat this work with a larger, more recent and less sparse dataset. This should be able to access more information about diagnostic tests (through automatic transfer of results from laboratories to electronic records in general practice) and linkage to hospital records (to confirm diagnosis). Additionally, machine learning techniques [12, 26, 27] have potential value in feature reduction and model selection. Ultimately the aim must be to apply these observations within predictive models for earlier referral and diagnosis of AS.
We have developed and tested conventional and new composite pointers to a diagnosis of ankylosing spondylitis in GP records. Some of these were present several years before the diagnosis and may be valuable targets for systems to support earlier diagnosis.
Availability of data and materials
The datasets generated during and/or analysed during the current study are not publicly available as they represent anonymised data from clinical records. They are available from the corresponding author on reasonable request and subject to the approval of the Research Applications and Data Management Team at the University of Aberdeen.
ESR: Erythrocyte sedimentation rate
FBC: Full blood count
MRI: Magnetic resonance imaging
NSAID: Non-steroidal anti-inflammatory drug
PCCIU: Primary Care Clinical Informatics Unit
PTI: Practice Team Information
SSRI: Selective serotonin reuptake inhibitors
Dincer U, Cakar E, Kiralp MZ, Dursun H. Diagnosis delay in patients with ankylosing spondylitis: possible reasons and proposals for new diagnostic criteria. Clin Rheumatol. 2008;27(4):457–62.
Sykes MP, Doll H, Sengupta R, Gaffney K. Delay to diagnosis in axial spondyloarthritis: are we improving in the UK? Rheumatology (Oxford). 2015;54(12):2283–4.
National Institute for Health and Care Excellence. Spondyloarthritis in over 16s: diagnosis and management. London: National Institute for Health and Care Excellence (UK); 2017.
Underwood MR, Dawes P. Inflammatory back pain in primary care. Br J Rheumatol. 1995;34(11):1074–7.
Kain T, Zochling J, Taylor A, Manolios N, Smith MD, Reed MD, et al. Evidence-based recommendations for the diagnosis of ankylosing spondylitis: results from the Australian 3E initiative in rheumatology. Med J Aust. 2008;188(4):235–7.
van Hoeven L, Vergouwe Y, de Buck PD, Luime JJ, Hazes JM, Weel AE. External validation of a referral rule for axial spondyloarthritis in primary care patients with chronic low back pain. PLoS One. 2015;10(7):e0131963.
van Hoeven L, Luime J, Han H, Vergouwe Y, Weel A. Identifying axial spondyloarthritis in Dutch primary care patients, ages 20-45 years, with chronic low back pain. Arthritis Care Res. 2014;66(3):446–53.
Poddubnyy D, Vahldiek J, Spiller I, Buss B, Listing J, Rudwaleit M, et al. Evaluation of 2 screening strategies for early identification of patients with axial spondyloarthritis in primary care. J Rheumatol. 2011;38(11):2452–60.
Sleeman D, Moss L, Aiken A, Hughes M, Kinsella J, Sim M. Detecting and resolving inconsistencies between domain experts' different perspectives on (classification) tasks. Artif Intell Med. 2012;55(2):71–86.
Reis BY, Kohane IS, Mandl KD. Longitudinal histories as predictors of future diagnoses of domestic abuse: modelling study. BMJ. 2009;339:b3677.
Burton C, Iversen L, Bhattacharya S, Ayansina D, Saraswat L, Sleeman D. Pointers to earlier diagnosis of endometriosis: a nested case-control study using primary care electronic health records. Br J Gen Pract. 2017;67(665):e816–e23.
Deodhar A, Rozycki M, Garges C, Shukla O, Arndt T, Grabowsky T, et al. Use of machine learning techniques in the development and refinement of a predictive model for early diagnosis of ankylosing spondylitis. Clin Rheumatol. 2020;39(4):975–82.
ISD Scotland. General Practice - Practice Team Information (PTI). http://www.isdscotlandarchive.scot.nhs.uk/isd/3727.html. ISD Scotland; 2011.
R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2018.
Sleeman D, Moss L, Sim M, Kinsella J. Predicting adverse events: detecting myocardial damage in intensive care unit (ICU) patients. Proceedings of the sixth international conference on knowledge capture. New York: ACM Press; 2011.
Feblowitz JC, Wright A, Singh H, Samal L, Sittig DF. Summarization of clinical information: a conceptual model. J Biomed Inform. 2011;44(4):688–99.
Hirsch JS, Tanenbaum JS, Lipsky Gorman S, Liu C, Schmitz E, Hashorva D, et al. HARVEST, a longitudinal patient record summarizer. J Am Med Inform Assoc. 2015;22(2):263–74.
Walsh JA, Pei S, Penmetsa GK, Leng J, Cannon GW, Clegg DO, et al. Cohort identification of axial spondyloarthritis in a large healthcare dataset: current and future methods. BMC Musculoskelet Disord. 2018;19:317.
Wang R, Ward MM. Epidemiology of axial spondyloarthritis: an update. Curr Opin Rheumatol. 2018;30(2):137–43.
Dean LE, Macfarlane GJ, Jones GT. Differences in the prevalence of ankylosing spondylitis in primary and secondary care: only one-third of patients are managed in rheumatology. Rheumatology (Oxford). 2016;55(10):1820–5.
Proft F, Poddubnyy D. Ankylosing spondylitis and axial spondyloarthritis: recent insights and impact of new classification criteria. Ther Adv Musculoskelet Dis. 2018;10(5–6):129–39.
Hamilton L, Macgregor A, Toms A, Warmington V, Pinch E, Gaffney K. The prevalence of axial spondyloarthritis in the UK: a cross-sectional cohort study. BMC Musculoskelet Disord. 2015;16:392.
Turina MC, Yeremenko N, van Gaalen F, van Oosterhout M, Berg IJ, Ramonda R, et al. Serum inflammatory biomarkers fail to identify early axial spondyloarthritis: results from the SpondyloArthritis Caught Early (SPACE) cohort. RMD Open. 2017;3(1):e000319.
Hermann J, Giessauf H, Schaffler G, Ofner P, Graninger W. Early spondyloarthritis: usefulness of clinical screening. Rheumatology (Oxford). 2009;48(7):812–6.
Nurek M, Kostopoulou O, Delaney BC, Esmail A. Reducing diagnostic errors in primary care. A systematic meta-review of computerized diagnostic decision support systems by the LINNEAUS collaboration on patient safety in primary care. Eur J Gen Pract. 2015;21(Suppl):8–13.
Mitchell TM. Machine learning. McGraw Hill series in computer science. 1997:I-XVII,1–414.
Darcy AM, Louie AK, Roberts LW. Machine learning and the profession of medicine. JAMA. 2016;315(6):551–2.
This work was supported by a vacation scholarship bursary from North East Scotland Faculty of the Royal College of General Practitioners (MTB). The funder had no role in the design, conduct or reporting of the study.
Ethics approval and consent to participate
The study involved analysis of anonymised data. The study was approved by the Primary Care Clinical Informatics Unit (PCCIU) team in keeping with PCCIU and local ethical committee procedures (Study Reference 201501A).
Consent for publication
Corresponding author Christopher Burton is a Section Editor of BMC Family Practice. The other two authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Bashir, M.T., Iversen, L. & Burton, C. Clinical features in primary care electronic records before diagnosis of ankylosing spondylitis: a nested case-control study. BMC Fam Pract 21, 78 (2020). https://doi.org/10.1186/s12875-020-01149-2
- Ankylosing spondylitis
- Primary care
- Electronic health records