Main findings
GPPAQ is acceptable to older primary care patients and has reasonable reliability, particularly when repeated in the same season, with 67 % agreement in findings at 12 months. However, it has poor validity in this age group for identifying PA levels accurately, with a sensitivity of only 19 % compared to objective accelerometry PA assessment. Including brisk walking in the scoring (GPPAQ-Walk) more than doubled the sensitivity, but at the cost of a marked reduction in specificity, suggesting that modifying GPPAQ to include walking did not improve its overall performance as a screening test in this age group.
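To illustrate how sensitivity and specificity are obtained when a questionnaire classification is compared with a criterion measure such as accelerometry, a minimal Python sketch follows. The function name and the counts in the usage example are hypothetical and are not data from this study; which classification counts as "positive" is also an assumption left to the analyst.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from 2x2 table counts.

    tp: criterion positive, questionnaire positive
    fn: criterion positive, questionnaire negative
    tn: criterion negative, questionnaire negative
    fp: criterion negative, questionnaire positive
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one criterion-positive and one criterion-negative case")
    sensitivity = tp / (tp + fn)  # share of criterion positives the questionnaire detects
    specificity = tn / (tn + fp)  # share of criterion negatives the questionnaire rules out
    return sensitivity, specificity


# Hypothetical counts, for illustration only (not study data).
sens, spec = sensitivity_specificity(tp=20, fn=80, tn=150, fp=50)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A change to the scoring rule that moves people from the questionnaire-negative to the questionnaire-positive column can raise sensitivity while lowering specificity, which is the trade-off described above for GPPAQ-Walk.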
Strengths and limitations
Strengths
This study has a large number of participants and uses an objective PA measure, accelerometry, as the criterion against which GPPAQ was compared. Accelerometry was administered over the 7-day period preceding the completion of GPPAQ, so the PA measurements correspond to the time period covered by the questionnaire. Average wear-time was high (over 13 h per day, which covers approximately 8 am to 9 pm), indicating that we captured a high proportion of the physical activity achieved by this population. We have repeated measures of GPPAQ at baseline, 3 months and 12 months and also from computerised primary care records, allowing comparisons over a short time period (but different season) and a longer time period (but same season), as well as with data collected in different ways by different individuals. We have data on occupational status/retirement, allowing us to examine the effect of this on GPPAQ validity. We have examined the effect on validity of adding brisk or fast walking to the GPPAQ score, as this is the primary PA in older people. The participants are primary care patients, the target group for GPPAQ; approximately half are male, and the proportion who are overweight or obese is comparable with that in a recent population-based survey of this age group (68 % in PACE-Lift, 73 % in Health Survey England [21]).
Limitations
The Actigraph GT3X+ accelerometer does not register complex movements well; consequently power sports (such as weight lifting), swimming and cycling are not accurately registered [22]. However, it monitors walking, the main PA in older adults, accurately, is able to identify whether PA is occurring in ≥10 min bouts, as stipulated by guidelines, and is widely used in PA research [23]. There are small amounts of missing GPPAQ data (9/298 (3 %) at baseline, 31/298 (10 %) at three months and 16/298 (5 %) at twelve months). However, this is unlikely to bias the analyses, which are all within-person comparisons. The pre-specified cut-off on time spent walking used to define GPPAQ-Walk (≥3 h weekly) was not exactly the same as that used in guidelines (2.5 h weekly), but was the nearest provided by the questionnaire. Using a slightly higher, more difficult to achieve cut-off (≥3 h rather than ≥2.5 h) means that we may have underestimated the sensitivity and overestimated the specificity. However, GPPAQ-Walk already demonstrated greater sensitivity but poorer specificity than GPPAQ, so the direction of any effect of using the lower cut-off of ≥2.5 h would be unchanged from what we have described. The PACE-Lift trial only includes 60–74 year olds, and we are therefore not able to comment on GPPAQ’s reliability and validity in 40–59 year olds, where its use is also promoted in NHS health checks [6].
Comparison with existing literature
Other studies have used correlation coefficients for questionnaire validation [8]. However, this relies on arbitrary weighting decisions. We chose to present the raw data and calculate percentage agreement for reliability in addition to kappa statistics, and to present sensitivity and specificity for validity (as has also been presented by others [24]). The limitations of using correlation coefficients in PA questionnaire assessment have been discussed in other papers where Bland-Altman plots have been chosen instead [25]. As described previously, there are no published studies with data on GPPAQ’s reliability and validity, and this is the first study to assess GPPAQ against objective accelerometer assessment in a large sample of older adults. GPPAQ is derived from the EPIC PA questionnaire (EPIC-PAQ), which has been validated (good reliability, weighted kappa = 0.6, p < 0.0001; positive associations with objective measures of energy expenditure, p = 0.003) [8]. Though GPPAQ was derived from EPIC-PAQ, it is difficult to use data on the validation of EPIC-PAQ to throw light on our findings, as that validation was primarily focussed on demonstrating correlations between accelerometry and the questionnaire measures, and not on the sensitivity and specificity for identifying those requiring intervention to increase PA [26] [9]. EPIC-PAQ relies on recall of PA over the last year whereas GPPAQ relies on recall over the preceding week, and EPIC-PAQ was validated in 50–65 year olds, whilst GPPAQ is aimed at 16–74 year olds [5] and used in NHS health checks for 40–74 year olds [6].
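For readers unfamiliar with the reliability statistics mentioned above, the following minimal Python sketch computes percentage agreement and an unweighted Cohen's kappa from two sets of paired categorical ratings. It is an illustration under stated assumptions: the function names, category labels and example ratings are hypothetical, and the unweighted form of kappa is an assumption rather than a statement of the exact statistic used in any of the studies cited.

```python
from collections import Counter


def percent_agreement(first, second):
    """Proportion of paired ratings that are identical."""
    if len(first) != len(second) or not first:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(a == b for a, b in zip(first, second)) / len(first)


def cohens_kappa(first, second):
    """Unweighted Cohen's kappa: agreement corrected for chance."""
    n = len(first)
    observed = percent_agreement(first, second)
    counts_first, counts_second = Counter(first), Counter(second)
    # Chance agreement: product of the marginal proportions, summed over categories.
    expected = sum(counts_first[c] * counts_second[c] for c in counts_first) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical repeat classifications of four people (not study data).
baseline = ["active", "active", "inactive", "inactive"]
repeat = ["active", "inactive", "inactive", "inactive"]
print(percent_agreement(baseline, repeat), cohens_kappa(baseline, repeat))
```

Percentage agreement is easy to interpret from the raw data, whereas kappa discounts the agreement expected by chance given how often each category is used.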
A feasibility study of GPPAQ use in patients aged 35–74 years across four Northern Ireland general practices examined 192 questionnaires and found that GPs and nurses reported it was an easy tool with which to assess PA levels, although integration within routine practice was limited by time constraints and complex consultations [27]. An important limitation of that study was the 8 % GPPAQ completion rate [27].
In terms of other short PA self-report measures which could be appropriate for primary care, a systematic review of studies validating the short form of the International Physical Activity Questionnaire (IPAQ) found that it significantly overestimated PA when compared to objective measurement [28]. A recent review of the reliability and validity of 34 new PA questionnaires assessed their performance across age groups. In the elderly, it found that although reliability was reasonable (median correlation coefficient 0.60-0.65 in existing questionnaires and 0.70 in the newer questionnaires), validity was ‘poor to acceptable’ (0.35-0.40 in existing and 0.41 in new questionnaires). It also identified sedentary behaviour as a difficult domain to assess using questionnaires, with poor correlation with objective measures [29]. The Exercise Vital Sign, a 2-item global PA questionnaire, has been validated against national self-report survey data in a large study of almost 2 million patients in the US [30], but to date it has not been validated against an objective PA measure or used in UK primary care.
Implications for research and practice
The health benefits of PA, specifically in older adults, have been well documented, and an accurate, validated tool that would identify which older patients would benefit most from a PA intervention would be of great benefit in general practice. In response to GPs’ concerns over GPPAQ, NICE recommended further studies. This study suggests that whilst GPPAQ has reasonable reliability, it is not a valid tool for assessing PA levels in older adults. Our findings therefore support the retraction of GPPAQ from the hypertension QOF [13] and question its continued use in NHS health checks [6] for this age group. Currently, good cheap pedometers are available which, when worn on the hip, measure step-counts, but they do not measure physical activity intensity. Accelerometers worn on the hip, such as those used in our study, provide measures of both steps and intensity and thus of MVPA. They are, however, relatively expensive and a little uncomfortable to wear. Wrist-worn devices offer improved wear acceptability and potentially 24 h wear-time, and can be waterproofed. However, hip-worn accelerometers generally perform better than wrist-worn accelerometers for measuring energy expenditure. Moreover, accurately identifying sedentary behaviour from a lack of wrist motion presents significant challenges [31]. Nevertheless, rapid technological advances in PA measurement, including the use of smartphone applications and cheap accelerometers, are likely to provide more robust measures of PA in primary care than relying on short but invalid questionnaires.