Journals Library

Validation studies within QResearch and CPRD

Project title

Validation studies within QResearch and CPRD

Project reference

93

Final report date

30 March 2015

Project start date

01 October 2012

Project end date

28 February 2015

Project duration

3 years

Project keywords

Risk Prediction; Validation; Epidemiology; QRISK2; QDiabetes; QStroke; QResearch; CPRD

Lead investigator(s)

  • Professor Julia Hippisley-Cox, School of Medicine, University of Nottingham

NIHR School Collaborators

  • Professor Carol Coupland, School of Medicine, University of Nottingham
  • Dr Peter Brindle, School of Social and Community Medicine, University of Bristol

Project objectives

Our objective is to validate the performance of a series of existing risk prediction scores, developed on the QResearch database, using a sample of data derived from the GPRD.

Changes to project objectives

There were no changes to the methodology or scope of the project. The original funding of £210K came from the NSPCR budget to support new members of the School joining in 2009. We were required to give part of this funding (£50K) to an existing member (Oxford University). Initial plans to collaborate with Oxford became impractical because of protracted and complex legal negotiations, so we agreed to report separately on our share of the budget (£160K).

Brief summary

Objectives

To validate the performance of a set of risk prediction algorithms developed using the QResearch database in an independent sample of general practices contributing to the Clinical Practice Research Datalink (CPRD).

Setting

Prospective open cohort study using practices contributing to the CPRD database and practices contributing to the QResearch database.

Participants

The CPRD validation cohort consisted of 3.3 million patients aged 25-99 years registered at 357 general practices between 01 Jan 1998 and 31 July 2012. The validation statistics for QResearch were taken from the original published papers, which used a one-third sample of practices separate from those used to derive each score. A QResearch cohort, used to compare incidence rates and baseline characteristics, consisted of 6.8 million patients from 753 practices registered between 01 Jan 1998 and 31 July 2013.
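
The QResearch validation statistics come from a hold-out design in which whole practices, rather than individual patients, were divided between derivation and validation. The report describes the allocation as random but not how it was drawn, so the Python below is only a minimal sketch, assuming a simple random two-to-one split of practice identifiers. The function names, the seed and the example practice IDs are hypothetical.

```python
import random


def split_practices(practice_ids, derivation_fraction=2 / 3, seed=2012):
    """Randomly allocate whole practices (not patients) to derivation or validation.

    Hypothetical helper: keeping every patient from one practice on the same
    side means the validation set is a separate group of practices.
    """
    ids = sorted(set(practice_ids))  # de-duplicate and fix the order so the seed is reproducible
    rng = random.Random(seed)  # local generator; does not touch the global random state
    rng.shuffle(ids)
    n_derivation = round(len(ids) * derivation_fraction)
    return set(ids[:n_derivation]), set(ids[n_derivation:])


def assign_patients(patients, derivation_practices):
    """Map each patient ID to 'derivation' or 'validation' via its practice.

    `patients` is an iterable of (patient_id, practice_id) pairs.
    """
    return {
        patient_id: "derivation" if practice_id in derivation_practices else "validation"
        for patient_id, practice_id in patients
    }


if __name__ == "__main__":
    practices = [f"P{i:03d}" for i in range(9)]  # hypothetical practice IDs
    derivation, validation = split_practices(practices)
    print(len(derivation), len(validation))  # 6 3
```

Because allocation happens at practice level, two patients registered at the same practice always land in the same set, which is what makes the validation practices separate from the derivation practices.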

Outcome measures

Incident events relating to seven different risk prediction scores: QRISK2 (cardiovascular disease); QStroke (ischaemic stroke); QDiabetes (type 2 diabetes); QFracture (osteoporotic fracture and hip fracture); QKidney (moderate and severe kidney failure); QThrombosis (venous thromboembolism); QBleed (intracranial bleed and upper gastrointestinal haemorrhage). Measures of discrimination and calibration were calculated.
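
As a rough illustration of the two kinds of measure, the Python sketch below computes a ROC statistic (the probability that a randomly chosen patient who has the event was given a higher predicted risk than one who did not) and a calibration table comparing mean predicted risk with the observed event proportion in groups of predicted risk. It assumes a binary outcome observed for every patient over a fixed period; the study outcomes are time-to-event data with censored follow-up, which a real analysis would need to handle. The function names and the default of ten groups are assumptions for illustration.

```python
from statistics import mean


def roc_auc(predicted, observed):
    """Probability that a random case has a higher predicted risk than a random non-case.

    Ties count as half. Pairwise O(n_cases * n_noncases), fine for a small illustration.
    """
    cases = [p for p, y in zip(predicted, observed) if y == 1]
    noncases = [p for p, y in zip(predicted, observed) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for pc in cases:
        for pn in noncases:
            if pc > pn:
                score += 1.0
            elif pc == pn:
                score += 0.5
    return score / (len(cases) * len(noncases))


def calibration_by_group(predicted, observed, n_groups=10):
    """Return (mean predicted risk, observed event proportion) for each group.

    Patients are sorted by predicted risk and cut into roughly equal-sized groups.
    """
    pairs = sorted(zip(predicted, observed))
    size = len(pairs)
    table = []
    for g in range(n_groups):
        group = pairs[g * size // n_groups:(g + 1) * size // n_groups]
        if group:  # skip empty groups when there are fewer patients than groups
            table.append((mean(p for p, _ in group), mean(y for _, y in group)))
    return table
```

In a well-calibrated score the two values in each pair are close in every group; good discrimination on its own does not guarantee this, which is why both kinds of measure are reported.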

Results

Overall, the baseline characteristics of the CPRD and QResearch cohorts were similar, although QResearch had higher levels of recording for ethnicity and family history. The validation statistics for each risk prediction score were very similar in the CPRD cohort to the published results from the QResearch validation cohorts. For example, in women the QDiabetes algorithm explained 50% of the variation in CPRD compared with 51% in QResearch, and the ROC statistic was 0.85 in both databases. The scores were well calibrated in CPRD.
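
The report does not state how the percentage of variation explained was calculated. One established measure for survival models is Royston and Sauerbrei's R²_D, derived from the D statistic (a measure of prognostic separation) as R²_D = (D²/κ²) / (π²/6 + D²/κ²), where κ = √(8/π) and π²/6 is the error variance under a proportional hazards model. Assuming that measure, the Python below converts between D and R²_D; the function names are hypothetical.

```python
import math

KAPPA_SQ = 8 / math.pi  # kappa^2, where kappa = sqrt(8 / pi)
SIGMA_SQ = math.pi ** 2 / 6  # error variance assumed under a proportional hazards model


def r2_from_d(d):
    """Explained variation R^2_D implied by a D statistic (proportional hazards assumption)."""
    scaled = d ** 2 / KAPPA_SQ
    return scaled / (SIGMA_SQ + scaled)


def d_from_r2(r2):
    """Inverse of r2_from_d: the D statistic corresponding to a given R^2_D."""
    if not 0 <= r2 < 1:
        raise ValueError("r2 must be in [0, 1)")
    return math.sqrt(KAPPA_SQ * SIGMA_SQ * r2 / (1 - r2))
```

Under this assumption, 50% of variation explained corresponds to a D statistic of about 2.05, since d_from_r2(0.5) equals √(4π/3).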

Conclusion

Each of the algorithms performed practically as well in the external, independent CPRD validation cohorts as it had in the original published QResearch validation cohorts. This is the first external validation of a set of QPrediction scores on the CPRD, and it is important because CPRD represents a fully independent sample of patients registered with general practices using a different clinical computer system from the one used to derive the algorithms.

The discrimination and calibration statistics for each score were very similar in CPRD to those published for the QResearch validation cohorts, which supports the potential utility of the scores in the general population of patients in primary care.

A strength of using CPRD for risk score validation is that each score can be assessed on data collected in a similar manner to the data it would draw on in clinical practice.

The difficulty of obtaining a comprehensive code list for any given outcome or exposure is a limitation common to all research in primary care databases. We mitigated this by matching our code lists for the CPRD primary analysis to the code lists in the QResearch derivation data set wherever possible.

Further research is needed to evaluate the clinical outcomes and cost-effectiveness of using these algorithms in primary care.

Plain English summary

Over the last seven years, we have developed a series of risk prediction algorithms using the QResearch database. QResearch is a large research database containing pseudonymised, individual-level data from over 700 general practices using the EMIS clinical system. It consists of data collected in primary care (coded information on socio-demographic characteristics, diagnoses, symptoms, smoking and alcohol, clinical measurements, laboratory values, prescriptions and referrals), linked at individual patient level to cause of death, hospital episode and cancer registration data.

The algorithms predict outcomes such as cardiovascular disease (www.qrisk.org), stroke (www.qstroke.org), type 2 diabetes (www.qdiabetes.org), osteoporotic fracture (www.qfracture.org), moderate or severe kidney disease (www.qkidney.org), venous thromboembolism (www.qthrombosis.org) and emergency hospital admission (www.qadmissions.org). In general, the “QPrediction” algorithms are designed to systematically identify patients in primary care who are at high risk of a serious clinical outcome and for whom further intervention to lower that risk might be possible. They are also designed to quantify the absolute risk of serious outcomes in a way that patients can understand and that might help guide lifestyle and management decisions. A number of these algorithms are now integrated into GP clinical computer systems, included in national guidelines and in daily use across the NHS.

The algorithms were originally developed using a random two-thirds sample of practices contributing to the QResearch database and validated on the remaining third. Although this provides a separate population of patients and practices for validation, all the practices use the same clinical computer system (EMIS), which is in use in 53% of UK practices. A more stringent test of performance is to validate the algorithms on a fully external database derived from practices using a different, but commonly used, primary care computer system. This would help determine whether the predictions from the algorithms are likely to generalise to the whole population in England. Although some of the algorithms have been validated by an independent team using the THIN primary care database, there are currently no published validations of the algorithms using a primary care database that is routinely linked to mortality data in the same way as QResearch.

We therefore decided to validate the various QPrediction scores using another database, the Clinical Practice Research Datalink (CPRD). Its predecessor, the General Practice Research Database (GPRD), was originally set up in 1988 and is similar in nature to QResearch, although it is derived from practices using a different clinical computer system. It was extended to include linked mortality data and data from secondary care, and was renamed the Clinical Practice Research Datalink (CPRD) in 2012.

Each of the algorithms performed practically as well in the external, independent CPRD validation cohorts as it had in the original published QResearch validation cohorts. This is the first external validation of a set of QPrediction scores on the CPRD, and it is important because CPRD represents a fully independent sample of patients registered with general practices using a different clinical computer system from the one used to derive the algorithms.

Dissemination

  • Hippisley-Cox J, Coupland C, Brindle P. The performance of seven QPrediction risk scores in an independent external sample of patients from general practice: a validation study. BMJ Open 2014;4(8):e005809. 
    http://www.ncbi.nlm.nih.gov/pubmed/25168040

Public involvement

The results have been presented at meetings of the QResearch advisory board, which has patient representation, and at the EMIS National User Group conference, attended by clinical and IT professionals who use the EMIS computer system (many of whose practices contribute to the QResearch database). Overall, we concluded that we need to think of new ways to engage with patients in database studies of this kind, and we plan to explore this in future grant applications.

Impact

The paper has only just been published. Public Health England are using the results to inform their current update of the NHS Health Check programme. We understand that the NICE technology appraisal on osteoporosis (currently in progress) has taken an interest in the results of the QFracture validation. Following this external validation, a number of the scores are now implemented in GP computer systems.

This project was funded by the National Institute for Health Research School for Primary Care Research (project number 93).

Department of Health Disclaimer

The views and opinions expressed herein are those of the authors and do not necessarily reflect those of the NIHR School for Primary Care Research, NIHR, NHS or the Department of Health.