
Validation of existing risk prediction models for colorectal cancer

Project title

Validation of existing risk prediction models for colorectal cancer

Project reference


Final report date

24 March 2018

Project start date

01 October 2015

Project end date

30 November 2017

Project duration

26 months

Project keywords

Colorectal cancer, risk, prediction, model, external validation

Lead investigator(s)
  • Dr Juliet Usher-Smith, Clinical Senior Research Associate, Department of Public Health and Primary Care, University of Cambridge
NIHR School Collaborators
  • Professor Simon Griffin, The Primary Care Unit, University of Cambridge
  • Dr Fiona Walter, The Primary Care Unit, University of Cambridge
  • Professor Jon Emery, The Primary Care Unit, University of Cambridge
  • Amelia Harshfield, Research Associate, University of Cambridge – data management
  • Dr Catherine Saunders, Senior Research Associate, University of Cambridge – statistics and variable definition
  • Mr Stephen Sharp, Senior Statistician, University of Cambridge – statistical advice
  • Professor Ken Muir, University of Manchester – Biobank data expertise

Project objectives

To quantify and compare the predictive utility of selected existing risk prediction models for identifying incident cases of colorectal cancer (CRC) in the UK Biobank cohort, in order to guide the selection of risk prediction models for use in the UK.


Changes to the project originally outlined in the proposal

The original proposal was for 12 months. The project ran longer because UK Biobank announced an updated release of cancer data shortly after the project started, and those data were not released until January 2017 (15 months into the project). We completed the analysis and submitted the paper for publication in May 2017. The period from May to November 2017 was spent waiting for the paper to be accepted so that open access publication could be paid for from the grant.
We had also initially planned to include risk prediction models that required blood or tissue samples, but because the release of those data from UK Biobank was delayed, we limited our analysis to models including only variables that are routinely available or easily obtained by self-completed questionnaire.

Brief summary


We performed an external validation of risk models following the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) guideline.

Selection of risk prediction models
We identified 40 risk prediction models for CRC, colon cancer or rectal cancer from our recent systematic review, and two that had been published between the end of the search period for that review (March 2014) and November 2016. Where the published articles provided insufficient data to operationalise the risk scores, we contacted the authors to ask for the additional data. We excluded sixteen models that included either biochemical or genetic biomarkers.
In three it was not possible to operationalise the risk score, seven were duplicate models and a further two included risk markers for which no comparable variable is available within UK Biobank. We therefore included fourteen risk models in our analysis.

Risk factor and outcome variables

For each risk factor we used data collected at the baseline assessment at cohort entry. In all cases we matched variables from the UK Biobank dataset as closely as possible to those described in each model and, where there was no exact match, derived proxy variables. In most cases we were able to do this by combining existing variables. For some this was simple, for example summing beef, pork and lamb consumption to derive a variable for red meat. For others it was more complex and required a number of assumptions. Where no equivalent variable existed in the UK Biobank cohort, we derived variables from similar questions. For example, no data are available in UK Biobank on historic use of aspirin or non-steroidal anti-inflammatory drugs (NSAIDs). We therefore used responses to the question “Do you regularly take any of the following? Aspirin, Ibuprofen, Paracetamol, Codeine” or the presence of a code indicating NSAID use in the list of current regular treatments to categorise individuals as regular or current users, and used the mean duration of use from the literature to estimate duration of use. The outcome for each risk model was newly diagnosed CRC identified from linked cancer registries (ICD10 C18.0-C18.9, C19, C20 and C21.8). These data were available for each participant up to 30 September 2014. We excluded from the analysis participants with a diagnosis of CRC (ICD9 153.0-153.9, 154.0, 154.1 and 154.8 and ICD10 C18.0-C18.9, C19, C20 and C21.8) prior to recruitment.
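As an illustration only, the two kinds of proxy variable described above could be derived along the following lines. This is a minimal sketch in Python rather than the Stata code used in the study. The function names, argument names and the rule that only aspirin and ibuprofen answers count towards NSAID use are assumptions made for the example, not details taken from the analysis.

```python
from typing import Optional, Set


def red_meat_intake(beef: Optional[float],
                    pork: Optional[float],
                    lamb: Optional[float]) -> Optional[float]:
    """Sum beef, pork and lamb consumption into one red meat variable.

    Returns None if any component is missing, so the participant would
    drop out of a complete-case analysis for models needing this variable.
    """
    parts = (beef, pork, lamb)
    if any(p is None for p in parts):
        return None
    return sum(parts)


def regular_nsaid_user(regular_medication_answers: Set[str],
                       treatment_codes: Set[str],
                       nsaid_codes: Set[str]) -> bool:
    """Flag regular or current aspirin/NSAID use from two sources.

    Assumption for this sketch: of the four questionnaire options, only
    "Aspirin" and "Ibuprofen" count towards aspirin/NSAID use.
    `nsaid_codes` is a hypothetical set of treatment codes for NSAIDs.
    """
    self_reported = bool(regular_medication_answers & {"Aspirin", "Ibuprofen"})
    coded = bool(treatment_codes & nsaid_codes)
    return self_reported or coded
```

For example, `red_meat_intake(1, 2, 0.5)` gives `3.5`, and a participant who reports only paracetamol and has no NSAID treatment code is classed as a non-user.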

Data analysis 

For all prediction models we first computed the predicted probability for each participant at baseline. We then assessed the discrimination and calibration of the risk scores. Although some risk models had been developed in all-male populations, we assessed performance in both men and women. For our primary analysis we used a “complete-case” approach, including only those for whom a risk score based on all risk factors could be computed and who had five-year follow-up. This was done separately for each risk score, so the sample size varied between scores. To reflect the clinical application of risk scores, we did not exclude those who did not have five-year follow-up due to death. We treated the outcome as a binary variable (developed CRC or did not develop CRC) and compared the overall discriminative ability of the models using the area under the receiver operating characteristic curve (AUC). We also calculated sensitivity, specificity, positive and negative likelihood ratios (LR+ and LR-) and the positive and negative predictive values (PPV and NPV) using a cut-off value for each risk score chosen such that 10% of the population had values above the cut-off; the procedure was then repeated using cut-offs where 20%, 80% and 90% had values above the cut-off.
If data were available in the original published reports or from authors, we assessed calibration graphically by comparing the predicted risk with the observed percentage of those who developed CRC over the five-year follow-up period, stratified by deciles, and calculated Hosmer-Lemeshow statistics. QCancer10 was the only model to provide data on five-year risk. All the other models predicted risk over 10 or 20 years, which required converting the predicted risks to risks over five years. We did this first assuming a constant risk over time, as the rate of incident CRC observed within the UK Biobank cohort was constant over the follow-up period. We then repeated the analysis assuming that risk doubles every five years, in line with reported increases in incidence rates with increasing age. To allow comparison across all the models we also used this same approach for the QCancer10 model.
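The report does not give the formula used to convert 10- or 20-year risks to five-year risks. One way to express the two assumptions, shown here purely as a sketch, is to treat the hazard as constant within each five-year block and either equal across blocks (constant risk) or doubling from one block to the next. The function name and the piecewise-constant hazard formulation are assumptions of this example.

```python
def five_year_risk(horizon_risk: float, horizon_years: int,
                   doubling: bool = False) -> float:
    """Convert a risk over `horizon_years` into a risk over the first 5 years.

    Assumes a hazard that is constant within each five-year block.
    With doubling=False every block has the same hazard; with
    doubling=True the hazard doubles from each block to the next.
    """
    if not 0.0 <= horizon_risk < 1.0:
        raise ValueError("horizon_risk must be in [0, 1)")
    if horizon_years <= 0 or horizon_years % 5 != 0:
        raise ValueError("horizon_years must be a positive multiple of 5")
    blocks = horizon_years // 5
    # Cumulative hazard over the whole horizon, in units of the hazard
    # accumulated during the first block: 1 + 1 + ... or 1 + 2 + 4 + ...
    weight = (2 ** blocks - 1) if doubling else blocks
    # Survival over the first block is the overall survival to the power 1/weight.
    return 1.0 - (1.0 - horizon_risk) ** (1.0 / weight)
```

Under this formulation a 10-year risk of 2% corresponds to a five-year risk of about 1.0% with constant risk and about 0.67% with doubling, which is why the doubling assumption yields lower five-year estimates.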

Sensitivity analyses

We carried out a number of sensitivity analyses. In the first set we explored the impact of missing data, comparing the performance of the models using the complete-case analysis with an extreme case in which risk factors with greater than 5% missing data were coded as the 90th or 10th percentile values for continuous variables and as present or absent for dichotomous variables. Secondly, in view of the absence of data on historic aspirin or NSAID use and the inability to distinguish between oestrogen-containing contraceptive pills and progesterone-only pills, we assessed the performance of the models excluding variables for aspirin, NSAIDs or hormonal medication. Thirdly, recognising that these models may be used in multiple countries, we assessed the performance of the QCancer10 model for men without the deprivation term. As participants with previous colorectal polyps or a diagnosis of inflammatory bowel disease (IBD) would likely be in surveillance programmes, we also assessed discrimination after excluding individuals with a history of a colorectal polyp or a diagnosis of IBD at baseline. Finally, we compared the performance of the risk scores using an open cohort design, that is to say including participants with less than five years of follow-up. In that analysis we used Harrell’s C-statistic to assess discrimination. All analyses were carried out in Stata 13.1 (StataCorp, 2013).
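The extreme-case recoding for missing data can be sketched as below. This is an illustrative Python version, not the Stata code used in the study. The nearest-rank percentile definition, the function names and the choice to leave variables with 5% or less missing data untouched are assumptions of the example.

```python
from typing import List, Optional


def _share_missing(column: List[Optional[float]]) -> float:
    """Proportion of entries that are missing (None)."""
    return sum(v is None for v in column) / len(column)


def nearest_rank_percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty list (pct an integer 1-100)."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceiling of pct * n / 100
    return ordered[rank - 1]


def impute_continuous(column: List[Optional[float]], pct: int,
                      threshold: float = 0.05) -> List[Optional[float]]:
    """Recode missing values to the given percentile of the observed values,
    but only when more than `threshold` of the column is missing."""
    observed = [v for v in column if v is not None]
    if not observed or _share_missing(column) <= threshold:
        return list(column)  # left as-is (assumption of this sketch)
    fill = nearest_rank_percentile(observed, pct)
    return [fill if v is None else v for v in column]


def impute_dichotomous(column: List[Optional[int]], present: bool,
                       threshold: float = 0.05) -> List[Optional[int]]:
    """Recode missing values to present (1) or absent (0) under the same rule."""
    if _share_missing(column) <= threshold:
        return list(column)
    fill = 1 if present else 0
    return [fill if v is None else v for v in column]
```

Running each model once with the high-risk coding (90th percentile, present) and once with the low-risk coding (10th percentile, absent) brackets the range of performance that the missing values could plausibly produce.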


Of the 502,633 participants within the UK Biobank cohort, 2,268 had a prior diagnosis of CRC and 127,198 did not have follow-up for five years. We therefore included 373,164 participants in our primary analysis. Amongst those there were 1,719 (0.46%) cases of incident CRC.

Discrimination

The performance of the risk models varied substantially. In men, the QCancer10 model and models by Tao, Driver and Ma all had an area under the receiver operating characteristic curve (AUC) between 0.67 and 0.70. Discrimination was lower in women: the QCancer10, Wells, Tao and Ma models were the best performing, with AUCs between 0.64 and 0.66.
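For readers less familiar with the AUC, it can be read as the probability that a randomly chosen participant who developed CRC had a higher risk score than a randomly chosen participant who did not, with ties counted as one half. The sketch below computes it directly from that definition. It is a plain Python illustration with hypothetical names, not the Stata routine used in the study, and its pairwise loop is only practical for small examples.

```python
from typing import Sequence


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition.

    `outcomes` holds 1 for participants who developed CRC and 0 otherwise.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for other_score in non_cases:
            if case_score > other_score:
                concordant += 1.0   # case correctly ranked higher
            elif case_score == other_score:
                concordant += 0.5   # tie counts as half
    return concordant / (len(cases) * len(non_cases))
```

An AUC of 0.5 corresponds to a score that ranks cases no better than chance and 1.0 to perfect ranking, so values of 0.64 to 0.70 indicate moderate discrimination.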

Sensitivity and specificity 

By targeting the 10% at highest risk, the QCancer10, Tao, Ma and Wells models identified 24% of the men and 19% of the women who went on to develop colorectal cancer. Targeting the 20% at highest risk increased this to between 37% and 43% for men and between 33% and 37% for women, compared with 31% for the UK screening programme age threshold. The negative predictive values were high and comparable (>99.4%) for all models.
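The figures above come from flagging a fixed share of the population with the highest scores and tabulating the flag against the outcome. A minimal sketch of that calculation is given below. The function name, the returned dictionary keys and the handling of tied scores at the cut-off (broken by sort order) are assumptions of this example.

```python
from typing import Dict, Sequence


def metrics_at_top_fraction(scores: Sequence[float], outcomes: Sequence[int],
                            fraction: float) -> Dict[str, float]:
    """Flag the `fraction` of people with the highest scores and return
    sensitivity, specificity, PPV, NPV and likelihood ratios."""
    n = len(scores)
    n_flagged = int(round(fraction * n))
    # Indices ordered from highest to lowest score; ties keep input order.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = order[:n_flagged]
    tp = sum(outcomes[i] for i in flagged)   # flagged and developed CRC
    fp = n_flagged - tp                      # flagged, no CRC
    fn = sum(outcomes) - tp                  # not flagged, developed CRC
    tn = n - n_flagged - fn                  # not flagged, no CRC
    if min(tp + fn, tn + fp, tp + fp, tn + fn) == 0:
        raise ValueError("a margin of the 2x2 table is empty")
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec) if spec < 1 else float("inf"),
        "lr_neg": (1 - sens) / spec if spec > 0 else float("inf"),
    }
```

In this framing, the statement that a model "identified 24% of men" at the 10% cut-off is the sensitivity returned for `fraction=0.10`.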


Assessment of calibration was possible for six models in men and women. It was sensitive to assumptions about the change in risk over time, with all models overestimating risk when risk was assumed to be constant over time and estimated risks more closely matching observed risk when risk for each individual was assumed to double every five years. All would require country-specific recalibration if estimates of absolute risks were to be given to individuals.
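The calibration assessment described in the methods (observed versus predicted risk by decile, with a Hosmer-Lemeshow statistic) can be sketched as follows. This is an illustrative Python version with hypothetical names, not the Stata code used in the study. It groups participants into equal-sized bins by predicted risk and uses the standard form of the statistic, the sum over groups of (O - E)^2 / (E(1 - E/n)).

```python
from typing import List, Sequence, Tuple


def calibration_by_group(predicted: Sequence[float], outcomes: Sequence[int],
                         groups: int = 10) -> Tuple[List[Tuple[float, float]], float]:
    """Return (mean predicted risk, observed proportion) per risk group
    and the Hosmer-Lemeshow chi-squared statistic."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    rows: List[Tuple[float, float]] = []
    statistic = 0.0
    for g in range(groups):
        members = order[g * n // groups:(g + 1) * n // groups]
        if not members:
            continue
        size = len(members)
        expected = sum(predicted[i] for i in members)  # expected cases
        observed = sum(outcomes[i] for i in members)   # observed cases
        mean_predicted = expected / size
        rows.append((mean_predicted, observed / size))
        denominator = expected * (1.0 - mean_predicted)
        if denominator > 0:
            statistic += (observed - expected) ** 2 / denominator
    return rows, statistic
```

Plotting the second element of each row against the first gives the calibration plot. Points lying below the line of equality correspond to the overestimation of risk reported here.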

The results from all the sensitivity analyses were consistent with the main analysis.


This study shows that the performance of published risk models varies substantially, with several risk models based on easily obtainable data, such as age, sex, BMI, smoking, alcohol consumption and physical activity, having relatively good discrimination and accuracy in a UK population. Using the QCancer10 model, for example, the data from this study suggest that the top 10% would include 24% of men who later go on to develop colorectal cancer, and the top 20% would include 43%. The QCancer10 model includes variables available within routine electronic health records and so would not require additional data collection if those records could be used to identify those eligible for screening. The model by Driver also contains variables that would be available within routine health records or easily obtainable (age, BMI, smoking status, alcohol consumption). Its discrimination and sensitivity are slightly lower than those of QCancer10, but it has the advantage of simpler data collection or extraction, which may be preferable, particularly in health systems where fewer data are routinely collected. To our knowledge this is the first study to directly compare multiple published risk prediction models for colorectal cancer in the same population, and the first to externally validate any risk prediction models in a UK population.
Advantages of using the UK Biobank cohort include the large size, comprehensive phenotyping, completeness of data, and linkage to national cancer registries. However, the response rate to invitations to take part was only 5.5%. While the cohort is representative of the UK general population with respect to age, sex, ethnicity, and deprivation within the age range recruited, it is not representative with respect to a variety of sociodemographic, physical, lifestyle and health-related characteristics. The performance of the risk models in this study may, therefore, not reflect their performance in the entire UK population. The relatively short duration of follow-up to date within UK Biobank also means that we were only able to evaluate calibration with estimates of risk over a five-year period. Additionally, we had to derive proxy variables where there were no exact matches for many of the risk models. In most cases we were able to do this by simply combining existing variables within the UK Biobank cohort, but some, notably aspirin/NSAID use and oestrogen use, required a number of assumptions which may have reduced the estimates of performance. The findings must therefore be interpreted in the context of these limitations.

Modelling studies are now needed to assess the extent to which using these risk prediction models in place of the current age-based criteria might improve efficiency of colorectal cancer screening programmes and to allow recommendations about different tests, screening intervals, preventive advice, treatment, or age of onset of screening based on modelled risk. Implementation studies, ideally randomised controlled trials, are then needed to assess the feasibility of obtaining the risk factor data for each individual, the acceptability of incorporating a stratified approach and potential benefits and adverse consequences of incorporating such an approach into practice.

Plain English summary


Bowel cancer is one of the most common causes of cancer-related death in the UK. Finding the disease earlier makes it easier to treat and improves survival, so many countries, including the UK, invite people for screening to try to pick up those with early signs of cancer. Screening works well, but at the moment people are invited based on their age. Being able to group people depending on their risk of developing bowel cancer may help make screening better by changing the age at which people are invited, the type of screening test they are offered, the time between screening tests, and maybe the offer of medicines to reduce risk. In work we have already done, we found 52 ways, called models, of working out a person’s risk of developing bowel cancer. Only one was developed in the UK and none has been tested in the UK. In this research we used data from the UK Biobank study to test 14 of these models and see how well they could find people in the UK who go on to develop bowel cancer. The UK Biobank includes over half a million people who had an initial assessment in 2006-2010 and agreed to be followed up for many years.


How good these models were at finding those who later developed bowel cancer varied a lot. In men, the QCancer10 model and models by Tao, Driver and Ma were reasonably good at finding people who were more likely to develop bowel cancer in the future. The QCancer10, Wells, Tao and Ma models were the best in women but all were slightly less good than the models in men. When we compared the risk calculated by the models to the actual risk among the people in the UK Biobank study, the risk calculated by all the models was higher than the actual risk. The risk models would therefore need to be adjusted if they were going to be used to tell people their risk of developing bowel cancer.


Several risk models based on easily obtainable data are relatively good at identifying which people are more likely to go on to develop bowel cancer in a UK population. Further studies are now needed to work out the benefits and costs of including one of these models in the bowel cancer screening programme.


  • Usher-Smith JA, Harshfield A, Saunders CL, Sharp SJ, Emery J, Walter FM, Muir K, Griffin SJ. External validation of risk prediction models for incident colorectal cancer using UK Biobank. British Journal of Cancer. doi: 10.1038/bjc.2017.463
  • Usher-Smith JA, Harshfield A, Saunders CL, Sharp SJ, Emery J, Walter FM, Muir K, Griffin SJ. External validation of risk prediction models for colorectal cancer using UK Biobank. British Journal of Cancer 118, 750-759 (06 March 2018)

Public involvement

We invited a PPI member to join this study at the beginning of the project. Although we updated her throughout the project, this research principally involved using data that had already been collected to validate existing risk prediction models, so her role was mainly in supporting the interpretation and dissemination stages of the research. To support that, she came to study analysis meetings and commented on the manuscript prior to publication. She also provided helpful comments on the plain English summary included in this report.


The findings from this research are directly informing a BMJ Rapid Recommendation on colorectal cancer screening due to be published in 2018. As a result of this research we have also been awarded further funding from Bowel Cancer UK to conduct a similar analysis including risk models that incorporate genomic markers and to model the potential impact of introducing risk stratification using phenotypic or genomic information into the current English bowel cancer screening programme (Grant reference 18PG0008: PI Usher-Smith, £64,995 over 12 months).

This project was funded by the National Institute for Health Research School for Primary Care Research (project number 249).

Department of Health Disclaimer

The views and opinions expressed therein are those of the authors and do not necessarily reflect those of the NIHR School for Primary Care Research, NIHR, NHS or the Department of Health.