


 
 
 
 
 
 
CPRD Gold Data Specification 
Version 1.3 
 
 
 
Author: Shivani Padmanabhan, CPRD, UK. 
 
 
 
 
 
 
 
Data Specification, Clinical Practice Research Datalink, MHRA 

DOCUMENTATION CONTROL SHEET 
During the course of the project it may be necessary to issue amendments or clarifications to parts of this document.  This form must be updated whenever 
changes are made and should be filed inside the front cover of the new or amended document. 
 
Version | Affected Areas | Prepared By | Date | Reviewed By | Date | Summary of Change
1.0 | Initial Draft | | | | |
1.1 | Modified | Shivani Padmanabhan | 01/06/09 | Nick Wilson | 22/07/09 |
1.2 | Modified | Shivani Padmanabhan | 28/07/09 | Arlene Gallagher | 30/07/09 |
1.3 | Modified | Shivani Padmanabhan | 06/01/11 | Nick Wilson | 07/01/11 |
 
 
 
 
 
 
 
SUMMARY OF CHANGES 
 
Version 1.1 
o  Refined wordings 
 
Version 1.2 
o  Acceptable field in Patient file equals 1 if patient is acceptable, else 0 (Lookup reference incorrectly labelled as Y_N in previous versions) 
o  UTS field in Practice file has been derived using a CPRD algorithm that looks at death recording at the practice, and gaps in the data (prior to August 
     2009, this field was populated with the practice UTS date as was generated in the old FF-CPRD system) 
o  Ndd field in the Therapy file has been populated for the most common occurring dosage strings in the data (field was set to ‘0’ prior to August 2009) 
o  Descriptions of all fields have been revised, for clarity 
 
Version 1.3 
o  Field name ‘attendtype’ in Referral table modified to ‘attendance’ 
 

Table of contents 
Dataset Format 
Field descriptions 
 
 

Dataset Format 
 
1.  The Patient file (PatientNNN.txt) contains basic patient demographics and patient registration details for the patients. 
2.  The Practice file (Practice001.txt) contains details of each practice, including region and collection information. 
3.  The Staff file (StaffNNN.txt) contains practice staff details, with one record per member of staff. 
4.  The Consultation file (ConsultationNNN.txt) contains information relating to the type of consultation as entered by the GP from a 
pre-determined list. Consultations can be linked to the events that occur as part of the consultation via the consultation identifier (consid). 
5.  The Clinical file (ClinicalNNN.txt) contains medical history events. This file contains all the medical history data entered on the GP 
system, including symptoms, signs and diagnoses. It can be used to identify clinical diagnoses and deaths. Patients may have 
more than one row of data. The data is coded using Read codes, which allow linkage of codes to the medical terms provided. 
6.  The Additional Clinical Details file (AdditionalNNN.txt) contains information entered in the structured data areas in the GP’s software. 
Patients may have more than one row of data. Data in this file is linked to events in the Clinical file through the additional details 
identifier (adid). 
7.  The Referral file (ReferralNNN.txt) contains referral details recorded on the GP system. It covers patient referrals to external care 
centres (normally secondary care locations such as hospitals, for inpatient or outpatient care) and includes speciality and 
referral type. 
8.  The Immunisation file (ImmunisationNNN.txt) contains details of immunisation records on the GP system. 
9.  The Test file (TestNNN.txt) contains records of test data on the GP system. The data is coded using a Read code, chosen by the GP, 
which will generally identify the type of test used. The test name is identified via the Entity Type, a numerical code, which is determined 
by the test result item chosen by the GP at source. There are three types of test record, with 4, 7 or 8 data fields (data1 to data8), and 
the data must be interpreted according to the type of test record. Results can be either qualitative, text-based values (for example 
'Normal' or 'Abnormal') or quantitative numeric values. 
10. The Therapy file (TherapyNNN.txt) contains details of all prescriptions (for drugs and appliances) issued by the GP and recorded on 
the GP system. Patients may have more than one row of data. Drug products and appliances are recorded by the GP using the Multilex 
product code system. 
  

Field descriptions 
Full descriptions of the fields in each file are provided in the tables below. All files can be linked using the encrypted patient identifier (patid). The 
last three digits of the patient identifier (patid) and of the staff identifier (staffid) give the identifier of the practice (pracid) to which the patient or 
staff member belongs. The Mapping column describes how the data in a field should be used: it specifies lookup references, linkages to other 
tables, and how to decode numerical values. A mapping of ‘None’ indicates that the field holds raw data. 
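
As an illustration of the identifier rule above, the following minimal sketch derives pracid from a patid or staffid. It assumes the identifiers are read as integers (or digit strings); the function name and the sample values are hypothetical and not part of the specification.

```python
def pracid_from_identifier(identifier) -> int:
    """Return the practice identifier encoded in a patid or staffid.

    Assumption: the identifier is an integer (or a string of digits) whose
    last three digits are the pracid, as described in the text above.
    """
    return int(identifier) % 1000


# Hypothetical identifiers, for illustration only.
print(pracid_from_identifier(1234001))    # practice 1
print(pracid_from_identifier("5678042"))  # practice 42
```

Note that a staffid of 0 denotes an unknown staff member in the event tables, so no practice should be inferred from it.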
 
1.   Patient 
 
Column name | Field name | Description | Mapping
Patient Identifier | patid | Encrypted unique identifier given to a patient in CPRD GOLD | None
VAMP Identifier | vmid | Old VM id for the patient when the practice was using the VAMP system | None
Patient Gender | gender | Patient’s gender | Lookup SEX
Birth Year | yob | Patient’s year of birth | Value + 1800
Birth Month | mob | Patient’s month of birth (for those aged under 16). 0 indicates no month set | None
Marital Status | marital | Patient’s current marital status | Lookup MAR
Family Number | famnum | Family ID number | None
CHS Registered | chsreg | Value to indicate whether the patient is registered with Child Health Surveillance | Lookup Y_N
CHS Registration Date | chsdate | Date of registration with Child Health Surveillance | dd/mm/yyyy
Prescription Exemption | prescr | Type of prescribing exemption the patient has currently (e.g. medical or maternity) | Lookup PEX
Capitation Supplement | capsup | Level of capitation supplement the patient has currently (e.g. low, medium, or high) | Lookup CAP
Socio-Economic Status | ses | Patient’s socio-economic status. Currently 0; to be populated in future builds | PAT_SES
First Registration Date | frd | Date the patient first registered with the practice. If patient only has ‘temporary’ records, the date is the first encounter with the practice; if patient has ‘permanent’ records it is the date of the first ‘permanent’ record (excluding preceding temporary records) | dd/mm/yyyy
Current Registration Date | crd | Date the patient’s current period of registration with the practice began (date of the first ‘permanent’ record after the latest transferred out period). If there are no ‘transferred out periods’, the date is equal to ‘frd’ | dd/mm/yyyy
Registration Status | regstat | Status of registration detailing gaps and temporary patients | PAT_STAT
Registration Gaps | reggap | Number of days missing in the patient’s registration details | PAT_GAP
Internal Transfer | internal | Number of internal transfer out periods, in the patient’s registration details | None
Transfer Out Date | tod | Date the patient transferred out of the practice, if relevant. Empty for patients who have not transferred out | dd/mm/yyyy
Transfer Out Reason | toreason | Reason the patient transferred out of the practice. Includes 'Death' as an option | Lookup TRA
Death Date | deathdate | Date of death of patient – derived using a CPRD algorithm | dd/mm/yyyy
Acceptable Patient Flag | accept | Flag to indicate whether the patient has met certain quality standards: 1 = acceptable, 0 = unacceptable | Boolean
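
Two of the Patient mappings are simple arithmetic or flag checks. The sketch below shows one possible way to apply the ‘Value + 1800’ mapping for yob and to read the accept flag. It assumes both fields have already been parsed as integers; the function names and sample values are hypothetical.

```python
def birth_year(yob: int) -> int:
    """Apply the 'Value + 1800' mapping to the stored yob value."""
    return yob + 1800


def is_acceptable(accept: int) -> bool:
    """Interpret the accept flag: 1 = acceptable, 0 = unacceptable."""
    return accept == 1


# Hypothetical stored values, for illustration only.
print(birth_year(165))    # stored value 165 corresponds to 1965
print(is_acceptable(1))   # True
```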

RESTRICTED - COMMERCIAL 
2.  Practice 
 
Column name | Field name | Description | Mapping
Practice identifier | pracid | Encrypted unique identifier given to a specific practice in CPRD GOLD | None
Region | region | Value to indicate where in the UK the practice is based. The region denotes the Strategic Health Authority for practices within England, and the country i.e. Wales, Scotland, or Northern Ireland for the rest | Lookup PRG
Last Collection Date | lcd | Date of the last collection for the practice | dd/mm/yyyy
Up To Standard Date | uts | Date at which the practice data is deemed to be of research quality. Derived using a CPRD algorithm that primarily looks at practice death recording and gaps in the data | dd/mm/yyyy
 
 
3.  Staff 
 
Column name | Field name | Description | Mapping
Staff Identifier | staffid | Encrypted unique identifier given to the practice staff member entering the data | None
Staff Gender | gender | Staff’s gender | Lookup SEX
Staff Role | role | Role of the member of staff who created the event | Lookup ROL

4.  Consultation 
 
Column name | Field name | Description | Mapping
Patient Identifier | patid | Encrypted unique identifier given to a patient in CPRD GOLD | None
Event Date | eventdate | Date associated with the event, as entered by the GP | dd/mm/yyyy
System Date | sysdate | Date the event was entered into Vision | dd/mm/yyyy
Consultation Type | constype | Type of consultation (e.g. Surgery Consultation, Night Visit, Emergency etc) | Lookup COT
Consultation Identifier | consid | The consultation identifier linking events at the same consultation, when used in combination with pracid | Link Event tables
Staff Identifier | staffid | The identifier of the practice staff member entering the data. A value of 0 indicates that the staffid is unknown | Link Staff table
Duration | duration | The length of time (minutes) between the opening, and closing of the consultation record | None
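
Because consid identifies a consultation only in combination with pracid, a join between event rows and consultations needs a composite key. The following is a minimal sketch of that grouping, assuming each row has been loaded as a dictionary keyed by field name and that pracid is taken from the last three digits of patid. The helper names and sample rows are hypothetical.

```python
from collections import defaultdict


def pracid_of(patid: int) -> int:
    """Practice identifier: the last three digits of patid."""
    return patid % 1000


def group_events_by_consultation(events):
    """Group event rows under the composite key (pracid, consid)."""
    grouped = defaultdict(list)
    for event in events:
        key = (pracid_of(event["patid"]), event["consid"])
        grouped[key].append(event)
    return dict(grouped)


# Hypothetical event rows: the same consid occurs in two different practices.
events = [
    {"patid": 1001001, "consid": 17, "medcode": 5},
    {"patid": 1002001, "consid": 17, "medcode": 9},
    {"patid": 1001002, "consid": 17, "medcode": 5},
]
grouped = group_events_by_consultation(events)
print(sorted(grouped))  # [(1, 17), (2, 17)]
```

Keying on consid alone would merge the consultations from practices 1 and 2 in this example, which is why pracid is part of the key.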

5.  Clinical 
 
Column name | Field name | Description | Mapping
Patient Identifier | patid | Encrypted unique identifier given to a patient in CPRD GOLD | None
Event Date | eventdate | Date associated with the event, as entered by the GP | dd/mm/yyyy
System Date | sysdate | Date the event was entered into Vision | dd/mm/yyyy
Consultation Type | constype | Code for the category of event recorded within the GP system (e.g. diagnosis or symptom) | Lookup SED
Consultation Identifier | consid | Identifier that allows information about the consultation to be retrieved, when used in combination with pracid | Link Consultation table
Medical Code | medcode | CPRD unique code for the medical term selected by the GP | Lookup Medical Dictionary
Staff Identifier | staffid | Identifier of the practice staff member entering the data. A value of 0 indicates that the staffid is unknown | Link Staff table
Text Identifier | textid | Identifier that allows freetext information on the event to be retrieved, when used in combination with pracid and event type ‘Clinical’. A value of 0 indicates that there is no freetext information for this event | Link Freetext
Episode | episode | Episode type for a specific clinical event | Lookup EPI
Entity Type | enttype | Identifier that represents the structured data area in Vision where the data was entered | Lookup Entity
Additional Details Identifier | adid | Identifier that allows additional information to be retrieved for this event, when used in combination with pracid. A value of 0 signifies that there is no additional information associated with the event. | Link Additional Clinical Details table
  

6.  Additional Clinical Details 
 
Column name | Field name | Description | Mapping
Patient Identifier | patid | Encrypted unique identifier given to a patient in CPRD GOLD | None
Entity Type | enttype | Identifier that represents the structured data area in Vision where the data was entered | Lookup Entity
Additional Details Identifier | adid | Identifier that allows information about the original clinical event to be retrieved, when used in combination with pracid | Link Clinical table
Data 1 | data1 | Depends on Entity Type | Lookup Entity
Data 2 | data2 | Depends on Entity Type | Lookup Entity
Data 3 | data3 | Depends on Entity Type | Lookup Entity
Data 4 | data4 | Depends on Entity Type | Lookup Entity
Data 5 | data5 | Depends on Entity Type | Lookup Entity
Data 6 | data6 | Depends on Entity Type | Lookup Entity
Data 7 | data7 | Depends on Entity Type | Lookup Entity

                                                 
♦ Each entity type may be associated with up to seven data fields. Content of each data field is dependent on the entity type – the fields may 
contain raw data values, or may be encoded values that represent dates, read codes, text etc. The file Entity.xls contains information on all 
entity types, and provides the number of data fields associated with the entity, description of the data in each field, and details of the lookups 
needed to decode the data. 
 

7.  Referral 
 
Column name | Field name | Description | Mapping
Patient Identifier | patid | Encrypted unique identifier given to a patient in CPRD GOLD | None
Event Date | eventdate | Date associated with the event, as entered by the GP | dd/mm/yyyy
System Date | sysdate | Date the event was entered into Vision | dd/mm/yyyy
Consultation Type | constype | Code for the category of event recorded within the GP system (e.g. management or administration) | Lookup SED
Consultation Identifier | consid | Identifier that allows information about the consultation to be retrieved, when used in combination with pracid | Link Consultation table
Medical Code | medcode | CPRD unique code for the medical term selected by the GP | Lookup Medical Dictionary
Staff Identifier | staffid | Identifier of the practice staff member entering the data. A value of 0 indicates that the staffid is unknown | Link Staff table
Text Identifier | textid | Identifier that allows freetext information on the event to be retrieved, when used in combination with pracid and event type ‘Referral’. A value of 0 indicates that there is no freetext information for this event | Link Freetext
Source | source | Classification of the source of the referral e.g. GP, Self | Lookup SOU
NHS Speciality | nhsspec | Referral speciality according to the National Health Service (NHS) classification | Lookup DEP
FHSA Speciality | fhsaspec | Referral speciality according to the Family Health Services Authority (FHSA) classification | Lookup SPE
In Patient | inpatient | Classification of the type of referral, e.g. Day case, In patient | Lookup RFT
Attendance Type | attendance | Category describing whether the referral event is the first visit, a follow-up etc | Lookup ATT
Urgency | urgency | Classification of the urgency of the referral e.g. Routine, Urgent | Lookup URG
 
 

8.  Immunisation 
 
Column name | Field name | Description | Mapping
Patient Identifier | patid | Encrypted unique identifier given to a patient in CPRD GOLD | None
Event Date | eventdate | Date associated with the event, as entered by the GP | dd/mm/yyyy
System Date | sysdate | Date the event was entered into Vision | dd/mm/yyyy
Consultation Type | constype | Code for the category of event recorded within the GP system (e.g. intervention) | Lookup SED
Consultation Identifier | consid | Identifier that allows information about the consultation to be retrieved, when used in combination with pracid | Link Consultation table
Medical Code | medcode | CPRD unique code for the medical term selected by the GP | Lookup Medical Dictionary
Staff Identifier | staffid | Identifier of the practice staff member entering the data. A value of 0 indicates that the staffid is unknown | Link Staff table
Text Identifier | textid | Identifier that allows freetext information on the event to be retrieved, when used in combination with pracid and event type ‘Immunisation’. A value of 0 indicates that there is no freetext information for this event | Link Freetext
Type | immstype | Individual components of an immunisation, e.g. Mumps, Rubella, Measles | Lookup IMT
Stage | stage | Stage of the immunisation given, e.g. 1, 2, B2 | Lookup IST
Status | status | Status of the immunisation e.g. Advised, Given, Refusal | Lookup IMM
Compound | compound | Immunisation compound administered – may be a single or multi-component preparation, e.g. MMR | Lookup IMC
Source | source | Location where the immunisation was administered, e.g. In this practice | Lookup INP
Reason | reason | Reason for administering the immunisation, e.g. Routine measure | Lookup RIN
Method | method | Route of administration for the immunisation, e.g. Oral, Intramuscular | Lookup IME

9.  Test 
 
Column name | Field name | Description | Mapping
Patient Identifier | patid | Encrypted unique identifier given to a patient in CPRD GOLD | None
Event Date | eventdate | Date associated with the event, as entered by the GP | dd/mm/yyyy
System Date | sysdate | Date the event was entered into Vision | dd/mm/yyyy
Consultation Type | constype | Code for the category of event recorded within the GP system (e.g. examination) | Lookup SED
Consultation Identifier | consid | Identifier that allows information about the consultation to be retrieved, when used in combination with pracid | Link Consultation table
Medical Code | medcode | CPRD unique code for the medical term selected by the GP | Lookup Medical Dictionary
Staff Identifier | staffid | Identifier of the practice staff member entering the data. A value of 0 indicates that the staffid is unknown | Link Staff table
Text Identifier | textid | Identifier that allows freetext information on the event to be retrieved, when used in combination with pracid and event type ‘Test’. A value of 0 indicates that there is no freetext information for this event | Link Freetext
Entity Type | enttype | Identifier that represents the structured data area in Vision where the data was entered | Lookup Entity

Depending on the Test Entity Type, tests have either 4, 7, or 8 data fields.

Test records with 4 data fields:
Column name | Field name | Description | Mapping
Data 1 | data1 | Qualifier | Lookup TQU
Data 2 | data2 | Normal range from | None
Data 3 | data3 | Normal range to | None
Data 4 | data4 | Normal range basis | None

Test records with 7 data fields:
Column name | Field name | Description | Mapping
Data 1 | data1 | Operator | Lookup OPR
Data 2 | data2 | Value | None
Data 3 | data3 | Unit of measure | Lookup SUM
Data 4 | data4 | Qualifier | Lookup TQU
Data 5 | data5 | Normal range from | None
Data 6 | data6 | Normal range to | None
Data 7 | data7 | Normal range basis (or peak flow device for entity type 311) | Lookup POP (or PFD)

For some test entity types (data 1 to data 6 same as above):
Column name | Field name | Description | Mapping
Data 7 | data7 | Normal range basis | Lookup POP
Data 8 | data8 | Expected delivery date | GEN_SDC
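
The three Test record layouts can be handled by choosing a list of field names according to how many data fields the record’s entity type uses. The sketch below assumes that count (4, 7 or 8) has already been obtained for the entity type, for example from the Entity lookup. The label strings, function name and sample values are hypothetical.

```python
# Field meanings for each Test record layout, in data1..dataN order.
FOUR_FIELDS = ["qualifier", "normal_range_from", "normal_range_to",
               "normal_range_basis"]
SEVEN_FIELDS = ["operator", "value", "unit_of_measure", "qualifier",
                "normal_range_from", "normal_range_to",
                # For entity type 311, data7 holds the peak flow device instead.
                "normal_range_basis"]
EIGHT_FIELDS = SEVEN_FIELDS + ["expected_delivery_date"]

LAYOUTS = {4: FOUR_FIELDS, 7: SEVEN_FIELDS, 8: EIGHT_FIELDS}


def label_test_data(data, n_fields):
    """Pair raw data1..dataN values with their meaning for this layout.

    Assumption: n_fields (4, 7 or 8) was looked up for the record's
    entity type before calling this function.
    """
    names = LAYOUTS[n_fields]
    return dict(zip(names, data[:n_fields]))


# Hypothetical 7-field record, for illustration only.
record = label_test_data([3, 5.2, 96, 9, 4.0, 6.0, 0], 7)
print(record["value"])  # 5.2
```

Coded fields such as the operator, unit of measure and qualifier would still need decoding through their lookups (OPR, SUM, TQU), which this sketch does not attempt.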

10. Therapy 
 
Column name | Field name | Description | Mapping
Patient Identifier | patid | Encrypted unique identifier given to a patient in CPRD GOLD | None
Event Date | eventdate | Date associated with the event, as entered by the GP | dd/mm/yyyy
System Date | sysdate | Date the event was entered into Vision | dd/mm/yyyy
Consultation Identifier | consid | Identifier that allows information about the consultation to be retrieved, when used in combination with pracid | Link Consultation table
Product Code | prodcode | CPRD unique code for the treatment selected by the GP | Lookup Product Dictionary
Staff Identifier | staffid | Identifier of the practice staff member entering the data. A value of 0 indicates that the staffid is unknown | Link Staff table
Text Identifier | textid | Identifier that allows freetext information (dosage) on the event to be retrieved, when used in combination with pracid and event type ‘Therapy’. A value of 0 indicates that there is no freetext information for the event. Use the Common Dosages Lookup (constituting ~ 95% of dosage strings in data) to interpret values < 100,000 | Lookup Common Dosages / Link Freetext
BNF Code | bnfcode | Code representing the chapter & section from the British National Formulary for the product selected by GP | Lookup BNFCodes
Total Quantity | qty | Total quantity entered by the GP for the prescribed product | None
Numeric Daily Dose | ndd | Numeric daily dose prescribed for the event. Derived using a CPRD algorithm on common dosage strings (represented by textid < 100,000). Value is set to 0 for all dosage strings represented by textid > 100,000 | None
Number of Days | numdays | Number of treatment days prescribed for a specific therapy event | None
Number of Packs | numpacks | Number of individual product packs prescribed for a specific therapy event | None
Pack Type | packtype | Pack size or type of the prescribed product | Lookup PackType
Issue Sequence Number | issueseq | Number to indicate whether the event is associated with a repeat schedule. Value of 0 implies the event is not part of a repeat prescription. A value ≥ 1 denotes the issue number for the prescription within a repeat schedule | None
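
The textid rule for Therapy records can be expressed as a small routing function. This is a minimal sketch, not CPRD’s own method: the function name and return labels are hypothetical, and because the specification covers only textid < 100,000 and textid > 100,000, treating a value of exactly 100,000 as freetext is an assumption.

```python
COMMON_DOSAGE_LIMIT = 100_000


def dosage_source(textid: int) -> str:
    """Say where the dosage string for a Therapy record should be looked up."""
    if textid == 0:
        return "none"                   # no freetext information for the event
    if textid < COMMON_DOSAGE_LIMIT:
        return "common_dosages_lookup"  # ndd is derived for these strings
    # Assumption: exactly 100,000 is treated like the values above it.
    return "freetext"                   # ndd is set to 0 for these strings


# Hypothetical textid values, for illustration only.
for textid in (0, 42, 250_000):
    print(textid, dosage_source(textid))
```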
 
 
 
 
 
 

 
dd/mm/yyyy: Valid dates are in the format DD/MM/YYYY. Missing dates are NULL, and invalid dates are set to 01/01/2500 
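
A date field following this convention could be parsed as sketched below. The sketch assumes that a NULL date reaches the reader as None or an empty string, which depends on how the text files are loaded. The function name is hypothetical.

```python
from datetime import date, datetime
from typing import Optional

# Sentinel used in the data for invalid dates.
INVALID_DATE = date(2500, 1, 1)


def parse_cprd_date(raw: Optional[str]) -> Optional[date]:
    """Parse a DD/MM/YYYY string; return None for missing or invalid dates."""
    if raw is None or raw.strip() == "":
        return None  # assumption: NULL is loaded as None or an empty string
    parsed = datetime.strptime(raw.strip(), "%d/%m/%Y").date()
    if parsed == INVALID_DATE:
        return None
    return parsed


print(parse_cprd_date("22/07/2009"))  # 2009-07-22
print(parse_cprd_date("01/01/2500"))  # None
```

Returning None for both cases loses the distinction between missing and invalid dates; a caller that needs it could compare against INVALID_DATE directly.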
 
PAT_SES: The Index of Multiple Deprivation (IMD) socio-economic status (overall measure as a quintile) will be implemented in the ‘ses’ 
   field of the Patient file for all patients belonging to English practices that have consented to linkage. The SES quintile will be calculated on the 
   basis of lower level super output area (LSOA). Townsend scores (also based on LSOA) will be made available for the same patients (as a 
   lookup file), on request. 
 
PAT_STAT: Transferred out period is the time between a patient transferring out and re-registering at the same practice. If the patient has 
   transferred out for a period of more than 1 day, and the transfer is not internal, this value is incremented. 0 means continuous registration, 1 
   means one ‘transferred out period’, 2 means two periods, etc. If the patient only has ‘temporary’ records then this value is set to 99. 
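
The regstat coding described under PAT_STAT can be read with a helper along these lines. It is a minimal sketch assuming regstat has been parsed as an integer; the function name and wording of the returned labels are hypothetical.

```python
TEMPORARY_ONLY = 99


def describe_regstat(regstat: int) -> str:
    """Turn a regstat value into a short human-readable description."""
    if regstat == TEMPORARY_ONLY:
        return "temporary records only"
    if regstat == 0:
        return "continuous registration"
    return f"{regstat} transferred out period(s)"


print(describe_regstat(0))   # continuous registration
print(describe_regstat(2))   # 2 transferred out period(s)
print(describe_regstat(99))  # temporary records only
```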
 
PAT_GAP: Number of days between patient’s transferred out date and re-registration date for the patient’s ‘transferred out periods’, regardless 
  of whether the transfer was internal or not. 
 
GEN_SDC: The date in dd/mm/yyyy format can be obtained as follows: 
          0 = An invalid/missing date 
          2 = A date later than 31/12/2014 
          3 = A date earlier than 01/01/1800 
   All other values = the number of days between the date and 31/12/2014, plus an offset of 10. 
    
   Example: A value of 4027 decodes to the date 01/01/2004, because 4027 – 10 = 4017, and 4017 days before 31/12/2014 is 01/01/2004. 
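
The GEN_SDC decoding can be sketched in a few lines. The function and constant names are hypothetical, and the sketch returns None for the three special codes rather than distinguishing between them. It applies the formula to every other value, as the rule above states.

```python
from datetime import date, timedelta
from typing import Optional

REFERENCE_DATE = date(2014, 12, 31)
OFFSET = 10

# Codes that do not represent an actual date.
SPECIAL_VALUES = {
    0: "invalid or missing date",
    2: "date later than 31/12/2014",
    3: "date earlier than 01/01/1800",
}


def decode_gen_sdc(value: int) -> Optional[date]:
    """Decode a GEN_SDC value into a date, or None for a special code."""
    if value in SPECIAL_VALUES:
        return None
    # Remove the offset, then count that many days back from 31/12/2014.
    return REFERENCE_DATE - timedelta(days=value - OFFSET)


print(decode_gen_sdc(4027).strftime("%d/%m/%Y"))  # 01/01/2004
```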
 
