Meta records for prosecution related statistics

HM Crown Prosecution Service Inspectorate did not have the information requested.

Dear HM Crown Prosecution Service Inspectorate,

I am interested in knowing the patterns of prosecution of all the various offences prosecuted by CPS; the effects of different sentencing decisions on recidivism; differences in sentencing for pleas vs convictions; changes over time; differences in prosecution rates and outcomes by locality; rates of successful prosecutions depending on various factors; etc.

My research interests include cross-linking and complex analyses such as "recidivism by charge and sentence, controlled for all other factors", "ultimate outcome by initial charge", "sentence severity by post code and ethnicity, controlled for charge severity", "prosecutorial/judicial decisions by economic efficiency / net societal benefit", "likelihood of pleading guilty by pre-conviction detention duration, controlled for likelihood of conviction at trial based on all other factors", "primary (eigenvector/regression) factors for likelihood of plea/conviction/sentence/recidivism", "categories of LiP cases by greatest difference between LiP conviction rate/sentence vs likely conviction/sentence if represented by counsel", "effectiveness of prosecution/defence counsel controlled for difficulty of case", "effective causes of general deterrence / specific deterrence / rehabilitation / restoration", etc.

# Basic approach of this request

I would like to obtain, in a later request, the raw data necessary to independently make such analyses. These analyses are broadly similar to, but a superset of and not covered by, the many statistical publications by the Government and Judiciary relating to offences.

I am first making this request under the FOIA to understand what records you have that would be most relevant, obtainable, and useful to my research (and in what formats, with what searchability / redactability / aggregability, with what redundancy with data already publicly available, etc.), so that I can later make a separate FOI request for the actual data.

# Intended database to be compiled by me

Ideally, as an outcome of my later request — including requests to other entities for related or linkable data — I would like to be able to take data from you and others and use it to synthesise, by my own additional efforts, a fully structured database amenable to analysis which can answer my research questions above (among others).

My intention is to make my own resulting work freely available to the public.

At present, I expect I would need something roughly like the following:

* for each case brought by CPS (i.e. ever seen by a court or defendant, regardless of later changes), to the entire extent records exist in reasonably accessible form, excluding cases that are pending:
- the associated case ID numbers
- the date first brought
- the CPS organisational subdivision responsible (e.g. by locality, specialty, etc)
- the firm, name & bar number of each prosecution and defence counsel
- the trial and sentencing court(s), and names of judicial officer(s), before whom it was brought (these may differ, e.g. where a magistrates' court holds the trial but commits the case to the Crown Court for sentence)
- for each defendant:
-- non-personally-identifying demographic details (e.g. their age, gender, ethnicity, immigration status, first part of post code, etc. at the time of the offence) and any other statistically aggregable details (e.g. not an English speaker, drug addiction, mental health issues, homeless, etc.)
-- some kind of uniquely identifying information (like a hashed identification number) that would enable connecting different cases involving the same defendant while minimising personally identifying data (like name and address), which may be subject to restrictions or should be anonymised due to a conviction being "spent"
- the full list of specific charges (i.e. the most specific formal definition, like a sub-sub-paragraph of a particular act, or particular common law offence, as it would appear in a charging document), including any offence whatsoever (e.g. fines or other minor cases; summary-only, either-way, or indictable-only offences), and for each charge:
-- if not in the first charging document, the date first brought in this case (e.g. a superseding indictment)
-- the list of aggravating or mitigating factors alleged which are amenable to categorisation or discretisation (i.e. primarily any category or list entry set forth in the Equal Treatment Bench Book or Sentencing Council Guidelines' sections on culpability, harm/severity, aggravating/mitigating factors list, etc.), preferably using a standardised code such as exact statutory reference, culpability × harm category (e.g. "A2"), etc.
-- the sentence initially sought by prosecution (including whether concurrent or consecutive, etc.)
-- the post code in which the offence was alleged to have occurred
-- the dates of trial and of sentencing (if any)
-- the outcome of this charge (e.g. judgment in default, dropped due to plea, pled guilty, pled not guilty found not guilty, pled not guilty found guilty, pled not guilty found no case to answer [e.g. on motion to dismiss mid or pre trial], nolle prosequi, Mental Health Act committal, deportation, etc.)
-- the culpability/harm category, aggravating/mitigating factors, etc. (as above), as determined by the court
-- the actual sentence or other terms imposed by court (including e.g. bail amount and conditions; deferred or suspended sentence; early plea discount; sentencing conditions; custodial sentence duration and terms; various forms of discharge and binding over; various forms of community service, ASBO/CBO, restitution, victim compensation, programme participation, and other non custodial sentences; etc.)
-- the actual sentence served on this charge (e.g. due to early release on parole, for good behaviour, etc.)
-- if initial outcomes involved something whose outcome could only be known later (e.g. conditional bail, deferred or suspended sentence, rehabilitation programmes, etc.), the ultimate outcome of that (e.g. remand to custody for violation of conditions, successful or unsuccessful completion of a rehabilitation programme, completing the suspended period without further offences so that the sentence is never activated, the offence becoming spent, etc.)
- whether this case was appealed, and if yes, with what outcome
- the gross costs incurred in prosecuting this case (separate from HMCTS or Judiciary costs), and any offsets that change the net cost of prosecution (e.g. as a result of orders requiring the defendant to pay costs or restitution)
… etc.

# Considerations as to your data

I expect that the information I seek is likely to not be found in a single database, but rather scattered across several different ones, with some codes by which they can be linked. For instance, a lawyer on a case may be linked by bar number, with a separate database of firm affiliation; a charge may be listed by code, which needs to be cross-referenced in a lookup table; an appeal may be in a different database, linked by a reference to the lower court case ID; recidivism would likely need to be found by linking cases via a defendant identification number; rehabilitation programme participation and outcomes may be linked to an inmate number rather than an identifier that can link different cases; etc.

I also expect that some of this information may be only found in records that are not amenable or available to statistical analysis — e.g. paper, sealed, privileged, etc. As my interest is statistical analysis, and I do not wish to contest claims of seal or privilege, these would not be of interest to me.

I would prefer to minimise any data which is or may be subject to a right to privacy, such as the identity of defendants whose conviction has been or may yet be spent, or identity of victims — but will need at least some way to link separate cases involving the same person.

# Request

Therefore, under the FOIA, I respectfully request that you give me a full description of and technical information about all bulk data sources ("relevant data") that
a) are reasonably likely to assist me in compiling, by my own additional efforts, any part of my stated database plan, or
b) are otherwise reasonably relevant to my stated research interests.

This includes any such information available or known to you, regardless of whether you would hold the relevant data. I will follow up separately with whoever does hold it, and would like to have all information that would assist me in doing so. I understand and accept that your information about others' data is not definitive and may be inaccurate — and I request it anyway, as I believe it would be informative.

This also includes any such information about relevant data where the data is already publicly available — both as signposting, and because you likely have additional unpublished information which would make the publicly available data substantially more useful.

# Limitations and clarifications

Please note that this request does not ask for the actual content of any data sources (but, as below, does cover manuals, technical documentation, and similar descriptive information). In this request, I simply want to understand what data is available — with particularity enough to assess its utility for my research, how to limit your costs, how I may need to integrate multiple data sources, etc. In particular, my intent now is *not* to request the underlying raw data; I will request it in future after first understanding what is available and any considerations that would best optimise the effectiveness of that request.

Consistent with the second paragraph of my request, I specifically request that you not delegate this request to other entities if doing so would cause your costs of response to exceed the reasonable limit. If the relevant data is not held by you, please limit your response to your own knowledge or records — though this includes your knowledge/records about others' data.

I particularly request that your response cover, to the best of your knowledge and records you hold,

A. any relevant data which is stored in a bulk-processable database format, including technical formats such as SQL, JSON, CSV, etc., and statistical or research formats such as those created by R, MATLAB, etc.;

B. any relevant data which was compiled for any similar research (e.g. the data actually used as input by the researchers, not just the derivatives thereof which are the outcomes of that research);

C. details of relevant data's database/spreadsheet format, structure, enums, query functionality (for FOIA purposes), content, data quality, coverage, limitations, etc., to assist me in understanding what is obtainable within the reasonable costs limit that would best serve my research interests;

and conditionally:

D. if you know of any other entities that hold relevant data for the purposes of the FOIA (even if you also hold the data):
i. the identity of the entity
ii. what data you believe that entity holds
iii. what you believe that entity names that data, where you believe it would likely be found or filed, and any other information you have that would aid me in requesting (or aid that entity in finding) that data

E. if you do hold relevant data for the purposes of the FOIA, or if you would be a source of any exemption (even if held by another entity):
a description of any concerns or blanket exemptions which would likely be raised against release under the FOIA, including what parts of what data would be affected to what degree (considering e.g. the options of partial release, release only in certain aggregates, release with randomly altered or hashed data so as to protect privacy without altering statistical analyses, etc);

N.b. on this item: I of course do not expect you to commit to any particular decision as to a future request. Rather, I am asking you to assist me in anticipating and understanding any concerns likely to arise, and possible mitigations to allay them.

F. to the extent it would stay within the reasonable costs limit while answering the above:
i. all meta documentation — technical documentation, manuals, data quality assessments, usage notes, data schema, or the like — about the relevant data, particularly documents that are primarily used by researchers, statisticians, or database administrators
ii. if providing the meta documentation requested in the preceding paragraph would exceed the reasonable costs limit, then instead please provide as much information about such meta documentation as you can while staying within the costs limit — e.g. document titles, authoring bodies/teams, linked data sets, number of pages/bytes, format, date/revision, location within your system of records, and the like

# Form and format

If a responsive record is already publicly available online, I request that you please provide a direct link to it and reference to the section (e.g. paragraph #, page #, text snippet) that addresses this.

If it is not, I request that you please provide it in its original, native, electronic format, as stored on your computer system.

# Likely sources of responsive information

I appreciate that this request likely appears complex. However, I do not here actually request a large number of records, nor from a large number of teams.

My expectation is that you maintain only a few databases with relevant data, each of which will have some schema and descriptions as well as a handful of meta documents. I likewise expect that all information I request here is held by only a couple of teams: namely, your research/statistics team, and your IT/database management team. They are also the ones I believe most likely to know about other sources of relevant data.

I expect that other researchers, both internal and external, will have conducted closely related analyses before, and therefore that you likely already have all the requested information collected. What I have requested is what any researcher or statistician would need to know: for properly citing, differentiating, or analysing prior work; assessing whether they can answer a question put to them; discovering new research possibilities; formulating a research plan and writing the sources, methods, & background sections of a publication; merging and normalising raw data into a useful format; compiling and analysing their work; and obtaining peer review.

In short, it is just the basic necessities for anyone beginning a research project in this area.

# CPSI vs CPS

"You" in this request means CPSI, not CPS.

I understand that CPSI and CPS are separate entities. I am also making a substantially identical request to CPS.

This request is specifically made to CPSI, because I believe your core functions as an inspectorate are likely to have resulted in you having compiled, researched, audited, reported upon, or otherwise dealt with relevant data in ways that CPS itself may not have done.

# Closing

As I do not know what relevant data is potentially available, my request is stated broadly. This is primarily a request for your advice and assistance — in making a future FOI request, and in understanding data sets that would be most beneficial to my research. As this is a core public function under s. 16 FOIA, I trust that you will interpret it accordingly.

If responding to this request would exceed your reasonable costs limit despite all the limitations above, please provide a specific proposal for a modification of my request which would not exceed the limit while being most beneficial to my stated objectives (to the best of your judgment) — i.e. if you would otherwise give a denial on the basis of costs, please instead give me a clarifying response to which I can simply say "yes, I agree with your proposed revision" without having to guess at what is or isn't reasonably feasible for you.

Thank you in advance for your time and consideration.

Sincerely,
Sai

PS Sai is my full legal name. I do not use a title.

Info (HMCPSI), HM Crown Prosecution Service Inspectorate

Thank you for contacting HM Crown Prosecution Service Inspectorate
(HMCPSI). Please accept this email as receipt of your correspondence.

Where a response is required, we aim to respond to you within 20 working
days.

We may not respond if your query:
• contains offensive language
• has already been answered in a previous reply to you
• is illegible
• is selling or promoting a product
• is for information only

Privacy notice:
Our privacy notice for how we handle your personal information when you
contact us can be found on: Privacy policy (justiceinspectorates.gov.uk)


Info (HMCPSI), HM Crown Prosecution Service Inspectorate

Dear Mr Sai

Thank you for your information request which we received on 11th April 2023

Your request is being processed as a Freedom of Information request reference 001/23

The FOI Act is a public disclosure regime, not a private regime. This means that any information disclosed under the FOI Act by definition becomes available to the wider public.

There is a 20-working day limit in which we are required to respond to requests.

The deadline for your request is 8th May

Kind Regards
Anisa

Anisa Bega
Business Services
7th Floor, Tower
102 Petty France
London SW1H 9GL

http://www.justiceinspectorates.gov.uk/h...


Info (HMCPSI), HM Crown Prosecution Service Inspectorate

2 Attachments

Dear Sai

Please find attached response to your FOI request.

Thank you

Kris

Kris Cottle
Senior Management Team Support
Business Services
7th Floor, Tower
102 Petty France
London SW1H 9GL

www.justiceinspectorates.gov.uk/hmcpsi

Dear HMCPSI,

Thank you for your response re ref 001/23.

I respectfully request internal review, as your response does not address the substance of my request, and does not fulfil the duty to describe information held by others (see e.g. paragraph 2.12 of the s 45 Code of Practice).

> As part of our duty to provide advice and assistance I can tell you that all data that HMCPSI collects in relation to an inspection, is destroyed on publication of the report.

1. You state that "data that HMCPSI collects" is destroyed. However, that is inapposite to my request.

My request expressly excluded actual underlying data — see the first line of my clarification, "Please note that this request does not ask for the actual content of any data sources". Any underlying data HMCPSI collected would not be within the scope of my request to begin with.

Rather, my request is entirely about "meta records" — i.e. what you know about available data sources, how they can be analysed, how they are organised or structured, how they can be searched, etc.

Please note the first line of my request: "I respectfully request that you give me a full description of and technical information about all bulk data sources ("relevant data")". I asked for *descriptions of and technical information about* relevant data, not the data itself.

I requested that you "cover", i.e. describe, the kind of data listed in clarifications A & B, not that you provide the actual underlying data listed in them. See e.g. clarifications C, E, & F, describing in detail several kinds of meta records which I request — and, by alternative in F(ii), requesting meta-meta records, namely a list of such documentation.

Given that HMCPSI conducts inspections and collects research data on a regular basis, I would expect that quantitative researchers and technical staff in HMCPSI would retain records *about* that data — what kinds of questions can be answered from available data, what data they may need to query, how it can be queried, what is available from where, how it is structured, what extract/transform/load operations need to be run (particularly if data from multiple sources need to be synthesised), data quality issues, schemas, etc. It seems rather implausible to me that your research and technical staff would destroy all such metadata and, every single time they begin an investigation, start from zero knowledge or documentation of what data is available for their research and how it can be used; how to ingest, process, or analyse it; etc.

You did not state that HMCPSI destroys all descriptions of or technical information about relevant data, and your response does not address meta records at all. It misses the substance of my request, and seems to indicate that you fundamentally misconstrued my request.

Therefore, I respectfully request that you please address directly whether or not you hold any "description of" or "technical information about" any "relevant data", as described in my request, and, if you do hold such "meta records", that you please provide them.

2. Is any data about reports prior to publication available under the FOIA?

For instance, your link references four inspections "in progress":
* Serious Fraud Office (SFO) Case Progression follow-up inspection, https://www.justiceinspectorates.gov.uk/...
* Joint inspection on meeting the needs of victims in the CJS, https://www.justiceinspectorates.gov.uk/...
* Complaints Handling Inspection, https://www.justiceinspectorates.gov.uk/...
* County lines and the national referral mechanism Inspection, https://www.justiceinspectorates.gov.uk/...

Any data collected by "in progress" inspections would not seem to fall within your statement that "data … is destroyed on publication of the report".

It is data you surely do currently hold — and, given that you are an inspectorate conducting research, you surely also hold meta records about that data, as meta records are a prerequisite to (and necessary byproduct of) conducting any analysis or preparing any report that draws conclusions from that data.

For example, your "in progress" documents seem to state that you hold a number of records which are relevant, quantitative data, or meta records about such data, and about which you are currently conducting analysis:
* SFO:
- "methodology": "the relevant parts of a sample of casework files", "SFO generated case management documentation such as the investigation strategy documents created at the outset and decision logs", "at least one of the five cases will be examined post charge", "relevant supporting processes and systems"
- "resources": "training of legal inspectors in relation to the digital systems, casework processes and recent reviews"
* victims:
- "delays in the CJS": "data includes: Crime recorded to police decision, Police referral to CPS decision to charge and, CPS charge to case completion at court"
- "methodology": "Report to the police, Pre-charge (to include out of court disposals), Post charge, Conviction and sentence, Post- conviction, Pre-release/ release from prison."
- "key activities": "data capture and analysis, and an assessment of data gaps and recording practices; case file sampling;"
* complaints:
- "annex A": "Question set from 2018 inspection" (all questions being quantitative data)

I therefore respectfully request that you please tell me whether any data related to any "in progress" inspections is available under the FOIA (including data from prior inspections, as described in e.g. the complaints document Annex A), and whether you hold any meta records *about* that data (such as is the subject of my request) — and if yes, to the extent that it is relevant to my request, that you please provide such meta records.

> Our inspection reports do include some data that may be useful to your research. Under exemption 21, we do not have to supply this to you as this information is accessible by other means ie on our website and the link is attached. https://www.justiceinspectorates.gov.uk/...

3. Could you please provide more specific reference(s) to the most relevant reports?

The link you provided lists around 430 HMCPSI reports without distinction.

That is a great many, and most seem to be largely irrelevant to my research.

The majority seem to do with high level strategic, logistical, or operational matters, mechanics of how CPS representatives coordinate with others, and the like — rather than statistics or evidence based analyses of sentencing efficacy, differential treatment or outcomes, etc., as I explained about my research interest. While I have the greatest respect for such important and necessary work, I have no personal interest in the administration of prosecutorial authorities, which seems to be the primary subject of most of your reports.

Likewise, a great deal of your reports are based on purely qualitative data, such as interviews, fieldwork, freeform documents, etc. As I stated in the background section of my request: "I also expect that some of this information may be only found in records that are not amenable or available to statistical analysis — e.g. paper, sealed, privileged, etc. As my interest is statistical analysis, and I do not wish to contest claims of seal or privilege, these would not be of interest to me." Qualitative data such as ad hoc interviews, fieldwork, and the like are not amenable to statistical analysis, and therefore out of scope for my request.

However, some of your collected data are in scope: see e.g. above re the "in progress" complaints inspection, Annex A, which appears to describe quantitative data which you hold and did not destroy, contrary to your response. That annex is itself one example of a meta record, as it describes relevant data amenable to statistical analysis. I expect that you also have more detailed meta records about that data, e.g. schema, non-boolean response options, aggregate statistics and counts, quality assessments, etc. And it cannot be the only time you have relied on quantitative data.

I am interested in quantitative analysis of the criminal justice system from a perspective of judicial decision-making, public interest, causal attribution, outcomes, and the like, as I described. I believe that some of your investigations and reports do address my interests (because they are based on or make reference to quantitative data that would allow me to do my analyses). As to your published reports, I am only interested in that subset.

I cannot feasibly read through several hundred lengthy reports, the great majority of which are irrelevant to me. You know your own work much better than I do, and your research staff know how your research is or was conducted and thus which inspections or reports are most relevant. In particular, you surely employ researchers who specialise in quantitative research, data analysis, or statistics, and those staff could surely identify which of your reports are based on or address the most substantial quantitative bulk data.

See also e.g. my request about such "already published" responses: "If a responsive record is already publicly available online, I request that you please provide a direct link to it and reference to the section (e.g. paragraph #, page #, text snippet) that addresses this." Your link to the list of all reports issued by HMCPSI in the last 25 years, which intermixes largely irrelevant content with some content that is relevant, is not a "direct" link to the responsive record(s).

I therefore respectfully request that you please identify which particular reports are responsive to my request and most relevant to my research interests.

4. Could you please tell me what other authorities you believe are likely to hold relevant data, what data you believe they hold, and any information you have that would aid me in identifying it as the subject of an FOI request?

This was requested in paragraph 2 of my request:
> This includes any such information available or known to you, regardless of whether you would hold the relevant data. I will follow up separately with whoever does hold it, and would like to have all information that would assist me in doing so. I understand and accept that your information about others' data is not definitive and may be inaccurate — and I request it anyway, as I believe it would be informative.

clarified further:
> If the relevant data is not held by you, please limit your response to your own knowledge or records — though this includes your knowledge/records about others' data.

… as well as all paragraphs of my request, clarified further as e.g.:

"D. if you know of any other entities that hold relevant data for the purposes of the FOIA (even if you also hold the data):
i. the identity of the entity
ii. what data you believe that entity holds
iii. what you believe that entity names that data, where you believe it would likely be found or filed, and any other information you have that would aid me in requesting (or aid that entity in finding) that data"

… to which you have not responded.

You acknowledge that HMCPSI does "collect" data, so you must know what data is available and from whom — i.e. at minimum, you hold the information described in clarification D (and pragmatically, you must also hold the information described in clarifications C, E, & F, as addressed above).

I therefore respectfully request that you please provide specific information about relevant data held by other entities, as I requested.

Please let me know if I can be of any assistance in further clarifying the purpose or scope of my request about meta records — particularly if you have any inclination to interpret this as requesting any underlying data that you have collected, as that is precisely what this request is *not* about.

Sincerely,
Sai

Info (HMCPSI), HM Crown Prosecution Service Inspectorate

1 Attachment

Dear Sai

Please see the attached response to your request for an internal review.

Kind regards

Carmel Vega
