CLML (Crown Legislation Markup Language)

We manage legislation texts as data in XML, using the Crown Legislation Markup Language (CLML). This is the most accurate data we hold, and it can be transformed into a variety of other formats. The base format for most legislation content on the website is XML conforming to the CLML schema.

You can access the XML data on the website via the legislation API by adding "/data.xml" to any legislation content page URI. For example, the XML version of the page http://www.legislation.gov.uk/ukpga/2010/1 is http://www.legislation.gov.uk/ukpga/2010/1/data.xml
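The URI rule above can be sketched in Python. This is a minimal illustration, not an official client: the function names data_uri and fetch are hypothetical, and fetch assumes the service answers a plain HTTP GET without authentication.

```python
from urllib.request import urlopen


def data_uri(page_uri: str, suffix: str = "data.xml") -> str:
    """Append a data suffix such as 'data.xml' to a legislation page URI."""
    # Drop any trailing slash so the result never contains '//' before the suffix.
    return page_uri.rstrip("/") + "/" + suffix


def fetch(uri: str, timeout: float = 30.0) -> bytes:
    """Download the raw bytes at `uri` (assumes a plain, unauthenticated GET)."""
    with urlopen(uri, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    uri = data_uri("http://www.legislation.gov.uk/ukpga/2010/1")
    print(uri)
    # xml_bytes = fetch(uri)  # uncomment to download the document
```

The download call is left commented out so that the sketch can be run without network access.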

The CLML mark-up standard was designed to represent UK legislation in XML and is owned and maintained by The National Archives. As a result, this format contains the most complete semantic information. However, as it needs to represent a huge variety of legislation from the 1200s to the modern day, it is quite complex.

This format will be most useful to you if you want the most complete semantic information available for data analysis. CLML contains very little information relating to presentation of text. You can access the CLML schema online at http://www.legislation.gov.uk/schema/legislation.xsd
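As a starting point for data analysis, it can help to see which elements a document actually uses before reading the full schema. The sketch below counts element names using Python's standard library. The sample document is a simplified, hand-written stand-in with a placeholder namespace, not real CLML, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written stand-in for illustration only; real CLML is far richer
# and uses its own namespace and element names.
SAMPLE = """<Legislation xmlns="urn:example:placeholder">
  <Body>
    <Section><Title>First</Title><Text>Some text.</Text></Section>
    <Section><Title>Second</Title><Text>More text.</Text></Section>
  </Body>
</Legislation>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def element_counts(xml_text: str) -> Counter:
    """Count how often each element name occurs in an XML document."""
    root = ET.fromstring(xml_text)
    return Counter(local_name(el.tag) for el in root.iter())


if __name__ == "__main__":
    for name, count in element_counts(SAMPLE).most_common():
        print(name, count)
```

Because the counting ignores namespaces, the same function should work unchanged on a downloaded "/data.xml" file.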

 

HTML5

The HTML5 legislation format is an alternative presentational view to the XHTML version. Legislation data is available from the API or to download in HTML5 or XHTML. For most purposes we recommend using the HTML5 data. There are style sheets available for presenting the texts.

You can access the HTML5 data on the website via the legislation API by adding "/data.html" to any legislation content page URI. For example, the HTML5 version of the page http://www.legislation.gov.uk/ukpga/2010/1 is http://www.legislation.gov.uk/ukpga/2010/1/data.html

This format is a serialisation of the AKN XML. In other words, it’s derived from the AKN XML (rather than the CLML XML) and contains the full AKN XML mark-up in addition to the presentational information needed to display the text.

This format will be most useful to you if you want XML semantic mark-up and presentational information together.

 

Akoma Ntoso

Akoma Ntoso (AKN) is an international legal XML standard that represents format and structure but is not specifically designed for UK legislation. It is very easy for machines to read, and several tools are available to interrogate the data in this format. AKN has recently become an OASIS standard.

You can access the AKN XML data on the website via the legislation API by adding "/data.akn" to any legislation content page URI. For example, the AKN XML version of the page http://www.legislation.gov.uk/ukpga/2010/1 is http://www.legislation.gov.uk/ukpga/2010/1/data.akn

The AKN XML data is derived from the CLML XML but is somewhat simpler than the CLML XML version. However, as the schema was not specifically designed for UK legislation, mark-up for some less common document structures may be more difficult to interpret.

This format will be most useful to you if you want to work with an XML data format but want something a little less complex than CLML, or if you want to relate UK legislation data to that provided by other international legislation publishers who use this format. AKN XML contains fairly rich semantic information but very little presentational information.
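To illustrate working with AKN XML, the sketch below lists section numbers and headings from a small document using Python's standard library. The sample is hand-written and heavily simplified (it omits the metadata block a real document would carry), and it assumes the OASIS Akoma Ntoso 3.0 namespace; check the namespace declared in a downloaded "/data.akn" file before relying on it. The function name section_headings is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the OASIS Akoma Ntoso 3.0 namespace.
AKN_NS = {"akn": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"}

# Hand-written, simplified sample; not real legislation.gov.uk output.
SAMPLE = """<akomaNtoso xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0">
  <act name="example">
    <body>
      <section eId="section-1"><num>1</num><heading>First heading</heading></section>
      <section eId="section-2"><num>2</num><heading>Second heading</heading></section>
    </body>
  </act>
</akomaNtoso>"""


def section_headings(akn_xml: str) -> list[tuple[str, str]]:
    """Return (number, heading) pairs for every section in the document."""
    root = ET.fromstring(akn_xml)
    results = []
    for section in root.iterfind(".//akn:section", AKN_NS):
        # findtext returns the default when the child element is missing.
        num = section.findtext("akn:num", default="", namespaces=AKN_NS)
        heading = section.findtext("akn:heading", default="", namespaces=AKN_NS)
        results.append((num.strip(), heading.strip()))
    return results


if __name__ == "__main__":
    for num, heading in section_headings(SAMPLE):
        print(num, heading)
```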

 

XHTML

XHTML is the default view for legislation content on the www.legislation.gov.uk website, e.g. http://www.legislation.gov.uk/ukpga/2010/1, and it is derived from the CLML XML.

This format will be most useful to you if you want to view or display the data as presented on legislation.gov.uk. XHTML contains the full information required to display legislation text but very little semantic information.

 

Plain Text (Complete)

Plain Text (Complete) versions of legislation are useful if you want to analyse words rather than structure. The “Plain Text (Complete)” legislation format contains only the text of the legislation itself. All XML tagging, metadata, semantic mark-up, formatting and code required to display the data has been removed.

This format will be most useful to you if you want to carry out text analysis of legislation documents and don’t want to process XML or HTML tagging.
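A simple example of this kind of text analysis is a word-frequency count. The sketch below uses only Python's standard library. The sample sentence is invented for illustration, and the tokenising rule (lower-cased runs of letters, with an optional apostrophe) is one assumption among many possible choices.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count lower-cased words, treating any non-letter as a separator."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words)


if __name__ == "__main__":
    # Invented sample text, not taken from any real legislation.
    sample = (
        "The Secretary of State may by order amend this Act. "
        "The order may be revoked."
    )
    for word, count in word_frequencies(sample).most_common(3):
        print(word, count)
```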

 

Plain Text (Operative)

Plain Text (Operative) versions of legislation include only the operative text, that is, the text that is legally binding. This format is useful if you want to analyse words rather than structure.

This format is similar to the "Plain Text (Complete)" format. Whilst the Plain Text (Complete) format contains all the legislation text, this format contains only “operative” text, i.e. text with legal significance. Text that does not have legal value, e.g. section headings and preambles, has been removed.

This format will be most useful to you if you want to carry out text analysis of legislation documents using a Plain Text format that focuses on only legally significant text.

 

PDF

"As enacted" legislation is available to download as print PDFs. This is the version that was officially printed and made available for sale. This format consists of "Original Official Print" PDF versions of legislation documents that were published (as hardcopy) and laid before parliament (UK legislation).

This format provides the best version of legislation for hardcopy printing and most accurately reflects text, formatting and layout of the originally drafted legislation documents.

Newer PDFs will have searchable (extractable) text whilst older PDFs may not. PDFs for very old documents may be images of hardcopy legislation publications produced before the introduction of digital printing.

Website URIs for this PDF format do not follow the usual legislation API URI pattern e.g. http://www.legislation.gov.uk/uksi/2019/1464/pdfs/uksi_20191464_en.pdf