Handling Complex Text

Legislation, contracts and specifications use Complex Text to pack in a great deal of information, which makes them hard to read. As a result, the people who should read them often don’t, put off by the complexity and the legalese, with potentially disastrous consequences.

The first thing one notices about Complex Text is that it is intimidating. Complex objects are described in the text. References are made to text tens or hundreds of pages back or forward, often using bullets (“see Section 2, paragraph 5.3(a)”). There is probably a glossary. References are made to other documents, which may use different meanings for some of their more technical words, and the document may be frequently revised. And yet you need to be constantly au fait with the whole thing.

Factor in the Four Pieces Limit (see FourPiecesLimit.com for an introduction) and it seems impossible. One way around it is to turn the text into an Active Structure: a machine reads the text and builds the words into a structure with the exact meanings they are intended to have. Many common words in English have multiple meanings: “run” has 82 definitions, “set” has 62 and “on” has 77. That makes it easy to jumble things up if one is not sure about the exact definition, particularly after some time has passed and one is trying to map the legislation onto something that may not have been envisaged when it was written.
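
As a minimal sketch of what fixing exact meanings could look like, the Python below keeps a small sense inventory and ties each word occurrence to exactly one sense. The names `Sense`, `PinnedWord` and `pin`, the sense identifiers and the glosses are all hypothetical illustrations; this is not the Active Structure implementation, and the three senses shown are not taken from any particular dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    """One meaning of a word (identifiers and glosses are illustrative)."""
    word: str
    sense_id: str
    gloss: str


# Hypothetical, tiny sense inventory; a real dictionary has far more entries.
SENSES = {
    "run": [
        Sense("run", "run.move", "move quickly on foot"),
        Sense("run", "run.operate", "operate or manage a business or machine"),
        Sense("run", "run.bank", "a sudden demand by depositors for withdrawals"),
    ],
}


@dataclass(frozen=True)
class PinnedWord:
    """A word occurrence tied to exactly one sense."""
    position: int
    sense: Sense


def pin(word: str, sense_id: str, position: int) -> PinnedWord:
    """Attach one chosen sense to a word occurrence, or fail loudly."""
    for sense in SENSES.get(word, []):
        if sense.sense_id == sense_id:
            return PinnedWord(position, sense)
    raise KeyError(f"no sense {sense_id!r} recorded for {word!r}")


# Pin the occurrence of "run" at (assumed) word position 4 to its banking sense.
occurrence = pin("run", "run.bank", position=4)
print(occurrence.sense.gloss)
```

Once an occurrence is pinned, later readers of the structure see the chosen sense rather than the bare word, so the meaning cannot drift between readings.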

We need to make it clear that there will be some work upfront. A lawyer will latch onto a term and make it their own, while a technologist will latch onto the same term and think it means something else, so people often end up talking past each other.

Did we mention the problem with people of different specialties? An economist and an epidemiologist have very little common vocabulary, and yet they need to come to a consensus to have any hope of persuading a politician. Having a machine which understands the vocabularies of both is a good foundation for that consensus. Getting people to agree at the outset on the meanings of words and groups of words is a good way of avoiding trouble down the track, where people have been pulling in different directions without realising it.

Having a machine read the text and build a structure that embodies the precise meanings of the words and their connections, so as to avoid confusion, would be a good enough reason to do it, but there is another significant benefit. The structure that the machine builds can be activated: the words now function as pieces of machinery.

Here is a fragment of the Anti-Money Laundering and Counter-Terrorism Financing Act, where a complex object is being defined.

The AML/CTF Act describes a particular transaction:

“Same-institution same-person electronic funds transfer instruction” is treated as a wordgroup, in the same way that “ambient temperature” is treated as a wordgroup. Many wordgroups don’t need definitions, as it is sufficient to link them to the precise meanings of the words used. Some, like “same-institution …”, do have definitions, and there are wordgroups in those definitions, like “bank account”, which again has a definition, as does “payer”. The result is that the machine has the definitions of all the pieces of machinery involved, and can use this machinery to simulate exactly what was envisaged in the Act. The structure can be used to check that the Act is complete and consistent, to develop and test strategy on a working model, or, where limited volumes are involved, operationally, to monitor and assess transactions more thoroughly than a program could. The Act uses the word “reckless” (a bank faces large fines if it is reckless). It would be excruciating to try to get a program to implement that faithfully, whereas an Active Structure already has the definition for “reckless”.
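
The way definitions nest inside one another can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `WordGroup` class, the `GLOSSARY` table and both helper functions are hypothetical names, the definition strings are placeholders rather than text quoted from the Act, and the assumption that the definition of “same-institution …” mentions both “bank account” and “payer” is made purely for the example. It is not how Active Structure itself is built.

```python
from dataclasses import dataclass, field
from typing import Optional

TRANSFER = "same-institution same-person electronic funds transfer instruction"


@dataclass
class WordGroup:
    """A group of words treated as a single unit of meaning."""
    name: str
    definition: Optional[str] = None               # placeholder, not the Act's wording
    uses: list[str] = field(default_factory=list)  # wordgroups named in the definition


# Hypothetical glossary; which wordgroups each definition uses is assumed.
GLOSSARY = {
    TRANSFER: WordGroup(TRANSFER, "<definition in the Act>", ["bank account", "payer"]),
    "bank account": WordGroup("bank account", "<definition in the Act>"),
    "payer": WordGroup("payer", "<definition in the Act>"),
    "ambient temperature": WordGroup("ambient temperature"),  # needs no definition
}


def reachable(name: str, glossary: dict[str, WordGroup]) -> list[str]:
    """Every wordgroup needed to understand `name`, in depth-first order."""
    seen: list[str] = []
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against circular definitions
        seen.append(current)
        stack.extend(reversed(glossary[current].uses))
    return seen


def undefined_references(glossary: dict[str, WordGroup]) -> list[str]:
    """Wordgroups a definition mentions that the glossary does not contain."""
    return sorted({used for group in glossary.values()
                   for used in group.uses if used not in glossary})


print(reachable(TRANSFER, GLOSSARY))
print(undefined_references(GLOSSARY))
```

A non-empty result from `undefined_references` would flag a term that a definition relies on but that is never introduced, which is one very simple form of completeness check.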

But What About …

Active Structure is a unique product from one supplier. Machine Learning (ML), Deep Learning (DL), Large Language Models (LLMs) and ChatGPT are in the news. Can’t we use one of them? A piece of legislation is a world unto itself. The definitions it uses create the law, so it must be read as a standalone document. ML, DL, LLMs, etc. use statistics over a large amount of text, with no understanding of what any of the words mean, other than that a word regularly appears in sequence with particular other words. When the legislation is applied to an individual case (this real property, these financial assets, this income stream), it has to be precise in its handling, not smeared by statistics from a large amount of irrelevant text.

Active Structure greatly reduces the chance of getting it wrong.