Friday, May 27, 2022

English is Purple and One Dimensional

and why "Source Code" is a terrible place to put Software

Everyone agrees that a Picture is Worth a Thousand Words, but have you really stopped to think about why? 

An obvious reason is that pictures are 2 dimensional, and can make use of all the colors of the rainbow, while if I were to assign English a color and a shape it would have to be strictly Purple and quite Linear in nature.  Allow me to explain... 

It is purple because while we think of English as a Primary Color, English is in fact composed of two completely separate and distinct things, just like purple is actually a mixture of blue and red. 

What I mean is that English can be decomposed into two distinct parts: a Blue Abstract Model representing the underlying facts and data about the idea being expressed, and a Red Algorithm that, given the Blue Abstract Model as an input, interprets the meaning embedded in those facts and generates the purple English that we human beings want to read.  We'll dig into exactly how this works below.

Additionally, the shape of an English document is very linear and one dimensional in its nature.  In other words, when we read a book we read all the words in a big long line of words, starting with the first word, and then the next word, and the next, sequentially until "The End." 

We can read faster or slower.  We can skip over parts and move forward and backwards along the line - but in general, the meaning of language is conferred through its vocabulary, syntax and grammar, rather than through its physical location on the page or within a book.

These two facts about English, and about language in general, are what make it such a terrible place to put software, technical systems, protocols, and complex ideas in general.

Instead, the English technical specification that human beings need and expect for any given project should actually be a report, written against an abstract model of the idea - which can be shared amongst all languages, including English, French, German, Spanish, C#, Python, SQL, etc.  All of these languages are purple, and one dimensional, and they are not a good format to assign as the "source" for complex ideas.

What's the Alternative?

At this point, you're probably asking... "Well, that's great EJ - but what's the alternative?", and that's a completely reasonable question.

While we might start with a Purple English description of an idea, literally the next step should be to extract a Blue, Data-Based, Abstract Model of the idea from the purple language (as described below), one that includes enough fidelity to unambiguously represent the strict facts about the underlying concepts.

Step 3 is then to validate that we have successfully accomplished this by creating a Red Algorithm (basically a "report") that reassembles the Blue Model back into the original Purple English.  Doing this proves, by its very existence, that the model has enough internal detail to accurately represent the original idea.
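
As a minimal sketch of this round trip, assuming a hypothetical one-sentence idea and invented field names (applicant, job), the Python below holds the facts in a plain dictionary (the Blue Model), renders them with a small function (the Red Algorithm), and checks that the result matches the original Purple sentence.

```python
# Purple: the original English sentence (a hypothetical example).
original_english = "Mary is applying to work as a Graphic Designer."

# Blue: the bare facts, with no grammar or word order attached.
# The field names are assumptions made for this sketch.
blue_model = {"applicant": "Mary", "job": "Graphic Designer"}


def red_english(model: dict) -> str:
    """Red: a report that reassembles the facts into English."""
    return f"{model['applicant']} is applying to work as a {model['job']}."


# If the report reproduces the original exactly, the model captured
# every fact this particular sentence depends on.
round_trip_ok = red_english(blue_model) == original_english
print(round_trip_ok)
```

A mismatch here would suggest that the model is missing a fact, or that the report is adding one that the model does not contain.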

The key is that by pulling apart the Purple into these two separate parts, a different "report", i.e. a different Red Algorithm, can take the same underlying facts from the model - but construct a longer, possibly more detailed report about the idea... possibly in French, or German.  It would use the same underlying facts.  The same underlying elements and details.  The same information being communicated - but arranged into a different syntax and grammar - and possibly using a different vocabulary, i.e. possibly a completely different language.

Each of these different reports, however, hopefully represents exactly the same idea - the same underlying facts - which should be language independent.  Truth should be language independent.
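
To make this concrete, here is a small sketch in Python, using made-up facts and hypothetical function names, in which three different Red Algorithms read the same Blue Model: a short English report, a longer English report, and a French one.  Only the wording changes; the facts stay put.

```python
# Blue model: hypothetical facts, invented for this sketch.
blue_model = {"applicant": "Mary", "job": "Graphic Designer", "manager": "Ellen"}

# A tiny, assumed translation table for job titles.
FRENCH_JOBS = {"Graphic Designer": "graphiste"}


def short_english(m: dict) -> str:
    """A terse English report."""
    return f"{m['applicant']} applied for {m['job']}."


def long_english(m: dict) -> str:
    """A longer English report built from the same facts."""
    return (
        f"{m['applicant']} has applied for the {m['job']} position, "
        f"and {m['manager']} will review the application."
    )


def french(m: dict, job_names: dict) -> str:
    """A French report. Vocabulary is looked up rather than stored in
    the model, so the model itself stays language independent."""
    return f"{m['applicant']} a postulé au poste de {job_names[m['job']]}."


reports = [
    short_english(blue_model),
    long_english(blue_model),
    french(blue_model, FRENCH_JOBS),
]
for line in reports:
    print(line)
```

Adding German would mean writing one more report function and one more vocabulary table, without touching the model.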

What is "Truth"

Answering this question may seem academic, but having a shared understanding of what makes something "true" is essential for meaningfully differentiating between a Linguistic Description of a system and a Blue Model or "Digital Twin" of the system as described below.

The best description I've heard for the definition of truth is that...

Something is True if it Comports with "Reality".

The problem with this definition is that of course everyone creates their own version of reality. 

So, while trying to communicate an idea with Bob, Alice will say some words, which hopefully have the same meaning to Bob as they do to her.  And then, based on Bob's understanding of Alice's purple words, he will say words back to her to convey what he thinks she meant. 

And if Bob seems to be thinking of the same thing as Alice, then she may agree.  And if not, she might disagree and say more purple words back to Bob, in order to attempt to update his understanding to more accurately match her understanding of the idea that they're trying to communicate about. 

But this is all clearly an exercise in futility, even when both parties are speaking the same language, because "reality" is completely Subjective.  And all of this becomes dramatically more difficult when the parties are speaking different languages, like English and Python or C#, for example.

By contrast, the blue model described above can actually serve as a "digital twin" (more below) of the idea - and then all we each have to do is agree that it accurately represents our understanding of the idea in question. 

This opens the door to simply defining this "digital twin" as being "Reality" - at which point the truthiness of literally any linguistic statement can be Objectively Tested by simply checking if it "comports with reality" - where reality is defined as our Digital Twin. 
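
One way to picture this test, sketched in Python under the assumption that a linguistic statement has already been reduced to a (subject, attribute, value) claim, is a function that looks the claim up in the twin.  The twin's contents and the function name comports_with_reality are hypothetical, chosen only for illustration.

```python
# The agreed-upon digital twin: hypothetical facts for this sketch.
digital_twin = {
    "Mary": {"job": "Graphic Designer"},
    "Bob": {"job": "Sales"},
}

# Sentinel so that a missing fact never equals any claimed value.
_MISSING = object()


def comports_with_reality(twin: dict, subject: str, attribute: str, value) -> bool:
    """Return True only if the twin records exactly this fact."""
    return twin.get(subject, {}).get(attribute, _MISSING) == value


# A claim the twin supports.
print(comports_with_reality(digital_twin, "Mary", "job", "Graphic Designer"))
# A claim the twin contradicts.
print(comports_with_reality(digital_twin, "Bob", "job", "Graphic Designer"))
# A claim about someone the twin does not describe at all.
print(comports_with_reality(digital_twin, "Juanita", "job", "Sales"))
```

Turning free-form English into such claims is the hard part and is not shown here; the sketch only covers the final comparison against the twin.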

Creating a "Digital Twin" for the underlying idea

Language's Purple and One Dimensional nature makes it an undesirable candidate to be the "Source" encoding of complex ideas.  It is so inefficient at communicating complexity because all it can do is dance around an idea, relying on the parties consuming the idea to share the same understanding of every word and inference of the language being used.

The Blue Model described above, by contrast, is not a linguistic description of the idea.  It is not a language at all, in fact.  Instead, it is a digital instantiation of the idea.  A digital example of the idea that really serves as the platonic ideal of the idea that we are ultimately trying to capture with Language, and as a result, provides us with the opportunity to agree on a shared "reality". 

This "digital twin" can be created in virtually any no-code tool, from databases, to spreadsheets, to no-code services like Airtable, Tray.io, Bubble.io or others.  The only requirement is that it is not "code".  That is, it holds just the decisions about how a system should behave, and it should generally be exportable to JSON, XML, CSV or a similar data-based, non-linguistic format.
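
For instance, a twin kept as plain rows of data can be written out as JSON or CSV with Python's standard library alone.  The rows and column names below are invented for this sketch; a real export from Airtable or a spreadsheet would have its own shape.

```python
import csv
import io
import json

# Hypothetical rows; in practice these would come from the no-code tool.
rows = [
    {"applicant": "Mary", "job": "Graphic Designer"},
    {"applicant": "Bob", "job": "Sales"},
]

# JSON export: the structure of the data is the whole content.
as_json = json.dumps(rows, indent=2)

# CSV export of the same rows, written to an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["applicant", "job"])
writer.writeheader()
writer.writerows(rows)
as_csv = buffer.getvalue()

print(as_json)
print(as_csv)
```

Neither output contains any grammar or sentence structure, which is the sense in which the format is data-based rather than linguistic.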

This multi-dimensional data structure literally forms a physical picture in space.  Not a description, but a digital instantiation of the idea being discussed.

With this digital twin in hand, all the project stakeholders can agree that it is an accurate representation of the idea.  No words are needed.  They can simply look at the Digital Twin - and if it looks, and acts, and behaves as expected - everyone can give a simple thumbs up or a thumbs down. 

You could literally have 10 people approve the Digital Twin - and they could each literally speak a different language - and never actually communicate with each other directly in any way.  Instead, they all simply look at the model and give a thumbs up or a thumbs down.

Once everyone involved agrees, at an abstract level, that the digital twin accurately represents the system, protocol, or software that we are trying to actually build - everything else gets dramatically easier.

Mail Merge

Think of a simple mail merge, where we want to send the following email to Mary, who is applying to work as a Graphic Designer, along with Bob and Juanita, who are applying to work in Sales.  

Dear Mary,
Thank you for your recent application to work with us as a Graphic Designer.  We will review your application and be in touch shortly.  Sincerely, HR Manager Ellen.

So while we want the same basic content within each of the emails, we actually need 3 different versions of the email, each one including candidate-specific details like their name and job.  And this email is just one of many different things that we need to do with this list of candidates.

We could do this by creating an email to Mary, and then copying and pasting it twice more, replacing Mary with Bob and Graphic Designer with Salesperson - and this is largely how software gets written, even in 2022.

Instead, however, we could also take the lessons learned in the 1980s, and split that 3-page purple content into its two constituent parts. 

A list of candidates, along with the job they are applying for (a blue model)

Applicant  Phone     Job               Address...