Monday, November 7, 2022

Characteristics of a Great Single Source of Truth

The power and magic of Knowledge Graphs comes from a few essential elements that they must support.  By far the most important of these is queryability, but a close second is that there is only one of them - ideally, literally a single instance in the world.

Some additional attributes that make for a good Single Source of Truth are:

  1. A great Single Source of Truth is, basically, a traditional RDBMS
  2. + 3 simple types of read-only, calculated fields:
    1. Parent Lookup Values - e.g. CustomerTaxRate = customer.TaxRate
    2. Child Aggregations - e.g. LineItemSubTotal = sum(lineItems.SubTotal)
    3. Simple, In-Row Calculations - e.g. Total = CustomerTaxRate * LineItemSubTotal
  3. The ability to export 100% of this data to a JSON, XML, CSV or similar data storage format as a snapshot of the SSoT at any given moment in time.
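As a sketch only, the three calculated-field types in the list above might look like this in code (the Customer / Order / LineItem names and fields are hypothetical, modeled on the examples in the list):

```python
from dataclasses import dataclass, field

@dataclass
class Customer:
    name: str
    tax_rate: float          # stored, normalized value

@dataclass
class LineItem:
    quantity: int
    unit_price: float

    @property
    def sub_total(self) -> float:
        # type 3: simple, in-row calculation
        return self.quantity * self.unit_price

@dataclass
class Order:
    customer: Customer
    line_items: list[LineItem] = field(default_factory=list)

    @property
    def customer_tax_rate(self) -> float:
        # type 1: parent lookup value (Order's parent is Customer)
        return self.customer.tax_rate

    @property
    def line_item_sub_total(self) -> float:
        # type 2: child aggregation
        return sum(li.sub_total for li in self.line_items)

    @property
    def total(self) -> float:
        # type 3: in-row calculation over the two fields above,
        # following the example formula in the list
        return self.customer_tax_rate * self.line_item_sub_total

order = Order(Customer("Acme", 0.10), [LineItem(2, 5.0), LineItem(1, 10.0)])
```

In a real SSoT these values would be stored and maintained by the database itself rather than recomputed on every read, which is what the write-time update requirement below is about.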

Queryability

The main problem with language in general is that it is not "queryable" without an intelligent agent (either human or AI) to parse the "natural language text" into a model that can be queried with semantic precision.

Singleton

This makes the cloud a fantastic place for a Knowledge Graph, because it can act as a Single Source of Truth by construction: anyone on the globe has access to literally a single instance of the top-level graph.

Normalized RDBMS

Being a normalized RDBMS at its heart is what makes the graph queryable, and allows for an N-dimensional structure.

3 simple types of Read-Only, Calculated Fields

In addition to a traditional, normalized RDBMS, a good Single Source of Truth allows for building on top of this N-dimensional "Knowledge Graph" with three additional types of inline, read-only, calculated fields that are updated automatically at write time, for any part of the database on which they depend.  In this way, the basic, normalized structure can be fleshed out with multi-dimensional, related metadata about each and every element in the graph - metadata that automatically updates and maintains itself any time the underlying, normalized part of the data changes, allowing for an always internally consistent representation of literally any normalizable concept or system.
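A minimal sketch of that write-time update behavior, in plain Python (the OrderRow class and its fields are hypothetical; in a real RDBMS this role is usually played by triggers, materialized views, or stored generated columns):

```python
class OrderRow:
    """Stored copies of the calculated fields, refreshed on every write."""

    def __init__(self, tax_rate: float):
        self._tax_rate = tax_rate
        self._line_item_sub_totals: list[float] = []
        self._recalculate()

    def _recalculate(self) -> None:
        # Runs at write time, so every read sees internally consistent values.
        self.line_item_sub_total = sum(self._line_item_sub_totals)
        self.total = self._tax_rate * self.line_item_sub_total

    def add_line_item(self, sub_total: float) -> None:
        self._line_item_sub_totals.append(sub_total)
        self._recalculate()          # any dependent write triggers the refresh

    def set_tax_rate(self, tax_rate: float) -> None:
        self._tax_rate = tax_rate
        self._recalculate()

row = OrderRow(tax_rate=0.10)
row.add_line_item(10.0)
row.add_line_item(10.0)
```

The point of paying the recalculation cost at write time is that reads never observe a stale calculated field, which is exactly the internal-consistency guarantee described above.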

A DMZ of Knowledge

The knowledge graph, then, focuses exclusively on capturing facts as triples (i.e. two nodes plus a directed edge connecting them) in an extremely granular and precise format.  There can be N agents updating this graph - each completely unaware of (and unaffected by) how many agents there are on the other side of the DMZ consuming it.
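A minimal sketch of the triple format and its query side, assuming a toy in-memory graph (the facts and the `query` helper are illustrative only):

```python
Triple = tuple[str, str, str]

# Each fact is (subject, predicate, object): two nodes plus a directed edge.
graph: set[Triple] = {
    ("war_of_1812", "started_in", "1812"),
    ("war_of_1812", "ended_in", "1815"),
    ("war_of_1812", "type", "war"),
}

def query(s=None, p=None, o=None):
    """Return every triple matching the pattern; None acts as a wildcard."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]
```

Any number of agents can append triples to `graph`, and any number of consumers can call `query`, without either side knowing about the other.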

The benefit of the graph is that, like the translation between AI and human - where we don't speak their language at all - it can still communicate its knowledge, unambiguously, by simply responding to any "question" or "query" that we put to it with a JSON blob of data containing facts that unambiguously settle the question being asked.

It will not be a narrative description of the answer, because we already tried that.  We had all our agents (the people) write down everything in their heads and publish that "language" to "the internet", and then we put AI "search engines" in front of all of those documents.  In this model, when we ask a question, the protocol is that the now increasingly intelligent "search agent" returns 10 links to us, where hopefully, somewhere in the first few links, will be some text that may or may not answer our question.

We tried this from the early '90s through ~2015.

When you ask Google "When was the War of 1812?", it does not start parsing the pages of the internet to answer the question.

This is still essentially the model for the left side of a Google results page, but increasingly, the actual answer to your question can be found directly, as a direct answer to the actual question asked, on the right side of the page.  The reason is that the right side of the page is an automated report, built from the JSON blob of actual answers to the given question - the result of querying a Knowledge Graph.
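A rough sketch of that model, assuming a hypothetical JSON blob of facts returned by the graph query and a trivial report renderer:

```python
import json

# Hypothetical JSON blob returned by a knowledge-graph query.
answer_blob = json.dumps({
    "query": "When was the War of 1812?",
    "facts": {
        "started": "June 18, 1812",
        "treaty signed": "December 24, 1814",
    },
})

def render_card(blob: str) -> str:
    """Build the 'answer card' report directly from the JSON facts."""
    data = json.loads(blob)
    lines = [data["query"]]
    lines += [f"  {key}: {value}" for key, value in data["facts"].items()]
    return "\n".join(lines)
```

The renderer never touches documents or links; it only formats facts that the graph has already settled unambiguously.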

Putting it all together

With this powerful framework underlying all of the technology, the Single Source of Truth that lives "in the cloud" can be exported in pieces, so that each consumer need only look at the part of the graph that relates to the part of the problem being addressed - and yet with confidence that each small piece correctly relates to the whole, in the grand scheme of things.
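A minimal sketch of exporting one piece of the graph, assuming a toy set of triples (all names are hypothetical): starting from a single node, walk outward and snapshot only the reachable facts as JSON.

```python
import json

graph = {
    ("order_1", "placed_by", "customer_7"),
    ("order_1", "contains", "item_42"),
    ("customer_7", "tax_rate", "0.10"),
    ("order_2", "placed_by", "customer_9"),   # unrelated; not exported
}

def export_subgraph(start: str) -> str:
    """Snapshot the triples reachable from `start` as a JSON document."""
    seen, frontier, facts = set(), {start}, []
    while frontier:
        node = frontier.pop()
        seen.add(node)
        for s, p, o in graph:
            if s == node:
                facts.append([s, p, o])
                if o not in seen and o not in frontier:
                    frontier.add(o)
    return json.dumps(sorted(facts))
```

Each consumer gets only the slice it needs, but because every slice is cut from the same single instance, it is guaranteed to agree with the whole at the moment of the snapshot.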

This structural representation of the idea makes for a powerful framework on top of which to build cross-platform software that tends to remain internally consistent, because most of its decisions simply follow the upstream knowledge graph.