Placing Knowledge on Center Stage

In this post I survey the evolution of various programming paradigms and emphasize the importance of expressing and isolating domain knowledge. Paradigms such as OOP, AOP, DCI, DDD, and Hexagonal can be regarded as sharing a central goal: facilitating the representation of knowledge while supporting integration with technical components. Finally, I introduce a knowledge-driven architecture by Jeff Zhuk.

The vast majority of computing today can be decomposed into operations of a Turing machine. In contrast, the vast majority of humans think in terms of concepts far beyond symbols on a tape. Perhaps, as alluded to by Douglas Hofstadter in Gödel, Escher, Bach, consciousness is merely an illusion established by a balance between self-knowledge and self-ignorance. Self-knowledge is the extent to which we are aware of our thoughts and are able to trace the actions of our mind. Self-ignorance covers the subconscious as well as all of the functions of the central nervous system. Given a thought, we can likely factor it into constituent propositions and statements, which themselves may be further factored. On the other hand, we can’t feel the firing of the underlying neurons or the operations of the cerebral cortex.

Natural characteristics of the brain and mind are in turn reflected in the architectures of computing devices. The CPU performs very basic arithmetical and logical operations, the fundamental principles of which have remained unchanged since its inception. The software which the CPU ultimately runs, however, is far more complex than those basic operations. Programming languages and the practice of software engineering have been devised to bridge this gap. Yet today, many years after the first CPU and the first program, there remains an ongoing battle between the lower-level forces of hardware and the higher-level forces of domain knowledge.

Imperative & Declarative

The battle between man and machine is fittingly illustrated by the contrast between imperative languages and declarative languages. Imperative languages can be thought of as bottom-up abstractions over the underlying hardware. Declarative languages, on the other hand, are top-down: they represent information and leave it up to the language compiler to translate and convey this information to the underlying hardware. Functional languages in particular are declarative because they are, in essence, implementations of the lambda calculus on top of a Turing-style machine. In terms of practical utility, imperative languages have been winning the battle, as evidenced by the predominance of C decades after its creation. The bare-bones simplicity of C and its proximity to the underlying machine are part of the reason for its continued relevance. What this indicates, however, is that programming language technology has yet to attain the level of abstraction and expressive power needed to make something like C less relevant.
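To make the contrast concrete, here is a minimal sketch in Python, which supports both styles. The task and the function names are made up purely for illustration. The first version spells out how to compute the result step by step; the second states what the result is and leaves the iteration to the language.

```python
from typing import List


def sum_of_even_squares_imperative(numbers: List[int]) -> int:
    """Bottom-up: loop, test, and mutate an accumulator, one step at a time."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total


def sum_of_even_squares_declarative(numbers: List[int]) -> int:
    """Top-down: describe the result as the sum of the squares of the even numbers."""
    return sum(n * n for n in numbers if n % 2 == 0)
```

Both functions compute the same value. The difference lies in how much of the machine's bookkeeping (the counter, the accumulator, the order of steps) shows through in the text of the program.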

Object-oriented

Regardless of the continued prevalence of lower-level languages, ambitious attempts at elevating abstraction can provide valuable insight. Take for instance object-oriented programming. Douglas Engelbart envisioned the computer as an extension of the human mind, and OOP can be regarded as the incarnation of his vision. Today, OOP is a predominant programming paradigm. The problem is that the promise of objects’ capacity to capture the end user’s mental model can be deceptive. In the context of GUIs, objects serve well in representing the domain. However, for other domains, especially ones based on reality such as LOB applications, OOP’s weaknesses in expressing collaboration can become a notable design and modeling hindrance. OOP can also be somewhat misleading because a class can rarely represent its counterpart in reality to the full extent. For example, a bank account class in an ATM application may model state to represent the available balance and expose behavior for adjusting the balance while protecting invariants. This, however, represents a small fraction of the functionality required to perform a withdrawal, which also entails aspects such as transactions, server connections, etc. The ATM withdrawal example is drawn from an article on DCI architecture, which provides a framework for expressing collaborations between objects based on roles.
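The bank account class described above might look roughly like the following sketch. The names (Account, InsufficientFunds) and the choice to store the balance as whole cents are assumptions made for illustration; they are not taken from the DCI article.

```python
from dataclasses import dataclass


class InsufficientFunds(Exception):
    """Raised when a withdrawal would overdraw the account (hypothetical name)."""


@dataclass
class Account:
    balance: int = 0  # available balance in cents (an assumption of this sketch)

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit amount must be positive")
        self.balance += amount

    def withdraw(self, amount: int) -> None:
        # Invariant protected by the class: the balance never goes negative.
        if amount <= 0:
            raise ValueError("withdrawal amount must be positive")
        if amount > self.balance:
            raise InsufficientFunds("withdrawal exceeds available balance")
        self.balance -= amount
```

Note what is absent: nothing in this class knows about transactions, server connections, or the other participants in a real ATM withdrawal. The class captures the account's state and invariants, but not the collaboration.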

DCI, Hexagonal and Domain-Driven

The DCI architecture facilitates explicit representation of domain knowledge by providing a tailored language of expression in the form of an OOP-based framework. DCI was devised in order to compensate for the lack of behavioral expressiveness in traditional OOP. Not surprisingly, similar instances of domain knowledge emphasis abound. An age-old mantra in software engineering is the segregation of business logic from presentation and infrastructure logic. This segregation is beneficial not only due to the advantages of traditional layering but also due to the emergent isolation of domain knowledge. Alistair Cockburn’s Hexagonal Architecture builds upon this idea and applies it at an architectural level. Domain knowledge is placed at the center with infrastructure components adapting to it. In a sense, knowledge “ripples” from the core throughout components which integrate this knowledge with infrastructure. Another prominent example of knowledge isolation is Domain-Driven Design. A fundamental premise of DDD is placing focus on the core domain, that is, on domain knowledge. The intent is to capture the informational core of the business problem. The remaining components of a working system, while being absolutely essential, are supporting in nature. In retrospect, all of this makes a great deal of sense. After all, computers were designed to solve human problems, not the other way around.
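As a rough illustration of domain knowledge at the center with infrastructure adapting to it, consider the sketch below. It is one possible reading of the hexagonal idea, not Cockburn's own code, and every name in it (AccountStore, withdraw, InMemoryAccountStore) is hypothetical. The domain rule depends only on a port defined in domain terms; the adapter conforms to that port.

```python
from typing import Dict, Protocol


class AccountStore(Protocol):
    """Port: what the domain needs from infrastructure, stated in domain terms."""

    def balance_of(self, account_id: str) -> int: ...

    def set_balance(self, account_id: str, balance: int) -> None: ...


def withdraw(store: AccountStore, account_id: str, amount: int) -> int:
    """Domain rule at the core; it has no idea how balances are stored."""
    balance = store.balance_of(account_id)
    if amount <= 0 or amount > balance:
        raise ValueError("withdrawal must be positive and covered by the balance")
    new_balance = balance - amount
    store.set_balance(account_id, new_balance)
    return new_balance


class InMemoryAccountStore:
    """Adapter: one interchangeable piece of infrastructure behind the port."""

    def __init__(self, balances: Dict[str, int]) -> None:
        self._balances = dict(balances)

    def balance_of(self, account_id: str) -> int:
        return self._balances[account_id]

    def set_balance(self, account_id: str, balance: int) -> None:
        self._balances[account_id] = balance
```

Swapping the in-memory adapter for one backed by a database or a remote service would leave withdraw untouched, which is the point: the knowledge stays put and the infrastructure moves around it.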

Aspect-oriented

Aspect-oriented programming introduces new mechanisms of composition, partitioning and encapsulation through the notion of a concern. Concerns contain pieces of domain knowledge and the facilities provided by AOP enable composition of concerns and associated behaviors. As a whole, the aspect-oriented paradigm establishes an informational topology wherein knowledge propagates from the core domain out to supporting components. Much like the other paradigms, this type of topology is effective due to its positioning of domain knowledge at the center.
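Python has no built-in aspect weaver, but a decorator can approximate the idea of composing a separately written concern with core behavior. The sketch below is only an approximation of AOP, and the names (audited, audit_log, apply_fee) are invented for illustration. The auditing concern is written once and attached to a domain function without changing that function's body.

```python
import functools
from typing import Any, Callable, List

audit_log: List[str] = []  # where the auditing concern records its observations


def audited(func: Callable[..., Any]) -> Callable[..., Any]:
    """A cross-cutting concern, kept apart from the domain logic it wraps."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        audit_log.append(f"call {func.__name__}{args}")
        result = func(*args, **kwargs)
        audit_log.append(f"return {func.__name__} -> {result}")
        return result

    return wrapper


@audited
def apply_fee(balance: int, fee: int) -> int:
    """Core domain behavior: contains no mention of auditing."""
    return balance - fee
```

The domain function reads as pure domain knowledge, while the concern is composed onto it from the outside.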

Rediscovering the I in IT

Despite significant advances in programming language theory and software architecture, the I in IT is all too often overshadowed by the T. Reg Braithwaite portrays this phenomenon in “Economizing can be penny-wise and pound foolish” by coloring code to depict the signal-to-noise ratio. Green code is code that directly expresses the problem at hand. Yellow code represents the accidental complexity of a programming language. Red code has no identifiable function. The goal then is to eliminate red code, reduce yellow code and emphasize green code.

How can we get there? Where are the weakest links? To some extent the issue is driven by the fact that programming languages carry a double burden. On one hand, a programming language is a place to organize one’s thoughts and express domain knowledge. On the other hand, a program must be compiled or interpreted and ultimately converted into a series of elementary memory manipulation instructions. As such, programming languages must be expressive yet simple to use, unambiguous and preferably verifiable. Expressiveness, simplicity and verifiability are a tough bunch to reconcile.

Formal Techniques

Systems such as Hoare logic, algebraic specification languages and denotational semantics are powerful formal verification methods, but they demand a great deal of sophistication on the part of the programmer and are often impractical as a result. Type systems encompass formal methods which are sufficiently tractable to be widely applicable, yet mainstream programming languages usually support only the tip of the iceberg of the theoretical capabilities. For instance, algebraic data types such as discriminated unions and the associated pattern matching techniques are powerful mechanisms for expressing domain knowledge. Yet these techniques have historically been unavailable, or available only in limited form, in mainstream OOP languages such as Java and C#. They are available in functional languages such as F#, but even most functional languages don’t support higher-order techniques such as the polymorphic lambda calculus, System F. This seemingly relentless friction leads to some concerning questions. Are modern programming languages approaching the boundaries of the balance between power and accessibility? Will programmers need to embrace more advanced formal techniques in order to advance the state of the art?

Knowledge-Driven Architectures

All of the above-mentioned paradigms share a common goal of facilitating the conversation between humans and computers. Semantic architectures embody yet another approach to distilling knowledge in software systems. Semantic architectures involve technologies and practices such as ontological engineering, the semantic web and the Web Ontology Language (OWL). These relatively new fields of computer science evolved from the observation that domain knowledge is the most important aspect of a computer system. In order to be practical, knowledge representation schemes should allow not only for expressiveness but also for seamless integration with the infrastructure. Ontology languages such as CycL aim to provide such environments. In IT of the Future: Semantic Cloud Architecture, Jeff Zhuk outlines a transition from existing service-oriented architectures to novel knowledge-driven architectures. Knowledge-driven architectures aim to align business and IT and eliminate duplication of knowledge. In this way, they are an evolution of the SOA vision.
