Abstractions
Abstraction is a fundamental operation of the mind. (According to Wikipedia, abstraction is believed to have developed between 50,000 and 100,000 years ago.) In philosophy, Locke views abstraction as the act of separating ideas from all other ideas which accompany them in their real existence. In computer science, Dijkstra views abstraction as the creation of new semantic levels. Locke’s definition is structural: abstraction removes aspects of an idea, especially the aspects that make it concrete. Dijkstra’s definition emphasizes the goal of abstraction. After all, it doesn’t seem sufficient for abstraction to be the separation of arbitrary aspects from ideas. Instead, it is the chiseling away of specific characteristics that brings value to abstraction. Characteristics are removed, yielding an idea that isolates the interesting components of a less abstract idea. One advantage of this process is generality - propositions that hold for an abstract notion also hold for each more specific notion. Another advantage is encapsulation - references can be made to abstractions instead of concretions, which simplifies the problem space by disregarding unimportant details.
Joel Spolsky, with the Law of Leaky Abstractions, observes that “All non-trivial abstractions, to some degree, are leaky”. This observation alludes to the dark side of abstraction - the side where abstraction is misguided, shortsighted and, of course, leaky. The law stems from the software engineering perspective, but it is applicable to all manifestations of abstraction. In software engineering in particular, the act of abstraction is pervasive - function declarations, class declarations and variable assignments are all acts of abstraction. As such, software engineering also provides ample opportunity for leaky and needless abstractions.
On reflection, abstraction is absolutely fascinating. Without it, one would be hard-pressed to envision human thought, let alone language, mathematics or art. Abstraction breeds generalization, and these concepts are perhaps shadows of a single principle cast in different directions. Abstraction itself hinges upon another capacity of thought, the use of symbols, which at the most fundamental level is association between entities. Encapsulation follows from the association between a symbol and its meaning. Encapsulation is a mechanism for managing complexity in the face of the mind’s limited resources. The mind can only hold a certain number of things under consideration at once, and without abstraction, generalization and encapsulation, compound ideas could not be formed. We have now arrived at Locke’s three acts of the mind:
The acts of the mind, wherein it exerts its power over simple ideas, are chiefly these three: 1. Combining several simple ideas into one compound one, and thus all complex ideas are made. 2. The second is bringing two ideas, whether simple or complex, together, and setting them by one another so as to take a view of them at once, without uniting them into one, by which it gets all its ideas of relations. 3. The third is separating them from all other ideas that accompany them in their real existence: this is called abstraction, and thus all its general ideas are made.
The concept of abstraction can be illustrated in a variety of ways, all reducible to each other. In a sense, the different portrayals of abstraction are different manifestations and implications of a fundamental axiom. Abstraction can be viewed as representation, in that an abstract entity represents a concrete entity. An abstraction can also be defined as a strict subset of the commonalities between things.
Linguistics
In linguistics, abstraction emerges in various forms. One particularly peculiar form is the abstraction hierarchy formed by syntax, semantics and pragmatics. Syntax is the study of the relations of signs to one another. It abstracts from semantics, the study of relations between signs and their meanings. Semantics, in turn, abstracts from pragmatics, the study of relations between signs and their contextual interpretations. The hierarchical aspect of abstraction is also revealed in linguistics as relations between words ordered by generality: for instance, the ordering between “Noam Chomsky”, which names a specific person, and simply “person”, a more general term that encapsulates references to any person.
Nature
Abstraction is not a human-made creation but is present throughout nature. Abstraction at the quantum level allows for the formation of compound structures such as atoms and molecules, which in turn combine into cells and ultimately into all forms of life. It is no surprise, then, that abstraction manifests in human cognition, which can be viewed as an emergent extension of the complex chain of abstractions leading up to it. The brain is often described as the most complex structure known, and perhaps computer systems will have to emulate biology beyond artificial neural networks in order to exhibit such complexity.
Mathematics
Category theory is a pinnacle of abstraction in mathematics. It aims to formally unify mathematical disciplines by way of abstractions - collections of objects and arrows. An example from the older discipline of group theory follows. The Abelian group is an abstraction of the familiar integers (more specifically, of the operation of addition of integers): a set with an operation that is associative and commutative, has an identity element, and gives every element an inverse. The group is named after Niels Henrik Abel, whose work on the solvability of polynomial equations, including his proof that the general fifth-degree polynomial cannot be solved by radicals, anticipated ideas of group theory. This is an example of the power of generality in mathematics - it makes reasoning about certain aspects of mathematics more natural and even elegant.
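To make the idea concrete, here is a minimal sketch that checks the Abelian group axioms by brute force over a finite set. The function name `is_abelian_group` and the use of integers modulo 5 as a finite stand-in for the integers are assumptions made for illustration; the point is that one abstract definition covers structures that look different on the surface.

```python
from itertools import product


def is_abelian_group(elements, op, identity):
    """Brute-force check of the Abelian group axioms on a finite set."""
    elements = list(elements)
    members = set(elements)
    pairs = list(product(elements, repeat=2))

    # Closure: combining any two elements stays inside the set.
    closed = all(op(a, b) in members for a, b in pairs)
    # Associativity: grouping of operations does not matter.
    associative = all(
        op(op(a, b), c) == op(a, op(b, c))
        for a, b, c in product(elements, repeat=3)
    )
    # Identity: a member that leaves every element unchanged.
    has_identity = identity in members and all(
        op(a, identity) == a and op(identity, a) == a for a in elements
    )
    # Inverses: every element can be undone by some element.
    has_inverses = all(
        any(op(a, b) == identity for b in elements) for a in elements
    )
    # Commutativity: the property that makes a group Abelian.
    commutative = all(op(a, b) == op(b, a) for a, b in pairs)

    return closed and associative and has_identity and has_inverses and commutative


def add_mod5(a, b):
    return (a + b) % 5


def mul_mod5(a, b):
    return (a * b) % 5


# Addition modulo 5 behaves like integer addition on a finite set.
print(is_abelian_group(range(5), add_mod5, 0))     # True
# Multiplication on the non-zero residues satisfies the same axioms.
print(is_abelian_group(range(1, 5), mul_mod5, 1))  # True
# Including 0 breaks the structure: 0 has no multiplicative inverse.
print(is_abelian_group(range(5), mul_mod5, 1))     # False
```

Anything proved from the axioms alone applies to both passing structures, which is the generality the text describes.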
Mathematical Physics
In 1928, Paul Dirac formulated the Dirac equation, whose solutions led him to postulate the existence of the positron. The mathematical model allowed for an electron-like particle with positive charge, something no experiment had observed at the time. The positron was detected experimentally in 1932. Ultimately, Dirac’s equation led to a remarkable discovery in quantum physics and revealed an eerie and intimate relationship between mathematics and reality. This marks yet another success story for mathematical abstraction, where mathematics, a discipline devised by humans, makes predictions about the physical world. In comparison to other disciplines, mathematics has an advantage in that it is self-defined and purified of real-world concerns. Definitions and problem statements are reduced to elementary concepts, which allows abstraction to thrive. By contrast, abstractions in programming can become “leaky” due to forces that cannot be removed from consideration.
Computer Science / Software Engineering
Abstraction is a cornerstone of computer programming and it echoes throughout the hierarchy, starting with machine language, continuing with the C programming language and leading all the way up to user-facing application components such as windows and buttons. Low-level programming languages, such as assembly, are termed as such due to the low degree of abstraction between the language and the underlying hardware. Low-level languages are elementary, but difficult to use for the construction of complex applications because of the impedance mismatch between human thought and machine code. Furthermore, a low degree of abstraction implies a high degree of coupling to the underlying hardware, which reduces portability. High-level programming languages consist of statements and expressions that are closer to natural language. Since high-level programs must still run on a computer, they are translated to lower-level languages, ultimately resulting in machine code. This translation process creates a layer of abstraction, which enables increased portability and expressiveness.
The need for translation, however, is also a cause of trouble for abstractions in computer programming. The reality is that the mechanics of the underlying hardware cannot be entirely escaped and are bound to creep into higher levels if not managed properly. Take, for example, garbage collection, which purports to abstract away memory management. While a powerful abstraction, it carries a set of compromises that cannot be overlooked. One is the loss of determinism - since memory is reclaimed behind the scenes, the programmer cannot be certain of the memory state of the system at a given moment. Another compromise is hidden complexity - while for the most part programmers can forget about memory management, they must still be aware of the complex GC system at play to understand certain intricacies of their programs.
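The leak in the garbage collection abstraction can be observed directly. The following is a small sketch that assumes CPython, where objects are normally freed by reference counting and a separate collector handles reference cycles; other Python implementations may behave differently. The `Node` class and the variable names are made up for the illustration.

```python
import gc
import weakref


class Node:
    """Hypothetical object that can hold a reference to another Node."""

    def __init__(self):
        self.other = None


def make_cycle():
    # Two objects that refer to each other form a reference cycle.
    a, b = Node(), Node()
    a.other, b.other = b, a
    # Return a weak reference so we can observe `a` without keeping it alive.
    return weakref.ref(a)


gc.disable()  # turn off the automatic cycle collector for the demonstration

# A lone object is reclaimed as soon as its last reference goes away (CPython).
lone = Node()
lone_probe = weakref.ref(lone)
del lone
lone_freed = lone_probe() is None

# A cycle is unreachable after make_cycle() returns, yet it stays in memory.
cycle_probe = make_cycle()
alive_before = cycle_probe() is not None

gc.collect()  # only an explicit (or scheduled) collection reclaims the cycle
alive_after = cycle_probe() is not None

gc.enable()
print(lone_freed, alive_before, alive_after)
```

When memory is actually released depends on the shape of the object graph and on when the collector runs, which is exactly the kind of detail the abstraction was meant to hide.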
OOP
In OOP, classes and interfaces are mechanisms of abstraction. The concept of inheritance in OOP borrows from inheritance in nature and allows specialization and reuse. Classes, however, support only a narrow view of abstraction. Any graduate of an entry-level OOP course should be able to understand the difference between a class and an instance of a class - an object. The difference, however, is very much arbitrary. An instance of a class stores state in fields declared in the class definition. In this way, an instance of a class is distinguished from the class itself - it can store specific values. What if the instance could not only hold state, but could also be augmented with new behaviours? Is it still the same “class”, or is it a new thing altogether? From the perspective of syntax, the difference between a class and an object is evident. From a perspective outside of syntax, an object can be viewed as a class for a whole new set of objects which derive from and specialize it in some way. Prototype-based programming languages, such as JavaScript, traditionally don’t rely on classes as the abstraction mechanism, instead allowing any object to serve as a prototype for another. This is a more fluid and flexible approach to abstraction, although it compromises on some of the benefits of class-based OOP.
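The blurred line between class and object can be sketched in a few lines. The `Proto` class below is a hypothetical, deliberately simplified emulation of prototype delegation written in Python for illustration; it is not how JavaScript engines implement prototypes.

```python
class Proto:
    """Hypothetical prototype object: lookups that miss are delegated to a parent object."""

    def __init__(self, parent=None, **slots):
        self._parent = parent
        self.__dict__.update(slots)

    def __getattr__(self, name):
        # Called only when normal lookup fails: walk up the prototype chain.
        parent = self.__dict__.get("_parent")
        if parent is None:
            raise AttributeError(name)
        return getattr(parent, name)

    def clone(self, **slots):
        # Any object can serve as the prototype of a new, specialized object.
        return Proto(parent=self, **slots)

    def send(self, message, *args):
        # Find the behaviour along the chain, then run it with this object as receiver.
        return getattr(self, message)(self, *args)


person = Proto(name="someone", greet=lambda self: f"Hello, I am {self.name}")

# `teacher` is an object, yet it adds behaviour and acts as a "class" for further objects.
teacher = person.clone(
    name="Ada",
    teach=lambda self, topic: f"{self.name} teaches {topic}",
)
student_teacher = teacher.clone(name="Grace")

print(student_teacher.send("greet"))           # Hello, I am Grace
print(student_teacher.send("teach", "logic"))  # Grace teaches logic
print(hasattr(person, "teach"))                # False
```

Here `teacher` plays both roles at once: it is an instance with its own state and a template from which `student_teacher` is derived.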
MDA
Model-Driven Architecture (MDA) aims to raise the level of abstraction in software engineering beyond specific development platforms through the use of domain-specific languages and transformation tools. However, as of today, MDA techniques have yet to gain broad industry acceptance or demonstrate a concrete value proposition. This is due to a variety of reasons, and perhaps there is a limit to the degree of abstraction attainable, at least with current approaches.
Von Neumann Architecture
As another testament to elevating abstractions, John Backus called for the liberation of programming from the Von Neumann style in his Turing Award lecture. The vast majority of modern computers are based on the Von Neumann architecture and, as Backus argued, most conventional programming languages are high-level mirrors of it. Variables imitate storage cells, assignment statements imitate fetching and storing, and control statements elaborate jump and test instructions, so there is close to a one-to-one correspondence between the concepts of the language and those of the hardware. Backus coined the term “Von Neumann bottleneck” in reference to both the hardware and the software limitations that are byproducts of this correspondence. In the hardware context there is a literal bottleneck, which inhibits performance due to limits on data transfer between memory and the processing unit. In the intellectual sense, there is a bottleneck which inhibits reasoning about programs because the architecture encourages word-at-a-time thinking. Backus proposed functional programming as an alternative to the Von Neumann style and, over 30 years after his lecture, his propositions are becoming mainstream. Unfortunately, we’ve yet to witness significant advancements beyond the Von Neumann bottleneck and it remains one of the challenges of abstraction in computer science.
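The difference between the two styles is easiest to see on a small computation such as an inner product. The sketch below is a rough Python rendering, not Backus’s own notation; the function names are chosen for illustration.

```python
from functools import reduce
from operator import add, mul


def inner_product_loop(a, b):
    """Von Neumann style: state is updated one word at a time."""
    total = 0
    for i in range(len(a)):
        # Each step fetches two values, combines them and stores the result.
        total = total + a[i] * b[i]
    return total


def inner_product_functional(a, b):
    """Functional style: multiply pairwise, then fold the products with +."""
    return reduce(add, map(mul, a, b), 0)


print(inner_product_loop([1, 2, 3], [6, 5, 4]))        # 28
print(inner_product_functional([1, 2, 3], [6, 5, 4]))  # 28
```

The loop version names an index and an accumulator and describes how they change over time. The functional version is built by combining whole operations and never mentions an individual element.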
Success
Although great challenges lie ahead, the fields of software and hardware engineering have had immense success with abstractions. Just consider all the intricacies involved in something as simple as opening a website, as elucidated in “Dizzying but invisible depth”. It is a humbling experience to even begin to fathom the countless moving parts, the years of evolution of human knowledge and the countless brilliant minds that make this seemingly simple action possible.
Needless Abstractions
Needless abstractions are abstractions that don’t bring value. The definition of value, of course, is subjective, which highlights the qualitative nature of abstraction. In other words, not all abstractions are created equal - if abstraction is to create new semantic levels and manage complexity, it must be invoked in ways appropriate to a given context. Needless abstractions arise when the programmer’s mental model of the program is reflected directly in code without regard for its utility. An interface may be extracted from a class, thereby creating an abstraction, but if the interface is never used as such then it brings no value. Instead, it can cause confusion and create unnecessary dependencies. Avoiding needless abstractions can be difficult because designing abstractions is what programming is all about. The programmer scans his or her mental model, detecting commonalities, forming composites and mapping relationships, and this process can produce abstraction waste as a by-product. An example from enterprise development is the generic repository. While the intent is alluring - to reuse and generalize data access code - the drawbacks outweigh the benefits. Moreover, the apparent benefits are misguided and can be attained using more appropriate methods.
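As a rough illustration of the generic repository, consider the sketch below. All names in it (`Repository`, `Order`, `unshipped_orders_over`) are hypothetical and the data lives in an in-memory list; a real data access layer would look different.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class Repository(Generic[T]):
    """Hypothetical generic repository over an in-memory list."""

    def __init__(self, rows: List[T]):
        self._rows = rows

    def find(self, predicate: Callable[[T], bool]) -> List[T]:
        # The repository knows nothing about T, so all query logic
        # has to be supplied by the caller.
        return [row for row in self._rows if predicate(row)]


@dataclass
class Order:
    id: int
    total: float
    shipped: bool


def unshipped_orders_over(orders: List[Order], minimum: float) -> List[Order]:
    """Purpose-built query: the name states the intent at the domain level."""
    return [o for o in orders if not o.shipped and o.total > minimum]


orders = [Order(1, 250.0, False), Order(2, 40.0, False), Order(3, 900.0, True)]

# Through the generic layer, the caller still spells out every detail.
via_repository = Repository(orders).find(lambda o: not o.shipped and o.total > 100)
# The specific function hides the same detail behind a meaningful name.
via_query = unshipped_orders_over(orders, 100)

print(via_repository == via_query)  # True
```

Both calls return the same orders. In this sketch the generic layer adds a type and an indirection but no new semantic level, while the specific query function gives the operation a name in the vocabulary of the domain.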
The drive for abstractions is a natural consequence of programming, but it must be kept in balance, as Ayende often argues in his “Limit your Abstractions” series. Conversely, avoiding abstractions for fear of complexity shouldn’t be the default position, because well-designed abstractions have the potential to improve clarity, increase reuse and ultimately advance the state of the art. Additionally, abstractions can furnish encapsulation, which is much needed in programming, where one must be capable of switching between levels of abstraction that differ by orders of magnitude.
When designing abstractions, it is beneficial to keep in mind principles such as YAGNI and KISS. True wisdom lies in combating complexity with expressiveness, not at the cost of expressiveness - just enough abstraction, but no more. Needless abstractions can emerge at all scopes of software engineering, from the developer to the architect. After all, developers and architects are on different ends of the same spectrum, the central distinction being that architects analyze applications at higher levels of abstraction. Many of the driving forces, however, are the same, which is why patterns that apply at the class level are often also applicable at the system integration level.
Abstraction Challenges
The leap in abstraction between machine language and C, achieved in the early seventies, is yet to be surpassed. For instance, the leap in abstraction between C and C# is minimal and owes more to garbage collection, runtime libraries and syntactic sugar than to semantics and expressiveness. Both languages are third-generation. Fourth-generation languages, such as SQL, are domain-specific pockets of higher abstraction and declarativeness. Fifth-generation languages, such as Prolog, are abstracted further still from machine language, but we’re still far from working with levels of abstraction natural to humans. Japan’s failed fifth-generation computer project is a prime example of the challenges of abstraction. There seems to be a bottleneck in our current approaches and methodologies. If we encounter challenges with something as seemingly simple as object-to-relational mapping, then surely we will encounter challenges in mapping program code to natural thought. How do we map Von Neumann machines to a largely uncharted network of billions of neurons? Is there a limit to what can be achieved with current hardware technology? In order to attain greater sophistication, will computers have to become more like biological organisms? Will this compromise determinism?