A simple way to reduce complexity of information systems

When it comes to information systems, things can get pretty complex, to say the least. A typical information system like a web service, at the most basic level, is just one process in a massive, integrated data pipeline. It deals mostly with data processing: fetching data, transforming it and passing it on to another system. But as other systems pile up on top of it, the complexity builds up quickly. Managing and mitigating that complexity then becomes a major challenge for developer teams.

Traditionally, information systems have been implemented using software programming paradigms like Object-Oriented Programming, based on the concept of “objects”, which can contain data and code. Information systems that follow Object-Oriented Programming with no constraints tend to be complex, in the sense that they are hard to understand and hard to maintain.

The increase in system complexity tends to reduce the velocity of the development team, as it takes more time to add new features to the system. Hard-to-diagnose issues also occur more frequently in production, causing user frustration when the system doesn’t behave as expected or, even worse, system downtime.

Three aspects of Object-Oriented Programming are sources of complexity:
1. Data encapsulation in objects
1. Non-flexible data layout in classes
1. State mutation

Data encapsulation inside objects is beneficial in many cases. However, in the context of modern information systems, data encapsulation tends to create complex class hierarchies where objects are involved in many relations with other objects.

Over the years, this complexity has been alleviated by the invention of advanced design patterns and software frameworks. But information systems built with Object-Oriented Programming still tend to be complex.

Representing every piece of data through a class is helpful for tooling (e.g. autocompletion in the editor), and errors like accessing non-existing fields are detected at compile time. However, the rigidity of class layout makes data access inflexible. In the context of information systems, this is painful: each and every variation of data is represented by a different class. For instance, in a system that deals with customers, there is a class that represents a customer as seen by the database and a different class that represents a customer as seen by the data manipulation logic. The two classes hold similar data with different field names, yet the proliferation of classes is unavoidable because data is “locked” in classes.
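
To make the proliferation concrete, here is a minimal sketch in Python. The class names `CustomerRecord` and `CustomerProfile`, their fields, and the `to_profile` converter are all hypothetical and chosen only for illustration; a real system would have its own names and many more variations.

```python
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    """A customer as seen by the database layer (hypothetical)."""
    customer_id: int
    first_name: str
    last_name: str
    email_address: str


@dataclass
class CustomerProfile:
    """The same customer as seen by the data manipulation logic (hypothetical)."""
    id: int
    full_name: str
    email: str


def to_profile(record: CustomerRecord) -> CustomerProfile:
    # Every new variation of the data needs its own class
    # plus a converter like this one.
    return CustomerProfile(
        id=record.customer_id,
        full_name=f"{record.first_name} {record.last_name}",
        email=record.email_address,
    )


record = CustomerRecord(1, "Ada", "Lovelace", "ada@example.com")
profile = to_profile(record)
```

Both classes describe the same customer, yet neither can be used where the other is expected without a conversion step.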

In multi-threaded information systems, the fact that the state of objects is allowed to be mutated is another source of complexity. The introduction of various lock mechanisms, in order to prevent data from being modified concurrently and to ensure the state of our objects remains valid, makes the code harder to write and to maintain. Sometimes, before passing data to a method from a third-party library, we use a defensive copy strategy to make sure our data is not modified. The addition of lock mechanisms or defensive copies makes our code more complex and less performant.
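
The following sketch, written in Python under stated assumptions, shows how locks and defensive copies creep into otherwise simple code. The `ShoppingCart` class and the `third_party_report` function are hypothetical stand-ins, not part of any real library.

```python
import copy
import threading


class ShoppingCart:
    """Hypothetical mutable object shared between threads."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()  # guards every access to _items

    def add_item(self, name, quantity):
        with self._lock:
            self._items[name] = self._items.get(name, 0) + quantity

    def snapshot(self):
        # Defensive copy: callers get their own copy, so they
        # cannot modify the cart's internal state.
        with self._lock:
            return copy.deepcopy(self._items)


def third_party_report(items):
    # Stand-in for a library function that might mutate its argument.
    items.clear()
    return "report sent"


cart = ShoppingCart()
threads = [
    threading.Thread(target=cart.add_item, args=("book", 1))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Only the copy is cleared; the cart itself is left untouched.
third_party_report(cart.snapshot())
```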

Data-Oriented Programming (DOP) is a set of best practices that developers have followed in order to reduce the complexity of information systems.

The idea behind DOP is to simplify the design and implementation of information systems by treating data as a “first-class citizen”. Instead of designing information systems around objects that combine data and code, DOP guides us to separate code from data and to represent data with immutable generic data structures. As a consequence, in DOP, developers manipulate data with the same flexibility and serenity as they manipulate numbers or strings in any program.

DOP reduces system complexity by following three core principles:
1. Separating code from data
1. Representing data with generic data structures
1. Keeping data immutable

One possible way to adhere to DOP in an Object-Oriented programming language is to write code in static class methods that receive the data they manipulate as an explicit argument.
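
As a rough illustration of that approach, the Python sketch below groups stateless functions in a class and passes the data in explicitly. The class name `CustomerCode`, the field names, and the VIP threshold are assumptions made up for this example.

```python
class CustomerCode:
    """Code only: stateless functions grouped in a class (hypothetical name)."""

    @staticmethod
    def full_name(customer):
        # The data arrives as an explicit argument, not through `self`.
        return f"{customer['first_name']} {customer['last_name']}"

    @staticmethod
    def is_vip(customer):
        # The threshold of 10 orders is arbitrary, for illustration only.
        return customer["orders"] >= 10


# Data only: a plain dictionary with no behavior attached.
customer = {"first_name": "Ada", "last_name": "Lovelace", "orders": 12}

name = CustomerCode.full_name(customer)
vip = CustomerCode.is_vip(customer)
```

Because `CustomerCode` holds no state, it never needs to be instantiated, and the same methods work on any dictionary that has the expected keys.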

The separation of concerns achieved by separating code from data tends to make the class hierarchy less complex: instead of designing a system with a class diagram made of entities involved in many relationships, the system is made of two disjoint simpler subsystems: a code subsystem and a data subsystem.

When we represent data with generic data structures (like hash maps and lists), data access is flexible, which tends to reduce the number of classes in our system.
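
A small Python sketch of this idea follows. The helper names `display_name` and `pick`, and the customer fields, are hypothetical. The point is that one function serves several shapes of the same data, and a new shape does not require a new class.

```python
def display_name(entity):
    # Works on any mapping that has these two keys.
    return f"{entity['first_name']} {entity['last_name']}"


def pick(entity, keys):
    # Project the data onto a subset of its fields; no new class is needed.
    return {key: entity[key] for key in keys if key in entity}


# The customer as stored in the database.
db_customer = {
    "id": 1,
    "first_name": "Ada",
    "last_name": "Lovelace",
    "email": "ada@example.com",
    "created_at": "2021-01-01",
}

# A trimmed-down variation for another part of the system.
api_customer = pick(db_customer, ["id", "first_name", "last_name"])

names = [display_name(db_customer), display_name(api_customer)]
```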

Keeping data immutable brings serenity to developers when they need to write code in a multi-threaded environment. Data validity is ensured without the need to protect the code with lock mechanisms or defensive copies.
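
One way to approximate immutable data in Python is a read-only view from the standard library's `types.MappingProxyType`. The sketch below is only an approximation: the helpers `freeze` and `with_email` are hypothetical, and the protection is shallow, so nested lists or dictionaries would need the same treatment.

```python
from types import MappingProxyType


def freeze(data):
    """Return a read-only view over a private copy of a dict (shallow)."""
    return MappingProxyType(dict(data))


def with_email(customer, email):
    # "Changing" data means creating a new version;
    # the original stays untouched.
    return MappingProxyType({**customer, "email": email})


customer = freeze({"id": 1, "email": "ada@example.com"})
updated = with_email(customer, "ada@newmail.example")

try:
    customer["email"] = "oops"  # attempted in-place mutation
except TypeError:
    mutation_blocked = True
else:
    mutation_blocked = False
```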

DOP principles are applicable both to Object-Oriented and to functional programming languages. However, for Object-Oriented developers, the transition to DOP might require more of a mind shift than for functional programming developers, as DOP guides us to get rid of the habit of encapsulating data in stateful classes.

Yehonathan Sharvit has been working as a software engineer since 2000, programming with C++, Java, Ruby, JavaScript, Clojure and ClojureScript. He currently works as a software architect at CyCognito, building software infrastructures for high scale data pipelines. He shares insights about software at his tech blog. Yehonathan recently published the book Data-Oriented Programming available from Manning.

Full-stack Web consultant who shares his passion for Clojure by leading Clojure workshops and speaking at conferences worldwide.