Everybody is talking about IT systems integration, data exchange, process automation, instant distributed information access… and that’s good; there are many reasons why IT systems require integration. But why do integration projects fail? Why does integration maintenance sometimes become so expensive that ‘back-to-manual’ seems the only way to make things work again? Although the business context is important, I will follow the request of some of our clients and concentrate on one particular technical aspect – flawed integration based on a shared database.
In the good old days, IT systems were built from two basic components – a database and a client application. The application was installed on the users’ computers and accessed one shared database. To integrate such a system with another, an IT/DB guru simply connected the two databases to exchange data – and it usually worked quite well. Nowadays, however, the situation is dramatically different.
Issue #1 – Application changes
Each time one of the integrated systems is updated, the database structure changes – it is inevitable. Users want new and improved functionality, so the underlying data structures are subject to change. The most problematic changes are the small ones that make the integration fail only some time after the change is introduced (and tested :-).
Consider the following situation: in both integrated systems a customer email address is mandatory. Then a small change is made in one of them and the email is no longer mandatory (a real example of a sensible change). After the update the integration works perfectly. However, slowly, day after day, as new customers are added (some of them without an email address), it starts to fail. This is hard to detect and hard to fix.
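A minimal sketch of the failure mode, with a hypothetical `sync_customer` function standing in for the integration module (the names and record shapes are illustrative, not taken from any real system):

```python
# Hypothetical integration step, written (and tested) back when an email
# address was still mandatory in both systems.
def sync_customer(customer: dict) -> str:
    # Builds the record for the target system; silently assumes 'email' exists.
    return f"{customer['name']} <{customer['email'].lower()}>"

# Works for every customer created before the change...
print(sync_customer({"name": "Alice", "email": "Alice@example.com"}))

# ...and blows up days later, on the first customer saved without an email.
try:
    sync_customer({"name": "Bob", "email": None})
except AttributeError as exc:
    print(f"sync failed: {exc}")
```

The integration passed its tests because the test data predated the schema change; the defect only surfaces once real data exercises the new, relaxed rule.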
Significant database structure changes are easier to detect but may result in a need to completely redesign the integration.
Issue #2 – Raw data access
In a modern Web-based application, before data loaded from the database is presented to a user, it is enriched – cleaned, formatted, rounded, localized and aggregated. When accessing the database directly, one loses all this additional logic built into the application. Moreover, one may implement (duplicate) this logic in the integration module differently (maybe better, but differently!).
Imagine a (real) situation: a user gets the same invoice from two integrated systems with two different total amounts – the result of raw data access and different rounding policies. This is hard to test, hard to detect and hard to fix.
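How can the same invoice produce two totals? A sketch, using made-up line items and a made-up 23% VAT rate: one side rounds VAT per line item (as the application does), the other reads the raw lines and rounds once on the sum.

```python
from decimal import Decimal, ROUND_HALF_UP

# Illustrative invoice: three net line amounts and a 23% VAT rate.
lines = [Decimal("10.15"), Decimal("20.45"), Decimal("30.35")]
VAT = Decimal("0.23")

def round2(x: Decimal) -> Decimal:
    return x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Policy A (the application): round VAT per line, then sum.
total_a = sum(line + round2(line * VAT) for line in lines)

# Policy B (the integration module, reading raw data): sum first, round once.
net = sum(lines)
total_b = net + round2(net * VAT)

print(total_a, total_b)  # the same invoice, two different totals
```

Neither policy is wrong in isolation; the bug is that the integration re-implemented logic that already lived inside the application.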
Issue #3 – Semantic mismatch
One name may mean different things – and not only in literature. Good naming is a key skill for any IT professional. Typically, during design and coding the meaning of a name seems obvious, and hey, long table and column names are not fancy – and they are definitely not the ones shown in the user interface.
While integrating via a database one may find strange names – e.g. German mixed with English, strange (non-ASCII) characters, as well as commonly used, multi-meaning terms like ‘type’ or ‘code’. The problem is that they can be misleading. To get the total of all invoices issued last month, a simple query was created to sum up the ‘total’ column of the matching invoices. It worked perfectly – until a few days ago, when the business started claiming something was wrong with the final number.
Had the database failed? No, but since last year invoices have also been issued in another currency… While designing the integration, somebody assumed there would be only one currency (the currency column always held the same value in our DB, so we ignored it). Does this mean the entire balance for the last year has to be corrected? Hard to detect, hard to fix and very costly.
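The trap can be reproduced in a few lines. This is a toy schema invented for illustration (an in-memory SQLite table, made-up amounts), but the shape of the mistake is exactly the one described above:

```python
import sqlite3

# Toy schema: the naive monthly report summed the 'total' column,
# silently assuming a single currency.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoice (total REAL, currency TEXT)")
db.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [(100.0, "EUR"), (200.0, "EUR"), (300.0, "USD")],
)

# The original query: one meaningless number once a second currency appears.
naive = db.execute("SELECT SUM(total) FROM invoice").fetchone()[0]

# The corrected query: one total per currency.
per_currency = db.execute(
    "SELECT currency, SUM(total) FROM invoice "
    "GROUP BY currency ORDER BY currency"
).fetchall()

print(naive)         # EUR and USD silently added together
print(per_currency)  # one sum per currency
```

The column name ‘total’ told the integrator nothing about the currency assumption baked into it – the semantics lived in the application, not in the schema.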
Issue #4 – Security & QoS
Many DB-based integration initiatives start with a promise that the data will be accessed in read-only (RO) mode only. Even if that is true, database-level security control is very limited. There is no control over what data is accessed. It is rare to restrict access to only certain tables and columns, and it is simply not possible to limit access to only part of the data in a table (e.g. the financial information of one department rather than the entire company).
Moreover, accessing the database can affect the application itself: a complex or frequent data synchronization operation can significantly degrade overall database performance, resulting in instability and failures of the application.
We have named only some of the drawbacks of database-based integration of two systems. If you consider multiple applications integrated through a shared database, the problems grow exponentially.
Originally, most IT systems were data-oriented. The entire design started with the design of the database structure. That structure used to be the solid, hard-to-change, heavily normalized foundation of the entire application. It is not like that anymore. Design later started from a domain model (expressed in object-oriented notation), and today it moves even further, towards user behavior analysis – from the modeling of data structures (nouns) to the modeling of functions and behaviors (commands and events).
Contemporary applications treat their database as their own, rarely assuming other applications will access it directly for integration purposes. A developer adapts the database structure to the task or new functionality he is supposed to deliver “yesterday”, so he does not care about an integration module accessing HIS database in an unsupported way – one not covered by the license, or even illegal.
Fewer and fewer applications publish a changelog of their database schema. Even though such a document can be generated automatically, the cost of tracing the changes, analyzing the potential impact and implementing the fixes on the client side is horribly high and time-consuming (delaying the deployment of each new version).
APIs to the rescue
In the old days there was no alternative – today there is one: an API (Application Programming Interface).
“Just as a graphical user interface makes it easier for people to use programs, application programming interfaces make it easier for developers to use certain technologies in building applications. By abstracting the underlying implementation and only exposing objects or actions the developer needs, an API reduces the cognitive load on a programmer. While a graphical interface for an email client might provide a user with a button that performs all the steps for fetching and highlighting new emails, an API for file input/output might give the developer a function that copies a file from one location to another without requiring that the developer understand the file system operations occurring behind the scenes.”
An API provides access not only to raw data but also to the application’s functions (operations). An API is designed and implemented with backward compatibility in mind, so by design it causes fewer (if any) problems on upgrades. An API gives better control over who accesses it and a more fine-grained approach to granting permissions. An API’s structure and naming are aligned with the application’s domain and functionality; it mostly uses the same domain language as the user interface. APIs are better documented than database structures (sorry, DB guys). Finally, an API enables data modification without violating system integrity.
A modern API not only enables data access and method execution but also supports callbacks. Instead of periodically polling to check for data changes, an integration module can ask the API to call it back on a particular event, e.g. when data matching some criteria is created or modified.
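The callback idea can be sketched in-memory – the `Api` class below and its `subscribe`/`create` methods are purely illustrative, not any real product’s interface – but the inversion is the point: the integration module stops asking and starts listening.

```python
# Illustrative sketch: the integration module registers a callback with a
# predicate, and the "API" invokes it whenever matching data is created.
class Api:
    def __init__(self):
        self._subscribers = []

    def subscribe(self, predicate, callback):
        # e.g. "call me back when an invoice over 1000 is created"
        self._subscribers.append((predicate, callback))

    def create(self, record):
        for predicate, callback in self._subscribers:
            if predicate(record):
                callback(record)

received = []
api = Api()
api.subscribe(lambda r: r["amount"] > 1000, received.append)

api.create({"id": 1, "amount": 500})   # below the threshold - ignored
api.create({"id": 2, "amount": 1500})  # triggers the callback
print(received)
```

In a real integration the same pattern appears as webhooks: the API calls an HTTP endpoint you expose, instead of a Python function.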
Using an API nowadays has a very low technology entry barrier – REST API clients can be implemented in only a few lines of code. With an API, the integration problem is no longer about technology but about what it was from the beginning: the appropriate matching of the data and functions of one application (domain) with another. It is about finding a common language – terms and meanings that match the specifics of the systems under integration.
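To back up the “few lines of code” claim, here is a self-contained sketch using only the Python standard library. The toy HTTP server, the URL path and the JSON payload are all made up so the example can run on its own; in practice you would call the application’s documented endpoint.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# A toy stand-in for the remote system, so the sketch is self-contained.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"customer": "ACME", "open_invoices": 3}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The actual REST client - the promised few lines:
url = f"http://127.0.0.1:{server.server_port}/customers/acme"
with urllib.request.urlopen(url) as resp:
    data = json.loads(resp.read())

server.shutdown()
print(data)
```

Three lines of client code fetch and parse the data – the hard part of the integration is deciding what ‘customer’ and ‘open_invoices’ mean in both systems, not the transport.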
Good luck with your integrations!