Column: Encapsulation, DB Abstraction: Performance vs Scalability

By Tim Quax on 19 oktober 2010

Database statements getting in the way of your perfect object-oriented design is a real hassle. Having your application depend on the implementation details of the underlying data storage is no pretty sight either.

Using techniques like Object Relational Mapping or the use of a simple Database Abstraction Layer would solve this issue. However, there are quite a few catches on the subject.

Why bother?
As I said earlier, the ideal application doesn't depend or mess around with the underlying architecture. The biggest problem you'll encounter on this adventure is performance:

1. Having an extra layer in your application creates overhead (how much overhead depends on the implementation)
2. Having one way to potentially communicate with a lot of different underlying database architectures automatically means you're sticking to the standard query language, thus not using any database-specific optimizations.

Having database statements bother your business logic is for me personally the biggest concern. It basically means your database queries get completely interwoven with your bussiness logic, thus your object-oriented design. It's not only the taint throughout the entire application source code. Changes within the database structure will be hell to implement, as the access to your data is not centralized.

Why not to bother?
The performance is a real killer. Any abstraction layer will reduce the overall speed more or less depending on the amount of additional code that have to be executed. The more a database layer abstracts from the native database interface and tries to emulate features not present on all database backends, the slower the overall performance.

Besides the overhead, a more serious issue is the limit to the number of available database operations to a subset of those supported by all database backends the ORM or layer supports. Moreover, it may not fully support database-specific optimizations. These problems magnify significantly with database size, scale and complexity.

Last but not least, it goes without saying that you're adding another dependency to your application. Another link in the chain that may become obsolete, outdated or unsupported.

Data encapsulation
Having your GUI components directly linked to your database is really something of the past. Such applications violate one of the primary principles of any object-oriented design: data encapsulation.

Encapsulation allows your application to interact with data, in form of an object, without any knowledge of the implementation details or underlying structure. When the application and the database are very tightly coupled the GUI, business logic and SQL statements are all interwoven. The application can become a maintenance nightmare, with any changes in the database schema resulting in unexpected failures and a fiery hell.

So far we can conclude that we do indeed need encapsulation, if we want to move past the eighties and be good little developers. We still need a technique to implement this, there are a few options we can choose from.

Object-relational Mapping
ORM is a programming technique for converting data between a variety of architectures within object-oriented programming languages. This actually creates a "virtual object database". The application does not request data through queries, it requests objects that can be used from within the programming language in question.

What's important to note is that ORM in fact maps your SQL data to objects that your code can fetch and update. This means no more queries, thus no more database-specific performance improvements. It also means you still most likely have to follow the strict language of the ORM, which it uses, instead of the SQL you are familiar with. The extra layer also provides more overhead.

Database Abstraction Layer
A database abstraction layer is an application programming interface (API) which unifies the communication between an application and databases such as MySQL, PostgreSQL, Oracle or SQLite. Like all vendor lock-in enthousiasts, all database vendors provide their own interface for their products effectively leaving everything up to the application programmer to implement code for all database interfaces.

Database abstraction layers do in fact speed up development time as they provide a consistent API and hide the database specifics behind this interface as much as possible. It goes without saying that there are many abstraction layers with different interfaces in numerous programming languages.

My biggest concern lies with the fact that an abstraction layer or an ORM unifies the language. The number of solutions that still allow database-specific optimilizations to sit next to the regular unified language are very limited (of which I am aware of). I only know of one such library; OpenDBX, which is a database abstraction layer. It's library is just for C, however.

In PHP for example, you can use the standard database abstraction layer (MDB2) to handle the database architectures underwater. Though you're still sending queries, it will depend on the programmer to use the standard SQL language to keep everything compatible.

And we still need data encapsulation, thus mapping your database specifics to objects. Which means we're all bound to ORM projects when we really need all advantages (portability, scalability, encapsulation). When you just need encapsulation, you might want to consider building your own tool.

Read more on the subject:

Devshed - use database abstraction layer in PHP
Active Record - ORM for Ruby on Rails
More info on Object Oriented Design

This is a column from Tim Quax's blog. For more of his posts, check out his site.

React on this article

Enter the code here: