Agile Database Development
Traditional approaches to data don’t match the pace of today’s organizations/needs/environment.
The culture of database development is rooted in the 1980s view that a central DBA must control the database. No one else can touch it. That certainly helps to ensure that the database is defined in a consistent manner, but it makes changing the database a slow process. The slow pace of the traditional approach does not match the rapid rate at which the needs of today’s organization change. The traditional database approach is also a severe drag on the 2-4 week cadence of Agile software development teams.
The way that database work often manifests today is that an Agile development team designs their code and then requests changes to the database schema: then they wait for the DBA to implement the schema changes, modify load programs, test all the changes, and deploy the changes. Then – and only then – can the developers start to test their code. And if they need more changes, start all over again. By then, the development team’s iteration would be over with!
Agile relies on each developer being able to change whatever they need to change, testing it on the spot, and then making those changes available to other developers to build upon. The turnaround time for changes is minutes or hours - not days or weeks. Otherwise, the short two-week iterations that are common for Agile teams would not be possible.
Databases get in the way of the Agile process, because of the way that databases are traditionally developed and maintained. We need to fix that. This article is the first of several that will cover different strategies and techniques that bring the speed achieved through Agile software development to database development.
Schemas As Code
A key part of the traditional database approach is the management of schemas. One way to overcome the long turnaround time for schema changes is to use a database technology that does not require schemas! Many NoSQL databases either do not use schemas to constrain data, or make schemas optional, and so programmers are in full control of what gets stored and the relationships among the data. This indeed affords a great deal of flexibility, but it comes at a price: it is like having no type checking in a programming language. That is, the programmer can do whatever he or she wants, but there is no safety net: for a schema-less database, there is no referential integrity rule system. This has major implications for maintainability. One can address that by creating software tests that add back the integrity checking. This means more work for developers, but there are strong tools and practices to support that approach. Effectively, this approach moves the data integrity rules from proximity to the data, to proximity to the application code that uses the data.
There are some classes of problems for which schema-less databases are a powerful choice, specifically document- and network-centric ones. However, in other cases, schema-based solutions are almost mandatory, such as when the data is shared across a wide range of applications.
When schemas are used, they provide constraints and ensure data consistency, but they are a potential bottleneck for database development because – at least traditionally – schema changes are funneled through a small set of individuals. However, many people are developing SQL databases in a more Agile way with great results. The keys to changing things for the better is to stop treating the database as the central artifact for which access is controlled, but instead treat the code that is used to create the database as the central controlled artifact: that is, the DDL and scripts that create and populate the database. Treat that as code, and then apply standard Agile coding practices to it – supporting shared edit access but version control in a code repository, combined with comprehensive regression testing.
Once we do this, we can then stop worrying about whether someone has changed the database, because we have Agile tests for the code – tests that check that the schema is the way we want it. We can also stop stepping on each other in the shared database test instance, because we can now each have our own database instance for development, which we create using the code – the DDL that we pull from the source code repository. We can stop worrying that someone has run tests and modified the test data, because when we run our tests, we recreate the database from the code (the DDL), and we load our test data as part of test setup.
Large test runs using production-like data can still be done, in a shared test instance, but that is no longer the instance that we are all using for our developer testing, so we don’t have to worry about stepping on it during our development.
DBA As Coach
One more thing: Instead of being a bottleneck, the Agile DBA is a coach, helping others to keep their schema changes efficient: the DBA pairs with programmers, and checks the code – the DDL – and works alongside everyone else, helping them.
The Agile challenges of database development are clearly many: we have only touched on some of them here. In future posts we will address other challenges in more depth.
For starters, a fundamental mindset shift is needed: Think of DDL as code!