Database Versioning: The ignored aspect of version control.

Version control is an important aspect of writing code. From the initial check-in to the final build, it is crucial to keep track of every change that is made to the code. Like stacking blocks of legos, you will find what started with a single brick evolving into a city. Databases, however are often ignored from version control. If you are lucky you use some form of ORM to connect to your database like Active Record. This will handle database migrations for you and provide database version control to some degree. However for legacy projects you are out of luck. More often than not I have seen the code managed very carefully with awesome branch and merge strategies implemented. Whereas all the database scripts simply dumped in a directly called SQL. Or if you are lucky there will be sub directories with stored procedures, tables and functions in their own directories. 


Here's how it goes on a typical release day. The developer hands you a shiny release package with a nice bow on top. "Nice !" You say. Suddenly he reaches in his trench coat and hands you a soggy bundle wrapped hastily in brown paper. "What that ?" you ask. "SQL scripts" comes the simple reply. "Well...what about rollback ?" you ask helplessly. "Restore from backup !" the developer smiles. "But...the database is 3 TerraBytes!" you yell. The developer leaves. 

Just kidding. That has never happened to me. And here is how you can avoid this from happening to you. You see there is a difference between versioning your database scripts and actually versioning the database. By versioning the database you get the ability to apply and rollback changes or "deltas" to your database. The simplest way to do this is , as mentioned above, is to use an ORM. However if you have an already established database and code and switching to ORM is not an option then there are other ways in which you can do this. There are products available on market that can help you version you database. For SQL Server Red Gate offers SQL Source Control that integrates with leading VCS. For MySQL there is dbv.php. Then there is software like Liquibase which is feature rich, free and supports a lot of databases. 

And when all else fails grab a scripting language of your choice and automate it yourself. With a mix of automation and policies you can bring any database under version control. Before you start you will need baseline of the database. A baseline will help you create the initial empty copy of the database on top of which you will apply all your changes. There are two way you can do this. Extract the schema of the database or else maintain a skeleton copy of the database i.e. truncate all the data in all the tables. Personally for established projects I like to create skeleton copies of the databases rather than extracting schemas because sometimes extracting schemas can be quite hairy. And secondly even if you manage to extract schema there is no guarantee you will be able to recreate the whole database from schema. So take it as a pro tip, a zipped up skeleton copy will make your life easier. After than implement some simple policies. All scripts must be checked into scripts directory. Every scripts must have a corresponding valid rollback script checked into rollback directory. Every script must start with a integer which must be the next available integer. So if the last script was called 04_create_blah.sql then the next one should be 05_delete_foo.sql. Now create a table in your database and called it VersionControl which will contain version control meta data. This will include last version applies, the date and time, user name who applied it. Then write your automation script to read from the directory and sort by name, read from the table and figure out the last version and figure out which scripts need to be applies. You can also have your script take parameters for rollback to apply the scripts in reverse. And your have poor man's DB version control. Remember this solution works if your database is not shared by multiple applications. You can always expand on this solution and make a more complex solution to suit your needs. Another problem you will face is very soon your scripts will reach in hundreds at which point you will find it is taking too long for you to apply all the scripts on top of the baseline. At some point you will have to take the baseline again. Just baseline your database, archive your scripts and rollback scripts. This is assuming your data loading scripts were handled else where and this mechanism was handling only DDL scripts. And you are all set again. 

What has been your experience with database version control ? Let me know in comments below. 

Next Article: Database Versioning: Using Liquibase


0 #22 Michael Van Devender 2016-02-17 17:07
If you use Microsoft SQL Server, here's a free SQL Server Management Studio add-in:

VersionSQL adds a "Commit" command to the right-click menu in Management Studio's Object Explorer. Right-click a database and click commit to check in the entire database and all its objects. Or right-click a single stored procedure / view / table / etc. and click commit to check in just that schema object.

No separate program to run, no complicated features, doesn't modify your database. Just a quick way to get your SQL into source control.
+1 #21 Lea 2015-02-27 20:31
At work we use Databazoo Dev Modeler. Not only it lets us version several databases in each project, but it also serves as a complete database IDE for developers.

As for the "full mysqldump" approach, I'd love to see that on 4x100GB databases :D
0 #20 Bjørn 2014-03-26 07:04
Quoting Roman:
What about 'DELETE FROM' SQL commands? How would you rollback that? This is question that my team asking when I try to introduce some db versioning approach.

By creating a temporary table. If I need to preserve data from Table1 in case of rollback I usually create a _20140326_Table 1 copy of either the whole table or the deleted data. In case of rollback the data are copied back and table deleted. If the _yyyymmdd_ tables are not occupying too much space they are left and cleaned up by a cleanup procedure run once a year.
0 #19 Roman 2014-03-24 19:34
What about 'DELETE FROM' SQL commands? How would you rollback that? This is question that my team asking when I try to introduce some db versioning approach.
0 #18 Uri Margalit 2014-03-19 15:14
DB versioning is easy with DBmaestro. Check-Out & Check-In the database objects preventing code overrides, out-of-process changes, out-of--sync between the database and the file-based version control repository.
DBmaestro also integrates with Delphix to branch the database and merge it quickly.
0 #17 J M 2014-03-15 09:50
I have found git to version control Wordpress Development well.

A mysqldump of the Wordpress database is included in each commit.

To restore, drop all tables and run the .sql file.

Diffs accurately show changes in the database between commits.

For enhanced clarity make each table a separate sql file.
0 #16 Kyle Hailey 2014-03-15 07:41
DB versioning is easy with Delphix. Bookmark, rollback, branch a 10TB database in minutes with almost no storage overhead. Works for Oracle, SQL Server, and Postgres.
0 #15 Richard Gardiner 2014-03-15 02:12
+1 for FluentMigrator. It works well. I did add a couple of extensions, the main one being methods to add a basic data dictionary. This enabled the developers to code a database change along the meta data for usage and meaning, all without moving out of the language (c#), and solution they using for the rest of code of their code,
0 #14 Will Sargent 2014-03-15 01:00
I'm surprised no-one has mentioned Tungsten Replicator or Golden Gate yet. They're certainly useful for migrating between different schema versions.

Tungsten Replicator:
Golden Gate:
0 #13 James Linden 2014-03-14 19:42
For MySQL/MariaDB, I generally use my schema generator script to create schemas with triggers to version data on a record level. It originally started as a simple "undo" feature for a custom app, but I ended up using it for pretty much any custom app now. It doesn't have any intrinsic capability to deal with table structure changes, so they have to be handled manually. It understands the concept of a 'user' (ie: a foreign key to a user table), so it also provides a history of record changes tied to that user.

Add comment

Security code