Problems Encountered with TDD and Possible Solutions - Archiving using Database Triggers

How many times have you needed to archive a piece of information, say sales quotes, without completely getting rid of it? You don't want a sales dashboard cluttered with old quotes, but at the same time you want to be able to run regular reports on how many sales quotes get converted. So you have to archive this info, but how?

Possible Solutions

Archive Field

On the quotes table you could just add a simple boolean field named archive with a default value of 0. You then update all places in your codebase that retrieve quotes and ensure that only quotes with archive set to 0 are retrieved. The main advantage of doing it this way is that it is pretty simple. If you have good separation between your database layer and application layer, then you should only have to filter the quotes in the persistence layer.
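As a minimal sketch, assuming a quotes table with an integer primary key (the table and column names here are purely illustrative), the MySQL side might look like this:

```sql
-- Add the flag; 0 = active, 1 = archived
ALTER TABLE quotes ADD COLUMN archive TINYINT(1) NOT NULL DEFAULT 0;

-- Archiving a quote is a simple update
UPDATE quotes SET archive = 1 WHERE id = 42;

-- Every query in the persistence layer must now apply this filter
SELECT * FROM quotes WHERE archive = 0;
```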

Archive Date Field

Another possibility is to add a deleted_at column instead of an archive column. This works identically to an archive column, with the added benefit of also storing the date on which the record was archived. It is again a pretty simple solution.
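A sketch of the same idea, again with illustrative names: a NULL value means the record is live, a timestamp means it was archived.

```sql
-- NULL = active, a timestamp = archived
ALTER TABLE quotes ADD COLUMN deleted_at DATETIME NULL DEFAULT NULL;

-- Archive a quote, recording when it happened
UPDATE quotes SET deleted_at = NOW() WHERE id = 42;

-- Reads filter on NULL instead of a flag
SELECT * FROM quotes WHERE deleted_at IS NULL;
```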

Let me point out an issue with using archive and deleted_at columns. Assume you have an employee table with the following schema:

  • id (auto increment)
  • username (unique)
  • email (unique)

An employee moves from one company to another, which requires the employee to be archived. Note that in this example we don't want any of the employee's history or relationships to be transferred to their new company, which is why the employee needs to be archived rather than deleted. The employee is archived, meaning the archive or deleted_at column is updated. Now the employee tries to sign up to a new company. Do you see the issue? The employee can't sign up with the same email address due to the unique email address constraint*.

*I would like to point out that one way of fixing this issue could be to remove the unique constraint from the email field. I don't think this is a good solution because it means one person could sign up multiple times with the same email address.
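To make this concrete, here is a sketch using the deleted_at variant (the names and the exact error output are illustrative; the key name reported depends on your MySQL version):

```sql
CREATE TABLE employee (
    id INT AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(255) NOT NULL UNIQUE,
    email VARCHAR(255) NOT NULL UNIQUE,
    deleted_at DATETIME NULL DEFAULT NULL
);

-- The employee leaves and is archived, but the row stays in the table
UPDATE employee SET deleted_at = NOW() WHERE email = 'jane@example.com';

-- Signing up to the new company now fails with something like:
-- ERROR 1062 (23000): Duplicate entry 'jane@example.com' for key 'email'
INSERT INTO employee (username, email) VALUES ('jane.doe', 'jane@example.com');
```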

Another issue is that if your application layer and persistence layer do not have good separation of concerns, then you will have to add the archive or deleted_at filter in multiple places. You also need to remember to add this filter every time you create a new query.

Archive Table using Triggers

Triggers are a useful way of managing databases. They allow you to maintain the integrity of the data being stored without having to write any application layer code. They are also very good at creating audit trails. If you have ever archived a record by setting an archive field to 1 or saving a datetime to a deleted_at column, then you will understand that application code needs to be written to filter out the archived records. Using triggers we can instead save the record into an identical archive table before we delete it. No filtering needed. You can still access the records by retrieving them from the archive table. One big plus of using triggers is that you now have an audit trail that can easily be queried.
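As a sketch, assuming a quotes table with columns id, customer_id, amount and created_at (a hypothetical schema for illustration), a BEFORE DELETE trigger in MySQL could copy each deleted row into the archive:

```sql
-- An archive table with the same structure, plus a timestamp for the audit trail
CREATE TABLE quotes_archive LIKE quotes;
ALTER TABLE quotes_archive ADD COLUMN archived_at DATETIME NOT NULL;

DELIMITER //
CREATE TRIGGER quotes_before_delete
BEFORE DELETE ON quotes
FOR EACH ROW
BEGIN
    -- Copy the row into the archive before it disappears from the live table
    INSERT INTO quotes_archive (id, customer_id, amount, created_at, archived_at)
    VALUES (OLD.id, OLD.customer_id, OLD.amount, OLD.created_at, NOW());
END//
DELIMITER ;
```

Note that MySQL requires you to list the columns explicitly; OLD.* is not valid inside a trigger body.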

There are some downsides to using triggers in this scenario. One is that it is not obvious from the application layer that triggers are present. To mitigate this, you can add a comment in your persistence layer noting that any delete will fire a trigger. Another downside is that you are now tying your implementation to a specific database.

How to test database triggers

Database triggers can be tested like any other integration test that interacts with the database, using the Arrange, Act, Assert testing pattern. Arrange in this scenario means setting up the database state so that the trigger can fire. Act means executing the INSERT, UPDATE or DELETE statement that causes the trigger to fire. Assert means checking that the trigger has inserted, updated or deleted the data it was supposed to.
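For the archive trigger sketched above, a test could look like the following (shown as plain SQL for clarity; in practice the statements would run from your test framework, e.g. Codeception, with the final checks expressed as assertions):

```sql
-- Arrange: insert a row for the trigger to act on
INSERT INTO quotes (id, customer_id, amount, created_at)
VALUES (1, 10, 99.99, NOW());

-- Act: delete the row, which should fire the BEFORE DELETE trigger
DELETE FROM quotes WHERE id = 1;

-- Assert: the row has been copied into the archive table...
SELECT COUNT(*) FROM quotes_archive WHERE id = 1;  -- expect 1

-- ...and is gone from the live table
SELECT COUNT(*) FROM quotes WHERE id = 1;          -- expect 0
```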

Tags: testing, database, TDD, BDD, mysql, triggers

Problems Encountered with TDD and Possible Solutions - Databases

I have been practicing TDD and BDD for about 2 years now and I am starting to appreciate the benefits of it. Knowing that you have tests that prove your software meets business requirements is very powerful. Knowing that those tests can be run in an instant, at any point in time, relieves a lot of pressure that can occur when making changes to your software.

Whilst implementing TDD has been very beneficial, it hasn't come without its own set of pain points. Sometimes the testing goes smoothly throughout a new project and all is well. Sometimes it starts off really well, until we need to refactor the project's software or make a significant design change. Sometimes you are making a change in a legacy project and have to figure out how to write tests for it.

In this series of posts, I would like to outline some of the problems encountered when implementing TDD and possible solutions to these problems.

Development Infrastructure

At my current company we have recently switched our development environment over to Docker for developing our web application. We use docker-compose to manage our containers, which consist of:

  • an nginx container with PHP 5.6
  • a Redis server
  • Redis Commander
  • Memcached
  • 5 MySQL database containers, with each container containing:
    • a script for initialising the database structure
    • a script for seeding the database with test data

I would like to note that this project is about 1-2 years old. It has a suite of end-to-end tests that are run using Codeception and PhantomJS, all of which require a database connection and session to run. There are a handful of unit tests. None of these containers are used in production at this moment. Due to the complexity of the system I have to rely on integration and end-to-end tests.

The Problem

Running tests that rely on the database can be a good way to ensure that your application is functioning for the end user. It ensures the software and the data are interacting with each other correctly.

When using a database, each test should ideally start from the exact same database state. This is fine if you have a small data set that takes a couple of milliseconds to tear down and re-seed. If it takes longer than a second, your test suite is going to slow right down.

To keep the database in the same state for each test, you have to do two things:

  1. Insert test data on a per test basis.
  2. Reset the database to its previous state, taking into account what data was changed.

So how do we ensure the database returns to the same state after each test, without slowing the whole suite down?

Possible Solutions

The simplest solution would be to not care whether the database state changes. This is not a good idea. Tests can start relying on data that was inserted by previously run tests, so when you change those tests, or change the order in which the tests run, other tests may fail. This happened at my current company, and fixing it involved changing multiple tests to get everything passing again.

Another solution is to truncate all the tables and then re-seed the database after each test. The advantage of this solution is that you don't have to care what data was changed during the test, since the database always reverts back to the same state. One disadvantage is that if your seed data is large, this may slow down your tests. Another is that your database seed scripts need to be accessible from your test suite, and due to my current company's infrastructure we are unable to do this; we would have to move the seed data scripts to be accessible by the web container.
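A sketch of what that per-test reset could look like in MySQL (the table names and seed rows are hypothetical):

```sql
-- Run after each test; disabling foreign key checks means
-- the truncation order does not matter
SET FOREIGN_KEY_CHECKS = 0;
TRUNCATE TABLE quotes;
TRUNCATE TABLE customers;
SET FOREIGN_KEY_CHECKS = 1;

-- Re-seed the known baseline data
INSERT INTO customers (id, name) VALUES (1, 'Acme Ltd');
INSERT INTO quotes (id, customer_id, amount, created_at)
VALUES (1, 1, 250.00, NOW());
```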

One more solution is to manage the test data inside each test. This involves keeping track of the changes made to the database during each test, then reverting those changes afterwards. The advantage is that it is quick, and all data is managed within each test, which can make it easier to keep track of. The disadvantage is that if the system you are testing performs multiple database queries, it can become quite hard to track the changes made to the database.
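As a sketch (hypothetical tables, with high key values chosen to stay clear of any seed data), the test inserts only what it needs and its teardown reverts exactly those rows, children first:

```sql
-- Arrange: insert only the rows this test needs, remembering their keys
INSERT INTO customers (id, name) VALUES (9001, 'Test Customer');
INSERT INTO quotes (id, customer_id, amount, created_at)
VALUES (9001, 9001, 99.99, NOW());

-- ... run the code under test ...

-- Teardown: revert exactly what this test created, children first
DELETE FROM quotes WHERE id = 9001;
DELETE FROM customers WHERE id = 9001;
```

If the code under test never commits its own transactions, wrapping each test in a transaction and rolling it back at the end can achieve the same effect with less bookkeeping.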

For small test data sets I would recommend the truncate and seed approach. In most cases it is easier to manage, and if you are using an in-memory database the speed impact shouldn't be too large.

For larger, more complex data sets, managing the data on a per test basis speeds up the tests and makes test maintenance easier, especially if the data sets are fairly complex. This is the solution we are choosing to implement, due to the complexity of the data that gets inserted into the database.

As with most things, it always depends, but I hope this has helped you decide how to design your own database test suite.

Tags: testing, database, TDD, BDD