I have been practicing TDD and BDD for about 2 years now and I am starting to appreciate their benefits. Knowing that you have tests that prove your software meets business requirements is very powerful. Knowing that those tests can be run in an instant, at any point in time, relieves a lot of the pressure that can occur when making changes to your software.
Whilst implementing TDD has been very beneficial, it has come with its own set of pain points. Some of the time the testing goes smoothly throughout a new project and all is well. Some of the time it starts off really well, until we need to refactor the project's software or make a significant design change. Some of the time you are making a change in a legacy project and have to figure out how to write tests for it.
In this series of posts, I would like to outline some of the problems encountered when implementing TDD and possible solutions to these problems.
At my current company we have recently switched our development environment to use Docker for developing our web application. We use docker-compose to manage our containers, which consist of:
- an nginx container with PHP 5.6
- a Redis server
- a Redis Commander instance
- 5 MySQL database containers, each containing:
  - a script for initialising the database structure
  - a script for seeding the database with test data
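For a setup like the one above, the compose file might look roughly like the following sketch. The service names, image tags, and file paths here are illustrative assumptions, not our actual configuration; the official MySQL image runs any `.sql` files mounted into `/docker-entrypoint-initdb.d/` on first start, which is one way to wire up the structure and seed scripts.

```yaml
version: "2"
services:
  web:
    image: nginx:latest          # nginx fronting PHP 5.6 in the real setup
    ports:
      - "80:80"
  redis:
    image: redis:3
  redis-commander:
    image: rediscommander/redis-commander:latest
  db1:                           # one of the five MySQL containers
    image: mysql:5.6
    volumes:
      - ./db1/structure.sql:/docker-entrypoint-initdb.d/01-structure.sql
      - ./db1/seed.sql:/docker-entrypoint-initdb.d/02-seed.sql
```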
I would like to note that this project is about 1-2 years old. It has a suite of end-to-end tests that are run using Codeception and PhantomJS, all of which require a database connection and session to run. There are a handful of unit tests. None of these containers are used in production at the moment. Due to the complexity of the system, I have to rely on integration and end-to-end tests.
Running tests that rely on the database can be a good way to ensure that your application is functioning for the end user. It ensures the software and data are interacting with each other correctly.
When tests use a database, it should ideally be in the exact same state at the start of each test. This is fine if you have a small data set that takes a couple of milliseconds to tear down and re-seed. If it takes longer than a second, your test suite is going to slow right down.
To keep the database in the same state for each test, you have to do two things:
- Insert test data on a per-test basis.
- Reset the database to its previous state. This must take into account what data was changed.
So how do we ensure the database returns to the same state after each test?
The simplest solution would be to not care if the database state changes. This is not a good idea. Tests can start relying on data that was inserted by previously run tests; when you then change those tests, or change the order in which tests are run, other tests may fail. This happened at my current company, and fixing it involved changing multiple tests to get the whole suite passing again.
Another solution could be to truncate all the tables, then re-seed the database after each test. The advantage of this solution is that you don't have to care what data was changed during the test, since the database will always revert to the same state. The disadvantage is that if your seed data is too large, this may slow down your tests. Another disadvantage is that your database seed scripts need to be accessible from your test suite, and due to my current company's infrastructure we are unable to do this; we would have to move the seed data scripts to be accessible by the web container.
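The truncate-and-seed approach can be sketched as follows. This is a minimal illustration in Python with an in-memory SQLite database, not our Codeception setup; the schema and seed rows are hypothetical stand-ins for the per-container seed scripts mentioned above.

```python
import sqlite3

# Hypothetical schema and seed data; in the real project these would come
# from the structure and seed scripts shipped with each MySQL container.
SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"
SEED = [(1, "alice"), (2, "bob")]

def reset_database(conn):
    """Empty every table, then re-insert the known seed data."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        conn.execute(f"DELETE FROM {table}")  # SQLite has no TRUNCATE
    conn.executemany("INSERT INTO users VALUES (?, ?)", SEED)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
reset_database(conn)

# A test mutates the data...
conn.execute("DELETE FROM users WHERE name = 'alice'")
conn.commit()

# ...and the teardown restores the known state for the next test.
reset_database(conn)
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # → 2
```

Because the teardown is unconditional, no test needs to know what the previous test changed, which is exactly the appeal of this approach.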
One more solution would be to manage the test data inside each test. This involves keeping track of the changes made to the database during each test, then reverting those changes. The advantage of this is that it's quick, and all data is managed within each test, which can be easier to keep track of. The disadvantage is that if the system you are testing performs multiple database queries, it can become quite hard to track the changes made to the database.
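One common way to revert per-test changes without tracking them by hand is to wrap each test in a transaction and roll it back afterwards. Again a minimal sketch with SQLite, under the assumption that the code under test does not issue its own commits:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions manually
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")  # baseline data

# Begin a transaction before the test runs...
conn.execute("BEGIN")
conn.execute("INSERT INTO users VALUES (2, 'bob')")  # change made by the test
assert conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 2

# ...and roll it back afterwards, discarding everything the test changed.
conn.execute("ROLLBACK")
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # → 1
```

The caveat is the hard part mentioned above: if the system under test commits mid-way, the rollback no longer covers those changes, and you are back to tracking them yourself.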
For small test data sets I would recommend the truncate-and-seed approach. In most cases it is easier to manage, and if you are using an in-memory database the speed impact shouldn't be too large.
For larger, more complex data sets, managing the data on a per-test basis speeds up the tests and makes them easier to maintain. This is the solution we are choosing to implement, due to the complexity of the data that gets inserted into our database.
As with most things, it always depends, but I hope this has helped you decide how to design your own database test suite.