When you have a new format or location for data, you need to support both the old and the new for a period. The formats may be similar but housed in different technologies, or the technology may be the same while the format or schema of the data differs. The data formats may even be identical, but you want two servers that update each other. Sometimes this period of dual running lasts months or even years; sometimes it is just a matter of minutes while you update all the servers in your enterprise with the new version without any downtime.
The pumping mechanism can be one that you implement yourself or can be based on other technologies. In the simplest case, you have two tables that both contain the data in question; you put triggers on both tables to update the other side. In a more complex case, you have a file repository of data that needs to be kept in sync with a database.
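As a sketch of the simplest case, here is a hypothetical pair of tables kept in sync by triggers, using SQLite for illustration (the table names and schema are invented). Note that each trigger's write would re-fire the trigger on the other side, so some loop guard is needed; here the shared primary key plus `INSERT OR IGNORE` swallows the echo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE old_accounts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE new_accounts (id INTEGER PRIMARY KEY, name TEXT);

-- Each trigger pumps inserts across to the other table. INSERT OR IGNORE
-- plus the shared primary key acts as the loop guard: when the write is
-- echoed back, the row already exists and the echo dies quietly.
CREATE TRIGGER old_to_new AFTER INSERT ON old_accounts BEGIN
  INSERT OR IGNORE INTO new_accounts (id, name) VALUES (NEW.id, NEW.name);
END;
CREATE TRIGGER new_to_old AFTER INSERT ON new_accounts BEGIN
  INSERT OR IGNORE INTO old_accounts (id, name) VALUES (NEW.id, NEW.name);
END;
""")

# A writer on either side ends up visible on both sides.
conn.execute("INSERT INTO old_accounts VALUES (1, 'alice')")
conn.execute("INSERT INTO new_accounts VALUES (2, 'bob')")

old_rows = conn.execute("SELECT id, name FROM old_accounts ORDER BY id").fetchall()
new_rows = conn.execute("SELECT id, name FROM new_accounts ORDER BY id").fetchall()
print(old_rows)  # [(1, 'alice'), (2, 'bob')]
print(new_rows)  # [(1, 'alice'), (2, 'bob')]
```

In a production system the guard is usually more deliberate (a "source" column, a suppression flag, or session context), because `INSERT OR IGNORE` only protects inserts, not updates.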
In either case, you can adopt an event-driven or a polling-based strategy.
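A polling-based pump can be sketched as a watermark scan: the pump remembers the highest change marker it has copied so far and, on each tick, pulls everything newer and applies it to the other store. The schema and the `version` column here are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE source (id INTEGER PRIMARY KEY, name TEXT, version INTEGER);
CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT);
""")

def poll_once(conn, last_seen):
    """Copy every source row changed since last_seen; return the new watermark."""
    rows = conn.execute(
        "SELECT id, name, version FROM source WHERE version > ? ORDER BY version",
        (last_seen,)).fetchall()
    for row_id, name, version in rows:
        # Upsert so repeated polls and later updates stay idempotent on the target.
        conn.execute(
            "INSERT INTO target (id, name) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
            (row_id, name))
        last_seen = version
    return last_seen

watermark = 0
conn.execute("INSERT INTO source VALUES (1, 'alice', 1)")
watermark = poll_once(conn, watermark)          # picks up the insert
conn.execute("UPDATE source SET name = 'alicia', version = 2 WHERE id = 1")
watermark = poll_once(conn, watermark)          # picks up the update
target_rows = conn.execute("SELECT id, name FROM target").fetchall()
print(target_rows, watermark)  # [(1, 'alicia')] 2
```

An event-driven pump does the same copy, but driven by triggers, change-data-capture, or messaging rather than a timer, trading polling lag for tighter coupling to the source.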
Use when
- you don’t want to, or can’t, remove all uses of a data source in a big-bang deployment
- you are unsure of the new system and want to run in parallel for a while
Don’t use when
- the changes in data are radical or conceptual; for instance, if the old model has a one-to-one relationship between entities and the new model also permits one-to-many relationships, you won’t be able to pump that data both ways
- it is very important to get the latest data and you can’t transactionally update both stores.
- you have more than two sources of data; a multi-master merge is very hard and would require many constraints, such as a low rate of updates and a globally unique way of identifying changes (e.g., GUIDs or a unique-ID service). This would almost certainly also require changes to both systems, which is what you are trying to avoid.
- you will never be able to kill the old system; if you can’t, you will have three things to maintain: the old system, the new one, and the pump that shackles the new system to the old and holds it back!
Most of my experience is with cases where both sources are databases and I’m refactoring the database to a new schema. When the changes to the schema are deep, a view is sometimes not performant enough. For instance, if the new schema requires a cursor, a CROSS APPLY call, or any other subquery that executes once per row, then performance may be an issue, especially if people perceive that “the old system was faster”.
I’ve also used this when making a transition from a 2-tier to a 3-tier stack. Some callers go straight to the database while others go through the application tier, which uses a different schema, yet the two have to stay in sync. That was a bad one, done with a combination of handwritten sprocs and triggers full of special cases to cope with the weirdnesses of the old system.
This is one that isn’t nice, but is often a must-have.
If the data formats are very different but both live in databases, you may find that data warehouse strategies serve well, with both systems reading from and writing to a common repository.