Data Agnostic Services
One of the first things we think of when looking at business requirements is “what data are we going to operate on and what tables/attributes do we need to create?”.
Understanding our data requirements is necessary and forms the core of Domain Driven Design. The “issue” is the way we leap to the conclusion that we need to create tables/attributes to store our data.
We have been told for decades that we should use techniques like XML to pass data around our enterprise solutions. So, what if we also stored the data in XML format?
Consider our previous example of an Insurance Policy service. Instead of creating separate tables for each of our concepts, we create one table that contains a single XML data field to store our data and a set of meta-data fields that describe what that data is and where it resides with respect to other data records. Using this approach, we are still able to persist our data, but we have significantly reduced the number of stored procedures and business object functions that we need to maintain. By centralising data access we can also extend the CRUD functionality into the CRUD+ realm by adding: version control; access control; and auditing.
Create, Read, Update and Delete (CRUD) Model
So, let’s start with the simplest possible model. We need a single table called “Instances” that will contain our data agnostic XML fragment and some metadata that describes it. We need a few simple stored procedures:
- SaveInstance – combines Create and Update
- DeleteInstance – logical deletion only
The next quest is “what metadata are we taking about?”
In answer to that question we are going to need at least the following:
- InstanceGUID – a unique identifier for each row of data which becomes the primary key for the instance table
- InstanceName– a human readable name for the data so the user knows what they are selecting
- ConceptGUID – determines how the data should be represented (Policy, Coverage, Exclusion, etc)
Parent – Child Relationship
Now we should consider some of the key structural relationships between records within our databases. One of the big ones is the relationship between a parent record and its children.
In our Insurance Policy example, we can consider the Policy to be a parent record for the following child record types:
We could add an extra field called ParentGUID to our metadata that tracks the relationships between the child record and its parent. We could then simply ask for all Coverages for a specified Policy by providing a Policy InstanceGUID and the ConceptGUID of a Coverage.
An obvious database performance optimization is then to create a secondary index on ParentGUID and ConceptGUID.
When you look at how most users navigate a data model through their user experience most of the requests for data are for either a record using a known InstanceGUID, or for a group of child records when you have the ParentGUID and ConceptGUID. This make basic navigation of the records very efficient. There are lots of other search criteria that will ultimately need to be provided of course and we discuss those in the Command Query Responsibility Segregation (CQRS) model section later in this chapter.
Versioning and Audit Trail
Increasingly, we are being asked to add versioning of our data and audit trails. In a traditional approach we would end up creating a history table for each of our database tables. So, for the Policy table there would be a Policy History table, for the Coverage table we would have a Coverage History table, and so on...
Using our data agnostic Instances table as a base we can now extend our model to include an InstanceHistory table.
Now whenever we write a record to the Instances table using our SaveInstance stored procedure, we also write a copy of the data to the InstanceHistory table.
The basic idea is that we only keep the latest version of the record in the Instances table, and all versions are available in the InstanceHistory table.
Experience with adding audit trails in real systems has led us to expand the metadata to include the following fields:
- DataVersion – A sequential numeric field for the version number starting at 1
- CreatedBy/On – a timestamp for when the first version of the record was created and who created it
- InsertedBy/On – a timestamp for when this version of the record was added and who added it
- DeletedBy/On – a timestamp for when the record was logically deleted and who deleted it
Another common requirement for new solutions is the ability to have row level security built in so that visibility of records can be limited to specific users or groups of users. Again, with our traditional approach we would need to add some form of mapping table for each concept to be secured along with associated stored procedures, business objects and user experience to maintain it.
With our data agnostic approach, we can simply add an InstanceAccess table.
We would make the InstanceGUID and UserGroupGUID the table primary key and immediately we can secure any record in the database to any set of users and groups using a single user experience. We can now validate any usage of the ReadInstance stored procedure to ensure that the requesting user has access to the record.
One of the major benefits of the data agnostic approach to building database repositories is our ability to go way beyond basic CRUD functionality to CRUD+ with very little software. Because we are only operating with a single Instance table, our functionality is immediately available to all types of data within the solution. In the above sections we have added row-level version controls and row-level access control with a few lines of T-SQL in our database stored procedures.
Using a data agnostic storage approach means that we can now also extend our data models indefinitely to provide more functionality to the enterprise with adding more software. Essentially, we are getting more of our asset (functionality) within incurring additional liability (software).
Command Query Responsibility Segregation (CQRS) Model
If we were just navigating up and down a parent-child hierarchy of records we would be pretty much done with our database design. However, we do sometimes need to search through the records to find relevant results.
For example, let’s consider the simple request for all Home Insurance Policies in London that begin in June. The fact that we want Policies limits the search to just Policy Instances, but if we had to shred the XML to figure out which one were for London Homes that begin in June we would be waiting a long time for the result – its is doable but VERY slow!
How would we solve this problem in a traditional approach? We would probably add a secondary index to the Policies table to optimize the query performance for Policy Type, AddressCity and EffectiveStartDate. Obviously, this is not easy to do if we just have an Instances table with and XML field!
We can solve the problem by taking a related CQRS model approach of using a read-only lookup table. From past experience on many projects using traditional database design approaches, we have found that adding optimized read-only look up tables is an excellent method for improving performance. We can do the same here by simply adding a database trigger to our Instances table and updating a set of read-only lookup tables to mirror the data we need to search and filter by.
Using the database trigger approach is going to add a minimal amount of time to the initial data insert/update, but having an optimized look up table has a big impact on the performance of the searches we need to perform. The look up tables only need the set of fields that are going to be searched and not all fields from the record, they can also be indexed in multiple ways to serve many different search requests.
Creation of the read-only look up tables and the associated triggers are excellent candidates for simple code generation techniques.
Doesn’t putting all our data into a single repository create a single point of failure for our solution? If we were putting a database in a physical disk on a physical server in an office location the answer is yes it does. But if we were doing that then you would likely have a lot of single points of failure in your solution!
With modern database software we have many ways of creating highly available database services. Using Microsoft SQL Server as a database server for example we could use:
- Clustering of database servers
- Always On Availability Groups
- Azure SQL as a PaaS
There are many choices for building robust solutions with excellent availability characteristics.
A key advantage to having a single DaaS Data at Rest service is that you only have a single repository to protect. You don’t end up with your data spread over multiple services each of which has its own data repository. You can backup a single repository and know that your data is consistent across the solution instead of having to backup multiple database across multiple services and not knowing the consistency of the solution data if you had to execute a restore after a failure.
There must be a limit to the amount of data you can put in a single Data at Rest repository? Not really – the Data at Rest repository is a logical construct. The actual implementation of the underlying database storage can be anything. For example, we could shard the Instances table by ConceptGUID and create separate tables for different sets of Concepts. We could even partition Instances of the same Concept using the InstanceGUID across multiple file/disk sets or even separate databases/servers. Careful use of the existing database technology enables a fully horizontal scaling model for our Data at Rest service. The excellent database server software we have at our disposal today means that we can utilize a single database server for most SME solutions and only need to consider scaling options for very large enterprise solutions.
In contrast, consider the issues you would have trying to scale a traditional database design as you hit a single database server scale limit. You would have to break up the relational data model across multiple databases and then figure out data synchronization and optimization.
A critical success factor for modern business architecture is sustainability – we need to be able to maintain existing functionality and rapidly adapt to changing business needs without incurring significant costs or risks.
Using a data agnostic approach for our Data at Rest repository means that we can easily make changes to our underlying data models. For example, we can easily:
- Add or remove fields – its just XML!
- Add or remove whole Concepts
- Restructure how Concepts inter-relate
- Reuse existing Concepts in multiple places
- Add new functionality to extend our CRUD+
- Change our underlying database storage technology if more effective and efficient solutions become available
Maintaining a strict separation of concerns means that we can make changes to how our data is stored without impacting how it is shared (in motion), used (in action) or presented.
Onboarding new resources is also easy – we don’t need to spend weeks ramping up a new hire trying to understand the data model and relationships. We only have a few simple tables and stored procedures to understand with an architecture that can be fully articulated on a few pages and presented in a single workshop.