Data in Motion – Importing Data into trellispark

Jan 4, 2022 | Dev Team

An important feature of most enterprise applications is the ability to ingest data from multiple sources, using multiple channels, at different frequencies. As any experienced developer will tell you, the cost of migrating legacy data into a new application can be a significant fraction of the total cost of solution development and initial adoption. The trellispark framework significantly mitigates both the effort and risk of this process.

When creating a new enterprise application, we need to be able to import data from one or more legacy applications that are being replaced. The challenge is that the new data model is generally very different from the legacy data, which means we have to convert from the old data format to the new one. In the process, we also need to cleanse the incoming data to ensure that it meets the needs of the new data model. Ideally, we would also preserve the legacy data where possible so that we can refer back to it in the future for reporting, comparison, and analytics.

We may also need to initially populate the new solution with data from many other sources, including CSV files and spreadsheets. These data sets also need to be mapped against the new data model and cleansed during the import process.

During this initial migration we may have to deal with missing data fields, possibly even fields that are mandatory in the new solution. In some cases we can default the incoming values; in others a person really needs to inspect the incoming data and manually provide the missing values.
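For example, a cleansing pass over a staging table might default what it can and flag the rest for review. A minimal T-SQL sketch, assuming a hypothetical Customer_Migration staging table with a MigrationStatus column (neither is part of trellispark itself):

```sql
-- Default what we can: a missing country falls back to an assumed value.
UPDATE Customer_Migration
SET Country = 'CA'                 -- assumed business default, for illustration
WHERE Country IS NULL;

-- A missing email is mandatory in the new model and cannot be defaulted,
-- so hold the row back for a person to inspect and complete.
UPDATE Customer_Migration
SET MigrationStatus = 'NeedsReview'
WHERE Email IS NULL;
```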

The trellispark implementation of Data Agnostic Services makes this data migration task significantly easier than traditional relational approaches do. Since trellispark uses a data agnostic record storage mechanism based on XML, it is relatively straightforward to take any incoming data set and map it into an XML format. During this mapping we can add code to cleanse the incoming data and resolve inter-record dependencies such as foreign keys for parent-child relationships (one-to-many) and peer relationships (many-to-one and many-to-many). Because we are using XML, we can also store “legacy” fields that are no longer used by the new solution alongside the fields of the new data model.
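As an illustration of the mapping step, T-SQL’s FOR XML makes this kind of shaping concise. The tables, columns, and element names below are hypothetical, not trellispark’s actual schema; the point is that the new model’s fields, a resolved parent key, and retired legacy fields can all travel in one XML record:

```sql
-- Shape a legacy customer row into an XML record. The new model's fields
-- map directly; fields retired by the new solution are nested under a
-- <Legacy> element so they remain available for reporting and analytics.
SELECT
    c.CustomerName   AS [Name],
    c.Email          AS [Email],
    p.InstanceGUID   AS [ParentGUID],          -- resolved one-to-many parent
    c.OldAccountCode AS [Legacy/AccountCode],  -- preserved legacy-only fields
    c.OldRegionFlag  AS [Legacy/RegionFlag]
FROM LegacyCustomer AS c
JOIN ParentLookup   AS p ON p.LegacyID = c.LegacyParentID
FOR XML PATH('Customer');
```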

To make the data ingestion process easier, trellispark provides a T-SQL generation mechanism that creates a “migration” table from the definition of each record concept. Using a multi-step process, a developer can copy the required data into the concept migration tables, where it can be processed and imported into the primary record storage. The T-SQL generation mechanism also creates a default migration stored procedure that controls the migration and implements any solution-specific business rules and validation checks.
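The generated artifacts might look roughly like the sketch below for a hypothetical “Customer” concept. The table layout, the RecordStorage target, and the email rule are illustrative assumptions; the actual generated code depends on the concept definition:

```sql
-- Illustrative shape of a generated migration table for a Customer concept.
CREATE TABLE Customer_Migration (
    MigrationID     INT IDENTITY(1,1) PRIMARY KEY,
    [Name]          NVARCHAR(200),
    Email           NVARCHAR(200),
    Country         NVARCHAR(2),
    ParentGUID      UNIQUEIDENTIFIER,
    MigrationStatus VARCHAR(20) NOT NULL DEFAULT ('Pending')
);
GO

-- Illustrative default migration procedure: apply business rules, convert
-- each surviving row to XML, and write it to the primary record storage
-- (RecordStorage here stands in for trellispark's actual record store).
CREATE PROCEDURE Customer_Migrate
AS
BEGIN
    -- Business rule: email is mandatory in the new data model.
    UPDATE Customer_Migration
    SET MigrationStatus = 'Failed'
    WHERE MigrationStatus = 'Pending' AND Email IS NULL;

    -- Convert valid rows to XML and insert into the primary record storage.
    INSERT INTO RecordStorage (InstanceGUID, InstanceXML)
    SELECT NEWID(),
           (SELECT m.[Name], m.Email, m.Country, m.ParentGUID
            FOR XML PATH('Customer'), TYPE)
    FROM Customer_Migration AS m
    WHERE m.MigrationStatus = 'Pending';

    -- Mark the staged rows as done.
    UPDATE Customer_Migration
    SET MigrationStatus = 'Migrated'
    WHERE MigrationStatus = 'Pending';
END;
```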

But why stop there?

We don’t just do bulk data migration once. Many solutions require a constant injection of bulk data from other solutions and third parties, and many of the integrations we see between applications use an Extract, Transform, Load (ETL) mechanism. In the ETL process, a file is received at some predetermined frequency and validated, and then its data is added to the primary record storage. In reality, these integrations can largely be handled the same way as the initial data migration.

For this reason, we extended the design of the migration table framework to handle the continuous arrival of new data by ETL. The Data in Motion components simply need to validate the data source, load the incoming data into the migration table, and then let the migration process take over. The ETL payload may contain one or more records and may arrive as a file in file storage, an attachment to an email message, or the payload of any messaging technology or service bus.
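In practice, the hand-off can be as small as landing the file in the staging table and invoking the generated procedure. A minimal sketch, reusing the hypothetical names from the earlier sketches (the file path and format file are placeholders):

```sql
-- Land an inbound CSV drop into the same staging table used for the
-- initial migration; the format file maps the file's columns onto the
-- table's columns (skipping the identity and status columns).
BULK INSERT Customer_Migration
FROM 'C:\Inbound\customers.csv'
WITH (
    FIRSTROW   = 2,                             -- skip the header row
    FORMATFILE = 'C:\Inbound\customers.fmt'     -- placeholder format file
);

-- Hand off to the generated migration process.
EXEC Customer_Migrate;
```

The same two steps apply whether the payload arrived as a file drop, an email attachment, or a service bus message; only the component that lands the data in the staging table changes.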

This code-generated migration table approach, driven by the data model of the new user experience, allows trellispark to significantly reduce the errors, effort, and risk of ingesting data from legacy and external sources.

Future releases of trellispark will include mapping technologies that simplify the population of the migration tables by mapping common data formats such as spreadsheets, CSV files, and existing database tables.