By Michael Pansky, Chief Product Officer
If your organization is planning a data update or migration in the near future, stay tuned to IntelliBridge’s new series “Data Migration Strategies” to receive the latest technological tips for large data projects. This week’s topic examines the primary causes of data loss during a migration as part of the ETL process, with in-depth advice on how to avoid losing data and the tools available that will help ensure your migration project is a success.
Data loss is an inherent risk even in the simplest of migrations, making larger data migrations a much riskier endeavor. Migrations in modern-day organizations are becoming increasingly complex, not only due to the volume of information being stored, but also due to the spider web of network architectures. As organizations grow and expand, it’s common to have multiple legacy data sources interacting with newer data lakes. To complicate things further, these sources often span varying classification environments and hybrid deployments that combine both on-prem and cloud-based platforms.
The goal is to ensure all of this data, from the variety of potential sources and locations, is clean (not corrupt or unwanted), consistent (no data drift), and kept (not dropped or lost). Let’s take a look at what causes data loss. There are a number of factors to consider:
- Field mapping incompatibilities
- Automatic truncation of longer values
- Unknown validation rules or settings
- System or network interference
Each of these factors should be reviewed during the project planning phase, and we’ll delve into each one more below.
Field mapping incompatibilities
During the initial data migration planning, each table and field is mapped to a field in the target data source. It is crucial during this process to validate the actual data types of each source-to-target field pair for any incompatibilities. Importing a string value into a date-formatted field, for example, may result in data loss.
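A simple pre-migration audit can catch these mismatches before any rows are moved. The sketch below is illustrative: the field names, type labels, and mapping structure are assumptions, not part of any specific platform. It flags fields whose declared source and target types disagree, and shows how a string-to-date coercion can report failures instead of silently dropping rows.

```python
from datetime import datetime

# Hypothetical source-to-target field mapping with declared data types.
FIELD_MAP = {
    "signup_date": {"source_type": "string", "target_type": "date"},
    "customer_id": {"source_type": "integer", "target_type": "integer"},
    "notes":       {"source_type": "string", "target_type": "string"},
}

def check_type_compatibility(field_map):
    """Return the fields whose source and target types disagree."""
    return [name for name, m in field_map.items()
            if m["source_type"] != m["target_type"]]

def coerce_to_date(value, fmt="%Y-%m-%d"):
    """Try to parse a string into a date; return None instead of
    raising, so mismatched values can be collected and reported."""
    try:
        return datetime.strptime(value, fmt).date()
    except (ValueError, TypeError):
        return None
```

Running `check_type_compatibility(FIELD_MAP)` during planning surfaces every field that needs an explicit conversion step, such as `coerce_to_date`, before the migration job runs.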
Automatic truncation of longer values
To avoid data loss by automatic truncation, verify that each field in the target system has the appropriate character length to accept the incoming data values. If a field has a character limit of 50, and the data being imported has longer strings, then anything beyond the 50th character will be truncated and dropped. One strategy is to extend the character limit to the maximum value for every field, and then adjust the character limits as necessary once the data migration has been completed.
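One way to verify character lengths ahead of time is a truncation audit over the source data. This is a minimal sketch; the column names and limits are hypothetical placeholders for your actual target schema.

```python
# Hypothetical target column limits; replace with your real schema.
COLUMN_LIMITS = {"name": 50, "email": 100}

def find_truncation_risks(rows, limits):
    """Report (row_index, column, actual_length) for any value that
    would be cut off by the target column's character limit."""
    risks = []
    for i, row in enumerate(rows):
        for col, limit in limits.items():
            value = row.get(col, "")
            if len(value) > limit:
                risks.append((i, col, len(value)))
    return risks
```

An empty result means no value exceeds its target limit; any hits tell you exactly which columns need wider limits before the load begins.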
Unknown validation rules or settings
Validation rules on the target system that are unknown or improperly configured before the data is migrated can lead to data loss. Review the default settings that the target data source has automatically implemented and adjust these settings as needed to avoid losing any data.
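Once the target system’s rules are known, you can mirror them in a pre-flight check so violating rows are caught and fixed before the load rather than rejected during it. The rule format and field names below are assumptions for illustration only.

```python
# Hypothetical rules mirroring the target system's validation settings.
RULES = {
    "email": {"required": True, "max_len": 100},
    "age":   {"required": False, "min": 0},
}

def validate_row(row, rules):
    """Return a list of human-readable violations for one row."""
    errors = []
    for field, rule in rules.items():
        value = row.get(field)
        if rule.get("required") and value in (None, ""):
            errors.append(f"{field}: required but missing")
            continue
        if value is None:
            continue
        if "max_len" in rule and len(str(value)) > rule["max_len"]:
            errors.append(f"{field}: exceeds {rule['max_len']} chars")
        if "min" in rule and value < rule["min"]:
            errors.append(f"{field}: below minimum {rule['min']}")
    return errors
```

Running every source row through a check like this produces a remediation list up front, instead of discovering silently dropped records after the migration.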
System or network interference
Data loss can also occur if designated tables, rows, or fields are locked by system processes or user sessions while the migration runs, as locked records may be skipped. Verify that automatically scheduled jobs are not set to run during the migration window, and that no users are updating the data content.
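When a lock cannot be fully ruled out, wrapping each copy step in a retry with backoff is a common safeguard, so a transient lock delays the step instead of skipping records. This is a generic sketch: a real pipeline would catch the database driver’s specific lock exception rather than the broad `RuntimeError` used here.

```python
import time

def copy_with_retry(copy_fn, retries=3, base_delay=0.01):
    """Retry a copy step that may fail while a table or row is locked
    by another user or a scheduled job. `copy_fn` is any callable that
    raises on contention and returns normally on success."""
    for attempt in range(retries):
        try:
            return copy_fn()
        except RuntimeError:
            if attempt == retries - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
```

Because the final failure is re-raised rather than swallowed, a persistently locked table surfaces as a hard error in the migration log instead of a silent gap in the target data.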
Ensuring data integrity during a migration is of the utmost importance, and minimizing potential data loss or corruption is a key to the project’s overall success. There are a number of tools available to help avoid and even eliminate data loss, including IntelliBridge’s TorqFlow.
TorqFlow is a low-code External Data Platform that helps companies scale their data intake while reducing costs. For engineers working to avoid data loss, say goodbye to custom solutions, long dev cycles, and manual data cleanup. Scale and accelerate your external data integrations with fewer engineering resources. Learn more about how TorqFlow can save you time and resources by contacting our team with any questions.