Data QA sucks. Here’s why — and how — you should do it anyway.

If you’re reading this, you probably don’t need to be convinced that manual QA is a real pain — it’s slow, it’s tedious, and problems usually slip through the cracks despite your best efforts. Plus, even if you catch and fix every single mistake today, chances are good that tomorrow’s first code push will render all your hard work irrelevant. 


You might also know, or at least sense, that manual QA is a huge time and money sink. Studies show that knowledge workers waste a whopping 50 percent of their time “hunting for data, identifying and correcting errors, and seeking confirmatory sources for data they do not trust.” If we zero in on data scientists, the picture gets even bleaker: 60 percent of data scientists spend “most of their time” cleaning and labeling data, and with the majority of those folks earning six-figure salaries, that makes for a pretty big QA bill. 

Unfortunately, though, skimping on data QA is even more expensive. Organizations included in one Gartner study estimated that, on average, poor data quality costs them about $15 million per year. Eighty-four percent of CEOs also say that they’re “concerned about the quality of the data they’re basing their decisions on” — and that uncertainty can lead to lost opportunities, even if the data is ultimately sound. 

Is it possible to avoid the costs of manual QA without suffering the consequences of bad data? Fortunately, it is. The key is to use manual testing only for a few very specific tasks, and to invest in automation for everything else. 

When does manual QA make sense?

If you want to keep your QA costs down, you need to save manual QA for special cases. In general, you should only have a human validating data if:

  • You’re collecting ad-hoc or one-off data. If you aren’t tracking a metric or event on an ongoing basis, it might be more efficient to QA a small dataset manually (if your one-off involves a large amount of data, though, you’ll probably still want to set up some automated tests).
  • You’re implementing something brand-new. It can be helpful to do a little bit of manual QA when you first start measuring something; reviewing incoming data directly can help you understand where issues are likely to crop up, and why, so you can be sure you’re checking for the right things as you design your automated tests. 
  • You’re testing for something subjective. This is more common with software QA that includes questions around design and user experience; it’s rarely the case with data, which is usually either present or not, formatted correctly or not, etc. However, you may occasionally need a more subjective review to ensure that the way you’re labeling and presenting data is aligned with your stakeholders’ expectations. 

When should you automate data validation?

In short, if you’re not in one of the situations described above, you probably can (and should) rely primarily on automation. Automated QA is definitely a must for:

  • Regression testing to confirm that new code changes aren’t affecting your existing analytics tags or other data collection and storage infrastructure
  • Repetitive testing, i.e., any test that needs to be performed at regular, frequent intervals to guard against third-party changes or other factors outside your control
  • Data-driven testing to ensure that you’re correctly handling a wide range of possible user inputs and actions
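To make the data-driven case a bit more concrete, here is a minimal sketch in Python of what this kind of automation can look like: one set of validation rules applied to every incoming event. The event schema, field names, and rules below are hypothetical examples, not DataTrue's actual checks; in a real setup you would validate events captured from your own tag manager or collection endpoint.

```python
# Data-driven validation sketch: run one set of rules over many inputs.
# The required fields and rules below are hypothetical examples.

REQUIRED_FIELDS = {"event", "page_url", "timestamp"}

def validate_event(event: dict) -> list:
    """Return a list of human-readable problems found in a single event."""
    problems = []
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    if "page_url" in event and not str(event["page_url"]).startswith("https://"):
        problems.append(f"insecure or malformed page_url: {event['page_url']!r}")
    return problems

# Exercise the same rules against a range of inputs; a real test suite
# would pull these from recorded traffic or generated test cases.
sample_events = [
    {"event": "purchase", "page_url": "https://example.com/checkout",
     "timestamp": 1700000000},
    {"event": "pageview", "page_url": "http://example.com/"},  # two problems
]

results = {i: validate_event(e) for i, e in enumerate(sample_events)}
```

The point of this shape is that adding a new test case is one line of data, not a new manual QA pass.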

For most businesses, a good automated test setup is really the only way to get accurate, up-to-date answers to mission-critical questions like: 

  • Are the analytics tags on my website (still) firing properly?
  • Am I capturing the key conversion events in my sales funnel?
  • Is my data layer present on all pages of my website?
  • Are key variables being set properly on specific URLs?
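To give the last two questions a concrete shape, here is an illustrative sketch of an automated data-layer check. It assumes each page's data layer has already been captured as JSON (for example, by a headless-browser crawl); the variable names and URL rules are hypothetical, and real tooling would of course do far more.

```python
import re

# Hypothetical rules: which variables must always be set,
# and which are additionally required on specific URL patterns.
GLOBAL_KEYS = {"pageType", "environment"}
URL_RULES = [
    (re.compile(r"/product/"), {"productId", "price"}),
    (re.compile(r"/checkout"), {"cartValue", "currency"}),
]

def check_data_layer(url: str, data_layer) -> list:
    """Return a list of failures for one page's captured data layer."""
    if data_layer is None:
        return [f"{url}: data layer missing entirely"]
    failures = []
    for key in sorted(GLOBAL_KEYS - data_layer.keys()):
        failures.append(f"{url}: missing global variable {key!r}")
    for pattern, keys in URL_RULES:
        if pattern.search(url):
            for key in sorted(keys - data_layer.keys()):
                failures.append(f"{url}: missing {key!r} on {pattern.pattern}")
    return failures

# Captured data layers keyed by URL (illustrative data only).
pages = {
    "https://example.com/product/123": {
        "pageType": "product", "environment": "prod",
        "productId": "123", "price": 19.99,
    },
    "https://example.com/checkout": {
        "pageType": "checkout", "environment": "prod", "cartValue": 19.99,
    },
}
failures = [f for url, dl in pages.items() for f in check_data_layer(url, dl)]
```

Running a check like this on a schedule, across every page of a site, is exactly the repetitive work that humans do slowly and machines do instantly.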

Even if you aren’t yet in a position to invest in automating all of your data validation, sparing your staff the burden of regression, repetitive, and data-driven testing will yield huge gains in speed and accuracy — which means more time for actually analyzing data and growing your business. 

DataTrue is an international leader in automated data assurance solutions. To learn more about how we can help your business achieve higher data quality without increasing your QA costs, schedule an intro call or start a free 30-day trial.
