A couple of weeks back I was talking with a fellow CTO who was demonstrating an analytics product to me. In many places in the demo (dashboards and reports), a lot of dimensional data was missing (for example, industry vertical, product category, etc.). Naturally, a discussion about data quality broke out during the demo. The CTO mentioned that they do not encourage customers to spend time and energy fixing data quality issues from an analytics perspective if the records affected by those issues represent less than 1% of the overall dollar numbers. I largely agreed with his argument and justification (again, from a directional analytics perspective) for not fixing these data quality issues because:
1) These issues do not interfere with the analysis if the analysis hinges on the directionality of the business.
2) The ROI from fixing these issues is not significant, because the data affected by them has less than a 1% impact on the numbers, which is too small to change the direction of the analysis.
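To make the 1% rule of thumb concrete, here is a minimal Python sketch of how the affected share could be measured. The record layout, the field names (`amount`, `industry`), the sample figures, and the helper names are all assumptions made up for illustration; they do not come from the product in the demo.

```python
from typing import Iterable, Mapping


def affected_dollar_share(
    records: Iterable[Mapping[str, object]],
    dimension: str,
    amount_field: str = "amount",
) -> float:
    """Return the fraction of total dollars on records missing `dimension`."""
    total = 0.0
    affected = 0.0
    for record in records:
        amount = float(record[amount_field])
        total += amount
        value = record.get(dimension)
        # Treat None and blank strings as a missing dimension value.
        if value is None or str(value).strip() == "":
            affected += amount
    return affected / total if total else 0.0


def worth_fixing(share: float, threshold: float = 0.01) -> bool:
    """Apply the rule of thumb: fix only if the share reaches the threshold."""
    return share >= threshold


# Hypothetical sample data: one small order has no industry vertical.
orders = [
    {"amount": 500.0, "industry": "Retail"},
    {"amount": 300.0, "industry": "Banking"},
    {"amount": 195.0, "industry": "Healthcare"},
    {"amount": 5.0, "industry": None},
]

share = affected_dollar_share(orders, "industry")
print(f"{share:.2%} of dollars lack an industry vertical")
print("worth fixing for directional analytics:", worth_fixing(share))
```

With these made-up numbers, 5 out of 1,000 dollars are affected, which is below the 1% threshold, so the issue would not be prioritized for directional analytics. The threshold is a parameter because other contexts may need a much stricter one.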
This got me thinking that data quality is truly a multi-dimensional problem (like the story of the blind men and the elephant: each man touches a different part and concludes it is a different object, even though they are all feeling the same elephant). As data quality professionals, it is important for all of us to bring that perspective to any data quality initiative. The best way to do this is to build a data quality scorecard that pairs each quality assessment with its impact on the context in which the data will be used. This type of scorecard can and should be used to prioritize which data quality issues to fix. It will also help in justifying the ROI of fixing them.
As indicated in the chart, each context is analyzed from the perspective of data quality attributes. Each context is given a red, yellow, or green indicator for each attribute. Any red indicator needs to be addressed before the data can be used in that context. In this example, the scorecard shows that compliance reporting requirements cannot be met until the data quality issues associated with credit ratings and address data are completely resolved. This helps demonstrate the need and the ROI, and helps prioritize which attributes should be addressed first.
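A scorecard like this is easy to keep as a small data structure. The sketch below is only one possible shape, written in Python: the context names, attribute names, and statuses are illustrative assumptions loosely modeled on the example above, not the contents of the actual chart, and `blockers` and `fix_priority` are hypothetical helper names.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


# Illustrative scorecard: context -> attribute -> indicator (assumed values).
scorecard = {
    "directional analytics": {
        "credit rating": Status.YELLOW,
        "address": Status.GREEN,
        "industry vertical": Status.YELLOW,
    },
    "compliance reporting": {
        "credit rating": Status.RED,
        "address": Status.RED,
        "industry vertical": Status.GREEN,
    },
}


def blockers(card):
    """For each context, list the attributes whose indicator is red."""
    return {
        context: sorted(attr for attr, status in attrs.items() if status is Status.RED)
        for context, attrs in card.items()
    }


def fix_priority(card):
    """Rank attributes by how many contexts they block (most blocking first)."""
    counts = Counter(
        attr
        for attrs in card.values()
        for attr, status in attrs.items()
        if status is Status.RED
    )
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


for context, reds in blockers(scorecard).items():
    state = "blocked by " + ", ".join(reds) if reds else "usable"
    print(f"{context}: {state}")
```

Counting how many contexts an attribute blocks is just one simple way to rank fixes; in practice you would probably also weight each context by its business value or regulatory risk.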
I would love to hear how you prioritized and justified your data quality initiatives, and what tools and techniques you used.