We have all seen the applications we implement come under fire because of the poor-quality data they show to end users. It is usually easier to shoot the messenger (the application, in this case) than to address the root cause behind the message (poor data quality). Time and again we have seen everyone in the business shoot the messenger, because it is easy to do, even though the outcome is bad for the business. In this discussion, I am going to talk about what to do to avoid falling into the trap of data quality issues when BI projects or initiatives are undertaken.
BI (I am using BI, or Business Intelligence, in a broad sense to cover all reporting, data warehousing, data mart and analytics initiatives in the organization) is often a catalyst that brings data and process quality issues to the forefront quickly and easily. Given the visibility and pervasive value BI brings to the organization's informational needs, when data quality issues surface a debate often ensues about whether the focus should shift entirely to fixing data quality, with BI initiatives put on the back burner until those issues are fixed. This organizational attitude (I have seen it more often in SMB customers) carries the risks described below, and BI teams should avoid the trap by being proactive.
1. The belief that data quality can be completely fixed creates unreasonable expectations and sets the whole team (the one likely to work on fixing data quality) up for failure. Data quality issues will exist as long as data is created and used by people. Citing data quality as a reason to defer BI implementation deprives the organization of the information BI could provide. Remediation of data quality is an iterative process, not a one-shot deal, as Jim Harris discusses in his recent post “Missed It By That Much”.
2. BI initiatives will always highlight possible data quality and process issues. Making data quality a major precondition for deriving value from BI is shortsighted (it is not as black and white as people focused on what is wrong with the data make it sound) and will not help organizations derive proper ROI and a competitive edge from their BI investments.
So what do you do when data quality issues threaten to derail the BI initiatives?
Be proactive: Organizations drive BI initiatives for the insight they offer into the data for making operational as well as strategic decisions. Because BI initiatives make it very easy to look at both aggregated and detail-level data across the organization, they often highlight many data quality and process-related issues within the business. It is really important to point out this side benefit of BI implementation to all stakeholders, right from the beginning and throughout the BI initiative:
1. BI initiatives make it very easy to spot data quality issues; use the artifacts from BI initiatives as your microscope to find them.
2. They will highlight process issues by showing gaps or lack of relationships in data
3. BI initiatives can be and should be used to benchmark and monitor ongoing data quality and enforcement of business processes.
In fact, including these benefits of BI in the ROI calculation will ensure that:
1. When data quality issues are found, the organization will not be surprised. It will almost expect the BI initiative to highlight data and process quality issues.
2. BI initiatives will get credit (towards ROI) when these data quality issues are found and monitored.
3. Artifacts from BI (such as reports, the data warehouse, data marts, or a single view of data) will have an official role to play in the overall data quality measurement and monitoring initiative.
What can BI teams do proactively to address data quality issues before they snowball into showstoppers for BI initiatives?

Process for Identifying Data Quality Issues before BI Implementations
1. Prioritize the functional/business areas for which BI is to be implemented. If this is already done, stick with the existing priorities.
2. Distill those priorities into a set of business questions (10-20 for each requirement, to cover all bases) which the BI initiative will answer. These questions have to be developed in partnership with business users.
For example: After the data mart for pipeline analysis is implemented, the marketing department should be able to do competitive analysis to understand which competitor the company runs into most often, with trending and analysis of this information by all aspects of the pipeline (time, sales organizations, regions, verticals, etc.).
3. Itemize the data required to answer each business question. For example, in the above scenario the following data should be available:
a. Pipeline/opportunity records data with competitor names captured
b. Opportunity data with the sales organization captured on it, etc.
4. Now that you are aware of the data groups required to support the business question, profile those data groups. In the above example, you want to look at the competitor field, profile it in light of the question, and understand:
a. Density: How often are the competitor and win/loss fields captured? (e.g. only 10%)
b. Accuracy/Validity: How many instances of the competitor field are usable vs. useless? (Of the 10% above, 70% are usable)
c. Enrichment effort: How many instances of the competitor field need cleaning? (Of the 70% above, we need to clean up almost 50%)
d. Recency: What is the trend (recency) of entering this information? (Maybe sales started entering this data recently, so that for the last two quarters 80% of transactions carry this information. If the focus is on the last two quarters, a reasonable amount of information is available for answering the question above.)
5. Once equipped with this information, build the scorecard for the accuracy of the data and the validity of the information/insight which can be gleaned from this data in the context of the question.
6. To decide what the BI team should do next, build the necessary context around accuracy of information vs. observed data quality, so that business users have a proper idea of what they can expect to get out of this initiative if it continues as is. In the above example, some of the findings and options which can be defined are:
i. If all of the available data is used for analyzing competitive information, only 10% of the data (at best) would have any information. So if you do not put any constraint on the time frame, you are going to get insight into only 10% of the business at best. Even this will require some effort from the parties involved to clean the data; otherwise the percentage will go down further.
ii. If you look at only the last two quarters and information going forward, the best visibility you can get is about 80% of your business (provided data quality/cleanliness issues are addressed; otherwise it will be less than that).

[Image: sample data quality scorecard]
iii. If the business decides to go ahead with option (ii) above, an agreement/plan needs to be reached to have the marketing/sales department implement a process to enter competitive information on each deal going forward. BI, in turn, needs to commit to implementing metrics/reports/dashboards that track this as one of the indicators and compare the data in percentages as well as absolute numbers (a sample is shown in the data quality scorecard image) to show ongoing progress or the lack thereof.
iv. If option (ii) above is not acceptable to the business users, the BI initiative needs to implement metrics to track current and ongoing data quality as a first step towards continuing the BI initiative. This will help motivate the marketing/sales department to make the necessary investments in fixing data quality (at least going forward). This won’t totally derail the BI initiative. After implementing the quality measurement initiative, the BI team can still go ahead and implement the rest of the solution and engage all parties to start working on data quality (wherever applicable). This way, when data quality goals are achieved or better understood by the business, users can start using the BI implementation for competitive analysis with the proper context of data quality in mind (for example: I can use all of the data if I am doing competitive analysis for Printers, and it will be roughly 60% complete across all time frames, but for other products I am better off looking at YTD data).
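The profiling in step 4 (density, validity, enrichment effort, recency) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the Opportunity record, the quarter format, the list of placeholder values and the canonical competitor names are all made up for the example and are not taken from any particular BI or CRM tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Opportunity:
    quarter: str               # hypothetical format, e.g. "2009-Q1"
    competitor: Optional[str]  # raw value as typed by sales; may be None


# Hypothetical rules; in practice these are agreed with business users.
PLACEHOLDERS = {"n/a", "none", "unknown", "tbd"}
CANONICAL_NAMES = {"Acme", "Globex"}


def is_captured(value):
    """Density: something (anything) was entered in the field."""
    return value is not None and value.strip() != ""


def is_usable(value):
    """Validity: the entered value is not a known placeholder."""
    return is_captured(value) and value.strip().lower() not in PLACEHOLDERS


def needs_cleaning(value):
    """Enrichment effort: usable, but not yet a canonical name."""
    return is_usable(value) and value not in CANONICAL_NAMES


def ratio(part, whole):
    return part / whole if whole else 0.0


def profile(rows):
    captured = [r for r in rows if is_captured(r.competitor)]
    usable = [r for r in captured if is_usable(r.competitor)]
    dirty = [r for r in usable if needs_cleaning(r.competitor)]

    # Recency: density per quarter, stored as quarter -> [captured, total].
    by_quarter = defaultdict(lambda: [0, 0])
    for r in rows:
        by_quarter[r.quarter][1] += 1
        if is_captured(r.competitor):
            by_quarter[r.quarter][0] += 1

    return {
        "total": len(rows),
        "density": ratio(len(captured), len(rows)),
        "validity": ratio(len(usable), len(captured)),
        "needs_cleaning": ratio(len(dirty), len(usable)),
        "usable_share_of_all": ratio(len(usable), len(rows)),
        "density_by_quarter": {
            q: ratio(c, t) for q, (c, t) in sorted(by_quarter.items())
        },
    }


# Made-up sample data, purely for illustration.
rows = [
    Opportunity("2009-Q1", None),
    Opportunity("2009-Q1", ""),
    Opportunity("2009-Q1", None),
    Opportunity("2009-Q1", "n/a"),
    Opportunity("2009-Q2", "Acme"),
    Opportunity("2009-Q2", "acme corp"),
    Opportunity("2009-Q2", "Globex"),
    Opportunity("2009-Q2", None),
]
report = profile(rows)
for name, value in report.items():
    print(name, value)
```

On this made-up sample the overall density is 50%, but the per-quarter view shows it rising from 25% to 75%, which is the kind of recency trend that supports option (ii). Reporting both the ratios and the absolute counts behind them, quarter by quarter, is one way to feed the ongoing scorecard described in (iii) and (iv).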
Doing the above exercise will help avoid surprises caused by the visibility into data quality issues that BI implementations bring. It will also help position BI as a critical and required partner in the enterprise data quality initiative. With some effort, planning (and a little bit of luck), you can both save the messenger and allow the business to attack the message.