Archive for July, 2009

Part I

Many of us have used Agile methodologies very successfully for product development and IT projects. I have noticed that Agile methodologies are also very well suited to enterprise data and information quality (EDIQ) initiatives. I almost feel that Agile and enterprise data quality are a match made in heaven.

Let us inspect the key tenets of Agile methodologies and relate them to what one has to go through in addressing enterprise data/information quality (EDIQ) issues.

  1. Active user involvement (collaborative and cooperative approach): Data quality issues have to be addressed in collaboration with data stakeholders and business users. Creating a virtual team in which business, IT, and data owners participate is critical to the success of data quality initiatives. While IT provides the necessary firepower in terms of technology and the means to correct data quality, ultimately it is the business owners/data owners who decide the actual fixes.
  2. Team empowerment for making and implementing decisions: Executive sponsorship and empowerment of the team working on data quality are key components of a successful enterprise data/information quality initiative. Teams should be empowered to make the necessary fixes to the data and the processes. They should also be empowered to enforce these newly defined or refined processes, both to address immediate data quality issues and to ensure that the ongoing data quality standard is met.
  3. Incremental, small releases and iteration: As we know, a big-bang, fix-it-all approach to data quality does not work. To address data quality realistically, an incremental approach with iterative correction is the best way to go. This has been discussed in a couple of recent articles: “Missed It By That Much” by Jim Harris, and my own article.
  4. Time boxing the activity: Requirements for data quality evolve, and the scope of activities usually expands once the team starts working on a data quality initiative. The key to success is to chunk the work and demonstrate incremental improvements in a time-boxed fashion. This almost always forces data quality teams to prioritize the issues and address them in priority order, which helps the organization deploy its resources optimally and get the biggest bang for the buck.
  5. Testing/validation is integrated in the process and is done early and often: This is my favorite. Data quality is often addressed first by fixing the data itself in environments like data warehouses/marts or alternate repositories, for immediate impact on business initiatives. Testing these fixes for accuracy and validating their impact provides a framework for how you ultimately want data quality issues fixed (what additional processes you might want to inject, what additional checks you might want at the source system, what patterns you are seeing in the data, etc.). Early testing/validation creates immediate impact on business initiatives, and the business side will be more inclined to invest and dedicate resources to addressing data quality on an ongoing basis.
  6. Transparency and visibility: Throughout the work of fixing data quality, it is extremely important to provide clear visibility into the issues and the impact of data quality, the effort and commitment it will take to fix it, and the ongoing progress toward the business goals for data and information quality. Maintaining a scorecard for all data quality fixes and showing the trend of improvements is a good way to provide that visibility. I discussed this in my last article, and here is a sample scorecard.

There are many other aspects of Agile methodologies which are applicable to enterprise data quality initiatives:

a)     Capturing requirements at a high level and then elaborating them in visual formats in a team setting

b)    Not expecting to build a perfect solution in the first iteration

c)     Completing the task at hand before taking up another task

d)    Having a clear definition of what task “completion” means, etc.

In summary, I really feel that Agile methodologies can be easily adapted and used in implementing enterprise data/information quality initiatives. Using Agile methodologies in these initiatives improves the odds of greater and quicker success. They are like a perfect couple, “made for each other”.

In my next post, I will take a real-life example and compare and contrast the actual artifacts of Agile methodologies with the artifacts which need to be created for enterprise data/information quality (EDIQ) initiatives.

Resources: There are several sites about Agile methodologies; I really like a few of them:

  1. Manifesto for Agile Software Development
  2. There is a nice book, “An Agile War Story: Scrum and XP from the Trenches” by Henrik Kniberg
  3. Agile Project Management Blog


How often in their careers are data quality professionals called upon only after data quality has become a pervasive issue in the organization? I always wonder why it has to be this way. Why can’t data quality be considered and injected into business initiatives before it becomes such a big issue?

I personally think there is hope, and there is a way around it. We (data quality, integration, applications, and IT professionals) have to make sure that data quality becomes one of the most critical initiatives on the radar of executive management, and it needs to be championed by CIOs. IT can act as enabler and evangelist in the whole process.

In a recent interview of Jim Harris conducted by Ajay Ohri, Jim gives an example in which a financial services company had a critical data source in the form of applications received through the mail or over the phone. The data entry personnel were compensated on how many applications they entered in a given time frame, whereas the most critical information for the financial institution was a correct social security number (therein lies the issue). As Jim explains in his findings, whenever the social security number was missing or illegible, data entry personnel entered their own social security number in order to proceed with data entry. This clearly makes for a very low-value database for this financial services company.

Had an IT/CIO data quality evangelist participated in the process of capturing this information, and had they asked the right questions about the intent and usage of the information (e.g. What is the most important information in this application form? What happens if the information is not accurate? Are there ways to validate the information being entered?), one could have easily:

  1. Put checks around the social security number to ensure it is valid (cross-referencing with existing customers, etc.)
  2. Encouraged a reward model for the team based on the number of applications with accurate, valid data rather than just the number of applications (aligning the goals of the organization through correct data quality and the right rewards model)
  3. Created a process in which doubtful transactions/records are triaged and corrected before being accepted as fully complete records, etc.
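
The first and third of these checks can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout (`id`, `ssn`), the function names, and the `max_repeats` threshold are hypothetical, and the structural rules can only reject impossible numbers; they cannot confirm that a number belongs to the applicant.

```python
import re
from collections import Counter

# Accepts 123-45-6789 or 123456789 (hypothetical input formats).
SSN_PATTERN = re.compile(r"^(\d{3})-?(\d{2})-?(\d{4})$")


def _digits(value):
    """Strip dashes and surrounding whitespace so equal numbers compare equal."""
    return value.replace("-", "").strip()


def is_plausible_ssn(value):
    """Structural check only: rejects values that can never be a valid SSN."""
    match = SSN_PATTERN.match(value.strip())
    if not match:
        return False
    area, group, serial = match.groups()
    # Area 000, 666 and 900-999, group 00, and serial 0000 are never issued.
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


def triage(applications, max_repeats=3):
    """Split applications into accepted ids and (id, reason) pairs for review.

    max_repeats is an assumed threshold: the same number appearing on more
    applications than this is treated as suspicious.
    """
    counts = Counter(_digits(app["ssn"]) for app in applications)
    accepted, review = [], []
    for app in applications:
        if not is_plausible_ssn(app["ssn"]):
            review.append((app["id"], "invalid or missing SSN"))
        elif counts[_digits(app["ssn"])] > max_repeats:
            review.append((app["id"], "SSN repeated across applications"))
        else:
            accepted.append(app["id"])
    return accepted, review
```

A repeated-number check of this kind would likely have surfaced the situation in the example above, since the same value would appear on many unrelated applications.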

Again, my intent is not to pick on one instance or industry. The point I am making is that data quality/information quality should always be injected into new and existing initiatives at the front end rather than the back end of the business process. Data quality, and how it will be ensured, becomes one of the inputs/drivers in implementing any new business initiative (it goes side by side with the business objectives). Also, data quality is not just a technology issue; it is a business issue (hence, in the example above, I suggest that questions and suggestions from the data quality evangelist could influence how the team is compensated).

I know that this discussion automatically leads into data/information governance and management, but there are small steps any IT organization (CIO) could take to fix this issue incrementally. Create a role whose responsibilities (among many others) are to:

  1. Make sure that any business process/initiative which captures or modifies data has a set of requirements around the intended use of the information and the assumptions about what that information will be
  2. Outline which data quality/validity checks must be performed to achieve clean data for the intended business use
  3. Create a monitoring system to ensure that the checks outlined for validating data quality are being implemented and worked on
  4. Evangelize the importance of correct data and the correlation between data quality and information quality

I would love to get your thoughts on this topic. I think that prevention is better than cure (while we cannot always prevent, we can try), and the time has come (given the explosion of data and the emphasis on objective decisions based on data rather than gut feel) for all of us to start pushing for data quality at the front end of the process.


We have all been in the situation where the applications/software we implement get shot at because of the poor data quality shown to end users through those applications. In most situations it is really easy to shoot the messenger (the application/software in this case) rather than address the root cause behind the message (poor data quality). Time and again we have seen the messenger shot by everyone in the business, because it is easy to do, even though it does the business no good. In this discussion, I am going to talk about what to do to avoid falling into the trap of data quality issues when BI projects/initiatives are undertaken.

BI (I am using BI, or Business Intelligence, in a broad sense to represent all reporting, data warehousing/data mart, and analytics initiatives in the organization) is often a catalyst that brings data/process quality issues to the forefront quickly and easily. Given the visibility and pervasive value BI brings to the organization’s informational needs, when data quality issues come to the forefront a debate suddenly ensues about whether the focus should shift solely to fixing data quality, putting BI initiatives on the back burner until the issues are fixed. This approach/organizational attitude (which I have experienced more with SMB customers) has some risks associated with it, as described below, and BI teams should avoid this trap by being proactive.

1. The belief that data quality can be completely fixed creates unreasonable expectations and sets the whole team (which is likely to work on fixing data quality) up for failure. Data quality issues will exist as long as data is created and used by people. Citing data quality as a reason to defer BI implementation deprives the organization of the informational needs which BI could meet. Remediation of data quality is an iterative process (not a one-shot deal), as Jim Harris discusses in his recent post “Missed It By That Much”.

2. BI initiatives will always highlight possible data quality and process issues. Making data quality one of the major contingencies for deriving value from BI initiatives is short-sighted (it is not as black and white as it is made to sound by people looking at what is wrong because of data quality issues) and will not help organizations derive proper ROI and an edge from their BI investments.

So what do you do when data quality issues threaten to derail the BI initiatives?

Proactive: Organizations drive BI initiatives for the insight they offer into the data for making operational as well as strategic decisions. Because BI initiatives make it very easy to look at both aggregated and detail-level data across the organization, they often highlight many data quality and process-related issues within the business. It is really important to highlight this side benefit of BI implementation to all stakeholders right from the beginning and throughout the BI initiative:

1. BI initiatives make it very easy to spot data quality issues; use artifacts from BI initiatives as your microscope to find them.
2. They highlight process issues by showing gaps or missing relationships in the data.
3. BI initiatives can and should be used to benchmark and monitor ongoing data quality and enforcement of business processes.

In fact, including these benefits of BI in the ROI calculation will ensure that:

1. When data quality issues are found, the organization will not be surprised. It is almost as if they are expecting the BI initiative to highlight data/process quality issues.
2. BI initiatives get credit (towards ROI) when these data quality issues are found and monitored.
3. Artifacts from BI (like reporting, a data warehouse, a data mart, or a single view of data) will now have an official role to play in the overall data quality measurement and monitoring initiative.

What can BI teams do proactively to address data quality issues before they snowball into showstoppers for BI initiatives?

Process for Identifying Data Quality Issues before BI Implementations

1. Prioritize the functional/business areas for which BI is to be implemented; if this is already done, stick with the existing priorities.
2. Distill those priorities into a set of business questions (10-20 for each requirement, to cover all bases) which the BI initiative will answer. These questions have to be developed in partnership with business users.

For example: After implementing a data mart for pipeline analysis, the marketing department should be able to do competitive analysis to understand which competitor is encountered most often, with trending and analysis of this information by all aspects of the pipeline (time, sales organization, region, vertical, etc.).

3. Itemize the data required to answer each business question. For example, in the above scenario the following data should be available:

a. Pipeline/opportunity records with competitor names captured
b. Opportunity data with the sales organization captured, etc.

4. Now that you are aware of the data groups required to support the business question, profile those data groups. In the above example, you want to look at the competitor field, profile it in light of the question, and understand:

a. Density: How often are the competitor and win/loss fields captured? (e.g. only 10% of the time)
b. Accuracy/Validity: How many instances of the competitor field are usable vs. useless? (Out of the 10% above, 70% are usable)
c. Enrichment effort: How many instances of the competitor field need cleaning? (Out of the 70% above, almost 50% need cleanup)
d. Recency: What is the trend (recency) of entering this information? (Maybe sales started entering this data recently; if 80% of transactions in the last two quarters carry this information and the focus is on the last two quarters, a reasonable amount of information is available for answering the question above)
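
The four profiling measures above can be computed with a short script. The sketch below is only an illustration: the record layout (a dict with the profiled field and a `created` date), the function name, and the `is_usable`/`needs_cleaning` predicates are assumptions that each team would replace with its own rules.

```python
from datetime import date


def profile_field(records, field, is_usable, needs_cleaning, recent_since: date):
    """Return density, validity, enrichment, and recent density as ratios (0..1).

    is_usable and needs_cleaning are caller-supplied predicates on the raw
    field value; recent_since is the cut-off date for the recency measure.
    """
    def filled(record):
        return bool((record.get(field) or "").strip())

    def ratio(part, whole):
        return len(part) / len(whole) if whole else 0.0

    populated = [r for r in records if filled(r)]
    usable = [r for r in populated if is_usable(r[field])]
    to_clean = [r for r in usable if needs_cleaning(r[field])]
    recent = [r for r in records if r["created"] >= recent_since]
    recent_populated = [r for r in recent if filled(r)]

    return {
        "density": ratio(populated, records),          # a. how often captured
        "validity": ratio(usable, populated),          # b. usable among captured
        "enrichment": ratio(to_clean, usable),         # c. usable but needs cleanup
        "recent_density": ratio(recent_populated, recent),  # d. recency
    }
```

Because each ratio is taken relative to the previous one, the figures chain: with the illustrative numbers above, 10% density times 70% validity leaves about 7% of all records with a usable competitor value, and roughly half of those would still need cleaning.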

5. Equipped with this information, build the scorecard for the accuracy of the data and the validity of the information/insight which can be gleaned from it in the context of the question.

6. To decide what the BI team should do next, build the necessary context around accuracy of information vs. observed data quality, so that business users have a proper idea of what they can expect from this initiative if it continues as is. In the above example, some of the metrics which could be defined are:

i. If all available data is used for analyzing competitive information, only 10% of the data (at best) will have any information. So if you do not constrain the time frame, you will get insight into only 10% of the business at best. Even this will require some effort from the parties involved to clean the data; otherwise the percentage will go down further.
ii. If you look only at the last two quarters and going forward, the best visibility you can get is about 80% of your business (provided data quality/cleanliness issues are addressed; otherwise it will be less than that).

Sample of data quality score card

iii. If the business decides to go ahead with option (ii) above, an agreement/plan needs to be reached to have the marketing/sales department implement a process to enter competitive information on each deal going forward. BI, in turn, needs to commit to implementing metrics/reports/dashboards that track this as one of the indicators and compare the data in percentages as well as absolute numbers (a sample is shown in the data quality scorecard image) to show ongoing progress or the lack thereof.
iv. If option (ii) above is not acceptable to the business users, the BI initiative needs to implement metrics to track current and ongoing data quality as a first step towards continuing the BI initiative. This will help motivate the marketing/sales department to make the necessary investments in fixing data quality (at least going forward). It won’t totally derail the BI initiative: after implementing the quality measurement initiative, the BI team can still go ahead and implement the rest of the solution and engage all parties to start working on data quality (wherever applicable). This way, when data quality goals are achieved or better understood by the business, they can start using the BI implementation for competitive analysis with the proper context of data quality in mind (for example: I can use all of the data if I am doing competitive analysis for printers, and it will be roughly 60% complete across all time frames, but for other products I am better off looking at YTD data).
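
The tracking metric described in (iii) and (iv), shown in both percentages and absolute numbers, could be produced along the following lines. This is a rough sketch under assumptions: the deal layout (`closed` date plus the tracked field), the calendar-quarter grouping, and the function names are hypothetical, not taken from any particular BI tool.

```python
from collections import defaultdict
from datetime import date


def quarter_of(day: date) -> str:
    """Label a date with its calendar quarter, e.g. 2009-Q2."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def scorecard(deals, field="competitor"):
    """Per quarter: number of deals, deals with the field filled, and the percentage."""
    totals = defaultdict(int)
    filled = defaultdict(int)
    for deal in deals:
        quarter = quarter_of(deal["closed"])
        totals[quarter] += 1
        if (deal.get(field) or "").strip():
            filled[quarter] += 1
    return [
        {
            "quarter": quarter,
            "deals": totals[quarter],
            "with_field": filled[quarter],
            "pct": round(100 * filled[quarter] / totals[quarter], 1),
        }
        for quarter in sorted(totals)  # YYYY-Qn labels sort chronologically
    ]
```

Reading the rows in order gives the trend: a rising `pct` alongside the absolute counts shows whether the agreed data entry process is actually being followed.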

Doing the above exercise will help avoid surprises resulting from visibility into data quality issues during BI implementations. It will also help position BI as a critical and required partner in the enterprise data quality initiative. With some effort and planning (and a little bit of luck), you can both save the messenger and allow the business to attack the message.


During tough economic times, CFOs tighten the belt and look to manage expenses across the organization; to achieve the necessary cuts, they recruit help from all business units/divisions/areas within the organization. Managing expenses is an enterprise-wide task, and CFOs (Chief Financial Officers) lead this initiative. The impact of managing expenses is far-reaching; it improves the organization’s profitability by reducing expenses.

I believe that data quality is an enterprise-level challenge as well; data quality affects all the parties who use, generate, or maintain data for the purpose of managing their business. Typically CIOs (Chief Information Officers) lead (or should lead) all initiatives associated with data quality, and they should recruit help from across the organization (just as CFOs or CEOs do). Creating enterprise-wide awareness and urgency about the impact of data quality issues (and hence information quality) will help secure the success and proper level of sponsorship of data quality initiatives within the organization.

While there are several ways in which the impact of data quality can be highlighted to the organization, the following are some practical examples which I have found easy to use when communicating the soft ROI or impact of data quality to individual business units.

Sales Organization:

Capturing good quality data directly helps close additional deals, or deals which might otherwise have been lost. While this is a broad statement, I can give a few examples where this has helped me in real situations.

Capturing win/loss reasons consistently on the deals the sales team marks as closed (won or lost) results in better analysis of why the organization is losing deals. Based on this information, sales and sales operations can recruit help from marketing or engineering to overcome the reasons behind those losses. Many times, once it is well understood why deals are being lost against a competitor or in an industry vertical, minor tweaks in positioning or minor product enhancements can turn the tide and reduce the loss of deals in those situations. Of course, all this is possible only if good quality data about win/loss reasons is captured. To make sure this data is captured on an ongoing basis, Sales Operations needs to commit to investing time in monitoring the capture of win/loss reasons. They also need to figure out whether there are standardized templates they might want to use to capture this information succinctly.

Another example in the same category is capturing competitor information (competitor name, names of the products being competed against) on all deals. Better analysis of win/loss data against competitors can help with recruiting marketing/engineering help to fend off the competition. Again, what it takes is quality data about the competition and products in every deal in which sales is competing.

The interesting thing is that one or two salvaged deals (of course depending upon your ASP) could pay for the entire incremental expense of an enterprise data quality initiative for a year or so.


Marketing:

Among many factors, a marketing department’s budget and effectiveness are decided by how many sales-qualified leads are generated on a periodic basis. In many instances, the speed with which marketing is able to nurture and grow leads into sales-qualified leads depends on the quality of the information captured. These days marketing automation tools provide personalized messaging, but for that messaging to work, marketing needs to gather quality information about leads. It is relatively easy to justify the effectiveness of a marketing campaign or even a lead nurturing process by considering the quality of the leads data. For example, a client of mine decided to market their product to clients who had an ERP system installed in-house (SAP or Oracle Applications). Unfortunately, the lead data captured through webinars and trade shows did not capture this information in a consistent format; on top of that, this field had a lot of data quality issues (free-form text). Investing marketing and IT time to clean this data would greatly increase the effectiveness of any campaign or lead nurturing program run against these leads.
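
Cleaning a free-form field like the ERP one usually starts with mapping the common spellings onto a small set of canonical values. The snippet below is a minimal sketch of that idea; the alias lists and the function name are my own assumptions for illustration, and a real cleanup would build the list from the values actually observed in the lead data.

```python
import re

# Hypothetical alias lists; extend them from what the lead data really contains.
ERP_ALIASES = {
    "SAP": ("sap", "r/3"),
    "Oracle Applications": ("oracle", "e-business suite", "ebs"),
}


def normalize_erp(raw):
    """Map a free-text ERP answer to a canonical name, or None if unrecognized."""
    if not raw:
        return None
    # Collapse whitespace and lowercase so spelling variants compare equal.
    text = re.sub(r"\s+", " ", raw).strip().lower()
    for canonical, aliases in ERP_ALIASES.items():
        # Word boundaries keep "sap" from matching inside words like "sapient".
        if any(re.search(rf"\b{re.escape(alias)}\b", text) for alias in aliases):
            return canonical
    return None
```

Values that come back as `None` are the ones worth routing to a person for review, which keeps the manual effort focused on the ambiguous entries.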

Operations and Finance:

If the finance or operations team is particular about capturing all the contract and customer/account details cleanly the first time around on transactions, they can save the time it takes to ship and then bill products to the customer (time otherwise wasted on figuring out where to ship, where to bill, etc.), saving several days on shipping and invoicing. This directly results in improved cash flow (early billing = early countdown of terms = early payment), greater customer satisfaction, etc. If the data on existing contracts or customers is not clean, and your organization resells into those accounts or relies on renewals, it is worthwhile to assess the quality of this data and save the several days it takes to go through the shipping and billing process.

In summary, I believe that data quality is an enterprise-wide issue which impacts almost everyone who creates, uses, and maintains data generated through business operations. CIOs need to champion the data quality cause at the enterprise level, just as CFOs or CEOs champion the cause of cost cutting across the enterprise by enlisting help from each and every department. Some of the ideas given above can be used to highlight the softer ROI of data quality across the enterprise. The details and specifics of these examples will vary from organization to organization based on the business they are in and the business model they have.


Recently I have been reading a lot about data quality. My curiosity about the data quality field was re-ignited by my recent experience with SMB customers.

Data quality is in many ways a manifestation of business process issues (lack of a process, or lack of enforcement of a business process). I believe that data quality is a leading indicator of possible underlying business process issues.

It is really important to look at data quality holistically: it is not only IT’s responsibility; it is a business issue and should be addressed by all stakeholders. The business is responsible for instituting the right amount of process and enforcing those processes on an ongoing basis to ensure high data quality. High quality data can only be ensured through partnership between the various stakeholders within an organization. These stakeholders are all of the parties who are either:

a)      Using data (decision makers),

b)      Generating data (users using systems or people in different roles running the business),

c)       Maintaining data (owners of the business units) or

d)      Providing a mechanism to capture data (IT)

In his recent post “The Data-Information Continuum”, Jim Harris talks about how data quality and information quality are interrelated. He also talks about how information quality standards are subjective, specific to individual initiatives within business units (rather than being enterprise-wide). Extending Jim’s thoughts further, I believe that any time the business decides to use data for making strategic or tactical decisions, IT and the business should examine all underlying assumptions about the data in light of how it is going to be used. In a nutshell, every time data is used in new ways or new patterns by the business, a mini data quality project should be spawned to ensure that there are no data/process quality issues which would make the decisions/information gleaned from the data inappropriate or ineffective.

Let us take one real-life example to illustrate my point. Assume that IT has built a BI system to provide analytics/reporting on pipeline data. Sales now wants to allocate more resources to areas where sales cycles are shorter and the market is growing. To execute this strategy, the sales operations team decides to look at the current and past pipeline, success rates, and cycle times by industry vertical. While looking at the data, Sales Operations realizes that for almost 80% of the accounts the industry vertical is either not captured or not accurate. In this situation, the lack of industry vertical information can be construed as a data quality issue, but in reality Sales needs to identify this as a business need and implement a mandatory process: when a new account is created, look up the industry vertical in D&B data and enter it on the account record.
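
Enforcing such a mandatory step at account creation might look like the sketch below. Everything here is hypothetical: `lookup_vertical` is a stand-in callable for whatever D&B lookup the organization has access to (I am not describing D&B’s actual interface), and the account is modelled as a plain dict.

```python
class MissingVerticalError(ValueError):
    """Raised when an account would be created without an industry vertical."""


def create_account(name, lookup_vertical, manual_vertical=None):
    """Create an account record only if an industry vertical can be attached.

    lookup_vertical is an assumed callable: it takes the account name and
    returns a vertical, or None when the reference data has no match.
    manual_vertical lets the person creating the account supply the value.
    """
    vertical = manual_vertical or lookup_vertical(name)
    if not vertical:
        # Refuse to save: the missing value is fixed at the source, not later in BI.
        raise MissingVerticalError(f"No industry vertical found for {name!r}")
    return {"name": name, "industry_vertical": vertical}
```

The design choice is simply to fail at the point of entry, so that the gap becomes a process question for Sales at creation time rather than a data quality surprise in a later report.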

I truly believe that data quality and business process issues go hand in hand. It is not solely IT’s job to fix data quality issues, as most of the time data quality issues lead back to business process implementation/enforcement issues. Data quality monitoring is not a one-time project; all stakeholders need to be aware that any time they are going to use data for making decisions in ways different from the past, they need to start a data/process quality assessment project. More on practical ways to implement this ongoing data/process quality improvement process in the next blog entry.


I came across a post, “Justifying Marketing Budgets By Using BI”, on Dashboardinsight.com.

Extending the discussion in this article and taking it to a somewhat lower level: what type of metrics can marketing use to justify its budgets? There are several supporting metrics and data points (beyond the core metrics around brand awareness, campaign success in generating leads, etc.) which marketing should not overlook in justifying budgets. The following is a sample of such metrics:

  1. Marketing can correlate positive changes in organizational KPIs/metrics with marketing’s achievements. For example, after an increase in brand awareness or after winning industry awards or mentions, what is the increase in ASP, the increase in win rates, or the decrease in sales cycles?
  2. How marketing’s efforts in positioning against the competition have resulted in increased win rates against the competition, a reduction in the days it takes to close deals, or a halt to price deterioration in competitive deals.
  3. How lead generation activity has helped sales with an increased pipeline. Most integrated campaign management tools today provide visibility into the pipeline generated as a result of direct marketing activity.
  4. How changes in the bundling or positioning of a product have resulted in increased sales of the product or an increase in ASP and win rates.
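
A before/after comparison of these metrics around a marketing event (an award, a repositioning, a new bundle) can be sketched as follows. The deal layout (`status`, `amount`, `cycle_days`) and the function names are assumptions for illustration, and the result shows correlation only; it does not by itself prove that marketing caused the change.

```python
def summarize(deals):
    """Win rate over closed deals, plus ASP and average cycle time over won deals."""
    closed = [d for d in deals if d["status"] in ("won", "lost")]
    won = [d for d in closed if d["status"] == "won"]
    return {
        "win_rate": len(won) / len(closed) if closed else 0.0,
        "asp": sum(d["amount"] for d in won) / len(won) if won else 0.0,
        "cycle_days": sum(d["cycle_days"] for d in won) / len(won) if won else 0.0,
    }


def compare(before, after):
    """Change in each metric from the period before the event to the period after."""
    base, current = summarize(before), summarize(after)
    return {metric: current[metric] - base[metric] for metric in base}
```

A positive change in `win_rate` or `asp`, or a negative change in `cycle_days`, is the kind of supporting data point the list above has in mind.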

I think that marketing makes a significant impact on the organization’s top and bottom line. Marketing should take the time to get into the details of the incremental value it adds to the top and bottom line through correlative metrics such as those described above.
