
This is the sixth entry in a blog series highlighting the critical importance of executive sponsorship for data governance initiatives. In the last few entries, I explored the need to understand the KPIs, the goals behind those KPIs, and the necessity of getting your hands on the actual artifacts executives use when reviewing those KPIs and goals.

My approach has been simple and straightforward: data governance initiatives absolutely need to demonstrate impact on the top and bottom lines by helping executives improve the KPIs used as the means to achieve higher profitability, lower costs, and compliance. The process of garnering executive sponsorship is a continuous one. Visibility of the data governance organization and its impact across the board helps establish awareness and understanding of how data governance initiatives help the organization. This visibility and awareness make it easier to maintain ongoing executive sponsorship.

Once you, as a data governance team, have clearly understood the KPIs and the goals behind them, and have access to the artifacts used by executives, it is time to go back to the technical details. At this stage it is extremely important to map which systems, which business processes automated by those systems, and which data are directly or indirectly responsible for the outcome of those KPIs. This process of mapping the dependencies between KPIs, systems, business processes, and data can be partially automated using metadata management repositories. It is important to capture this information with tools and technologies so that it can be readily available and shared with other teams and systems. A technology solution will also facilitate change management and impact analysis in the future. The lineage and metadata I am talking about here go beyond technical metadata and into the realm of business (process and semantic) metadata as well.
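The dependency mapping described above can be sketched as a simple lookup structure; a metadata repository would manage this at scale, but the core idea fits in a few lines. All KPI, process, system, and data names below are hypothetical examples, not from any real engagement.

```python
# Hypothetical KPI -> business process -> system -> data dependency map.
dependencies = [
    # (kpi, business_process, system, data_entity)
    ("Repeat Business %", "Opportunity Management", "CRM", "Opportunity"),
    ("Forecast Accuracy", "Pipeline Review", "CRM", "Opportunity"),
    ("Days Sales Outstanding", "Invoicing", "ERP", "Invoice"),
]

def impacted_kpis(system):
    """Impact analysis: which KPIs could a change to this system affect?"""
    return sorted({kpi for (kpi, _, sys_, _) in dependencies if sys_ == system})

print(impacted_kpis("CRM"))  # → ['Forecast Accuracy', 'Repeat Business %']
```

The same structure answers the reverse question (which systems feed a given KPI), which is exactly the change-management and impact-analysis use case mentioned above.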

This dependency information will come in very handy in establishing the scope and definition of the efforts being planned for a specific data governance initiative/project. When collecting information about the systems, the business processes they automate, and the data, it is important to capture relevant details with a long-term, repeatable perspective. Information such as:

1.     System name and general information.

2.     Landscape information (Where is it installed/managed/housed? Which hardware/software are used? Touch points with other systems, etc.)

3.     Ownership and responsibility information from both business and technology perspectives. (Which technology teams are responsible for managing, changing, and maintaining these systems? Who are the business stakeholders who approve any changes to the behavior of these systems? etc.)

4.     Change management processes and procedures concerning the systems and data.

5.     End-user/consumer information (Who uses it? How, when, and for what do they use it?)

6.     Any life cycle management processes and procedures (for data and systems) which might currently be in place.

7.     Specific business processes and functions which are being automated by the systems.
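To keep the inventory above repeatable, it helps to capture each system as a structured record rather than free-form notes. This is a minimal sketch; the field names map to the seven items above but are illustrative, not a standard schema.

```python
# Illustrative system-inventory record; field names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class SystemRecord:
    name: str                                                 # 1. system name
    landscape: dict = field(default_factory=dict)             # 2. hosting, hw/sw, touch points
    owners: dict = field(default_factory=dict)                # 3. business & technology owners
    change_process: str = ""                                  # 4. change management procedure
    consumers: list = field(default_factory=list)             # 5. end users and usage
    lifecycle: str = ""                                       # 6. life cycle procedures
    processes_automated: list = field(default_factory=list)   # 7. business processes automated

crm = SystemRecord(
    name="CRM",
    owners={"business": "VP Sales Operations", "technology": "CRM Platform Team"},
    processes_automated=["Opportunity Management"],
)
```

A record like this can live in a metadata repository or even a shared spreadsheet at first; what matters is that every system is described with the same fields.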

Often, some of this information may already exist with the teams managing these systems. This exercise should identify that information and make a note of it; the point is not to duplicate it. Where the information does not exist, this exercise will help capture it in a form that is relevant not only to the data governance initiative, but also usable by system owners and other stakeholders.

The goal of this step is to baseline information about the systems, the business processes they automate, and the data. This information will help in subsequent stages with establishing change management processes, defining policies, and possibly implementing and monitoring policies around data management/governance.

From this phase, the data governance initiative starts transitioning into the nuts and bolts of the IT systems and landscape. In the next few blog posts, I will cover various aspects a data governance team should consider as it makes progress toward establishing an official program and starts working on it.

Previous Relevant Posts:

Litmus Test for Data Governance Initiatives: What do you need to do to garner executive sponsorship?

Data Governance Litmus Test: Know thy KPIs

Data Governance Litmus Test: Know goals behind KPIs

Data Governance Litmus Test: How and who is putting together metrics/KPIs for executives?

Data Governance Litmus Test: Do You Have Access to the Artifacts Used by Executives?


This post was encouraged by a similar post about good data management by fellow practitioner and blogger Henrik Liliendahl (Right the First Time).

For data professionals like me, who have been working on and preaching the importance of data quality/cleanliness and data management, it feels really good to see good examples of enterprise information management policies and procedures at play in real life. It feels like the message of the importance of data as an asset, as a “critical success enabler for the business,” is finally being heard and accepted.

Recently I had a wonderful experience shopping for a laptop at http://www.dell.com. As I was shopping on their website, I configured my laptop (I’m sure all the options for my laptop were being populated from a product MDM catalog). When I was ready to check out, before calculating shipping charges, the website prompted me to enter my shipping address. When I entered my address, the website came back with two corrected addresses, enriched with additional information such as the four-digit extension of the ZIP code, expanded abbreviations, etc. The website required me to choose one of the corrected/enriched addresses before proceeding with my order. This probably means they have implemented a solution that checks address information for validity and conformance before letting it enter their systems. This investment obviously has many benefits for Dell, and they must have made it as part of a broader Enterprise Information Management framework. I was really happy for Dell. The process also meant that my order was delivered ahead of schedule without additional charge.
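As a toy illustration of the kind of standardization at work in that checkout flow (not Dell’s actual implementation, which would validate against postal reference data such as USPS-certified services), abbreviation expansion can be sketched like this:

```python
# Toy abbreviation expansion; real address standardization validates
# against postal reference data and appends the ZIP+4 extension.
ABBREVIATIONS = {"St": "Street", "Ave": "Avenue", "Rd": "Road"}

def standardize(address: str) -> str:
    out = []
    for word in address.split():
        key = word.rstrip(".,")          # drop trailing punctuation for lookup
        out.append(ABBREVIATIONS.get(key, word))
    return " ".join(out)

print(standardize("123 Main St."))  # → "123 Main Street"
```

The value of doing this at the point of entry, as Dell does, is that invalid or non-standard addresses never make it into downstream systems in the first place.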

I am writing this because I believe in applauding and appreciating efforts done the right way. For transparency purposes: I am not related to Dell.com in any professional way (employment, contract, etc.), nor did Dell hire me to write this blog post. I am one of the thousands of customers they have. I just want to say: good job, Dell.com.

I would like to appeal to all fellow bloggers and practitioners to cite examples of good information management, data management, or data governance practices at work from the public domain and write about them. Tweet about them under the #goodeim tag. We have heard too many horror stories; there are many organizations which have been diligently at work implementing very successful information management practices. Let us encourage and applaud those efforts openly.


This is the fourth entry in a series of blog posts highlighting how to go about securing executive sponsorship for data governance initiatives. In previous posts, I have highlighted the need for understanding the specific KPIs/metrics which executives track, and the tangible goals being set against those KPIs.

Almost always, there is an individual or a group of individuals who work tirelessly on producing these reports with KPIs/metrics for executives. Often these individuals have a clear and precise understanding of how the metrics/KPIs are calculated and what data issues, if any, exist in the underlying data which supports them.

It is worthwhile to spend time with these groups of people to get a leg up on understanding metric/KPI definitions and knowledge around data issues (data quality, consistency, system of record). The process of engaging these individuals will also help in winning the confidence of the people who know the actual details of the KPIs/metrics and the processes for calculating and reporting on them. These individuals will likely be part of your data governance team and are crucial players in winning the vote of confidence from executives as it relates to the value data governance initiatives create.

In one of my engagements with a B2B customer, executive management had the goal of improving business with existing customers, and hence wanted to track net-new versus repeat business. Initially, the sales operations team had no way of reporting on this KPI, so in the early days they reported using statistical sampling. Ultimately, they created a field in their CRM system to capture new or repeat business on their opportunity data, and this field was used for net-new versus repeat business KPI reporting. Unfortunately, the field was always entered manually by a sales rep while creating the opportunity record. While the sales operations team knew this was not entirely accurate, they had no way of getting around it.

In my early discussions with the sales operations team, when I came to know about this, I did a quick assessment of a quarter’s worth of data. After doing basic de-duping and some cleansing, I compared my numbers against theirs, and there was a significant difference. This really helped me get the sales operations team on board with data cleansing and, ultimately, data governance around opportunity, customer, and prospect data. The discussion also helped us clearly define what business should be considered net-new versus repeat.
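A stripped-down sketch of that kind of assessment might look like the following; the customer names and the crude normalization rule are made up, and real cleansing would use fuzzy matching and standardization tools rather than string replacement.

```python
# De-dupe customer names on opportunities, then classify each opportunity
# as net-new or repeat based on whether the normalized customer was seen
# before. All data here is invented for illustration.
def normalize(name: str) -> str:
    # crude normalization; real tools use fuzzy matching and reference data
    return name.lower().replace(",", "").replace(".", "").replace(" inc", "").strip()

opportunities = [
    ("Acme Inc.", "2011-01-05"),
    ("ACME, Inc", "2011-02-10"),   # duplicate spelling of the same customer
    ("Globex", "2011-03-01"),
]

seen = set()
classified = []
for customer, date in sorted(opportunities, key=lambda o: o[1]):
    key = normalize(customer)
    classified.append((customer, "repeat" if key in seen else "net-new"))
    seen.add(key)

print(classified)
```

Even this naive pass catches the duplicate spelling of the same customer, which is exactly why the manually entered CRM flag and a cleansed calculation can diverge so sharply.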

Obviously, as one goes through this process of collecting information about the metrics, the underlying data, and the process by which the numbers are crunched, it helps to have proper tools and technology in place to capture this knowledge. For example:

a)     Capturing the definitions of metrics

b)     Capturing metadata around data sources

c)     Capturing lineage and the actual calculations behind metrics

This process of capturing definitions, metadata, lineage, etc. helps provide high-level visibility into the scope of things to come. Metadata and lineage can be used to identify the business processes and systems which impact the KPIs.
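A captured metric definition covering items a) through c) can be as simple as a structured record like the one below; the fields and values are illustrative, not the model of any particular metadata tool.

```python
# Hypothetical captured metric definition (items a-c above).
metric = {
    "name": "Net-New vs Repeat Business",
    "definition": "Share of closed-won revenue from customers "
                  "with no prior closed-won opportunity",
    "sources": ["CRM.Opportunity", "CRM.Account"],                  # data-source metadata
    "lineage": "CRM.Opportunity -> quarterly extract -> sales ops report",
    "calculation": "repeat_revenue / total_revenue",
}

print(metric["sources"])
```

Once definitions are recorded this way, the source and lineage fields directly identify the systems and business processes that impact each KPI.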

In summary, finding the people behind the operations of putting together KPIs helps identify subject matter experts who can give you clear, high-value pointers to the areas a data governance initiative needs to focus on early in the process. It will ultimately help you recruit people with the right skill set and knowledge into your cross-functional data governance team.

Previous posts on this or related topics:

Litmus Test for Data Governance Initiatives: What do you need to do to garner executive sponsorship?

Data Governance Litmus Test: Know thy KPIs

Data Governance Litmus Test: Know goals behind KPIs

Suggested next posts:

Data Governance Litmus Test: Do You Have Access to the Artifacts Used by Executives?
