-
Five New Ideas From 2010 MIT Information Quality Industry Symposium
Posted on July 15th, 2010 1 comment
Here are some quick thoughts from the first day of the MIT Information Quality Industry Symposium. It’s my favorite event of the year. I refer to it as the “anti-boondoggle.” All academic theory and very little vendor fluff. I suppose that’s what you get when MIT and the University of Arkansas organize events. I’ll either post another top 5 tomorrow, or a full recap.
Please comment if you’d like me to dive further into any of these topics.
1) Cloud Is No Longer The Focus
Last year everyone talked about Governance in the cloud. This year it’s dead. Why? I think it may be that this group, unlike the Sales 2.0 crowd, is focused on Enterprise-scale monolithic systems. Last year at MITIQIS, many presentations were focused on the cloud’s impact on large-scale information quality programs. This year, it’s all about internal, installed systems. I find this fascinating. Did this group try cloud, and not see the value? Or is it that there is still a duality of ideologies: one that prefers to keep things internal, and a second that wants to move its IT responsibility to SaaS apps?
2) Master Data Management (MDM) Isn’t The Only Solution
I was surprised that among the Information Quality vendors and practitioners, MDM was no longer the focus. Joe Bugajski focused on it, but others merely touched on how they would interact with MDM rather than focus on MDM as the central system in an Information Quality focused environment. This year, many people talked about Information Quality at the system level, and fixing business process and human interfaces to eliminate dirty data at the source. This reminded me of the Data Warehouse to Data Mart paradigm shift of 10-15 years ago. I just felt old writing that.
3) Data Quality is a Dirty Word
“Information Quality” is now in vogue. I was corrected several times in conversations when I mentioned data quality. This is somewhere between a more highbrow way of marketing ourselves, and snobbery. I don’t think this matters in the least bit, but others believe it’s more accurate and lends more credibility to our practice. As you’ll notice throughout my writing, I heavily resist the practice of pluralizing the word data. I never write, “data are,” which I believe is grammatically accurate. I feel the same way here. I do “Data Quality” work, regardless of who says that term is wrong. All right… I’ll use it in this post and try it on for size. This is the Information Quality Symposium after all.
4) Free Sources Drive Down R&D Cost
Data is available from government sources and tools are available from open source communities. No surprise there, but there was an increased focus on it here at MITIQIS. Why? Talend, an information quality vendor, builds their tools on the back of those open source libraries. They credited various shared data models, methodologies and data sources that allow them to shortcut proprietary R&D spend. Trillium also spoke up, and mentioned that they leverage some of the same open-source thinking in their full-price solutions.
5) 60-90% of Operational Data is Valueless
I won’t say worthless, since there is some operational necessity to the transactional systems that created it, but valueless from an analytic perspective. Credit to Kirk Amidon for this insight - he attended the session where this stat was quoted. Similarly, Steve Adler from IBM and others discussed it in their presentations. Data only has value, and is only worth passing through to the Data Warehouse if it can be directly used for analysis and reporting. No news on that front, but it’s been more of the focus since the proliferation of data has started an increasing trend in storage spend. That wasn’t discussed at the conference… just my opinion.
-
KQIs (Key Quality Indicators) To Measure Data Quality
Posted on August 18th, 2009 No comments
At the recent MIT Information Quality Industry Symposium, the hot topic was measuring the impact of data quality programs. In a bad economy, it makes perfect sense. If your company is cutting programs, you need to justify your data quality initiatives, or they too will be cut. My favorite presentation on the topic was from Delphine Clement, whose topic was the “Cost of Non Quality Data.” I thought that was an interesting way to look at it, and she presented a very mature view of Data Management. Delphine credited sessions from previous MIT Information Quality Symposiums with some of the underlying theory. I’m sure there are others to credit as well, and if you know the history please comment.
Delphine reports on the Key Quality Indicators (KQIs) that matter the most to her business partners. She has taught the business community that KQIs are needed to build confidence in the KPIs. I like that the KQI approach mirrors the KPIs (in naming and level of importance), and that they are presented as a complementary report. Think of this as the metadata for the KPIs. That’s the way I rationalized it.
KQIs would make sense to any Data Quality lead, but it might not to a VP of Marketing or VP of Sales. It’s not their job to care how we do ours. So how do you bridge the gap with the executive KPI users? You must understand their needs, and show them that the KQIs are driving the data quality projects in your organization. They will only care if the KQIs help to resolve their issues. Also, KQIs may be used to show them progress in your data quality programs. When you complete a project and are able to turn a yellow (cautionary) indicator to green (good), they will understand how the project affected their work.
Delphine’s approach begins by asking business leads and other data users a simple question: “How should we measure data quality?” She gathers feedback via surveys from her business customers and measures progress through response trending over time. Sounds like internal Marketing, right? Delphine also presented a methodology for measuring direct vs. indirect cost savings from Data Quality initiatives. She has clearly spent a lot of time working on this approach and is doing a great job. I really enjoyed this presentation.
She also recommended involving the end users early on to define:
- What are the Key Quality Indicators (KQIs) that are important to the business?
- Should the KQIs be global or local?
- What is the cost of poor quality data?
- Are the KQIs different by country?
I love these questions. Simple, direct, and open. Rather than telling our peers how we should be measured, ask them and include them in the KQI process.
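To make the KQI idea a little more concrete, here is a minimal sketch in Python of one indicator and its traffic-light status. Everything in it is my own assumption for illustration and not from Delphine’s presentation: the choice of field completeness as the metric, the threshold values, the sample records, and the names `completeness` and `kqi_status`.

```python
from typing import Iterable, Optional

# Hypothetical thresholds; in practice these would be agreed with the business users.
GREEN_THRESHOLD = 0.95
YELLOW_THRESHOLD = 0.80


def completeness(values: Iterable[Optional[str]]) -> float:
    """Share of records whose field is populated (not None and not blank)."""
    items = list(values)
    if not items:
        return 0.0
    filled = sum(1 for v in items if v is not None and v.strip() != "")
    return filled / len(items)


def kqi_status(score: float) -> str:
    """Map a 0..1 KQI score to a traffic-light indicator."""
    if score >= GREEN_THRESHOLD:
        return "green"
    if score >= YELLOW_THRESHOLD:
        return "yellow"
    return "red"


# Made-up sample: a customer email field before and after a cleanup project.
emails_before = ["a@example.com", None] + [f"user{i}@example.com" for i in range(8)]
emails_after = ["a@example.com", "b@example.com"] + [f"user{i}@example.com" for i in range(8)]

status_before = kqi_status(completeness(emails_before))  # 9 of 10 populated
status_after = kqi_status(completeness(emails_after))    # 10 of 10 populated
print(status_before, "->", status_after)
```

Whether a KQI like this should be global or local is exactly the kind of question the list above puts to the end users; the thresholds could just as easily be kept per country.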
-
MIT Information Quality Symposium Day 2
Posted on July 17th, 2009 1 comment
With Day 2 of the MIT IQIS complete, I thought it would be good to write up another summary. I was very impressed with the quality of speakers and their dedication to the field of Information Quality. The work shows a lot of innovative thinking and pride. (I’ll add in links and update later today)
Robert Grossman – Information Quality in the Cloud
Bob is part of the Open Cloud Consortium and passionate about the topic. He presented everything you need to know to understand where Cloud Computing is today, where it’s going next (based on open debate among dueling standards boards), and how it affects Information Quality discussions. He has a unique ability to take very complex topics and break them down into simple conversations.
The most interesting part for me was defining Public, Community and Private Clouds, which I couldn’t have described before this talk. I also appreciated his comment that Cloud is the only way to analyze 100TB of data, and that the alternative is to merely entomb it.
Delphine Clement - Cost of Non Quality Data
Delphine is from HP in France and discussed how they have approached their KQIs (Key Quality Indicators). I like that KQIs mirror KPIs, but because Information Quality reporting is metadata rather than business metrics, it’s kept separate. Delphine also presented a methodology for measuring direct vs. indirect cost savings from Data Quality initiatives. She has clearly spent a lot of time working on this approach and is doing a great job. I really enjoyed this presentation.
Lyn Robison - Diagnosing IT’s Impact on the Business
Lyn, from The Burton Group, has a theory on how to measure data quality from an IT perspective, but I thought it was very pie in the sky. There were lots of questions about the politics of such an effort, and I don’t think the approach was practical. For instance, if your measured data quality metrics turn up as poor, the IT organization will blame the business. There’s no way this could work politically.
I liked that Lyn tried to compare the business people’s perception of Data Maturity vs. the IT perception, but how do you align IT perception and Business perception? Someone also asked, should IT be measured on poor data quality? The answer: Not if the Business owns the data.
Steve Sarsfield - Using Data Quality Scores to Sell IQ Value
Steve echoed others who encouraged Information Quality progress by “Leveraging a Crisis” to build momentum. He also asked us to present the “Do Nothing” approach, i.e. present to our management what would happen if they ignored the problem. Steve’s scoring method was based on the Trillium TS Insight product, but appeared to be a practical way to measure Data Quality. I think some of this can be done easily with or without Trillium, but I appreciated how the tool can manage the measurements over time.
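As a rough illustration of measuring a score over time without any particular tool, here is a minimal Python sketch. It is not Steve’s method or anything from the Trillium product; the two validation rules, the field names, the snapshot labels and the sample records are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical validation rules; each returns True when the record passes.
RULES: List[Callable[[Record], bool]] = [
    lambda r: bool(r.get("name", "").strip()),                               # name is populated
    lambda r: re.fullmatch(r"\d{5}", r.get("postal_code", "")) is not None,  # 5-digit postal code
]


def quality_score(records: List[Record]) -> float:
    """Percentage (0-100) of records that pass every rule."""
    if not records:
        return 0.0
    passing = sum(1 for r in records if all(rule(r) for rule in RULES))
    return 100.0 * passing / len(records)


def score_trend(snapshots: Dict[str, List[Record]]) -> List[Tuple[str, float]]:
    """Score each labeled snapshot, ordered by label (e.g. 'YYYY-MM')."""
    return [(label, quality_score(recs)) for label, recs in sorted(snapshots.items())]


# Made-up monthly snapshots of the same customer table.
snapshots = {
    "2009-07": [
        {"name": "Ann", "postal_code": "02139"},
        {"name": "Bob", "postal_code": "72701"},
        {"name": "Cy", "postal_code": "10001"},
        {"name": "", "postal_code": "94105"},
    ],
    "2009-06": [
        {"name": "Ann", "postal_code": "02139"},
        {"name": "Bob", "postal_code": "7270"},
        {"name": "Cy", "postal_code": "10001"},
        {"name": "", "postal_code": "94105"},
    ],
}

trend = score_trend(snapshots)
for label, score in trend:
    print(label, score)
```

A flat or falling line in a trend like this is one way to show management the “Do Nothing” picture.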
Marillo Boccia – Data Quality in the Media Industry
Marillo is the Director of Database Marketing at Grupo Abril, the largest publisher in the Southern Hemisphere. He presented a project (done with the help of service provider Assesso) where his team personalized magazine ads for Banc Itau to 1.2 Million subscribers. Cool stuff. They merged their subscriber database with the bank’s and did a massive customer data cleanup to ensure very high data quality. They amazed their customers in the process.
Dan Defend and Aparna Vani - Data Quality Challenges for Yahoo’s Massive Data Environment
Dan and Aparna presented the Data Quality and Analytics sides respectively. They monitor website interaction and uncover trending and outage information by analyzing a constant flow of clickstream data. Their group deals with duplication challenges, security issues, and the need to report outage alerts instantly. Their work was also driven by past MIT IQIS conferences, and they presented their practical approach to establishing a central data quality process and framework.
-
Lightweight Data Governance: A Starting Point
Posted on June 22nd, 2009 4 comments
This expands on the previous article, Lightweight Data Governance. I’ll continue to add to the theory in upcoming posts. If there are any areas you would like me to focus on, please add a comment, or email me directly.
A few weeks back I met with Steve Sarsfield to discuss the upcoming MIT Information Quality Symposium (MITIQS). It will be my first time presenting to a Data Quality focused group, so I was excited when Steve offered to provide some background. My main concern was, “How can someone in the commercial space keep the interest of a combined business, government and research focused community?” We discussed my approach, and I think I’m on the right track. I’m going to describe how we initiated Data Governance at my company, kept it simple, and found early success.
So where did we start? Data Governance grew from an expressed need by the executive team for better data quality. Sounds simple, right? Fix the data. It’s like the Kenan Thompson SNL character talking about the economy: Fix It. The company decided that Data Governance was needed, and that they would let me define the path to getting there. I set the scope to include any project where I have an opportunity to build credibility in data or reporting. I’ve formalized processes where necessary, but kept it “lightweight” in most areas. With the current state of the economy, I see no other way to get there.
I previously led the Marketing Analytics department, and we had responsibility for B2B and B2C Analytics. Most of our efforts were focused on the B2B side, since that’s where the most perceived opportunity existed. When I moved into the Director of Global Data Governance role, I built from my strength and worked on B2B issues first. I attacked the low-effort, high-value projects. I looked to expand on the local efforts that were working well. If teams or projects came up with creative solutions, I looked to expand their work globally. My thought was that it’s really hard to come up with the underlying process definition, but that an existing process was easy to expand. It doesn’t work for every existing process, but some are natural fits that resolve longstanding internal issues.
That became the basis for Lightweight Data Governance. Find the projects or efforts that are successful on a small scale, and expand them globally. That way you start with a base of knowledge, documentation, and executive support that’s very hard to build from scratch.
Grow Data Governance efforts organically
Start with existing processes. Find out which can be expanded, centralized or automated.
Focus on project level ROI
Don’t try to sell your management on a huge program to start. Build the business case at the project level. It’s easier for management to support small positive ROI projects.
Partner to be unobtrusive to ongoing work
Find projects that are already in flight. Would Data Governance add to their impact? If so, partner with their leadership to help craft the deliverables to create mutual benefit.
Build momentum from early successes
Get testimonials! If the project went well and the community benefited, you should be able to get the project sponsor to say so.
Measure initiatives on DQ impact
This step is further along the Data Governance continuum. Begin to show the impact on the organization when projects focus on data quality. This cultural shift will underscore the importance of future Data Governance work.
Follow with Formal Data Governance
Does it make sense for the enterprise? Does executive support exist? If not, how do you build it? This is where the more traditional theory in most Data Governance efforts becomes relevant.


