It's taken me a while to get into the idea of "big data" as something special. I've been working with largish structured data sets for a while, and getting valuable, reliable insights out of that data is challenging enough. My first thought was: "Now I'm supposed to extract something meaningful out of a mishmash of inconsistent, sparsely populated, questionably reliable data?"
Some things I've read over the past few months have helped me get over that hump, so I wanted to share them.
First was "The Information" by James Gleick. He's my new favorite author. This was a great history of information theory.
Second was "Thinking, Fast and Slow" by Daniel Kahneman. He's a Nobel laureate and a pioneer of behavioral economics. Fascinating read.
Last was "Chaos" by James Gleick about the origins and founding of chaos as a mathematical science. Another amazing read.
These three books so close together helped me get a much stronger intuition for the power and meaning of insights that can come from big data and machine learning techniques. If you're on the fence about the usefulness of big data, start reading.
Friday, June 29, 2012
Thursday, May 17, 2012
Namespace Collision
I always enjoy a good joke, especially when it's related to work. This one is a bit edgy, so I hope you'll forgive the nature of the content.
3M has a product called the Healthcare Data Dictionary that it uses in many of its products as a way of mapping and relating concepts from different domains or standards. They've made this available for use online for free and are open sourcing it as well. Since I'm in healthcare, I thought I'd do a little searching around just for fun. One of the entries I stumbled across was "failure of orgasm." Oh, there are things much more risque, trust me! I thought it was quite funny, though, when the 3M HDD told me that related to this condition was "Parent - Reactions - has child." I'll leave it to you to interpret the direction of the implied causality and likelihood of actuality.
Clearly I took some liberty in misinterpreting the layout of the results, but the collision between the healthcare namespace (which has you thinking about sex and topics related to sex) and the relational namespace (where parents and children represent a hierarchy) makes for a fun joke.
Thursday, April 19, 2012
The Information
I'm reading a book right now called The Information, by James Gleick. It's an incredible history and exposition of how we came to think about and articulate information. One of my favorite quotes (so far) suggests that information itself is a new kind of entity - not matter, not energy.
Given my current interest in data governance and the concept of "data as a corporate asset," I've latched on to this concept closely. Information is a thing, but it's neither a capital nor an operating expense. It's a different kind of thing altogether, but measurable and worthy of management nonetheless.
Stay tuned for more reflections on this book while I'm reading the rest of it.
Thursday, November 4, 2010
TDWI St. Louis Chapter
For anyone who will be in the St. Louis area on November 12th, the St. Louis Chapter of The Data Warehousing Institute will be holding its quarterly meeting. I expect this meeting will contain some great content from renowned speakers Neil Raden and Krish Krishnan.
For more information and to register, see the TDWI STL page.
Meetings are free and open to the public. You need not be a TDWI member to attend.
Hope to see you there!
Tuesday, November 2, 2010
Information Portfolio Components
My model for the Information Portfolio includes four components:
People represent the individuals or teams that use data to execute business processes.
Applications are either traditional end-user applications or integration solutions that move data through a process, between processes, or to/from people.
Processes are the business activities that leverage data to fulfill the operational objectives of the business.
Data is the cornerstone of the Information Portfolio. It is the stuff that moves through a process, between people and applications, to act as fuel in the execution of business services.
The Information Portfolio is a knowledge base or collection of metadata that links together these four concepts in meaningful ways that transform bare data, through context and meaning, into information that can be used in the delivery of business services. [Wisdom Hierarchy]
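One way to picture that knowledge base is as a small metadata graph linking the four kinds of assets. Here's a minimal sketch in Python; all of the class names, relationship labels, and sample assets are my own illustrative assumptions, not a reference to any particular tool.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Asset:
    kind: str   # "person", "application", "process", or "data"
    name: str

@dataclass
class Portfolio:
    # Each link is a (subject, relationship, object) triple of assets.
    links: set = field(default_factory=set)

    def link(self, a, rel, b):
        self.links.add((a, rel, b))

    def related(self, asset):
        """All assets directly linked to the given asset, in either direction."""
        out = set()
        for a, _, b in self.links:
            if a == asset:
                out.add(b)
            if b == asset:
                out.add(a)
        return out

# Hypothetical assets, one of each kind.
analyst = Asset("person", "Claims Analyst")
etl = Asset("application", "Nightly ETL")
claims = Asset("data", "Claims Table")
review = Asset("process", "Claims Review")

p = Portfolio()
p.link(etl, "loads", claims)
p.link(analyst, "executes", review)
p.link(review, "consumes", claims)

print(sorted(a.name for a in p.related(claims)))  # ['Claims Review', 'Nightly ETL']
```

Even a toy model like this makes the point: the value isn't in any one asset, it's in the links that put data into the context of people, applications, and processes.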
More on each of these components and how they relate to each other in the Information Portfolio as the month continues...
Day 1 - PragProWriMo / NaNoWriMo
It's November again, which means that it's time to spend some focus time writing. After a little bit of a slow start with day one of writing, I'm at 1,292 words. The pace that I try to set is 2,000 words per day, which leaves a chance to take one day off each week and still hit the NaNoWriMo target for 50,000 words. The official goal for PragProWriMo is 2 pages per day, which would be well under a 2,000 word/day goal.
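The pacing math checks out, assuming November's 30 days and roughly one rest day per week:

```python
# Quick check of the NaNoWriMo pacing above (assumptions: 30 days in
# November, one rest day per week, so about 4 rest days total).
days_writing = 30 - 4
words_per_day = 2000
total = days_writing * words_per_day
print(total)  # 52000, which clears the 50,000-word target
```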
I've been distracted and not really planning for November, unlike last year. Last year, I had an outline developed and some sketches down on paper before I started writing The Practical Data Warehouse. This year, I spent the evening watching TV, commented to my wife that I really didn't have any clue what to write... then took her advice and just started writing. This year's concoction:
The Enterprise Information Portfolio
A Model for the People, Applications, Processes, and Data of an Organization
As November rolls on, I'll be posting snippets from the text. Good luck to all the other NaNoWriMo and PragProWriMo authors out there!!
Monday, September 6, 2010
What's your cathedral?
There's a classic story about understanding the purpose of work: The Story of the Three Stone Cutters
Once, there were three stone cutters working together on a job. A stranger came upon the first stone cutter and asked, "What is it that you're doing?"
"Cutting this stone into a perfect square block," answered the first stone cutter, continuing to focus with great care and precision.
The stranger moved on, leaving the first stone cutter to his craft. He came upon the second stone cutter, who was also working diligently on his pile of stones. "What are you doing?" asked the stranger, interested to see how this stone cutter would respond.
The second stone cutter stopped to look at the stranger and engage in the conversation. "I'm working here on this job to provide for my family. I have a loving wife and two wonderful children. I work hard here to make sure we have what we need and can take time to enjoy each other." The second stone cutter reached out a friendly hand. "I'm Alexander."
The stranger introduced himself and shared a brief story with the stone cutter about his own family. They said good bye, and the stranger moved on to another area of the project.
Around the outer edge of the site, the stranger saw a third stone cutter who was squatting behind a large carved stone, but staring toward the horizon. The stranger approached this stone cutter and asked, "Pardon me. May I ask what you're working on?"
"I'm building a cathedral," responded the third stone cutter without breaking his gaze toward the horizon, and into the future. "It's going to be the new home for a parish that is renowned for its financial generosity and support of the surrounding community. There will be a soup kitchen on the main level, a community garden in the courtyard, and offices for individual and family therapists. In three years, they expect to be providing services to over two thousand people a day."
The stranger stared into the distance, picturing the bustling crowds and smiling faces of the volunteers. "Thank you for sharing that vision," said the stranger.
Every time that I've heard this story in the past, I've identified clearly with the third stone cutter - I need to know what kind of structure I'm building and why we're doing it. I struggle to find motivation unless I really understand the mission.
And, I've always felt sorry for the first two stone cutters. I mean, the second one has a noble purpose and all. The first one doesn't need to know his greater purpose to be a good worker. The third one, though: he's the enlightened one!
Then, I started thinking about one of the last team building / motivational leadership meetings I was at. We spent a long time talking about how to help other co-workers in IT connect to the fact that we are a Catholic health care service provider. Our mission and purpose is all about providing care for patients, with a deference for those who are on the edges of society or in the greatest need. A great purpose for the organization. As IT leaders, we spend quite a bit of time working to help our staff understand how their day to day work of supporting servers and applications helps someone do their job in Finance; and how that helps someone do their job in Medical Records; and how that helps someone do their job in Credentialing; which helps us make sure that our physicians are qualified; which helps us make sure that patients are safe. In the worst of cases, an IT co-worker can feel six or seven times removed from the purpose of the organization. (I recently took a Gallup "strengths" survey and learned that "connectedness" is one of my strengths, so I guess I don't really struggle with this too much.)
Still, I recently started to wonder if all of the leaders in that training were coming at things from the perspective that the third stone cutter is the only one who's really got it right, and that we're all supposed to strive to see the same metaphorical cathedral.
Recently, I've found at least as much satisfaction at work focusing on the quality of stones that I'm carving right now. I've got a larger purpose in mind, though it isn't really the patient care we provide. As I contemplate how to push data management principles forward, my justification stops with "this will make our organization smarter." Of course all kinds of great benefits will come from that, including improved patient care, fiscal responsibility, innovative service models, an end to world hunger, and peace for all. Right now, though, it is satisfaction enough to have the purpose of making people better, smarter decision makers.
So, if my cathedral isn't better health care for our patients, then is it a stretch to think that the first stone cutter's cathedral is that one beautiful brick and pride in his craftsmanship, and the second stone cutter's cathedral is the quality of life he's providing for his family and for himself?
What's your cathedral look like? Is it right in front of your eyes, or off in the distance?
Tuesday, August 31, 2010
Spare Some Change?
There's a classic joke about the difference between the IT person and the developer. Here's my version:
The boss comes into the IT guy's office and says "Hey, Joe, we've got a new line of business starting up and we really need to be able to do this new thing, X."
Joe simply says, "Sorry, Boss, that can't be done."
So, the boss goes down the hall to the development manager and says "Hey, Nancy, we've got a new line of business starting up and we really need to be able to do this new thing, X."
Nancy says, "Absolutely, Boss. We can do anything. It'll take us six months to plan, two years to develop, and another several months of user acceptance testing."
The boss goes back to his office and begins writing his resignation.
The best quote I've ever heard about change isn't the old adage that "the only thing constant is change." I'm a futurist at heart, so there's no deep insight for me in the constancy of change. The best quote I've heard about change is that "change is great because every time something changes it means you're one step closer to getting it right." That's a bit of paraphrasing and it assumes that we're primarily concerned with "good" change, of course. The point stands: if you believe that change is a good thing, then it's natural to embrace it rather than fight it.
Classic waterfall methodologies focus on defining specifications ahead of implementation so that the risk of change can be avoided. That level of contractual thinking drives unnecessary conflict, especially in business intelligence projects. One of the things we've learned through experience is that we can't know what we don't know. Naturally, the needs being addressed by a business intelligence solution will change over time as more insight is delivered to decision makers. If we already knew what the outcome was, then there wouldn't be a need for the project. Business intelligence is about discovery.
One of the core principles from the Agile Manifesto is "customer collaboration over contract negotiation" and "responding to change."
I've worked with a number of teams that grow increasingly frustrated over changes that end users make in business logic. ETL developers sometimes get so fed up with change and being forced to rewrite code over and over that they feel they should stop writing any code until "the users decide what they want!" Those developers haven't recognized that every time they change their code, they're enabling the users to understand what it is that they want. They're facilitators, not just implementers.
So, agile BI is at least as natural a fit as agile application development. Probably even more so. For BI developers to be agile, though, they have to embrace change. They have to facilitate rather than resist change.
Friday, August 27, 2010
Your Job? My Job? Our Job!
I've been trying to figure out agile data warehousing for several years now. I'm a computer scientist by training and a programmer by hobby, so I've always kept my eye on trends in traditional software development. What I tell myself, professionally, is that it helps me have alternative perspectives on BI solutions. (It's really just 'cause I like programming, even if I what I hack together isn't typically that elegant.)
Several years ago, I was introduced to one of the founders of the St. Louis XP (Extreme Programming) group, Brian Button, and decided to sit down and have lunch with him. I explained what kind of work we do to build data warehouses, and he listened very politely. At the time, he was thinking mostly about test driven development and pair programming. One of the things he asked me was "can you get your data modeler, ETL developer, and report developer all in a room together working on the same components all at once?" It occurred to me, then, that separation of development responsibilities might be a serious impediment to agile BI development.
As a former consultant, I've personally done a little bit of everything. I can data model well enough to build something workable; I've spent a lot of time writing ETL both by hand and with GUI tools; I'm a raw SQL hacker for sure; and I can even create a reasonable report or two that some VP wouldn't just use as a coaster. How often have I ever asked my staff to do that breadth of work, though? In larger organizations, I usually see and hear about separation of teams: data modelers, ETL developers, reporting people. They're separate teams. That's always been under the guise of "developing technical expertise" within the team and driving consistency across projects. (Important goals for sure.)
However, when I look at successful agile software teams that I know about, that same level of separation isn't typically present. A single developer might do part of the UI, some of the dependent service modules, and the persistence layer. They're focused on delivering a particular function, not of some component of the overall application, but an external function of the application. This goes back to the previous conversation about sashimi, too [1] [2].
Of course there are some developers that are better at UI and others better at ORM, just like there are some BI folks better at data modeling and others better at data presentation. Enabling more agile development, though, requires developers who are more willing and able to cross over those traditional boundaries in the work that they do. One of the leaders I work with today articulates this very well when he says that we just want "developers." What this does for agile is that it minimizes context switching and the spin between different developers working on the interrelated but different pieces of the same component. If an ETL developer decides she needs five extra fields on a table because she just learned that some previous assumption about the cardinality of a relationship was flawed, should that change require the synchronous work of:
| Step | Modeler | ETL Developer | Report Developer |
| --- | --- | --- | --- |
| 1 | Changes data model | wait/other work | wait/other work |
| 2 | Deploys changes to DB | wait/other work | wait/other work |
| | hand off | | |
| 3 | wait/other work | Import new metadata | wait/other work |
| 4 | wait/other work | Update ETL jobs | wait/other work |
| 5 | wait/other work | Unit test ETL | wait/other work |
| | hand off | | |
| 6 | wait/other work | wait/other work | Run test queries |
| 7 | wait/other work | wait/other work | Update semantic layer |
| 8 | wait/other work | wait/other work | Update reports |
| N | ...and loop through for every change | | |
There's a lot of opportunity for optimization there if one person is working on the tasks instead of several people. For about 66% of the team's time, they're working on something other than this one objective, and there's latency in the hand off between developers. (If you can create a parallel scheduling algorithm that gets the overall workload done faster, all things equal, than "one resource dedicated to completing all the steps in implementing each particular piece of functionality" and "helping each other out when there's nothing in the queue", please let me and every computer science professor on earth know.)
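A back-of-envelope version of that handoff cost is easy to sketch. The hours below are illustrative assumptions, not measurements; the point is only that the specialist pipeline pays a queue latency at every role transition while a single generalist doesn't.

```python
# Illustrative assumptions: each task is 2 hours of real work, and a
# change sits in a queue for 4 hours at every handoff between roles.
work_per_task = 2
handoff_latency = 4
steps = [("modeler", 2), ("etl", 3), ("report", 3)]  # (role, task count)

total_work = sum(n * work_per_task for _, n in steps)

# Specialist pipeline: same work, plus latency at each of the two handoffs.
specialist_elapsed = total_work + handoff_latency * (len(steps) - 1)

# One generalist doing every step: same work, no queues.
generalist_elapsed = total_work

print(specialist_elapsed, generalist_elapsed)  # 24 16
```

And that's for a single pass; the table above loops for every change, so the latency tax compounds with each iteration.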
I think that for some teams this will be a challenge to their skill set and require developers to grow beyond their existing comfort zone. I'll argue that they'll be the better for it! For some teams it might be more of a challenge to ego than to actual skills: "you're going to let anyone data model?!"
The answer is "yes" and "we require an appropriate level of quality from everyone." That's why agile teams are ones with pair programming, peer reviews, and an approach that not just accepts but welcomes change.
For a team that isn't agile today, these things can't come piecemeal. If you want to be agile, you have to go all-in.
Thursday, August 26, 2010
Growing a Tree versus Building a House
When we say that we want to build "good" software, we tend to use terms that come from other engineering fields: foundation, framework, scaffolding, architecture. One of the things that the agile software movement has shown us is that good solutions can come from evolutionary models as well as construction models. The difference comes from the fact that code is far easier to manipulate than physical raw materials.
When building a data warehouse, we often draw traditional, stacked-tier pictures of the data architecture: data warehouse tables, semantic layer, data marts, etc. If we start our design discussions with an assumption that anything we "build on top of" has to be "solid" then we quickly drive the overall solution away from agility. "Solid" conjures an image of a concrete foundation that has to be built to withstand floods and earthquakes. If we find a crack in our foundation, it has to be patched so the things on top don't come crumbling down.
If, instead, we try to imagine a conceptual architecture that has in mind goals of adaptability to purpose (rather than firmness) and loose coupling (rather than high contact), you can begin to imagine a higher level of agility. Look at the picture from the start of this post (from webecoist). The trees are being shaped and molded into a purpose-built structure. If, part-way through the growth process, the structure needed to change to be another 6 inches higher or hold a second story of some kind, the necessary changes could be interwoven into the growth already complete. If we were constructing a new art museum and decided, half way through, that we wanted a library instead, we'd have to make some major changes or compromises to account for the fact that the foundation was only designed to hold the weight of portraits, not stacks of books.
This conceptual discussion about growing something organically rather than building it from the ground up is directly related to the sashimi discussion from yesterday. A legacy build approach says: build a data model, build ETL, build a semantic layer, build reports. There aren't any opportunities in that model to create meaningful yet consumable vertical slices.
I hear some agile BI conversations that only go halfway toward the mind shift I think is necessary. These "think big, act small" solutions sound like a model where the only change is that you pour some of the concrete foundation at a time. Imagine building a house using this semi-agile approach:
Iteration One:
- Pour foundation for the kitchen only.
- Build kitchen walls.
- Wire up kitchen outlets.
- Install kitchen plumbing.
- Pour foundation for the family room only.
- Build family room walls.
- Realize you need to tear out a kitchen wall to open to family room.
- Reroute electricity in that wall.
- Rerun wiring to kitchen.
- Run new wiring to family room.
An organic build approach says plant a seed, watch it grow. First, a sprout appears, with roots, a stem, and leaves. The stem gets thicker, the roots grow deeper, and more leaves sprout. Branches grow. Flowers bud and fruit appears. When requirements change, some pruning and grafting is required, but you don't have to tear down the tree and plant a new one from scratch or start a new tree on the side. The tree will grow around power lines and rocks and other trees as needed.
There's the mindset. I don't think it's easy to shift from a constructionist perspective to an organic one. Success in agile BI requires this change in thinking, though. If you're still laying foundations and screwing sheetrock onto studs, your attempt at agile BI will not be optimal.
Good luck with that.
Wednesday, August 25, 2010
Sashimi (An Agile BI Lesson for Floundering Teams)
The most recent TDWI conference generated a lot of conversation around what Agile BI means and how agile principles and practices from traditional software development can and can't be applied to business intelligence projects. I wasn't able to be at the TDWI conference and attend the presentations, but there's been a lot of chatter.
For those of you not familiar with the use of the term sashimi in this context, the gist is that sashimi is the art of slicing up a problem space into pieces that are at the same time independently valuable as well as quickly achievable. In an app dev project, what this means is creating a so-called walking skeleton that exercises only as many pieces of the overall solution as necessary to deliver something that is actually usable by a user. For example, if I'm building an application that's going to manage medical claim payments, maybe all the first slice does is retrieve one claim from the database and display it on the screen. Then as work progresses toward the first 90-day release, more and more meat is built up on top of that skeleton, refactoring various pieces of the stack as necessary along the way. Good sashimi results in ever-increasing value to end users with only as little bulk on the skeleton as necessary to achieve that.
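That first claims slice really can be the whole program. Here's a minimal sketch using an in-memory SQLite database; the claim table, its columns, and the formatting are hypothetical stand-ins, not any real system's schema.

```python
import sqlite3

# A minimal "walking skeleton" slice: one claim, retrieved and displayed.
# Table and column names here are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claim (claim_id TEXT, amount REAL)")
conn.execute("INSERT INTO claim VALUES ('CLM-001', 125.50)")

def show_one_claim(conn):
    """The entire first slice: fetch a single claim and render it."""
    claim_id, amount = conn.execute(
        "SELECT claim_id, amount FROM claim LIMIT 1").fetchone()
    return f"Claim {claim_id}: ${amount:.2f}"

print(show_one_claim(conn))  # Claim CLM-001: $125.50
```

Everything else (search, editing, workflow, security) gets layered onto this skeleton in later slices.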
What does good sashimi for a BI project look like?
I think that it looks the same, but feels much harder to accomplish, especially when you have an enterprise-scale strategy for data warehousing and business intelligence. Imagine that you need to deliver a new reporting dashboard for department managers to do predictive labor modeling. The minimal vertical slice for that solution could include:
- New tables in a staging area from a new source system, with
- New ETL jobs to load data into...
- New tables in an enterprise data warehouse, and
- New tables in a brand new data mart, and
- New objects in a semantic reporting tool (e.g. universe or model), and
- (Finally) your new dashboard.
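The whole slice above can be sketched end to end in a few lines, using SQLite as a stand-in for the real staging, warehouse, and mart layers. The layer prefixes, tables, and labor data are all invented for illustration.

```python
import sqlite3

# A thin vertical slice through every layer in the list above.
# stg_/dw_/mart_ prefixes and the labor data are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
    -- Staging: land raw data from the new source system.
    CREATE TABLE stg_labor (dept TEXT, hours REAL);
    INSERT INTO stg_labor VALUES ('Radiology', 320.0);

    -- Warehouse and mart layers start as trivial pass-throughs; they
    -- earn real ETL logic only when a later slice needs it.
    CREATE VIEW dw_labor   AS SELECT * FROM stg_labor;
    CREATE VIEW mart_labor AS SELECT * FROM dw_labor;
""")

# "Dashboard": the thinnest possible consumer of the mart layer.
for dept, hours in db.execute("SELECT dept, hours FROM mart_labor"):
    print(f"{dept}: {hours} hours")  # Radiology: 320.0 hours
```

The point is that every layer exists from day one, even though most of them are still trivially thin.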
In traditional BI projects that I've been involved in, the project plan would call for building the solution mostly in the order shown above: bring the data in, understand the data, build a data mart, wrap it with a semantic layer, and deliver the dashboard. Along the way, you'd probably have a subteam prototyping and testing out the dashboard UI and maybe someone doing some data profiling to speed data analysis along; but the back-end pieces of development, especially, are likely to happen in stacked order.
Building a walking skeleton in software requires you to be able to refactor the bones along the way. As the analogy goes, the first version of the walking skeleton might have just one leg and one toe that attaches directly to the spine and up to the head. As the product evolves, the leg bone gets refactored into femur, patella, tibia, and fibula; more toes get added for stability; and a new set of hip bones is created. All of those are changes to the base skeleton that happen in order to add muscles, skin, and clothing.
As we layer things in a traditional BI project, we often try to keep a more detailed big picture in mind up front. I know the final product is going to have two legs that bend at the knee, need to support independent orbital motion, and maintain upright stability of a 200-pound body. That all leads to five toes, several leg bones, and hips from the very beginning. An agile approach would ensure that we notice early on that the business doesn't really need a bipedal mammal, but a fish. The traditional approach results in a lot of wasted assumptions and potentially wasted work. The agile approach allows for the easy reuse of what can be kept from the skeleton (spine) and a refactoring of the other pieces (leg becomes fin, toe becomes tail).
That's a lot of metaphor, all to say that one of the requirements of agile development is the ability to picture work in thin vertical slices of functionality that deliver as much value to users as possible with as little commitment under the covers as necessary. That requires both a mindset and an architecture that will allow developers to quickly refactor components in the stack without having to deal with exorbitant dependencies. In an enterprise BI environment where source systems feed many systems, data warehouses have lots of application and direct user dependencies, and semantic reporting tools are tightly coupled to database objects, this ability to refactor requires a flexible architecture with clear boundaries between components. Examples that may be useful:
- Nothing but the job that loads a table should ever reference it directly. Always have a layer between physical database objects and the users or user applications, even if it's a layer of "select *" views.
- Only one job (or an integrated set of jobs) should load a given table. That job should have a versioned interface so that source systems don't all have to be enhanced when the target table changes.
- Each independent application should have an independent interface into the data (read: data mart, views, etc.).
- Refactoring involves moving logic between layers of the solution stack: promote something from a data mart down to an enterprise data warehouse when an opportunity for reuse is identified; demote something from enterprise data warehouse to data mart when it's clearly application specific. Make sure that however you build your solution, you can move things between layers easily.
- Have each layer interface with only the next layer above/below it. Don't allow the design to cross over abstraction boundaries (e.g. having a report directly access staging tables instead of pulling the data into the data warehouse and on up the chain to the report).
- Build as little code as necessary to get something from one abstraction layer to the next, even if that means a simple "select *" view rather than building a full ETL job with surrogate key management, SCD Type-2 logic, and data cleansing rules. But also make sure you've built an abstraction between the data warehouse and the report so that when you add all of those features to the data warehouse, you don't necessarily have to go update all of the reports that have been built.
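The first rule in the list (nothing but the load job references a table directly) can be sketched in miniature. This is a hypothetical SQLite illustration with invented table and view names, not any particular BI tool's mechanism.

```python
import sqlite3

# A "select *"-style view sits between the physical table and the
# report, so the table can be refactored without breaking the report.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE claim_fact (claim_id TEXT, amt REAL)")
db.execute("INSERT INTO claim_fact VALUES ('CLM-001', 100.0)")
db.execute("CREATE VIEW v_claim AS SELECT claim_id, amt AS amount FROM claim_fact")

def report_total(db):
    # The report only ever sees the view's interface.
    return db.execute("SELECT SUM(amount) FROM v_claim").fetchone()[0]

before = report_total(db)

# Refactor the physical layer: new table, new column name. Re-point the
# view and the report keeps working unchanged.
db.executescript("""
    CREATE TABLE claim_fact_v2 (claim_id TEXT, paid_amount REAL);
    INSERT INTO claim_fact_v2 SELECT claim_id, amt FROM claim_fact;
    DROP VIEW v_claim;
    CREATE VIEW v_claim AS
        SELECT claim_id, paid_amount AS amount FROM claim_fact_v2;
""")
assert report_total(db) == before  # report is insulated from the change
```

The view is the versioned interface; the table underneath is free to evolve.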
There are probably other good architectures to support this kind of agile sashimi for BI solutions. Remember to focus on the goal of being able to deliver as much value as possible to end users with as little effort as possible, in every release. That's what this agile lesson is about. You have to change how you think to get there, though. That will be the next post.
Monday, August 23, 2010
Business Keys
I'm engaged in a project that's actively using key concepts from Dan Linstedt's Data Vault methodology. There are lots of very powerful benefits that we seem to be realizing with this methodology, but this series of blog posts won't be particularly about use of the Data Vault. Still, I felt it was appropriate to credit the Data Vault for helping provide a structure for our own discovery.
This first post is about the struggle to identify keys for business entities. We set forth some fundamental principles when we started out on our latest large scale project. First and foremost, we would "throw nothing away". What that's meant is that we want to design the foundation of our reporting database to be a reflection not just of one department's truth, but all of the truths that might exist across the enterprise.
As a result, the design of every major entity has run into the same challenge: "what is the business key for this entity?" Well, from System A, it's the abc code. From System D it's the efg identifier. But if someone put in the xyz ID that the government assigns, then you can use System D to get to this industry file that we get updated every 3 months and link that back to an MS Access database that Ms. So-and-so maintains. Ack! Clearly we can't just say an apple is an apple. And clearly there's a data governance issue at play in this scenario also.
In one case, some of these are legacy systems that simply are what they are and aren't worth investing additional time and energy into.
Our data modeling challenge is to determine what the one business key would be for all the different instances of this one logical entity. When confronted with the challenge of having no clear business key, the project wanted to "just add a source system code and be done with it." I pushed hard against this approach for a couple of weeks, insisting that the team keep going back and working harder to dig up what could be a true business key. Eventually, I realized that I was both working contrary to one of the original goals I'd set forth and becoming the primary roadblock to progress.
Interesting side note: One of the better tricks of good software design is to defer decisions until the last possible minute. If you can get away without writing some piece of code, it's best to put it off until you have to write it. There's obviously some nuance and art to understanding how to leverage that. The Strategy Pattern is a good example, though.
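As a sketch of how the Strategy Pattern defers a decision, imagine (hypothetically) that the entity-matching rule itself is the decision we're putting off. The pipeline code never changes when the rule does; the record fields below echo the abc/efg/xyz examples above but are invented.

```python
# Minimal Strategy Pattern sketch: the matching rule is a pluggable
# function, so deciding *how* to match entities can be deferred.
def match_on_government_id(a, b):
    return bool(a.get("xyz_id")) and a.get("xyz_id") == b.get("xyz_id")

def match_on_source_key(a, b):
    return (a["source_system"], a["source_key"]) == (b["source_system"], b["source_key"])

def is_same_entity(a, b, strategy=match_on_source_key):
    # The calling code depends only on the strategy's interface.
    return strategy(a, b)

rec1 = {"source_system": "A", "source_key": "abc-1", "xyz_id": "G9"}
rec2 = {"source_system": "D", "source_key": "efg-7", "xyz_id": "G9"}
print(is_same_entity(rec1, rec2))                          # False
print(is_same_entity(rec1, rec2, match_on_government_id))  # True
```

Swapping strategies later costs nothing in the surrounding code, which is exactly the kind of deferred decision the side note describes.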
What I realized I was doing was trying to put a huge and potentially very dynamic piece of logic out in front of our need to simply capture the information that was being created by source systems. So, we instituted what felt like a completely counter-intuitive design standard: every business key would include source system code as part of the compound key; and we would defer the need to consolidate and deduplicate instances of an entity until after all the underlying data had first been captured.
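In miniature, that design standard might look like the sketch below: keys are always (source system, natural key) pairs, raw rows land first, and consolidation is a separate later pass. The records and the naive normalization rule are invented for illustration.

```python
from collections import defaultdict

# Every business key includes the source system code; deduplication is
# deferred until after all raw rows are captured. Data is hypothetical.
def business_key(record):
    return (record["source_system"], record["natural_key"])

raw = [
    {"source_system": "A", "natural_key": "abc-1", "name": "Acme Corp"},
    {"source_system": "D", "natural_key": "efg-7", "name": "ACME Corporation"},
]

# Capture phase: every source system's version of the truth is kept.
vault = {business_key(r): r for r in raw}
assert len(vault) == 2  # nothing thrown away

# Consolidation is a separate, later step over the captured rows
# (here, a deliberately naive name-normalization rule).
candidates = defaultdict(list)
for key, rec in vault.items():
    norm = rec["name"].lower().replace(" corporation", " corp")
    candidates[norm].append(key)
print(dict(candidates))  # {'acme corp': [('A', 'abc-1'), ('D', 'efg-7')]}
```

If the matching rule turns out to be wrong, only the consolidation step changes; the captured raw rows are untouched.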
Deduplicating is the immediate next process, but this allows us to be sure that we've captured all the raw information from the source systems first before throwing away the fact that there are different versions of the truth in different source systems.
A very powerful lesson for us that felt very counter-intuitive; that we started considering for the wrong reasons; and finally decided to follow through on for all the right reasons!
Thursday, May 27, 2010
Reading List
Last week, I got to hear John Ladley present on Master Data Management. I'm excited to get his new book, Enterprise Information Management, which publishes tomorrow! I followed up with an email conversation with him and have suddenly increased my reading list by about 10 new books he suggested I pick up and read -- and I thought my shelf was pretty well stocked!
On Tuesday, I walked into my office to find a present for me on my desk. It was Business Intelligence, a newly published book by local UMSL professor Rajiv Sabherwal. I know Rajiv through my membership on the UMSL IS Board of Advisors, and it turns out that his neighbor works with me, too.
Speaking of books, I carry around a printed-out draft of my PragProWriMo book from last November. It's a good reminder about the importance of communication. I originally had the goal of revising a couple of chapters and submitting them in January, but I've adjusted that to revising during PragProWriMo '11 and submitting after that.
Friday, April 30, 2010
Evolution of Business Communication
Steps in corporate culture transformation:
- Couriering around printed documents
- Faxing documents to each other
- Emailing documents as attachments
- Emailing links to documents on SharePoint
- Emailing links to enterprise wiki articles
- Collaborating in the article and being notified of changes via email
- Collaborating in the article and seeing RSS updates in the enterprise portal
Also somewhere in there is the crazy scenario where I get emailed a PDF that was clearly created by someone who scanned a printed copy of a PowerPoint presentation. I guess worse yet would be a PDF created by scanning a printed copy of a wiki article.
Saturday, April 24, 2010
Complexity = Agile Simplicity
I'm a major advocate of traditional simplicity principles like KISS, YAGNI, DRY, the Unix Philosophy, etc. I'm also very interested in complexity and solving large complex problems. It occurred to me during an all day "fix this process problem" meeting today that the ideas of complexity and simplicity aren't actually antonyms. Sure, by definition, they are, but I think the paradox in that comparison is that perhaps complexity can be defined as simplicity that changes over time.
My perception of complex problems is that given the examples of any one particular moment in time, the situation can be easily dissected, analyzed, documented, and understood. Take a sample from the next moment in time and the same is true. Try to combine that collection of understandings into a generalization that can be applied to other past and future points in time and the problem suddenly becomes complex.
One of the observations we get from agile development is that solutions will be more correct when given the flexibility to adjust to customer needs rather than being held to a previously defined, historical specification. Agility is the ability to adjust to change.
Therefore, complex problems should be solvable by solutions that are simple and agile. Solutions do not have to be complex.
We run into challenges designing agile solutions, though. Many traditional solution design tools often call for static process flow diagrams, swim lane control charts, concrete data models, class diagrams, etc. I think that some of the behavioral object oriented patterns give us some clues on how to introduce agility into solutions, but understanding when to apply those (and how to apply them within some technologies) takes creativity and experience.
I think that introducing that same kind of solution agility into human processes is also very challenging. Often, we want clearly defined instructions and flow charts to tell individuals exactly what to do, the only variation being a Mad Libs-style fill-in-the-blank. Perhaps we need more "and then a miracle occurs" steps in our complex processes. And perhaps that's both acceptable and desirable in some processes.
Sunday, April 18, 2010
Digging Holes
The following parable is adapted from one that I was forwarded by a friend this week... It seemed a good analogy for a poor IT Service Management philosophy.
Two IT directors changed jobs and were working for the city public works department. One would dig a hole and the other would follow behind her and fill the hole in. They worked up one side of the street, then down the other, then moved on to the next street, working furiously all day without rest, one director digging a hole, the other director filling it in again.
An onlooker was amazed at their hard work, but couldn't understand what they were doing. So he asked the hole digger, "I'm impressed by the effort you two are putting into your work, but I don't get it -- why do you dig a hole, only to have your partner follow behind and fill it up again?"
The hole digger wiped her brow and sighed, "Well, I suppose it probably looks odd because we're normally a three-person team. But today the guy who plants the trees called in sick."
Sunday, April 11, 2010
Personal Best

Today's run... The Go! St. Louis Half Marathon
My first official race!
Distance: 13.1 miles
Time: 2:34
Place: 7,520th
It was awesome!
Saturday, March 20, 2010
What's the gap?
At the office, we're working on a strategy around how we maintain and publish various types of technical information and instructions. For instance, there's been a big emphasis around transition of system maintenance and support responsibility from project/implementation teams to support teams. What kind of documentation is required? Where should that documentation be stored? What format? Who should create it? etc.
One of the big cultural battles has been MS Office documents on MS SharePoint versus MediaWiki. It'll be obvious, but just to lay it out up front: I stand firmly on the MediaWiki side of this discussion.
On the Office/Sharepoint side, you have an argument that "everyone knows how to use MS Office applications; cut and paste of screenshots is easy; you don't have to know how to program the wiki."
On the MediaWiki side, you have an argument that "SharePoint organization doesn't make any sense; searching across different sites is awkward; you always have to open a separate document in a separate application to see what you really want to see; and it just doesn't feel webby enough."
This argument from the SharePoint side that you have to program the wiki is the one their leadership is most adamant about. "Our analysts aren't programmers," they say. We're using a WYSIWYG editor! Is a little wikitext markup really programming? I think the ability to pick up a little wikitext is just like the ability to learn how to use formulas in MS Excel... and if you can't write a simple SUM() or =A1+B1 in MS Excel, then you don't have any business being in an IS job... or really any business support job, for that matter.
Perhaps that's a strong statement, but I think that any IS person should feel comfortable picking up a little HTML or wikitext markup. I often hold up my wife, an English major / office worker / writer, as an example of "if she can do it, then an IS person should be able to do it!" But perhaps I'm looking at the wrong set of characteristics.
Maybe the gap is a generational/cultural one rather than an educational/cultural one. There's probably a more articulate way to describe that, but what I'm getting at is that my wife is also comfortable with blogging, Facebook, and the online / social community in general. I wonder what percentage of people in the SharePoint camp are regular contributors to blogs, Facebook, Twitter, or other social networks?
What is the gap between someone who thinks the business world is a collection of MS Word documents and someone who thinks the world is a more directly accessible, public, and transparent collection of content? And how do we get people across that gap?
One of the big cultural battles has been over MS Office documents and MS SharePoint versus MediaWiki. It'll be obvious, but just to lay it out up front: I stand firmly on the MediaWiki side of this discussion.
On the Office/SharePoint side, you have an argument that "everyone knows how to use MS Office applications; cut and paste of screenshots is easy; you don't have to know how to program the wiki."
On the MediaWiki side, you have an argument that "SharePoint organization doesn't make any sense; searching across different sites is awkward; you always have to open a separate document in a separate application to see what you really want to see; and it just doesn't feel webby enough."
This argument from the SharePoint side that you have to program the wiki is the one their leadership is most adamant about. "Our analysts aren't programmers," they say. We're using a WYSIWYG editor! Is a little wikitext markup really programming? I think the ability to pick up a little wikitext is just like the ability to learn how to use formulas in MS Excel... and if you can't write a simple SUM() or =A1+B1 in MS Excel, then you don't have any business being in an IS job... or really any business support job for that matter.
Perhaps that's a strong statement, but I think that any IS person should feel comfortable picking up a little HTML or wikitext markup. I often hold up my wife, an English major / office worker / writer, as an example of "if she can do it, then an IS person should be able to do it!" But perhaps I'm looking at the wrong set of characteristics.
Maybe the gap is a generational/cultural one rather than an educational one. There's probably a more articulate way to describe that, but what I'm getting at is that my wife is also comfortable with blogging, Facebook, and the online/social community in general. I wonder what percentage of people in the SharePoint camp are regular contributors to blogs, Facebook, Twitter, or other social networks?
What is the gap between someone who thinks the business world is a collection of MS Word documents and someone who thinks the world is a more directly accessible, public, and transparent collection of content? And how do we get people across that gap?
Monday, February 8, 2010
Living on the Edge
I've been working for a while on explaining the value and importance of Enterprise Information Management. While I usually get some good ideas from Wikipedia, the article there today has let me down. As referenced in the Wikipedia article, Gartner and Forrester have some valuable things to say on the topic. But I believe that it can really be boiled down to a simple and practical explanation, given one assumption: in a business organization with optimal management of information, no individual business unit will appear to be fully optimized even though the overall organization is optimized. Here's why:
Imagine the information flow through the business units of an organization as a connected network of nodes. Each edge in the graph represents some type of process (automated or manual) that moves information from one department to another. The Information Supply Chain model describes how each of those edges has a cost associated with it. Each edge, to be worthwhile, must also create some added value (either through reduced effort via automation or added meaning). The cost and benefit sides of those edges, though, don't typically come from or contribute to the same departmental bottom line. Typically it is a matter of the originating business unit paying the cost of additional data collection or manipulation so that a receiving business unit can benefit.
In the VERY simple diagram above, the argument is obvious. The Admitting department in a hospital has the purpose of collecting information from a patient that other departments will need in order to do their job effectively. Admitting collects patient information (e.g. contact information, primary care physician, insurance information) once so that other departments can all benefit from it. Surgery uses the information collected during Admitting to retrieve the patient's medical record and orders. Billing uses the same patient information plus the additional information about what procedures were performed by Surgery to create invoices to payers (who will use similar information to try to avoid paying the bills).
It would be inefficient if the Surgery department had to collect from you the information it needs to find your medical record and orders, then have your surgical procedure followed immediately by a visit from the billing department to collect the same information about you so that they could proceed with coding and billing processes. This clearly does happen sometimes: occasionally with good cause, but often because of redundancies between systems, and sometimes because no one thinks to collect a piece of information up front. For instance, Admitting may not have any reason to ask "do you have any allergies?" because that isn't necessary to complete their assignment of "log that the patient arrived and notify surgery." So, Surgery has to ask the additional questions that are important to it: "have you eaten?", "do you have any allergies?", etc. With some of those questions, significant time and safety risks could be avoided if they are asked as early in the encounter as possible.
So, it seems to me that Enterprise Information Management is most importantly about the management of the edges of that graph and has a direct impact on the efficiency of the Information Supply Chain. Those edges between the nodes of the diagram aren't merely straight lines that magically move information from one business process to another. They are interfaces and systems and business processes that cost significant time, money, and risk to quality.
Why can't we expect each business unit to simply do what will result in an optimal collection and movement of information? Not because they're maliciously selfish about their time or resources, but because individual business units don't usually have the perspective to fully understand the downstream value of their own operations. Enterprise Information Management is the work that lives between and outside of individual business units and drives overall optimization of the edges between them. Business units can be counted on to optimize their own internal operations; Enterprise Information Management has to be planned and managed explicitly, above and beyond departmental objectives.
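The graph model above can be sketched in code. This is a minimal illustration, not a real model: the departments match the hospital example, but the cost and benefit numbers are made up, and the point is only to show how local and enterprise-wide views of the same edges diverge.

```python
# Each edge moves information from one department to another.
# The originating department pays the cost; the receiving department
# gets the benefit. All numbers here are hypothetical.
edges = [
    # (payer, receiver, cost to payer, benefit to receiver)
    ("Admitting", "Surgery", 5, 20),  # intake questions speed up surgery prep
    ("Admitting", "Billing", 3, 15),  # demographics/insurance reused for invoicing
    ("Surgery",   "Billing", 4, 25),  # procedure records drive coding and billing
]

def departmental_view(dept):
    """Net value as one department's bottom line sees it:
    benefits it receives minus costs it pays."""
    paid = sum(cost for src, _, cost, _ in edges if src == dept)
    received = sum(benefit for _, dst, _, benefit in edges if dst == dept)
    return received - paid

def enterprise_view():
    """Net value across the whole organization: total benefit minus total cost."""
    return sum(benefit - cost for _, _, cost, benefit in edges)

for dept in ("Admitting", "Surgery", "Billing"):
    print(dept, departmental_view(dept))
print("Enterprise", enterprise_view())
```

With these numbers, Admitting's local view is negative (it pays for data collection that only helps others), while the enterprise total is strongly positive. That is the assumption stated above: in an organization with optimal information management, no individual business unit will appear fully optimized even though the whole is.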
Thursday, February 4, 2010
How Not to Clean Data
PREFACE: Even if you don't have some familiarity with the inside of a hard drive, you'll probably still be troubled by this story from my past. To answer the obvious questions that you'll have after the story: Yes, I really do have a legitimate degree in Electrical Engineering. No, don't worry, I've never been employed in a way that uses that degree in a significant way to design or build any products that you might own.
I once had an old hard drive that started making a bit of a racket. It didn't stop working right away, but I was concerned that there was something wrong with it. I thought that maybe I could figure out what was wrong if I opened it up and poked around inside. At that point I'd never seen the inside of a hard drive in real life. So, I invested in my first set of star-point screwdrivers and carefully disassembled the case of the hard drive. Even after the screws were out, the metal cover stuck a bit. It seemed like there was some kind of seal holding it closed, so I used a flathead screwdriver to pry it open.
Wow. Shiny. Really clean.
I plugged the hard drive back in, while both the computer case and the hard drive case were open, and booted up the computer. Cool. It spins! I watched it spin up, and the computer boot. Everything working great. That little moving arm is really neat, too. It bounces back and forth really quickly! So, I ran some programs on the computer and started listening for the noises that I thought were signifying an imminent disaster. The hard drive just sounded rough. Something like ball bearings worn down or the spindle just getting sticky. Logically, I got out my WD-40, with the little red straw to make sure I could target the center of the spindle.
Squirt.... Squirt... drip, drip.
Well, let's see if this works for a while. Maybe that was enough to quiet the drive down.
Things working fine. Then the head did a seek and ran right through a drip of WD-40 and smeared across the platter. Is that bad? Then the actuator arm started thrashing back and forth, clicking hard against the center of the spindle and back against the outer wall of the case. Clunk. Clunk. CLUNK. Whirrrrr..rr...r.... Quiet. Computer locked up. Hard drive stopped.
Uh oh...
Maybe if I clean that WD-40 off of the platter it will work again?
So, I got out my trusty Goo Gone and a soft rag to remove the extra drips of WD-40 that were now smeared across the top platter of the hard drive. Rub, rub. Wipe. Rub, rub. Polish. That looks pretty good. Let's spin it back up and see. W....h...i.rrrrrrrrrrr. OK, that sounds pretty.... CLUNK. Clunk. Clunk. Clunk. Unplug the computer.
I worked on this for a couple of hours. I used more Goo Gone. I used alcohol -- both the rubbing kind to clean the platter and the drinking kind to calm my frustration. In the end, I was able to get the drive spun up long enough to retrieve some files. This was still an age when most of my working documents were on floppy disks, because I needed to carry those between different computers. So, luckily, there was no important data lost.
I've been thinking a lot lately about how "broken" business processes impact data quality and data integrity -- thinking about the ways we try to keep the data inside those disks clean and running smoothly. Sometimes we look at things from a perspective that is too distant, with too limited an understanding of the context of the processes we're examining, and we act too quickly and too inexpertly, without taking the time to understand the nuances of the systems and business processes involved. We do things that we think will help (implement governance processes and quality screens) and end up sending the system into a tailspin. Things do recover from that dive, but not without a major investment in time and energy.