Friday, November 2, 2012

IT Strategy by Tinker Bell

In our house, Fridays are "Pizza and a Movie Night."  Tonight, my girls chose the 2008 Tinker Bell movie.  Spoiler Alert!  The plot goes like this: Tinker Bell discovers that her fairy talent is tinkering - building the tools that other fairies use to do their work and change the seasons on the mainland.  Tinker Bell struggles with accepting her talent for a while.  She rejects her own talent and tries to learn the other fairies' talents instead.  Eventually, she discovers that she's really a very innovative tinker fairy.  Her lack of faith in her own talent puts the arrival of Spring at risk.  Then, by embracing her talent and flair for innovation, she's able to lead the other fairies with her inventions and save Spring!  As a reward, she also discovers a way that tinker fairies can perform a valuable service by returning lost things to children on the mainland during the changing seasons.

Hopefully you already see the comparison between tinker fairies and typical IT co-workers.  The tinker fairies are an invaluable support team that works behind the scenes to make sure all the other talent fairies can do their work effectively.  They deliver a great service, but they rarely drive real change in how the other fairies work... until Tinker Bell comes along, that is.  She spends time understanding what the other fairies do and what their challenges are, and applies her inherent tinker talent to fundamentally change how they do their work.  Wow!  That's an innovative IT co-worker!!

At the same time the kids were watching Tinker Bell, I was reading Business / IT Fusion by Peter Hinssen.  I can't speak for Peter, but I think Tink would make a great IT 2.0 co-worker.  She's got the talent to see how her tools and materials can come together to solve problems, and she has a great ability to understand the challenges confronting the fairies' business.  Her innovations don't just make the existing processes more efficient; they drive them to be altogether different processes.

My recommendation:
Read Business / IT Fusion.
Watch Tinker Bell.
Don't tell anyone that you got your IT strategy from a Disney movie!


Wednesday, July 11, 2012

MapReduce Programming

I'm just starting to get into working with Hadoop and MapReduce and surprised myself with how quickly I was able to put together a real first MapReduce program.  In about four hours of programming today, I built a program that takes as input a log of events for different subjects and computes a frequency table for the time interval between consecutive events tied to the same subject.

That is, what are the typical intervals between events for the same subject?  Is there a natural point at which activity tapers off and is sure not to return for an extended period of time?

The idea is to be able to identify how long the minimum gap is between two separate experiences or interactions.

Here's my MapReduce implementation (a rough sketch in code follows the list):

  1. Map the incoming data, partition by subject and do a secondary sort on timestamp.
  2. Reduce that down by looping through the series of timestamps, computing the interval between the current timestamp and the previous one, and counting up the occurrences of each interval (in seconds).
  3. Have the output keyed by interval.
  4. Reduce that again, summing up the separate counts by subject for a given interval into a single count of occurrences for each interval, across all subjects.
  5. Map that all into a single partition and write out the data in a meaningful way.
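
To make those steps concrete, here's a minimal sketch of the first job in plain Hadoop Java.  This is not the code I actually wrote: it assumes tab-separated input lines of subject and epoch-second timestamp, uses stock Text/LongWritable types instead of Protocol Buffers, sorts each subject's timestamps in memory in the reducer rather than with a true secondary sort, and simply emits a 1 per interval instead of pre-counting per subject.

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class EventIntervals {

        // Step 1: emit (subject, timestamp) for each "subject<TAB>epochSeconds" line.
        public static class EventMapper
                extends Mapper<LongWritable, Text, Text, LongWritable> {
            @Override
            protected void map(LongWritable offset, Text line, Context ctx)
                    throws IOException, InterruptedException {
                String[] fields = line.toString().split("\t");
                ctx.write(new Text(fields[0]),
                          new LongWritable(Long.parseLong(fields[1])));
            }
        }

        // Step 2: for each subject, sort its timestamps and emit
        // (intervalInSeconds, 1) for every pair of consecutive events.
        public static class IntervalReducer
                extends Reducer<Text, LongWritable, LongWritable, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            @Override
            protected void reduce(Text subject, Iterable<LongWritable> timestamps,
                                  Context ctx)
                    throws IOException, InterruptedException {
                List<Long> sorted = new ArrayList<>();
                for (LongWritable t : timestamps) {
                    sorted.add(t.get());
                }
                Collections.sort(sorted);
                for (int i = 1; i < sorted.size(); i++) {
                    ctx.write(new LongWritable(sorted.get(i) - sorted.get(i - 1)), ONE);
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "event intervals");
            job.setJarByClass(EventIntervals.class);
            job.setMapperClass(EventMapper.class);
            job.setReducerClass(IntervalReducer.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(LongWritable.class);
            job.setOutputKeyClass(LongWritable.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

A second job keyed on interval (steps 3 and 4) then totals those ones into the final frequency table - Hadoop's stock LongSumReducer is enough for that - and running it with a single reducer gives the one-partition output from step 5.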

With some existing template code, I cranked that out using MapReduce and Protocol Buffers, figured out some Maven dependencies, set up my environment to support Protocol Buffers, and built and tested the code.  Not bad for a half-day of work.  I credit my iced mocha from Starbucks, and maybe the fact that I was sitting at Starbucks for much of that time.

Now on to something more sophisticated: machine learning.



Saturday, July 7, 2012

On Executive Sponsorship

My main priority right now is the establishment of data stewardship groups. On the one hand, my gut tells me that these will not be effective if they are built up by mid-level functional leaders rather than through executive mandate. On the other hand, executive sponsorship sometimes gets things accomplished rapidly.

What's the right balance? Here are my rules of thumb.

1. Never say "because such-and-such a VP said this is a priority" or, worse yet, "because so-and-so's bonus is riding on this." I actually heard that once!

2. Always include the people who will be doing the work in the decision-making process. It'll lead to better long-term success.

3. Make sure the people doing the work know what old habits or old processes can (and need to) stop. If stewardship is just more work, then it won't create the efficiencies it is predicated on.

What do you think?

Friday, June 29, 2012

Big Data

It's taken me a while to get into the idea of "big data" as something special. I've been working on largish structured data sets for a while, and I think getting valuable and reliable insights out of that data is challenging enough. My first thought was "now I'm supposed to try to extract something meaningful out of a mishmash of inconsistent, sparsely populated, questionably reliable data?"

Some things I've read over the past few months have helped me get over that hump, so I wanted to share them.

First was "The Information" by James Gleick. He's my new favorite author. This was a great history of information theory.

Second was "Thinking, Fast and Slow" by Daniel Kahneman. He's a Nobel Prize winner and proponent of behavioral economics. Fascinating read.

Last was "Chaos" by James Gleick about the origins and founding of chaos as a mathematical science. Another amazing read.

These three books so close together helped me get a much stronger intuition for the power and meaning of insights that can come from big data and machine learning techniques. If you're on the fence about the usefulness of big data, start reading.

Thursday, May 17, 2012

Namespace Collision

I always enjoy a good joke, especially when it's related to work.  This one is a bit edgy, so I hope you'll forgive the nature of the content.

3M has a product called the Healthcare Data Dictionary that it uses in many of its products as a way of mapping and relating concepts from different domains or standards.  They've made this available for use online for free and are open-sourcing it as well.  Since I'm in healthcare, I thought I'd do a little searching around just for fun.  One of the entries I stumbled across was "failure of orgasm."  Oh, there are things much more risque, trust me!  I thought it was quite funny, though, when the 3M HDD told me that related to this condition was "Parent - Reactions - has child."  I'll leave it to you to interpret the direction of the implied causality and likelihood of actuality.



Clearly I took some liberty in misinterpreting the layout of the results, but the collision between the healthcare namespace (which has you thinking about sex and topics related to sex) and the relational namespace (where parents and children represent a hierarchy) makes for a fun joke.

Thursday, April 19, 2012

The Information

I'm reading a book right now called The Information, by James Gleick. It's an incredible history and exposition on the origins of thinking about and articulating information. One of my favorite quotes (so far) suggests that information itself is a new kind of entity - not matter, not energy.

Given my current interest in data governance and the concept of "data as a corporate asset," I've latched on to this concept closely. Information is a thing, but it's neither a capital nor an operating expense. It's a different kind of thing altogether, but measurable and worthy of management nonetheless.

Stay tuned for more reflections on this book while I'm reading the rest of it.

Thursday, November 4, 2010

TDWI St. Louis Chapter

For anyone who will be in the St. Louis area on November 12th, the St. Louis Chapter of The Data Warehousing Institute will be holding its quarterly meeting.  I expect this meeting will contain some great content from renowned speakers Neil Raden and Krish Krishnan.

For more information and to register, see the TDWI STL page.

Meetings are free and open to the public.  You need not be a TDWI member to attend.

Hope to see you there!

Tuesday, November 2, 2010

Information Portfolio Components

My model for the Information Portfolio includes four components:
  • People
  • Applications
  • Processes
  • Data



People represent the individuals or teams that use data to execute business processes.

Applications are either traditional end-user applications or integration solutions that move data through a process, between processes, or to/from people.

Processes are the business activities that leverage data to fulfill the operational objectives of the business.

Data is the cornerstone of the Information Portfolio.  It is the stuff that moves through a process, between people and applications, to act as fuel in the execution of business services.

The Information Portfolio is a knowledge base or collection of metadata that links together these four concepts in meaningful ways that transform bare data, through context and meaning, into information that can be used in the delivery of business services. [Wisdom Hierarchy]
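
As a thumbnail of how those links might be represented, here's a hypothetical sketch in a few lines of Java.  Every name in it is illustrative only - it's a sketch of the linkage idea, not a prescription for how to implement the knowledge base.

    import java.util.List;

    // Hypothetical illustration: a business process links the people who
    // execute it, the applications that support it, and the data that fuels it.
    record Person(String name) {}
    record Application(String name) {}
    record DataAsset(String name, String definition) {}
    record BusinessProcess(String name,
                           List<Person> people,
                           List<Application> applications,
                           List<DataAsset> data) {}

    public class InformationPortfolio {
        public static void main(String[] args) {
            BusinessProcess credentialing = new BusinessProcess(
                    "Physician Credentialing",
                    List.of(new Person("Credentialing Analyst")),
                    List.of(new Application("Credentialing System")),
                    List.of(new DataAsset("Physician",
                            "A licensed provider of medical care")));
            System.out.println(credentialing);
        }
    }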

More on each of these components and how they relate to each other in the Information Portfolio as the month continues...

    Day 1 - PragProWriMo / NaNoWriMo

    It's November again, which means that it's time to spend some focus time writing. After a little bit of a slow start with day one of writing, I'm at 1,292 words. The pace that I try to set is 2,000 words per day, which leaves a chance to take one day off each week and still hit the NaNoWriMo target for 50,000 words. The official goal for PragProWriMo is 2 pages per day, which would be well under a 2,000 word/day goal.

    I've been distracted and not really planning for November, unlike last year. Last year, I had an outline developed and some sketches down on paper before I started writing The Practical Data Warehouse. This year, I spent the evening watching TV, commented to my wife that I really didn't have any clue what to write... then took her advice and just started writing. This year's concoction:

    The Enterprise Information Portfolio
    A Model for the People, Applications, Processes, and Data of an Organization


    As November rolls on, I'll be posting snippets from the text. Good luck to all the other NaNoWriMo and PragProWriMo authors out there!!

    Monday, September 6, 2010

    What's your cathedral?

    There's a classic story about understanding the purpose of work: The Story of the Three Stone Cutters
    Once, there were three stone cutters working together on a job.  A stranger came upon the first stone cutter and asked, "What is it that you're doing?"

    "Cutting this stone into a perfect square block," answered the first stone cutter, continuing to focus with great care and precision.

    The stranger moved on, leaving the first stone cutter to his craft. He came upon the second stone cutter, who was also working diligently on his pile of stones. "What are you doing?" asked the stranger, interested to see how this stone cutter would respond.

    The second stone cutter stopped to look at the stranger and engage in the conversation. "I'm working here on this job to provide for my family. I have a loving wife and two wonderful children. I work hard here to make sure we have what we need and can take time to enjoy each other." The second stone cutter reached out a friendly hand. "I'm Alexander."

    The stranger introduced himself and shared a brief story with the stone cutter about his own family. They said good bye, and the stranger moved on to another area of the project.

    Around the outer edge of the site, the stranger saw a third stone cutter who was squatting behind a large carved stone, but staring toward the horizon. The stranger approached this stone cutter and asked, "Pardon me. May I ask what you're working on?"

    "I'm building a cathedral," responded the third stone cutter without breaking his gaze toward the horizon, and into the future. "It's going to the new home for a parish that is renowned for it's financial generosity and support of the surrounding community. There will be a soup kitchen on the main level, a community garden in the courtyard, and offices for individual and family therapists. In three years, they expect to be providing services to over two thousand people a day."

    The stranger stared into the distance, picturing the bustling crowds and smiling faces of the volunteers. "Thank you for sharing that vision," said the stranger.

    Every time that I've heard this story in the past, I've identified clearly with the third stone cutter - I need to know what kind of structure I'm building and why we're doing it.  I struggle to find motivation unless I really understand the mission.

    And, I've always felt sorry for the first two stone cutters.  I mean, the second one has a noble purpose and all.  The first one doesn't need to know his greater purpose to be a good worker.  The third one, though: he's the enlightened one!

    Then, I started thinking about one of the last team building / motivational leadership meetings I was at.  We spent a long time talking about how to help other co-workers in IT connect to the fact that we are a Catholic health care service provider.  Our mission and purpose is all about providing care for patients, with a deference for those who are on the edges of society or in the greatest need.  A great purpose for the organization.  As IT leaders, we spend quite a bit of time working to help our staff understand how their day-to-day work of supporting servers and applications helps someone do their job in Finance; and how that helps someone do their job in Medical Records; and how that helps someone do their job in Credentialing; which helps us make sure that our physicians are qualified; which helps us make sure that patients are safe.  In the worst of cases, an IT co-worker can feel six or seven times removed from the purpose of the organization.  (I recently took a Gallup "strengths" survey and learned that "connectedness" is one of my strengths, so I guess I don't really struggle with this too much.)

    Still, I recently started to wonder if all of the leaders in that training were coming at things from the perspective that the third stone cutter is the only one who's really got it right, and that we're all supposed to strive to see the same metaphorical cathedral.

    Recently, I've found at least as much satisfaction at work focusing on the quality of the stones that I'm carving right now.  I've got a larger purpose in mind, though it isn't really the patient care we provide. As I contemplate how to push data management principles forward, my justification stops with "this will make our organization smarter."  Of course all kinds of great benefits will come from that, including improved patient care, fiscal responsibility, innovative service models, an end to world hunger, and peace for all.  Right now, though, it is satisfaction enough to have the purpose of making people better, smarter decision makers.

    So, if my cathedral isn't better health care for our patients, then is it a stretch to think that the first stone cutter's cathedral is that one beautiful brick and pride in his craftsmanship, and the second stone cutter's cathedral is the quality of life that he's providing for his family and for himself?

    What's your cathedral look like?  Is it right in front of your eyes, or off in the distance?

    Tuesday, August 31, 2010

    Spare Some Change?

    There's a classic joke about the difference between the IT person and the developer.  Here's my version:
    The boss comes into the IT guy's office and says "Hey, Joe, we've got a new line of business starting up and we really need to be able to do this new thing, X."  

    Joe simply says, "Sorry, Boss, that can't be done."  

    So, the boss goes down the hall to the development manager and says "Hey, Nancy, we've got a new line of business starting up and we really need to be able to do this new thing, X."  

    Nancy says, "Absolutely, Boss.  We can do anything.  It'll take us six months to plan, two years to develop, and another several months of user acceptance testing."  

    The boss goes back to his office and begins writing his resignation.

    The best quote I've ever heard about change isn't the old adage that "the only thing constant is change."  I'm a futurist at heart, so there's no deep insight for me in the constancy of change.  The best quote I've heard about change is that "change is great because every time something changes it means you're one step closer to getting it right."  That's a bit of paraphrasing and it assumes that we're primarily concerned with "good" change, of course.  The point stands: if you believe that change is a good thing, then it's natural to embrace it rather than fight it.

    Classic waterfall methodologies focus on defining specifications ahead of implementation so that the risk of change can be avoided.  That level of contractual thinking drives unnecessary conflict, especially in business intelligence projects.  One of the things we've learned through experience is that we can't know what we don't know.  Naturally, the needs being addressed by a business intelligence solution will change over time as more insight is delivered to decision makers.  If we already knew what the outcome was, then there wouldn't be a need for the project.  Business intelligence is about discovery.

    Two of the core values from the Agile Manifesto are "customer collaboration over contract negotiation" and "responding to change over following a plan."

    I've worked with a number of teams that grow increasingly frustrated over changes that end users make in business logic.  ETL developers sometimes get so fed up with change and being forced to rewrite code over and over that they feel they should stop writing any code until "the users decide what they want!"  Those developers haven't recognized that every time they change their code, they're enabling the users to understand what it is that they want.  They're facilitators, not just implementers.

    So, agile BI is at least as natural a fit as agile application development.  Probably even more so.  For BI developers to be agile, though, they have to embrace change.  They have to facilitate rather than resist change.

    Friday, August 27, 2010

    Your Job? My Job? Our Job!

    I've been trying to figure out agile data warehousing for several years now.  I'm a computer scientist by training and a programmer by hobby, so I've always kept my eye on trends in traditional software development.  What I tell myself, professionally, is that it helps me have alternative perspectives on BI solutions.  (It's really just 'cause I like programming, even if what I hack together isn't typically that elegant.)

    Several years ago, I was introduced to one of the founders of the St. Louis XP (Extreme Programming) group, Brian Button, and decided to sit down and have lunch with him.  I explained what kind of work we do to build data warehouses, and he listened very politely.  At the time, he was thinking mostly about test driven development and pair programming.  One of the things he asked me was "can you get your data modeler, ETL developer, and report developer all in a room together working on the same components all at once?"  It occurred to me, then, that separation of development responsibilities might be a serious impediment to agile BI development.

    As a former consultant, I've personally done a little bit of everything.  I can data model well enough to build something workable; I've spent a lot of time writing ETL both by hand and with GUI tools; I'm a raw SQL hacker for sure; and I can even create a reasonable report or two that some VP wouldn't just use as a coaster.  How often have I ever asked my staff to do that breadth of work, though?  In larger organizations, I usually see and hear about separation of teams: data modelers, ETL developers, reporting people.  They're separate teams.  That's always been under the guise of "developing technical expertise" within the team and driving consistency across projects.  (Important goals, for sure.)

    However, when I look at successful agile software teams that I know about, that same level of separation isn't typically present.  A single developer might do part of the UI, some of the dependent service modules, and the persistence layer.  They're focused on delivering a particular function - not some component of the overall application, but an external function of the application.  This goes back to the previous conversation about sashimi, too [1] [2].

    Of course there are some developers who are better at UI and others better at ORM, just like there are some BI folks better at data modeling and others better at data presentation.  Enabling more agile development, though, requires developers who are more willing and able to cross over those traditional boundaries in the work that they do.  One of the leaders I work with today articulates this very well when he says that we just want "developers."  What this does for agile is that it minimizes context switching and the spin between different developers working on the interrelated but different pieces of the same component.  If an ETL developer decides she needs five extra fields on a table because she just learned that some previous assumption about the cardinality of a relationship was flawed, should that change require the synchronous work of:

    Step  Modeler                 ETL Developer         Report Developer
    1     Changes data model      wait/other work       wait/other work
    2     Deploys changes to DB   wait/other work       wait/other work
          -- hand off --
    3     wait/other work         Imports new metadata  wait/other work
    4     wait/other work         Updates ETL jobs      wait/other work
    5     wait/other work         Unit tests ETL        wait/other work
          -- hand off --
    6     wait/other work         wait/other work       Runs test queries
    7     wait/other work         wait/other work       Updates semantic layer
    8     wait/other work         wait/other work       Updates reports
    N     ...and loop back through for every change

    There's a lot of opportunity for optimization there if one person is working on the tasks instead of several people. For about 66% of the team's time, they're working on something other than this one objective, and there's latency in the hand-offs between developers. (If you can create a parallel scheduling algorithm that gets the overall workload done faster, all things equal, than "one resource dedicated to completing all the steps in implementing each particular piece of functionality" plus "helping each other out when there's nothing in the queue," please let me and every computer science professor on earth know.)

    I think that for some teams this will be a challenge to their skill set and require developers to grow beyond their existing comfort zone.  I'll argue that they'll be the better for it!  For some teams it might be more of a challenge to ego than to actual skills: "you're going to let anyone data model?!"

    The answer is "yes" and "we require an appropriate level of quality from everyone."  That's why agile teams are ones with pair programming, peer reviews, and an approach that not just accepts but welcomes change.

    For a team that isn't agile today, these things can't come piecemeal.  If you want to be agile, you have to go all-in.

    Thursday, August 26, 2010

    Growing a Tree versus Building a House

    When we say that we want to build "good" software, we tend to use terms that come from other engineering fields: foundation, framework, scaffolding, architecture.  One of the things that the agile software movement has shown us is that good solutions can come from evolutionary models as well as construction models.  The difference comes from the fact that code is far easier to manipulate than physical raw materials.

    When building a data warehouse, we often draw traditional, stacked-tier pictures of the data architecture: data warehouse tables, semantic layer, data marts, etc.  If we start our design discussions with an assumption that anything we "build on top of" has to be "solid" then we quickly drive the overall solution away from agility.  "Solid" conjures an image of a concrete foundation that has to be built to withstand floods and earthquakes.  If we find a crack in our foundation, it has to be patched so the things on top don't come crumbling down.

    If, instead, we try to imagine a conceptual architecture that has in mind goals of adaptability to purpose (rather than firmness) and loose coupling (rather than high contact), you can begin to imagine a higher level of agility.  Look at the picture from the start of this post (from webecoist).  The trees are being shaped and molded into a purpose-built structure.  If, part-way through the growth process, the structure needed to change to be another six inches higher or hold a second story of some kind, the necessary changes could be interwoven into the growth already complete.  If we were constructing a new art museum and decided, halfway through, that we wanted a library instead, we'd have to make some major changes or compromises to account for the fact that the foundation was only designed to hold the weight of portraits, not stacks of books.

    This conceptual discussion about growing something organically rather than building it from the ground up is directly related to the sashimi discussion from yesterday.  A legacy build approach says: model the data, build the ETL, build the semantic layer, build the reports.  There aren't any opportunities in that model to create meaningful yet consumable vertical slices.

    I hear that some agile BI conversations only go halfway toward the mind shift that I think is necessary.  These "think big, act small" solutions sound like a model where the only change is that you pour some of the concrete foundation at a time.  Building a house using this semi-agile approach:

    Iteration One:
    1. Pour foundation for the kitchen only.
    2. Build kitchen walls.
    3. Wire up kitchen outlets.
    4. Install kitchen plumbing.
    Iteration Two:
    1. Pour foundation for the family room only.
    2. Build family room walls.
    3. Realize you need to tear out a kitchen wall to open it to the family room.
    4. Reroute electricity in that wall.
    5. Rerun wiring to the kitchen.
    6. Run new wiring to the family room.
    In this approach to agile BI, you might well deliver value to customers more quickly than if you took a monolithic waterfall approach.  Since you aren't requiring yourself to plan everything up front, you run a high risk of having to do rework later, though.  In a physical construction mindset, rework is very expensive (rip out wall, rewire, etc).

    An organic build approach says plant a seed and watch it grow.  First, a sprout appears, with roots, a stem, and leaves.  The stem gets thicker, the roots grow deeper, and more leaves sprout.  Branches grow.  Flowers bud and fruit appears.  When requirements change, some pruning and grafting is required, but you don't have to tear down the tree and plant a new one from scratch or start a new tree on the side.  The tree will grow around power lines and rocks and other trees as needed.

    That's the mindset.  I don't think it's easy to shift from a constructionist perspective to an organic one.  Success in agile BI requires this change in thinking, though.  If you're still laying foundations and screwing sheetrock onto studs, your attempt at agile BI will not be optimal.


    Good luck with that.

    Wednesday, August 25, 2010

    Sashimi (An Agile BI Lesson for Floundering Teams)

    The most recent TDWI conference generated a lot of conversation around what Agile BI means and how agile principles and practices from traditional software development can and can't be applied to business intelligence projects.  I wasn't able to be at the TDWI conference and attend the presentations, but there's been a lot of chatter.
    I can't speak broadly from an industry perspective on agile BI, but I can speak from my own personal experiences.  The organization I work for has been undergoing a move over the past year to apply an existing agile methodology used in application development to data warehouse and business intelligence solutions.  It's an ongoing study that I believe has a lot of promise and many yet-unknown challenges.  So far, there are three parts to this unfinished Agile BI story: sashimi, development culture, and developer roles.  Tonight's post is on sashimi.

    For those of you not familiar with the use of the term sashimi in this context, the gist is that sashimi is the art of slicing up a problem space into pieces that are at the same time independently valuable as well as quickly achievable.  In an app dev project, what this means is creating a so-called walking skeleton that exercises only as many pieces of the overall solution as necessary to deliver something that is actually usable by a user.  For example, if I'm building an application that's going to manage medical claim payments, maybe all the first slice does is retrieve one claim from the database and display it on the screen.  Then, as work progresses toward the first 90-day release, more and more meat is built up on top of that skeleton, refactoring various pieces of the stack as necessary along the way.  Good sashimi results in ever-increasing value to end users with only as little bulk on the skeleton as necessary to achieve that.

    What does good sashimi for a BI project look like?

    I think that it looks the same, but feels much harder to accomplish, especially when you have an enterprise-scale strategy for data warehousing and business intelligence.  Imagine that you need to deliver a new reporting dashboard for department managers to do predictive labor modeling.  The minimal vertical slice for that solution could include:
    • New tables in a staging area from a new source system, with
    • New ETL jobs to load data into...
    • New tables in an enterprise data warehouse, and
    • New tables in a brand new data mart, and
    • New objects in a semantic reporting tool (e.g. universe or model), and
    • (Finally) your new dashboard.
    That's a lot of layers to slice through.

    In traditional BI projects that I've been involved in, the project plan would call for building the solution mostly in the order shown above: bring the data in, understand the data, build a data mart, wrap it with a semantic layer, and deliver the dashboard.  Along the way, you'd probably have a subteam prototyping and testing out the dashboard UI and maybe someone doing some data profiling to speed data analysis along; but the back-end pieces of development, especially, are likely to happen in stacked order.

    Building a walking skeleton in software requires you to be able to refactor the bones along the way.  As the analogy goes, the first version of the walking skeleton might have just one leg and one toe that attaches directly to the spine and up to the head.  As the product evolves, the leg bone gets refactored into femur, patella, tibia, and fibula; more toes get added for stability; and a new set of hip bones is created.  All of those change to the base skeleton in order to add muscles, skin, and clothing.

    As we layer things in a traditional BI project, we often try to keep a more detailed big picture in mind up front.  I know the final product is going to have two legs, that bend at the knee, need to be able to support independent orbital motion, and maintain upright stability of a 200 pound body.  That all leads to five toes, several leg bones, and hips from the very beginning.  An agile approach would ensure that we can notice early on that the business doesn't really need a biped mammal, but a fish.  That traditional approach results in a lot of wasted assumptions and potentially wasted work.  The agile approach allows for the easy reuse of what can be kept from the skeleton (spine) and a refactoring of the other pieces (leg becomes fin, toe becomes tail).

    That's a lot of metaphor, all to say that one of the requirements of agile development is the ability to picture work in those thin vertical slices of functionality that deliver as much value to users as possible with as little commitment under the covers as necessary.  That requires both a mindset and an architecture that will allow developers to quickly refactor components in the stack without having to deal with exorbitant dependencies.  In an enterprise BI environment where source systems are feeding many systems, data warehouses have lots of application and direct user dependencies, and semantic reporting tools are tightly coupled to database objects, this ability to refactor requires a flexible architecture with clear boundaries between components.  Examples that may be useful:
    • Nothing but the job that loads a table should ever reference it directly.  Always have a layer between physical database objects and the users or user applications, even if it's a layer of "select *" views.
    • Only one job (or an integrated set of jobs) should load a given table.  That job should have a versioned interface so that source systems don't all have to be enhanced when the target table changes.
    • Each independent application should have an independent interface into the data (read: data mart, views, etc.).
    • Refactoring involves moving logic between layers of the solution stack: promote something from a data mart down to an enterprise data warehouse when an opportunity for reuse is identified; demote something from enterprise data warehouse to data mart when it's clearly application specific.  Make sure that however you build your solution, you can move things between layers easily.
    • Have each layer interface with only the next layer above/below it.  Don't allow the design to cross over abstraction boundaries (e.g. having a report directly access staging tables instead of pulling the data into the data warehouse and on up the chain to the report).
    • Build as little code as necessary to get something from one abstraction layer to the next, even if that means a simple "select *" view rather than building a full ETL job with surrogate key management, SCD Type-2 logic, and data cleansing rules.  But also make sure you've built an abstraction between the data warehouse and the report so that when you add all of those features to the data warehouse, you don't necessarily have to go update all of the reports that have been built.
    Those are just a few thoughts on what might be one way of laying out an architecture that will allow your BI behavior to be agile.

    There are probably other good architectures to support this kind of agile sashimi for BI solutions.  Remember to focus on the goal of being able to deliver as much value as possible to end users with as little effort as possible, in every release.  That's what this agile lesson is about.  You have to change how you think to get there, though.  That will be the next post.

    Monday, August 23, 2010

    Business Keys

    I'm engaged in a project that's actively using key concepts from Dan Linstedt's Data Vault methodology.  There are lots of very powerful benefits that we seem to be realizing with this methodology, but this series of blog posts won't be particularly about use of the Data Vault.  Still, I felt it was appropriate to credit the Data Vault for helping provide a structure for our own discovery.

    This first post is about the struggle to identify keys for business entities.  We set forth some fundamental principles when we started out on our latest large scale project.  First and foremost, we would "throw nothing away".  What that's meant is that we want to design the foundation of our reporting database to be a reflection not just of one department's truth, but all of the truths that might exist across the enterprise.

    As a result, the design of every major entity has run into the same challenge: "what is the business key for this entity?"  Well, from System A, it's the abc code.  From System D it's the efg identifier.  But if someone put in the xyz ID that the government assigns, then you can use System D to get to this industry file that we get updated every 3 months and link that back to an MS Access database that Ms. So-and-so maintains.  Ack!  Clearly we can't just say an apple is an apple.  And clearly there's a data governance issue at play in this scenario also.

    In one case, some of these are legacy systems that simply are what they are and aren't worth investing additional time and energy into.

    Our data modeling challenge is to determine what the one business key would be for all the different instances of this one logical entity.  When confronted with the challenge of having no clear business key, the project wanted to "just add a source system code and be done with it."  I pushed hard against this approach for a couple of weeks, insisting that the team keep going back and working harder to dig up what could be a true business key.  Eventually, I realized that I was both working contrary to one of the original goals I'd set forth and becoming the primary roadblock to progress.


    Interesting side note:  One of the better tricks of good software design is to defer decisions to the last possible moment.  If you can get away without writing some piece of code, it's best to put it off until you have to write it.  There's obviously some nuance and art to understanding how to leverage that.  The Strategy Pattern is a good example, though, as sketched below.
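
    A minimal, hypothetical sketch of that pattern in Java (the names here are made up for illustration): the matching logic lives behind an interface, so the decision about how to match can be deferred or swapped without touching the code that uses it.

        interface MatchingStrategy {
            boolean sameEntity(String keyA, String keyB);
        }

        // Today's simple rule: two keys describe the same entity only if
        // they match exactly.
        class ExactMatch implements MatchingStrategy {
            public boolean sameEntity(String keyA, String keyB) {
                return keyA.equals(keyB);
            }
        }

        // The consolidation logic depends only on the interface, so a
        // smarter (fuzzy, cross-system) strategy can be plugged in later
        // without touching this class at all.
        class Consolidator {
            private final MatchingStrategy strategy;

            Consolidator(MatchingStrategy strategy) {
                this.strategy = strategy;
            }

            boolean shouldMerge(String keyA, String keyB) {
                return strategy.sameEntity(keyA, keyB);
            }
        }

    Today that's new Consolidator(new ExactMatch()); when the real matching rules finally have to be decided, a smarter strategy drops in without any rework upstream.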


    What I realized I was doing was trying to put a huge and potentially very dynamic piece of logic out in front of our need to simply capture the information that was being created by source systems.  So, we instituted what felt like a completely counter-intuitive design standard:  every business key would include source system code as part of the compound key; and we would defer the need to consolidate and deduplicate instances of an entity until after all the underlying data had first been captured.

    Deduplicating is the immediate next process, but this allows us to be sure that we've captured all the raw information from the source system first, before throwing away the fact that there are different versions of the truth in different source systems.
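
    As a hypothetical illustration of that standard (the type and field names below are made up): the source system code is part of the key, so equality means "same record in the same system," not "same real-world entity" - that judgment is deferred to the deduplication step.

        record BusinessKey(String sourceSystemCode, String localIdentifier) {}

        public class CaptureFirst {
            public static void main(String[] args) {
                BusinessKey fromSystemA = new BusinessKey("SYSA", "12345");
                BusinessKey fromSystemD = new BusinessKey("SYSD", "A-99");
                // Both rows are captured, even if they describe the same
                // real-world entity; deduplication sorts that out afterward.
                System.out.println(fromSystemA.equals(fromSystemD)); // false
            }
        }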

    A very powerful lesson for us - one that felt very counter-intuitive, that we started considering for the wrong reasons, and that we finally decided to follow through on for all the right reasons!

    Thursday, May 27, 2010

    Reading List

    Last week, I got to hear John Ladley present on Master Data Management.  I'm excited to get his new book, Enterprise Information Management, which publishes tomorrow!  I followed up with an email conversation with him and have suddenly increased my reading list by about 10 new books he suggested I pick up and read -- and I thought my shelf was pretty well stocked!

    On Tuesday, I walked into my office to find a present for me on my desk.  It was Business Intelligence, a newly published book by local UMSL professor Rajiv Sabherwal.  I know Rajiv through my membership on the UMSL IS Board of Advisors, and it turns out that his neighbor works with me, too.

    Speaking of books, I carry around a printed-out draft of my PragProWriMo book from last November.  It's a good reminder about the importance of communication.  I originally had the goal of revising a couple of chapters and submitting them in January, but I've adjusted that to revising during PragProWriMo '11 and submitting after that.

    Friday, April 30, 2010

    Evolution of Business Communication

    Steps in corporate culture transformation:
    • Couriering around printed documents
    • Faxing documents to each other
    • Emailing documents as attachments
    • Emailing links to documents on SharePoint
    • Emailing links to enterprise wiki articles
    • Collaborating in the article and being notified of changes via email
    • Collaborating in the article and seeing RSS updates in the enterprise portal

    Also somewhere in there is the crazy scenario where I get emailed a PDF that was clearly created by someone who scanned a printed copy of a PowerPoint presentation.  I guess that worse yet would be a PDF created by scanning a printed copy of a wiki article.

    Saturday, April 24, 2010

    Complexity = Agile Simplicity

    I'm a major advocate of traditional simplicity principles like KISS, YAGNI, DRY, the Unix Philosophy, etc.  I'm also very interested in complexity and solving large, complex problems.  It occurred to me during an all-day "fix this process problem" meeting today that the ideas of complexity and simplicity aren't actually antonyms.  Sure, by definition, they are, but I think the paradox in that comparison is that perhaps complexity can be defined as simplicity that changes over time.

    My perception of complex problems is that, given examples from any one particular moment in time, the situation can be easily dissected, analyzed, documented, and understood.  Take a sample from the next moment in time and the same is true.  Try to combine that collection of understandings into a generalization that can be applied to other past and future points in time, and the problem suddenly becomes complex.

    One of the observations we get from agile development is that solutions will be more correct, given the flexibility to adjust to customer needs versus following a previously defined historical specification.  Agility is the ability to adjust to change.

    Therefore, complex problems should be solvable by solutions that are simple and agile.  Solutions do not have to be complex.

    We run into challenges designing agile solutions, though.  Many traditional solution design tools call for static process flow diagrams, swim-lane control charts, concrete data models, class diagrams, etc.  I think that some of the behavioral object-oriented patterns give us some clues on how to introduce agility into solutions, but understanding when to apply those (and how to apply them within some technologies) takes creativity and experience.

    I think that introducing that same kind of solution agility into human processes is also very challenging.  Often, we want clearly defined instructions and flow charts to instruct individuals on exactly what to do, the only variation being a Mad Libs-style fill-in-the-blank.  Perhaps we need more "and then a miracle occurs" steps in our complex processes.  And perhaps that's both acceptable and desirable in some processes.

    Sunday, April 18, 2010

    Digging Holes

    The following parable is adapted from one that I was forwarded by a friend this week...  It seemed a good analogy for a poor IT Service Management philosophy.


    Two IT directors changed jobs and were working for the city public works department.  One would dig a hole, and the other would follow behind her and fill the hole in.  They worked up one side of the street, then down the other, then moved on to the next street, working furiously all day without rest: one director digging a hole, the other director filling it in again.

    An onlooker was amazed at their hard work, but couldn't understand what they were doing. So he asked the hole digger, "I'm impressed by the effort you two are putting in to your work, but I don't get it -- why do you dig a hole, only to have your partner follow behind and fill it up again?"

    The hole digger wiped her brow and sighed, "Well, I suppose it probably looks odd because we're normally a three-person team. But today the guy who plants the trees called in sick."

    Sunday, April 11, 2010

    Personal Best



    Today's run...  The Go! St. Louis Half Marathon
    My first official race!

    Distance: 13.1 miles
    Time: 2:34
    Place: 7,520th

    It was awesome!