Wednesday, August 25, 2010

Sashimi (An Agile BI Lesson for Floundering Teams)

The most recent TDWI conference generated a lot of conversation around what Agile BI means and how agile principles and practices from traditional software development can and can't be applied to business intelligence projects.  I wasn't able to attend the conference presentations myself, but there's been plenty of chatter since.
I can't speak broadly from an industry perspective on agile BI, but I can speak from my own personal experiences.  Over the past year, the organization I work for has been working to apply an existing agile methodology from application development to data warehouse and business intelligence solutions.  It's an ongoing effort that I believe has a lot of promise and many as-yet-unknown challenges.  So far, there are three parts to this unfinished Agile BI story: sashimi, development culture, and developer roles.  Tonight's post is on sashimi.

For those of you not familiar with the use of the term sashimi in this context, the gist is that sashimi is the art of slicing a problem space into pieces that are both independently valuable and quickly achievable.  In an app dev project, this means creating a so-called walking skeleton that exercises only as many pieces of the overall solution as necessary to deliver something actually usable by a user.  For example, if I'm building an application that's going to manage medical claim payments, maybe all the first slice does is retrieve one claim from the database and display it on the screen.  Then, as work progresses toward the first 90-day release, more and more meat is built up on top of that skeleton, refactoring various pieces of the stack as necessary along the way.  Good sashimi results in ever increasing value to end users with as little bulk on the skeleton as necessary to achieve it.

What does good sashimi for a BI project look like?

I think that it looks the same, but feels much harder to accomplish, especially when you have an enterprise-scale strategy for data warehousing and business intelligence.  Imagine that you need to deliver a new reporting dashboard for department managers to do predictive labor modeling.  The minimal vertical slice for that solution could include:
  • New tables in a staging area from a new source system, with
  • New ETL jobs to load data into...
  • New tables in an enterprise data warehouse, and
  • New tables in a brand new data mart, and
  • New objects in a semantic reporting tool (e.g. universe or model), and
  • (Finally) your new dashboard.
That's a lot of layers to slice through.
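Even so, the first slice through them can be remarkably thin.  Here's a rough sketch of what that might look like in SQL; every table, view, and column name is made up for the example, and simple pass-through views stand in for the ETL jobs and semantic objects that don't exist yet:

    -- Staging: land raw scheduling data from the (hypothetical) new source system as-is.
    CREATE TABLE stg_labor_schedule (
        employee_id     INTEGER,
        department_id   INTEGER,
        shift_date      DATE,
        scheduled_hours DECIMAL(5,2)
    );

    -- Warehouse layer: for the first slice, a pass-through view stands in for the
    -- eventual ETL job, surrogate keys, and history tracking.
    CREATE VIEW dw_labor_schedule AS
    SELECT employee_id, department_id, shift_date, scheduled_hours
    FROM   stg_labor_schedule;

    -- Mart layer: just enough shaping for the dashboard to display something real.
    CREATE VIEW mart_labor_by_department AS
    SELECT department_id, shift_date, SUM(scheduled_hours) AS total_scheduled_hours
    FROM   dw_labor_schedule
    GROUP BY department_id, shift_date;

    -- The semantic layer and the dashboard point only at mart_labor_by_department,
    -- so everything underneath it can be refactored later without touching them.

Every one of those objects will almost certainly be rebuilt before the first 90-day release, but department managers can see real numbers on a real dashboard in the first iteration.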

In traditional BI projects that I've been involved in, the project plan would call for building the solution mostly in the order shown above: bring the data in, understand the data, build a data mart, wrap it with a semantic layer, and deliver the dashboard.  Along the way, you'd probably have a subteam prototyping and testing out the dashboard UI and maybe someone doing some data profiling to speed data analysis along; but the back-end pieces of development, especially, are likely to happen in stacked order.

Building a walking skeleton in software requires you to be able to refactor the bones along the way.  As the analogy goes, the first version of the walking skeleton might have just one leg and one toe, attached directly to a spine that runs up to the head.  As the product evolves, the leg bone gets refactored into femur, patella, tibia, and fibula; more toes get added for stability; and a new set of hip bones is created.  All of those changes to the base skeleton happen in order to support adding muscles, skin, and clothing.

As we layer things in a traditional BI project, we often try to keep a more detailed big picture in mind up front.  I know the final product is going to have two legs that bend at the knee, move independently, and keep a 200-pound body stable and upright.  That all leads to five toes, several leg bones, and hips from the very beginning.  The traditional approach results in a lot of baked-in assumptions and potentially wasted work.  An agile approach lets us notice early on that the business doesn't really need a bipedal mammal at all, but a fish, and it allows for easy reuse of what can be kept from the skeleton (the spine) and refactoring of the other pieces (leg becomes fin, toe becomes tail).

That's a lot of metaphor, all to say that one of the requirements of agile development is the ability to picture work in thin vertical slices of functionality that deliver as much value to users as possible with as little commitment under the covers as necessary.  That requires both a mindset and an architecture that lets developers quickly refactor components in the stack without having to deal with exorbitant dependencies.  In an enterprise BI environment, where source systems feed many systems, data warehouses have lots of application and direct user dependencies, and semantic reporting tools are tightly coupled to database objects, this ability to refactor requires a flexible architecture with clear boundaries between components.  Some examples that may be useful:
  • Nothing but the job that loads a table should ever reference it directly.  Always have a layer between physical database objects and the users or user applications, even if it's a layer of "select *" views (sketched just after this list).
  • Only one job (or an integrated set of jobs) should load a given table.  That job should have a versioned interface so that source systems don't all have to be enhanced when the target table changes.
  • Each independent application should have an independent interface into the data (read: data mart, views, etc.).
  • Refactoring involves moving logic between layers of the solution stack: promote something from a data mart down to an enterprise data warehouse when an opportunity for reuse is identified; demote something from the enterprise data warehouse to a data mart when it's clearly application specific.  Make sure that however you build your solution, you can move things between layers easily (a second sketch below shows one way).
  • Have each layer interface with only the next layer above/below it.  Don't allow the design to cross over abstraction boundaries (e.g. having a report directly access staging tables instead of pulling the data into the data warehouse and on up the chain to the report).
  • Build as little code as necessary to get something from one abstraction layer to the next, even if that means a simple "select *" view rather than building a full ETL job with surrogate key management, SCD Type-2 logic, and data cleansing rules.  But also make sure you've built an abstraction between the data warehouse and the report so that when you add all of those features to the data warehouse, you don't necessarily have to go update all of the reports that have been built.
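As a rough illustration of the first and last bullets, here's what that kind of boundary might look like in SQL.  The schema names (dw, dw_if) and columns are hypothetical, and the interface view starts out as little more than "select *":

    -- Physical fact table: only its load job ever references this directly.
    CREATE TABLE dw.labor_schedule_fact (
        labor_schedule_key BIGINT,
        employee_key       BIGINT,
        department_key     BIGINT,
        shift_date         DATE,
        scheduled_hours    DECIMAL(5,2)
    );

    -- Interface layer: marts, reports, and the semantic tool read this view, never
    -- the table.  At first it's a trivial pass-through; later it can hide renamed
    -- columns, type changes, or a split into multiple physical tables without
    -- breaking a single downstream object.
    CREATE VIEW dw_if.labor_schedule AS
    SELECT labor_schedule_key,
           employee_key,
           department_key,
           shift_date,
           scheduled_hours
    FROM   dw.labor_schedule_fact;

The view costs almost nothing while it's a straight pass-through, but it's the thing that lets the physical table be re-keyed or reshaped later without a cascade of report changes.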
Those are just a few thoughts on what might be one way of laying out an architecture that will allow your BI behavior to be agile.
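To show what the promote/demote bullet might look like in practice, here's one more hypothetical sketch, building on the names above (the dw_if.employee view and its hourly_rate column are also made up).  A labor-cost calculation starts life in a data mart view; when a second application needs the same numbers, the calculation is promoted into the warehouse interface layer, and because the dashboard only ever queried the mart view, nothing downstream changes:

    -- Before: the calculation lives only in the mart that the dashboard reads.
    CREATE VIEW mart.labor_cost_by_department AS
    SELECT s.department_key,
           s.shift_date,
           SUM(s.scheduled_hours * e.hourly_rate) AS scheduled_labor_cost
    FROM   dw_if.labor_schedule s
    JOIN   dw_if.employee       e ON e.employee_key = s.employee_key
    GROUP BY s.department_key, s.shift_date;

    -- After: a second application wants the same numbers, so the calculation is
    -- promoted into the warehouse interface layer...
    CREATE VIEW dw_if.labor_cost_by_department AS
    SELECT s.department_key,
           s.shift_date,
           SUM(s.scheduled_hours * e.hourly_rate) AS scheduled_labor_cost
    FROM   dw_if.labor_schedule s
    JOIN   dw_if.employee       e ON e.employee_key = s.employee_key
    GROUP BY s.department_key, s.shift_date;

    -- ...and the mart view becomes a thin pass-through.  The dashboard still
    -- queries mart.labor_cost_by_department and never notices the move.
    CREATE OR REPLACE VIEW mart.labor_cost_by_department AS
    SELECT department_key, shift_date, scheduled_labor_cost
    FROM   dw_if.labor_cost_by_department;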

There are probably other good architectures to support this kind of agile sashimi for BI solutions.  Remember to focus on the goal of delivering as much value as possible to end users with as little effort as possible, in every release.  That's what this agile lesson is about.  You have to change how you think to get there, though.  That will be the next post.
