Sharpening Stones: 2012

Friday, November 2, 2012

IT Strategy by Tinker Bell

In our house, Fridays are "Pizza and a Movie Night." Tonight, my girls chose the 2008 Tinker Bell movie. Spoiler Alert! The plot goes like this: Tinker Bell discovers that her fairy talent is tinkering - building the tools that other fairies use to do their work and change the seasons on the mainland. Tinker Bell struggles with accepting her talent for a while. She rejects her own talent and tries to learn the other fairies' talents instead. Eventually, she discovers that she's really a very innovative tinker fairy. Her lack of faith in her own talent causes her to put the arrival of Spring at risk. Then, by embracing her talent and flair for innovation, she's able to lead the other fairies with her inventions and save Spring! As a reward, she also discovers a way that tinker fairies can perform a valuable service in returning lost things to children on the mainland during the changing seasons.

Hopefully you already see the comparison between tinker fairies and typical IT co-workers. The tinker fairies are an invaluable support team that works behind the scenes to make sure that all the other talent fairies can do their work effectively. They deliver a great service, but they rarely drive real change in how work is done by other fairies... until Tinker Bell comes along, that is. She spends time understanding what the other fairies do and what their challenges are, and applies her inherent tinker-talent to fundamentally change how they do their work. Wow! That's an innovative IT co-worker!!

At the same time the kids were watching Tinker Bell, I was reading Business / IT Fusion by Peter Hinssen. I can't speak for Peter, but I think Tink would make a great IT 2.0 co-worker. She's got the talent to see how her tools and materials can come together to solve problems, and she has a great ability to understand the challenges that the business of the fairies is confronted with. Her innovations don't just make the existing processes more efficient, but drive them to be altogether different processes.

My recommendation:
Read Business / IT Fusion.
Watch Tinker Bell.
Don't tell anyone that you got your IT strategy from a Disney movie!

Wednesday, July 11, 2012

MapReduce Programming

I'm just starting to get into working with Hadoop and MapReduce and surprised myself with how quickly I was able to put together a real first MapReduce program. In about four hours of programming today, I built a program that takes as input a log of events for different subjects and computes a frequency table for the time interval between consecutive events tied to the same subject.

That is, what are the typical intervals between events for the same subject? Is there a natural point at which activity tapers off and is sure not to return for an extended period of time?

The idea is to be able to identify how long the minimum gap is between two separate experiences or interactions.

Here's my MapReduce implementation:

1. Map the incoming data, partition by subject and do a secondary sort on timestamp.

2. Reduce that down by looping through the series of timestamps, computing the interval between the current timestamp and the previous one, and counting up the occurrences of each interval (in seconds).

3. Have the output keyed by interval.

4. Reduce that again, summing up the separate counts by subject (patient) for a given interval into a single count of occurrences for each interval, across all subjects.

5. Map that all into a single partition and write out the data in a meaningful way.
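
As a rough illustration only, the steps above can be sketched in plain Python. This is not the Hadoop and Protocol Buffers code described in this post; it just simulates the two passes in memory. The function names, the (subject, timestamp) tuple layout, and the use of integer timestamps in seconds are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def map_events(events):
    """Map: emit (subject, timestamp) pairs keyed by subject."""
    for subject, timestamp in events:
        yield subject, timestamp


def shuffle_by_subject(pairs):
    """Stand-in for partitioning by subject plus a secondary sort on timestamp."""
    groups = defaultdict(list)
    for subject, timestamp in pairs:
        groups[subject].append(timestamp)
    return {subject: sorted(stamps) for subject, stamps in groups.items()}


def reduce_intervals(timestamps):
    """First reduce: count each gap (in seconds) between consecutive events of one subject."""
    counts = Counter()
    for previous, current in zip(timestamps, timestamps[1:]):
        counts[current - previous] += 1
    return counts


def reduce_totals(per_subject_counts):
    """Second reduce: sum the per-subject counts for each interval."""
    totals = Counter()
    for counts in per_subject_counts:
        totals.update(counts)
    return totals


def interval_frequencies(events):
    """Run both passes and return (interval, count) pairs sorted by interval."""
    grouped = shuffle_by_subject(map_events(events))
    per_subject = (reduce_intervals(stamps) for stamps in grouped.values())
    return sorted(reduce_totals(per_subject).items())


if __name__ == "__main__":
    # Hypothetical event log: (subject, timestamp in seconds), in arbitrary order.
    events = [("a", 100), ("b", 50), ("a", 130), ("a", 160), ("b", 80), ("b", 500)]
    for interval, count in interval_frequencies(events):
        print(f"{interval}\t{count}")
```

In a real job the shuffle and sort would be handled by the framework rather than by a dictionary in memory, so treat this as a way to check the logic on a small sample, not as something that scales.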

With some existing template code, I cranked that out using MapReduce and Protocol Buffers, figured out some Maven dependencies, set up my environment to support Protocol Buffers, and built and tested the code. Not bad for a half-day of work. I credit my iced mocha from Starbucks, and maybe the fact that I was sitting at Starbucks for much of that time.

Now on to something more sophisticated: machine learning.

Saturday, July 7, 2012

On Executive Sponsorship

My main priority right now is the establishment of data stewardship groups. On the one hand, my gut tells me that these will not be effective if they are built up by mid-level functional leaders rather than through executive mandate. On the other hand, executive sponsorship sometimes gets things accomplished rapidly.

What's the right balance? Here are my rules of thumb.

1. Never say "because such-and-such a VP said this is a priority" or worse yet "because so-and-so's bonus is riding on this." I actually heard that once!

2. Always include the people who will be doing the work in the decision-making process. It'll lead to better long-term success.

3. Make sure the people doing the work know what old habits or old processes can (and need to) stop. If stewardship is just more work then it won't create the efficiencies it is predicated on.

What do you think?

Friday, June 29, 2012

Big Data

It's taken me a while to get into the idea of "big data" as something special. I've been working on largish structured data sets for a while, and I think getting valuable and reliable insights out of that data is challenging enough. My first thought was "now I'm supposed to try to extract something meaningful out of a mishmash of inconsistent, sparsely populated, questionably reliable data?"

Some things I've read over the past few months have helped me get over that hump, so I wanted to share them.

First was "The Information" by James Gleick. He's my new favorite author. This was a great history of information theory.

Second was "Thinking, Fast and Slow" by Daniel Kahneman. He's a Nobel Prize winner and proponent of behavioral economics. Fascinating read.

Last was "Chaos" by James Gleick about the origins and founding of chaos as a mathematical science. Another amazing read.

These three books so close together helped me get a much stronger intuition for the power and meaning of insights that can come from big data and machine learning techniques. If you're on the fence about the usefulness of big data, start reading.

Thursday, May 17, 2012

Namespace Collision

I always enjoy a good joke, especially when it's related to work. This one is a bit edgy, so I hope you forgive the nature of the content.

3M has a product called the Healthcare Data Dictionary that it uses in many of its products as a way of mapping and relating concepts from different domains or standards. They've made this available for use online for free and are open sourcing it as well. Since I'm in healthcare, I thought I'd do a little searching around just for fun. One of the entries I stumbled across was "failure of orgasm." Oh, there are things much more risqué, trust me! I thought it was quite funny, though, when the 3M HDD told me that related to this condition was "Parent - Reactions - has child." I'll leave it to you to interpret the direction of the implied causality and likelihood of actuality.

Clearly I took some liberty in misinterpreting the layout of the results, but the collision between the healthcare namespace (which has you thinking about sex and topics related to sex) and the relational namespace (where parents and children represent a hierarchy) makes for a fun joke.

Thursday, April 19, 2012

The Information

I'm reading a book right now called The Information, by James Gleick. It's an incredible history and exposition on the origin of thinking and the articulation of information. One of my favorite quotes (so far) suggests that information itself is a new kind of entity - not matter, not energy.

Given my current interest in data governance and the concept of "data as a corporate asset," I've latched on to this concept closely. Information is a thing, but it's neither capital nor operating expense. It's a different kind of thing altogether, but measurable and worthy of management nonetheless.

Stay tuned for more reflections on this book while I'm reading the rest of it.