For this example, the project involves the integrated reporting of accounts receivable history from about a dozen separate legacy systems. The design question came up: how much business logic should be done by the legacy source systems who are writing the extracts using proprietary tools, and how much should be done by the ETL team who will be integrating and merging the data into the warehouse.
In the beginning of the discussion, we had two polarized and protectionist perspectives:
Argument 1: Since the source system teams are subject matter experts, they'll know quickest how to understand and translate from source system specific domain value and logic into data warehouse values.
Argument 2: It's the data warehouse team's job to do the integration work, so the logic should all go into the ETL.
The discussion degenerated for a while until I noticed that what we were really revolving around was a straightforward operational argument that is the most convincing that I've ever heard. Here's how we summarized it at the end of our discussion.
General data warehouse design goal: put as little business logic as necessary in the source system extracts themselves. Rationale...
This helps to reduce the number of places you have to look to troubleshoot when something goes wrong. If most of the business logic is applied after the data is retrieved from the source system, then it's also most likely that any errors that occur happen after that initial extract step. If there is significant business logic in both the source system extract as well as the integration ETL, then both places have a reasonable likelihood of introducing errors into the data. Some business logic must exist in the integration ETL to be able to merge common dimensions and do the work of actually integrating the data. Therefore, having limited business logic in the source extract means that you don't have to focus on both the legacy code AND the integration ETL code when troubleshooting.
In the end, we all agreed that we'd keep the logic in the legacy extracts as simple as possible, and put as much business logic as possible in the ETL.
No comments:
Post a Comment