Sharpening Stones: Avoiding Self-Joins on Large Tables

Friday, November 13, 2009

Avoiding Self-Joins on Large Tables

With data warehousing solutions, I've run into the case of needing to do self-joins on large tables, a surprisingly large number of times. A couple of examples might include something like:

Customer Addresses Table

One row in the table per address
Addresses are categorized by a type of home, work, or other
Question is to know how many of your customers work and live in different zip codes
If either address is not present, then include the record

The first solution that probably comes to mind is something like the following join between all of the home addresses and all of the work addresses.

SELECT 

  COALESCE(h.customer_id, w.customer_id) AS customer_id, 

  h.zip AS home_zip,

  w.zip AS work_zip

FROM

  customer_addr h LEFT OUTER JOIN

  customer_addr w ON 

    h.customer_id = w.customer_id AND

    h.addr_type = 'home' AND

    w.addr_type = 'work'

WHERE

  (h.zip <> w.zip) OR h.zip IS NULL OR w.zip IS NULL.

Another, more efficient way to execute the same query in a single pass of the database.

SELECT
customer_id,
MIN( CASE( WHEN ADDR_TYPE_CD = 'h' THEN zip END)) AS home_zip,
MIN( CASE( WHEN ADDR_TYPE_CD = 'w' THEN zip END)) AS work_zip
FROM cust_addr
GROUP BY customer_id;

In the new code, the query will perform a single set of operations against the table once.

#pragprowrimo word count: 18,002

No comments:

Post a Comment

paulboal & Sharpening Stones

I'm a data professional who's worked in both consulting and in corporate positions, and I'm passionate about making the business world a better contributor to society through the intelligent use of business information. Welcome to Sharpening Stones, a blog about information management, data, business intelligence, a data warehousing. The title and domain name for this blog are an homage to the REM song, Exhuming McCarthy: You're sharpening stones, walking on coals, To improve your business accumen. You can see my interpretation of those lines in the first posts by following the links from here.

Friday, November 13, 2009

Avoiding Self-Joins on Large Tables

No comments:

Post a Comment

paulboal & Sharpening Stones

Blog Archive

A Practical Data Warehouse