Saturday, December 12, 2009

Fun with Recursive SQL (Part 2)


This blog post and Part 1 are derived from an article that I wrote as an article for Teradata Magazine about fun uses for recursive SQL.

Part 1 of this series described a way to use recursive SQL to build up a single comma-separated field from a set of rows that were all part of the same grouping. Part 2 describes how to do just the inverse: take a field that has a comma-separated list of values and produce one row for each of the items in that list. What I refer to as "string normalization."

In the code below, the seed statement is simply a select * from our source table. The recursive statement takes the field to be pivoted into multiple rows and uses the POSITION() function to repeatedly pull off just the first item in the list. The recursion continues until our POSITION() calculation in the "dpos" field tells us the field is blank.

The context for this example is a set of survey responses in a table that contains a RESPONSE_KEY (just some primary key) and RESPONSE_TXT (a comma-separated list of responses). The SQL below parses that RESPONSE_TXT field into a single elements and outputs one row per element along with the associated RESPONSE_KEY.

WITH RECURSIVE parse_list
       (response_key, delim_pos, item_num, element, remainder) AS 
(
    SELECT response_key, 0, 0, CAST('' AS VARCHAR(100)), response_txt
    FROM   response
    UNION ALL
    SELECT response_key,
           CASE WHEN POSITION(',' IN remainder) > 0
             THEN POSITION(',' IN remainder)
             ELSE CHARACTER_LENGTH(remainder) END dpos,
           item_num + 1,
           TRIM(BOTH ‘,’ FROM SUBSTR(remainder, 0, dpos+1)),
           TRIM(SUBSTR(remainder, dpos+1))
    FROM   parse_list
    WHERE  dpos > 0 
) 
SELECT response_key, element 
FROM   parse_list p
WHERE  item_num > 0
ORDER BY response_key, item_num;

If the input RESPONSE table had the following data:
RESPONSE_KEY   RESPONSE_TXT
643            eggs, milk, toast
872            cereal, juice, bagel, cream cheese

Then the output would look like this:
RESPONSE_KEY  ELEMENT
643           eggs
643           milk
643           toast
872           cereal
872           juice
872           bagel
872           cream cheese

If you're new to recursive SQL, take the time to read, try, and experiment with these first two uses. Part 3 of this series will describe a way to consolidate multiple overlapping time segments, using some specified order or precidence, to create a single consolidated time line. That sounds confusing and potentially worthless until you see the real world example.

1 comment: