Friday, December 11, 2009

Fun with Recursive SQL (Part 1)


This blog post is derived from an article that I wrote as an article for Teradata Magazine about fun uses for recursive SQL.

Show of hands: how many of you have used recursive SQL in the past six months? For something useful?  Using ANSI syntax rather than Oracle's CONNECT BY?  If you said "yes" to all three, then congratulations!  You've won a prize: 15 minutes of your day back.  You can jump down to the bottom of this post and just review my fun example.  This first part is just an introduction to recursive SQL.

Introduction to Recursive SQL
First let's just review the definition for "recursion."  Handy-dandy dictionary here.... recourse... rectify.... recursion:
See: recursion
Not very helpful.  How about this instead.  A recursion is a method of solving problems in which the algorithm or function applies itself in the execution of the solution.  For instance, if I were to define a recursive algorithm for summarizing a list of numbers, I could say something like "the sum of a list is determined by adding the first number in the list to the sum of all the remaining numbers in the list; if the list is empty, then the sum is zero."  You notice how the definition of "sum of a list" includes the use of the "sum" operation, as in "sum of all the remaining numbers."  You'll also note that there's an explicit terminal condition to define what happens when you have nothing in the list.

So, recursive SQL works the same way.  A recursive SQL statement is one in which the determination of the result set requires the execution of SQL on the result set itself.  In recursive SQL, there has to be a seed statement, as well as the recursive call, and a termination condition.

WITH RECURSIVE [temp table] [column list] AS
(
  [seed statement]
  UNION ALL
  [recursive statement]
)

A Simple Example
For the sake of comfort, a simple example might involve an employee hierarchy table where each row had the employee ID and associated manager ID.  A standard query could retrieve a list of direct reports for a given employee ID:  SELECT * FROM employee WHERE mgr_id = 123.  A recursive query, however, could return the list of all employees anywhere under the hierarchy below that same manager.  An example that might be:
WITH RECURSIVE emp_hier (emp_id, mgr_id, level) AS
(
SELECT a.emp_id, a.mgr_id, 0 
FROM   employee a
WHERE  a.emp_id = 123
UNION ALL
SELECT b.emp_id, b.mgr_id, c.level+1 
FROM   employee b,
       emp_hier c
WHERE  b.mgr_id = c.emp_id
)
SELECT e.emp_title, e.emp_id, e.mgr_id, h.level
FROM   employee e,
       emp_hier h
WHERE  e.emp_id = h.emp_id
  AND  e.emp_id <> 123;

In this query, the is a simple select that pulls the manager record for which we want all of the descendants.  The selects all records from the employee table where the employee is managed by someone already in the emp_hier temporary table – hence the join on employee.mgr_id and emp_hier.emp_id.

On to some more exciting examples...


Example 1: String Concatenation
Ever notice that there are no string aggregation functions in SQL?  Something like a SUM() for character fields that would take two parameters: the field or expression from each row and a delimiter.  Usually, programmers end up having to write code outside of the database to do that kind of string manipulation. Recursive SQL can achieve that kind of character aggregation quite nicely if used appropriately.

Consider a list of email addresses for several members of the same family.  The table has a FAMILY_ID, PERSON_ID, SEQuence number, and EMAIL_ADDR.  The desired output is a single row that contains a comma-separated list of all the email addresses for the entire family (for a mailing list perhaps).

WITH RECURSIVE email_list (coverage_id, emails, seq) AS
(
  SELECT m.family_id, m.email_addr, m.seq
  FROM   members m
  WHERE  m.seq = 1
  UNION ALL
  SELECT e.emails ||’, ’|| m.email_addr, m.seq
  FROM   email_list e, members m
  WHERE  m.seq = e.seq + 1
    AND  m.family_id = e.family_id
)
SELECT r.family_id, r.emails
FROM   email_list r
WHERE  r.seq = (SELECT MAX(h.seq) FROM email_list h 
                WHERE h.family_id = r.family_id 
                GROUP BY h.family_id)
;
Let's walk through the SQL piece by piece.  First, the seed statement starts the result set off by picking just the first person from each family (that is, the people with a SEQ=1).  That sequence number is admittedly a bit artifical and contrived to make the recursive SQL easier to write.  If you needed to, you could generate the SEQ on the fly using a RANK() of people within each family and then leverage the RANK() as a sequence number.  Hopefully the example isn't lost because of that bit of confusion.

The recursive portion of the SQL simply takes what was seeded in through the first query (i.e. people with SEQ=1) and joins that to all the records in the EMAIL_ADDR table with a SEQ = SEQ + 1 (from result set).  The EMAILS output field is created by concatenating the email address from the seed to record with the next highest sequence number.

Then that result set is taken and the same thing is done.  And so on.  And so forth.

The query terminates when the result set contains no records.  That is, when we exhaust all the SEQ numbers in the table.

The outer query refines the results down to exactly what we're looking for by keeping the final record and throwing away all the intermediate records that were used to built up the results.

Next Time
Look ahead for additional posts on how to use recursive SQLto parse a field with comma separated values and how to split and reconcile overlapping time segments into a single, flat time line.

12 comments:

  1. This has been a really valuable post! Do you have any examples of the string concatenation where I don't have a convenient sequential reference (i.e. where you suggest that Rank() be used)?

    Each time I've tried to create this using a window function, I get an "Illegal or unsupported use of subquery/derived table inside a recursive query/view" error.

    ReplyDelete
  2. Nice & Easy Keep Doing Good Job

    ReplyDelete
  3. almost same, but with test data

    http://thesimpleprogrammer.blogspot.com/2013/02/concat-rows-as-single-column-in-teradata.html

    ReplyDelete
  4. Hi Paul, thanks for your article. Recursive SQL can be tricky to understand, so I appreciate you taking the time to explain it step-by-step to us.

    ReplyDelete
  5. Hi Paul,
    Your explanation is great which made complicated logic to simple. thanks. Keep doing good job.

    ReplyDelete
  6. Thank you very much for this article, it is so rare to see nowadays written as fervently article. I enjoyed reading it and I learned a lot of things. I will go and continue reading your blog =). Good luck for the future and another one for the quality of it.You can also check out this (http://www.sqiar.com).

    ReplyDelete
  7. Thank you for taking the time to put this together. It's very helpful in understanding SQL recursion.

    ReplyDelete
  8. This comment has been removed by the author.

    ReplyDelete
  9. Hey, two questions:
    1. in your email concatenation example: for the recursive select portion, do you think it lacks one column (right now only two columns)?
    2. As you know, the email column in the result would be a very long one? How can you make sure it won't be truncated given the very first record of "emails" is only for one email...I ran something similar on Teradata 15.00.0305...my results from recursive selections were all truncated by the seed selection.

    ReplyDelete

  10. Thanks for sharing this information and keep updating us. This is informatics and really useful to me.
    Selenium Training in Chennai | Selenium Training | Selenium Course in Chennai

    ReplyDelete