Show of hands: how many of you have used recursive SQL in the past six months? For something useful? Using ANSI syntax rather than Oracle's CONNECT BY? If you said "yes" to all three, then congratulations! You've won a prize: 15 minutes of your day back. You can jump down to the bottom of this post and just review my fun example. This first part is just an introduction to recursive SQL.
Introduction to Recursive SQL
First let's just review the definition for "recursion." Handy-dandy dictionary here.... recourse... rectify.... recursion:
See: recursionNot very helpful. How about this instead. A recursion is a method of solving problems in which the algorithm or function applies itself in the execution of the solution. For instance, if I were to define a recursive algorithm for summarizing a list of numbers, I could say something like "the sum of a list is determined by adding the first number in the list to the sum of all the remaining numbers in the list; if the list is empty, then the sum is zero." You notice how the definition of "sum of a list" includes the use of the "sum" operation, as in "sum of all the remaining numbers." You'll also note that there's an explicit terminal condition to define what happens when you have nothing in the list.
So, recursive SQL works the same way. A recursive SQL statement is one in which the determination of the result set requires the execution of SQL on the result set itself. In recursive SQL, there has to be a seed statement, as well as the recursive call, and a termination condition.
WITH RECURSIVE [temp table] [column list] AS ( [seed statement] UNION ALL [recursive statement] )
A Simple Example
For the sake of comfort, a simple example might involve an employee hierarchy table where each row had the employee ID and associated manager ID. A standard query could retrieve a list of direct reports for a given employee ID: SELECT * FROM employee WHERE mgr_id = 123. A recursive query, however, could return the list of all employees anywhere under the hierarchy below that same manager. An example that might be:
WITH RECURSIVE emp_hier (emp_id, mgr_id, level) AS ( SELECT a.emp_id, a.mgr_id, 0 FROM employee a WHERE a.emp_id = 123 UNION ALL SELECT b.emp_id, b.mgr_id, c.level+1 FROM employee b, emp_hier c WHERE b.mgr_id = c.emp_id ) SELECT e.emp_title, e.emp_id, e.mgr_id, h.level FROM employee e, emp_hier h WHERE e.emp_id = h.emp_id AND e.emp_id <> 123;
In this query, the
On to some more exciting examples...
Example 1: String Concatenation
Ever notice that there are no string aggregation functions in SQL? Something like a SUM() for character fields that would take two parameters: the field or expression from each row and a delimiter. Usually, programmers end up having to write code outside of the database to do that kind of string manipulation. Recursive SQL can achieve that kind of character aggregation quite nicely if used appropriately.
Consider a list of email addresses for several members of the same family. The table has a FAMILY_ID, PERSON_ID, SEQuence number, and EMAIL_ADDR. The desired output is a single row that contains a comma-separated list of all the email addresses for the entire family (for a mailing list perhaps).
Let's walk through the SQL piece by piece. First, the seed statement starts the result set off by picking just the first person from each family (that is, the people with a SEQ=1). That sequence number is admittedly a bit artifical and contrived to make the recursive SQL easier to write. If you needed to, you could generate the SEQ on the fly using a RANK() of people within each family and then leverage the RANK() as a sequence number. Hopefully the example isn't lost because of that bit of confusion.WITH RECURSIVE email_list (coverage_id, emails, seq) AS ( SELECT m.family_id, m.email_addr, m.seq FROM members m WHERE m.seq = 1 UNION ALL SELECT e.emails ||’, ’|| m.email_addr, m.seq FROM email_list e, members m WHERE m.seq = e.seq + 1 AND m.family_id = e.family_id ) SELECT r.family_id, r.emails FROM email_list r WHERE r.seq = (SELECT MAX(h.seq) FROM email_list h WHERE h.family_id = r.family_id GROUP BY h.family_id) ;
The recursive portion of the SQL simply takes what was seeded in through the first query (i.e. people with SEQ=1) and joins that to all the records in the EMAIL_ADDR table with a SEQ = SEQ + 1 (from result set). The EMAILS output field is created by concatenating the email address from the seed to record with the next highest sequence number.
Then that result set is taken and the same thing is done. And so on. And so forth.
The query terminates when the result set contains no records. That is, when we exhaust all the SEQ numbers in the table.
The outer query refines the results down to exactly what we're looking for by keeping the final record and throwing away all the intermediate records that were used to built up the results.
Look ahead for additional posts on how to use recursive SQLto parse a field with comma separated values and how to split and reconcile overlapping time segments into a single, flat time line.