*This blog post is derived from an article that I wrote as an article for Teradata Magazine about fun uses for recursive SQL.*

Show of hands: how many of you have used recursive SQL in the past six months? For something useful? Using ANSI syntax rather than Oracle's CONNECT BY? If you said "yes" to all three, then congratulations! You've won a prize: 15 minutes of your day back. You can jump down to the bottom of this post and just review my fun example. This first part is just an introduction to recursive SQL.

**Introduction to Recursive SQL**

First let's just review the definition for "recursion." Handy-dandy dictionary here.... recourse... rectify.... recursion:

Not very helpful. How about this instead. A recursion is a method of solving problems in which the algorithm or function applies itself in the execution of the solution. For instance, if I were to define a recursive algorithm for summarizing a list of numbers, I could say something like "the sum of a list is determined by adding the first number in the list to the sum of all the remaining numbers in the list; if the list is empty, then the sum is zero." You notice how the definition of "sum of a list" includes the use of the "sum" operation, as in "sum of all the remaining numbers." You'll also note that there's an explicit terminal condition to define what happens when you have nothing in the list.See: recursion

So, recursive SQL works the same way. A recursive SQL statement is one in which the determination of the result set requires the execution of SQL on the result set itself. In recursive SQL, there has to be a seed statement, as well as the recursive call, and a termination condition.

WITH RECURSIVE [temp table] [column list] AS ( [seed statement] UNION ALL [recursive statement] )

**A Simple Example**

For the sake of comfort, a simple example might involve an employee hierarchy table where each row had the employee ID and associated manager ID. A standard query could retrieve a list of direct reports for a given employee ID: SELECT * FROM employee WHERE mgr_id = 123. A recursive query, however, could return the list of all employees anywhere under the hierarchy below that same manager. An example that might be:

WITH RECURSIVE emp_hier (emp_id, mgr_id, level) AS ( SELECT a.emp_id, a.mgr_id, 0 FROM employee a WHERE a.emp_id = 123 UNION ALL SELECT b.emp_id, b.mgr_id, c.level+1 FROM employee b, emp_hier c WHERE b.mgr_id = c.emp_id ) SELECT e.emp_title, e.emp_id, e.mgr_id, h.level FROM employee e, emp_hier h WHERE e.emp_id = h.emp_id AND e.emp_id <> 123;

In this query, the

On to some more exciting examples...

**Example 1: String Concatenation**

Ever notice that there are no string aggregation functions in SQL? Something like a SUM() for character fields that would take two parameters: the field or expression from each row and a delimiter. Usually, programmers end up having to write code outside of the database to do that kind of string manipulation. Recursive SQL can achieve that kind of character aggregation quite nicely if used appropriately.

Consider a list of email addresses for several members of the same family. The table has a FAMILY_ID, PERSON_ID, SEQuence number, and EMAIL_ADDR. The desired output is a single row that contains a comma-separated list of all the email addresses for the entire family (for a mailing list perhaps).

Let's walk through the SQL piece by piece. First, the seed statement starts the result set off by picking just the first person from each family (that is, the people with a SEQ=1).WITH RECURSIVE email_list (coverage_id, emails, seq) AS ( SELECT m.family_id, m.email_addr, m.seq FROM members m WHERE m.seq = 1 UNION ALL SELECT e.emails ||’, ’|| m.email_addr, m.seq FROM email_list e, members m WHERE m.seq = e.seq + 1 AND m.family_id = e.family_id ) SELECT r.family_id, r.emails FROM email_list r WHERE r.seq = (SELECT MAX(h.seq) FROM email_list h WHERE h.family_id = r.family_id GROUP BY h.family_id) ;

*That sequence number is admittedly a bit artifical and contrived to make the recursive SQL easier to write. If you needed to, you could generate the SEQ on the fly using a RANK() of people within each family and then leverage the RANK() as a sequence number. Hopefully the example isn't lost because of that bit of confusion.*

The recursive portion of the SQL simply takes what was seeded in through the first query (i.e. people with SEQ=1) and joins that to all the records in the EMAIL_ADDR table with a SEQ = SEQ + 1 (from result set). The EMAILS output field is created by concatenating the email address from the seed to record with the next highest sequence number.

Then that result set is taken and the same thing is done. And so on. And so forth.

The query terminates when the result set contains no records. That is, when we exhaust all the SEQ numbers in the table.

The outer query refines the results down to exactly what we're looking for by keeping the final record and throwing away all the intermediate records that were used to built up the results.

**Next Time**

Look ahead for additional posts on how to use recursive SQLto parse a field with comma separated values and how to split and reconcile overlapping time segments into a single, flat time line.

This has been a really valuable post! Do you have any examples of the string concatenation where I don't have a convenient sequential reference (i.e. where you suggest that Rank() be used)?

ReplyDeleteEach time I've tried to create this using a window function, I get an "Illegal or unsupported use of subquery/derived table inside a recursive query/view" error.

Nice & Easy Keep Doing Good Job

ReplyDeletealmost same, but with test data

ReplyDeletehttp://thesimpleprogrammer.blogspot.com/2013/02/concat-rows-as-single-column-in-teradata.html

Hi Paul, thanks for your article. Recursive SQL can be tricky to understand, so I appreciate you taking the time to explain it step-by-step to us.

ReplyDeleteHi Paul,

ReplyDeleteYour explanation is great which made complicated logic to simple. thanks. Keep doing good job.

Thank you very much for this article, it is so rare to see nowadays written as fervently article. I enjoyed reading it and I learned a lot of things. I will go and continue reading your blog =). Good luck for the future and another one for the quality of it.You can also check out this (http://www.sqiar.com).

ReplyDeleteThank you for taking the time to put this together. It's very helpful in understanding SQL recursion.

ReplyDeleteThis comment has been removed by the author.

ReplyDeleteHey, two questions:

ReplyDelete1. in your email concatenation example: for the recursive select portion, do you think it lacks one column (right now only two columns)?

2. As you know, the email column in the result would be a very long one? How can you make sure it won't be truncated given the very first record of "emails" is only for one email...I ran something similar on Teradata 15.00.0305...my results from recursive selections were all truncated by the seed selection.

very good post, thanks for sharing

ReplyDeletevery nice good we share it useful benefits posted.

ReplyDeleteSelenium Training Institutes in Chennai

Selenium Training in Chennai

ReplyDeleteThanks for sharing this information and keep updating us. This is informatics and really useful to me.

Selenium Training in Chennai | Selenium Training | Selenium Course in Chennai

Unfortunately the question of Nicholas concerning the rank() was not answered.

ReplyDeleteI have exactly the same problem: Creating a tree with specific order by can only be done with a rank(), resulting in eg an order by field: 1, 1.1, 1.2, 1.3, 1.3.1, 1.3.2, ..., 2, 2.1, etc.