Unescape HTML entities in Oracle

If you’re creating a comma-seperated list of values which get grouped, and you’re using the built-in XML functionality to do it, you’re in for an unpleasant surprise. This happened to me, although I do believe I should have foreseen this. The problem is that when you dynamically create XML, special characters in Textnodes get escaped – and rightly so ofcourse. I didn’t think of it when implementing this solution, but found out by accident.

The problem

Let me show you an example. Suppose you have two tables: “tags” with two columns “id” and “name”, and “tags_in_posts” with two columns “tgs_id” and “pts_id”. (Obviously there is a third table, but that one isn’t important for this example.) What we want to get is a comma-seperated list of categories per post. We could run the following query:

SELECT       tip.pts_id POST_ID
           , RTRIM(
               XMLAGG(
                 XMLELEMENT(E, tgs.name || ', ')
               ).EXTRACT('//text()')
             , ', ') TAGS
FROM       tags tgs
INNER JOIN tags_in_posts tip
GROUP BY   tip.pts_id

Sample output:

+---------+----------------------------------+
| POST_ID | TAGS                             |
+---------+----------------------------------+
|       1 | twitter & facebook, DB's |
|       2 | boeing, b787                     |
|       3 | moon, tele, space                |
+---------+----------------------------------+

The XML functions create a nice comma-seperated list of all tags for each post, with the RTRIM() function removing the last comma. This works great, or? What? If a tag would contain a special character, like & or ‘, things get weird. Instead of “twitter & facebook, DB’s” you would get “twitter & facebook, DB's”. This is ofcourse correct behaviour, but it might be unexpected.

The fix

The fix lies in another (pretty cool) function in Oracle: DBMS_XMLGEN.CONVERT(). This function takes two arguments (subject and direction) and is able to escape or unescape special characters in a string. When put in our original query, it now looks like this:

SELECT       tip.pts_id POST_ID
           , DBMS_XMLGEN.CONVERT(
               RTRIM(
                 XMLAGG(
                   XMLELEMENT(E, tgs.name || ', ')
                 ).EXTRACT('//text()')
               , ', ')
             , 1) TAGS -- "1" means "UNESCAPE"
FROM       tags tgs
INNER JOIN tags_in_posts tip
GROUP BY   tip.pts_id

The query now returns a result as expected:

+---------+--------------------------+
| POST_ID | TAGS                     |
+---------+--------------------------+
|       1 | twitter & facebook, DB's |
|       2 | boeing, b787             |
|       3 | moon, tele, space        |
+---------+--------------------------+

Happy querying!

2 comments to “Unescape HTML entities in Oracle”

Hi there, I do believe your web site might be having web browser compatibility problems.
When I look at your website in Safari, it looks fine however, if opening in
IE, it’s got some overlapping issues. I just wanted to give you a quick heads up! Other than that, fantastic blog!

Excellent tip!

Thank you
Flavio

Leave a reply