Unescape HTML entities in Oracle
If you’re creating a comma-seperated list of values which get grouped, and you’re using the built-in XML functionality to do it, you’re in for an unpleasant surprise. This happened to me, although I do believe I should have foreseen this. The problem is that when you dynamically create XML, special characters in Textnodes get escaped – and rightly so ofcourse. I didn’t think of it when implementing this solution, but found out by accident.
The problem
Let me show you an example. Suppose you have two tables: “tags” with two columns “id” and “name”, and “tags_in_posts” with two columns “tgs_id” and “pts_id”. (Obviously there is a third table, but that one isn’t important for this example.) What we want to get is a comma-seperated list of categories per post. We could run the following query:
SELECT tip.pts_id POST_ID , RTRIM( XMLAGG( XMLELEMENT(E, tgs.name || ', ') ).EXTRACT('//text()') , ', ') TAGS FROM tags tgs INNER JOIN tags_in_posts tip GROUP BY tip.pts_id
Sample output:
+---------+----------------------------------+ | POST_ID | TAGS | +---------+----------------------------------+ | 1 | twitter & facebook, DB's | | 2 | boeing, b787 | | 3 | moon, tele, space | +---------+----------------------------------+
The XML functions create a nice comma-seperated list of all tags for each post, with the RTRIM() function removing the last comma. This works great, or? What? If a tag would contain a special character, like & or ‘, things get weird. Instead of “twitter & facebook, DB’s” you would get “twitter & facebook, DB's”. This is ofcourse correct behaviour, but it might be unexpected.
The fix
The fix lies in another (pretty cool) function in Oracle: DBMS_XMLGEN.CONVERT(). This function takes two arguments (subject and direction) and is able to escape or unescape special characters in a string. When put in our original query, it now looks like this:
SELECT tip.pts_id POST_ID , DBMS_XMLGEN.CONVERT( RTRIM( XMLAGG( XMLELEMENT(E, tgs.name || ', ') ).EXTRACT('//text()') , ', ') , 1) TAGS -- "1" means "UNESCAPE" FROM tags tgs INNER JOIN tags_in_posts tip GROUP BY tip.pts_id
The query now returns a result as expected:
+---------+--------------------------+ | POST_ID | TAGS | +---------+--------------------------+ | 1 | twitter & facebook, DB's | | 2 | boeing, b787 | | 3 | moon, tele, space | +---------+--------------------------+
Happy querying!

Leave a reply