Monday, October 3, 2011

Handling special characters with FOR XML PATH('')

Since SQL Server 2005, we’ve been able to use FOR XML PATH('') to do string concatenation. I’ve blogged about it before several times. But I don’t think I’ve blogged about the fact that it all goes a bit wrong if you have special characters in the strings you’re concatenating.
Generally, I don’t even worry about this. I should, but I don’t, particularly when the solution is so easy.
Suppose I want to concatenate the list of user databases on my system, in alphabetical order. Easy: Note the lack of
select
  stuff(
     (select ', ' + name -- Note the lack of column name
     from sys.databases
     where database_id > 4
     order by name
     for xml path('')
     )
   , 1, 2, '') as namelist;
This is easy, and it just works. The STUFF command lets me strip the leading comma and space, and my list of database names is done.
But if I wanted to list the names of database with triangular brackets around them… then that’s bit more complicated.
select
  stuff(
     (select ', <' + name + '>'
     from sys.databases
     where database_id > 4
     order by name
     for xml path('')
     )
   , 1, 2, '') as namelist;
It still runs, but I my results don’t show the triangular brackets, it shows <databasename>, <databasename2>. It’s not what I want to see. I effectively need to HTMLDecode the results – but T-SQL doesn’t have a function like that.
However, if I actually make my FOR XML call return actual well-formed XML, then I can extract the contents out, and it returns the data to me in its original (correct) form.
select
  stuff(
     (select ', <' + name + '>'
     from sys.databases
     where database_id > 4
     order by name
     for xml path(''), root('MyString'), type      ).value('/MyString[1]','varchar(max)')
   , 1, 2, '') as namelist;
 OR

select
 stuff(
    (select ', <' + name + '>'
    from sys.databases
    where database_id > 4
    order by name
    for xml path(''), type
    ).value('(./text())[1]','varchar(max)')
  , 1, 2, '') as namelist;

To be valid XML, it needs a ROOT element. I also need to tell it to be XML, using the TYPE directive.
But this lets me hook into the value of /MyString[1], and return that as varchar(max).
And it works, my data comes back as , , etc.
It’s a habit I need to use more often.

Edit: I can also skip the ROOT element (but keep TYPE) and use .value('(./text())[1]','varchar(max)') - but for some reason it's always felt strange to me to be returning something which I declare to be XML (using TYPE) without it being well-formed.

Source : http://sqlblog.com/blogs/rob_farley/archive/2010/04/15/handling-special-characters-with-for-xml-path.aspx