International Components for Unicode (ICU) from IBM is a C/C++ library providing "robust and full-featured Unicode support on a wide variety of platforms", released under an open source (X) license. It looks like I can just pick this up and use it in my RDF/XML parser to handle Unicode character normalisation.
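
To make the normalisation step concrete, here is a rough sketch of the kind of check a parser might run on literal text. It is not ICU: it uses Python's standard `unicodedata` module as a stand-in, and it assumes the form wanted is Normalization Form C (NFC). The helper names `is_nfc` and `to_nfc` are made up for illustration.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is already in Normalization Form C (assumed target form)."""
    return unicodedata.normalize("NFC", text) == text


def to_nfc(text: str) -> str:
    """Return text converted to Normalization Form C."""
    return unicodedata.normalize("NFC", text)


# The same visible word, encoded two ways.
decomposed = "cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT
composed = "caf\u00e9"     # precomposed U+00E9

if not is_nfc(decomposed):
    # A parser could warn here, or normalise before comparing strings.
    decomposed_fixed = to_nfc(decomposed)
```

Without this step the two spellings compare as different strings even though they render identically, which is the problem the ICU normalisation routines would be solving inside the parser.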