Hi colleagues,
The following works well for me.
You can also try Python, which makes this easy to implement and test. If you use the unicode function and treat the CESU-8 encoded string as a byte stream already encoded as UTF-8, this will work fine. The problem with CESU-8 only arises for Unicode code points U+10000 and higher. For those code points you use a surrogate pair, which is described on Wikipedia (easy to find via Google). Here is the algorithm for getting the UTF-16 representation of a code point higher than U+FFFF:
v  = 0x64321
v′ = v - 0x10000
   = 0x54321
   = 0101 0100 0011 0010 0001
vh = v′ >> 10
   = 01 0101 0000 // higher 10 bits of v′
vl = v′ & 0x3FF
   = 11 0010 0001 // lower 10 bits of v′
w1 = 0xD800 + vh
   = 1101 1000 0000 0000
   +        01 0101 0000
   = 1101 1001 0101 0000
   = 0xD950 // first code unit of UTF-16 encoding
w2 = 0xDC00 + vl
   = 1101 1100 0000 0000
   +        11 0010 0001
   = 1101 1111 0010 0001
   = 0xDF21 // second code unit of UTF-16 encoding
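The arithmetic above can be sketched in Python as follows. This is only an illustration: the function name to_surrogate_pair is made up for this example and is not part of any library.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its two UTF-16 code units.

    Hypothetical helper for illustration; it mirrors the worked example above.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("surrogate pairs only exist for U+10000..U+10FFFF")
    v = code_point - 0x10000      # 20-bit value v'
    vh = v >> 10                  # higher 10 bits of v'
    vl = v & 0x3FF                # lower 10 bits of v'
    return 0xD800 + vh, 0xDC00 + vl


w1, w2 = to_surrogate_pair(0x64321)
print(hex(w1), hex(w2))  # 0xd950 0xdf21
```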
In other words, once each of the two surrogate code units (w1 and w2) is encoded as its own three-byte UTF-8 sequence, you get a byte stream that HANA understands, and you can store the information correctly using your own CESU-8-compliant codec.
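A minimal sketch of such a codec's encoding side, assuming Python 3 and plain str input without lone surrogates, might look like this. The name encode_cesu8 is hypothetical; code points below U+10000 are passed through the standard UTF-8 encoder, and higher ones are split into a surrogate pair whose two code units are each written as three bytes.

```python
def encode_cesu8(text):
    """Encode a str as CESU-8 bytes (illustrative sketch, not a full codec)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # Same bytes as ordinary UTF-8 for everything up to U+FFFF.
            out += ch.encode("utf-8")
            continue
        v = cp - 0x10000
        for unit in (0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)):
            # Three-byte UTF-8 layout: 1110xxxx 10xxxxxx 10xxxxxx
            out.append(0xE0 | (unit >> 12))
            out.append(0x80 | ((unit >> 6) & 0x3F))
            out.append(0x80 | (unit & 0x3F))
    return bytes(out)


print(encode_cesu8(chr(0x64321)).hex())  # eda590edbca1
```

For the code point from the example above, this gives six bytes (ED A5 90 for 0xD950 and ED BC A1 for 0xDF21) instead of the four bytes standard UTF-8 would produce.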
To learn more about UTF-8 encoding, you can refer to the utfcpp library (utfcpp.sourceforge.net); the algorithm above can be used to extend it for CESU-8 compatibility.
You do not need to use UTF-16 in Python; that will not work for HANA.
Regards,
Vasily Sukhanov