.NET Framework - Unicode/UTF-8 decoding

Asked By Bill Nguyen on 05-Jun-07 12:43 AM
Below is some text I extracted from a MySQL database. How can I decode it
so that I can read it as Unicode?



Virginia Hamilton Adair / Lâm Thị Mỹ Dạ
Lấp lánh hồn thơ Việt trên sân ga Tokyo chiều cuối năm

Mr. Arnold replied on 05-Jun-07 01:55 AM
If you have Visual Studio 2003 or 2005, you can go to Help/Index/Visual Basic
and enter "Unicode UTF-8" in the search box. It will give you a whole section,
with program examples, on how to work with UTF-8, UTF-16, and so on.

guff replied on 05-Jun-07 03:28 AM
This text looks as if it has been decoded with a different encoding than
the one that was used to encode it. It might be possible to recreate the
data if you know which encodings were used to encode and decode it. Then
you might be able to encode it back to its previous state and use the
proper encoding to decode it. There is a great risk that some data has
been lost, though, and that you can't recreate the original data from
this stage.
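If you are not sure which encoding did the wrong decode, one rough way to find out is to try a few likely candidates and keep the ones that round-trip to valid UTF-8. The thread has no code, so this is a minimal Python sketch assuming the original text was UTF-8; the helper name `find_wrong_encodings` and the candidate list are illustrative assumptions:

```python
def find_wrong_encodings(garbled, candidates=("cp1252", "latin-1")):
    """Return {candidate: repaired_text} for every candidate encoding that
    turns `garbled` back into valid UTF-8 (assumes the original was UTF-8)."""
    results = {}
    for enc in candidates:
        try:
            raw = garbled.encode(enc)           # undo the wrong decode
            results[enc] = raw.decode("utf-8")  # decode as originally intended
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this candidate cannot explain the garbled text
    return results


# Simulate the problem: UTF-8 bytes that were misread as Windows-1252.
original = "Lâm Thị Mỹ Dạ"
garbled = original.encode("utf-8").decode("cp1252")
matches = find_wrong_encodings(garbled)
print(sorted(matches))  # ['cp1252']
```

Plain ASCII text matches every candidate, and more than one candidate can match, so this narrows the search rather than proving which encoding was used.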

If you want to store Unicode strings in the MySQL database, it has to be
set up to use a Unicode character set (for example utf8).

Göran Andersson
guff replied on 05-Jun-07 11:21 AM
You are doing exactly what I was talking about. If you read the data
using the wrong encoding and then save it using the same encoding, you
can open it using the correct encoding, provided that the process
hasn't removed any data.

If you have set up your MySQL database to use Unicode and still get the
string out like that, the error happens before the string is even saved
in the database. What you have done is basically:

unicode -> bytes -> wrong encoding -> MySQL -> wrong encoding -> html -> bytes -> browser -> unicode
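The chain above can be simulated in a few lines. This is a hypothetical Python sketch that assumes the text starts out as UTF-8 and that Latin-1 is the wrong encoding used both on the way into MySQL and on the way out to the HTML page; it shows why the page can look right while the database holds garbled text:

```python
# Assumption: the "wrong encoding" in both places is Latin-1.
WRONG = "latin-1"

original = "Lấp lánh hồn thơ Việt"

# unicode -> bytes: the browser sends UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# bytes -> wrong encoding -> MySQL: the bytes are read as Latin-1 and stored.
stored = utf8_bytes.decode(WRONG)

# MySQL -> wrong encoding -> html -> bytes: the stored text is written as Latin-1.
page_bytes = stored.encode(WRONG)

# bytes -> browser -> unicode: the browser decodes the page as UTF-8.
shown = page_bytes.decode("utf-8")

print(stored == original)  # False: the database holds mis-decoded text
print(shown == original)   # True: the two wrong steps cancel out
```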

While this gives the correct result for some strings, some byte values
used in UTF-8 don't represent any character by themselves in the wrong
encoding, so if you continue to store mis-decoded strings as Unicode,
you will sooner or later end up with corrupted strings.
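To see how data gets lost, here is a Python sketch that assumes the wrong encoding is Windows-1252 as implemented by Python's `cp1252` codec, which has no mapping for the bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D. The letter "ề" in "chiều" from the sample text is E1 BB 81 in UTF-8, so its last byte cannot survive the wrong decode:

```python
# Assumption: the wrong encoding is Windows-1252 via Python's "cp1252" codec,
# which leaves bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D unmapped.
original = "chiều cuối năm"
utf8_bytes = original.encode("utf-8")  # "ề" becomes E1 BB 81

# Wrong decode: 0x81 has no character in cp1252, so it becomes U+FFFD.
garbled = utf8_bytes.decode("cp1252", errors="replace")

# Attempted repair: encode back with the wrong encoding, decode as UTF-8.
restored = garbled.encode("cp1252", errors="replace").decode("utf-8", errors="replace")

print("\ufffd" in garbled)   # True: a byte was replaced during the wrong decode
print(restored == original)  # False: the original byte 0x81 cannot be recovered
```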

Göran Andersson
Bill Nguyen replied on 05-Jun-07 07:46 PM
Göran,

I think you are correct. However, there is not much I can do, since I
cannot change the host server parameters.
I am using SQLyog to access MySQL remotely. What I need is to be able to
read the data in its correct format/encoding. Is that possible with
.NET?


guff replied on 06-Jun-07 04:37 AM
Yes, it's possible in .NET.

Strictly speaking, you can't read it using the correct encoding, as it's
not stored using the correct encoding. You can only read it the same way
it's stored; then you have to reverse the process by encoding it with
the same wrong encoding and decoding it with the correct encoding.
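The reverse process can be sketched as a small Python function (the name `repair_mojibake` is hypothetical), assuming the text was UTF-8 and the wrong encoding was Windows-1252. Returning None on failure is a design choice that lets you flag rows that can't be repaired this way instead of silently storing more damaged text:

```python
def repair_mojibake(text, wrong="cp1252", right="utf-8"):
    """Encode with the same wrong encoding to get the original bytes back,
    then decode them with the correct encoding. Returns None when the round
    trip fails, i.e. the text cannot be repaired this way."""
    try:
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


# Simulated bad row: UTF-8 bytes that were misread as Windows-1252.
garbled = "Lâm Thị Mỹ Dạ".encode("utf-8").decode("cp1252")
print(repair_mojibake(garbled) == "Lâm Thị Mỹ Dạ")  # True
print(repair_mojibake("café"))  # None: valid Latin text, not mis-decoded UTF-8
```

In .NET the same two steps would be `Encoding.GetEncoding(1252).GetBytes(text)` followed by `Encoding.UTF8.GetString(bytes)`.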

As I said earlier, this will not work for all strings, so if you want a
system that works correctly, you have to change how the data is stored
in the database.

Göran Andersson