Discussion:
How do I convert from iso-8859-1 to utf-8 (bom)?
Du Dang
2004-08-04 18:11:21 UTC
I tried to convert a block of text from iso-8859-1 to utf-8 but all I got
after the conversion is gibberish.

===============================

FileStream fs = File.Open("text.txt", FileMode.Open, FileAccess.Read);
byte[] b = new byte[length];
fs.Read(b, 0, length);

b = Encoding.Convert(Encoding.GetEncoding(28591), Encoding.UTF8, b);
return System.Text.Encoding.UTF8.GetString(b);

===============================

When I skipped the conversion line ( b = Encoding.Convert ....) the text is
legible but still in iso-8859-9 encoding.

Does anyone know what I'm doing wrong, or know a better way of doing this?

Thanks,

Du
Jon Skeet [C# MVP]
2004-08-04 18:20:31 UTC
Post by Du Dang
I tried to convert a block of text from iso-8859-1 to utf-8 but all I got
after the conversion is gibberish.
===============================
FileStream fs = File.Open("text.txt", FileMode.Open, FileAccess.Read);
byte[] b = new byte[length];
fs.Read(b, 0, length);
For a start, you should always use the return value of Stream.Read.
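To illustrate the Stream.Read point: a single call may return fewer bytes than requested, so a one-shot read can leave the tail of the buffer as zeros. A minimal sketch of a read loop that trusts only the returned count, assuming nothing beyond System.IO; `StreamUtil` and `ReadFully` are hypothetical names, not from the thread:

```csharp
using System;
using System.IO;

static class StreamUtil
{
    // Hypothetical helper: reads a stream to the end, relying only on the
    // byte count that Stream.Read actually reports for each call.
    public static byte[] ReadFully(Stream stream)
    {
        using (var buffer = new MemoryStream())
        {
            byte[] chunk = new byte[8192];
            int read;
            // Read may return fewer bytes than asked for; 0 means end of stream.
            while ((read = stream.Read(chunk, 0, chunk.Length)) > 0)
            {
                buffer.Write(chunk, 0, read);
            }
            return buffer.ToArray();
        }
    }
}
```

When the whole file is wanted anyway, File.ReadAllBytes(path) takes care of this for you.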
Post by Du Dang
b = Encoding.Convert(Encoding.GetEncoding(28591), Encoding.UTF8, b);
return System.Text.Encoding.UTF8.GetString(b);
===============================
When I skipped the conversion line ( b = Encoding.Convert ....) the text is
legible but still in iso-8859-9 encoding.
Does anyone know what I'm doing wrong, or know a better way of doing this?
Well, to start with, why are you bothering with UTF-8 at all if you're
returning a string? Just decode it from ISO-8859-1. Converting it to
UTF-8 and back won't have any effect.
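A sketch of the point above, assuming the input really is ISO-8859-1 (code page 28591, as in the original post). It decodes directly and shows that the UTF-8 detour yields the same string. Because the thread title asks for UTF-8 with a BOM, it also includes a file-to-file conversion. The names `Latin1Text`, `DecodeViaUtf8` and `ConvertFileToUtf8WithBom` are illustrative only:

```csharp
using System;
using System.IO;
using System.Text;

static class Latin1Text
{
    // ISO-8859-1 is code page 28591, the same value used in the original post.
    static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    // Decode the raw bytes straight into a .NET string; no UTF-8 step is needed.
    public static string Decode(byte[] bytes)
    {
        return Latin1.GetString(bytes);
    }

    // The detour from the original code: Latin-1 bytes -> UTF-8 bytes -> string.
    // It produces the same string as Decode, only with extra work.
    public static string DecodeViaUtf8(byte[] bytes)
    {
        byte[] utf8 = Encoding.Convert(Latin1, Encoding.UTF8, bytes);
        return Encoding.UTF8.GetString(utf8);
    }

    // Hypothetical helper for the thread title's goal: re-encode an ISO-8859-1
    // file as UTF-8 with a byte order mark (EF BB BF) at the start.
    public static void ConvertFileToUtf8WithBom(string inputPath, string outputPath)
    {
        string text = Decode(File.ReadAllBytes(inputPath));
        // new UTF8Encoding(true) makes the writer emit the UTF-8 BOM.
        File.WriteAllText(outputPath, text, new UTF8Encoding(true));
    }
}
```

The file helper reads raw bytes rather than calling File.ReadAllText(path, encoding), because ReadAllText honours a byte order mark if one is present and could silently switch decoders.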

Is your file *definitely* in ISO-8859-1, and not in, say,
Encoding.Default? Which characters are being mis-converted? Could you
email me a sample file?
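Jon's question matters because two files can look alike and still differ in encoding. Under .NET Framework on a Western-European Windows system, Encoding.Default is usually Windows-1252, which puts printable characters such as curly quotes and the euro sign in bytes 0x80-0x9F, where ISO-8859-1 has only C1 control characters. A file might also already be UTF-8. On .NET Core and later, Encoding.Default is always UTF-8. A rough diagnostic sketch, assuming the bytes are already in memory; `EncodingCheck`, `IsValidUtf8` and `HasC1Range` are hypothetical names, and both checks are heuristics rather than reliable detection:

```csharp
using System;
using System.Text;

static class EncodingCheck
{
    // Strict UTF-8 decoder: throws on invalid byte sequences instead of
    // substituting U+FFFD.
    static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Heuristic: true if the bytes form valid UTF-8. Pure ASCII also passes,
    // since ASCII is identical in UTF-8, ISO-8859-1 and Windows-1252.
    public static bool IsValidUtf8(byte[] bytes)
    {
        try
        {
            StrictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    // Heuristic: true if any byte falls in 0x80-0x9F. ISO-8859-1 maps these to
    // rarely used control characters, so their presence hints that the file
    // may be Windows-1252 rather than true ISO-8859-1.
    public static bool HasC1Range(byte[] bytes)
    {
        foreach (byte b in bytes)
        {
            if (b >= 0x80 && b <= 0x9F) return true;
        }
        return false;
    }
}
```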
--
Jon Skeet - <***@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too
Du Dang
2004-08-04 19:38:16 UTC
Thanks Jon,

please check your email.