However, you may be out of luck: after conversion, some characters can come out garbled instead of what you wanted to see. I think this is because bytes 0x80 to 0x9F are mapped to U+0080 to U+009F, which are invisible control characters in both ISO-8859-1 and Unicode (see an ISO-8859-1 character table). Windows-1252, by contrast, uses some of those same bytes for printable characters such as ™ (0x99).
Because I only care about regular characters, such as \x20-\x7F, \xA9 (©), \xAE (®) and \x99 (™), I strip all other characters before applying the encoding function.
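To see why bytes in the 0x80 to 0x9F range cause trouble, the short Perl sketch below decodes the same byte 0x99 once as ISO-8859-1 and once as Windows-1252 (cp1252), using the core Encode module, and prints the resulting code point and its UTF-8 bytes. It assumes, which the post does not state, that the source text is really Windows-1252; that would explain why \x99 is expected to be ™.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The byte 0x99 as it might appear in a Windows-1252 file.
my $byte = "\x99";

# Read as ISO-8859-1, it becomes U+0099, an invisible C1 control character.
my $latin1 = decode('iso-8859-1', $byte);

# Read as Windows-1252, it becomes U+2122, the trade mark sign.
my $cp1252 = decode('cp1252', $byte);

# Show each code point and the UTF-8 bytes it encodes to.
printf "iso-8859-1: U+%04X -> UTF-8 bytes %s\n",
    ord($latin1), unpack('H*', encode('UTF-8', $latin1));
printf "cp1252:     U+%04X -> UTF-8 bytes %s\n",
    ord($cp1252), unpack('H*', encode('UTF-8', $cp1252));
```

It should print U+0099 with bytes c299 for the ISO-8859-1 reading and U+2122 with bytes e284a2 for the cp1252 reading; only the second is a visible ™.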
In Perl
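use Encode qw(encode);
$content =~ s/[^\x20-\x7F\xA9\xAE\x99]+//g;   # keep only the whitelisted characters
$content = encode('UTF-8', $content);         # turn each remaining character into UTF-8 bytes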
In PHP
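$content = preg_replace('/[^\x20-\x7F\xA9\xAE\x99\n]+/', '', $content);   // keep whitelisted bytes and newlines
$content = utf8_encode($content);   // deprecated since PHP 8.2; mb_convert_encoding($content, 'UTF-8', 'ISO-8859-1') is equivalent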
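UPDATE: Actually, I found that in Perl, encode cannot correctly convert \x99 to ™. It treats each character as a Unicode code point, so \x99 becomes U+0099 (an invisible control character, bytes C2 99) instead of U+2122 (™, bytes E2 84 A2); 0x99 means ™ only in Windows-1252. My final solution is the following:
use utf8;   # the ™ literal below is written in UTF-8 in this source file
open(my $fh, '>:encoding(UTF-8)', $your_file) or die "couldn't write to $your_file: $!\n";
$title =~ s/[^\x20-\x7F\xA9\xAE\x99]+//g;   # keep printable ASCII plus ©, ® and the ™ byte
$title =~ s/\x99/™/g;                        # map the Windows-1252 ™ byte to U+2122
# \xA9 (©) and \xAE (®) are already the right code points; the output layer encodes them
print $fh $title;
close($fh);
Note:
- You should edit your script in UTF-8, since use utf8 tells Perl the source is UTF-8. For example, in PuTTY you can set the character set to UTF-8 at Configuration > Window > Translation.
- UTF-8 is different from utf8: in Perl's Encode, "UTF-8" is the strict encoding and "utf8" is Perl's looser variant. Also, ":UTF-8" is not a valid I/O layer name, so write the layer as :encoding(UTF-8), either in the three-argument open above or as binmode(FILE, ":encoding(UTF-8)").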
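If the input really is Windows-1252 (an assumption; the post never names the source encoding), a possibly simpler route is to decode the whole string as cp1252 with the core Encode module, so 0x99, 0xAE and 0xA9 all become the right characters at once, and then let an :encoding(UTF-8) output layer produce the bytes. The helper name clean_cp1252 and the sample title are hypothetical, and a File::Temp file stands in for $your_file.

```perl
use strict;
use warnings;
use Encode qw(decode);
use File::Temp;

# Hypothetical helper (not from the post): decode Windows-1252 bytes and keep
# only printable ASCII, newlines, and the (c), (R) and (TM) signs.
sub clean_cp1252 {
    my ($bytes) = @_;
    # 0x99 -> U+2122, 0xAE -> U+00AE, 0xA9 -> U+00A9. Bytes with no useful
    # cp1252 meaning end up outside the whitelist and are stripped below.
    my $text = decode('cp1252', $bytes);
    $text =~ s/[^\x20-\x7E\n\x{A9}\x{AE}\x{2122}]+//g;
    return $text;
}

# Sample input: "Acme", the TM byte, " Widget", the (R) byte, a space, the euro byte.
my $title = clean_cp1252("Acme\x99 Widget\xAE \x80");

# The temp file is removed automatically when $tmp goes out of scope.
my $tmp = File::Temp->new;
binmode($tmp, ':encoding(UTF-8)') or die "binmode failed: $!\n";
print {$tmp} $title;
close($tmp) or die "close failed: $!\n";
```

The design choice is to filter characters rather than bytes: once the text is decoded, the whitelist can name ™ by its code point U+2122, and no per-character substitutions or UTF-8 literals in the script are needed.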