
Bug #17070 UTF8 charset works but some characters appear as double question marks ??
Submitted: 2010-02-06 04:44 UTC
From: micdhack Assigned:
Status: Bogus Package: Mail_Mime (version 1.6.0)
PHP Version: 5.2.4 OS: Ubuntu 8.04 LTS Server
Roadmaps: (Not assigned)    


 [2010-02-06 04:44 UTC] micdhack (Michael Tsikerdekis)
Description:
------------
I am using mime->get to encode into UTF-8 and it works fine, but some of the characters appear as double question marks. The language that I am using is Greek.

Test script:
---------------
$hdrs = array(
    "To" => $to,
    "From" => $from,
    "Subject" => "????????? ????? ???? ???????",
    "Date" => date('r')
);
$options = array(
    'head_encoding' => 'quoted-printable',
    'text_encoding' => 'quoted-printable',
    'html_encoding' => 'base64',
    'head_charset' => 'utf-8',
    'html_charset' => 'utf-8',
    'text_charset' => 'utf-8'
);
$body = $this->mime->get($options);
$hdrs = $this->mime->headers($hdrs, true);

Expected result:
----------------
When sent, the title should appear as it is.

Actual result:
--------------
The end result is ?????????? ????? ???? ???????

The ? becomes a double question mark. The same thing happens with other Greek phrases too, and it appears that the Greek letter ? is the only issue.

Comments

 [2010-02-06 16:22 UTC] alec (Aleksander Machniak)
-Status: Open +Status: Feedback
Not enough information was provided for us to be able to handle this bug. Please re-read the instructions at http://bugs.php.net/how-to-report.php

If you can provide more information, feel free to add it to this bug and change the status back to "Open".

Thank you for your interest in PEAR.

You're writing about Greek, but I see only ASCII in your request. Please provide an example with proper encoding (UTF-8). Also, Mail_mime does not convert between encodings. If you define head_charset as utf-8, the Subject you pass in must already be in that encoding.
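
To illustrate the point about passing the Subject in the declared charset, here is a minimal sketch of a check one could run before building the headers. `ensure_utf8()` is a hypothetical helper, not part of Mail_mime, and the ISO-8859-7 fallback is only an assumption about where non-UTF-8 Greek text might come from.

```php
<?php
// Hypothetical helper (not part of Mail_mime): return $text as UTF-8.
// Text that is already valid UTF-8 is returned unchanged; anything else is
// assumed to be in $fallbackCharset and converted with iconv().
function ensure_utf8($text, $fallbackCharset = 'ISO-8859-7')
{
    // preg_match() with the /u modifier succeeds only on valid UTF-8 input.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    $converted = iconv($fallbackCharset, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException('Could not convert text to UTF-8');
    }
    return $converted;
}

// Possible usage before calling $mime->headers():
// $hdrs['Subject'] = ensure_utf8($subjectFromDatabase);
```

A byte string that happens to be valid UTF-8 by accident would pass through unchanged, so this is a guard against obvious mismatches rather than real charset detection.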
 [2010-02-06 21:57 UTC] micdhack (Michael Tsikerdekis)
The example that I gave here was a made-up one. Normally I take this value from MySQL, from a utf8 field. Since the string that I receive in my email is almost fully readable except for that one letter, I decided to investigate the header information stored in the db by mail_queue, and I think I found where the problem lies.

So here are the headers from the db:

a:7:{s:25:"Content-Transfer-Encoding";s:16:"quoted-printable";s:12:"Content-Type";s:27:"text/plain; charset=utf-8";s:12:"MIME-Version";s:3:"1.0";s:2:"To";s:23:"tsikerdekis@wuwcorp.com";s:4:"From";s:29:"UrCity <webmaster@urcity.com>";s:7:"Subject";s:182:"=?utf-8?Q?=CE=A3=CE=BA=CE=BF=CF=85=CF=80=CE=AF=CE=B4=CE=B9=CE?= =?utf-8?Q?=B1_=CF=80=CE=B1=CE=B9=CE=B4=CE=B9=CE=AC_had_some_of_its_main?= =?utf-8?Q?_information_being_edited...?=";s:4:"Date";s:31:"Sat, 06 Feb 2010 18:36:33 +0200";}

So I tried to identify the letters step by step to see if there was a mistake there. For each letter there is a =XX=XX pair. So for the first word we have:

=CE=A3=CE=BA=CE=BF=CF=85=CF=80=CE=AF=CE=B4=CE=B9=CE
? ? ? ? ? ? ? ? ?

As you can see, the final letter cannot be completed because the line is split and there is an interruption. That leads to the ? being shown as ??. After moving the =B1 next to the =CE, the letter appeared normally.

So I tried to see which function creates the issue. I printed the headers after $hdrs = $this->mime->headers($hdrs,true); and the subject part of the array was this:

[Subject] => =?utf-8?Q?=CE=A3=CE=BA=CE=BF=CF=85=CF=80=CE=AF=CE=B4=CE=B9=CE?= =?utf-8?Q?=B1_=CF=80=CE=B1=CE=B9=CE=B4=CE=B9=CE=AC_had_an_update_that_w?= =?utf-8?Q?as_edited/altered...?=

So improper splitting of the text looks like the number one suspect. A line should only be split after a complete =XX=XX sequence has been written; otherwise the whole sequence should be moved to the next line.
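
The splitting rule proposed above can be sketched roughly as follows. This is only an illustration of the idea, not Mail_mime's actual implementation: `q_encode_by_character()` is a hypothetical function, the 75-character limit per encoded word is taken from RFC 2047, and the input is assumed to be valid UTF-8.

```php
<?php
// Hypothetical sketch (not Mail_mime code): Q-encode a UTF-8 header value,
// starting a new encoded word only between whole characters so that the
// =XX=XX bytes of one letter are never separated.
function q_encode_by_character($text, $charset = 'utf-8', $maxWordLen = 75)
{
    $prefix = '=?' . $charset . '?Q?';
    $suffix = '?=';
    // Room left for the payload inside a single encoded word.
    $room = $maxWordLen - strlen($prefix) - strlen($suffix);

    // Split into whole UTF-8 characters, never into raw bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $words = array();
    $current = '';
    foreach ($chars as $char) {
        // Encode every byte of this one character.
        $encoded = '';
        foreach (str_split($char) as $byte) {
            if ($byte === ' ') {
                $encoded .= '_';
            } elseif (preg_match('/^[A-Za-z0-9]$/D', $byte) === 1) {
                $encoded .= $byte;
            } else {
                $encoded .= sprintf('=%02X', ord($byte));
            }
        }
        // If the whole character does not fit, move it to the next word.
        if ($current !== '' && strlen($current) + strlen($encoded) > $room) {
            $words[] = $prefix . $current . $suffix;
            $current = '';
        }
        $current .= $encoded;
    }
    if ($current !== '') {
        $words[] = $prefix . $current . $suffix;
    }
    return implode(' ', $words);
}
```

With this approach each encoded word can be decoded on its own, at the cost of some words being a few characters shorter than the maximum.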
 [2010-02-06 21:57 UTC] micdhack (Michael Tsikerdekis)
-Status: Feedback +Status: Open
 [2010-02-06 21:59 UTC] micdhack (Michael Tsikerdekis)
Note: all the Greek letters were converted into ? here, but I believe you get the point.
 [2010-02-06 23:18 UTC] alec (Aleksander Machniak)
This text is encoded properly, RFC compliant and works for me. I assume it's your mail client issue.
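
One possible reason the same header displays correctly in one client and not in another, offered purely as an assumption about client behaviour: a reader that joins the raw bytes of adjacent encoded words before interpreting them as UTF-8 sees complete characters, while one that handles each word separately sees an incomplete sequence at the boundary. The sketch below uses the Subject value quoted earlier in this report; `q_payload_bytes()` and `is_valid_utf8()` are hypothetical helpers written for this illustration.

```php
<?php
// Hypothetical helper: decode the payload of one Q-encoded word into raw
// bytes, without interpreting them in any charset yet.
function q_payload_bytes($encodedWord)
{
    if (!preg_match('/^=\?[^?]+\?Q\?([^?]*)\?=$/i', $encodedWord, $m)) {
        throw new InvalidArgumentException('Not a Q-encoded word: ' . $encodedWord);
    }
    return quoted_printable_decode(str_replace('_', ' ', $m[1]));
}

// Hypothetical helper: true when $bytes is a well-formed UTF-8 string.
function is_valid_utf8($bytes)
{
    return preg_match('//u', $bytes) === 1;
}

// The Subject header as stored by mail_queue in this report.
$subject = '=?utf-8?Q?=CE=A3=CE=BA=CE=BF=CF=85=CF=80=CE=AF=CE=B4=CE=B9=CE?= '
         . '=?utf-8?Q?=B1_=CF=80=CE=B1=CE=B9=CE=B4=CE=B9=CE=AC_had_some_of_its_main?= '
         . '=?utf-8?Q?_information_being_edited...?=';

$parts = array_map('q_payload_bytes', explode(' ', $subject));

// Strategy A: treat every encoded word as a complete UTF-8 string.
// The first two words are not valid on their own, the third one is.
$perWordValid = array_map('is_valid_utf8', $parts);

// Strategy B: join the raw bytes first, then interpret the result as UTF-8.
// The joined string is valid and contains the full Greek text.
$joined = implode('', $parts);
$joinedValid = is_valid_utf8($joined);
```

Under strategy A the stray =CE and =B1 bytes would each have to be replaced or dropped, which would match the symptom of one letter turning into two question marks.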
 [2010-02-06 23:20 UTC] alec (Aleksander Machniak)
-Status: Open +Status: Bogus
Thank you for taking the time to write to us, but this is not a bug. Expected behaviour.
 [2010-03-08 13:12 UTC] alec (Aleksander Machniak)
-Roadmap Versions: 1.6.1 +Roadmap Versions: