
Bug #2415 Mail_MimeDecode: No UTF-8 decoding of header
Submitted: 2004-09-30 08:31 UTC
From: urs
Assigned: alan_k
Status: Closed
Package: Mail_mimeDecode
PHP Version: 4.3.8
OS: Windows/Linux
Roadmaps: 1.6.0

 [2004-09-30 08:31 UTC] urs
Description:
------------
Migrating UTF-8

There is currently a problem with UTF-8 decoding of the headers in Mail_MimeDecode. Possibly there is a PHP-native alternative to "mb_decode_mimeheader" that would make the fix independent of the mbstring functions.

Reproduce code:
---------------
You can work around it by applying the following patch:

<?php
//
//
// The following three lines are to be added
if (stristr($text, 'utf-8')) {
    $text = mb_decode_mimeheader($text);
}
// About line 541
$input = str_replace($encoded, $text, $input);
?>

Expected result:
----------------
Source:
$utf8_str = '2. OOS Tagung =?UTF-8?B?ZsO8ciBkaWUgw5ZmZmVudGxpY2hlIFY=';

Expected Result:
2. OOS Tagung für die Öffentliche V

Actual result:
--------------
2. OOS Tagung für die Ã–ffentliche V
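The difference between the two results can be reproduced by hand. The following is a minimal sketch, not the Mail_mimeDecode code: it assumes the sample header is terminated with "?=" (the string in the report lacks it), that the iconv extension is available, and that the garbled form comes from reading the UTF-8 bytes as Windows-1252. The mb_decode_mimeheader call is only attempted when mbstring is loaded.

```php
<?php
// Sample from the report; the closing "?=" is an assumption added here.
$header = '2. OOS Tagung =?UTF-8?B?ZsO8ciBkaWUgw5ZmZmVudGxpY2hlIFY=?=';

// Pull out the Base64 payload of the encoded word and decode it by hand.
preg_match('/=\?UTF-8\?B\?([^?]*)\?=/i', $header, $m);
$bytes = base64_decode($m[1]);

// Treated as UTF-8, the bytes are already the readable text.
$expected = '2. OOS Tagung ' . $bytes;

// Misread as Windows-1252 and re-encoded, each multi-byte character
// splits in two, which is the garbled form of the subject.
$garbled = '2. OOS Tagung ' . iconv('Windows-1252', 'UTF-8', $bytes);

echo $expected, "\n", $garbled, "\n";

// With mbstring loaded, the function named in the patch can be tried directly.
if (function_exists('mb_decode_mimeheader')) {
    mb_internal_encoding('UTF-8');
    echo mb_decode_mimeheader($header), "\n";
}
```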


 [2005-02-10 09:09 UTC] jazfresh at hotmail dot com
There is an additional complication. Because there is no way to specify the character set that the header should be decoded to, it's possible to send a header that includes strings in multiple encodings and have each of them decoded to its native form, with no way to determine where the boundaries are in the result. e.g.

"=?sjis?q?<some Japanese text>?= =?euc-kr?q?<some Korean text>?="

This header uses two encodings, but the resulting decoded string with the current function will be a mish-mash of Shift-JIS and EUC-KR. It would be better to specify a single encoding that these values should be converted to (e.g. UTF-8).

There are two possible ways to solve this issue:

1) Add a "decode_to" setting to the input parameters that specifies the encoding that will be decoded to. By default, this would be UTF-8. Internally, the following changes are required on around line 508 (this assumes the "decode_to" parameter is passed to the function as "$decode_to"):

$text = iconv($decode_to, "$charset//TRANSLIT", $text);
$input = str_replace($encoded, $text, $input);

This method requires that iconv is installed, which may not be that common.

2) The alternative method is to use mb_decode_mimeheader (which requires that mbstring is installed). A "decode_to" parameter does not need to be supplied because mb_decode_mimeheader uses a (user-adjustable) internal setting by default for the destination encoding.

But mb_decode_mimeheader has problems of its own. There is a bug where the hexadecimal characters are only interpreted if they are in lower case (RFC 2047 recommends that they should be in upper case). Thus, a correct replacement function would replace the switch statement on line 495 with:

if (strtolower($encoding) == 'q') {
    // convert all uppercase hex chars to lower case values to work around the mb_decode_mimeheader bug.
    preg_match_all('/=([A-F0-9]{2})/i', $text, $matches);
    foreach ($matches[1] as $value)
        $text = str_replace('='.$value, strtolower('='.$value), $text);
    // reconstruct the encoded header
    $encoded = "=?{$charset}?{$encoding}?{$text}?=";
}
$text = mb_decode_mimeheader($encoded);
$input = str_replace($encoded, $text, $input);
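The hex lower-casing step can also be done in a single pass. This is a sketch only: lowercase_q_hex is a hypothetical helper name, and the charset, encoding and payload values are made up for illustration. It does not call mb_decode_mimeheader, so it makes no claim about how that function treats the result.

```php
<?php
// Hypothetical helper (not part of Mail_mimeDecode): lower-case the two hex
// digits of every "=XX" escape in a Q-encoded payload, leaving the rest alone.
function lowercase_q_hex($text)
{
    return preg_replace_callback('/=[0-9A-Fa-f]{2}/', function ($m) {
        return strtolower($m[0]);
    }, $text);
}

// Illustrative values standing in for the pieces of one encoded word.
$charset  = 'ISO-8859-1';
$encoding = 'Q';
$text     = 'Z=FCrich';

// Rebuild the encoded word, including the leading "=?" delimiter.
$rebuilt = "=?{$charset}?{$encoding}?" . lowercase_q_hex($text) . '?=';
echo $rebuilt, "\n"; // =?ISO-8859-1?Q?Z=fcrich?=
```

Keeping the rebuilt word in its own variable leaves the original encoded word available for the later str_replace on the header.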
 [2005-02-10 09:19 UTC] jazfresh at hotmail dot com
Woops, made a mistake:

$text = iconv($decode_to, "$charset//TRANSLIT", $text);

should read:

$text = iconv($charset, "$decode_to//TRANSLIT", $text);

of course :)
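For illustration, here is a rough standalone sketch of the "decode_to" idea with the iconv arguments in the corrected order (source charset first, target second). It is not the fix that went into the package. The function name decode_header_to, the regular expression and the sample header are assumptions made for this example; it needs the iconv extension with //TRANSLIT support, and it does not drop the whitespace between adjacent encoded words as RFC 2047 requires.

```php
<?php
// Hypothetical helper, not part of Mail_mimeDecode: decode every RFC 2047
// encoded word in $input and convert it to the single charset $decodeTo.
function decode_header_to($input, $decodeTo = 'UTF-8')
{
    $pattern = '/=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=/';
    return preg_replace_callback($pattern, function ($m) use ($decodeTo) {
        list($encoded, $charset, $encoding, $text) = $m;
        if (strtolower($encoding) === 'b') {
            $raw = base64_decode($text);
        } else {
            // In the Q encoding an underscore stands for a space.
            $raw = quoted_printable_decode(str_replace('_', ' ', $text));
        }
        // Source charset first, destination charset second.
        $converted = @iconv($charset, $decodeTo . '//TRANSLIT', $raw);
        // Leave the encoded word untouched if the conversion fails.
        return $converted === false ? $encoded : $converted;
    }, $input);
}

// Illustrative header mixing two charsets; the second word is made up.
$header = '2. OOS Tagung =?UTF-8?B?ZsO8ciBkaWUgw5ZmZmVudGxpY2hlIFY=?= '
        . '(=?ISO-8859-1?Q?Z=FCrich?=)';
echo decode_header_to($header), "\n";
```

Because every word is converted to the same target charset, the result no longer mixes encodings, which is the problem described in the comment above.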
 [2006-04-27 14:34 UTC] cipri (Cipriano Groenendal)
Moved to Mail_MimeDecode subpackage.
 [2010-09-02 16:33 UTC] alan_k (Alan Knowles)
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: alan_k
This bug has been fixed in SVN. If this was a documentation problem, the fix will appear by the end of next Sunday (CET). If this was a problem with the website, the change should be live shortly. Otherwise, the fix will appear in the package's next release. Thank you for the report and for helping us make PEAR better.