
Bug #10306 Strings with Double Quotes get encoded wrongly
Submitted: 2007-03-08 13:27 UTC Modified: 2007-05-05 10:06 UTC
From: ota Assigned: cipri
Status: Closed Package: Mail_Mime (version 1.4.0a1)
PHP Version: 5.1.4 OS:
Roadmaps: 1.4.0, 1.4.0a3, 1.4.0a2    


 [2007-03-08 13:27 UTC] ota (Ota Mares)
Description:
------------
Reopening of Bug Report #10298. The bug tracker always tells me that my password is wrong, so I am unable to reopen the old bug report. The problem still exists, and this time I have a fix at hand.

Here is an example of the problem. The header values get split at whitespace characters, so you get several chunks. That means you get chunks like '"umlautÄ' or 'Äumlaut"' (without the single quotes) if the original string is '"umlautÄ something"' or '"something Äumlaut"'. The encoding function takes the single chunks and encodes them to =?iso-8859-1?Q?"umlaut=C4?= or =?iso-8859-1?Q?=C4umlaut"?=. This is not correct, because the opening or closing quote can no longer be found: the double quote ends up enclosed by the encoding prefix and suffix in the final encoded string ('=?iso-8859-1?Q?"umlaut=C4?="' or '"=?iso-8859-1?Q?=C4umlaut"?=').

The problem only appears if you have a string that:
1. consists of more than one word, separated by whitespace
2. is surrounded by double quotes
3. includes characters that need to be encoded.

I fixed the problem by checking whether there is a double quote at the beginning or end of the chunk and, if so, removing it and adding it back after the word has been encoded. Just look for the comment 'Fix for Bug #10298, Ota Mares <om@viazenetti.de>'. I also changed the if structure a bit.

/**
 * Encodes a header as per RFC2047
 *
 * @param  array $input  The header data to encode
 * @param  array $params Extra build parameters
 * @return array Encoded data
 * @access private
 */
function _encodeHeaders($input, $params = array())
{
    $build_params = $this->_build_params;
    while (list($key, $value) = each($params)) {
        $build_params[$key] = $value;
    }

    foreach ($input as $hdr_name => $hdr_value) {
        $hdr_vals = preg_split("|(\s)|", $hdr_value, -1, PREG_SPLIT_DELIM_CAPTURE);
        $hdr_value_out = "";
        $previous = "";
        foreach ($hdr_vals as $hdr_val){
            if (!trim($hdr_val)){
                //whitespace needs to be handled with another string, or it
                //won't show between encoded strings. Prepend this to the next item.
                $previous .= $hdr_val;
                continue;
            }else{
                $hdr_val = $previous . $hdr_val;
                $previous = "";
            }
            if (preg_match('#[\x80-\xFF]{1}#', $hdr_val)) {
                //Fix for Bug #10298, Ota Mares <om@viazenetti.de>
                //Check if there is a double quote at the beginning or end of the string, to
                //prevent an opening or closing quote from being ignored because it is
                //enclosed by an encoding prefix or suffix.
                //Remove the double quote and set the matching prefix or suffix variable,
                //so that later we can concatenate the encoded string and the double quotes
                //back together to get the intended string.
                $quotePrefix = $quoteSuffix = '';
                if ($hdr_val{0} == '"') {
                    $hdr_val = substr($hdr_val, 1);
                    $quotePrefix = '"';
                }
                if ($hdr_val{strlen($hdr_val)-1} == '"') {
                    $hdr_val = substr($hdr_val, 0, -1);
                    $quoteSuffix = '"';
                }

                if (function_exists('iconv_mime_encode')){
                    $imePrefs = array();
                    if ($build_params['head_encoding'] == 'base64'){
                        $imePrefs['scheme'] = 'B';
                    }else{
                        $imePrefs['scheme'] = 'Q';
                    }
                    $imePrefs['input-charset'] = $build_params['head_charset'];
                    $imePrefs['output-charset'] = $build_params['head_charset'];
                    $hdr_val = iconv_mime_encode($hdr_name, $hdr_val, $imePrefs);
                    $hdr_val = preg_replace("#^{$hdr_name}\:\ #", "", $hdr_val);
                }else{
                    //This header contains non ASCII chars and should be encoded.
                    switch ($build_params['head_encoding']) {
                    case 'base64':
                        //Base64 encoding has been selected.
                        //Generate the header using the specified params and dynamically
                        //determine the maximum length of such strings.
                        //75 is the value specified in the RFC. The first -2 is there so
                        //the later regexp doesn't break any of the translated chars.
                        //The -2 on the first line-regexp is to compensate for the ": "
                        //between the header-name and the header value
                        $prefix = '=?' . $build_params['head_charset'] . '?B?';
                        $suffix = '?=';
                        $maxLength = 75 - strlen($prefix . $suffix) - 2;
                        $maxLength1stLine = $maxLength - strlen($hdr_name) - 2;

                        //Base64 encode the entire string
                        $hdr_val = base64_encode($hdr_val);

                        //This regexp will break base64-encoded text at every
                        //$maxLength but will not break any encoded letters.
                        $reg1st = "|.{0,$maxLength1stLine}[^\=][^\=]|";
                        $reg2nd = "|.{0,$maxLength}[^\=][^\=]|";
                        break;
                    case 'quoted-printable':
                    default:
                        //quoted-printable encoding has been selected
                        //Generate the header using the specified params and dynamically
                        //determine the maximum length of such strings.
                        //75 is the value specified in the RFC. The -2 is there so
                        //the later regexp doesn't break any of the translated chars.
                        //The -2 on the first line-regexp is to compensate for the ": "
                        //between the header-name and the header value
                        $prefix = '=?' . $build_params['head_charset'] . '?Q?';
                        $suffix = '?=';
                        $maxLength = 75 - strlen($prefix . $suffix) - 2;
                        $maxLength1stLine = $maxLength - strlen($hdr_name) - 2;

                        //Replace all special characters used by the encoder.
                        $search  = array("=", "_", "?", " ");
                        $replace = array("=3D", "=5F", "=3F", "_");
                        $hdr_val = str_replace($search, $replace, $hdr_val);

                        //Replace all extended characters (\x80-\xFF) with their
                        //hexadecimal byte values.
                        $hdr_val = preg_replace(
                            '#([\x80-\xFF])#e',
                            '"=" . strtoupper(dechex(ord("\1")))',
                            $hdr_val
                        );

                        //This regexp will break QP-encoded text at every $maxLength
                        //but will not break any encoded letters.
                        $reg1st = "|(.{0,$maxLength1stLine})[^\=]|";
                        $reg2nd = "|(.{0,$maxLength})[^\=]|";
                        break;
                    }

                    //Begin with the regexp for the first line.
                    $reg = $reg1st;
                    //Prevent lines that are just way too short;
                    if ($maxLength1stLine >1){
                        $reg = $reg2nd;
                    }
                    $output = "";
                    while ($hdr_val) {
                        //Split translated string at every $maxLength
                        //But make sure not to break any translated chars.
                        $found = preg_match($reg, $hdr_val, $matches);

                        //After the first line, we need to use the
                        //regexp for the following lines.
                        $reg = $reg2nd;

                        //Save the found part and encapsulate it in the
                        //prefix & suffix. Then remove the part from the
                        //$hdr_val variable.
                        if ($found){
                            $part = $matches[0];
                            $hdr_val = substr($hdr_val, strlen($matches[0]));
                        }else{
                            $part = $hdr_val;
                            $hdr_val = "";
                        }

                        //RFC 2047 specifies that any split header should be separated
                        //by a CRLF SPACE.
                        if ($output){
                            $output .= "\r\n ";
                        }
                        $output .= $prefix . $part . $suffix;
                    }
                    $hdr_val = $output;
                }

                //Fix for Bug #10298, Ota Mares <om@viazenetti.de>
                //Concatenate the double quotes, if present, and the encoded string
                $hdr_val = $quotePrefix.$hdr_val.$quoteSuffix;
            }
            $hdr_value_out .= $hdr_val;
        }
        $input[$hdr_name] = $hdr_value_out;
    }
    return $input;
}
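
The quote handling in the patch above can be tried in isolation. The following is a minimal sketch, not Mail_Mime code: `qEncodeWord()` and `encodeChunkKeepingQuotes()` are hypothetical names, the Q-encoder is deliberately simplified (one encoded-word, no line folding), and the input is assumed to be a single ISO-8859-1 chunk as produced by the whitespace split.

```php
<?php
// Hypothetical helper (not part of Mail_Mime): Q-encode one chunk as a
// single RFC 2047 encoded-word. Simplified: no length limit, no folding.
function qEncodeWord(string $chunk, string $charset = 'iso-8859-1'): string
{
    // Encode every byte that is not a letter, digit, or one of ! * + / -
    $encoded = preg_replace_callback(
        '/[^A-Za-z0-9!*+\/-]/',
        function (array $m): string {
            return $m[0] === ' ' ? '_' : sprintf('=%02X', ord($m[0]));
        },
        $chunk
    );
    return '=?' . $charset . '?Q?' . $encoded . '?=';
}

// Hypothetical wrapper illustrating the approach from the report:
// peel a leading or trailing double quote off the chunk, encode the
// remainder, then put the quotes back outside the encoded-word.
function encodeChunkKeepingQuotes(string $chunk, string $charset = 'iso-8859-1'): string
{
    if (!preg_match('/[\x80-\xFF]/', $chunk)) {
        return $chunk; // pure ASCII chunk, nothing to encode
    }
    $quotePrefix = $quoteSuffix = '';
    if (substr($chunk, 0, 1) === '"') {
        $chunk = substr($chunk, 1);
        $quotePrefix = '"';
    }
    if (substr($chunk, -1) === '"') {
        $chunk = substr($chunk, 0, -1);
        $quoteSuffix = '"';
    }
    return $quotePrefix . qEncodeWord($chunk, $charset) . $quoteSuffix;
}

// "\xC4" is the ISO-8859-1 byte for the umlaut used in the report.
echo encodeChunkKeepingQuotes("\"umlaut\xC4"), "\n"; // "=?iso-8859-1?Q?umlaut=C4?=
echo encodeChunkKeepingQuotes("\xC4umlaut\""), "\n"; // =?iso-8859-1?Q?=C4umlaut?="
```

In both cases the double quote stays outside the `=?...?=` wrapper, which is the behaviour the patch is after.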

Comments

 [2007-03-27 11:55 UTC] cipri (Cipriano Groenendal)
This bug has been fixed in CVS. If this was a documentation problem, the fix will appear on pear.php.net by the end of next Sunday (CET). If this was a problem with the pear.php.net website, the change should be live shortly. Otherwise, the fix will appear in the package's next release. Thank you for the report and for helping us make PEAR better. Please try revision 1.63, and let me know if that worked for you :)
 [2007-04-05 05:46 UTC] cipri (Cipriano Groenendal)
The latest CVS version of Mail_Mime does not fix the bug. Additionally, another bug appears, which results from replacing every double quote with =22. The outcome is that the email is handled as plain text, because the email program is unable to read the headers. Double quotes are special characters and are reserved for special interpretation, such as delimiting lexical tokens, so they have to be preserved at specific positions.

As an example, the boundary header currently created is wrong:
boundary=3D=22=3D=5Fcb6a70c3d3376e8701718f180e8a16b3=22?=
The correct version would be:
boundary="=3D=5Fcb6a70c3d3376e8701718f180e8a16b3"

After checking your fix, I saw that it only applies to the non-iconv code path. The error also occurs when the iconv functions are available!

The perfect solution would be to find every header value that is delimited by double quotes, extract the value without the double quotes, and then apply the encoding function to the extracted string. This is currently not easy to do, because the headers get split at every whitespace character.
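
The "perfect solution" described in this comment could look roughly like the sketch below. It only mirrors the proposal in the comment and is not the fix that went into Mail_Mime. `qEncode()` and `encodeQuotedValues()` are hypothetical names, the Q-encoder is simplified (no line folding), and backslash-escaped quotes inside a quoted value are not handled.

```php
<?php
// Hypothetical helper (not part of Mail_Mime): simplified Q-encoding of
// a whole string into one encoded-word, without line folding.
function qEncode(string $text, string $charset): string
{
    $encoded = preg_replace_callback(
        '/[^A-Za-z0-9!*+\/-]/',
        function (array $m): string {
            return $m[0] === ' ' ? '_' : sprintf('=%02X', ord($m[0]));
        },
        $text
    );
    return '=?' . $charset . '?Q?' . $encoded . '?=';
}

// Hypothetical sketch of the proposal: match each double-quoted value
// as a whole (before any whitespace split), encode only the text between
// the quotes, and leave the quotes themselves in place.
function encodeQuotedValues(string $headerValue, string $charset = 'iso-8859-1'): string
{
    return preg_replace_callback(
        '/"([^"]*)"/',
        function (array $m) use ($charset): string {
            if (!preg_match('/[\x80-\xFF]/', $m[1])) {
                return $m[0]; // ASCII-only value: leave it untouched
            }
            return '"' . qEncode($m[1], $charset) . '"';
        },
        $headerValue
    );
}

// The quoted display name is encoded as one unit; the address is untouched.
echo encodeQuotedValues("\"something \xC4umlaut\" <user@example.org>"), "\n";
// An ASCII-only quoted parameter such as a boundary passes through unchanged.
echo encodeQuotedValues('multipart/mixed; boundary="=_cb6a70c3"'), "\n";
```

Because the quoted value is matched before any splitting, the opening and closing quotes can never end up inside an encoded-word, and quoted parameters that contain only ASCII are never rewritten.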
 [2007-04-05 05:57 UTC] cipri (Cipriano Groenendal)
Thank you for your bug report. This issue has been fixed in the latest released version of the package, which you can download at http://pear.php.net/get/Mail_Mime Fixed in 1.4.0a3
 [2007-05-05 10:06 UTC] cipri (Cipriano Groenendal)
Thank you for your bug report. This issue has been fixed in the latest released version of the package, which you can download at http://pear.php.net/get/Mail_Mime Fixed in 1.4.0