
Request #18876 getting attachment filename in the desired charset
Submitted: 2011-09-27 00:41 UTC
From: dlopez Assigned: alan_k
Status: Closed Package: Mail_mimeDecode (version SVN)
PHP Version: 5.3.8 OS: linux
Roadmaps: (Not assigned)    

 [2011-09-27 00:41 UTC] dlopez (Daniel Lopez)
Description:
------------
When decoding an email to extract file attachments, d_parameters['filename'] is returned in the original charset (e.g. KOI8). Even though mimeDecode parses the header string and knows the original charset (see line 731), the calling function has no access to the charset of that return value, so it cannot be reliably translated to another desired charset (e.g. UTF-8).

If you add the following line between line 737 and 738:

$text = iconv($charset, iconv_get_encoding('output_encoding'), $text);

the calling function can affect the charset of the filename decoded by mimeDecode simply by setting the desired output charset, via:

iconv_set_encoding("output_encoding", "UTF-8");

Surely I'm not the only person grappling with processing filenames in unpredictable charsets--is there a reason this has not been done before? Have other approaches been considered?


 [2011-09-27 01:16 UTC] dlopez (Daniel Lopez)
My bad--the iconv needs to apply to both B and Q cases, so the iconv call should be on line 747 right before the str_replace.
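The point about applying the conversion to both the B and Q cases can be illustrated with a small stand-alone sketch. It is not Mail_mimeDecode's code: the function name `decodeEncodedWord` and its parameters are hypothetical, and it handles only a single RFC 2047 encoded word. The idea it shows is that the B or Q payload is decoded first and the charset conversion is then done once, after either branch.

```php
<?php
// Hypothetical helper for illustration only; not part of Mail_mimeDecode.
// Decodes one RFC 2047 encoded word (=?charset?B|Q?text?=) and converts
// the result to $targetCharset. Returns false if iconv() cannot convert.
function decodeEncodedWord($word, $targetCharset = 'UTF-8')
{
    if (!preg_match('/^=\?([^?]+)\?([bBqQ])\?([^?]*)\?=$/', $word, $m)) {
        return $word; // not an encoded word, return it unchanged
    }
    list(, $charset, $encoding, $text) = $m;

    if (strtoupper($encoding) === 'B') {
        // B encoding is plain base64.
        $text = base64_decode($text);
    } else {
        // Q encoding is quoted-printable with "_" standing for a space.
        $text = quoted_printable_decode(str_replace('_', ' ', $text));
    }

    // Single conversion point, reached from both the B and the Q branch.
    return iconv($charset, $targetCharset, $text);
}

echo decodeEncodedWord('=?ISO-8859-1?Q?caf=E9_au_lait?='), "\n";
```

Keeping the `iconv()` call after the branch, rather than inside one of them, is what avoids the B-only behaviour described in the first comment.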
 [2011-09-27 13:59 UTC] dlopez (Daniel Lopez)
One more clarification: I understand that you can set decode_headers=false so that you can do the decoding yourself... but then what's the point of ever setting decode_headers=true if you can't trust that the return value will be in an expected charset? Wouldn't it make more sense if decode_headers were a charset string that _decodeHeader() would use to iconv() the output? And then add a case whereby, if decode_headers=null, it would skip any decoding (and conversion)?
 [2011-09-27 14:02 UTC] alan_k (Alan Knowles)
looks like it should be optional (so as not to break BC) - (however I guess it is the recommended setting..)
 [2011-09-27 14:20 UTC] alan_k (Alan Knowles)
-Status: Open
+Status: Closed
-Assigned To:
+Assigned To: alan_k
This bug has been fixed in SVN. If this was a documentation problem, the fix will appear on by the end of next Sunday (CET). If this was a problem with the website, the change should be live shortly. Otherwise, the fix will appear in the package's next release. Thank you for the report and for helping us make PEAR better. Can you test the changed code. r1=317378&r2=317377&pathrev=317378&view=patch
 [2011-09-27 16:06 UTC] dlopez (Daniel Lopez)
Holy cow, you're a superhero. I thought this feature request would be on a back-burner (or perhaps had some simple workaround that wasn't clear to me). Kudos to you.

Preliminary testing with KOI8 content (and with decode_headers=false) looks good, with two notes:

1) The second argument to _decodeHeaders appears to be a misspelling of $default_charset. It's not broken though because you misspelled it the same way in both places.

2) For greater robustness you might want to check if iconv fails and returns false. For example, if the charset passed via decode_headers is invalid or not supported (I set it to 'foobar' to test), mimeDecode now returns an empty string, which might catch people off-guard. If iconv returns false you may want to leave the value either undecoded or else do the straight decoding as was done prior to this patch. I suppose the latter choice is more backward compatible.
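The fallback suggested in note 2 might look roughly like the sketch below. The function name `convertOrKeep` is hypothetical and this is not the code committed to SVN; it only illustrates keeping the straight-decoded bytes when `iconv()` returns false, for example because the target charset is not supported.

```php
<?php
// Hypothetical helper for illustration only; not part of Mail_mimeDecode.
// Converts $text from $from to $to, but keeps the original bytes when
// iconv() fails instead of handing back false or an empty string.
function convertOrKeep($text, $from, $to)
{
    // "@" silences the notice/warning iconv() raises for an unknown charset;
    // the failure is still detected through the false return value.
    $converted = @iconv($from, $to, $text);

    return $converted === false ? $text : $converted;
}

// "Привет!" as raw KOI8-R bytes.
$koi8 = "\xF0\xD2\xC9\xD7\xC5\xD4!";

echo bin2hex(convertOrKeep($koi8, 'KOI8-R', 'UTF-8')), "\n";  // converted
echo bin2hex(convertOrKeep($koi8, 'KOI8-R', 'foobar')), "\n"; // unchanged bytes
```

Returning the unconverted bytes matches the behaviour from before the patch, which is the more backward-compatible of the two options mentioned above.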
 [2011-10-03 11:48 UTC] alan_k (Alan Knowles)
Thanks, well spotted, should be fixed in svn now. Regards Alan
 [2011-10-03 22:27 UTC] dlopez (Daniel Lopez)
Sorry to continue the thread... was there a reason you check the return value of the iconv call after the loop, but not the one inside the loop? I would think it should be in there as well...
 [2011-10-04 05:39 UTC] alan_k (Alan Knowles)
Thanks, I think my eyes are going ;)
 [2012-03-25 02:20 UTC] alexvolkow (Alex Volkow)
Hi, I have a problem with the new patch. As an example:

Subject: =?KOI8-R?B?8NLJ18XUIQ==?=
From: =?KOI8-R?B?4czFy9PBzsTSIPfPzMvP1w==?= <>

Becomes:

Subject: ?รท??????????!
From: ????????? ?????? <>

It should be as follows:

Subject: ??????!

So the subject line is being converted to KOI8-R, but not to UTF-8. Thanks in advance!
 [2012-03-25 02:24 UTC] alexvolkow (Alex Volkow)
It seems that this site doesn't support the Russian charset, so I've posted my question to
 [2012-03-26 15:06 UTC] alan_k (Alan Knowles)
This seems to work fine.

$mime_message = "Subject: =?KOI8-R?B?8NLJ18XUIQ==?=
From: =?KOI8-R?B?4czFy9PBzsTSIPfPzMvP1w==?= <>

Hello
";

require_once 'Mail/mimeDecode.php';

$decoder = new Mail_mimeDecode($mime_message);
$params = array(
    'include_bodies' => TRUE,
    'decode_bodies'  => TRUE,
    'decode_headers' => 'UTF8//IGNORE'
);
$decoded = $decoder->decode($params);
print_r($decoded);
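As a cross-check that does not depend on Mail_mimeDecode at all, the same two header values can be run through PHP's built-in `iconv_mime_decode()`. This is only a sketch for comparison, assuming the iconv extension is available; it uses 'UTF-8' as the target charset name rather than the 'UTF8//IGNORE' string from the test above. Printing the result as hex sidesteps any charset issues in the terminal or web page displaying it.

```php
<?php
// Decode the two encoded words from the report with the iconv extension only.
$subject = iconv_mime_decode('=?KOI8-R?B?8NLJ18XUIQ==?=', 0, 'UTF-8');
$from    = iconv_mime_decode('=?KOI8-R?B?4czFy9PBzsTSIPfPzMvP1w==?=', 0, 'UTF-8');

// Hex output is unaffected by how the display layer handles Cyrillic.
echo 'Subject: ', bin2hex($subject), "\n";
echo 'From:    ', bin2hex($from), "\n";
```

If the headers returned by `$decoder->decode($params)` have a different byte sequence from these values, the conversion to UTF-8 is probably not being applied on that installation.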