
Bug #3222 BitTorrent bencoding protocol errors
Submitted: 2005-01-17 19:08 UTC
From: uzume at xuno dot com
Assigned: tacker
Status: Closed
Package: File_Bittorrent
PHP Version: Irrelevant
OS:
Roadmaps: (Not assigned)    


 [2005-01-17 19:08 UTC] uzume at xuno dot com
Description:
------------
Encode.php can incorrectly bencode dictionaries, because the BitTorrent protocol specifies: "Keys must be strings and appear in sorted order (sorted as raw strings, not alphanumerics)." Also, encoding and decoding do not properly handle (most) strings as UTF-8, as specified in the new BitTorrent protocol.

Please refer to the old and new BitTorrent protocol specifications at:
http://bitconjurer.org/BitTorrent/protocol.html
http://bittorrent.com/protocol.html
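To make the sorting requirement concrete, here is a minimal sketch of a bencoder that emits dictionary keys in raw byte order. The function name `bencode_sketch()` and the rule used to tell lists from dictionaries (keys exactly 0, 1, 2, ...) are assumptions made for illustration; this is not the code in Encode.php.

```php
<?php
// Hypothetical helper for illustration; not the File_Bittorrent API.
function bencode_sketch($value)
{
    if (is_int($value)) {
        return 'i' . $value . 'e';
    }
    if (is_string($value)) {
        // Length is counted in bytes, so raw binary strings pass through untouched.
        return strlen($value) . ':' . $value;
    }
    if (!is_array($value)) {
        throw new InvalidArgumentException('Unsupported type: ' . gettype($value));
    }

    // Assumption: an array whose keys are exactly 0, 1, 2, ... is a list.
    $isList = true;
    $expected = 0;
    foreach (array_keys($value) as $key) {
        if ($key !== $expected) {
            $isList = false;
            break;
        }
        $expected++;
    }

    if ($isList) {
        $out = 'l';
        foreach ($value as $item) {
            $out .= bencode_sketch($item);
        }
        return $out . 'e';
    }

    // Dictionary: sort by key, comparing the keys as byte strings.
    ksort($value, SORT_STRING);
    $out = 'd';
    foreach ($value as $key => $item) {
        $out .= bencode_sketch((string) $key) . bencode_sketch($item);
    }
    return $out . 'e';
}

echo bencode_sketch(array('spam' => 'eggs', 'cow' => 'moo')), "\n";
```

Because the comparison is byte-wise, an uppercase key such as 'B' is emitted before a lowercase 'b', which is what "sorted as raw strings" implies.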

Comments

 [2005-01-31 12:54 UTC] tacker
Bug has been fixed in the new release. Thanks for your support.
 [2005-02-07 01:05 UTC] uzume at xuno dot com
This is still broken. I now see code that uses asort() to sort arrays in the array encoding, but this has even bigger problems. Dictionaries need to be sorted by key, not by value (use ksort()). Technically, only dictionaries need to be sorted, but it probably makes sense to output lists ordered by the numeric value of the key, so that the "first" item entered into the array is also the first item serialized. That should not be a string sort (note that there is also natsort() for "natural" sorting). I recommend moving the sorting to later, after you have identified whether the array is a dictionary or a list. As I mentioned, technically it must sort by keys, not values, and it is only required for dictionaries.
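The difference between the sort functions mentioned above can be seen in a short sketch. The sample arrays are made up for illustration and do not come from a real torrent file.

```php
<?php
// Made-up sample data; the keys are the part that bencoding must order.
$dict = array('info' => 'a', 'announce' => 'z');

$byValue = $dict;
asort($byValue, SORT_STRING); // orders by VALUE, so 'info' => 'a' comes first

$byKey = $dict;
ksort($byKey, SORT_STRING);   // orders by KEY, so 'announce' comes first

echo implode(',', array_keys($byValue)), "\n"; // info,announce
echo implode(',', array_keys($byKey)), "\n";   // announce,info

// For lists, integer keys should be compared as numbers, not as strings.
$list = array(2 => 'c', 10 => 'k', 1 => 'b');

$numeric = $list;
ksort($numeric, SORT_NUMERIC); // keys become 1, 2, 10

$asString = $list;
ksort($asString, SORT_STRING); // keys become 1, 10, 2 ("10" sorts before "2")

echo implode(',', array_keys($numeric)), "\n";
echo implode(',', array_keys($asString)), "\n";
```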
 [2005-02-07 08:56 UTC] tacker
Hi Uzume, nice QA work. I actually didn't look at what I was doing later in the method, and asort() is really wrong here. I don't know how it got in there. The new code is below; is that OK for you?

Index: File/Bittorrent/Encode.php
===================================================================
--- File/Bittorrent/Encode.php (revision 10)
+++ File/Bittorrent/Encode.php (working copy)
@@ -127,8 +127,6 @@
      */
     function encode_array($array)
     {
-        // Sort array
-        asort($array, SORT_STRING);
         // Check for strings in the keys
         $isList = true;
         foreach (array_keys($array) as $key) {
@@ -139,6 +137,7 @@
         }
         if ($isList) {
             // Wie build a list
+            ksort($array, SORT_NUMERIC);
             $return = 'l';
             foreach ($array as $val) {
                 $return .= $this->encode($val);
@@ -146,6 +145,7 @@
             $return .= 'e';
         } else {
             // We build a Dictionary
+            ksort($array, SORT_STRING);
             $return = 'd';
             foreach ($array as $key => $val) {
                 $return .= $this->encode(strval($key));
 [2005-02-07 14:55 UTC] uzume at xuno dot com
It looks like you are doing the right thing now with respect to bencoding PHP arrays. I might personally go for a different implementation, but that is a separate issue from bugs in this software.

In my original post about this bug I also mentioned that the BitTorrent specification now says all fields are to be interpreted as UTF-8 (whereas the original specification mentioned ASCII in a few places but overall did not state an encoding for all fields). That said, there are several places where raw binary strings are required, so interpreting them as UTF-8 is not appropriate, such as the piece hash values.

Anyway, my point is that it would be nice if there were some easy mechanism available to interpret the bdecoded data as UTF-8 (e.g., very useful for non-English filenames). I know several BitTorrent clients allow such data to be interpreted in numerous ways, because traditionally this was not specified and people have generated torrent metafiles with various encodings (JIS, GB2312, etc.).
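One possible shape for such a mechanism is sketched below. Everything in it is an assumption for illustration: the helper name `strings_to_utf8_sketch()`, the idea that the caller supplies the source charset, and the default list of binary keys (only 'pieces'). It relies on the mbstring extension, leaves dictionary keys untouched, and is not part of File_Bittorrent.

```php
<?php
// Hypothetical helper; requires the mbstring extension.
// $data is an already bdecoded array, $fromCharset is supplied by the caller.
function strings_to_utf8_sketch(array $data, $fromCharset, array $binaryKeys = array('pieces'))
{
    $out = array();
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            // Recurse into nested dictionaries and lists.
            $out[$key] = strings_to_utf8_sketch($value, $fromCharset, $binaryKeys);
        } elseif (is_string($value) && !in_array($key, $binaryKeys, true)) {
            // Text field: convert from the assumed source charset to UTF-8.
            $out[$key] = mb_convert_encoding($value, 'UTF-8', $fromCharset);
        } else {
            // Integers and raw binary fields (e.g. piece hashes) stay as they are.
            $out[$key] = $value;
        }
    }
    return $out;
}

// Made-up sample: a Latin-1 encoded file name next to raw hash bytes.
$decoded = array('info' => array(
    'name'   => "caf\xE9.txt",
    'length' => 5,
    'pieces' => "\xE9\x00\xFF",
));
$utf8 = strings_to_utf8_sketch($decoded, 'ISO-8859-1');
echo bin2hex($utf8['info']['name']), "\n";
```

Passing the charset explicitly avoids guessing, which matters because metafiles in the wild use several encodings. A real implementation would probably need a longer list of binary fields than the single default used here.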
 [2005-02-07 16:25 UTC] tacker
Fixed in the 0.1.5 toodumptocode-release. Thank you, again.