
Bug #6470 Path is not properly encoded
Submitted: 2006-01-12 01:13 UTC
From: arailean at squiz dot net
Assigned: davidc
Status: Closed
Package: Net_URL2
PHP Version: 4.4.1
OS: any
Roadmaps: (Not assigned)

 [2006-01-12 01:13 UTC] arailean at squiz dot net
Description:
------------
The URL path is not properly encoded to comply with RFC 2396 - Uniform Resource Identifiers (URI): Generic Syntax.

This becomes evident when Net_URL is used by HTTP_Request and HTTP_Client while handling redirects. Both HTTP_Request and HTTP_Client access the path property of Net_URL directly, so for proper compliance they should be modified as well.

The problem is that there is no proper accessor for this property. Such an accessor should handle all the re-formatting.

Since this particular issue came up while using HTTP_Request, that module should also be modified to use the new functionality. In particular, line 758 of HTTP_Request should be changed to:

    $path = $this->_url->getEncodedPath() . $querystring;

given that the function getEncodedPath is implemented in Net_URL as provided below.

Test script:
---------------
/**
 * Returns encoded path
 *
 * Result is properly encoded and ready for use in the request.
 * Complies with RFC 2396 - Uniform Resource Identifiers (URI): Generic Syntax
 * Does not modify the internal state
 *
 * Code Based on work by: By Esben Maaløe esm-at-baseclass.modulweb.dk
 * see: http://baseclass.modulweb.dk/urlvalidator
 *
 * @author Andrei Railean <arailean@squiz.net>
 *
 * @return string Path
 * @access public
 */
function getEncodedPath()
{
    $is_dir = false;
    if (strlen($this->path) > 1 && substr($this->path, -1) == '/') {
        $is_dir = true;
    }

    $path_parts = preg_split('|/|', $this->path, -1, PREG_SPLIT_NO_EMPTY);

    if (empty($path_parts)) {
        return '/';
    }

    foreach ($path_parts as $key => $part) {
        // Check for % that is NOT an escape sequence || invalid chars
        if (preg_match('/%[^a-f0-9]{2}/i', $part) || preg_match('/[^@a-z0-9_.!~*\'()$+&,%:=;?-]/i', $part)) {
            $path_parts[$key] = urlencode(urldecode($part));
        }
    }

    $path = '/'.implode('/', $path_parts);

    if ($is_dir) {
        $path .= '/';
    }

    return $path;
}

Expected result:
----------------
/this.%60hello%60/that.html

Actual result:
--------------
/this.`hello`/that.html

Comments

 [2006-01-12 23:33 UTC] arailean at squiz dot net
The if statement should read:

    if (preg_match('/%[^a-f0-9]/i', $part) || preg_match('/[^@a-z0-9_.!~*\'()$+&,%:=;?-]/i', $part)) {

i.e. without the {2}.
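To illustrate why dropping the {2} matters, here is a small sketch comparing the two patterns on a few sample strings. The sample strings and variable names are made up for illustration and do not come from the report.

```php
<?php
// Pattern from the original report: needs TWO non-hex characters after '%'.
$original  = '/%[^a-f0-9]{2}/i';
// Pattern from the correction: ONE non-hex character after '%' is enough.
$corrected = '/%[^a-f0-9]/i';

// Hypothetical sample inputs, chosen only to show the difference.
$samples = array('50%zz', '50%g0', 'a%20b');

$results = array();
foreach ($samples as $s) {
    // preg_match() returns 1 on a match and 0 on no match.
    $results[$s] = array(preg_match($original, $s), preg_match($corrected, $s));
}
```

With these inputs, '50%g0' is only flagged by the corrected pattern, because the character after the 'g' is a hex digit. A sequence such as '%2g' (valid first digit, invalid second) appears to slip past both patterns, so the check is still an approximation.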
 [2006-02-06 23:44 UTC] arailean at squiz dot net
Index: URL.php
===================================================================
RCS file: /repository/pear/Net_URL/URL.php,v
retrieving revision 1.42
diff -w -u -r1.42 URL.php
--- URL.php	29 Oct 2005 11:17:56 -0000	1.42
+++ URL.php	6 Feb 2006 23:28:43 -0000
@@ -418,5 +418,47 @@
         $this->port = is_null($port) ? $this->getStandardPort($protocal) : $port;
     }
 
+    /**
+    * Returns encoded path
+    *
+    * Result is properly encoded and ready for use in the request.
+    * Complies with RFC 2396 - Uniform Resource Identifiers (URI): Generic Syntax
+    * Does not modify the internal state
+    *
+    * Code Based on work by: esm-at-baseclass.modulweb.dk
+    * see: http://baseclass.modulweb.dk/urlvalidator
+    *
+    * @return string Path
+    * @access public
+    */
+    function getEncodedPath()
+    {
+        $is_dir = false;
+        if (strlen($this->path) > 1 && substr($this->path, -1) == '/') {
+            $is_dir = true;
+        }
+
+        $path_parts = preg_split('|/|', $this->path, -1, PREG_SPLIT_NO_EMPTY);
+
+        if (empty($path_parts)) {
+            return '/';
+        }
+
+        foreach ($path_parts as $key => $part) {
+            // Check for % that is NOT an escape sequence || invalid chars
+            if (preg_match('/%[^a-f0-9]/i', $part) || preg_match('/[^@a-z0-9_.!~*\'()$+&,%:=;?-]/i', $part)) {
+                $path_parts[$key] = urlencode(urldecode($part));
+            }
+        }
+
+        $path = '/'.implode('/', $path_parts);
+
+        if ($is_dir) {
+            $path .= '/';
+        }
+
+        return $path;
+    }
+
 }
 ?>
 [2006-02-06 23:45 UTC] arailean at squiz dot net
The above patch was made from the CVS version of this package
 [2006-02-22 03:19 UTC] arailean at squiz dot net
New patch: http://delta.squiz.net/~arailean/pear6479.udiff.patch

This introduces a new function, 'encodePath', that is similar to resolvePath in that it can be called statically. Given a string, it encodes it according to the RFC specification for the path component of a URL.

This is a replacement for the previous patch. The diff is made from CVS.
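The linked patch is not reproduced in this thread, so the following is only a rough sketch of what a stateless, string-in/string-out version of the earlier getEncodedPath code might look like. The function name encode_path_sketch and its signature are assumptions made for illustration, not the contents of the patch.

```php
<?php
// Hypothetical standalone helper; name and signature are assumptions,
// not taken from the linked patch.
function encode_path_sketch($path)
{
    // Remember whether the path ends in a slash so it can be restored.
    $is_dir = strlen($path) > 1 && substr($path, -1) == '/';

    // Split into segments, dropping empty ones (leading, trailing, doubled slashes).
    $parts = preg_split('|/|', $path, -1, PREG_SPLIT_NO_EMPTY);
    if (empty($parts)) {
        return '/';
    }

    foreach ($parts as $key => $part) {
        // Re-encode a segment if it has a '%' followed by a non-hex character,
        // or any character outside the allowed set used in the thread.
        if (preg_match('/%[^a-f0-9]/i', $part)
            || preg_match('/[^@a-z0-9_.!~*\'()$+&,%:=;?-]/i', $part)) {
            $parts[$key] = urlencode(urldecode($part));
        }
    }

    return '/' . implode('/', $parts) . ($is_dir ? '/' : '');
}
```

Called as encode_path_sketch('/this.`hello`/that.html'), this gives the expected result from the report. Note that urlencode() turns a space into '+'; rawurlencode(), which emits %20 instead, may be a better fit for path segments, but that is a design choice the thread does not discuss.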
 [2007-05-08 00:27 UTC] davidc (David Coallier)
Making this a Net_URL2 problem.
 [2007-05-08 01:22 UTC] davidc (David Coallier)
Thank you for your bug report. This issue has been fixed in the latest released version of the package, which you can download at http://pear.php.net/get/Net_URL2

I have now added an option for PHP 5 that people can use like this:

    require_once 'Net/URL2.php';
    Net_URL2::setOption('encode_query_keys', true);
    $url = new Net_URL2;
    ...

For people using PHP 4, it goes like this:

    require_once 'Net/URL.php';
    $url = new Net_URL;
    $url->setOption('encode_query_keys', true);

This makes sure that the query keys of a URL such as url.com/nam`=david are output encoded:

    Array ( [nam%60] => david )
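As a rough illustration of the effect described above, the sketch below encodes query keys with plain PHP functions, without loading either PEAR package. The helper parse_query_sketch is hypothetical and is not how Net_URL or Net_URL2 implement the option.

```php
<?php
// Hypothetical helper for illustration only; not part of Net_URL or Net_URL2.
function parse_query_sketch($query, $encodeKeys)
{
    $result = array();
    foreach (explode('&', $query) as $pair) {
        if ($pair === '') {
            continue; // skip empty pairs such as a trailing '&'
        }
        // Split on the first '=' only, so values may contain '='.
        $parts = explode('=', $pair, 2);
        $key   = $encodeKeys ? urlencode($parts[0]) : $parts[0];
        $value = isset($parts[1]) ? $parts[1] : '';
        $result[$key] = $value;
    }
    return $result;
}
```

With key encoding switched on, parse_query_sketch('nam`=david', true) yields an array whose only key is 'nam%60'; with it switched off, the backtick is left in the key as-is.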