
Bug #13103 Space breaks the URL on redirect
Submitted: 2008-02-12 07:12 UTC
From: khoinqq Assigned:
Status: Open Package: Net_URL
PHP Version: 5.1.4 OS:
Roadmaps: (Not assigned)    


 [2008-02-12 07:12 UTC] khoinqq (Khoi Nguyen)
Description:
------------
Sometimes we request a URL with a parameter that contains "+", which is the encoding of a space, for example:

http://www.example.com/search?keyword=my+keyword

The server then responds with a 301 and redirects to

Location: /newsearch?keyword=my keyword

(the site responds with a URL that contains a literal space). HTTP_Client requests that URL "as is" without encoding it, so the URL is corrupted and the request returns 404.

Test script:
---------------
<?php
require_once "HTTP/Client.php";

// This URL came from a spider and is properly encoded
$url = "http://www.amazon.com/o/redirect%3Ftag%3Damd-google-20%26path%3Dsearch-handle-url/index%3Dstripbooks%2526field-keywords%3Dfood%2520for%2520mood%2526results-process%3Ddefault%2526dispatch%3Dsearch/ref%3Dpd_sl_aw_tops-1_stripbooks_6069142_2";

$client = new HTTP_Client();
$content = $client->get($url);
$response = $client->currentResponse();

echo $content;
file_put_contents("amazon_source.txt", $response['body']);

Expected result:
----------------
Printed on screen: 200
amazon_source.txt contains the HTML source of the page:
http://www.amazon.com/exec/obidos/search-handle-url/index=stripbooks&field-keywords=food%20for%20mood&results-process=default&dispatch=search/ref=pd_sl_aw_tops-1_stripbooks_6069142_2&results-process=default

Actual result:
--------------
Printed on screen: 404
amazon_source.txt contains Amazon's 404 error page

Comments

 [2008-02-12 08:42 UTC] khoinqq (Khoi Nguyen)
Just a quick fix. Change from:

function get($url, $data = null, $preEncoded = false, $headers = array())
{
    $request =& $this->_createRequest($url, HTTP_REQUEST_METHOD_GET, $headers);
    if (is_array($data)) {
        foreach ($data as $name => $value) {
            $request->addQueryString($name, $value, $preEncoded);
        }
    } elseif (isset($data)) {
        $request->addRawQueryString($data, $preEncoded);
    }
    return $this->_performRequest($request);
}

to:

function get($url, $data = null, $preEncoded = false, $headers = array())
{
    // Replace spaces with "+" before the GET request
    $url = str_replace(" ", "+", $url);
    $request =& $this->_createRequest($url, HTTP_REQUEST_METHOD_GET, $headers);
    if (is_array($data)) {
        foreach ($data as $name => $value) {
            $request->addQueryString($name, $value, $preEncoded);
        }
    } elseif (isset($data)) {
        $request->addRawQueryString($data, $preEncoded);
    }
    return $this->_performRequest($request);
}
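
A possible variation on the quick fix, sketched as a standalone function: rather than turning every space into "+", it percent-encodes any character that is not legal in a URI (per RFC 3986) and leaves existing %XX escapes untouched. The names encode_unsafe_url_chars() and _encode_unsafe_url_char() are hypothetical and are not part of HTTP_Client or Net_URL; this is only an illustration of the idea, not a tested patch for either package.

```php
<?php
/**
 * Hypothetical callback (not a PEAR API): percent-encode one matched byte.
 */
function _encode_unsafe_url_char($match)
{
    return rawurlencode($match[0]);
}

/**
 * Hypothetical helper (not a PEAR API): percent-encode characters that may
 * not appear literally in a URI, such as a space in a Location header.
 * Reserved characters, unreserved characters and "%" are left alone, so
 * a URL that is already properly encoded passes through unchanged.
 */
function encode_unsafe_url_chars($url)
{
    // Anything outside the RFC 3986 reserved/unreserved sets (plus "%")
    // is replaced byte by byte with its %XX form.
    $unsafe = '{[^A-Za-z0-9\-._~:/?#\[\]@!$&\'()*+,;=%]}';
    return preg_replace_callback($unsafe, '_encode_unsafe_url_char', $url);
}

// The redirect target from the report, with its literal space encoded.
echo encode_unsafe_url_chars('/newsearch?keyword=my keyword'), "\n";
```

The sketch uses %20 instead of "+" because "+" stands for a space only by the form-encoding convention for query strings, while %20 is valid in any part of the URL, including the path.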
 [2008-07-23 15:04 UTC] avb (Alexey Borzov)
HTTP_Client ultimately uses Net_URL for its URL processing needs, moving the bug there.