<?xml version="1.0"?>
<?xml-stylesheet 
 href="http://www.w3.org/2000/08/w3c-synd/style.css" type="text/css"
?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:admin="http://webns.net/mvcb/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel rdf:about="http://pear.php.net/bugs/3498/bug">
    <title>PEAR Bug #3498</title>
    <link>http://pear.php.net/bugs/3498</link>
    <description>[Closed] XML container: several add() produce encoding junk</description>
    <dc:language>en-us</dc:language>
    <dc:creator>pear-webmaster@lists.php.net</dc:creator>
    <dc:publisher>pear-webmaster@lists.php.net</dc:publisher>
    <admin:generatorAgent rdf:resource="http://pear.php.net/bugs"/>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <sy:updateBase>2000-01-01T12:00+00:00</sy:updateBase>
    <items>
     <rdf:Seq>
      <rdf:li rdf:resource="http://pear.php.net/bugs/3498"/>
      <rdf:li rdf:resource="http://pear.php.net/bugs/3498/2005-02-21+11%3A50%3A27#2005-02-21+11%3A50%3A27"/>
      <rdf:li rdf:resource="http://pear.php.net/bugs/3498/2005-02-17+13%3A07%3A20#2005-02-17+13%3A07%3A20"/>
      <rdf:li rdf:resource="http://pear.php.net/bugs/3498/2005-02-17+08%3A42%3A40#2005-02-17+08%3A42%3A40"/>
      <rdf:li rdf:resource="http://pear.php.net/bugs/3498/2005-02-17+08%3A11%3A50#2005-02-17+08%3A11%3A50"/>
     </rdf:Seq>
    </items>
  </channel>
    <item rdf:about="http://pear.php.net/bugs/3498">
      <title>ojai@... [2005-02-16 15:01:24]</title>
      <link>http://pear.php.net/bugs/3498</link>
      <description><![CDATA[<pre>Translation2 Bug
Reported by ojai@...
2005-02-16T20:01:24-00:00
PHP: 4.3.10 OS: Linux Debian Package Version: 

Description:
------------
Hi,

When add() is called several times, the encoding conversion routine runs over and over, so data that has already been converted to UTF-8 gets converted again and again...

It produces such horrible junk inside the XML file that I can't paste it in here ;-)

Actually there are two bugs:

- when save_on_shutdown is set to false, the _data property falls victim to redundant conversions performed by _convertLangEncodings() and _convertEncodings(). Indeed, these methods are called by _saveData(), which is itself called several times. The problem here comes from the fact that the encoding conversion is performed &quot;in-place&quot;, on the _data property

- when save_on_shutdown is set to true, this problem should not happen, because _saveData() is supposed to be called only once. But the current implementation of _scheduleSaving() actually registers it as a shutdown function several times... so the encoded data gets re-encoded as well.

Here's a patch that fixes these issues. I attempted to reduce memory usage by using a buffer (the _dataBuffer property) instead of returning converted data from convert[Lang]Encodings():

http://samalyse.com/ln/0012.php

It applies fine on the current CVS.

Cheers

--
  Olivier Guilyardi</pre>]]></description>
      <content:encoded><![CDATA[<pre>Translation2 Bug
Reported by ojai@...
2005-02-16T20:01:24-00:00
PHP: 4.3.10 OS: Linux Debian Package Version: 

Description:
------------
Hi,

When add() is called several times, the encoding conversion routine runs over and over, so data that has already been converted to UTF-8 gets converted again and again...

It produces such horrible junk inside the XML file that I can't paste it in here ;-)
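
To give an idea of that junk without pasting the real file, here is a minimal sketch. It assumes ISO-8859-1 source text and uses iconv() as a stand-in for the container's own conversion routine; the toUtf8() helper is hypothetical.

```php
<?php
// Illustration only: iconv() and ISO-8859-1 are assumptions, standing in
// for whatever conversion and source encoding the XML container uses.
function toUtf8($text)
{
    return iconv('ISO-8859-1', 'UTF-8', $text);
}

$original = "caf\xE9";          // "cafe" with an e-acute, in ISO-8859-1
$once     = toUtf8($original);  // correct UTF-8
$twice    = toUtf8($once);      // UTF-8 bytes misread as ISO-8859-1, encoded again
$thrice   = toUtf8($twice);     // every further pass makes it worse

echo bin2hex($once), "\n";      // 636166c3a9
echo bin2hex($twice), "\n";     // 636166c383c2a9
echo strlen($once), ' ', strlen($twice), ' ', strlen($thrice), "\n"; // 5 7 11
```

Each extra pass treats the UTF-8 bytes as ISO-8859-1 characters and encodes them again, so every non-ASCII character keeps growing with each save.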

Actually there are two bugs:

- when save_on_shutdown is set to false, the _data property falls victim to redundant conversions performed by _convertLangEncodings() and _convertEncodings(). Indeed, these methods are called by _saveData(), which is itself called several times. The problem here comes from the fact that the encoding conversion is performed &quot;in-place&quot;, on the _data property

- when save_on_shutdown is set to true, this problem should not happen, because _saveData() is supposed to be called only once. But the current implementation of _scheduleSaving() actually registers it as a shutdown function several times... so the encoded data gets re-encoded as well.
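
The second case can be sketched roughly as below. The SaveScheduler class, its method names and the $isScheduled flag are hypothetical and only illustrate the idea of registering the shutdown function once; this is not the actual container code.

```php
<?php
// Hypothetical stand-in for the container; class, method and property
// names are illustrative, not the actual Translation2 code.
class SaveScheduler
{
    public $registrations = 0;      // how many shutdown callbacks were registered
    private $isScheduled = false;   // guard flag

    public function scheduleSaving()
    {
        if ($this->isScheduled) {
            return;                 // already registered, nothing to do
        }
        register_shutdown_function(array($this, 'saveData'));
        $this->isScheduled = true;
        $this->registrations++;
    }

    public function saveData()
    {
        // The real method would encode the data and write the file, once.
    }
}

$scheduler = new SaveScheduler();
$scheduler->scheduleSaving();       // e.g. one call per add()
$scheduler->scheduleSaving();
$scheduler->scheduleSaving();
echo $scheduler->registrations, "\n";   // 1
```

Without the guard, each call would register one more shutdown callback, and each callback would run the conversion again on data that is already encoded.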

Here's a patch that fixes these issues. I attempted to reduce memory usage by using a buffer (the _dataBuffer property) instead of returning converted data from convert[Lang]Encodings():

http://samalyse.com/ln/0012.php

It applies fine on the current CVS.

Cheers

--
  Olivier Guilyardi</pre>]]></content:encoded>
      <dc:date>2005-02-16T20:01:24-00:00</dc:date>
    </item>
    <item rdf:about="http://pear.php.net/bugs/3498/2005-02-21+11%3A50%3A27#2005-02-21+11%3A50%3A27">
      <title>quipo [2005-02-21 16:50]</title>
      <link>http://pear.php.net/bugs/3498#1109004627</link>
      <description><![CDATA[<pre>This bug has been fixed in CVS.

In case this was a documentation problem, the fix will show up at the
end of next Sunday (CET) on pear.php.net.

In case this was a pear.php.net website problem, the change will show
up on the website in short time.
 
Thank you for the report, and for helping us make PEAR better.

--
I finally found a moment to look at this one. Tested and committed! It works smoothly, many thanks.</pre>]]></description>
      <content:encoded><![CDATA[<pre>This bug has been fixed in CVS.

In case this was a documentation problem, the fix will show up at the
end of next Sunday (CET) on pear.php.net.

In case this was a pear.php.net website problem, the change will show
up on the website in short time.
 
Thank you for the report, and for helping us make PEAR better.

--
I finally found a moment to look at this one. Tested and committed! It works smoothly, many thanks.</pre>]]></content:encoded>
      <dc:date>2005-02-21T16:50:27-00:00</dc:date>
    </item>
    <item rdf:about="http://pear.php.net/bugs/3498/2005-02-17+13%3A07%3A20#2005-02-17+13%3A07%3A20">
      <title>ojai@... [2005-02-17 18:07]</title>
      <link>http://pear.php.net/bugs/3498#1108663640</link>
      <description><![CDATA[<pre>Here's an updated patch to optimize saving when save_on_shutdown is set to false:

- it removes the need to reload data after saving it
- it no longer uses a _dataBuffer property, but a buffer reference which is passed to the convert*Encodings() methods
- it no longer contains the &quot;_isScheduledSaving&quot; fix, since I see that you implemented that in CVS

http://samalyse.com/ln/0013.php</pre>]]></description>
      <content:encoded><![CDATA[<pre>Here's an updated patch to optimize saving when save_on_shutdown is set to false:

- it removes the need to reload data after saving it
- it no longer uses a _dataBuffer property, but a buffer reference which is passed to the convert*Encodings() methods
- it no longer contains the &quot;_isScheduledSaving&quot; fix, since I see that you implemented that in CVS

http://samalyse.com/ln/0013.php</pre>]]></content:encoded>
      <dc:date>2005-02-17T18:07:20-00:00</dc:date>
    </item>
    <item rdf:about="http://pear.php.net/bugs/3498/2005-02-17+08%3A42%3A40#2005-02-17+08%3A42%3A40">
      <title>ojai@... [2005-02-17 13:42]</title>
      <link>http://pear.php.net/bugs/3498#1108647760</link>
      <description><![CDATA[<pre>You're right: I did observe the re-encoding bug with save_on_shutdown set to true, but for save_on_shutdown set to false I only assumed it from reading the code.

I didn't see the _loadFile() call in _scheduleSaving()...

But my patch provides a performance enhancement: why call _loadFile() every time _saveData() is called? That's very heavy: serializing + unserializing each time.

With save_on_shutdown set to false, it already takes a pretty long time for my script to perform about 10 add() calls, and that's while using the buffered version that my patch provides.

However, having thought about it, I don't really like this _dataBuffer property I created. I think it would be better to optionally pass a buffer reference to the convert*Encodings() methods:

    function _convertEncodings($direction, &amp;$buffer=null) 
    {
        &lt;snip&gt;

        if ($buffer) {
            $data =&amp; $buffer;
        } else {
            $data =&amp; $this-&gt;_data;
        }

        foreach ($data['pages'] as $page_id =&gt; $page_content) {

        &lt;snip&gt;
    }</pre>]]></description>
      <content:encoded><![CDATA[<pre>You're right: I did observe the re-encoding bug with save_on_shutdown set to true, but for save_on_shutdown set to false I only assumed it from reading the code.

I didn't see the _loadFile() call in _scheduleSaving()...

But my patch provides a performance enhancement: why call _loadFile() every time _saveData() is called? That's very heavy: serializing + unserializing each time.

With save_on_shutdown set to false, it already takes a pretty long time for my script to perform about 10 add() calls, and that's while using the buffered version that my patch provides.

However, having thought about it, I don't really like this _dataBuffer property I created. I think it would be better to optionally pass a buffer reference to the convert*Encodings() methods:

    function _convertEncodings($direction, &amp;$buffer=null) 
    {
        &lt;snip&gt;

        if ($buffer) {
            $data =&amp; $buffer;
        } else {
            $data =&amp; $this-&gt;_data;
        }

        foreach ($data['pages'] as $page_id =&gt; $page_content) {

        &lt;snip&gt;
    }</pre>]]></content:encoded>
      <dc:date>2005-02-17T13:42:40-00:00</dc:date>
    </item>
    <item rdf:about="http://pear.php.net/bugs/3498/2005-02-17+08%3A11%3A50#2005-02-17+08%3A11%3A50">
      <title>quipo [2005-02-17 13:11]</title>
      <link>http://pear.php.net/bugs/3498#1108645910</link>
      <description><![CDATA[<pre>mmm... I don't really understand it: when save_on_shutdown is set to false the encoding is done once, because _scheduleSaving() calls _loadFile() again after the file has been updated.
So data is encoded from memory to the file, but it is decoded again from the file to memory.
At least my tests show that there's no problem here.

Instead, when save_on_shutdown is set to true, register_shutdown_function() is indeed called many times, and I'll fix that.

Can you send me a script that reproduces the first case?
The test suite should cover it, but maybe I left something out.

TIA

--
Lorenzo Alberton
http://pear.php.net/user/quipo</pre>]]></description>
      <content:encoded><![CDATA[<pre>mmm... I don't really understand it: when save_on_shutdown is set to false the encoding is done once, because _scheduleSaving() calls _loadFile() again after the file has been updated.
So data is encoded from memory to the file, but it is decoded again from the file to memory.
At least my tests show that there's no problem here.

Instead, when save_on_shutdown is set to true, register_shutdown_function() is indeed called many times, and I'll fix that.

Can you send me a script that reproduces the first case?
The test suite should cover it, but maybe I left something out.

TIA

--
Lorenzo Alberton
http://pear.php.net/user/quipo</pre>]]></content:encoded>
      <dc:date>2005-02-17T13:11:50-00:00</dc:date>
    </item>
</rdf:RDF>