
Bug #3498 XML container: several add() produce encoding junk
Submitted: 2005-02-16 20:01 UTC
From: ojai at nerim dot net Assigned: quipo
Status: Closed Package: Translation2
PHP Version: 4.3.10 OS: Linux Debian
Roadmaps: (Not assigned)    


 [2005-02-16 20:01 UTC] ojai at nerim dot net
Description:
------------
Hi,

When add() is called several times, the encoding conversion routine runs on every call, so data that has already been converted to UTF-8 gets converted again and again. This produces such horrible junk in the XML file that I can't paste it here ;-)

Actually there are two bugs:

- When save_on_shutdown is set to false, the _data property is the victim of redundant conversions performed by _convertLangEncodings() and _convertEncodings(). These methods are called by _saveData(), which is itself called several times. The problem comes from the fact that the encoding conversion is performed "in-place", on the _data property.

- When save_on_shutdown is set to true, this problem should not happen, because _saveData() is supposed to be called only once. But the current implementation of _scheduleSaving() actually registers it as a shutdown function several times, so the encoded data gets re-encoded as well.

Here's a patch that fixes these issues. I attempted to reduce memory usage by using a buffer (the _dataBuffer property) instead of returning the converted data from convert[Lang]Encodings():

http://samalyse.com/ln/0012.php

It applies fine on the current CVS.

Cheers

--
Olivier Guilyardi
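
The first bug described above can be reproduced in isolation. The following is only a minimal sketch, written for current PHP rather than the PHP 4 of the report: `latin1ToUtf8()` and `InPlaceContainer` are hypothetical stand-ins, not Translation2 code, and the byte-wise ISO-8859-1 to UTF-8 function is assumed here only to keep the example free of extensions.

```php
<?php
// Hypothetical stand-in for the conversion step: ISO-8859-1 -> UTF-8, byte by byte.
function latin1ToUtf8(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $b = ord($s[$i]);
        $out .= $b < 0x80
            ? chr($b)
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

// Hypothetical container that converts its own data in place on every save.
class InPlaceContainer
{
    public $data = [];

    public function add(string $key, string $value): void
    {
        $this->data[$key] = $value;
        $this->save();
    }

    private function save(): void
    {
        foreach ($this->data as $k => $v) {
            // Overwrites the source data, so the next save converts it again.
            $this->data[$k] = latin1ToUtf8($v);
        }
    }
}

$c = new InPlaceContainer();
$c->add('a', "\xE9"); // "é" in ISO-8859-1
$c->add('b', 'x');    // the second add() re-converts 'a'

echo bin2hex($c->data['a']), "\n"; // prints c383c2a9 instead of c3a9
```

After one conversion the string would be the two bytes `c3 a9`; the second pass treats each of those bytes as a Latin-1 character and encodes it again, which is the "junk" seen in the XML file.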

Comments

 [2005-02-17 13:11 UTC] User who submitted this comment has not confirmed identity
 [2005-02-17 13:42 UTC] ojai at nerim dot net
You're right. I did observe the re-encoding bug with save_on_shutdown set to true, but I only assumed it for save_on_shutdown set to false from reading the code. I didn't see the _loadFile() call in _scheduleSaving()...

But my patch provides a performance enhancement: why call _loadFile() every time _saveData() is called? That's very heavy: serializing + unserializing each time. With save_on_shutdown set to false, it already takes a pretty long time for my script to perform about 10 add()'s, and I'm using the buffered version that my patch provides.

However, having thought about it, I don't really like this _dataBuffer property I created. I think it would be better to optionally pass a buffer reference to the convert*Encodings() methods:

    function _convertEncodings($direction, &$buffer=null)
    {
        <snip>
        if ($buffer) {
            $data =& $buffer;
        } else {
            $data =& $this->_data;
        }
        foreach ($data['pages'] as $page_id => $page_content) {
            <snip>
        }
    }
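
The buffer-reference idea above can be sketched as a small self-contained class. This is an illustration under assumptions, not the actual patch: `BufferedContainer`, `latin1ToUtf8()`, `raw()` and the `$written` property are hypothetical names, the conversion is a simple ISO-8859-1 to UTF-8 stand-in, and the syntax is current PHP.

```php
<?php
// Hypothetical stand-in for the conversion step: ISO-8859-1 -> UTF-8, byte by byte.
function latin1ToUtf8(string $s): string
{
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $b = ord($s[$i]);
        $out .= $b < 0x80
            ? chr($b)
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

class BufferedContainer
{
    private $data = ['pages' => []];
    public $written = []; // stands in for what would be serialized to the XML file

    public function add(string $page, string $key, string $value): void
    {
        $this->data['pages'][$page][$key] = $value;
        $this->save();
    }

    // Converts $buffer when one is passed, otherwise the object's own data.
    private function convertEncodings(?array &$buffer = null): void
    {
        if ($buffer !== null) {
            $data =& $buffer;
        } else {
            $data =& $this->data;
        }
        foreach ($data['pages'] as $pageId => $strings) {
            foreach ($strings as $key => $value) {
                $data['pages'][$pageId][$key] = latin1ToUtf8($value);
            }
        }
    }

    private function save(): void
    {
        $buffer = $this->data;            // array assignment copies by value
        $this->convertEncodings($buffer); // only the copy is converted
        $this->written = $buffer;
    }

    public function raw(string $page, string $key): string
    {
        return $this->data['pages'][$page][$key];
    }
}

$c = new BufferedContainer();
$c->add('p', 'a', "\xE9"); // "é" in ISO-8859-1
$c->add('p', 'b', 'x');

echo bin2hex($c->raw('p', 'a')), "\n";               // e9: source data untouched
echo bin2hex($c->written['pages']['p']['a']), "\n";  // c3a9: converted exactly once
```

One deliberate difference from the fragment above: the sketch tests `$buffer !== null` rather than `if ($buffer)`, so that an empty array passed as a buffer is not mistaken for "no buffer".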
 [2005-02-17 18:07 UTC] ojai at nerim dot net
Here's an updated patch to optimize saving when save_on_shutdown is set to false:

- it removes the need to reload data after saving it
- it does not use a _dataBuffer property anymore, but a buffer reference which is passed to the convert*Encodings() methods
- it does not contain the "_isScheduledSaving" fix anymore, since I see that you implemented that in CVS

http://samalyse.com/ln/0013.php
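
For readers without access to the patches, the kind of guard that the "_isScheduledSaving" fix refers to might look like the sketch below. It is an assumption about the general technique (a flag that prevents registering the shutdown function more than once), not the code committed to CVS; `ScheduledSaver` and its members are hypothetical names.

```php
<?php
class ScheduledSaver
{
    private $isScheduled = false;
    public $registrations = 0; // counts calls to register_shutdown_function()
    public $saveCount = 0;

    public function scheduleSaving(): void
    {
        if ($this->isScheduled) {
            return; // already registered: do not register a second time
        }
        $this->isScheduled = true;
        $this->registrations++;
        register_shutdown_function([$this, 'saveData']);
    }

    // Must be public so PHP can call it at shutdown.
    public function saveData(): void
    {
        $this->saveCount++;
        echo "saved {$this->saveCount} time(s)\n";
    }
}

$s = new ScheduledSaver();
for ($i = 0; $i < 3; $i++) {
    $s->scheduleSaving(); // e.g. one call per add()
}
echo "registrations: {$s->registrations}\n";
```

Without the flag, each call would add another shutdown callback, and the save routine (with its encoding conversion) would run once per registration at script end.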
 [2005-02-21 16:50 UTC] User who submitted this comment has not confirmed identity