
Bug #11892 File_CSV performance severely hurt
Submitted: 2007-08-22 19:37 UTC Modified: 2008-09-21 16:08 UTC
From: razzari Assigned: dufuz
Status: Closed Package: File_CSV
PHP Version: 4.4.6 OS: Windows 2003
Roadmaps: (Not assigned)    


 [2007-08-22 19:37 UTC] razzari (Manuel Razzari)
Description:
------------
With File_CSV-1.2.2 I was reading a CSV with 100.000 rows and 2 columns in 4 seconds. Now with File_CSV-1.3.0 it's taking 45 seconds. Meaning it's about 10x slower.

Test script:
---------------
// Basically run File_CSV against a large file where you can see the performance hit
$file = '100000-rows__2-columns.csv';
$conf = File_CSV::discoverFormat($file);
$_start_time = time();
while ($row = File_CSV::read($file, $conf)) {
    // some random process
    echo substr($row[0], 0, 1);
}
echo '<br>Processing time: ', time() - $_start_time, ' seconds.';

Expected result:
----------------
Processing time: 4 seconds.

Actual result:
--------------
Processing time: 45 seconds.
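The test script above measures with time(), which only has one-second resolution. A minimal sketch of a finer-grained timer follows, assuming PHP 5 or later (microtime(true) is not available on the 4.4 series this report was filed against). The names time_callback and sample_workload are hypothetical helpers for illustration and are not part of File_CSV; the workload is a stand-in for the File_CSV::read() loop.

```php
<?php
// Hypothetical helper (not part of File_CSV): runs a callback and returns
// the elapsed wall-clock time in seconds as a float.
// Assumes PHP 5+, because microtime(true) returns a float only there.
function time_callback($callback, $args = array())
{
    $start = microtime(true);
    call_user_func_array($callback, $args);
    return microtime(true) - $start;
}

// Stand-in workload; swap in the File_CSV::read() loop from the test script.
function sample_workload($iterations)
{
    $out = '';
    for ($i = 0; $i < $iterations; $i++) {
        $out = substr('foo', 0, 1);
    }
    return $out;
}

$elapsed = time_callback('sample_workload', array(100000));
printf("Processing time: %.3f seconds.\n", $elapsed);
```

Sub-second timings make it easier to compare runs that differ by only a second or two, as several of the later measurements in this thread do.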

Comments

 [2007-09-09 09:07 UTC] dufuz (Helgi Þormar)
Does the performance change if you don't use discoverFormat (i.e. write out the config yourself in an array)?
 [2007-09-10 09:16 UTC] razzari (Manuel Razzari)
In *1.2.2*, if I use this it takes 4 seconds:
$conf = array('fields' => 3, 'sep' => ',', 'quote' => NULL);
While if I use this, it takes 40 seconds:
$conf = array('fields' => 3, 'sep' => ',', 'quote' => '"');
In *1.3.0* both take 40 seconds.
 [2007-11-25 10:24 UTC] dufuz (Helgi Þormar Þorbjörnsson)
Thanks for the info, it was very helpful. I feared the slowdown was because we made the regex in discoverFormat a bit more complex and thus potentially slower, but since it's in the parsing code it should (hopefully) be easier to track down.
 [2008-01-01 20:31 UTC] dufuz (Helgi Þormar Þorbjörnsson)
Are you sure about that slowdown? I just created a 2-column, 100000-line file and parsed it in roughly 5 seconds on Windows Vista using File 1.3.0. I used this fairly simple snippet:
<?php
$delim = '"';
$data = '';
for ($i = 0; $i < 100000; $i++) {
    $data .= $delim . 'foo' . $delim . ',' . $delim . 'bar' . $delim . "\n";
}
file_put_contents('100000-rows__2-columns.csv', $data);
And then I just ran your test script. Can you do that for me? The only change I made was $foo = substr($row[0], 0, 1); since I didn't want to echo things out :)
Based on what you said, using a double quote should take 40 seconds, even on the old File, but when I ran with NULL as the quote character (just change $delim to null to generate the file) I got 3 seconds. I know there's probably a lot more involved in your process, but I'm still a bit stumped here. I'm testing on PHP 5.2.5, though; is it possible for you to try on 4.4 and 5.2 to see if there is any difference?
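For comparing the quoted and unquoted cases side by side, the generator snippet above could be parameterized roughly as follows. This is only a sketch: generate_csv is a hypothetical helper name, the temp-file paths are illustrative, and it assumes PHP 5.2.1 or later for sys_get_temp_dir(). It writes line by line instead of building the whole file in one string.

```php
<?php
// Hypothetical helper: writes $rows lines of foo,bar to $path, wrapping
// each field in $quote. Pass null or '' to generate unquoted fields.
function generate_csv($path, $rows, $quote = '"')
{
    $quote = (string) $quote;
    $line = $quote . 'foo' . $quote . ',' . $quote . 'bar' . $quote . "\n";

    $fh = fopen($path, 'w');
    if ($fh === false) {
        return false;
    }
    for ($i = 0; $i < $rows; $i++) {
        fwrite($fh, $line);
    }
    fclose($fh);
    return true;
}

// Illustrative usage: one quoted file and one unquoted file.
$quotedFile = tempnam(sys_get_temp_dir(), 'csv');
$plainFile = tempnam(sys_get_temp_dir(), 'csv');
generate_csv($quotedFile, 1000, '"');
generate_csv($plainFile, 1000, null);
```

Running the same test script against both files should show whether the quote handling alone accounts for the difference.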
 [2008-01-01 20:36 UTC] dufuz (Helgi Þormar Þorbjörnsson)
I just wanted to note for myself: I ran 1 million 2-column records on my laptop and it took 56 seconds, even with the discoverFormat function. Not too bad; I'm still puzzled why Manuel is having these problems.
 [2008-01-05 19:21 UTC] dufuz (Helgi Þormar Þorbjörnsson)
I just now had 2 guys reporting 22 seconds; one has PHP 5.1.6 and the other has 5.2.3. This is weird: not as slow as your result, Manuel, but still slower than what I have. I shall investigate further.
 [2008-01-05 20:14 UTC] dufuz (Helgi Þormar Þorbjörnsson)
I have more test results: 4.4: 12 secs, 5.2.5: 6 and 6. I've started to wonder if it might be the hardware in question. All of the runs above were performed on rather new hardware, so my question is: what are the specs of the hardware you are running on? I will try to find out where the bottleneck is, since there is clearly one for people on slower hardware.
 [2008-06-26 20:06 UTC] dufuz (Helgi Þormar Þorbjörnsson)
Did you ever check if xdebug was turned on? I was playing around with speed-related things and noticed a drop from 13 seconds to 5 seconds when I disabled the whole xdebug extension (due to all the extra things it's doing in the background).
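A quick way to check this from the test script itself is sketched below. It relies on the built-in extension_loaded() and assumes the extension registers under the name 'xdebug'; the variable name and messages are illustrative only.

```php
<?php
// extension_loaded() is a PHP built-in; 'xdebug' is assumed to be the
// name the extension registers under.
$xdebugLoaded = extension_loaded('xdebug');

echo $xdebugLoaded
    ? "xdebug is loaded; timings may be inflated.\n"
    : "xdebug is not loaded.\n";
```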
 [2008-06-27 09:59 UTC] razzari (Manuel Razzari)
The machine was a staging server, so I never had xdebug installed at all.
 [2008-06-29 04:36 UTC] dufuz (Helgi Þormar Þorbjörnsson)
Hmm, weird. I'm still having a really hard time reproducing it, but I did manage to improve performance on my machine. Could you do me a quick favor and run the performance check again on some of your machines? I can provide you with a speed test and a CSV generator if you need them.
 [2008-09-21 16:08 UTC] dufuz (Helgi Þormar Þorbjörnsson)
This bug has been fixed in CVS. If this was a documentation problem, the fix will appear on pear.php.net by the end of next Sunday (CET). If this was a problem with the pear.php.net website, the change should be live shortly. Otherwise, the fix will appear in the package's next release. Thank you for the report and for helping us make PEAR better.

I'm gonna call this one closed since I got performance up on my computer, even exceeding what 1.2 had.