Package home | Report new bug | New search | Development Roadmap Status: Open | Feedback | All | Closed Since Version 3.7.2

Request #12421 Optimize CodeSniffer
Submitted: 2007-11-11 08:52 UTC
From: cweiske Assigned: squiz
Status: Closed Package: PHP_CodeSniffer (version 1.0.0RC1)
PHP Version: Irrelevant OS:
Roadmaps: 1.0.0RC2    
Subscription  


Anyone can comment on a bug. Have a simpler test case? Does it work for you on a different platform? Let us know! Just going to say 'Me too!'? Don't clutter the database with that please !
Your email address:
MUST BE VALID
Solve the problem : 29 + 38 = ?

 
 [2007-11-11 08:52 UTC] cweiske (Christian Weiske)
Description: ------------ Running PHPCS on whole pear takes quite some time, even if only one package at a time is checked. So for large projects, every tiny bit of performance counts. I used xdebug to profile phpcs (and kcachegrind to analyze the output) and found some things that could be optimized. For example, the tokenizer takes 36% of phpcs' running time. For the Date package, 1229 loops are counted in tokenize(). In it, is_array() is called 3 times on the same variable. That is 1229*3 = 3687 calls, 2458 more than needed. When I declare a variable $bArray and use this at the three positions instead of calling is_array every time, phpcs is 300-500ms faster (18.2s-18.3s compared to 18.5-18.8s). While this does not seem to be much, such things add up when checking thousands of php files. So, to make it short, it would be cool if you could profile the phpcs code and make it faster wherever possible :)

Comments

 [2007-11-11 09:05 UTC] cweiske (Christian Weiske)
I was wrong, the 1229 calls were from Event_SignalEmitter. The time was from Date.
 [2007-11-11 09:06 UTC] cweiske (Christian Weiske)
In Date, it's 73095 is_array calls, while it was *3 times before
 [2007-11-11 09:20 UTC] cweiske (Christian Weiske)
With the 3rd version of the patch, the Date package is sniffed one second faster! (17.8s instead of 18.8s)
 [2007-11-11 09:31 UTC] squiz (Greg Sherwood)
Thanks a lot for taking the time to submit these improvements. I've tested them out and they work great! I do use xdebug and cachegrind (windows version) to profile PHP_CodeSniffer, but I've been focused on some individual sniffs that are consuming most of the time checking my code, so I'm glad someone was focused on the core code ;) Thanks again.
 [2007-11-11 09:46 UTC] squiz (Greg Sherwood)
I have a couple of questions. Was there a particular reason for the result of token_name being assigned to a var before being assigned to an array value? I don't see where it is called more than once. It looks like you have a patch missing for CodeSniffer.php as you are now passing the result of is_array to the standardiseToken method. I'm not 100% sure about that change as setting a precedent like that might lead to a lot of additional function arguments being passed around.
 [2007-11-11 10:16 UTC] squiz (Greg Sherwood)
All changes, besides the two I have questions about, are now in CVS.
 [2007-11-11 14:36 UTC] cweiske (Christian Weiske)
> Was there a particular reason for the result > of token_name being assigned to a var before > being assigned to an array value? > I don't see where it is called more than once. I thought it was used once, but in a loop - so it was called again and again because of this loop. > It looks like you have a patch missing for > CodeSniffer.php as you are now passing the > result of is_array to the standardiseToken > method. I'm not 100% sure about that change > as setting a precedent like that might lead > to a lot of additional function arguments > being passed around. Yes, that was that one function. standardiseToken($token, $bArray) It needed one additional change in AbstractPatternSniff in function createTokenPattern($str) $token = PHP_CodeSniffer::standardiseToken($token, is_array($token)); Here is the question if the added unclarity/uncleaness of the code is worth the milliseconds saved. I saw that you committed the code to CVS that still checked "=== true" and "=== false". My patch had the "=== true" added in the initial is_array check, so the checks didn't need to be performed 3 times.
 [2007-11-11 21:43 UTC] squiz (Greg Sherwood)
> I thought it was used once, but in a > loop - so it was called again and > again because of this loop. Sorry, my mistake. Yes, it does work like this and I'll make this change. > I saw that you committed the code to > CVS that still checked "=== true" > and "=== false". My patch had the > "=== true" added in the initial > is_array check, so the checks didn't > need to be performed 3 times. This is my own personal standard, to help reduce my confusion ;) I don't like doing things like: if (!$isArray), I prefer: if ($isArray === false)
 [2007-11-14 00:16 UTC] squiz (Greg Sherwood)
This bug has been fixed in CVS. If this was a documentation problem, the fix will appear on pear.php.net by the end of next Sunday (CET). If this was a problem with the pear.php.net website, the change should be live shortly. Otherwise, the fix will appear in the package's next release. Thank you for the report and for helping us make PEAR better.