Proposal for "ScriptReorganizer"



Introduction

Looking at open source applications and libraries, a lot of people complain about bloat, not least the size of the scripts in bytes.

Ever since I started using PHP in 2003 I have been asking myself: why is source code released in only one version? Should the best practice not be a two-way deployment? The goals being (1) Good Looking Code (TM), i.e. structured (one class per file, if the OOP path is followed ;-), well documented and thoroughly unit tested, and (2) compact, fast-running, optimized code.

How can programmers be motivated and supported in pursuing goal 1 without having to sacrifice goal 2?

One aspect of optimization is working directly on the byte code and/or the source code. On the byte code level several solutions are already available, as are caching techniques. But what about source code reorganization? AFAIK - please correct me if I'm wrong - the (open source) contributions have been sparse so far.

The simple equation Execution Time = Loading Time + Parsing Time, which ignores external constraints such as DB responsiveness, leads to the logical conclusion that the loading time (incl. the import of every single file needed for execution) is the place to start tweaking.

However, playing devil's advocate, one could argue that the Zend Engine caches the requested scripts for future use and that the parser skips all comments. So where is the added value? Brainstorming, the following pros come to mind: the reduced file size and the eventual "packaging" must have an impact, even if minimal, on the memory footprint, on the parsing of the source code and on the number of file I/O operations to execute, although PHP is already very fast in this respect.

Here we go!

To Make One's Point

ScriptReorganizer focuses exclusively on the file size aspect of optimization. The library/tool can reorganize source code in different (incremental) ways.

On a conceptual level the goal is to create a new file Type by applying a specific reorganization Strategy.

Along these lines it makes sense to offer either a one-to-one Script or a many-to-one Library restructuring. Think of Library as a simple way to package existing (third party) libraries for deployment - PHAR is alive! I know, just bear with me, I will pick up this topic later on.

Having sorted out the file types to create, the following strategies spring to mind: Route, the basic strategy, which leaves the code base almost untouched; Quiet, the standard one, which additionally strips all comments; and last but not least the advanced Pack strategy, which is comparable to the PHP function php_strip_whitespace. For more information see the documentation comments.
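To make the three levels more tangible, here is a minimal sketch of comparable transformations built only on PHP's tokenizer extension and php_strip_whitespace. The function names routeLike, quietLike and packLike are hypothetical and are not the package's API; the real strategies may well work differently.

```php
<?php
// Hypothetical helpers for illustration only; not ScriptReorganizer's API.

// "Route"-like: leave the code as is, apart from normalizing line endings.
function routeLike($source)
{
    return str_replace(array("\r\n", "\r"), "\n", $source);
}

// "Quiet"-like: additionally drop all comments, using the tokenizer.
function quietLike($source)
{
    $result = '';
    foreach (token_get_all(routeLike($source)) as $token) {
        if (!is_array($token)) {
            $result .= $token; // single-character token such as ';' or '{'
        } elseif ($token[0] === T_COMMENT || $token[0] === T_DOC_COMMENT) {
            // Leave a separator behind so neighbouring tokens cannot merge.
            $result .= (substr($token[1], -1) === "\n") ? "\n" : ' ';
        } else {
            $result .= $token[1];
        }
    }
    return $result;
}

// "Pack"-like: strip comments and collapse whitespace.
// php_strip_whitespace() expects a file name, hence the temporary file.
function packLike($source)
{
    $file = tempnam(sys_get_temp_dir(), 'sr');
    file_put_contents($file, $source);
    $packed = php_strip_whitespace($file);
    unlink($file);
    return $packed;
}

$source = "<?php\n/** Adds two numbers. */\nfunction add(\$a, \$b)\n{\n"
        . "    // sum\n    return \$a + \$b;\n}\n";

$route = routeLike($source);
$quiet = quietLike($source);
$pack  = packLike($source);

printf("route: %d, quiet: %d, pack: %d bytes\n",
    strlen($route), strlen($quiet), strlen($pack));
```

Each step should yield a result no larger than the previous one, which is the incremental idea behind the layering.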

What's the reasoning behind this layered approach? The main idea is (1) to respect any coding rules already in place - e.g. only stripping comments is allowed - and (2) to let the user grow more confident about the tool, step by step. Does it really leave the purpose of the code untouched when one strategy is applied after the other?

So far so good. The Strategy Pattern is in place. But what about attaching additional responsibilities to a Type dynamically to achieve even more flexibility?

The package contains two simple examples of Decorators - adding a header and/or footer after the reorganization process, e.g. for copyright notes etc. - to showcase the usage.
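How a strategy and a decorator can play together is sketched below in a strongly simplified form. All interface and class names (SketchStrategy, SketchType, SketchScript, SketchTrimStrategy, SketchAddFooter) are hypothetical stand-ins and not the package's classes, and the trimming strategy is only a placeholder for a real reorganization.

```php
<?php
// Hypothetical, simplified stand-ins; not the package's classes.

interface SketchStrategy
{
    public function reformat($content);
}

// Placeholder strategy: strip trailing blanks and tabs from every line.
class SketchTrimStrategy implements SketchStrategy
{
    public function reformat($content)
    {
        return preg_replace('/[ \t]+$/m', '', $content);
    }
}

interface SketchType
{
    public function setContent($content);
    public function reformat();
    public function getContent();
}

// A type delegates the actual reorganization to its strategy.
class SketchScript implements SketchType
{
    private $strategy;
    private $content = '';

    public function __construct(SketchStrategy $strategy)
    {
        $this->strategy = $strategy;
    }

    public function setContent($content)
    {
        $this->content = $content;
    }

    public function reformat()
    {
        $this->content = $this->strategy->reformat($this->content);
    }

    public function getContent()
    {
        return $this->content;
    }
}

// A decorator wraps any type and adds behaviour after the reorganization.
class SketchAddFooter implements SketchType
{
    private $type;
    private $footer;

    public function __construct(SketchType $type, $footer)
    {
        $this->type = $type;
        $this->footer = $footer;
    }

    public function setContent($content)
    {
        $this->type->setContent($content);
    }

    public function reformat()
    {
        $this->type->reformat();
        $this->type->setContent($this->type->getContent() . $this->footer);
    }

    public function getContent()
    {
        return $this->type->getContent();
    }
}

$script = new SketchAddFooter(
    new SketchScript(new SketchTrimStrategy()),
    "// (c) Example\n"
);
$script->setContent("<?php \necho 'hi';   \n");
$script->reformat();
$result = $script->getContent();
echo $result;
```

Because the decorator implements the same interface as the type it wraps, several decorators can be nested without the calling code having to change.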

The third decorator, Pharize, is the one that (hopefully) adds value to the work of David Shafik and Greg Beaver. Try reorganizing a script with ScriptReorganizer and subsequently creating the corresponding PHP_Archive! Or just use the Pharize wrapper to achieve the same result... Kudos to David and Greg.

Please note: before chaining several decorators in one reorganization process, check the respective documentation for any usage restrictions or constraints.

To get the big picture see the sketch.

Example

Convert a script file and all included/required files to a single packed library file, to which an overriding header is added:

<?php

require_once 'ScriptReorganizer/Strategy/Pack.php';
require_once 'ScriptReorganizer/Type/Library.php';
require_once 'ScriptReorganizer/Type/Decorator/AddHeader.php';

$headerToAdd = '/* I was here ... */' . PHP_EOL . PHP_EOL;

$library = new ScriptReorganizer_Type_Decorator_AddHeader(
    new ScriptReorganizer_Type_Library( new ScriptReorganizer_Strategy_Pack )
);

$library->load( 'sample.php' );
$library->reformat( $headerToAdd );
$library->save( 'packedLibraryWithHeader.php' );

?>

Now some figures:

If all 15 relevant (well documented) classes of release 0.1.0, with a total size of 53,698 bytes, are packed into a single library, the size is reduced by 81.35%, i.e. Library.phps is only 18.65% - 10,014 bytes - of the size noted above. The header added during the reorganization process is still included in the result, see screoBuild.phps and screoLibrary.phps.
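The percentages follow directly from the two byte counts, as this short calculation shows (the variable names are mine):

```php
<?php
$originalBytes = 53698; // total size of the 15 classes, as stated above
$packedBytes   = 10014; // size of the packed library, as stated above

// Share of the original size that remains, and the resulting reduction.
$remaining = round($packedBytes / $originalBytes * 100, 2);
$reduction = round(100 - $packedBytes / $originalBytes * 100, 2);

printf("remaining: %.2f%%, reduction: %.2f%%\n", $remaining, $reduction);
// prints "remaining: 18.65%, reduction: 81.35%"
```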

To check the correctness of the figures stated above, the import statements in ScriptReorganizer_Exception and ScriptReorganizer_Type_Decorator_Pharize must be changed to dynamic ones, i.e. require_once 'PEAR/' . 'Exception.php'; and require_once 'PHP/Archive/' . 'Creator.php'; respectively. Otherwise, these external libraries would be packed into the file too.
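Why a concatenated path escapes the packaging can be illustrated with a small tokenizer-based sketch. It assumes that only imports whose argument is a single string literal are resolved statically; the helper staticIncludes is hypothetical and is not the package's actual detection logic.

```php
<?php
// Hypothetical helper, not the package's code: collect the paths of all
// include/require statements whose argument is one plain string literal.
function staticIncludes($source)
{
    $keywords = array(T_INCLUDE, T_INCLUDE_ONCE, T_REQUIRE, T_REQUIRE_ONCE);
    $ignored  = array(T_WHITESPACE, T_COMMENT, T_DOC_COMMENT);

    // Drop whitespace and comments so statements can be matched by position.
    $tokens = array();
    foreach (token_get_all($source) as $token) {
        if (!is_array($token) || !in_array($token[0], $ignored, true)) {
            $tokens[] = $token;
        }
    }

    $paths = array();
    for ($i = 0; $i + 2 < count($tokens); $i++) {
        $keyword  = $tokens[$i];
        $argument = $tokens[$i + 1];
        $end      = $tokens[$i + 2];

        if (is_array($keyword) && in_array($keyword[0], $keywords, true)
            && is_array($argument)
            && $argument[0] === T_CONSTANT_ENCAPSED_STRING
            && $end === ';'
        ) {
            $paths[] = substr($argument[1], 1, -1); // strip the quotes
        }
    }
    return $paths;
}

$source = "<?php\n"
        . "require_once 'ScriptReorganizer/Strategy/Pack.php';\n"
        . "require_once 'PEAR/' . 'Exception.php';\n";

print_r(staticIncludes($source));
```

Only the first statement is reported here: the second one is an expression rather than a single literal, so a static scan of this kind leaves it alone.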


For more examples see the unit tests in the PEAR Package File.

P.S.: To run the unit tests PHPUnit2_Extensions_TestFileLoader must be installed - the TGZ is included in the PEAR Package File. I have contacted Sebastian Bergmann and proposed this extension to be added to the official release - he is considering it.

Known Issues

For the time being, the package works correctly only with "pure" PHP script files. "Mixed" PHP code will be addressed in the near future.

Warning: With code optimized by ScriptReorganizer, tracking down the error messages reported by the PHP engine will definitely become cumbersome when the advanced mode of the Pack strategy is applied. The reason: all statements end up on one single line!

It is crucial to thoroughly test (not only unit test) the code after optimizing it with this package and before building a release to deploy.
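As a cheap additional sanity check, which does not replace real testing, the token streams of the original and the optimized source can be compared while ignoring whitespace and comments. The sketch below uses php_strip_whitespace as a stand-in for the optimization step; the helper significantTokens is hypothetical and not part of the package.

```php
<?php
// Hypothetical helper, not part of the package: list all tokens of a
// source string except whitespace and comments.
function significantTokens($source)
{
    $ignored = array(T_WHITESPACE, T_COMMENT, T_DOC_COMMENT);
    $tokens  = array();

    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            $tokens[] = $token; // single-character token such as ';'
        } elseif (!in_array($token[0], $ignored, true)) {
            // The open tag may carry a trailing newline or blank; ignore it.
            $text = ($token[0] === T_OPEN_TAG) ? rtrim($token[1]) : $token[1];
            $tokens[] = token_name($token[0]) . ':' . $text;
        }
    }
    return $tokens;
}

$original = "<?php\n// greet\nfunction greet(\$name)\n{\n"
          . "    return 'Hello, ' . \$name; /* inline */\n}\n";

// Stand-in for the optimization step: php_strip_whitespace() needs a file.
$file = tempnam(sys_get_temp_dir(), 'sr');
file_put_contents($file, $original);
$packed = php_strip_whitespace($file);
unlink($file);

$same = significantTokens($original) === significantTokens($packed);
echo $same ? "token streams match\n" : "token streams differ\n";
```

A mismatch would point to a change in the code itself rather than in its layout, and would be a reason to stop before deploying.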

Suggested pack practice to follow:

  1. Reorganization of the source by applying the default pack mode.
  2. Running of all tests.
  3. Reorganization of the source by applying the advanced pack mode, followed by the final deployment.


P.S.: I am currently thinking about a Façade/Factory Pattern that I would like to implement to further simplify the package interface, as well as about a front-end script/batch file for stand-alone use.

Conclusions

ScriptReorganizer will have only a minor impact on small projects with few files. For medium to complex projects, however, the results should be noticeable.

But most importantly, this package promotes the best practice of a two-way deployment and complies with the basic requirement for a PEAR library/tool: to add value to already existing optimization and performance-enhancing applications.

Dependencies
  • PHP >= 5.0.2
  • PEAR_Exception
  • PHP_Archive >= 0.5.0 (optional)
  • PHPUnit2 >= 2.2.0
  • PHPUnit2_Extensions_TestFileLoader >= 0.3.0 (unofficial contribution)
Timeline
  • First Draft: 2005-05-06
  • Proposal: 2005-05-15
  • Call for Votes: 2005-05-23
  • Voting Extended: 2005-05-31