Comments for "XML_GRDDL"

» Comments
  • Philippe Jausions  [2008-03-01 00:26 UTC]

    The total lack of PEAR coding style makes the code rather unpleasant to review at this point. The package seems useful, though. It should have stayed in draft status until at least some of the code had been corrected.
  • Daniel O'Connor  [2008-03-01 01:12 UTC]

    Philippe,
    I'm actually addressing a lot of it now, or at least trying to.

    On the list for today:
    * Fix all whitespace and tabs
    * Implement eRDF / htmlProfile discovery
    * Refactor much of the duplicated code that only pretends to be private
    * Javadocs

    Changes will start flowing in the next 20-30 minutes, viewable at

    http://code.google.com/p/xmlgrddl/source/list

    and I'll happily repost all packages, samples, etc.
  • Daniel O'Connor  [2008-03-01 03:04 UTC]

    Philippe,
    Please find a somewhat more PEAR-ish set of examples, package files, etc.
  • Till Klampaeckel  [2008-03-02 14:21 UTC]

    Hey Daniel,

    Looks pretty interesting. Here is some feedback on a few of your questions, plus a couple of questions of my own at the end.

    On Sat, Mar 1, 2008 at 4:20 AM, Daniel O'Connor <daniel.oconnor@gmail.com> wrote:
    > 2) I have a pre-rendered out RDF/XML document to compare my generated
    > (...)
    > - there's no PHPUnit::assertEqual($xml, $other_xml); which handles all
    > of the ins and outs that a string comparison won't cut it for.

    Can you explain this? Are you talking about the difference in linebreaks and possible whitespace? Or what exactly keeps you from a simple string comparison?

    > 3) Caching - what are good implementation strategies around this?
    > IE, Caching vs development, in one scenario you need it for
    > scale/performance, in another you need it to get out of the way.
    >
    > What existing packages are there that use Caching, and would be
    > considered 'best practice'?

    I'd personally allow this to be a configuration option.

    E.g. add an optional dependency on Cache_Lite and create an object on demand. Also let users configure the Cache_Lite instance (e.g. lifetime, backend/path) and/or inject their own instance of Cache_Lite into the class. For example, if someone is already using Cache_Lite, why create a second instance when your code can reuse theirs?

    How the users configure your class (with or without caching, or supply their own instance of Cache_Lite) is up to them.

    (user = developer)
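
    A minimal sketch of the injection idea described above. `Example_Fetcher` and `Example_ArrayCache` are hypothetical names made up for illustration; the only thing borrowed from Cache_Lite is the shape of its `get($id)` and `save($data, $id)` methods, so a real Cache_Lite instance could presumably be passed in the same way.

```php
<?php
// In-memory stand-in exposing the get()/save() shape of Cache_Lite,
// so the sketch runs without PEAR installed (assumption for illustration).
class Example_ArrayCache
{
    private $items = array();

    // Returns the cached value, or false on a miss (as Cache_Lite::get() does).
    public function get($id)
    {
        return isset($this->items[$id]) ? $this->items[$id] : false;
    }

    public function save($data, $id)
    {
        $this->items[$id] = $data;
        return true;
    }
}

// Hypothetical consumer: caching is optional and the instance is injected.
class Example_Fetcher
{
    private $cache;
    public $fetchCount = 0;

    public function __construct($cache = null)
    {
        $this->cache = $cache;
    }

    public function fetch($url)
    {
        if ($this->cache !== null) {
            $cached = $this->cache->get($url);
            if ($cached !== false) {
                return $cached;
            }
        }

        $data = $this->doFetch($url);

        if ($this->cache !== null) {
            $this->cache->save($data, $url);
        }
        return $data;
    }

    // Placeholder for the real HTTP request.
    protected function doFetch($url)
    {
        $this->fetchCount++;
        return 'contents of ' . $url;
    }
}
```

    With no cache passed in, every call goes through `doFetch()`, which is the "get out of the way" behaviour wanted during development.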

    > 4) Logging & Exception handling
    > At the moment, I'm just throwing plain Exceptions and avoiding causing
    > them. What's the most useful way to provide robust exceptions; but
    > swallow the unimportant ones.

    I think the idea is to make your package's exception class (XML_GRDDL_Exception) extend PEAR_Exception.

    I haven't looked at your code yet, but generally, if your subclasses raise errors, I'd throw a distinct exception class for each of them. And last but not least, define class constants for the error codes.

    For example if you experience a request error:
    PEAR_Exception
    |- XML_GRDDL_Exception
       |- XML_GRDDL_Request_Exception

    I always get a bit carried away with different exception classes, but I like them for a reason:
    try {
        // your code
    } catch (XML_GRDDL_Request_Exception $e) {
        if ($e->getCode() == XML_GRDDL::ERR_NOT_FOUND) {
            // -> we found a 404
        } else {
            throw $e;
        }
    } catch (XML_GRDDL_Exception $e) {
        // -> handle general errors
    } catch (Exception $e) {
        // -> handle all other errors
    }
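
    For completeness, a rough sketch of how the classes behind that hierarchy could be declared. The `Example_GRDDL` class, its `fetch()` method and the value of `ERR_NOT_FOUND` are assumptions for illustration only; the stand-in `PEAR_Exception` is there just so the snippet runs where PEAR is not installed.

```php
<?php
// Stand-in so the sketch runs without PEAR installed; with PEAR available,
// the real PEAR_Exception is used instead.
if (!class_exists('PEAR_Exception')) {
    class PEAR_Exception extends Exception
    {
    }
}

// General package exception.
class XML_GRDDL_Exception extends PEAR_Exception
{
}

// More specific exception for failed requests.
class XML_GRDDL_Request_Exception extends XML_GRDDL_Exception
{
}

// Hypothetical holder for the error codes.
class Example_GRDDL
{
    const ERR_NOT_FOUND = 404; // assumed value

    // Always fails here, to show which exception and code a caller would see.
    public function fetch($url)
    {
        throw new XML_GRDDL_Request_Exception(
            'Not found: ' . $url,
            self::ERR_NOT_FOUND
        );
    }
}
```

    Because the request exception extends the general one, callers who don't care about the difference can catch XML_GRDDL_Exception alone.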

    A logger is also nice - e.g. for debugging. I think it all depends on how complex your class is. I don't remember right now which PEAR package implements one, but (I think) I've seen it before.

    Just my 2 cents. :-)

    Generally I have a few Qs about this thing - from what I read the objective is to convert Microformats into RDF? What do I do with it then? (Just curious. :-)) RDF looks pretty complex also.

    I also found this pretty comprehensive example on your wiki:
    <http://code.google.com/p/xmlgrddl/wiki/UsageExample>

    I see hcal, hcard, etc. - can you list all (currently) supported microformats?

    Did you implement a driver-based architecture to meet the different demands of how people want to query for microformats? (Sorry, but) I couldn't really figure this out.

    Last but not least - (pure curiosity) is GRDDL always about RDF, or can we "expect" other response formats also? Or what do you suggest to use?

    Last but not least, the links to your examples are broken. I think Google recently changed their SVN browser.

    Till
  • Daniel O'Connor  [2008-03-02 14:33 UTC]

    >> 2) I have a pre-rendered out RDF/XML document to compare my generated
    >> (...)
    >> - there's no PHPUnit::assertEqual($xml, $other_xml); which handles all
    >> of the ins and outs that a string comparison won't cut it for.

    > Can you explain this? Are you talking about the difference in linebreaks and possible whitespace? Or what exactly keeps you from a simple string comparison?


    Well, it's not just whitespace, line endings and other minor snafus.

    Document A contains
    <a>
      <b />
      <c />
    </a>

    while document B contains
    <a>
      <c />
      <b />
    </a>

    In essence, they represent the same information, but aren't the same document.

    What I'm currently doing is:
    - Make use of the W3C's Python compliance tools (in progress)
    - Amend test cases for PHP, having manually verified equivalence

    ... which seems to have resolved my problem; but may be Bad Testing Karma.
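
    One possible way to compare such documents in a test, sketched under the assumption that sibling order never matters (true for the A/B example above, not for XML in general). The helper names `example_xml_signature()` and `example_xml_equivalent()` are made up; the approach builds a signature per element with its attributes and children sorted, then compares the two signatures. It ignores comments and surrounding whitespace, and it is not a real RDF graph comparison.

```php
<?php
// Build an order-independent signature for an element (hypothetical helper).
function example_xml_signature(DOMElement $element)
{
    $attributes = array();
    foreach ($element->attributes as $attribute) {
        $attributes[] = '{' . $attribute->namespaceURI . '}'
            . $attribute->localName . '=' . $attribute->nodeValue;
    }
    sort($attributes);

    $children = array();
    $text = '';
    foreach ($element->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $children[] = example_xml_signature($child);
        } elseif ($child instanceof DOMText) {
            // Whitespace around text is treated as insignificant.
            $text .= trim($child->nodeValue);
        }
    }
    // Sorting the child signatures is what makes sibling order irrelevant.
    sort($children);

    $name = '{' . $element->namespaceURI . '}' . $element->localName;

    return '<' . $name . ' ' . implode(' ', $attributes) . '>'
        . $text . implode('', $children) . '</' . $name . '>';
}

// True when both XML strings have the same signature.
function example_xml_equivalent($xmlA, $xmlB)
{
    $a = new DOMDocument();
    $a->loadXML($xmlA);

    $b = new DOMDocument();
    $b->loadXML($xmlB);

    return example_xml_signature($a->documentElement)
        === example_xml_signature($b->documentElement);
}
```

    In a PHPUnit test this could be wrapped as `$this->assertTrue(example_xml_equivalent($expected, $actual));`.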
  • Daniel O'Connor  [2008-03-02 14:42 UTC]

    > I also found this pretty comprehensive example on your wiki:
    > <http://code.google.com/p/xmlgrddl/wiki/UsageExample>

    That's new and will be applicable for the next version :P


    > I see hcal, hcard, etc. - can you list all (currently) supported microformats?
    Yes and no. If you add a profile attribute to the <head> of your document, then XML_GRDDL can and will try to extract them.

    Future plans may include fuzzily detecting microformats and pretending that the page author *did* include the correct profile links.

    See http://www.ibm.com/developerworks/xml/library/x-tipproflink.html




    > Did you implement a driver-based architecture to meet the different demands of how people want to query for microformats? (Sorry, but) I couldn't really figure this out.

    There are two options out there in the free-ish XSL processing world.
    One is what PHP has, which is libxslt based (XSLT 1.0).
    The other is Saxon, which supports XSLT 2.0.

    It's also conceivable that you may wish to plug in a remote, web-based XSLT service.
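
    To make the driver idea concrete, a rough sketch follows. The interface and class names (`Example_TransformDriver`, `Example_XslDriver`, `example_run()`) are hypothetical and not taken from the package; only `DOMDocument` and `XSLTProcessor` are real PHP classes, and the latter needs the xsl extension.

```php
<?php
// Hypothetical driver contract: apply a stylesheet to a document, return the result.
interface Example_TransformDriver
{
    public function transform($documentXml, $stylesheetXml);
}

// Driver backed by PHP's xsl extension (libxslt, XSLT 1.0).
class Example_XslDriver implements Example_TransformDriver
{
    public function transform($documentXml, $stylesheetXml)
    {
        $document = new DOMDocument();
        $document->loadXML($documentXml);

        $stylesheet = new DOMDocument();
        $stylesheet->loadXML($stylesheetXml);

        $processor = new XSLTProcessor();
        $processor->importStylesheet($stylesheet);

        return $processor->transformToXml($document);
    }
}

// The caller depends only on the interface, so a Saxon-based or
// remote-service driver could be swapped in without changing this code.
function example_run(Example_TransformDriver $driver, $documentXml, $stylesheetXml)
{
    return $driver->transform($documentXml, $stylesheetXml);
}
```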



    > Last but not least - (pure curiosity) is GRDDL always about RDF, or can we "expect" other response formats also? Or what do you suggest to use?

    RDF/XML is what the GRDDL spec recommends and, at the moment, it is the initial goal of this package.

    There are plenty of other possible ways you can serialize the structured information.

    If someone wrote a transformation for 'hcard to json' and it was widespread, then it's quite possible you could write a simple driver to handle that.

    The main tasks of this package are:
    - fetch me a document
    - find the transformations listed in the GRDDL fashion
    - fetch those transformations for me
    - apply them
    - merge the resulting documents
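
    As an illustration of the "find" step only, here is a small sketch for plain XML documents, where GRDDL names the transformations in a `transformation` attribute (namespace `http://www.w3.org/2003/g/data-view#`) on the root element. The function name is made up, and this is not the package's actual code. Profile-based discovery for XHTML, relative URI resolution, and the fetch/apply/merge steps are left out.

```php
<?php
// Namespace the GRDDL spec uses for the transformation attribute.
define('EXAMPLE_GRDDL_NS', 'http://www.w3.org/2003/g/data-view#');

// Hypothetical helper: list the transformation URIs named on the root element.
function example_discover_transformations($xml)
{
    $document = new DOMDocument();
    $document->loadXML($xml);

    // Returns an empty string when the attribute is absent.
    $value = $document->documentElement->getAttributeNS(
        EXAMPLE_GRDDL_NS,
        'transformation'
    );

    // The attribute holds a whitespace-separated list of URIs.
    return preg_split('/\s+/', trim($value), -1, PREG_SPLIT_NO_EMPTY);
}
```

    Each URI returned would then be fetched and applied in turn, and the resulting documents merged.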
  • Daniel O'Connor  [2008-03-02 14:46 UTC]

    > Last but not least, the links to your examples are broken. I think Google recently changed their SVN browser.

    Should be fixed now.

    See also the nice example you quoted:

    http://code.google.com/p/xmlgrddl/wiki/UsageExample