Package home | Report new bug | New search | Development Roadmap Status: Open | Feedback | All | Closed Since Version 1.4.1

Request #16783 Better validation when creating a field
Submitted: 2009-11-12 22:42 UTC
From: frozensolid Assigned: dbs
Status: Closed Package: File_MARC (version 0.4.2)
PHP Version: Irrelevant OS: N/A
Roadmaps: (Not assigned)    
Subscription  


 [2009-11-12 22:42 UTC] frozensolid (Matt Schraeder)
Description: ------------ We deal with large collection files and often these files are poorly made and do not follow proper MARC standards. Most often will be an invalid tag (for example: "30-"). On lines 319 and 349 of MARC.php, new Data Fields are created without validating the inputs, so the follow error is thrown: Tag "30-" is not a valid tag. in <b>/usr/share/php/File/MARC/Data_Field.php</b> on line <b>105</b> Before creating the field, the Tag and Indicators should be validated, and if invalid throw warnings and ignore the bad data.

Comments

 [2009-11-12 23:36 UTC] dbs (Dan Scott)
-Status: Open +Status: Feedback
Could you please attach some example bad MARC records so that I can put together a test and patch for your problem?
 [2009-11-13 01:44 UTC] frozensolid (Matt Schraeder)
I uploaded the file to http://www.mo.btsb.com/test/test.mrc The following test code will cause the error just by looping through the records. The last working record is: Everything I needed to know about being a girl I learned from Judy Blume / _cedited by Jennifer O'Connell. $testfile = 'test.mrc'; if (file_exists($testfile)) { $MARCfile = new File_MARC($testfile); while ($marcRecord = $MARCfile->next()) { $field = $marcRecord->getField('245'); echo $field->__toString().'<br>'; } } The record obviously has a lot wrong with it, but simply commenting out throwing the exception in the Field constructor allows our scripts to handle the necessary parts of the record.
 [2009-11-13 01:53 UTC] dbs (Dan Scott)
I can't resolve that address. Can you double-check it, please?
 [2009-11-13 02:09 UTC] frozensolid (Matt Schraeder)
 [2009-11-13 03:28 UTC] dbs (Dan Scott)
Hmm. Comparing the output of File_MARC 0.4.2 to yaz-marcdump and pymarc, even though File_MARC throws an exception for the really egregiously formatted record in question (which seems to be less about having a '30-' tag than it is about having a horrendously corrupted leader), it at least continues to soldier on and produces output for the following record. In comparison, both pymarc and yaz-marcdump terminate after hitting the egregious record in question and don't dump the output of the following record. So, I don't think File_MARC is doing such a bad job of dealing with invalid input data. Throwing an exception is probably the right thing to do.
 [2009-11-13 03:51 UTC] dbs (Dan Scott)
-Status: Feedback +Status: Closed -Assigned To: +Assigned To: dbs
Thank you for your bug report. This issue has been fixed in the latest released version of the package, which you can download at http://pear.php.net/get/ Changed the behaviour to catch the exception that was thrown by MARC_Data_Field and add it as a warning to the MARC_Record object. Added a new test for the specific case mentioned in the bug report (invalid tag name). The sample provided by the bug reporter appears to be a different case entirely and isn't handled by other standard MARC parsing tools. Pushing out release 0.4.3 as a result.
 [2009-11-13 20:19 UTC] frozensolid (Matt Schraeder)
Confirmed fix. When File_MARC hits the file with the invalid tag it's carrying on just fine and otherwise seems to be working. I must have messed up when I was putting together the test.mrc file so my apologies for that. It was throwing the error I expected it to, but I didn't check that the records after that were formatted properly.
 [2010-03-04 03:50 UTC] frozensolid (Matt Schraeder)
I just stumbled upon a MARC file with a control field that has this same problem. Would it be possible to add the same check when making a Control Field?
 [2010-03-04 10:19 UTC] dbs (Dan Scott)
Please open a new bug and attach an example bad MARC record, so that I can target the bug against the correct package version and also have something to test against. Otherwise, I'm just making stuff up.