
Bug #21219 toXML method modifies the MARC leader
Submitted: 2017-06-15 19:01 UTC
From: aroussos Assigned:
Status: Open Package: File_MARC (version 1.1.5)
PHP Version: 5.6.30 OS: Debian (jessie)
Roadmaps: (Not assigned)    


 [2017-06-15 19:01 UTC] aroussos (Andreas Roussos)
Description:
------------
We recently discovered that a number of authority records in our Koha ILS installation have the following glitch: an extra space character (" ") appears at the end of the MARC leader. I wrote a simple script that uses PHP's rtrim() to remove the trailing whitespace, however the toXML() call at the end of my script appears to modify the leader by inserting the string "na" at positions 05 and 06. Is this intentional?

Test script:
---------------
As per the bug submission guidelines, my test script is longer than 20 lines of code, so here's a link to it: https://pastebin.com/GbqVWdcb

Expected result:
----------------
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>00143 2200073 4500</leader>
    <controlfield tag="001">260</controlfield>
    <datafield tag="100" ind1=" " ind2=" ">
      <subfield code="a">20120402afrey50 ba0</subfield>
    </datafield>
    <datafield tag="152" ind1=" " ind2=" ">
      <subfield code="b">PERSO_NAME</subfield>
    </datafield>
    <datafield tag="200" ind1=" " ind2=" ">
      <subfield code="a">Severin</subfield>
      <subfield code="b">Georgii</subfield>
    </datafield>
  </record>
</collection>

Actual result:
--------------
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>00143na 2200073 4500</leader>
    <controlfield tag="001">260</controlfield>
    <datafield tag="100" ind1=" " ind2=" ">
      <subfield code="a">20120402afrey50 ba0</subfield>
    </datafield>
    <datafield tag="152" ind1=" " ind2=" ">
      <subfield code="b">PERSO_NAME</subfield>
    </datafield>
    <datafield tag="200" ind1=" " ind2=" ">
      <subfield code="a">Severin</subfield>
      <subfield code="b">Georgii</subfield>
    </datafield>
  </record>
</collection>
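
The difference between the two leaders can be pinpointed by comparing them position by position. The following is only a rough sketch in Python, not the linked test script. It assumes 24-character leaders in which the runs of blanks have been reconstructed from the fixed MARC leader layout (the results shown above appear with repeated blanks collapsed). The names `leader_diff`, `expected`, `actual` and `raw` are hypothetical.

```python
# Assumption: blanks reconstructed so that each leader is 24 characters long.
expected = "00143" + " " * 5 + "2200073" + " " * 3 + "4500"
actual = "00143na" + " " * 3 + "2200073" + " " * 3 + "4500"

# A leader with the stray trailing blank described in the report,
# cleaned up the same way PHP's rtrim() would do it.
raw = expected + " "
trimmed = raw.rstrip()


def leader_diff(a, b):
    """Return (position, char_in_a, char_in_b) for every differing position."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(leader_diff(trimmed, actual))
# [(5, ' ', 'n'), (6, ' ', 'a')]
```

Under these assumptions only positions 05 and 06 change, which matches the "na" seen in the actual result.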

Comments

 [2017-06-16 13:45 UTC] dbs (Dan Scott)
See the comments at https://github.com/pear/File_MARC/blob/master/File/MARC/Record.php#L629:

    // MARCXML schema has some strict requirements
    // We'll set reasonable defaults to avoid invalid MARCXML

So the current behaviour treats a blank 05 / 06 as meaning "set reasonable defaults!" and sets 05=n ("new") and 06=a ("bibliographic - language material").

Checking http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd to see if that still holds up, we find the regex for a valid leader to be:

    "[\d ]{5}[\dA-Za-z ]{1}[\dA-Za-z]{1}[\dA-Za-z ]{3}(2| )(2| )[\d ]{5}[\dA-Za-z ]{3}(4500| )"

So leader[05] is allowed to be blank, but leader[06] *has* to be something (a digit or a letter).

Given the situation, I think the current result of setting "na" in the absence of any input is acceptable, but we should raise some warnings rather than just silently setting those values, as in your case leader[06] being "a" for authority records is just wrong.

It would be good to document this too, so that as part of your processing step you could explicitly set leader[06] to "z", thus preventing File_MARC from falling back to a default value.
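
To illustrate the reasoning above, here is a minimal sketch in Python that checks a leader against the pattern quoted from the schema. Several things are assumptions rather than part of the thread: the final alternative is written as four blanks so that it covers positions 20-23 (the quoted pattern appears to have lost repeated blanks), the sample leader is a 24-character reconstruction of the reporter's record, and the helper names `is_valid_leader` and `set_position` are hypothetical. This is not File_MARC code.

```python
import re

# Leader pattern as quoted from MARC21slim.xsd in the comment above.
# Assumption: the last alternative is four blanks (positions 20-23).
LEADER_PATTERN = re.compile(
    r"[\d ]{5}[\dA-Za-z ]{1}[\dA-Za-z]{1}[\dA-Za-z ]{3}"
    r"(2| )(2| )[\d ]{5}[\dA-Za-z ]{3}(4500|    )"
)


def is_valid_leader(leader):
    """True if the whole leader matches the schema pattern."""
    return LEADER_PATTERN.fullmatch(leader) is not None


def set_position(leader, pos, value):
    """Return a copy of the leader with one character replaced at pos."""
    return leader[:pos] + value + leader[pos + 1:]


# Assumption: the reporter's leader with blanks restored to 24 characters.
blank_leader = "00143" + " " * 5 + "2200073" + " " * 3 + "4500"

# Blank 05 is fine, but blank 06 is rejected by the pattern.
print(is_valid_leader(blank_leader))                        # False

# Setting only leader[06] to "z" is enough to satisfy the pattern.
print(is_valid_leader(set_position(blank_leader, 6, "z")))  # True
```

In this sketch, setting leader[06] before serialising leaves leader[05] blank and still passes the pattern, which is the workaround suggested for authority records.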