Package home | Report new bug | New search | Development Roadmap Status: Open | Feedback | All | Closed Since Version 0.8.6

Bug #14209 Default portability options not suitable for upgrading
Submitted: 2008-06-23 11:08 UTC
From: hschletz Assigned:
Status: Open Package: MDB2_Schema (version 0.8.2)
PHP Version: Irrelevant OS:
Roadmaps: (Not assigned)    
Subscription  


 [2008-06-23 11:08 UTC] hschletz (Holger Schletz)
Description: ------------ I noticed some inconsistent behavior of updateDatabase() where the new schema had empty or missing <default> attributes. Depending on the previous state, the result was either a single space, an empty string or NULL (tested with PostgreSQL). Running a second pass of updateDatabase() with the same schema, some of the values changed again until they finally settled to a stable state. This turned out to be a sideeffect of the MDB2_PORTABILITY_EMPTY_TO_NULL setting. Unsetting this flag gave the expected result (empty string for <default></default> or NULL when no default attribute was specified). This portability option is really inadequate here. updateDatabase() should temporarily disable it. Although i have not tested this with Oracle (according to the MDB2 documentation Oracle's behavior is the reason for this option), this should work here too.

Comments

 [2008-07-26 13:44 UTC] doconnor (Daniel O'Connor)
Holger, I don't suppose you cna pin down this behaviour with a simple test case? It's a bit hard for someone inexperienced to get up to speed with just the details provided...
 [2008-08-10 11:26 UTC] hschletz (Holger Schletz)
There are a lot of possible combinations which are difficult to explore completely. To make things worse, each DBMS may behave differently. The following was tested with PostgreSQL 8.1 and MDB2_Schema's 'force_defaults' option set to 'false'. For a basic test case, create a table with the following XML field definition: <field> <name>default_empty_string</name> <type>text</type> <default></default> <notnull>true</notnull> </field> We'd expect the following result: Column | Type | Modifiers ----------------------+-------------------------+---------------------------------------- default_empty_string | character varying(4096) | not null default ''::character varying If we apply the same definition again on the existing table, the column is expected not to be altered. With MDB2_PORTABILITY_EMPTY_TO_NULL disabled, the result is exactly like this. Note that the <notnull> property is required for <default></default>. Otherwise the default value would be NULL instead of '' (possibly another bug?). It is not required for nonempty default values. With MDB2_PORTABILITY_EMPTY_TO_NULL enabled (the default setting), the table is created like this: Column | Type | Modifiers ----------------------+-------------------------+----------------------------------------- default_empty_string | character varying(4096) | not null default ' '::character varying If we apply the same definition again on the existing table we get: Column | Type | Modifiers ----------------------+-------------------------+----------------------------------------- default_empty_string | character varying(4096) | not null This is a stable state which does not change anymore on further upgrades. Unfortunately, it's the opposite of what we wanted. MDB2_PORTABILITY_EMPTY_TO_NULL must be used with care. If the application is not explicitly designed for its behavior (like MDB2_Schema, apparently), it may not behave as expected. Furthermore, the logic applies only to the application, not to the database itself. While this option operates on retrieved data, it has no effect on SQL queries and the way they are processed within the database. But notnull constraints and default values are part of the database's internal logic - if the DBMS distincts empty strings and NULL values, we cannot force it to treat them as equivalent. Since clear rules are vital for data integrity, the ambiguity resulting from the MDB2_PORTABILITY_EMPTY_TO_NULL setting makes it unsuitable for managing the database schema. So what could we do about it? Option 1: For best portability, avoid empty strings as default values. This should be clearly pointed out in the documentation. This is possible if we design a database and application from scratch. When porting an existing database or application to MDB2, things might get more complicated. Option 2: Disallow empty strings completely as default values. If we really need them (with all the caveats), provide an option to re-enable them. Option 3: Disallow the presence of MDB2_PORTABILITY_EMPTY_TO_NULL when creating the MDB2_Schema object. Option 4: Review the MDB2_Schema code for handling of default values and work around side-effects of the MDB2_PORTABILITY_EMPTY_TO_NULL setting. Option 5: Temporarily disable the MDB2_PORTABILITY_EMPTY_TO_NULL setting within the relevant functions and restore the old value on function exit. This should work with all DBMS that distinct empty strings and NULL values. How this would behave with Oracle (the guilty one for all this trouble, according to MDB2 documentation) still has to be investigated. Option 1 is the preferred way to deal with the problem. Option 2 could enforce these measures, but this could break existing applications and schemas (quick workaround possible though, by setting an option.) This also gives us the opportunity to think about what we're doing. Again, a big warning in the documentation could be helpful. Option 3 would save us from bad surprises if we choose to enable empty strings. Again, minor changes to existing code could be necessary if a Database object is passed to the constructor (instead of having one created automatically, in which case the constructor should take care of disabling the setting). Options 4 and 5 are quirks that won't necessarily force changes to existing applications, although we might experience different, possibly erratic behavior if the code relies on the current implementation. Of course it's possible to get everything right without any changes to the MDB2_Schema code, but it took me hours to find out. The suggested changes could help pushing the developer on the right track.
 [2008-10-25 01:09 UTC] ifeghali (Igor Feghali)
Just to note, option #1 would make impossible a fix for Bug #14650.
 [2008-11-23 11:56 UTC] hschletz (Holger Schletz)
Option #1 is meant as a design decision to the database schema, not as a suggested change to the MDB2_Schema code. If someone really needs empty default values (there may be no choice when migrating an existing schema), they could still be used, with all the caveats. Two things should be considered: 1. If you put a NOT NULL constraint on a column and then make up an empty default value just to get arround the constraint, you should ask yourself whether the constraint makes sense at all. An empty string is effectively a pseudo-NULL in most cases (logically, not technically), so we could as well stick to NULL. That's what the NULL contruct is meant for. I don't say there is no use for this at all, but every case I have seen that far is actually poor design and doesn't make any sense at a closer look. 2. Your schema will not be portable. For text columns, the mentioned Oracle issue may give unexpected results. For timestamp columns, each DBMS has a differrent idea of valid input. PostgreSQL neither allows empty strings nor '0000-00-00', as allowed by MySQL. Again, NULL is your friend.
 [2009-02-22 20:21 UTC] ifeghali (Igor Feghali)
-Status: Open +Status: Feedback
Hello Holger, With portability = entirely off we don't face this problem ? I am inclined to sticky with "Temporarily disable the MDB2_PORTABILITY_EMPTY_TO_NULL setting within the relevant functions and restore the old value on function exit.". Maybe this would be necessary only in updateDatabase() ? Or do you have in mind any other place ?
 [2009-02-22 20:57 UTC] hschletz (Holger Schletz)
This is what I do for creating/upgrading database schemas: $options = array ( 'quote_identifier' => true, 'force_defaults' => false, 'portability' => MDB2_PORTABILITY_ALL ^ MDB2_PORTABILITY_EMPTY_TO_NULL ); $schema =& MDB2_Schema::factory($dsn, $options); ... This has been extensively tested and does exactly what I expect. Note that 'force_defaults' has to be explicitly set to false to exactly reflect the given input schema. It's been a long time since I hade a closer look at the code, but yes, I think updateDatabase() should be the right place for this. Just ensure that the old value is restored in every case, even if an error occurs within one of the functions called direclty or indirectly from within updateDatabase(). This would still leave the potential portability issue open. Maybe the validation code could be modified to issue a friendly warning whenever an empty string is given as a default value?
 [2012-01-27 19:26 UTC] doconnor (Daniel O'Connor)
-Status: Feedback +Status: Open