
Bug #4066 Functions are keywords
Submitted: 2005-04-05 16:48 UTC
From: epte at ruffdogs dot com Assigned: busterb
Status: Closed Package: SQL_Parser
PHP Version: 4.3.8 OS:
Roadmaps: (Not assigned)    


 [2005-04-05 16:48 UTC] epte at ruffdogs dot com
Description:
------------
Correct me if I'm wrong. It seems to me that function names that are not followed by '(argumentlist)' should be considered identifiers, and not keywords.

Reproduce code:
---------------
$sql = "
CREATE TABLE schedules (
  id int(11) NOT NULL auto_increment,
  target_id int(11) NOT NULL default '0',
  PRIMARY KEY (id)
) TYPE=MyISAM;";
$_SQLParser = new SQL_Parser(NULL, 'MySQL');
$parseRes = $_SQLParser->parse($sql);

Actual result:
--------------
Parse error: Expected identifier on line 16
minute varchar(255) NOT NULL default '',
^ found: "minute"

Comments

 [2005-04-05 16:49 UTC] epte at ruffdogs dot com
Sorry -- that was the wrong reproduction code. the correct sql statement follows:

$sql = "
CREATE TABLE schedules (
  id int(11) NOT NULL auto_increment,
  target_id int(11) NOT NULL default '0',
  target_order int(11) NOT NULL default '0',
  minute varchar(255) NOT NULL default '',
  hour varchar(255) NOT NULL default '',
  dom varchar(255) NOT NULL default '',
  month varchar(255) NOT NULL default '',
  dow varchar(255) NOT NULL default '',
  keepdays int(11) default NULL,
  keepMB int(11) default NULL,
  simple tinyint(4) NOT NULL default '0',
  disabled tinyint(4) NOT NULL default '0',
  PRIMARY KEY (id)
) TYPE=MyISAM;";
 [2005-04-05 16:59 UTC] epte at ruffdogs dot com
*sigh*. I'm still trying to understand what the lexer should and shouldn't know about. Is it that the lexer only knows about symbols, functions included? (I know that's the way it is now. I'm wondering what the general theory is behind tokenizers and symbol tables.) I'm trying to understand where the fix should go -- lexer or parser. At first it seems that the lexer is wrongly classifying 'minute' as a keyword. But maybe it's the parser that should check the keywords coming in to see if they're actually idents (by evaluating the lexical context). Thoughts?
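
To make the question concrete, here is a minimal sketch of a lexer that classifies words by table lookup alone. The word tables and the classifyWord() function are hypothetical and are not taken from SQL_Parser. The sketch only shows why a context-free lookup tags 'minute' as a function even where it is used as a column name.

```php
<?php
// Hypothetical word tables; a real dialect definition would be far larger.
const FUNCTIONS = ['minute', 'hour', 'month'];
const RESERVED  = ['create', 'table', 'not', 'null', 'default', 'primary', 'key'];

// Classify a single word by table lookup only, with no knowledge of context.
function classifyWord(string $word): string
{
    $lower = strtolower($word);
    if (in_array($lower, FUNCTIONS, true)) {
        return 'function';
    }
    if (in_array($lower, RESERVED, true)) {
        return 'reserved';
    }
    return 'ident';
}

echo classifyWord('target_id'), "\n"; // ident
echo classifyWord('minute'), "\n";    // function, even when used as a column name
```

Because the lookup never sees the surrounding tokens, any decision about whether 'minute' is acting as an identifier has to be made by whoever consumes the token.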
 [2005-04-05 20:32 UTC] epte at ruffdogs dot com
I've confirmed for myself that the SQL actually is valid. It does run from within MySQL. No backticks were needed for the 'minute' field. I've also done some reading. Apparently, in everything I've seen, lexical analysers should only have to know about keywords generally, not specific classes of keywords. Also, I've found out that keywords need not always be treated like reserved words. 'int' and 'char' are keywords handled like identifiers in some languages. I had been thinking keyword=RESERVED_WORD. So more and more, it seems the fix should be at the syntactic level.
 [2005-04-05 20:41 UTC] epte at ruffdogs dot com
Here's the patch allowing function names and reserved words as identifiers in this context:

Index: Parser.php
===================================================================
--- Parser.php (revision 2681)
+++ Parser.php (working copy)
@@ -492,7 +492,9 @@
         while (1) {
             // parse field identifier
             $this->getTok();
-            if ($this->token == 'ident') {
+            // In this context, field names can be reserved words or function names
+            if ($this->token == 'ident' || $this->isFunc() || $this->isReserved()
+            ) {
                 $name = $this->lexer->tokText;
             } elseif ($this->token == ')') {
                 return $fields;
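
The idea behind the patch can be illustrated with a small standalone sketch. It assumes a toy tokenizer and a toy field-list grammar; WORD_TYPES, tokenize() and parseFieldNames() are hypothetical names and are not part of SQL_Parser. The tokenizer suggests a type for each word, and the parser accepts 'ident', 'function' and 'reserved' alike when a token sits in field-name position.

```php
<?php
// Hypothetical word table standing in for the lexer's dialect lists.
const WORD_TYPES = [
    'minute'  => 'function',
    'hour'    => 'function',
    'int'     => 'reserved',
    'varchar' => 'reserved',
];

// Lexer side: split the input and suggest a type for each token.
function tokenize(string $sql): array
{
    $tokens = [];
    // Pad punctuation with spaces so each mark becomes its own token.
    $spaced = preg_replace('/([(),])/', ' $1 ', $sql);
    foreach (preg_split('/\s+/', trim($spaced)) as $text) {
        if (in_array($text, ['(', ')', ','], true)) {
            $type = $text;
        } else {
            $type = WORD_TYPES[strtolower($text)] ?? 'ident';
        }
        $tokens[] = ['type' => $type, 'text' => $text];
    }
    return $tokens;
}

// Parser side: read "( name type, name type, ... )" and return the names.
// This sketch does not handle nested parentheses such as int(11).
function parseFieldNames(string $sql): array
{
    $names = [];
    $expectName = false;
    foreach (tokenize($sql) as $tok) {
        if ($tok['type'] === '(' || $tok['type'] === ',') {
            $expectName = true; // the next token is in field-name position
            continue;
        }
        if ($expectName) {
            // Context decides here: function names and reserved words
            // are accepted as identifiers, as in the patched condition.
            if (!in_array($tok['type'], ['ident', 'function', 'reserved'], true)) {
                throw new UnexpectedValueException(
                    'Expected identifier, found: ' . $tok['text']
                );
            }
            $names[] = $tok['text'];
            $expectName = false;
        }
        // Other tokens (the column type, the closing parenthesis) are skipped.
    }
    return $names;
}

print_r(parseFieldNames('(id int, minute varchar, hour varchar)'));
```

Restricting the accepted types to 'ident' in the in_array() call reproduces the original failure: 'minute' is then rejected in field-name position.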
 [2005-04-06 04:38 UTC] busterb
I see the Lexer as making suggestions about what it sees as a convenience to the Parser, not as a hard rule about what to do with a particular token. Grammar is not always consistent vis a vis token types.
 [2005-04-06 05:05 UTC] busterb
Applied patch.