
Parser tools: How to make the lexer context sensitive?

To: <plt-scheme@fast.cs.utah.edu>
Subject: Parser tools: How to make the lexer context sensitive?
From: Jens Axel Søgaard <js@vgt-gym.dk>
Date: Wed, 27 Mar 2002 02:30:46 +0100
Sender: owner-plt-scheme@fast.cs.utah.edu
Thread-Index: AcHVLo2B96pRdNAxR9yM2x48mgZM+g==
Thread-Topic: Parser tools: How to make the lexer context sensitive?

I have spent some time translating the lex and yacc specifications

    http://www.lysator.liu.se/c/ANSI-C-grammar-l.html#check-type

for the C programming language into something suitable for the
parser tools collection provided by plt-scheme.

I am almost done (with respect to my goal) but have yet to
overcome one little obstacle. 

In the specification for the lexer is the following rule

   {L}({L}|{D})*		{ count(); return(check_type()); }

which says that a letter, possibly followed by letters or digits,
is either an IDENTIFIER or a TYPENAME, since check_type() is defined as:

int check_type()
{ /*
  * pseudo code --- this is what it should check
  * 
  *	if (yytext == type_name)
  *		return(TYPE_NAME);
  *
  *	return(IDENTIFIER);
  */
}

Thus it should be possible for the lexer to determine whether
the parser is currently parsing a 'type_name' or not.
The lexer should return a TYPENAME while the parser is parsing the
non-terminal 'type_name' and an IDENTIFIER otherwise. If I recall
correctly from my compiler course (5 years ago?), one solution in
lex/yacc was to set a global flag when entering the parsing of type_name
and to reset it when leaving.
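
A minimal sketch of that flag idea in C, as far as I remember it. The
names in_type_name, enter_type_name and leave_type_name are made up for
illustration and are not part of the linked lex/yacc specifications, and
the token codes are arbitrary placeholders:

```c
/* Token codes; placeholder values, yacc would normally generate these. */
enum { IDENTIFIER = 258, TYPE_NAME = 259 };

/* Hypothetical global flag shared by the parser and the lexer. */
int in_type_name = 0;

/* Hypothetical hooks that the parser actions would call when they
   start and finish parsing the non-terminal type_name. */
void enter_type_name(void) { in_type_name = 1; }
void leave_type_name(void) { in_type_name = 0; }

/* What the lexer action for {L}({L}|{D})* would return under this
   scheme: the token kind depends only on the flag, not on yytext. */
int check_type(void)
{
    return in_type_name ? TYPE_NAME : IDENTIFIER;
}
```

The lexer itself stays context free here; all the context lives in the
one flag that the parser toggles.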

How do I do this using the parser tools?

The rule I use in parser is:

  (type_name
    [(specifier_qualifier_list)                     (list 'type_name $1)]
    [(specifier_qualifier_list abstract_declarator) (list 'type_name $1 $2)])

The non-working rule in the lexer is

   [(@ (L) (* (: (L) (D))))  (token-IDENTIFIER (get-lexeme))]

which should be

   [(@ (L) (* (: (L) (D))))  (if <parsing a type_name?>
                                 (token-TYPENAME (get-lexeme))
                                 (token-IDENTIFIER (get-lexeme)))]

What shall I write in place of <parsing a type_name?> ?



For the curious: I am implementing the parser in order to
make writing extensions for MzScheme easier. Some time ago
I wrote a tiny extension in order to use some functions
from ImageMagick. Due to the number of constants and functions
in ImageMagick I figured it would be worthwhile to automate
some of the extension writing. So far I can successfully
convert typedef'ed enumerations into association lists
(using a little evaluator of the constant expressions).

I am now working on extracting the function definitions.
In most cases, if two functions in the API have the same type,
I can reuse the corresponding functions in the extension.

-- 
Jens Axel Søgaard

There is no substitute for good manners, except, perhaps, fast reflexes.
  - fortune

Follow-Ups:
- Re: Parser tools: How to make the lexer context sensitive?
  - From: Matthew Flatt <mflatt@cs.utah.edu>
