2 Scope Sets for Pattern-Based Macros
Like previous models of macro expansion, our set-of-scopes expander operates on a program from the outside in. The expander detects bindings, macro uses, and references as part of the outside-to-inside traversal. The difference in our expander is the way that bindings and macro expansions are recorded and attached to syntax fragments during expansion.
2.1 Scope Sets
(let ([x 1]) (let-syntax ([m (syntax-rules () [(m) x])]) (lambda (x) (m))))
(let ([x{alet} 1]) (let-syntax ([m{alet, bls} (syntax-rules () [(m) #'x{alet}])]) (lambda (x{alet, bls, clam}) (m{alet, bls, clam}))))
(let ([x{alet} 1]) (let-syntax ([m{alet, bls} (syntax-rules () [(m) #'x{alet}])]) (lambda (x{alet, bls, clam}) x{alet, dintro})))
Lexical scoping corresponds to sets that are constrained to a particular shape: For any given set, there’s a single scope s that implies all the others (i.e., the ones around s in the program). As a result, s by itself is enough information to identify a binding for a given reference. We normally describe lexical scope in terms of the closest such s for some notion of “closest.” Given scope sets instead of individual scopes, we can define “closest” as the largest relevant set.
More generally, we can define binding based on subsets: A reference’s binding is found as one whose set of scopes is the largest subset of the reference’s own scopes (in addition to having the same symbolic name). The advantage of using sets of scopes is that macro expansion creates scope sets that overlap in more general ways; there’s not always a s that implies all the others. Absent a determining s, we can’t identify a binding by a single scope, but we can identify it by a set of scopes.
If arbitrary sets of scopes are possible, then two different bindings might have overlapping scopes, neither might be a subset of the other, and both might be subsets of a particular reference’s scope set. In that case, the reference is ambiguous. Creating an ambiguous reference with only pattern-based macros is possible, but it requires a definition context that supports mingled macro definitions and uses; we provide an example in Ambiguous References.
2.2 Bindings
creates a new scope;
adds the scope to every identifier in binding position, as well as to the region where the bindings apply; and
extends a global table that maps a ⟨symbol, scope set⟩ pair to a representation of a binding.
(let ([x 1]) (let-syntax ([m (syntax-rules () [(m) x])]) (lambda (x) (m))))
|
|
|
The distinction between the binding table and the compile-time environment is important for a purely “syntactic” view of binding, where a term can be expanded, manipulated, transferred to a new context, and then expanded again. Some approaches to macros, such as syntactic closures (Bawden and Rees 1988) and explicit renaming (Clinger 1991), tangle the binding and environment facets of expansion so that terms cannot be manipulated with the same flexibility.
The binding table can grow forever, but when a particular scope becomes unreachable (i.e., when no reachable syntax object includes a reference to the scope), then any mapping that includes the scope becomes unreachable. This weak mapping can be approximated by attaching the mapping to the scope, instead of using an actual global table. Any scope in a scope set can house the binding, since the binding can only be referenced using all of the scopes in the set. Attaching to the most recently allocated scope is a good heuristic, because the most recent scope is likely to be maximally distinguishing and have the shortest lifetime.
When a syntax object is serialized, the serialization must pull along
the fragment of the binding table that is relevant for the syntax
object’s scopes. Again, that extraction happens more or less
automatically when the binding-table entries are implemented by
attaching them to scopes, but an explicit “garbage collection” of
the scopes is useful in practice to make serialized syntax more compact.
Deserializing a syntax object creates fresh
representations for every serialized scope, but preserving sharing
ensures that binding relationships are preserved among identifiers in
the syntax object—
2.3 Recursive Macros and Use-Site Scopes
So far, our characterization of macro-invocation scopes works only for non-recursive macro definitions. To handle recursive macro definitions, in addition to a fresh scope to distinguish forms that are introduced by a macro, a fresh scope is needed to distinguish forms that are present at the macro use site.
(letrec-syntax ([identity (syntax-rules () [(_ misc-id) (lambda (x) (let ([misc-id 'other]) x))])]) (identity x))
(letrec-syntax ([identity (syntax-rules () [(_ misc-id) (lambda (x{als}) (let ([misc-id 'other]) x{als}))])]) (identity x{als}))
(letrec-syntax ([identity (syntax-rules () [(_ misc-id) (lambda (x{als}) (let ([misc-id 'other]) x{als}))])]) (identity x{als, ebdy}))
(lambda (x{als, bintro, clam}) (let ([x{als, ebdy, clam, dlet} 'other]) x{als, bintro, clam, dlet}))
This extra scope for the body is not the approach taken in Racket’s expander, for reasons that are explained in the next section, Use-Site Scopes and Macro-Generated Definitions.
(identity x{als, euse})
(lambda (x{als, bintro, clam}) (let ([x{als, clam, dlet, euse} 'other]) x{als, bintro, clam, dlet}))
2.4 Use-Site Scopes and Macro-Generated Definitions
In a binding form such as let or letrec, bindings are clearly distinguished from uses by their positions within the syntactic form. In addition to these forms, Racket (like Scheme) supports definition contexts that mingle binding forms and expressions. For example, the body of a module contains a mixture of definitions and expressions, all in a single recursive scope. Definitions can include macro definitions, expressions can include uses of those same macros, and macro uses can even expand to further definitions.
Use-site scopes distinguish macro definitions and uses within a definition context in the same way that they work for letrec-syntax. The body-scope alternative for letrec-syntax (as described in the previous section, Recursive Macros and Use-Site Scopes) does not work for definition contexts, since the macro definitions and uses are not initially separated; use-site scopes allow definition and use sites to be distinguished as they are discovered. A distinct use-site scope is needed for each macro use, since a macro use may introduce a new macro definition that is itself used in the same definition context; a single use-site scope for the whole definition context would not distinguish the macro-generated macro’s definition from its uses, while a per-use scope supports arbitrary chains of uses and definitions.
(define-syntax-rule (define-identity id) (define id (lambda (x) x))) (define-identity f) (f 5)
(define-syntax-rule (define-five misc-id) (begin (define misc-id 5) x)) (define-five x)
(define-syntax-rule (define-other-five misc-id) (begin (define x 5) misc-id)) (define-other-five x)
To support macros that expand to definitions of given identifiers, a definition context must keep track of scopes created for macro uses, and it must remove those scopes from identifiers that end up in binding positions (effectively moving them back to the definition scope). In the define-identity and define-five examples, the use-site scope is removed from the binding identifiers x and f, so they are treated the same as if their definitions appeared directly in the source.
This special treatment of use-site scopes adds complexity to the macro expander, but it is of the kind of complexity that mutually recursive binding contexts create routinely (e.g., along the same lines as ensuring that variables are defined before they are referenced). Definition contexts in Racket have proven convenient and expressive enough to be worth the extra measure of complexity.
As an optimization, a use-site scope is needed only when a macro is expanded in a definition context where the macro’s definition is in the same definition context. (Otherwise, the use is in a more nested scope, and thus already distinguished from introduced identifiers.) That combination happens regularly in a module body, but it is much less common than macro uses generally. This “optimization” is visible if the macro system provides certain operations, such as forcing the expansion of a sub-form, but it is invisible to pattern-based macros.
2.5 Ambiguous References
The combination of use-site scopes to solve local-binding problems (as in Recursive Macros and Use-Site Scopes) versus reverting use-site scopes to accommodate macro-generated definitions (as in Use-Site Scopes and Macro-Generated Definitions) creates the possibility of generating an identifier whose binding is ambiguous.
(define-syntax-rule (def-m m given-x) (begin (define x 1) (define-syntax-rule (m) (begin (define given-x 2) x)))) (def-m m x) (m)
(define-syntax-rule (def-m{adef} ....) ....) (define x{adef, bintro1} 1) (define-syntax-rule (m{adef}) (begin (define x{adef, buse1} 2) x{adef, bintro1})) (define x{adef, cintro2} 2) x{adef, bintro1, cintro2}
Unlike the ambiguity that is resolved by use-site scopes, this ambiguity arguably reflects an inherent ambiguity in the macro. Absent the (define x 1) definition generated by def-m, the final x reference should refer to the definition generated from (define given-x 2); similarly, absent the definition generated from (define given-x 2), the final x should refer to the one generated from (define x 1). Neither of those definitions is more specific than the other, since they are generated by different macro invocations, so our new expander rejects the reference as ambiguous.
(define-syntax-rule (def-m m orig-x) (define-syntax-rule (m) (begin (define orig-x 2) x))) (def-m m x) (m)
(define-syntax-rule (def-m m) (define-syntax-rule (m) x)) (define x 2) (def-m m) (m) (define-syntax-rule (def-m m orig-x) (begin (define orig-x 2) (define-syntax-rule (m) x))) (def-m m x) (m) (define-syntax-rule (def-m m) (define-syntax-rule (m orig-x) (begin (define orig-x 2) x))) (def-m m) (m x)