2 Scope Sets for Pattern-Based Macros

Like previous models of macro expansion, our set-of-scopes expander operates on a program from the outside in. The expander detects bindings, macro uses, and references as part of the outside-to-inside traversal. The difference in our expander is the way that bindings and macro expansions are recorded and attached to syntax fragments during expansion.

2.1 Scope Sets

A scope corresponds to a binding context, and every identifier in a program has a set of scopes. For example, if we treat let and lambda as primitive binding forms, then in the fully expanded expression

(let ([x 1])
(lambda (y)
z))

the let form corresponds to a scope alet, and the lambda form corresponds to blam. That is, everything in the let’s body is in alet, and everything in the inner lambda’s body is in blam; the set of scopes associated with z is {alet, blam}.We might say that “the environment of z is {alet, blam},” but for macro expansion, we also use the word “environment” to refer to a mapping from bindings to variables and compile-time values. To avoid confusion, we’ll refrain from using “environment” to mean a set of scopes. (Notation: the subscripts on alet and blam are just part of the names that we use to refer to abstract scope tokens; they have no meaning beyond indicating the scope’s origin.)

In a macro-extensible language, expanding a use of a macro creates a new scope in the same way that a binding form creates a new scope. Starting with

(let ([x 1])
  (let-syntax ([m (syntax-rules ()
                    [(m) x])])
    (lambda (x)
       (m))))

the right-hand side of the m binding has the scope set {alet}, while the final m has scope set {alet, bls, clam} corresponding to the let, let-syntax, and lambda forms. We can write the scope sets next to each x and m at the point where macro expansion reaches the (m) form:

(let ([x{alet} 1])
  (let-syntax ([m{alet, bls} (syntax-rules ()
                            [(m) #'x{alet}])])
    (lambda (x{alet, bls, clam})
      (m{alet, bls, clam}))))

The expansion of (m) produces x with the scope set {alet, dintro}, where dintro is a new scope for identifiers that are introduced by the macro’s expansion:

(let ([x{alet} 1])
  (let-syntax ([m{alet, bls} (syntax-rules ()
                            [(m) #'x{alet}])])
    (lambda (x{alet, bls, clam})
      x{alet, dintro})))

The absence of clam on the final x explains why it doesn’t refer to the inner binding of x. At the same time, if a different m places a macro-introduced x in a binding position around an x from a macro use (m x), the x from the use is not macro-introduced and doesn’t have the scope dintro, so it wouldn’t refer to the macro-introduced binding.

Lexical scoping corresponds to sets that are constrained to a particular shape: For any given set, there’s a single scope s that implies all the others (i.e., the ones around s in the program). As a result, s by itself is enough information to identify a binding for a given reference. We normally describe lexical scope in terms of the closest such s for some notion of “closest.” Given scope sets instead of individual scopes, we can define “closest” as the largest relevant set.

More generally, we can define binding based on subsets: A reference’s binding is found as one whose set of scopes is the largest subset of the reference’s own scopes (in addition to having the same symbolic name). The advantage of using sets of scopes is that macro expansion creates scope sets that overlap in more general ways; there’s not always a s that implies all the others. Absent a determining s, we can’t identify a binding by a single scope, but we can identify it by a set of scopes.

If arbitrary sets of scopes are possible, then two different bindings might have overlapping scopes, neither might be a subset of the other, and both might be subsets of a particular reference’s scope set. In that case, the reference is ambiguous. Creating an ambiguous reference with only pattern-based macros is possible, but it requires a definition context that supports mingled macro definitions and uses; we provide an example in Ambiguous References.

2.2 Bindings

When macro expansion encounters a primitive binding form, it

creates a new scope;
adds the scope to every identifier in binding position, as well as to the region where the bindings apply; and
extends a global table that maps a ⟨symbol, scope set⟩ pair to a representation of a binding.

In a simplified language where bindings are local, an identifier with its scope set could be its own representation of a binding. In a more complete language, bindings can also refer to module imports, and rename transformers (see Rename Transformers) can map combinations to the same binding. We therefore represent a local binding with a unique, opaque value (e.g., a gensym).

For example,

(let ([x 1])
  (let-syntax ([m (syntax-rules ()
                    [(m) x])])
    (lambda (x)
      (m))))

more precisely expands after several steps to

(let ([x{alet} 1])
  (let-syntax ([m{alet, bls} (syntax-rules ()
                            [(m) #'x{alet}])])
    (lambda (x{alet, bls, clam})
       x{alet, dintro})))

x{alet}	→	x4
m{alet, bls}	→	m8
x{alet, bls, clam}	→	x16

where the compile-time environment along the way (not shown) maps x4 to a variable, m8 to a macro, and x16 to another variable. The reference x{alet, dintro} has the binding x4, because x{alet} is the mapping for x in the binding table that has the largest subset of {alet, dintro}.

The distinction between the binding table and the compile-time environment is important for a purely “syntactic” view of binding, where a term can be expanded, manipulated, transferred to a new context, and then expanded again. Some approaches to macros, such as syntactic closures (Bawden and Rees 1988) and explicit renaming (Clinger 1991), tangle the binding and environment facets of expansion so that terms cannot be manipulated with the same flexibility.

The binding table can grow forever, but when a particular scope becomes unreachable (i.e., when no reachable syntax object includes a reference to the scope), then any mapping that includes the scope becomes unreachable. This weak mapping can be approximated by attaching the mapping to the scope, instead of using an actual global table. Any scope in a scope set can house the binding, since the binding can only be referenced using all of the scopes in the set. Attaching to the most recently allocated scope is a good heuristic, because the most recent scope is likely to be maximally distinguishing and have the shortest lifetime.

When a syntax object is serialized, the serialization must pull along the fragment of the binding table that is relevant for the syntax object’s scopes. Again, that extraction happens more or less automatically when the binding-table entries are implemented by attaching them to scopes, but an explicit “garbage collection” of the scopes is useful in practice to make serialized syntax more compact. Deserializing a syntax object creates fresh representations for every serialized scope, but preserving sharing ensures that binding relationships are preserved among identifiers in the syntax object—or more generally, among syntax objects within a module.

2.3 Recursive Macros and Use-Site Scopes

So far, our characterization of macro-invocation scopes works only for non-recursive macro definitions. To handle recursive macro definitions, in addition to a fresh scope to distinguish forms that are introduced by a macro, a fresh scope is needed to distinguish forms that are present at the macro use site.

Consider the following letrec-syntax expression, whose meaning depends on whether a use-site identifier captures a macro-introduced identifier:

(letrec-syntax ([identity (syntax-rules ()
                            [(_ misc-id)
                             (lambda (x)
                               (let ([misc-id 'other])
                                 x))])])
   (identity x))

Assuming that the letrec-syntax form creates a scope als, the scope must be added to both the right-hand side and body of the letrec-syntax form to create a recursive binding:

(letrec-syntax ([identity (syntax-rules ()
                            [(_ misc-id)
                             (lambda (x{als})
                               (let ([misc-id 'other])
                                 x{als}))])])
   (identity x{als}))

If we create a scope only for introduced forms in a macro expansion, then expanding (identity x{als}) creates the scope set bintro and produces

(lambda (x{als, bintro})
(let ([x{als} 'other])
x{als, bintro}))

where bintro is added to each of the two introduced xs. The lambda introduces a new scope clam, and let introduces dlet, producing

(lambda (x{als, bintro, clam})
(let ([x{als, clam, dlet} 'other])
x{als, bintro, clam, dlet}))

At this point, the binding of the innermost x is ambiguous: {als, bintro, clam, dlet} is a superset of both {als, bintro, clam} and {als, clam, dlet}, neither of which is a subset of the other. Instead, we want x to refer to the lambda binding.

The problem is that our scope rules, so far, do not ensure that a macro definition and a macro use are separated by at least one scope. This problem with letrec-syntax can be fixed by adding an additional scope to a letrec-syntax body. In that case, with ebdy added to the body in the identity example,

(letrec-syntax ([identity (syntax-rules ()
                            [(_ misc-id)
                             (lambda (x{als})
                               (let ([misc-id 'other])
                                 x{als}))])])
   (identity x{als, ebdy}))

leads eventually to

(lambda (x{als, bintro, clam})
(let ([x{als, ebdy, clam, dlet} 'other])
x{als, bintro, clam, dlet}))

There’s no ambiguity, and the final x refers to the lambda binding as intended.

This extra scope for the body is not the approach taken in Racket’s expander, for reasons that are explained in the next section, Use-Site Scopes and Macro-Generated Definitions.

Adding a scope for each individual macro-use site corrects this problem in a similar way. If we call the use-site scope euse, then we start with

(identity x{als, euse})

which expands to

(lambda (x{als, bintro})
(let ([x{als, euse} 'other])
x{als, bintro}))

which ends up as

(lambda (x{als, bintro, clam})
(let ([x{als, clam, dlet, euse} 'other])
x{als, bintro, clam, dlet}))

Again, there’s no ambiguity. A use-site scope acts as the symmetric counterpart to the macro-introduction scope to ensure that a macro’s definition and use sites are distinguished by scopes in both directions.

2.4 Use-Site Scopes and Macro-Generated Definitions

In a binding form such as let or letrec, bindings are clearly distinguished from uses by their positions within the syntactic form. In addition to these forms, Racket (like Scheme) supports definition contexts that mingle binding forms and expressions. For example, the body of a module contains a mixture of definitions and expressions, all in a single recursive scope. Definitions can include macro definitions, expressions can include uses of those same macros, and macro uses can even expand to further definitions.

Use-site scopes distinguish macro definitions and uses within a definition context in the same way that they work for letrec-syntax. The body-scope alternative for letrec-syntax (as described in the previous section, Recursive Macros and Use-Site Scopes) does not work for definition contexts, since the macro definitions and uses are not initially separated; use-site scopes allow definition and use sites to be distinguished as they are discovered. A distinct use-site scope is needed for each macro use, since a macro use may introduce a new macro definition that is itself used in the same definition context; a single use-site scope for the whole definition context would not distinguish the macro-generated macro’s definition from its uses, while a per-use scope supports arbitrary chains of uses and definitions.

When a macro use expands to a new definition, then any identifier in the macro use may turn out to be a binding identifier in the macro expansion. For example, consider a define-identity macro that is intended to expand to a definition of a given identifier as the identity function:

(define-syntax-rule (define-identity id)
(define id (lambda (x) x)))

(define-identity f)
(f 5)

If the expansion of (define-identity f) adds a scope euse to the use-site f, the resulting definition of f{euse} does not bind the f in (f 5).

Another way to describe the issue is that a definition context must treat use-site and introduced identifiers asymmetrically as binding identifiers. In

(define-syntax-rule (define-five misc-id)
  (begin
   (define misc-id 5)
   x))

(define-five x)

the introduced x should refer to an x that is defined in the enclosing scope, which turns out to be the same x that appears at the use site of define-five. But in

(define-syntax-rule (define-other-five misc-id)
  (begin
   (define x 5)
   misc-id))

(define-other-five x)

the x from the use site should not refer to the macro-introduced binding x.

To support macros that expand to definitions of given identifiers, a definition context must keep track of scopes created for macro uses, and it must remove those scopes from identifiers that end up in binding positions (effectively moving them back to the definition scope). In the define-identity and define-five examples, the use-site scope is removed from the binding identifiers x and f, so they are treated the same as if their definitions appeared directly in the source.

This special treatment of use-site scopes adds complexity to the macro expander, but it is of the kind of complexity that mutually recursive binding contexts create routinely (e.g., along the same lines as ensuring that variables are defined before they are referenced). Definition contexts in Racket have proven convenient and expressive enough to be worth the extra measure of complexity.

As an optimization, a use-site scope is needed only when a macro is expanded in a definition context where the macro’s definition is in the same definition context. (Otherwise, the use is in a more nested scope, and thus already distinguished from introduced identifiers.) That combination happens regularly in a module body, but it is much less common than macro uses generally. This “optimization” is visible if the macro system provides certain operations, such as forcing the expansion of a sub-form, but it is invisible to pattern-based macros.

2.5 Ambiguous References

The combination of use-site scopes to solve local-binding problems (as in Recursive Macros and Use-Site Scopes) versus reverting use-site scopes to accommodate macro-generated definitions (as in Use-Site Scopes and Macro-Generated Definitions) creates the possibility of generating an identifier whose binding is ambiguous.

The following example defines m through a def-m macro, and it uses m in the same definition context:

(define-syntax-rule (def-m m given-x)
  (begin
    (define x 1)
    (define-syntax-rule (m)
      (begin
        (define given-x 2)
        x))))

(def-m m x)
(m)

The expansion, after splicing begins, ends with an ambiguous reference:

(define-syntax-rule (def-m{adef} ....) ....)
(define x{adef, bintro1} 1)
(define-syntax-rule (m{adef})
  (begin
    (define x{adef, buse1} 2)
    x{adef, bintro1}))
(define x{adef, cintro2} 2)
x{adef, bintro1, cintro2}

The scope adef corresponds to the definition context, bintro1 and buse1 correspond to the expansion of def-m, cintro2 corresponds to the expansion of m. The final reference to x is ambiguous, because it was introduced through both macro layers.

Unlike the ambiguity that is resolved by use-site scopes, this ambiguity arguably reflects an inherent ambiguity in the macro. Absent the (define x 1) definition generated by def-m, the final x reference should refer to the definition generated from (define given-x 2); similarly, absent the definition generated from (define given-x 2), the final x should refer to the one generated from (define x 1). Neither of those definitions is more specific than the other, since they are generated by different macro invocations, so our new expander rejects the reference as ambiguous.

Our previous model of macro expansion to cover definition contexts (Flatt et al. 2012) would treat the final x always as a reference to the definition generated from (define x 1) and never to the definition generated from (define given-x 2). More compactly, that model rejects as having an unbound identifier the example

(define-syntax-rule (def-m m orig-x)
  (define-syntax-rule (m)
    (begin
      (define orig-x 2)
      x)))
(def-m m x)
(m)

The set-of-scopes expander accepts this example. The set-of-scopes expander is arguably more consistent, considering that both expanders allow the reference with other combinations of introductions for the definition and reference:

(define-syntax-rule (def-m m)
  (define-syntax-rule (m)
    x))
(define x 2)
(def-m m)
(m)

(define-syntax-rule (def-m m orig-x)
  (begin
    (define orig-x 2)
    (define-syntax-rule (m)
      x)))
(def-m m x)
(m)

(define-syntax-rule (def-m m)
  (define-syntax-rule (m orig-x)
    (begin
      (define orig-x 2)
      x)))
(def-m m)
(m x)

So far, we have not encountered a practical example that exposes the difference between the expanders’ treatment of pattern-based macros in definition contexts.

← prev up next →

1	Background: Scope and Macros
2	Scope Sets for Pattern-Based Macros
3	Scope Sets for Procedural Macros and Modules
4	Implementation and Experience
5	Model
6	Defining Hygiene
7	Other Related Work
8	Conclusion
	Acknowledgments
	References

2.1	Scope Sets
2.2	Bindings
2.3	Recursive Macros and Use-Site Scopes
2.4	Use-Site Scopes and Macro-Generated Definitions
2.5	Ambiguous References