3 Scope Sets for Procedural Macros and Modules

Although our set-of-scopes expander resolves bindings differently than in previous models, it still works by attaching information to identifiers, and so it can provide a smooth path from pattern-matching macros to procedural macros in the same way as syntax-case (Dybvig et al. 1993). Specifically, (syntax form) quotes the S-expression form while preserving its scope-set information, so that form can be used to construct the result of a macro.

More precisely, in Racket, a primitive (quote-syntax form) quotes form together with its scope sets. The derived (syntax form) detects uses of pattern variables and replaces them with their matches while quoting any non-pattern content in form with quote-syntax. A (syntax form) can be abbreviated #'form, and when form includes no pattern variables, #'form is equivalent to (quote-syntax form). The quasiquoting variant #`form (which uses a backquote instead of a regular quote) allows escapes within form as #,expr, which inserts the result of evaluating expr in place of the escape.

The result of a quote-syntax or syntax form is a syntax object. When a syntax object’s S-expression component is just a symbol, then the syntax object is an identifier.
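
As a small illustration of these forms, the following sketch can be run as a module. The names quoted, computed, and ident are chosen here only for illustration.

```racket
#lang racket/base

;; #'form abbreviates (syntax form); with no pattern variables it
;; behaves like (quote-syntax form).
(define quoted #'(+ 1 2))

;; #`form allows #,expr escapes, which are evaluated and inserted.
(define computed #`(list #,(+ 1 2) x))

;; A syntax object whose S-expression component is just a symbol.
(define ident #'x)

(syntax? quoted)          ; #t: a syntax object wrapping a list
(identifier? quoted)      ; #f: its datum is a list, not a symbol
(identifier? ident)       ; #t: its datum is a symbol
(syntax->datum computed)  ; '(list 3 x): the escape became its value
```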

3.1 Identifier Comparisons with Scope Sets

Various compile-time functions work on syntax objects and identifiers. Two of the most commonly used functions are free-identifier=? and bound-identifier=?, each of which takes two identifiers. The free-identifier=? function is used to recognize a reference to a known binding, such as recognizing a use of else in a conditional. The bound-identifier=? function is used to check whether two identifiers would conflict as bindings in the same context, such as when a macro that expands to a binding form checks that identifiers in the macro use are suitably distinct.

These two functions are straightforward to implement with scope sets. A free-identifier=? comparison on identifiers checks whether the two identifiers have the same binding by consulting the global binding table. A bound-identifier=? comparison checks that two identifiers have exactly the same scope sets, independent of the binding table.
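
The two comparisons can be sketched over a toy model of the binding table. This is only a model under stated assumptions, not the expander's implementation: the ident struct, the table, and the functions bind!, resolve, model-free-identifier=?, and model-bound-identifier=? are hypothetical names, scopes are plain symbols, and #f stands for both "unbound" and "ambiguous".

```racket
#lang racket/base
(require racket/set racket/list)

;; Hypothetical model: an identifier is a symbol plus a set of scopes.
(struct ident (sym scopes) #:transparent)

;; Global binding table: (cons symbol scope-set) -> binding name.
(define table (make-hash))

(define (bind! id binding)
  (hash-set! table (cons (ident-sym id) (ident-scopes id)) binding))

;; Resolve by finding the binding whose scope set is the largest
;; subset of the identifier's scopes; #f means unbound or ambiguous.
(define (resolve id)
  (define candidates
    (for/list ([(key binding) (in-hash table)]
               #:when (and (eq? (car key) (ident-sym id))
                           (subset? (cdr key) (ident-scopes id))))
      (cons (cdr key) binding)))
  (cond
    [(null? candidates) #f]
    [else
     (define best (argmax (lambda (c) (set-count (car c))) candidates))
     (and (for/and ([c (in-list candidates)])
            (subset? (car c) (car best)))
          (cdr best))]))

;; Same binding in the table (or, if both are unbound, same symbol).
(define (model-free-identifier=? a b)
  (define binding-a (resolve a))
  (define binding-b (resolve b))
  (if (or binding-a binding-b)
      (equal? binding-a binding-b)
      (eq? (ident-sym a) (ident-sym b))))

;; Same symbol and exactly the same scope set; the table is not consulted.
(define (model-bound-identifier=? a b)
  (and (eq? (ident-sym a) (ident-sym b))
       (equal? (ident-scopes a) (ident-scopes b))))

;; x is bound with scope a; a reference has picked up an extra scope b.
(define binder (ident 'x (set 'a)))
(define reference (ident 'x (set 'a 'b)))
(bind! binder 'x1)

(model-free-identifier=? binder reference)   ; #t: both resolve to x1
(model-bound-identifier=? binder reference)  ; #f: scope sets differ
```

In this model the binder captures the reference even though the bound-identifier=? analogue reports #f.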

Note that (bound-identifier=? x y) does not completely answer the question “would x bind y?” A #t result answers that question in the affirmative, but x might bind y even if the result is #f. The same is true in Racket’s old macro system as well as implementations like Chez Scheme, which (like the set-of-scopes expander) print #f but produce 1 for the following example:
(let ()
  (define-syntax (m stx)
    (syntax-case stx ()
      [(_ a b)
       (begin
         (printf "~s\n" (bound-identifier=? #'a #'b))
         #'(begin
             (define a 1)
             b))]))
  (define-syntax n
    (syntax-rules ()
      [(_ id) (m id x)]))
  (n x))

3.2 Local Bindings and Syntax Quoting

The set-of-scopes approach to binding works the same as previous models for macros that are purely pattern-based, but the set-of-scopes approach makes finer distinctions among identifiers than would be expected by existing procedural Racket macros that use #' or quote-syntax. To be consistent with the way that Racket macros have been written, quote-syntax must discard some scopes.

For example, in the macro

(lambda (stx)
  (let ([id #'x])
     #`(let ([x 1])
         #,id)))

the x that takes the place of #,id should refer to the binding x in the generated let form. The x identifier that is bound to id, however, is not in the scope that is created for the compile-time let:

(lambda (stx{alam})
  (let ([id{alam, blet} #'x{alam}])
     #`(let ([x{alam, blet} 1])
         #,id{alam, blet})))

If quote-syntax (implicit in #`) preserves all scopes on an identifier, then with set-of-scopes binding, the x that replaces #,id will not refer to the x in the generated let’s binding position.

It’s tempting to think that the compile-time let should introduce a phase-specific scope that applies only for compile-time references, in which case it won’t affect x as a run-time reference. That adjustment doesn’t solve the problem in general, since a macro can generate compile-time bindings and references just as well as run-time bindings and references.

A solution is for the expansion of quote-syntax to discard certain scopes on its content. The discarded scopes are those from binding forms that enclosed the quote-syntax form up to a phase crossing or module top-level, as well as any use-site scopes recorded for macro invocations within those binding forms. In the case of a quote-syntax form within a macro binding’s right-hand side, those scopes cover all of the scopes introduced on the right-hand side of the macro binding.
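
The effect of pruning on the earlier example can be sketched with plain sets. This is a simplified model under stated assumptions: scopes are symbols, the names enclosing, prune, expand-let, and reference-sees-binding? are hypothetical, the fresh scope c-let stands for the scope that the generated let would add, and macro-introduction scopes are ignored.

```racket
#lang racket/base
(require racket/set)

;; Scopes of the binding forms that enclose the two quoted x's.
(define enclosing (set 'a-lam 'b-let))

(define quoted-in-rhs  (set 'a-lam))         ; the #'x bound to id
(define quoted-in-body (set 'a-lam 'b-let))  ; the x in the generated let

;; Pruning quote-syntax: drop the enclosing binding forms' scopes.
(define (prune scopes) (set-subtract scopes enclosing))

;; Expanding the generated let adds one fresh scope to the binder
;; and to the body alike.
(define (expand-let scopes) (set-add scopes 'c-let))

;; A reference can see a binding only if the binding's scope set is
;; a subset of the reference's scope set.
(define (reference-sees-binding? reference binder)
  (subset? binder reference))

;; Without pruning, the binder keeps b-let, which the reference lacks.
(reference-sees-binding? (expand-let quoted-in-rhs)
                         (expand-let quoted-in-body))          ; #f

;; With pruning, both are left with only the generated let's scope.
(reference-sees-binding? (expand-let (prune quoted-in-rhs))
                         (expand-let (prune quoted-in-body)))  ; #t
```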

The resulting macro system is different than the old Racket macro system. Experiments suggest that the vast majority of macro implementations work either way, but it’s easy to construct an example that behaves differently:

(free-identifier=? (let ([x 1]) #'x)
                   #'x)

In Racket’s old macro system, the result is #f. The set-of-scopes system with a scope-pruning quote-syntax produces #t, instead, because the let-generated scope is stripped away from #'x.

Note: Racket’s old macro system matches Dybvig et al. (1993), where both free-identifier=? and bound-identifier=? produce #f for the above arguments, and bound-identifier=? always implies free-identifier=?. The current psyntax implementation, as used by Chez Scheme and other implementations and as consistent with Adams (2015), produces #f and #t for free-identifier=? and bound-identifier=?, respectively; as the example illustrates, bound-identifier=? does not imply free-identifier=?. The set-of-scopes system produces #t and #t for free-identifier=? and bound-identifier=?, respectively, and bound-identifier=? always implies free-identifier=?.

If quote-syntax did not prune scopes, then not only would the result above be #f, but bound-identifier=? would produce #f for both (let ([x 1]) #'x) and (let ([y 1]) #'x). Those results reflect the switch to attaching identifier-independent scopes to identifiers, instead of attaching identifier-specific renamings.

Arguably, the issue here is the way that pieces of syntax from different local scopes are placed into the same result syntax object, with the expectation that all the pieces are treated the same way. In other words, Racket programmers have gotten used to an unusual variant of quote-syntax, and most macros could be written just as well with a non-pruning variant. Then again, the pruning variant of quote-syntax tends to discard information about local bindings that is usually unwanted but preserved by the old quote-syntax.

There’s precedent for a variant of syntax-case that does not support assembling pieces as in the example. An early version of van Tonder’s macro expander (van Tonder 2007) had that property as a result of making the evaluation of syntax generate a fresh context.

Supplying a second, non-pruning variant of quote-syntax poses no problems. Our set-of-scopes implementation for Racket implements the non-pruning variant when a #:local keyword is added to a quote-syntax form. For example,

(free-identifier=? (let ([x 1]) (quote-syntax x #:local))
                   (quote-syntax x #:local))

produces #f instead of #t, because the scope introduced by let is preserved in the body’s syntax object. The non-pruning variant of quote-syntax is useful for embedding references in a program’s full expansion that are meant to be inspected by tools other than the Racket compiler; Typed Racket’s implementation uses the #:local variant of quote-syntax to embed type declarations (including declarations for local bindings) in a program’s expansion for use by its type checker.

3.3 Ensuring Distinct Bindings

A Racket macro’s implementation can arrange for an identifier introduced by a macro expansion to have an empty scope set. Avoiding a macro-introduction scope involves using a syntax-local-introduce function. More generally, a macro can arrange for identifiers that are introduced in different contexts to have the same symbol and scope set. If those identifiers appear as bindings via lambda, let, or let-syntax, then the new scope created for the binding form will ensure that the different identifiers produce different bindings. That is, the binding scope is always created after any expansion that introduced the bound identifier, so all bindings are kept distinct by those different binding scopes.

For example, assuming that make-scopeless creates an identifier that has no scopes in an expansion, then the let-x forms in

(define-syntax (let-x stx)
  (syntax-case stx ()
    [(_ rhs body)
     #`(let ([#,(make-scopeless 'x) rhs])
         body)]))

(let-x 5
  (let-x 6
    0))

create intermediate x identifiers that each have an empty scope set, but the full expansion becomes

(let ([x{alet} 5])
  (let ([x{blet} 6])
    0))

where alet and blet are created by each let (as a primitive binding form), and they distinguish the different x bindings.

In a definition context (see Use-Site Scopes and Macro-Generated Definitions), macro expansion can introduce an identifier to a binding position after the scope for the definition context is created (and after that scope is applied to the definition context’s original content). That ordering risks a collision among bindings in different definition contexts, where identifiers introduced into different definition contexts all have the same symbol and set of scopes.

For example, using a block form that creates a definition context and that we treat here as a primitive form, the uses of def-x in

(define-syntax (def-x stx)
  (syntax-case stx ()
    [(_ rhs)
     #`(define #,(make-scopeless 'x) rhs)]))

(block
  (define y 1)
  (def-x 5))
(block
  (define y 2)
  (def-x 6))

risk expanding as

(block
  (define y{adef} 1)
  (define x{} 5))
(block
  (define y{bdef} 2)
  (define x{} 6))

with conflicting bindings of x for the empty scope set.

To avoid the possibility of such collisions, in a definition context that supports both definitions and macro expansion, the context is represented by a pair of scopes: an outside-edge scope that is added to the original content of the definition context, and an inside-edge scope that is added to everything that appears in the definition context through macro expansion. The outside-edge scope distinguishes original identifiers from macro-introduced identifiers, while the inside-edge scope ensures that every binding created for the definition context is distinct from all other bindings.

Thus, the preceding example expands as

(block
  (define y{aout, ain} 1)
  (define x{ain} 5))
(block
  (define y{bout, bin} 2)
  (define x{bin} 6))

where the inside-edge scopes ain and bin distinguish the two x bindings. Meanwhile, if the definitions of y instead used the name x, they would remain distinguished from the macro-introduced xs by the outside-edge scopes aout and bout.
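
The difference between the two expansions can be checked mechanically in a toy model. The sketch below assumes that a binding is identified by its symbol together with its scope set, so that two bindings collide exactly when both are equal; the helper names key and collisions and the scope names are hypothetical.

```racket
#lang racket/base
(require racket/set)

;; Hypothetical model: a binding key is (cons symbol scope-set).
(define (key sym . scopes) (cons sym (list->set scopes)))

;; A single scope per block, added to the original content only:
(define naive-block-a (list (key 'y 'a-def) (key 'x)))
(define naive-block-b (list (key 'y 'b-def) (key 'x)))

;; Outside-edge scope on original content, inside-edge scope on
;; everything that ends up in the definition context:
(define edged-block-a (list (key 'y 'a-out 'a-in) (key 'x 'a-in)))
(define edged-block-b (list (key 'y 'b-out 'b-in) (key 'x 'b-in)))

;; Symbols whose binding keys coincide across two contexts.
(define (collisions keys-1 keys-2)
  (for*/list ([k1 (in-list keys-1)]
              [k2 (in-list keys-2)]
              #:when (equal? k1 k2))
    (car k1)))

(collisions naive-block-a naive-block-b)  ; '(x): both bind x with no scopes
(collisions edged-block-a edged-block-b)  ; '(): inside edges keep them apart
```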

3.4 First-Class Definition Contexts

Racket exposes the expander’s support for definition contexts (see Use-Site Scopes and Macro-Generated Definitions) so that new macros can support definition contexts while potentially changing the meaning of a macro or variable definition. For example, the class macro allows local macro definitions in the class body while it rewrites specified function definitions to methods and other variable definitions to fields. The unit form similarly rewrites variable definitions to a mixture of private and exported definitions within a component.

Implementing a definition context starts with a call to syntax-local-make-definition-context, which creates a first-class (at compile time) value that represents the definition context. A macro can force expansion of forms in the definition context, it can add variable bindings to the definition context, and it can add compile-time bindings and values that are referenced by further macro expansion within the definition context. To a first approximation, a first-class definition context corresponds to an inside-edge scope that is added to any form expanded within the definition context and that houses the definition context’s bindings. A definition context also has a compile-time environment frame (extending the context of the macro use) to house the mapping of bindings to variables and compile-time values.

Like other definition contexts (see Use-Site Scopes and Macro-Generated Definitions), the compile-time environment must track use-site scopes that are generated for macro expansions within a first-class definition context. If the macro moves any identifier into a binding position in the overall expansion, then the macro normally must remove accumulated use-site scopes (for the current definition context only) by applying syntax-local-identifier-as-binding to the identifier. For example, the unit form implements a definition context that is similar to the body of a lambda, but variables are internally transformed to support mutually recursive references across unit boundaries.

(unit (import)
      (export)
  (define x 1)
  x)

In this example, (define x 1) is expanded to (define-values (x) 1) with a use-site scope on x, but the intent is for this definition of x to capture the reference at the end of the unit form. If the unit macro simply moved the binding x into a letrec right-hand side, the x would not capture the final x as moved into the letrec body; the use-site scope on the definition’s x would prevent it from capturing the use. The solution is for the unit macro to apply syntax-local-identifier-as-binding to the definition’s x before using it as a letrec binding. Macros that use a definition context and bound-identifier=? must similarly apply syntax-local-identifier-as-binding to identifiers before comparing them with bound-identifier=?.

Even if a macro does not create a first-class definition context, some care is needed if a macro forces the expansion of subforms and moves pieces of the result into binding positions. Such a macro probably should not use syntax-local-identifier-as-binding, but it should first ensure that the macro use is in an expression context before forcing any subform expansions. Otherwise, the subform expansions could interact in unexpected ways with the use-site scopes of an enclosing definition context.

Use-site scopes associated with a first-class definition context are not stored directly in the compile-time environment frame for the definition context. Instead, they are stored in the closest frame that is not for a first-class definition context, so that the scopes are still tracked when the definition context is discarded (when the macro returns, typically). The scope for the definition context itself is similarly recorded in the closest such frame, so that quote-syntax can remove it, just like other binding scopes.

3.5 Rename Transformers

Racket’s macro API includes support for binding aliases through rename transformers. A compile-time binding to the result of make-rename-transformer is similar to a binding to a macro transformer that replaces the binding’s identifier with the aliased identifier. In addition, however, binding to a rename transformer causes free-identifier=? to report #t for the original identifier and its alias.

With set-of-scopes binding, a binding alias is supported through an extension of the binding table. The mapping from a ⟨symbol, scope set⟩ pair is to a ⟨binding, maybe-aliased⟩ pair, where a maybe-aliased is either empty or another identifier (i.e., a symbol and scope set) to which the mapped identifier should be considered free-identifier=?. When a transformer-binding form such as define-syntax or letrec-syntax detects that the value to be installed for a binding is a rename transformer, it updates the binding table to register the identifier within the transformer as the binding’s maybe-aliased.

The implementation of free-identifier=? must follow alias chains. Cycles are possible, and they cause the aliased identifier to be treated as unbound.
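
The user-visible behavior of a rename transformer can be observed with a short module. The helper macro same-binding? is a hypothetical name introduced here only to run free-identifier=? at compile time; it is not part of Racket.

```racket
#lang racket/base
(require (for-syntax racket/base))

(define original 10)

;; alias is bound to a rename transformer that targets original.
(define-syntax alias (make-rename-transformer #'original))

;; Hypothetical helper: compares two identifiers at compile time and
;; expands to the boolean result.
(define-syntax (same-binding? stx)
  (syntax-case stx ()
    [(_ a b)
     (datum->syntax #'here (free-identifier=? #'a #'b))]))

alias                           ; 10: a use of alias acts like original
(same-binding? alias original)  ; #t: the alias is followed
(same-binding? alias list)      ; #f: unrelated bindings stay distinct
```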

3.6 Modules and Phases

The module form creates a new scope for its body. More precisely, a module form creates an outside-edge scope and an inside-edge scope, like any other context that allows both definitions and macro expansion.

A (module* name #f ....) submodule form, where #f indicates that the enclosing module’s bindings should be visible, creates an additional scope in the obvious way. For other module* and module submodule forms, the macro expander prevents access to the enclosing module’s bindings by removing the two scopes of the enclosing module.

A module distinguishes bindings that have the same name but different phases. For example, lambda might have one meaning for run-time code within a module, but a different meaning for compile-time code within the same module. Furthermore, instantiating a module at a particular phase implies a phase shift in its syntax literals. Consider the module

(define x 1)
(define-for-syntax x 2)

(define id #'x)
(define-for-syntax id #'x)

(provide id (for-syntax id))

and suppose that the module is imported both normally and for compile time, the latter with a s: prefix. In a compile-time context within the importing module, both id and s:id will be bound to an identifier x that had the same scopes originally, but they should refer to different x bindings (in different module instances with different values).

Among the possibilities for distinguishing phases, having per-phase sets of scopes on an identifier makes the phase-shifting operation most natural. A local binding or macro expansion can add scopes at all phases, while module adds a distinct inside-edge scope to every phase (and the same outside-edge scope to all phases). Since every binding within a module is forced to have that module’s phase-specific inside-edge scopes, bindings at different phases will be appropriately distinguished.

Racket constrains operations that inspect and adjust scopes on syntax objects to those that add, remove, or flip sets of scopes relative to some other syntax object. As a result, all of the phase-specific scopes for a module’s inside edge are added to or removed from a syntax object together.

Having a distinct “root” scope for each phase makes most local bindings phase-specific. That is, in

(define-for-syntax x 10)
(let ([x 1])
  (let-syntax ([y x])
    ....))

the x on the right-hand side of let-syntax sees the top-level phase-1 x binding, not the phase-0 local binding. This is a change from Racket’s old approach to binding and phases, but the only programs that are affected are ones that would trigger an out-of-context error in the old system. Meanwhile, macros can construct identifiers that have no module scope, so out-of-context errors are still possible.
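
A runnable variant of this example, as a sketch: the elided body is filled in with a use of y, and y is made a macro that expands to the compile-time value of x. Both choices, and the name result, are assumptions made here for illustration.

```racket
#lang racket/base
(require (for-syntax racket/base))

;; A phase-1 binding of x at the module level.
(define-for-syntax x 10)

(define result
  (let ([x 1])                       ; a phase-0 local binding of x
    (let-syntax ([y (lambda (stx)
                      ;; This x is a phase-1 reference.
                      (datum->syntax #'here x))])
      (y))))

result  ; 10: the phase-1 x was used, not the local phase-0 x
```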

3.7 The Top Level

A namespace in Racket is a top-level evaluation context. Each call to eval uses a particular namespace (either the current namespace or one supplied to eval), and each read–eval–print loop works in a particular namespace. Namespaces are first-class values in Racket. A namespace can be created as fresh (e.g., for a sandbox), or it can be extracted from a module instantiation to simulate further evaluation in the module’s body.

As the connection to modules may suggest, a top-level namespace corresponds to a pair of scopes, just as a module does. Each top-level namespace has the same outside-edge scope, but a distinct inside-edge scope where bindings reside.

The interactive and incremental nature of a top-level context poses certain semantic challenges when macro and variable definitions and re-definitions are allowed. For example, a reference to an unbound identifier within a function cannot be rejected out-of-hand, because it might be defined later within the namespace before the function is called. Similarly, a reference might be resolved as a variable when a function is created, but a later definition could change the identifier’s binding to a macro, so the function must either continue to refer to a variable or be somehow reinterpreted to have a macro use. These challenges are compounded when macros expand to a mixture of variable and macro definitions. Overall, the top level is hopeless: it cannot provide a treatment of binding that is as consistent as module while also performing its job as an interactive, exploratory evaluation context. In Racket, we accept top-level compromises and put all “real” code in modules.

Fortunately, top-level compromises pose little trouble for set-of-scopes binding. Supporting an incremental and redefinition-capable top-level context requires only that the binding table allow updates of existing bindings, which is straightforward.

A more troublesome aspect of top-level namespaces in Racket is that a form might be captured (via quote-syntax), expanded, or compiled in one namespace, and then evaluated in another namespace. Historically, top-level bindings have been equated with “unbound,” so that expanded and compiled forms originating in a top-level context could move freely among namespaces. This treatment as “unbound” has been fuzzy, however, and forms that originate from module namespaces have been treated differently from forms that originate in a non-module namespace.

To accommodate top-level namespaces with as much consistency (of binding treatment) and convenience (of moving forms among top-level namespaces) as possible, we introduce one more dimension to syntax objects. Instead of having a single set of scopes per phase, each syntax object has a sequence of scope sets per phase. When a syntax object is introduced to a top-level context that is not already included in its scope set (at a given phase), the current scope set is cloned as a new first item of the list of sets; all further scope-set manipulations affect that first item. When looking up an identifier’s binding, however, the sequence is traversed until a binding is found. In other words, all but the first item in the list act as fallbacks for locating a binding. In practice, this fallback mechanism is consistent with most existing code without otherwise interfering with scope management (since the fallbacks apply only when an identifier is otherwise unbound).

3.8 The Syntax-Function Zoo

Compared to Dybvig et al. (1993) or even Flatt et al. (2012), Racket adds many functions for manipulating syntax objects during macro expansion in ways that are sensitive to the expansion context. We have mentioned first-class definition contexts and rename transformers, but Racket provides many more tools:

The syntax-local-introduce function lets a macro explicitly toggle the introduction status of a syntax object by flipping the mark (specific to the current macro invocation) that distinguishes use-site and macro-introduced identifiers.
With the set-of-scopes expander, the mark is replaced by a scope, and syntax-local-introduce flips both the introduction scope and the use-site scope (if any) of the current expansion.
The make-syntax-introducer function generates a function that works like syntax-local-introduce, but for a fresh mark/scope.
As a new feature, and unlike syntax-local-introduce, the generated function accepts an additional argument to select the mode: 'flip (the default) to flip the scope’s presence, 'add to add the scope if not present already, and 'remove to remove the scope if it is currently present.
The make-syntax-delta-introducer function accepts two arguments, and it creates a function similar to the one produced by make-syntax-introducer, but instead of operating on a fresh mark/scope, it operates on all marks/scopes on the first syntax object that are not present on the second syntax object.
With the set-of-scopes expander, the generated function accepts a 'flip, 'add, or 'remove mode. This operation gives macro implementors relatively fine-grained control over scopes, but without exposing individual scopes, so the macro expander still can perform certain optimizations and make certain representation choices (e.g., due to the fact that the phase-specific “inside” scopes of a module are added or removed together).
The questionable syntax-local-make-delta-introducer function, which finds the difference between a reference and its binding so that the difference can be applied to another syntax object, is no longer needed, because it can be implemented with make-syntax-delta-introducer.
Since make-syntax-delta-introducer for the previous macro expander manipulated only marks, and not renamings, it was insufficient for certain kinds of scope transfer. Unifying all binding through scopes makes make-syntax-delta-introducer sufficient.
The syntax-local-get-shadower function in the old expander acts as an especially heavy hammer for non-hygienic binding. It synthesizes an identifier like a given one that will capture any identifiers that are original to the current expansion context.
The main use of syntax-local-get-shadower is to implement syntax parameters (Barzilay et al. 2011). In the Racket set-of-scopes expander, syntax-local-get-shadower has been simplified so that it effectively serves only as a hook for implementing syntax parameters, while other former uses are better and more consistently implemented through make-syntax-delta-introducer.
Implementing syntax parameters as a core feature of the macro expander would be sensible and slightly cleaner. We maintain the syntax-local-get-shadower approach only because it’s simpler with our current infrastructure.
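
The mode argument of a function produced by make-syntax-introducer can be exercised directly, as in the following sketch. It assumes that the introducer is used outside of any macro expansion, and it observes scope differences through bound-identifier=?; the names intro, plain, and with-scope are chosen here for illustration.

```racket
#lang racket/base

;; The generated function encapsulates one fresh scope.
(define intro (make-syntax-introducer))

(define plain #'x)
(define with-scope (intro plain 'add))

;; Adding the fresh scope makes the scope sets differ.
(bound-identifier=? plain with-scope)

;; Removing it again restores the original scope set.
(bound-identifier=? plain (intro with-scope 'remove))

;; Adding a scope that is already present changes nothing.
(bound-identifier=? with-scope (intro with-scope 'add))

;; The default mode is 'flip, so applying it twice is a round trip.
(bound-identifier=? plain (intro (intro plain)))
```

Under those assumptions, the first comparison should produce #f and the remaining three #t.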

As mentioned in First-Class Definition Contexts, a first-class definition context is difficult to specify in terms of renamings. In that case, an internal-definition context is backed by a renaming on syntax objects, but the renaming can refer to itself or other renamings, and so the binding-resolution process must handle a complex form of cycles. With set-of-scopes binding, an internal-definition context is backed by a scope for the context; an internal-definition context doesn’t create cyclic syntax-object structures, and it needs no special rules for resolving references to bindings.
