Commit acb3a70

Massively speedup parser table generation
Through avoidance of unnecessary work and better use of modern Python data structures, I've been able to reduce the total build time of the three EdgeDB parsers from ~200s to ~26s, a factor of 7.7!
1 parent 91365c3 commit acb3a70

9 files changed (+622, -637 lines)

parsing/__init__.py

Lines changed: 9 additions & 7 deletions
@@ -124,21 +124,23 @@


 __all__ = (
-    "SpecError",
-    "UnexpectedToken",
+    "Glr",
+    "Lr",
+    "ModuleSpecSource",
     "Nonterm",
+    "Parser",
     "Precedence",
     "Spec",
+    "SpecError",
+    "SpecSource",
+    "UnexpectedToken",
     "Token",
-    "Lr",
-    "Glr",
-    "ModuleSpecSource",
 )

-from parsing.ast import Nonterm, Token
 from parsing.automaton import Spec
-from parsing.grammar import Precedence
+from parsing.ast import Nonterm, Token, Precedence
 from parsing.errors import SpecError, UnexpectedToken
+from parsing.interfaces import Parser, SpecSource
 from parsing.module_spec import ModuleSpecSource
 from parsing.lrparser import Lr
 from parsing.glrparser import Glr

parsing/ast.py

Lines changed: 74 additions & 1 deletion
@@ -9,7 +9,8 @@
 from typing import TYPE_CHECKING

 if TYPE_CHECKING:
-    from parsing.interfaces import Parser, SymbolSpec
+    from parsing.grammar import SymbolSpec
+    from parsing.interfaces import Parser


 class Symbol:
@@ -134,3 +135,75 @@ class id(Token):

     def __init__(self, parser: Parser) -> None:
         Symbol.__init__(self, parser.sym_spec(self), parser)
+
+
+class Precedence:
+    """
+    Precedences can be associated with tokens, non-terminals, and
+    productions. Precedence isn't as important for GLR parsers as for LR
+    parsers, since GLR parsing allows for parse-time resolution of
+    ambiguity. Still, precedence can be useful for reducing the volume of
+    ambiguities that must be dealt with at run-time.
+
+    There are five precedence types: %fail, %nonassoc, %left, %right, and
+    %split. Each precedence can have relationships with other precedences:
+    <, >, or =. These relationships specify a directed acyclic graph (DAG),
+    which is used to compute the transitive closures of relationships among
+    precedences. If no path exists between two precedences that are
+    compared during conflict resolution, parser generation fails. < and >
+    are reflexive; it does not matter which is used. Conceptually, the =
+    relationship causes precedences to share a node in the DAG.
+
+    During conflict resolution, an error results if no path exists in the
+    DAG between the precedences under consideration. When such a path
+    exists, the highest precedence non-terminal or production takes
+    precedence. Associativity only comes into play for shift/reduce
+    conflicts, where the terminal and the production have equivalent
+    precedences (= relationship). In this case, the non-terminal's
+    associativity determines how the conflict is resolved.
+
+    The %fail and %split associativities are special because they can be
+    mixed with other associativities. During conflict resolution, if
+    another action has non-%fail associativity, then the %fail (lack of)
+    associativity is overridden. Similarly, %split associativity overrides
+    any other associativity. In contrast, any mixture of associativity
+    between %nonassoc/%left/%right causes an unresolvable conflict.
+
+      %fail : Any conflict is a parser-generation-time error.
+
+              A pre-defined precedence, [none], is provided. It has
+              %fail associativity, and has no pre-defined precedence
+              relationships.
+
+      %nonassoc : Resolve shift/reduce conflicts by removing both
+                  possibilities, thus making conflicts a parse-time error.
+
+      %left : Resolve shift/reduce conflicts by reducing.
+
+      %right : Resolve shift/reduce conflicts by shifting.
+
+      %split : Do not resolve conflicts; the GLR algorithm will split
+               the parse stack when necessary.
+
+               A pre-defined precedence, [split], is provided. It has
+               %split associativity, and has no pre-defined precedence
+               relationships.
+
+    By default, all symbols have [none] precedence. Each production
+    inherits the precedence of its left-hand-side nonterminal's precedence
+    unless a precedence is manually specified for the production.
+
+    Following are some examples of how to specify precedence classes:
+
+      class P1(Parsing.Precedence):
+          "%split p1"
+
+      class p2(Parsing.Precedence):
+          "%left" # Name implicitly same as class name.
+
+      class P3(Parsing.Precedence):
+          "%left p3 >p2" # No whitespace is allowed between > and p2.
+
+      class P4(Parsing.Precedence):
+          "%left p4 =p3" # No whitespace is allowed between = and p3.
+    """
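
Note on the docstring above: the rules it describes (a transitive closure over the precedence DAG, then associativity as the tie-breaker for shift/reduce conflicts) can be illustrated with a small standalone sketch. This is not code from this commit or from the `parsing` package. The names `Prec`, `outranks`, and `resolve_shift_reduce` are hypothetical, the `=` relationship is left out (two precedences count as equivalent only when they share a name), and the choice to shift when the lookahead token outranks the production is the usual yacc convention, assumed here rather than taken from the source.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Prec:
    """Hypothetical stand-in for a precedence (not the library's class)."""

    name: str
    assoc: str  # "fail", "nonassoc", "left", "right" or "split"
    above: set[str] = field(default_factory=set)  # direct ">" relationships


def outranks(precs: dict[str, Prec]) -> dict[str, set[str]]:
    """Transitive closure: map each name to every name it outranks."""
    closure = {name: set(p.above) for name, p in precs.items()}
    changed = True
    while changed:
        changed = False
        for lower in closure.values():
            # Everything outranked by something we already outrank.
            reachable = set().union(*(closure[n] for n in lower))
            if not reachable <= lower:
                lower |= reachable
                changed = True
    return closure


def resolve_shift_reduce(
    token: Prec, prod: Prec, closure: dict[str, set[str]]
) -> str:
    """Pick an action for a shift/reduce conflict (yacc-style assumption)."""
    if prod.name in closure[token.name]:
        return "shift"  # the lookahead token outranks the production
    if token.name in closure[prod.name]:
        return "reduce"  # the production outranks the lookahead token
    if token.name != prod.name:
        # No path in the DAG: generation fails.
        raise ValueError(f"no path between {token.name} and {prod.name}")
    # Equivalent precedence: associativity decides.
    if token.assoc == "fail":
        raise ValueError(f"unresolved conflict at {token.name}")
    actions = {
        "left": "reduce",
        "right": "shift",
        "nonassoc": "error",  # both actions removed; parse-time error
        "split": "split",     # keep both; GLR splits the stack
    }
    return actions[token.assoc]


# Illustrative precedences: pow > mul > add, plus an unrelated one.
precs = {
    "add": Prec("add", "left"),
    "mul": Prec("mul", "left", {"add"}),
    "pow": Prec("pow", "right", {"mul"}),
    "cmp": Prec("cmp", "nonassoc"),
}
closure = outranks(precs)
```

With these illustrative precedences, `closure["pow"]` contains both `"mul"` and `"add"`, so a `pow` lookahead against an `add` production is shifted even though the two were never related directly. Comparing `cmp` with `add` raises, mirroring the generation-time failure the docstring describes for unrelated precedences.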
