Annotated source for src/blib2to3/pgen2/conv.py

src/blib2to3/pgen2/conv.py
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | # Copyright 2004-2005 Elemental Security, Inc. All Rights Reserved.
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | # Licensed to PSF under a Contributor Agreement.
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | 
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | # mypy: ignore-errors
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | 
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | """Convert graminit.[ch] spit out by pgen to Python code.
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | 
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | Pgen is the Python parser generator.  It is useful to quickly create a
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | parser from a grammar file in Python's grammar notation.  But I don't
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | want my parsers to be written in C (yet), so I'm translating the
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | parsing tables to Python data structures and writing a Python parse
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | engine.
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | 
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | Note that the token numbers are constants determined by the standard
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | Python tokenizer.  The standard token module defines these numbers and
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | their names (the names are not used much).  The token numbers are
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | hardcoded into the Python tokenizer and into pgen.  A Python
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | implementation of the Python tokenizer is also available, in the
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | standard tokenize module.
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | 
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | On the other hand, symbol numbers (representing the grammar's
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | non-terminals) are assigned by pgen based on the actual grammar
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | input.
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | 
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | Note: this module is pretty much obsolete; the pgen module generates
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | equivalent grammar tables directly from the Grammar.txt input file
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | without having to invoke the Python pgen C program.
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | 
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | """
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | 
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | # Python imports
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | import re
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | 
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | # Local imports
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | from pgen2 import grammar, token
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | 
---- ---- ---- ---- ---- ---- ---- -- -- --- --- --- --- --- --- ------- ------- ------- | 
---- ---- ---- ---- ---- ---- ---- 07 -- --- --- --- --- --- --- ------- ------- ------- | class Converter(grammar.Grammar):
---- ---- ---- ---- ---- ---- ---- 07 -- --- --- --- --- --- --- ------- ------- ------- |     """Grammar subclass that reads classic pgen output files.
---- ---- ---- ---- ---- ---- ---- 07 -- --- --- --- --- --- --- ------- ------- ------- | 
---- ---- ---- ---- ---- ---- ---- 07 -- --- --- --- --- --- --- ------- ------- ------- |     The run() method reads the tables as produced by the pgen parser
---- ---- ---- ---- ---- ---- ---- 07 -- --- --- --- --- --- --- ------- ------- ------- |     generator, typically contained in two C files, graminit.h and
---- ---- ---- ---- ---- ---- ---- 07 -- --- --- --- --- --- --- ------- ------- ------- |     graminit.c.  The other methods are for internal use only.
---- ---- ---- ---- ---- ---- ---- 07 -- --- --- --- --- --- --- ------- ------- ------- | 
---- ---- ---- ---- ---- ---- ---- 07 -- --- --- --- --- --- --- ------- ------- ------- |     See the base class for more documentation.
---- ---- ---- ---- ---- ---- ---- 07 -- --- --- --- --- --- --- ------- ------- ------- | 
---- ---- ---- ---- ---- ---- ---- 07 -- --- --- --- --- --- --- ------- ------- ------- |     """
---- ---- ---- ---- ---- ---- ---- 07 -- --- --- --- --- --- --- ------- ------- ------- | 
0005 0004 0000 0000 0000 0001 07 01 000 000 000 000 000 000 0000.00 0000.00 0000.00 |     def run(self, graminit_h, graminit_c):
0005 0004 0000 0000 0000 0001 07 01 000 000 000 000 000 000 0000.00 0000.00 0000.00 |         """Load the grammar tables from the text files written by pgen."""
0005 0004 0000 0000 0000 0001 07 01 000 000 000 000 000 000 0000.00 0000.00 0000.00 |         self.parse_graminit_h(graminit_h)
0005 0004 0000 0000 0000 0001 07 01 000 000 000 000 000 000 0000.00 0000.00 0000.00 |         self.parse_graminit_c(graminit_c)
0005 0004 0000 0000 0000 0001 07 01 000 000 000 000 000 000 0000.00 0000.00 0000.00 |         self.finish_off()
---- ---- ---- ---- ---- ---- ---- 07 -- --- --- --- --- --- --- ------- ------- ------- | 
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |     def parse_graminit_h(self, filename):
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |         """Parse the .h file written by pgen.  (Internal)
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 | 
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |         This file is a sequence of #define statements defining the
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |         nonterminals of the grammar as numbers.  We build two tables
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |         mapping the numbers to names and back.
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 | 
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |         """
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |         try:
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |             f = open(filename)
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |         except OSError as err:
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |             print(f"Can't open {filename}: {err}")
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |             return False
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |         self.symbol2number = {}
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |         self.number2symbol = {}
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |         lineno = 0
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |         for line in f:
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |             lineno += 1
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |             mo = re.match(r"^#define\s+(\w+)\s+(\d+)$", line)
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |             if not mo and line.strip():
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |                 print(f"{filename}({lineno}): can't parse {line.strip()}")
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |             elif mo:
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |                 symbol, number = mo.groups()
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |                 number = int(number)
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |                 assert symbol not in self.symbol2number
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |                 assert number not in self.number2symbol
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |                 self.symbol2number[symbol] = number
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |                 self.number2symbol[number] = symbol
0023 0022 0000 0005 0002 0000 07 05 004 009 005 009 013 014 0051.81 0103.61 0002.00 |         return True
---- ---- ---- ---- ---- ---- ---- 07 -- --- --- --- --- --- --- ------- ------- ------- | 
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |     def parse_graminit_c(self, filename):
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         """Parse the .c file written by pgen.  (Internal)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 | 
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         The file looks as follows.  The first two lines are always this:
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 | 
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         #include "pgenheaders.h"
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         #include "grammar.h"
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 | 
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         After that come four blocks:
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 | 
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         1) one or more state definitions
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         2) a table defining dfas
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         3) a table defining labels
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         4) a struct defining the grammar
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 | 
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         A state definition has the following form:
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         - one or more arc arrays, each of the form:
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |           static arc arcs_<n>_<m>[<k>] = {
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                   {<i>, <j>},
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                   ...
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |           };
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         - followed by a state array, of the form:
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |           static state states_<s>[<t>] = {
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                   {<k>, arcs_<n>_<m>},
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                   ...
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |           };
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 | 
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         """
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         try:
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             f = open(filename)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         except OSError as err:
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             print(f"Can't open {filename}: {err}")
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             return False
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         # The code below essentially uses f's iterator-ness!
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         lineno = 0
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 | 
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         # Expect the two #include lines
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         assert line == '#include "pgenheaders.h"\n', (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         assert line == '#include "grammar.h"\n', (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 | 
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         # Parse the state definitions
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         allarcs = {}
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         states = []
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         while line.startswith("static arc "):
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             while line.startswith("static arc "):
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 mo = re.match(r"static arc arcs_(\d+)_(\d+)\[(\d+)\] = {$", line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 assert mo, (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 n, m, k = list(map(int, mo.groups()))
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 arcs = []
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 for _ in range(k):
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                     lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                     mo = re.match(r"\s+{(\d+), (\d+)},$", line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                     assert mo, (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                     i, j = list(map(int, mo.groups()))
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                     arcs.append((i, j))
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 assert line == "};\n", (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 allarcs[(n, m)] = arcs
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             mo = re.match(r"static state states_(\d+)\[(\d+)\] = {$", line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             assert mo, (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             s, t = list(map(int, mo.groups()))
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             assert s == len(states), (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             state = []
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             for _ in range(t):
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 mo = re.match(r"\s+{(\d+), arcs_(\d+)_(\d+)},$", line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 assert mo, (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 k, n, m = list(map(int, mo.groups()))
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 arcs = allarcs[n, m]
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 assert k == len(arcs), (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 state.append(arcs)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             states.append(state)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             assert line == "};\n", (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         self.states = states
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 | 
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         # Parse the dfas
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         dfas = {}
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         mo = re.match(r"static dfa dfas\[(\d+)\] = {$", line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         assert mo, (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         ndfas = int(mo.group(1))
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         for i in range(ndfas):
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             mo = re.match(r'\s+{(\d+), "(\w+)", (\d+), (\d+), states_(\d+),$', line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             assert mo, (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             symbol = mo.group(2)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             number, x, y, z = list(map(int, mo.group(1, 3, 4, 5)))
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             assert self.symbol2number[symbol] == number, (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             assert self.number2symbol[number] == symbol, (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             assert x == 0, (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             state = states[z]
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             assert y == len(state), (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             mo = re.match(r'\s+("(?:\\\d\d\d)*")},$', line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             assert mo, (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             first = {}
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             rawbitset = eval(mo.group(1))
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             for i, c in enumerate(rawbitset):
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 byte = ord(c)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 for j in range(8):
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                     if byte & (1 << j):
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                         first[i * 8 + j] = 1
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             dfas[number] = (state, first)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         assert line == "};\n", (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         self.dfas = dfas
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 | 
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         # Parse the labels
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         labels = []
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         mo = re.match(r"static label labels\[(\d+)\] = {$", line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         assert mo, (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         nlabels = int(mo.group(1))
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         for i in range(nlabels):
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             mo = re.match(r'\s+{(\d+), (0|"\w+")},$', line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             assert mo, (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             x, y = mo.groups()
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             x = int(x)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             if y == "0":
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 y = None
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             else:
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |                 y = eval(y)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             labels.append((x, y))
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         assert line == "};\n", (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         self.labels = labels
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 | 
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         # Parse the grammar struct
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         assert line == "grammar _PyParser_Grammar = {\n", (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         mo = re.match(r"\s+(\d+),$", line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         assert mo, (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         ndfas = int(mo.group(1))
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         assert ndfas == len(self.dfas), (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         assert line == "\tdfas,\n", (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         mo = re.match(r"\s+{(\d+), labels},$", line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         assert mo, (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         nlabels = int(mo.group(1))
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         assert nlabels == len(self.labels), (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         mo = re.match(r"\s+(\d+)$", line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         assert mo, (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         start = int(mo.group(1))
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         assert start in self.number2symbol, (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         self.start = start
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         assert line == "};\n", (lineno, line)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         try:
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             lineno, line = lineno + 1, next(f)
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         except StopIteration:
0125 0124 0006 0021 0011 0006 07 14 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             pass
0125 0124 0006 0021 0011 0006 07 -- 006 033 045 090 039 135 0713.53 5837.97 0008.18 |         else:
0125 0124 0006 0021 0011 0006 07 -- 006 033 045 090 039 135 0713.53 5837.97 0008.18 |             assert 0, (lineno, line)
---- ---- ---- ---- ---- ---- ---- 07 -- --- --- --- --- --- --- ------- ------- ------- | 
0009 0008 0002 0000 0000 0001 07 05 004 006 004 008 010 012 0039.86 0106.30 0002.67 |     def finish_off(self):
0009 0008 0002 0000 0000 0001 07 05 004 006 004 008 010 012 0039.86 0106.30 0002.67 |         """Create additional useful structures.  (Internal)"""
0009 0008 0002 0000 0000 0001 07 05 004 006 004 008 010 012 0039.86 0106.30 0002.67 |         self.keywords = {}  # map from keyword strings to arc labels
0009 0008 0002 0000 0000 0001 07 05 004 006 004 008 010 012 0039.86 0106.30 0002.67 |         self.tokens = {}  # map from numeric token values to arc labels
0009 0008 0002 0000 0000 0001 07 05 004 006 004 008 010 012 0039.86 0106.30 0002.67 |         for ilabel, (type, value) in enumerate(self.labels):
0009 0008 0002 0000 0000 0001 07 05 004 006 004 008 010 012 0039.86 0106.30 0002.67 |             if type == token.NAME and value is not None:
0009 0008 0002 0000 0000 0001 07 05 004 006 004 008 010 012 0039.86 0106.30 0002.67 |                 self.keywords[value] = ilabel
0009 0008 0002 0000 0000 0001 07 05 004 006 004 008 010 012 0039.86 0106.30 0002.67 |             elif value is None:
0009 0008 0002 0000 0000 0001 07 05 004 006 004 008 010 012 0039.86 0106.30 0002.67 |                 self.tokens[type] = ilabel
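
The least obvious step in parse_graminit_c is the decoding of each dfa's first set, which pgen writes as a C string of octal escapes such as "\002\100". The sketch below isolates that bit-unpacking loop so it can be read on its own. The helper name decode_first_set is hypothetical and is not part of conv.py. It also assumes the bitset has already been turned into a bytes object, whereas the module evaluates the string literal and calls ord() on each character.

```python
def decode_first_set(rawbitset: bytes) -> set[int]:
    """Return the label indices whose bits are set in rawbitset.

    Bit j of byte i (least significant bit first) stands for label
    index i * 8 + j, mirroring the nested loop in parse_graminit_c.
    """
    first = set()
    for i, byte in enumerate(rawbitset):
        for j in range(8):
            if byte & (1 << j):
                first.add(i * 8 + j)
    return first


# Octal 002 is 0b00000010 (bit 1 of byte 0); octal 100 is 0b01000000 (bit 6 of byte 1).
print(sorted(decode_first_set(b"\002\100")))  # prints [1, 14]
```

The module stores the result as a dict mapping each index to 1 rather than as a set. A set is used here only to keep the illustration short.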