constant.numeric.dec.python only has 2 capture groups #198
|
actually, on second thought -- I should probably check the rest of the regexen too |
|
ok there are more, but I am having a hard time wrapping my head around the include scheme. Here's the script I used to find them (can probably be improved a little bit to have more/better output):

import argparse
import re
import plistlib
from typing import Any
from typing import Dict

import onigurumacffi

_BACKREF_RE = re.compile(r'((?<!\\)(?:\\\\)*)\\([0-9]+)')


def _fix_end(s: str) -> str:
    """end can have backreferences"""
    return _BACKREF_RE.sub('ZZZ', s)


def _visit_captures(reg: str, captures: Dict[str, Dict[str, Any]]) -> None:
    max_n = onigurumacffi.compile(reg).number_of_captures()
    for k, v in captures.items():
        if int(k) > max_n:
            print(f'{k} > {max_n}: {reg!r} {v}')
        _visit_rule(v)


def _visit_rule(rule: Dict[str, Any]) -> None:
    if 'match' in rule and 'captures' in rule:
        _visit_captures(rule['match'], rule['captures'])
    if 'begin' in rule:
        if 'captures' in rule:
            _visit_captures(rule['begin'], rule['captures'])
            _visit_captures(_fix_end(rule['end']), rule['captures'])
        if 'beginCaptures' in rule:
            _visit_captures(rule['begin'], rule['beginCaptures'])
        if 'endCaptures' in rule:
            _visit_captures(_fix_end(rule['end']), rule['endCaptures'])
    for sub_rule in rule.get('patterns', ()):
        _visit_rule(sub_rule)
    for sub_rule in rule.get('repository', {}).values():
        _visit_rule(sub_rule)


def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument('filename')
    args = parser.parse_args()

    with open(args.filename, 'rb') as f:
        contents = plistlib.load(f)

    _visit_rule(contents)
    return 0


if __name__ == '__main__':
    exit(main())

$ python3 t.py grammars/MagicPython.tmLanguage
2 > 1: "(\\]|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\]|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\'\\'\\')" {'name': 'invalid.illegal.newline.python'}
2 > 1: '(""")' {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\)|(?=\\'\\'\\'))" {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(\\)|(?="""))' {'name': 'invalid.illegal.newline.python'}
2 > 1: "(\\'\\'\\')" {'name': 'invalid.illegal.newline.python'}
2 > 1: '(""")' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'}
2 > 1: '(ZZZ)' {'name': 'invalid.illegal.newline.python'} |
|
@elprans @vpetrovykh anything left to do here? |
|
@1st1 maybe? |
|
OK, the change looks good. As for the other hits, all of the Sorry for the delay in merging this. |


also validated this against oniguruma via onigurumacffi