newcommands parsing is broken #137

utsekaj42 · 2017-11-14T11:14:49Z

newcommands parsing is broken: any newcommand replacement applied to text containing brackets will stop at the first closing bracket, e.g.

\newcommand{\normg}[1]{\lvert\lvert {#1} \rvert\rvert}
\normg{x = \sum_{i}^{P} f(d_i)}

will result in

\normg{\lvert\lvert x = {\sum_{i} \rvert\rvert f(d_i)}^{P}

I believe this is due to the limitations of the regex, but I'm not very competent with regex. I believe what is required is to somehow match the brackets. Further, for nested newcommands it is unclear how many levels should be evaluated. In html, latex, and ipynb (as of hplgit/doconce/pull/136) it is fine to leave the newcommands in place, so the question is how to handle this for other formats.
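To make the failure mode concrete, here is a minimal sketch using a simplified non-greedy pattern. The pattern `naive` is an assumption for illustration only and is not the regex doconce actually uses; it just shows how a pattern with no notion of nesting ends the argument at the first closing bracket.

```python
import re

text = r'\normg{x = \sum_{i}^{P} f(d_i)}'

# Hypothetical simplified pattern: lazily capture everything up to the next '}'
naive = re.compile(r'\\normg\{(.*?)\}')

# A function is used as the replacement so backslashes are inserted literally
result = naive.sub(
    lambda m: r'\lvert\lvert {' + m.group(1) + r'} \rvert\rvert',
    text,
)

# The captured argument ends at the '}' that closes {i}, not the outer group
print(naive.search(text).group(1))
print(result)
```

A greedy `.*` would instead run to the last closing bracket on the line, which happens to work here but breaks as soon as a line contains two commands or trailing brace groups.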

I had started on a quick fix for non-nested newcommands with only 1 argument, which is

def recursive_bracket_parser(s, i):
    """Return the index just past the '}' that closes the group containing s[i].

    Inspired by <https://stackoverflow.com/a/14952529/4000607>
    """
    while i < len(s):
        # A brace preceded by a backslash is an escaped literal, not a delimiter
        if s[i] == '{' and (i < 1 or s[i-1] != '\\'):
            i = recursive_bracket_parser(s, i+1)
        elif s[i] == '}' and (i < 1 or s[i-1] != '\\'):
            return i+1
        else:
            # process whatever is at s[i]
            i += 1
    return i

and an example of the corresponding substitution in expand_newcommands.py

import re

# Example input, taken from the failing case above
text = r'\normg{x = \sum_{i}^{P} f(d_i)}'

# (pattern, replacement template, number of arguments)
newcommands_test = [
    (r'\\normg', r'\lvert\lvert {NEWCOMMANDARG} \rvert\rvert', 1),
    (r'\\normf', r'\normg{NEWCOMMANDARG}_{NEWCOMMANDARG}', 2),
]

for pattern, replacement, nargs in newcommands_test:
    # 1 Split on the command name. The first piece is always the text before
    #   the first match (empty if the command starts the string), so skip it.
    matches = re.split(pattern, text)
    # 2 process each match
    for match in matches[1:]:
        args = []
        for idx in range(nargs):
            # match[0] is assumed to be the opening '{' of the argument
            end_arg = recursive_bracket_parser(match, 1)
            args.append(match[0:end_arg])
            match = match[end_arg:]

        tmp = replacement
        for arg in args:
            # str.replace inserts arg literally; re.subn would try to
            # interpret backslashes in arg (e.g. \sum) as escape sequences
            tmp = tmp.replace('{NEWCOMMANDARG}', arg, 1)
        print(tmp)
        #tmp =  replacement.format(*args) + match
        #print(tmp)

I can continue with this line of work, but I think just including the newcommands in ipython, latex, and html is fine for my uses of doconce. Further, I'm not sure this is the right path to go down, but perhaps it could help with issues others have.
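For anyone who wants to take this further, the following is a rough, self-contained sketch of how the idea might extend to several arguments and nested newcommands. The names `read_group`, `expand_once`, `expand`, and the `max_passes` cap are all hypothetical and not part of doconce. The open question of how many levels to evaluate is handled here simply by repeating passes until the text stops changing or the cap is reached, which is one possible choice rather than a settled answer.

```python
import re

def read_group(s, i):
    """Given s[i] == '{', return (contents, index just past the matching '}')."""
    depth = 0
    for j in range(i, len(s)):
        escaped = j > 0 and s[j - 1] == '\\'
        if s[j] == '{' and not escaped:
            depth += 1
        elif s[j] == '}' and not escaped:
            depth -= 1
            if depth == 0:
                return s[i + 1:j], j + 1
    raise ValueError('unbalanced braces')

def expand_once(text, name, body, nargs):
    """Expand each use of command `name`; `body` uses #1, #2, ... as in LaTeX."""
    # The lookahead keeps e.g. \normg from matching the start of \normgx
    pattern = re.compile(re.escape(name) + r'(?![A-Za-z])')
    out, pos = [], 0
    while True:
        m = pattern.search(text, pos)
        if m is None:
            break
        out.append(text[pos:m.start()])
        i, args = m.end(), []
        for _ in range(nargs):
            if i >= len(text) or text[i] != '{':
                break
            arg, i = read_group(text, i)
            args.append(arg)
        if len(args) < nargs:
            # Not a full invocation (e.g. inside \newcommand{...}): copy as-is
            out.append(m.group(0))
            pos = m.end()
            continue
        # A function replacement inserts the argument text literally
        out.append(re.sub(r'#([1-9])', lambda a: args[int(a.group(1)) - 1], body))
        pos = i
    out.append(text[pos:])
    return ''.join(out)

def expand(text, commands, max_passes=10):
    """Repeat passes until nothing changes, so nested commands get expanded."""
    for _ in range(max_passes):
        previous = text
        for name, body, nargs in commands:
            text = expand_once(text, name, body, nargs)
        if text == previous:
            break
    return text

commands = [
    (r'\normg', r'\lvert\lvert {#1} \rvert\rvert', 1),
    (r'\normf', r'\normg{#1}_{#2}', 2),
]
print(expand(r'\normg{x = \sum_{i}^{P} f(d_i)}', commands))
print(expand(r'\normf{a_{1}}{b}', commands))
```

The sketch ignores optional arguments, arguments given without braces, and comments, so it is a starting point rather than a replacement for real parsing.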

KGHustad · 2017-11-14T13:45:41Z

Regular expressions are only able to recognise regular languages. This is a big limitation when working with any language which may contain recursive structures.
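As a small illustration of that limitation (a sketch, with `ONE_LEVEL` and `balanced_end` as made-up names): a pattern can be written to accept brace groups nested up to some fixed depth, but each extra level has to be spelled out in the pattern itself, whereas a simple depth counter handles any depth. Escaped braces are ignored here to keep the example short.

```python
import re

# Accepts a brace group whose contents nest at most one level deeper
ONE_LEVEL = re.compile(r'\{(?:[^{}]|\{[^{}]*\})*\}')

def balanced_end(s, start=0):
    """Return the index just past the group opening at s[start], or -1."""
    depth = 0
    for i in range(start, len(s)):
        if s[i] == '{':
            depth += 1
        elif s[i] == '}':
            depth -= 1
            if depth == 0:
                return i + 1
    return -1

shallow = '{a{b}c}'      # two levels: within what the pattern spells out
deep = '{a{b{c}d}e}'     # three levels: one more than the pattern allows

print(ONE_LEVEL.fullmatch(shallow) is not None)
print(ONE_LEVEL.fullmatch(deep) is not None)
print(balanced_end(shallow) == len(shallow), balanced_end(deep) == len(deep))
```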

We will implement proper parsing in the next major version, so I'm hesitant to spend time fixing this now.

It's probably best for you to continue with just including the newcommands for now.
