newcommands parsing is broken #137

Open
utsekaj42 opened this issue Nov 14, 2017 · 1 comment

@utsekaj42

newcommands parsing is broken: any newcommand replacement applied to text containing braces will stop at the first closing brace, e.g.

\newcommand{\normg}[1]{\lvert\lvert {#1} \rvert\rvert}
\normg{x = \sum_{i}^{P} f(d_i)}

will result in

\normg{\lvert\lvert x = {\sum_{i} \rvert\rvert f(d_i)}^{P}

I believe this is due to the limitations of the regex, but I'm not very competent with regex. I believe what is required is to somehow match the braces; further, for nested newcommands it is unclear how many levels should be evaluated. In html, latex, and ipynb (as of hplgit/doconce/pull/136) it is fine to leave the newcommands in place, so it is a question of how to handle this for other formats.
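To make the failure mode concrete, here is a minimal sketch of how a non-greedy regex stops at the first closing brace. The pattern `naive` is a hypothetical stand-in, not the regex doconce actually uses, so the exact output differs from the result quoted above; only the mechanism is the same.

```python
import re

text = r'\normg{x = \sum_{i}^{P} f(d_i)}'

# Hypothetical naive pattern (an assumption, not doconce's own regex):
# capture everything up to the FIRST closing brace.
naive = re.compile(r'\\normg\{(.*?)\}')

captured = naive.search(text).group(1)

# A function is used as the replacement so that backslashes in the
# captured argument are not interpreted as regex escapes.
result = naive.sub(
    lambda m: r'\lvert\lvert {' + m.group(1) + r'} \rvert\rvert', text)

print(captured)  # x = \sum_{i
print(result)    # \lvert\lvert {x = \sum_{i} \rvert\rvert^{P} f(d_i)}
```

The argument is cut off inside `\sum_{i}`, and the rest of the original argument ends up after the closing `\rvert\rvert`.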

I had started on a quick fix for non-nested newcommands with only 1 argument, which is

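def recursive_bracket_parser(s, i):
    """Return the index just past the brace closing the group entered before index i.

    Inspired by <https://stackoverflow.com/a/14952529/4000607>
    """
    while i < len(s):
        # '\\' is a single backslash; braces escaped as \{ or \} are skipped
        if s[i] == '{' and (i < 1 or s[i-1] != '\\'):
            # nested group: jump to just past its closing brace
            i = recursive_bracket_parser(s, i+1)
        elif s[i] == '}' and (i < 1 or s[i-1] != '\\'):
            return i+1
        else:
            # process whatever is at s[i]
            i += 1
    return i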

and an example mirror substitute in expand_newcommands.py

import re

# Sample input: the example from the top of this issue
text = r'\normg{x = \sum_{i}^{P} f(d_i)}'

# (regex pattern, replacement template, number of arguments);
# backslashes in the templates are doubled as in a re.sub replacement string
newcommands_test = [
    (r'\\normg', r'\\lvert\\lvert {NEWCOMMANDARG} \\rvert\\rvert', 1),
    (r'\\normf', r'\\normg{NEWCOMMANDARG}_{NEWCOMMANDARG}', 2),
]

for pattern, replacement, nargs in newcommands_test:
    # 1 Find all matches. re.split always returns the text before the first
    # match as element 0 (an empty string if the match is at the very start),
    # so that element is skipped.
    matches = re.split(pattern, text)
    # 2 process each match
    for match in matches[1:]:
        args = []
        for idx in range(nargs):
            # match[0] is the opening brace, so start scanning at index 1
            end_arg = recursive_bracket_parser(match, 1)
            args.append(match[0:end_arg])
            match = match[end_arg:]

        tmp = replacement
        for arg in args:
            # str.replace is used instead of re.subn so that backslashes in
            # arg (e.g. \sum) are not interpreted as regex escapes
            tmp = tmp.replace('{NEWCOMMANDARG}', arg, 1)
        print(tmp)
        #tmp =  replacement.format(*args) + match
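For comparison, the same idea can be sketched without regexes at all, using a depth counter to find the matching brace. This is only an illustration under stated assumptions: `read_braced` and `expand_command` are hypothetical names that do not exist in doconce, the template uses LaTeX-style `#1`, `#2` placeholders, and nested newcommands are not re-expanded.

```python
def read_braced(s, start):
    """Return (content, end) for the brace group opening at s[start].

    `end` is the index just past the matching closing brace. Braces
    escaped with a backslash are treated as literal text.
    """
    if start >= len(s) or s[start] != '{':
        raise ValueError('expected "{" at index %d' % start)
    depth = 0
    i = start
    while i < len(s):
        c = s[i]
        if c == '\\':
            i += 2  # skip the backslash and the character it escapes
            continue
        if c == '{':
            depth += 1
        elif c == '}':
            depth -= 1
            if depth == 0:
                return s[start + 1:i], i + 1
        i += 1
    raise ValueError('unbalanced braces')


def expand_command(text, name, template, nargs):
    """Expand every use of `name` in `text` (single pass, no re-expansion)."""
    out = []
    pos = 0
    while True:
        hit = text.find(name, pos)
        if hit == -1:
            out.append(text[pos:])
            return ''.join(out)
        end = hit + len(name)
        if end < len(text) and text[end].isalpha():
            # A longer command such as \normgx: copy it through unchanged.
            out.append(text[pos:end])
            pos = end
            continue
        out.append(text[pos:hit])
        expansion = template
        for n in range(1, nargs + 1):
            arg, end = read_braced(text, end)
            expansion = expansion.replace('#%d' % n, arg)
        out.append(expansion)
        pos = end


expanded = expand_command(r'\normg{x = \sum_{i}^{P} f(d_i)}',
                          r'\normg', r'\lvert\lvert {#1} \rvert\rvert', 1)
print(expanded)  # \lvert\lvert {x = \sum_{i}^{P} f(d_i)} \rvert\rvert
```

Calling `expand_command` once per defined command gives one level of expansion; how many passes nested newcommands need is left open, as in the discussion above.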

I can continue with this line of work, but I think just including the newcommands in ipython, latex, and html is fine for my uses of doconce. Further, I'm not sure whether this is the right path to go down, but perhaps it could help with issues others have.

@KGHustad
Collaborator

Regular expressions are only able to recognise regular languages. This is a big limitation when working with any language which may contain recursive structures.
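As a small illustration of that limitation, a regex can be written to handle a fixed nesting depth, but it fails as soon as the input nests one level deeper. The pattern `one_level` below is a hypothetical example written for this sketch, not something taken from doconce.

```python
import re

# Hypothetical pattern: the argument may contain brace groups, but those
# inner groups may not themselves contain braces (one extra level only).
one_level = re.compile(r'\\normg\{((?:[^{}]|\{[^{}]*\})*)\}')

shallow = one_level.search(r'\normg{x = \sum_{i}^{P} f(d_i)}')
deep = one_level.search(r'\normg{a_{b_{c}}}')

print(shallow.group(1))  # x = \sum_{i}^{P} f(d_i)
print(deep)              # None
```

Each additional level of nesting needs another hand-written layer in the pattern, which is why a parser that tracks brace depth is the more robust route.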

We will implement proper parsing in the next major version, so I'm hesitant to spend time fixing this now.

It's probably best for you to continue with just including the newcommands for now.
