fix: Another attempt fixing nested fenced divs keeping structure #52

Merged
Witiko merged 21 commits into jgm:master from Omikhleia:another_attempt_nested_divs on Dec 14, 2022

Omikhleia
Contributor

PR #51 (for issue #44) got merged a bit too quickly for my taste. Here is a PR rebased on it, but reverting to the previous writer API and trying to keep the hierarchical structure. It passes the tests for #51. I wish I had more time to check for edge cases, but at least you have it here for scrutiny.

Collaborator

@Witiko Witiko left a comment

We may not need to use the full markdown grammar while looking for the end of a div, but it seems to me that we need to consider at least code fences.

Collaborator

Witiko commented Nov 24, 2022

This pull request should also close #53 and #54. @Omikhleia, can you please add #52 (comment), #53, and #54 to the unit tests, so that we can at least see the extent to which the code is failing at the moment?

PR #51 (for issue #44) got merged a bit too quickly for my taste.

I am sorry about that. I should have heeded your concerns. I am way behind on my schedule for the monthly release of witiko/markdown due to the bugged fenced div implementation, but rushing ahead and introducing new bugs to master seems reckless (although we are moving forward!).

Witiko added a commit to Omikhleia/lunamark that referenced this pull request Nov 25, 2022
@Witiko Witiko force-pushed the another_attempt_nested_divs branch 2 times, most recently from d8ff8b1 to b16842e Compare November 25, 2022 12:26
Collaborator

Witiko commented Nov 25, 2022

@Omikhleia I added edge cases from #54, #53, and #52 (comment) to the unit tests. CI shows that although this pull request already fixes #54, it currently fails to address the edge cases from #53 and #52 (comment). Will you find the time to tackle this by the end of the month, or should I?

Contributor Author

Omikhleia commented Nov 25, 2022

@Witiko
Div-blockquote tests we might want (if not already covered)

Here is a properly nested div in a block quote:

::: {.level1}
This is the beginning of a div

> This is a blockquote
>
> :::: {.level2-inside-bloquote}
> This is a div inside a block quote
> ::::
:::

As nested, the quoted colons do not close the embedding div.

:::::: {.some-classname}
This is the beginning of a div

> This is a blockquote
> ::::::

::::::

The first is a "regular" case that works, the second currently breaks.

Will you find the time to tackle this by the end of the month, or should I?

I won't have the bandwidth to guarantee I can work on it next week (and probably a bit more). Feel free to go ahead. I'll try to follow the PR though, for comments. (And if at some point you want to squash commits, no problem either. A Co-authored-by: name <[email protected]> comment might then be useful - but not mandated in any way, I just discovered that GitHub trick recently).

@Witiko Witiko marked this pull request as draft November 25, 2022 12:48
Collaborator

Witiko commented Nov 25, 2022

The first is a "regular" case that works, the second currently breaks.

Good thinking. I will add the regular case.

(And if at some point you want to squash commits, no problem either. A Co-authored-by: name [email protected] comment might then be useful - but not mandated in any way, I just discovered that GitHub trick recently).

Thanks, I did not know that. I will also do that for commits based on your comments.

Collaborator

Witiko commented Nov 25, 2022

I won't have the bandwidth to guarantee I can work on it next week (and probably a bit more).

I have postponed the 2.19.0 release of witiko/markdown to December 23, so this is less urgent for me. Besides fenced divs, I will also need #43 closed before the release. If you'd like, we can split the effort and I will focus on #43.

If you'd like to have a direct channel to discuss the implementation, please feel free to join the witiko/markdown discord server or matrix space.

@Witiko Witiko marked this pull request as ready for review November 30, 2022 12:11
Collaborator

Witiko commented Nov 30, 2022

@Omikhleia I am done and would appreciate your review.

  • adding an exception for larsers.fenced_div_end to larsers.Endline and larsers.NonbreakingEndline.

This ended up being trickier than I expected. Assume the following input:

:::
This is not a div
:::

If we add an exception for larsers.fenced_div_end to larsers.Endline and larsers.NonbreakingEndline, then this will produce the following output:

<p>::: This is not a div</p>
<p>:::</p>

Therefore, I ended up keeping the nesting level as a named capture inside the grammar and we only match the exception for larsers.fenced_div_end if we are nested inside a div:

local syntax =
  { "Blocks",
    Blocks = Cg(Ct("") / "0", "div_level") -- initialize div_level to 0

local function increment_div_level(increment)
  local function update_div_level(s, i, current_level) -- luacheck: ignore s i
    current_level = tonumber(current_level)
    local next_level = tostring(current_level + increment)
    return true, next_level
  end
  return Cg( Cmt(Cb("div_level"), update_div_level)
           , "div_level")
end
larsers.FencedDiv = larsers.fenced_div_begin * increment_div_level(1)
                  * parsers.blanklines
                  * Ct( (V("Block") - larsers.fenced_div_end)^-1
                      * (parsers.blanklines / function()
                                                return writer.interblocksep
                                              end
                        * (V("Block") - larsers.fenced_div_end))^0)
                  * parsers.blanklines
                  * larsers.fenced_div_end * increment_div_level(-1)
                  / function (attr, div) return div, attr end
                  / writer.div

if options.fenced_divs then
  local function check_div_level(s, i, current_level) -- luacheck: ignore s i
    current_level = tonumber(current_level)
    return current_level > 0
  end
  local is_inside_div = Cmt(Cb("div_level"), check_div_level)
  larsers.fencestart = larsers.fencestart
                     + is_inside_div -- break out of a paragraph when we
                                     -- are inside a div and see a closing tag
                     * larsers.fenced_div_end
end

This follows a similar trick done by Pandoc. However, I welcome any ideas for simplification.
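
For readers unfamiliar with the idiom, the following is a minimal standalone sketch of a nesting counter kept in an LPeg named group capture. It is not lunamark's code: the toy parenthesis syntax and the names `init_level`, `bump_level`, `is_nested`, and `accepts` are hypothetical, and it assumes the LPeg library is installed. It only illustrates the same three ingredients as above: initialize the counter at the root, update it with Cg(Cmt(Cb(...))), and guard a closing delimiter with a Cmt check.

```lua
-- Minimal sketch (not lunamark's code): a nesting counter stored in an LPeg
-- named group capture. Assumes the LPeg library is available.
local lpeg = require("lpeg")
local P, S, Cg, Cb, Cmt, Ct = lpeg.P, lpeg.S, lpeg.Cg, lpeg.Cb, lpeg.Cmt, lpeg.Ct

-- Start every match with the counter set to "0" (same idiom as in the comment above).
local init_level = Cg(Ct("") / "0", "level")

-- Read the most recent value of "level" and store an updated copy under the same name.
local function bump_level(increment)
  local function update(_, _, current)
    return true, tostring(tonumber(current) + increment)
  end
  return Cg(Cmt(Cb("level"), update), "level")
end

-- Succeeds without consuming input only when we are at least one level deep.
local is_nested = Cmt(Cb("level"), function(_, _, current)
  return tonumber(current) > 0
end)

-- Hypothetical toy syntax: "(" opens a level; ")" closes one, but only when nested.
local open  = P("(") * bump_level(1)
local close = is_nested * P(")") * bump_level(-1)
local other = P(1) - S("()")
local toy   = init_level * (open + close + other)^0 * -P(1)

-- Returns true when the whole input is consumed by the toy syntax.
local function accepts(input)
  return toy:match(input) ~= nil
end

print(accepts("(a(b))"), accepts("a)"))
```

In this sketch a stray ")" outside any "(" is never treated as a closing delimiter, which mirrors the idea that a closing ::: should only break out of a paragraph when the parser is inside a div.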

Collaborator

Witiko commented Nov 30, 2022

One last quibble I have with the current implementation is that the content of a div must end with a blank line; otherwise, it will not be wrapped in a paragraph. Let's take the following example:

::: {.myclass}
Some div
:::

Currently, Lunamark will produce the following output:

<div class="myclass">Some div</div>

Your original solution from this PR called parse_blocks, where we could artificially insert newlines at the end of a div. I am not sure we can easily do that with my solution. However, this trouble with paragraphs seems to affect lunamark in general:

$ printf '> xxx' | lunamark
<blockquote>
xxx
</blockquote>

$ printf '> xxx\n' | lunamark
<blockquote>
<p>xxx</p>
</blockquote>

Regardless of whether this is a feature or a bug, it seems out-of-scope for this PR.

Contributor Author

Omikhleia commented Nov 30, 2022

@Witiko What amazing work you have done here! I'm sorry I couldn't contribute. Not only did I expect the week to be very busy, but moreover I am sick with my brain at 10% capacity... I just took the time to test your branch on a bunch of small tests I had (mostly redundant with the ones you added, but written a bit differently, so who knows...). They all went right...

There's one case where I didn't get the same result as Pandoc and was a bit astonished at first:

::::: {.some-classname}
This is the beginning of a div

> This is a blockquote
> :::::
> but not a nested div

:::

Pandoc gives

<div class="some-classname">
<p>This is the beginning of a div</p>
<blockquote>
<p>This is a blockquote</p>
<p>::::: but not a nested div</p>
</blockquote>
</div>

Lunamark with your fixes (some line breaks added for easier comparison)

<div class="some-classname">
<p>This is the beginning of a div</p>
<blockquote>
<p>This is a blockquote ::::: but not a nested div</p>
</blockquote>
</div>

Small discrepancy for a "broken" case - and I am even thinking our output might be more correct maybe.

I'll look at code as soon as I can and have the necessary focus^^

Collaborator

Witiko commented Nov 30, 2022

Small discrepancy for a "broken" case - and I am even thinking our output might be more correct maybe.

Definitely looks like a bug. I will test this with up-to-date Pandoc and report it if present.

I am sick with my brain at 10% capacity... I'll look at code as soon as I can and have the necessary focus^^

I am sorry to hear that! Please, take all the time you need.

Contributor Author

@Witiko Trying to check combinations of options --
The implementation currently fails with an error (./lunamark/reader/markdown.lua:847: back reference 'div_level' not found) on the following:

local lunamark = require("lunamark")
local opts = { 
    --  smart = true,
    --  strikeout = true,
    --  subscript = true,
    --  superscript = true,
    --  definition_lists = true,
    -- notes = true,
     inline_notes = true,
    --  fenced_code_blocks = true,
    --  fenced_code_attributes = true,
    --  bracketed_spans = true,
     fenced_divs = true,
    --  raw_attribute = true,
    --  link_attributes = true,
    --  startnum = true,
    --  fancy_lists = true,
    --  task_list = true,
    --  hash_enumerators = true,
    --  table_captions = true,
    --  pipe_tables = true,
    --  header_attributes = true,
    --  line_blocks = true,
    --  escaped_line_breaks = true,
 }
local writer = lunamark.writer.html5.new(opts)
local parse = lunamark.reader.markdown.new(writer, opts)
local result, metadata = parse([[

Note^[From the 
original.]

::: {.cit custom-style=raggedleft}
Div
:::

]])
print(result)

It does work if the note is on a single line (Note^[From the original.]) -- So it's unlikely due to notes themselves, but probably a change affecting end-of-lines... but I haven't been able to pinpoint it this morning.

Contributor Author

Omikhleia commented Dec 3, 2022

but I haven't been able to pinpoint it this morning.

Doh, it was in front of my eyes: larsers.Endline and larsers.NonbreakingEndline both use larsers.fencestart (which in turn uses is_inside_div needing the div level)

My workaround so far:
[image]

Not sure this is the proper way to handle the situation. Tests pass and I could process a 10-page document with all of the above extensions enabled, but it might not be general.

Collaborator

Witiko commented Dec 3, 2022

Seems reasonable. We should check that there are no other copies of syntax that change the root rule and may need the same treatment as inlines_t. If there are, I propose we extract the Cg(...) black magic into larsers.InitializeState to keep things DRY.
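
As an illustration of why a grammar copy with a different root rule needs this treatment, here is a minimal standalone sketch, assuming the LPeg library is installed. It is not lunamark's code: `inline`, `uninitialized`, `initialized`, and `initialize_state` are hypothetical stand-ins, with `initialize_state` playing the role proposed for larsers.InitializeState. A pattern that consults Cb("div_level") raises an error unless some earlier part of the same match has set that group.

```lua
-- Minimal sketch (not lunamark's code). Assumes the LPeg library is available.
local lpeg = require("lpeg")
local P, Cg, Cb, Cmt, Ct = lpeg.P, lpeg.Cg, lpeg.Cb, lpeg.Cmt, lpeg.Ct

-- Guard that consults the counter, as in the fenced div implementation above.
local is_inside_div = Cmt(Cb("div_level"), function(_, _, level)
  return tonumber(level) > 0
end)

-- Hypothetical stand-in for an inline rule that checks for a closing fence.
local inline = is_inside_div * P(":::") + P(1)

-- A root rule that never sets "div_level": the back reference cannot be resolved.
local uninitialized = inline^1
local ok, err = pcall(lpeg.match, uninitialized, "text")

-- Hypothetical shared prefix that sets the counter before anything reads it.
local initialize_state = Cg(Ct("") / "0", "div_level")
local initialized = initialize_state * inline^1
local end_position = initialized:match("text")

print(ok, err)
print(end_position)
```

Prefixing every root rule with one shared pattern keeps the initialization in a single place, so a new copy of the grammar only has to reuse that prefix.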

Collaborator

Witiko commented Dec 7, 2022

@Omikhleia I did not find any other problems, but I extracted the state-managing pattern to larsers.InitializeState regardless. Please, let me know if you are happy with the PR. It looks good to me.

Contributor Author

LGTM !

I just successfully tested two cases I had seen where end-of-lines were breaking. The inline notes, as noted above:

Note^[Some
note]

but also the indirect links, like the following:

Link
[some
link](www.google.com "toto titi")

So besides larsers.inlines, this covers the uses of inlines_no_inline_note and inlines_no_link_t. The result is now the same, whether fenced_divs is enabled or not.

The only remaining item would be inlines_nbsp, but I have no idea how to check it. It's used in citations, and I have never used them so far (I just read about them and tried, but I don't know how to trigger the case in question). That being said, I guess we can assume it is covered by the two other cases, since it's (more or less) the same logic. So from my viewpoint, it's all good.

Contributor Author

Omikhleia commented Dec 7, 2022

Erm. Perhaps spoke too soon...

local lunamark = require("lunamark")
local opts = { 
    fenced_divs = true,
 }
local writer = lunamark.writer.html5.new(opts)
local parse = lunamark.reader.markdown.new(writer, opts)
local result, metadata = parse([[

::: {.cit custom-style=raggedleft}
I am a _fenced_ div
:::

::: {.cit custom-style=raggedleft}

I am a _fenced_ div

:::

]])
print(result)

Gives

<div class="cit">I am a <em>fenced</em> div</div>

<div class="cit">
</div>

For some reason the second div is empty.

EDIT: The first blank line is the issue, not the last.

Witiko added a commit to Witiko/markdown that referenced this pull request Dec 7, 2022
Adapted from jgm#52.

Co-Authored-By: Omikhleia <[email protected]>
Witiko added a commit to Witiko/markdown that referenced this pull request Dec 7, 2022
Adapted from jgm#52.

Co-Authored-By: Omikhleia <[email protected]>
Witiko added a commit to Witiko/markdown that referenced this pull request Dec 7, 2022
Adapted from jgm#52.

Co-Authored-By: Omikhleia <[email protected]>
Witiko and others added 2 commits December 7, 2022 23:01
@Witiko Witiko merged commit 9501ccf into jgm:master Dec 14, 2022
Collaborator

Witiko commented Dec 14, 2022

The current implementation seems quite robust. Hopefully, we have not overlooked anything major.

@Omikhleia Omikhleia deleted the another_attempt_nested_divs branch December 28, 2022 12:29