I have an HTML file with markup that can be reduced to the following:

Notice the truncated `<a` tag on line 4 (caused by an HTML fragment accidentally truncated in the DB).

If I create a file with this content, load it in Firefox and print the resulting DOM with `document.getElementsByTagName("html")[0].outerHTML`, Firefox returns:

The well-formed link with `bar` is still present in the output.

However, if I parse the input with html5ever and print back the result, I get:

The well-formed link with `bar` completely disappeared!

EDIT: See next message, there are still some differences but the ones here seem to be caused by the `TreeSink` impl I used, not the parser.

This difference in interpretation between Firefox/Chrome and html5ever is causing me issues when processing these documents to recover them. I'm well aware that the input is broken, but I would expect html5ever to produce the same structure as real browsers.

EDIT: Even smaller repro, removing the newline fixes the mismatch.