I have the following input HTML file:
<html><body><div><a hr</div><div><div></div> <div><a href="/">bar</a></div></div></body></html>
Notice the unclosed <a tag (this is a minimal repro; in my case it comes from an accidentally truncated DB value).
If I open it in a browser (Firefox/Chrome) and print its DOM with document.getElementsByTagName("html")[0].outerHTML, I get:
<html><head></head><body> <div id="div0"> <a hr="" <="" div=""> </a><div id="div1"><a hr="" <="" div=""> <div id="div2"></div> </a><div id="div3"><a hr="" <="" div=""> </a><a href="/">bar</a> </div> </div> </body></html>
With scraper, if I parse it with Html::parse_document and print it with doc.root_element().html(), I get:
<html><head></head><body><div><a hr<="" div=""></a><div><a hr<="" div=""><div></div> </div> </div></body></html>
Notice that the anchor tag with text bar is missing!
Running this input with html5ever's example sinks, I get output close to what browsers produce (but still not identical, see servo/html5ever#512).
This seems to indicate that there's an issue with scraper's TreeSink implementation.