In Markdown table cells, apply HTML escaping only to code blocks, and apply it properly (#167)

Since #161 removed HTML escaping for defaults and function docstrings, we should do the same for attribute and param docs in table cells.

The only limitations Markdown places on table cells are:
* no pipe characters (they must be escaped with a backslash)
* no newlines (they must be transformed into `<br>` or an HTML entity)
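
As a rough illustration of these two restrictions (not the code in this commit), text outside code blocks could be made cell-safe as below. The class and method names are hypothetical; the newline regex is the one the pre-existing `markdownCellFormat` used, and CRLF normalization is an added assumption.

```java
final class CellEscapeSketch {
  /** Escapes pipes and flattens newlines so the text fits in a single table cell. */
  static String escapeForCell(String text) {
    return text.trim()
        .replace("\r\n", "\n") // assumption: normalize Windows line endings first
        .replace("|", "\\|") // pipes would otherwise end the cell
        .replaceAll("\n(\\s*\n)+", "<br><br>") // blank line(s) = paragraph break
        .replace('\n', ' '); // a lone newline is just a line wrap
  }
}
```

For example, `escapeForCell("a\nb\n\nc|d")` yields `a b<br><br>c\|d`.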

The latter restriction makes it impossible to have a fenced code block inside a table cell.

Therefore:
* we do not escape HTML or Markdown markup outside a fenced code block
* we keep existing logic for escaping newlines outside a fenced code block
* we fix fence detection (e.g. allowing more than 3 fence characters to support code blocks embedded in code blocks, allowing tildes as fence characters, properly handling language names, etc.)
* in code block content, we escape HTML, and we escape newlines as HTML entities (since `<br>` does not work in a `<pre><code>` block) - finally fixing code block newlines in table cells.
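
A minimal, standalone sketch of the fence handling described above, assuming the block starts at the first line of the input. `FencedBlockSketch` and `renderBlock` are hypothetical names, and `Pattern.quote` is an extra precaution; the actual change lives in `MarkdownCellFormatter` in the diff below.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FencedBlockSketch {
  // Opening fence: up to 3 spaces of indent, 3+ backticks or tildes, optional language.
  private static final Pattern OPENING_FENCE =
      Pattern.compile("^ {0,3}(?<fence>```+|~~~+) *(?<lang>\\w*)[^`~]*$");

  /** Renders a fenced block starting at lines.get(0) as HTML, or returns null if there is none. */
  static String renderBlock(List<String> lines) {
    if (lines.isEmpty()) {
      return null;
    }
    Matcher opening = OPENING_FENCE.matcher(lines.get(0));
    if (!opening.matches()) {
      return null;
    }
    // The closing fence must repeat the opening fence exactly.
    Pattern closing = Pattern.compile("^ {0,3}" + Pattern.quote(opening.group("fence")) + " *$");
    for (int end = 1; end < lines.size(); end++) {
      if (!closing.matcher(lines.get(end)).matches()) {
        continue;
      }
      String lang = opening.group("lang");
      StringBuilder html =
          new StringBuilder(
              lang.isEmpty() ? "<pre><code>" : "<pre><code class=\"language-" + lang + "\">");
      for (int i = 1; i < end; i++) {
        if (i > 1) {
          html.append("&#10;"); // <br> does not work inside <pre><code>
        }
        html.append(lines.get(i).replace("<", "&lt;").replace(">", "&gt;"));
      }
      return html.append("</code></pre>").toString();
    }
    return null; // no closing fence: treat the lines as ordinary text
  }
}
```

With a four-backtick outer fence, an inner three-backtick pair is kept as content, with its newline encoded as `&#10;`.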
    
This is a followup to #161.

Partially addresses #118
tetromino authored Aug 1, 2023
1 parent b124535 commit 4736754
Showing 12 changed files with 193 additions and 70 deletions.
10 changes: 5 additions & 5 deletions docs/stardoc_rule.md
@@ -25,17 +25,17 @@ Generates documentation for exported starlark rule definitions in a target starl
| <a id="stardoc-deps"></a>deps | A list of bzl_library dependencies which the input depends on. | `[]` |
| <a id="stardoc-format"></a>format | The format of the output file. Valid values: 'markdown' or 'proto'. | `"markdown"` |
| <a id="stardoc-symbol_names"></a>symbol_names | A list of symbol names to generate documentation for. These should correspond to the names of rule definitions in the input file. If this list is empty, then documentation for all exported rule definitions will be generated. | `[]` |
| <a id="stardoc-semantic_flags"></a>semantic_flags | A list of canonical flags to affect Starlark semantics for the Starlark interpreter during documentation generation. This should only be used to maintain compatibility with non-default semantic flags required to use the given Starlark symbols.<br><br>For example, if <code>//foo:bar.bzl</code> does not build except when a user would specify <code>--incompatible_foo_semantic=false</code>, then this attribute should contain "--incompatible_foo_semantic=false". | `[]` |
| <a id="stardoc-stardoc"></a>stardoc | The location of the legacy Stardoc extractor. Ignored when using the native <code>starlark_doc_extract</code> rule. | `Label("//stardoc:prebuilt_stardoc_binary")` |
| <a id="stardoc-semantic_flags"></a>semantic_flags | A list of canonical flags to affect Starlark semantics for the Starlark interpreter during documentation generation. This should only be used to maintain compatibility with non-default semantic flags required to use the given Starlark symbols.<br><br>For example, if `//foo:bar.bzl` does not build except when a user would specify `--incompatible_foo_semantic=false`, then this attribute should contain "--incompatible_foo_semantic=false". | `[]` |
| <a id="stardoc-stardoc"></a>stardoc | The location of the legacy Stardoc extractor. Ignored when using the native `starlark_doc_extract` rule. | `Label("//stardoc:prebuilt_stardoc_binary")` |
| <a id="stardoc-renderer"></a>renderer | The location of the renderer tool. | `Label("//stardoc:renderer")` |
| <a id="stardoc-aspect_template"></a>aspect_template | The input file template for generating documentation of aspects | `Label("//stardoc:templates/markdown_tables/aspect.vm")` |
| <a id="stardoc-func_template"></a>func_template | The input file template for generating documentation of functions. | `Label("//stardoc:templates/markdown_tables/func.vm")` |
| <a id="stardoc-header_template"></a>header_template | The input file template for the header of the output documentation. | `Label("//stardoc:templates/markdown_tables/header.vm")` |
| <a id="stardoc-provider_template"></a>provider_template | The input file template for generating documentation of providers. | `Label("//stardoc:templates/markdown_tables/provider.vm")` |
| <a id="stardoc-rule_template"></a>rule_template | The input file template for generating documentation of rules. | `Label("//stardoc:templates/markdown_tables/rule.vm")` |
| <a id="stardoc-repository_rule_template"></a>repository_rule_template | The input file template for generating documentation of repository rules. This template is used only when using the native <code>starlark_doc_extract</code> rule. | `Label("//stardoc:templates/markdown_tables/repository_rule.vm")` |
| <a id="stardoc-module_extension_template"></a>module_extension_template | The input file template for generating documentation of module extensions. This template is used only when using the native <code>starlark_doc_extract</code> rule. | `Label("//stardoc:templates/markdown_tables/module_extension.vm")` |
| <a id="stardoc-use_starlark_doc_extract"></a>use_starlark_doc_extract | Use the native <code>starlark_doc_extract</code> rule if available. | `True` |
| <a id="stardoc-repository_rule_template"></a>repository_rule_template | The input file template for generating documentation of repository rules. This template is used only when using the native `starlark_doc_extract` rule. | `Label("//stardoc:templates/markdown_tables/repository_rule.vm")` |
| <a id="stardoc-module_extension_template"></a>module_extension_template | The input file template for generating documentation of module extensions. This template is used only when using the native `starlark_doc_extract` rule. | `Label("//stardoc:templates/markdown_tables/module_extension.vm")` |
| <a id="stardoc-use_starlark_doc_extract"></a>use_starlark_doc_extract | Use the native `starlark_doc_extract` rule if available. | `True` |
| <a id="stardoc-kwargs"></a>kwargs | Further arguments to pass to stardoc. | none |


@@ -24,15 +24,16 @@
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.AttributeInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.AttributeType;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.FunctionParamInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.ModuleExtensionInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.ModuleExtensionTagClassInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.ProviderInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.ProviderNameGroup;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.RuleInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.RepositoryRuleInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.ModuleExtensionInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.ModuleExtensionTagClassInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.RuleInfo;
import com.google.devtools.build.skydoc.rendering.proto.StardocOutputProtos.StarlarkFunctionInfo;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Contains a number of utility methods for markdown rendering. */
@@ -46,55 +47,135 @@ public MarkdownUtil(String extensionBzlFile) {
}

/**
* Return a string that formats the input string so it is displayable in a markdown table cell.
* This performs the following operations:
* Formats the input string so that it is displayable in a Markdown table cell. This performs the
* following operations:
*
* <ul>
* <li>Trims the string of leading/trailing whitespace.
* <li>Transforms the string using {@link #htmlEscape}.
* <li>Transforms multline code (```) tags into preformatted code HTML tags.
* <li>Transforms single-tick code (`) tags into code HTML tags.
* <li>Transforms 'new paraphgraph' patterns (two or more sequential newline characters) into
* line break HTML tags.
* <li>Turns lingering new line tags into spaces (as they generally indicate intended line wrap.
* <li>Escapes pipe characters ({@code |}) as {@code \|}.
* <li>Transforms Markdown code blocks ({@code ```}) into HTML preformatted code blocks, and
* transforms newlines within those code blocks into character entities
* <li>Transforms remaining 'new paragraph' patterns (two or more sequential newline characters)
* into line break HTML tags.
* <li>Turns remaining newlines into spaces (as they generally indicate intended line wrap).
* </ul>
*
* TODO(https://github.com/bazelbuild/stardoc/issues/118): also format Markdown lists as HTML.
*/
public String markdownCellFormat(String docString) {
String resultString = htmlEscape(docString.trim());
public static String markdownCellFormat(String docString) {
return new MarkdownCellFormatter(docString).format();
}

resultString = replaceWithTag(resultString, "```", "<pre><code>", "</code></pre>");
resultString = replaceWithTag(resultString, "`", "<code>", "</code>");
// See https://github.github.com/gfm
private static final class MarkdownCellFormatter {
// Lines of the input docstring, without newline terminators.
private final ImmutableList<String> lines;
// Index of the current line in lines, 0-based.
int currentLine;
// Formatted result.
StringBuilder result;

return resultString.replaceAll("\n(\\s*\n)+", "<br><br>").replace('\n', ' ');
}
private static final Pattern CODE_BLOCK_OPENING_FENCE =
Pattern.compile("^ {0,3}(?<fence>```+|~~~+) *(?<lang>\\w*)[^`~]*$");

private static String replaceWithTag(
String wholeString, String stringToReplace, String openTag, String closeTag) {
String remainingString = wholeString;
StringBuilder resultString = new StringBuilder();
MarkdownCellFormatter(String docString) {
lines = docString.trim().replace("|", "\\|").lines().collect(toImmutableList());
currentLine = 0;
result = new StringBuilder();
}

boolean openTagNext = true;
int index = remainingString.indexOf(stringToReplace);
while (index > -1) {
resultString.append(remainingString, 0, index);
resultString.append(openTagNext ? openTag : closeTag);
openTagNext = !openTagNext;
remainingString = remainingString.substring(index + stringToReplace.length());
index = remainingString.indexOf(stringToReplace);
/** Consumes the input and yields the formatted result. */
String format() {
boolean prefixContentWithSpace = false;
for (; currentLine < lines.size(); currentLine++) {
if (formatParagraphBreak()) {
prefixContentWithSpace = false;
continue;
}
if (prefixContentWithSpace) {
result.append(" ");
}
prefixContentWithSpace = true;
if (formatFencedCodeBlock()) {
continue;
}
result.append(lines.get(currentLine));
}
return result.toString();
}

/**
* If a fenced code block begins at {@link #currentLine}, render to {@link #result}, update
* {@link #currentLine} to point to the closing fence, and return true.
*/
private boolean formatFencedCodeBlock() {
// See https://github.github.com/gfm/#fenced-code-blocks
Matcher opening = CODE_BLOCK_OPENING_FENCE.matcher(lines.get(currentLine));
if (!opening.matches()) {
return false;
}
Pattern closingFence = Pattern.compile("^ {0,3}" + opening.group("fence") + " *$");
for (int closingLine = currentLine + 1; closingLine < lines.size(); closingLine++) {
if (closingFence.matcher(lines.get(closingLine)).matches()) {
// We found the closing fence: format the block's contents as HTML.
String language = opening.group("lang");
if (language != null && !language.isEmpty()) {
result.append("<pre><code class=\"language-").append(language).append("\">");
} else {
result.append("<pre><code>");
}
int firstContentLine = currentLine + 1;
for (int i = firstContentLine; i < closingLine; i++) {
if (i > firstContentLine) {
result.append(newlineEscape("\n"));
}
result.append(htmlEscape(lines.get(i)));
}
result.append("</code></pre>");
currentLine = closingLine;
return true;
}
}
// We did not find the closing fence.
return false;
}

/**
* If blank lines appear at {@link #currentLine}, render to {@link #result}, update {@link
* #currentLine} to point to the last line of the break, and return true.
*/
private boolean formatParagraphBreak() {
int numEmptyLines = 0;
for (int i = currentLine; i < lines.size(); i++) {
if (lines.get(i).isEmpty()) {
numEmptyLines++;
} else {
break;
}
}
if (numEmptyLines > 0) {
result.append("<br><br>");
currentLine += numEmptyLines - 1;
return true;
}
return false;
}
resultString.append(remainingString);
return resultString.toString();
}

/**
* Return a string that escapes angle brackets for HTML.
*
* <p>For example: 'Information with <brackets>.' becomes 'Information with &lt;brackets&gt;'.
*/
public String htmlEscape(String docString) {
public static String htmlEscape(String docString) {
return docString.replace("<", "&lt;").replace(">", "&gt;");
}

/** Returns a string that escapes newlines with HTML entities. */
private static String newlineEscape(String docString) {
return docString.replace("\n", "&#10;");
}

private static final Pattern CONSECUTIVE_BACKTICKS = Pattern.compile("`+");

/**
@@ -164,23 +245,25 @@ public String aspectSummary(String aspectName, AspectInfo aspectInfo) {
}

/**
* Return a string representing the repository rule summary for the given repository rule with the given name.
* Return a string representing the repository rule summary for the given repository rule with the
* given name.
*
* <p>For example: 'my_repo_rule(foo, bar)'. The summary will contain hyperlinks for each attribute.
* <p>For example: 'my_repo_rule(foo, bar)'. The summary will contain hyperlinks for each
* attribute.
*/
@SuppressWarnings("unused") // Used by markdown template.
public String repositoryRuleSummary(String ruleName, RepositoryRuleInfo ruleInfo) {
ImmutableList<String> attributeNames =
ruleInfo.getAttributeList().stream()
.map(AttributeInfo::getName)
.collect(toImmutableList());
ruleInfo.getAttributeList().stream().map(AttributeInfo::getName).collect(toImmutableList());
return summary(ruleName, attributeNames);
}

/**
* Return a string representing the module extension summary for the given module extension with the given name.
* Return a string representing the module extension summary for the given module extension with
* the given name.
*
* <p>For example:
*
* <pre>
* my_ext = use_extension("//some:file.bzl", "my_ext")
* my_ext.tag1(foo, bar)
@@ -192,13 +275,19 @@ public String repositoryRuleSummary(String ruleName, RepositoryRuleInfo ruleInfo
@SuppressWarnings("unused") // Used by markdown template.
public String moduleExtensionSummary(String extensionName, ModuleExtensionInfo extensionInfo) {
StringBuilder summaryBuilder = new StringBuilder();
summaryBuilder.append(String.format("%s = use_extension(\"%s\", \"%s\")", extensionName, extensionBzlFile, extensionName));
summaryBuilder.append(
String.format(
"%s = use_extension(\"%s\", \"%s\")", extensionName, extensionBzlFile, extensionName));
for (ModuleExtensionTagClassInfo tagClass : extensionInfo.getTagClassList()) {
ImmutableList<String> attributeNames =
tagClass.getAttributeList().stream()
.map(AttributeInfo::getName)
.collect(toImmutableList());
summaryBuilder.append("\n").append(summary(String.format("%s.%s", extensionName, tagClass.getTagName()), attributeNames));
tagClass.getAttributeList().stream()
.map(AttributeInfo::getName)
.collect(toImmutableList());
summaryBuilder
.append("\n")
.append(
summary(
String.format("%s.%s", extensionName, tagClass.getTagName()), attributeNames));
}
return summaryBuilder.toString();
}
@@ -47,4 +47,38 @@ public void markdownCodeSpan_backticksPadding() {
assertThat(MarkdownUtil.markdownCodeSpan("foo`")).isEqualTo("`` foo` ``");
assertThat(MarkdownUtil.markdownCodeSpan("foo``")).isEqualTo("``` foo`` ```");
}

@Test
public void markdownCellFormat_pipes() {
assertThat(MarkdownUtil.markdownCellFormat("foo|bar")).isEqualTo("foo\\|bar");
assertThat(MarkdownUtil.markdownCellFormat("|\\|foobar||")).isEqualTo("\\|\\\\|foobar\\|\\|");
}

@Test
public void markdownCellFormat_newlines() {
assertThat(MarkdownUtil.markdownCellFormat("\nfoo\nbar\n\nbaz\r\n\r\n\r\nqux\r\n"))
.isEqualTo("foo bar<br><br>baz<br><br>qux");
// Newline escapes are not expanded
assertThat(MarkdownUtil.markdownCellFormat("hello\\r\\nworld")).isEqualTo("hello\\r\\nworld");
}

@Test
public void markdownCellFormat_codeBlocks() {
assertThat(MarkdownUtil.markdownCellFormat("```\nhello();\n```"))
.isEqualTo("<pre><code>hello();</code></pre>");
assertThat(MarkdownUtil.markdownCellFormat("```\nhello();\n```\nor\n~~~\nbye();\n~~~"))
.isEqualTo("<pre><code>hello();</code></pre> or <pre><code>bye();</code></pre>");
assertThat(MarkdownUtil.markdownCellFormat("```bash\ncat foo.txt | cmd > /dev/null\n```"))
.isEqualTo(
"<pre><code class=\"language-bash\">cat foo.txt \\| cmd &gt; /dev/null</code></pre>");
assertThat(MarkdownUtil.markdownCellFormat("````\n```\n```\n````"))
.isEqualTo("<pre><code>```&#10;```</code></pre>");
}

@Test
public void markdownCellFormat_inlineMarkup() {
assertThat(MarkdownUtil.markdownCellFormat("<b>bold</b> <i>italic</i>"))
.isEqualTo("<b>bold</b> <i>italic</i>");
assertThat(MarkdownUtil.markdownCellFormat("**bold** _italic_")).isEqualTo("**bold** _italic_");
}
}
2 changes: 1 addition & 1 deletion test/bzlmod/docs.md.golden
@@ -17,6 +17,6 @@ Emits the constraints of the host platform to a file.

| Name | Description | Default Value |
| :------------- | :------------- | :------------- |
| <a id="write_host_constraints-name"></a>name | The name of the target. The output file will be named <code>&lt;name&gt;.txt</code>. | none |
| <a id="write_host_constraints-name"></a>name | The name of the target. The output file will be named `<name>.txt`. | none |

