Skip to content

😼 Dependency-free and lean DOM parser that outputs Markdown

License

Notifications You must be signed in to change notification settings

consected/domador

 
 

Repository files navigation

Domador

Build Status

Dependency-free and lean DOM parser that outputs Markdown

You can use it on the server-side as well, thanks to jsdom. The client-side version leverages the browser DOM. Originally based on html-md.

Install

You can get it on npm.

npm install domador --save

Or bower, too.

bower install domador --save

domador(input, options?)

Converts DOM tree (or HTML string) input into Markdown. domador takes the following options.

absolute

Convert relative links into absolute ones automatically.

href

The document's href, necessary for the absolute option to work properly outside of a browser environment.

inline

Links ([foo](/bar)) and image sources (![foo](/bar)) are inlined. By default, they are added as footnote references [foo][1]\n\n[1]: /bar.

fencing

The western art of combat with rapiers or rapier-like swords. It can also be set to true to use fences like instead of spaces when delimiting code blocks.

fencinglanguage

If fencing is enabled, fencinglanguage can be a function that will run on every <pre> element and returns the appropriate language in the fence. If the <pre> element contains a <code> element as its first child, fencinglanguage will be executed for that element as well in search of a match.

If nothing is returned, a language won't be assigned to the fence. The example below returns fence languages according to a md-lang-{language} class found on the pre element.

function fencinglanguage (el) {
  var match = el.className.match(/md-lang-((?:[^\s]|$)+)/);
  if (match) {
    return match.pop();
  }
}
allowFrame

When set to a function, allowFrame receives the src attribute for an <iframe> and domador expects a boolean in return. If the return value is true then the <iframe> will be added to the Markdown output.

domador(el, {
  allowFrame: function (src) {
    return src.indexOf('https://google.com/') === 0;
  }
});
tables

Domador understands well-formed HTML <table> structures and spits out GitHub flavored Markdown tables. This functionality is enabled by default but you can turn it off by setting tables to false.

transform

Allows you to take over the default transformation for any given DOM element. Ignore elements you don't want to override, and return Markdown for the ones you want to change. This method is executed on every single DOM element that's parsed by domador. The example below converts links that start with @ into mentions like @bevacqua instead of traditional Markdown links like [@bevacqua](/users/bevacqua). This is particularly useful to transform Markdown-generated HTML back into the original Markdown when your Markdown parser has special tokenizers or hooks.

domador(el, {
  transform: function (el) {
    if (el.tagName === 'A' && el.innerHTML[0] === '@') {
      return el.innerHTML;
    }
  }
});
markers

Advanced option. Setting markers to an array such as [[0, 'START'], [10, 'END']] will place each of those markers in the output, based on the input index you want to track. This feature is necessary because there is no other reliable way of tracking a text cursor position before and after a piece of HTML is converted to Markdown.

The following example shows how markers could be used to preserve a text selection across HTML-into-Markdown parsing, by providing markers for each cursor. When the output from domador comes back, all you need to do is find your markers, remove them, and place the text selection at their indices. The woofmark Markdown/HTML/WYSIWYG editor module leverages this functionality to do exactly that.

domador('<strong>foo</strong>', {
  markers: [[5, '[START]'], [10, '[END]']]
});
// <- '**[START]fo[END]o**'

Also note that, as shown in the example above, when a marker can't be placed in the output exactly where you asked for, it'll be cleanly placed nearby. In the above example, the [START] marker would've been placed "somewhere inside" the opening ** tag, but right after the opening tag finishes was preferred.

pad_ol_li and pad_ul_li

Specify the number of spaces to pad nested lists with (default: 2). This allows variants of Markdown with strict requirements (especially on ordered list indentation) to be accomodated.

pad_ol_li overrides the default for ordered lists and pad_ul_li overrides the default for unordered lists.

Tests

Read the unit tests for examples of expected output and their inputs. Run unit tests using the command below.

npm test

Disclaimer

Don't expect this to work for arbitrary HTML, it is intended to restore HTML compiled from a Markdown source back into Markdown.

License

MIT

About

😼 Dependency-free and lean DOM parser that outputs Markdown

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • JavaScript 100.0%