Web.Pandoc: refactor reader selection #554
Open
With this change, input file formats are no longer restricted to the variants of a union on Hakyll's side (`Web.Pandoc.FileType`), which must be updated whenever Pandoc adds a new input format (and requires tightening the lower bound on the dependency version). Instead, the `getReader` function from Pandoc is now used, in conjunction with a file-extension-synonym mapping similar to the one used by Pandoc's command-line application.

(Of course, someone should probably raise an issue on Pandoc to have them factor out their filename-to-reader/writer translation into a public API that Hakyll could use; that will make things even simpler on Hakyll's side.)
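As a rough illustration of the extension-synonym idea, here is a minimal sketch using only `base`, `containers` and `filepath`. The names `extensionSynonyms` and `readerNameFor` and the table entries are hypothetical and are not taken from this PR or from Pandoc; the resulting string is the kind of format name one would then hand to Pandoc's `getReader`, whose exact signature depends on the Pandoc version.

```haskell
import           Data.Char       (toLower)
import qualified Data.Map.Strict as M
import           System.FilePath (takeExtension)

-- Hypothetical synonym table: file extension (with leading dot) to a
-- Pandoc reader name. Only extensions whose name differs from the
-- reader name need an entry here.
extensionSynonyms :: M.Map String String
extensionSynonyms = M.fromList
  [ (".md",       "markdown")
  , (".markdown", "markdown")
  , (".text",     "markdown")
  , (".htm",      "html")
  , (".tex",      "latex")
  , (".ltx",      "latex")
  ]

-- Guess a reader name from a file path. Extensions without a synonym
-- are passed through unchanged (minus the dot), so a format that Pandoc
-- adds later needs no change on this side.
readerNameFor :: FilePath -> Maybe String
readerNameFor path =
  case map toLower (takeExtension path) of
    ""           -> Nothing   -- no extension at all
    "."          -> Nothing   -- trailing dot, nothing after it
    ext@(_:name) -> Just (M.findWithDefault name ext extensionSynonyms)
```

The pass-through default is the point of the design: the table only has to cover extensions that differ from Pandoc's own format names, and whether a name is actually supported is left for the reader lookup to decide.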
Update: Pandoc has indeed made just such a change, after this PR was first written. https://github.com/jgm/pandoc/blob/2.7.3/src/Text/Pandoc/App/FormatHeuristics.hs is close, but not publicly accessible.

With the recent addition of readers for non-textual formats (namely epub, docx and odt), there was a split among Pandoc's readers and writers between String and lazy ByteString input; `pandocCompiler` handles this under the hood, and for other use cases there is a new function, `readPandocLBSWith`, whose input must come from `Compiler.getResourceLBS` instead of `Compiler.getResourceBody`.

Since the only lossless way to read binary data is to read it immediately into a ByteString, it would have been ideal for `readPandoc` and `readPandocWith` to have accepted `Item ByteString` from the start; changing this now would be even more breaking than removing `FileType`.

Yes, this change breaks anyone who depended on `FileType`.
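To make the String versus lazy ByteString split concrete, here is a small self-contained model of it. `SketchReader` and `applyReader` are hypothetical stand-ins, not Pandoc's or Hakyll's actual types: the real readers also take reader options and run in a monad, and a real implementation would decode UTF-8 rather than use the naive `Char8` unpacking shown here.

```haskell
import qualified Data.ByteString.Lazy       as BL
import qualified Data.ByteString.Lazy.Char8 as BLC

-- Hypothetical model of the two kinds of reader: textual formats
-- consume a String, binary formats (epub, docx, odt) consume raw bytes.
data SketchReader doc
  = StringReader     (String -> doc)
  | ByteStringReader (BL.ByteString -> doc)

-- Always start from the raw bytes. They can be decoded to text when a
-- textual reader needs it, whereas bytes that were already decoded as
-- text cannot be recovered reliably for a binary reader.
applyReader :: SketchReader doc -> BL.ByteString -> doc
applyReader (StringReader f)     raw = f (BLC.unpack raw)  -- naive byte-per-char decoding (assumption)
applyReader (ByteStringReader f) raw = f raw
```

This is why a ByteString-based input would have been the more general interface: one function taking bytes can serve both kinds of reader, while one taking text can only serve the first.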