Web.Pandoc: refactor reader selection #554

mmirate · 2017-07-10T23:59:09Z

With this change, input file formats are no longer restricted to the variants of a union on Hakyll's side (Web.Pandoc.FileType), which has to be updated whenever Pandoc adds a new input format (and each such update requires tightening the lower bound on the Pandoc dependency version). Instead, Pandoc's getReader function is now used, in conjunction with a file-extension-synonym mapping similar to the one used by Pandoc's command-line application.

(Of course, someone should probably raise an issue on Pandoc asking them to factor out their filename-to-reader/writer translation into a public API that Hakyll could use; that would make things even simpler on Hakyll's side.) ~~Update: Pandoc has indeed made just such a change, after this PR was first written~~ https://github.com/jgm/pandoc/blob/2.7.3/src/Text/Pandoc/App/FormatHeuristics.hs is close, but not publicly accessible.
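As a rough illustration of the extension-synonym lookup described above, here is a minimal sketch. The names `formatNameFromPath` and `extensionSynonyms` are hypothetical and are not taken from this PR or from Pandoc, and the table is only an illustrative subset of extensions. The resulting format name is what would then be handed to Pandoc's getReader; that call is left out because its signature differs between Pandoc versions.

```haskell
import Data.Char (toLower)
import System.FilePath (takeExtension)

-- | Hypothetical table mapping file extensions to Pandoc format names.
-- Several extensions may be synonyms for the same format.
extensionSynonyms :: [(String, String)]
extensionSynonyms =
  [ (".md",       "markdown")
  , (".markdown", "markdown")
  , (".rst",      "rst")
  , (".tex",      "latex")
  , (".latex",    "latex")
  , (".ltx",      "latex")
  , (".htm",      "html")
  , (".html",     "html")
  , (".xhtml",    "html")
  , (".org",      "org")
  , (".textile",  "textile")
  , (".docx",     "docx")
  , (".epub",     "epub")
  , (".odt",      "odt")
  ]

-- | Hypothetical helper: guess a format name from a file path.
-- The extension is lower-cased first, so "paper.TEX" and "paper.tex"
-- are treated alike. Unknown extensions yield Nothing, leaving the
-- fallback policy to the caller.
formatNameFromPath :: FilePath -> Maybe String
formatNameFromPath path = lookup ext extensionSynonyms
  where
    ext = map toLower (takeExtension path)
```

Because the table holds plain strings rather than constructors of a closed type, supporting a format that a newer Pandoc adds only needs a new table entry (or none at all, if the caller passes the format name through directly).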

With the recent addition of readers for non-textual formats (namely epub, docx and odt), Pandoc's readers and writers are now split between String input and Lazy ByteString input. pandocCompiler handles this under the hood; for other use cases there is a new function, readPandocLBSWith, whose input must come from Compiler.getResourceLBS instead of Compiler.getResourceBody.

Since the only lossless way to read binary data is to read it immediately to a ByteString, it would have been ideal for readPandoc and readPandocWith to have accepted Item ByteString from the start; changing this now would be even more breaking than removing FileType.
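To make the String/ByteString split concrete, here is a minimal sketch of dispatching over the two reader shapes once the input has been read as bytes. `SketchReader`, `FromString`, `FromBytes` and `runSketchReader` are hypothetical stand-ins, not Pandoc's or Hakyll's actual types, and the byte-to-String step is deliberately naive.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC

-- | Hypothetical stand-in for the two kinds of reader: one that
-- consumes text and one that consumes raw bytes (as a docx, epub or
-- odt reader would).
data SketchReader doc
  = FromString (String -> Either String doc)
  | FromBytes (BL.ByteString -> Either String doc)

-- | Run either kind of reader on input that was read as bytes.
-- Starting from bytes is lossless: a textual reader can still decode
-- them, whereas bytes that were already decoded to a String cannot be
-- reliably turned back into the original binary file.
runSketchReader :: SketchReader doc -> BL.ByteString -> Either String doc
runSketchReader (FromBytes f)  bytes = f bytes
runSketchReader (FromString f) bytes = f (BLC.unpack bytes)
  -- Assumption for brevity: BLC.unpack maps each byte to one Char,
  -- which is only correct for ASCII input. Real code would decode
  -- UTF-8 here instead.
```

With this shape, a single entry point that always takes bytes can serve both kinds of reader, which is the design the paragraph above says would have been ideal for readPandoc and readPandocWith.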

Yes, this change breaks anyone who depended on FileType.

theNerd247 · 2019-08-19T16:17:36Z

Is there any intention of getting this merged? I can contribute if these changes are getting stale.

jaspervdj · 2019-08-20T09:03:50Z

@theNerd247 Yes, I'm still interested in getting this merged. I'd like to keep Hakyll's FileType module there, even though it's no longer being used, with a deprecation warning on the module that it will be removed in a future release. I'm fine with killing the tests for it; but I would like to have at least one test for the ByteString-based pandoc reading.

mmirate · 2019-08-21T13:51:35Z

> I'd like to keep Hakyll's FileType module there, even though it's no longer being used, with a deprecation warning on the module that it will be removed in a future release. I'm fine with killing the tests for it; but I would like to have at least one test for the ByteString-based pandoc reading.

This is big news to me. Thank you, regardless; better late than never.

Given that feedback, I am still willing to respond to it and to make this PR applicable to current versions of hakyll and pandoc (atm it is highly stale wrt both sides). Though, I won't have cycles available until the weekend.

Web.Pandoc: refactor reader selection

da4445a

mmirate force-pushed the mmirate-patch-1 branch from 12ad66e to da4445a Compare August 24, 2019 23:05
