Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Web.Pandoc: refactor reader selection #554

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mmirate
Copy link

@mmirate mmirate commented Jul 10, 2017

With this change, input file formats are no longer restricted to the variants of a union on Hakyll's side (Web.Pandoc.FileType) which must be updated whenever Pandoc adds a new input format (and requires tightening the lower-bound on the dependency version). Instead, the getReader function from Pandoc is now used, in conjunction with a file-extension-synonym mapping similar to the one used by Pandoc's command-line application.

(Of course, someone should probably raise an issue on Pandoc to have them factor-out their filename-to-reader/writer translation into a public API that Hakyll could use; that will make things even simpler on Hakyll's side.) Update: Pandoc has indeed made just such a change, after this PR was first written https://github.com/jgm/pandoc/blob/2.7.3/src/Text/Pandoc/App/FormatHeuristics.hs is close, but not publicly accessible.

With the recent addition of readers of non-textual formats (namely epub, docx and odt), there was a split among Pandoc's readers+writers, between String and Lazy ByteString input; pandocCompiler handles this under-the-hood, and for other usecases there is a new function readPandocLBSWith, whose input must be from Compiler.getResourceLBS instead of Compiler.getResourceBody.

Since the only lossless way to read binary data is to read it immediately to a ByteString, it would have been ideal for readPandoc and readPandocWith to have accepted Item ByteString from the start; changing this now would be even more breaking than removing FileType.

Yes, this change breaks anyone who depended on FileType.

@theNerd247
Copy link

Is there any intention on this being merged? I can contribute if these changes are getting stale.

@jaspervdj
Copy link
Owner

@theNerd247 Yes, I'm still interested in getting this merged. I'd like to keep Hakyll's FileType module there, even though it's no longer being used, with a deprecation warning on the module that it will be removed in a future release. I'm fine with killing the tests for it; but I would like to have at least one test for the ByteString-based pandoc reading.

@mmirate
Copy link
Author

mmirate commented Aug 21, 2019

I'd like to keep Hakyll's FileType module there, even though it's no longer being used, with a deprecation warning on the module that it will be removed in a future release. I'm fine with killing the tests for it; but I would like to have at least one test for the ByteString-based pandoc reading.

This is big news to me. Thank you, regardless; better late than never.

Given that feedback, I am still willing to respond to it and to make this PR applicable to current versions of hakyll and pandoc (atm it is highly stale wrt both sides). Though, I won't have cycles available until the weekend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants