Best practice for an md -> docx -> md round trip that makes collaborators' revisions easy to diff? #9780
-
Working at a publishing house that uses Markdown as its canonical source format, I've run into this problem a lot. It is so much easier to work with the folks who are willing to use a Markdown editor, any Markdown editor. But for the cases when life just doesn't work out that way and people can't be coerced, the only approach I know of is to first round-trip the Markdown back into Markdown, then take that back out to docx (or whatever). When you get the docx back, you can at least compare its Markdown import with the already round-tripped Markdown. If you use the same formatting arguments (such as wrapping and heading style) for the Markdown→Markdown round trip as you do for the docx→Markdown import, you'll get something pretty similar, which at least gives you something sensible to diff for changes.

Re-applying your preferred source formatting, such as sentence-per-line, is something I don't have a magic bullet for. I'm working in CaSILE on diffing tools for prose that work across different source formatting, and on re-applying formatting such as sentence-per-line. Some parts of that system work quite well for production work, but those two aspects are still pretty rough.
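A minimal sketch of that workflow as shell functions, assuming `pandoc` is on the `PATH`. The function names and file names are hypothetical, and `--wrap=none` is only an example of the shared options: whatever you pick (for instance a heading style via `--markdown-headings`), pass it identically to both conversions that produce Markdown, so both sides of the diff come out of the same Markdown writer with the same settings.

```sh
# Hypothetical helpers for the "round-trip first, then diff" workflow.
# Assumes pandoc is installed and on PATH.

# Options shared by every conversion TO Markdown. Keeping them identical on
# both sides means the diff shows content changes, not formatting differences.
MD_OPTS="--wrap=none"

# Canonical Markdown -> pandoc-normalized Markdown.
md_roundtrip() {
  # $MD_OPTS is left unquoted on purpose so it splits into separate options.
  pandoc "$1" -f markdown -t markdown $MD_OPTS -s -o "$2"
}

# Normalized Markdown -> docx to send to collaborators.
md_to_docx() {
  pandoc "$1" -f markdown -t docx -o "$2"
}

# Returned docx -> Markdown, using the same writer options as md_roundtrip.
docx_to_md() {
  pandoc "$1" -f docx -t markdown $MD_OPTS -s -o "$2"
}
```

Usage, with hypothetical file names:

```sh
md_roundtrip manuscript.md manuscript_rt.md
md_to_docx manuscript_rt.md manuscript.docx
# ... later, when the edited copy comes back as returned.docx:
docx_to_md returned.docx returned.md
diff -u manuscript_rt.md returned.md
```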
-
Thanks to the helpful suggestion of @alerque, the following steps convert the edited docx into sentence-per-line Markdown. The result seems sensible enough to diff for changes, especially if the original Markdown (or the not-yet-revised docx) is also normalized the same way once for reference.

Step 1: put each paragraph on a single line (`--wrap=none`, from https://stackoverflow.com/questions/62967265/word-to-markdown-via-pandoc-prevent-line-breaks-in-paragraphs):

```sh
./pandoc --wrap=none --extract-media ./ draft.docx -s -o draft.md
```

Step 2: break the lines after each sentence with a Lua filter and keep those breaks (`--wrap=preserve`, from https://tarleb.com/posts/semantic-line-breaks/):

```sh
./pandoc draft.md -L break_lines.lua --wrap=preserve -s -o draft_lines.md
```
where `break_lines.lua` is:

```lua
-- break_lines.lua: put each sentence on its own line.
-- Heuristic: a Space that follows a word ending in "." is treated as a
-- sentence boundary (so some abbreviations may also cause a break).
function Inlines (inlines)
  -- Walk from end to start; replacing elements in place keeps indices stable.
  for i = #inlines, 2, -1 do
    local prev = inlines[i - 1]
    if inlines[i].t == 'Space' and prev.t == 'Str' and prev.text:match('%.$') then
      -- Turn the inter-sentence space into a soft break; with
      -- --wrap=preserve the Markdown writer emits it as a newline.
      inlines[i] = pandoc.SoftBreak()
    end
  end
  return inlines
end
```
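To compare like with like, the same two steps can be applied to the original Markdown as well as to the returned docx. A minimal sketch, assuming `pandoc` is on the `PATH` and `break_lines.lua` is in the current directory; the function name and file names are hypothetical, and `--extract-media` is left out for brevity.

```sh
# Hypothetical helper: normalize a .md or .docx file to sentence-per-line Markdown.
# Assumes pandoc on PATH and break_lines.lua in the current directory.
to_sentence_lines() {
  tsl_tmp=$(mktemp)
  # Step 1 flattens each paragraph onto one line (input format inferred from
  # the file extension); step 2 splits after sentence-ending periods and keeps
  # those breaks. Step 2 only runs if step 1 succeeded.
  pandoc "$1" -t markdown --wrap=none -s -o "$tsl_tmp" &&
    pandoc "$tsl_tmp" -f markdown -t markdown -L break_lines.lua \
      --wrap=preserve -s -o "$2"
  tsl_status=$?
  rm -f "$tsl_tmp"
  return "$tsl_status"
}
```

Usage, with hypothetical file names:

```sh
to_sentence_lines manuscript.md original_lines.md
to_sentence_lines draft.docx draft_lines.md
diff -u original_lines.md draft_lines.md
```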
-
I wonder if we should add an option
-
I write in Markdown and send pandoc-generated docx files to my collaborators. They revise the docx and send it back to me. Ideally, I would like to identify their revisions easily and precisely, merge them into the original Markdown, and commit the result to git.
My problem with this md -> docx -> md round trip is that it is hard to identify the revisions easily and precisely: (1) in the original Markdown I write one sentence per line, but the docx -> md conversion hard-wraps lines; (2) the docx -> md conversion hard-codes the numbers of figure references and citations.
Could you suggest a best practice for an md -> docx -> md round trip that makes collaborators' revisions easy to diff? Many thanks!
PS: A similar Stack Overflow post mentions the `--track-changes` flag.
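On the `--track-changes` idea: for docx input, pandoc's `--track-changes=all` keeps Word's tracked insertions and deletions (and comments) in the Markdown as spans with classes such as `insertion` and `deletion`, instead of silently accepting them (the default is `accept`). A minimal sketch, assuming `pandoc` is on the `PATH`; the helper names are hypothetical, and the `grep` pattern is a rough heuristic that only reports which lines contain a change.

```sh
# Hypothetical helper: import a returned docx, keeping tracked changes visible.
# With --track-changes=all, insertions/deletions become spans such as
# [new text]{.insertion author="..." date="..."} in the Markdown output.
docx_with_changes() {
  pandoc "$1" -f docx --track-changes=all -t markdown --wrap=none -o "$2"
}

# Hypothetical helper: print "line:text" for each line containing a tracked
# insertion or deletion span (a grep heuristic, not a real parser).
list_changed_lines() {
  grep -n -E '\{\.(insertion|deletion)' "$1" || true
}
```

For example, `docx_with_changes returned.docx returned_changes.md` followed by `list_changed_lines returned_changes.md` gives a quick index of the paragraphs a collaborator touched, which can then be merged back into the sentence-per-line source by hand.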