Loss of style for titles and/or subtitles in .docx output document #10282

Open
sschaenz opened this issue Oct 10, 2024 · 27 comments
@sschaenz

I am using a customised ‘reference.docx’, which I created with the command ‘pandoc -o custom-reference.docx --print-default-data-file reference.docx’ and added individual styles for titles and subtitles. This file worked perfectly in Pandoc version 3.1.12 and formatted the title and subtitle as desired (e.g. frame with background colour and custom text colour).

However, after upgrading to Pandoc 3.5, the problem arises that titles and subtitles lose their formatting and the default formatting is used instead. The content is output correctly, but without the assigned formatting. This suggests that Pandoc may no longer recognise the custom styles or that the assignment of metadata to styles has been changed.

I am using a German version of Microsoft Word, so the style names in my ‘reference.docx’ file may correspond to the localised German names. Since I updated directly from Pandoc 3.1.12 to 3.5, I can't say exactly from which version the problem occurred. However, downgrading to 3.1.12 fixes the problem, so there seems to be a change in the newer Pandoc versions that affects the style assignment for metadata.

Regards
Stefan

@sschaenz sschaenz added the bug label Oct 10, 2024
@jgm
Owner

jgm commented Oct 10, 2024

Have you checked the changelog to see what relevant changes were made between 3.1.12 and 3.5?
https://pandoc.org/releases

@sschaenz
Author

Yes, I have read the release notes and found some possible clues, but they don’t help me resolve the issue as a user.

I have spent several hours researching and testing, but the documentation did not provide any insights to resolve my issue.

My understanding was that Pandoc’s reference file could be used to customize the layout of a Word document as long as only the Pandoc-supported styles were modified or utilized. This approach had always worked fine in the past. I also generated a new reference file using Pandoc 3.5 and ran targeted tests with it. However, when I modify the Title and Subtitle layouts (such as color or font), Word now uses the Standard layout style instead. This should not be happening.

I found the following relevant notes in the release history:

pandoc 3.2.1

“Clean up Abstract Title and Subtitle in default reference docx. Center Subtitle, remove color.”

  • This could be related to my problem, but it doesn’t help me find a solution.

pandoc 3.2

“Use current standard Word theme (#7280). This includes using the sans-serif font Aptos instead of the serif font Cambria, and default colors for headings. Remove duplicate DefaultParagraphFont in styles.xml.”

pandoc 3.1.12.2

Here’s one relevant note:
“Detect caption by style name not id (#9518). The styleId can change depending on the localization.”

I suspect that my issue with Title and Subtitle formatting might be related to a change in how style names are detected in Pandoc. In earlier versions, Pandoc always processed the reference.docx file based on English style names, regardless of the language settings in Microsoft Word.

It would be helpful if Pandoc could provide a way to standardize style name recognition independently of localization. This would prevent issues for users working with different language versions of Word.

@jgm
Owner

jgm commented Oct 10, 2024

What lang are you using? Does specifying lang: en make the problem go away?

@sschaenz
Author

lang: de
lang: en

I had already tried both, but there was no improvement.

Hm, my problem should actually be easy to recreate, right?

@jgm
Owner

jgm commented Oct 10, 2024

Can you post files necessary to reproduce the issue? Your reference.docx and a sample markdown input, plus the command line you used?

@jgm
Owner

jgm commented Oct 10, 2024

Note that in the 3.2 revisions, we added a "Title Char" style (standard for Word).
It may be that you need to adjust this style in your reference doc.

@jgm
Owner

jgm commented Oct 10, 2024

There is also "Subtitle Char". These are character styles.

@sschaenz
Author

Sorry for the delay, I've been very busy.

Here is an example that reproduces the problem for version 3.5 of Pandoc.

  1. Display the Pandoc version
$ pandoc --version
pandoc 3.5
  2. Generate the reference file from Pandoc (reference-3_5.docx)
$ pandoc -o reference-3_5.docx --print-default-data-file reference.docx
  3. Open the reference file in Word and adjust the title and subtitle formats (my-reference-3_5.docx)

Title: I set the font colour to white, the spacing from the top to 80 pt, and coloured a background frame in blue.

Subtitle: Font colour also white, spacing from the top adjusted to 4 pt, and the background coloured in a lighter blue.

  4. Create the Markdown example file (my-markdown.md)
---
title: "Resource Template"
subtitle: "Subtitle Here"
date: "\today"
author: "My Name"
toc: true
toc-depth: 2
lang: en
customer_logo: false
client_name: "Client Name"
security_label: "Confidential"
...

# Introduction

Provide a brief introduction here.

# Section 1: Overview

Include a general overview of the resource.

## Subsection 1.1: Details

Details about the resource, including any relevant information, such as objectives, target audience, or specifications.

# Section 2: Implementation

Instructions or steps for implementing the resource.

## Subsection 2.1: Steps

1. Step one details
2. Step two details
3. Step three details

# Section 3: Additional Information

Include any additional relevant information, like references or contact details.

# Appendix

Include any additional resources or appendices here.
  5. Result (my-word-3_5.docx):

The title and subtitle are displayed in the ‘Normal’ style, as are ‘author’ and ‘date’. The date itself is not displayed because \today is a LaTeX macro and has no effect in Word output; if a literal date were entered, it would appear, but with the wrong format as well. The template also contains a page break, which is missing here too.
my-markdown.md
my-reference-3_5.docx
my-word-3_5.docx
reference-3_5.docx

@sschaenz
Author

Thank you for pointing out the updates in the 3.2 revisions regarding the "Title Char" and "Subtitle Char" styles. However, I’m having trouble with these points because I haven’t been able to find any relevant information in the documentation. I’m not sure how to access or adjust these character styles within the reference doc. Could you provide some guidance on how I can use these features?

@jgm
Owner

jgm commented Oct 13, 2024

OK, this is very strange. Your reference doc has a Title style, but it doesn't get applied.
I will have to look into this.

@jgm
Owner

jgm commented Oct 13, 2024

But when I try the same thing you did -- same method of creating a reference.docx -- it works fine.

@jgm
Owner

jgm commented Oct 13, 2024

OK, the issue is this. Your reference.docx has w:styleId="para6", and this style has <w:name w:val="Title">. The styleId needs to be Title.

When you create your reference.docx, go to the Styles menu, find the already existing Title style, and modify this. I'm not sure what you did differently to create the style you had.
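To see which styleId a reference document actually assigns to the styles named Title and Subtitle, the pairs can be read straight out of word/styles.xml. The following is a minimal sketch, not part of pandoc; the helper names `list_styles` and `mismatched` are made up for illustration, and it only assumes that a .docx is a zip archive containing word/styles.xml.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix in styles.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def list_styles(docx_path):
    """Return (type, styleId, name) for every style in word/styles.xml."""
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/styles.xml"))
    styles = []
    for style in root.iter(f"{{{W_NS}}}style"):
        name_el = style.find(f"{{{W_NS}}}name")
        name = name_el.get(f"{{{W_NS}}}val") if name_el is not None else None
        styles.append(
            (style.get(f"{{{W_NS}}}type"), style.get(f"{{{W_NS}}}styleId"), name)
        )
    return styles


def mismatched(styles, expected=("Title", "Subtitle")):
    """Styles whose display name is expected but whose styleId differs from it."""
    return [s for s in styles if s[2] in expected and s[1] != s[2]]


if __name__ == "__main__":
    # Hypothetical usage: python list_styles.py my-reference-3_5.docx
    for style_type, style_id, name in mismatched(list_styles(sys.argv[1])):
        print(f"{style_type} style named {name!r} has styleId {style_id!r}")
```

A reference document like the one described above would report the Title style with a styleId such as para6 instead of Title.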

@sschaenz
Author

I will test it tomorrow. I still have a computer with an older version of Pandoc. I haven't had the problem on this system so far. I will check if the versions of Word are the same, which they should be. My system and Word versions are German. As far as I can remember, the style sheets that were generated from Pandoc were always in English, which did not cause any problems. In Word they were also displayed with the names in English. Now it looks as if Pandoc or Word translates the name of the template (in my case into German). The fact that I use macOS may also play a role.

There are therefore several factors that can play a role:

  • Pandoc version
  • Microsoft Word version
  • System and/or Word language
  • Operating system

and certainly also the user (in this case me). However, I have tried to rule out errors by creating a new test file including a reference file from scratch.

The style id can probably change depending on the language, which is probably why the names are used. If these are now translated (by whatever means) or adapted to the system language, this would provide an explanation.

@jgm
Owner

jgm commented Oct 13, 2024

If I recall, we use styleId and not the display name, because that is the thing that is constant across differently localized versions of Word.

@sschaenz
Author

pandoc 3.1.12.2

Here’s one relevant note:
“Detect caption by style name not id (#9518). The styleId can change depending on the localization.”

see above.

@jgm
Owner

jgm commented Oct 13, 2024

OK, I got it reversed then. I knew it was one way or the other!

@sschaenz
Author

Here are my tests and the results:

macOS 14.6.1 (Sonoma) Intel Core i7

Microsoft Word for Mac Version 16.89.1 (German)
Licence: Microsoft 365 subscription

pandoc --version
pandoc 3.1.12.2

pandoc -o reference-3_1_12.docx --print-default-data-file reference.docx

The Title style is displayed in Word as ‘Title’ and the Subtitle style as ‘Subtitle’.

Open reference-3_1_12.docx with Word
There are no errors and no hints when opening. Everything is OK.
Customise style sheet and save as my-reference-3_1_12.docx

pandoc my-markdown.md -o my-word-3_1_12.docx --reference-doc=my-reference-3_1_12.docx

Result:

Word file is opened.

1st warning: ‘This document contains fields that may refer to other files. Do you want to update the fields in this document?’ (ok)
2nd warning that the table of contents needs to be updated. (ok)

As Word has not yet been able to set the fields and the table of contents to current, valid values, these prompts are understandable. However, you should always open such a Word file and save it again before sending it to other people, as they often complain that the file is damaged!

The formatting is correct, everything is as it should be.

System on which the problem occurs:

macOS 15.0.1 (Sequoia) M3 Pro (ARM)
(Note: I must not have been paying attention, as I would normally not switch to Sequoia for a few months. A new major macOS version can have all sorts of side effects, so you should wait at least 3 months, until the first updates are available, before upgrading.)

Microsoft Word for Mac Version 16.89.1 (German)
Licence: Microsoft 365 subscription

Open reference-3_5.docx with Word
There are no errors or messages when opening. Everything is OK.
Customize style sheet and save as my-reference-3_5.docx
I have created a version here with compatibility mode activated to rule out any problems.

pandoc my-markdown.md -o my-word-3_5.docx --reference-doc=my-reference-3_5.docx
pandoc my-markdown.md -o my-word-3_5-cmp.docx --reference-doc=my-reference-3_5-cmp.docx

Result

The Word file opens with the same warnings as before. However, the formatting is not correct: the title and subtitle text is correct, but the formatting of the title, subtitle, and date falls back to the default. The author is formatted correctly (and its style has the English name). For ‘Table of Contents’ and the chapter and section headings, the style names are displayed in German, but the formatting is correct.

I actually suspect the issue is with Pandoc. For once, I would like to exclude Word as the source of the issue. Theoretically, the difference in macOS versions could still be a cause. If this is the case, then there is hardly anything you can do, and you are at the mercy of the folks at Apple. The different architecture of the CPU should not play a role.

I have now installed Pandoc 3.1.12 on the system and repeated the tests. There are no problems. The formatting appears to be correct. So it is probably not due to the different macOS versions.

One last test:

Since I had installed Pandoc 3.5 with brew, I did another installation using the binary from https://github.com/jgm/pandoc/releases/tag/3.5. I almost expected that this would solve the problem. Unfortunately not. Something has changed somewhere in Pandoc that is causing the issue. That's the end of my ideas, and I'll save myself the trouble of trying to find out in which version the problem first occurs.

@jgm
Owner

jgm commented Oct 15, 2024

I can't see how the issue would be the OS version.

However, I use pandoc 3.5 on ARM macOS (previous version), and I don't have any difficulties customizing the style. There are two factors that may be different in our cases:

  • your German-localized version of Word (I have English-localized v16.83)
  • how you modify the styles (maybe you are doing something different than I am to change the styles; I simply go to Format -> Styles, select the Title style, modify it there, and save)

One thing that is clearly different is the styleId of the style named Title in your reference docx. This may be relevant, though as you note, we claim to be looking up styles by name.

When I have a chance I can look into this further.

@sschaenz
Author

I don't think it's related to the OS version either; I just wanted to mention all the possibilities that came to mind. I suspect the problem is the names of the styles. Word translates these, for example into German. If I make a change, the style is saved under the translated name, and Pandoc can then no longer find it. This is normally why you use IDs and not names; I don't know whether Microsoft sees it differently.

Anyway, I will try the following in the next few days: I will create a template with the current version and then use a text editor to search for the designations and replace them with the English ones. This should be a workaround. Since templates are not customized constantly, that would be fine for me. I'll let you know when I've tried it.

For a quick test, you can also rename Title in one of your reference files to the German translation, Titel. If the formatting is then lost and the style falls back to Standard, then that is the problem.

@jgm
Owner

jgm commented Oct 16, 2024

Note that in my-reference-3_5.docx, styles.xml has

<w:style w:type="paragraph" w:styleId="para4">
<w:name w:val="Title"/>
<w:qFormat/>
<w:basedOn w:val="para0"/>
<w:next w:val="para1"/>
<w:pPr>
<w:spacing w:before="1600" w:after="80"/>
<w:contextualSpacing/>
<w:jc w:val="center"/>
<w:pBdr>
<w:top w:val="nil" w:sz="0" w:space="3" w:color="000000" tmln="20, 20, 20, 0, 60"/>
<w:left w:val="nil" w:sz="0" w:space="3" w:color="000000" tmln="20, 20, 20, 0, 60"/>
<w:bottom w:val="nil" w:sz="0" w:space="3" w:color="000000" tmln="20, 20, 20, 0, 60"/>
<w:right w:val="nil" w:sz="0" w:space="3" w:color="000000" tmln="20, 20, 20, 0, 60"/>
<w:between w:val="nil" w:sz="0" w:space="0" w:color="000000" tmln="20, 20, 20, 0, 0"/>
</w:pBdr>
<w:shd w:val="solid" w:color="365F91" tmshd="1677721856, 16777215, 9527094"/>
</w:pPr>
<w:rPr>
<w:rFonts w:ascii="Aptos Display" w:hAnsi="Aptos Display" w:eastAsia="Aptos Display" w:cs="Aptos Display"/>
<w:color w:val="ffffff"/>
<w:spacing w:val="-10" w:percent="96"/>
<w:kern w:val="1"/>
<w:sz w:val="56"/>
<w:szCs w:val="56"/>
<w:lang w:bidi="en-us"/>
</w:rPr>
</w:style>

and the name specified here is "Title", not "Titel". So any localization of that name must be happening somewhere outside the stylesheet. Since according to the commit comment you mentioned above, we are looking up styles by name and not styleId, we should be finding this style.

@jgm
Owner

jgm commented Oct 16, 2024

PS. I tried manually changing the styleId from para4 to Title, and then it worked.
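The manual change described here can also be scripted. The sketch below is an illustration under assumptions, not a pandoc feature: the function names `rename_style_id` and `patch_reference_docx` are hypothetical, the rewrite is a plain text substitution on word/styles.xml and word/document.xml, and it assumes no other style already uses the new styleId (otherwise the result would contain a duplicate).

```python
import re
import zipfile


def rename_style_id(xml_text, old_id, new_id):
    """Rename a styleId and the w:val references that point at it."""
    old = re.escape(old_id)

    def swap(match):
        # Keep the surrounding attribute text, replace only the id itself.
        return match.group(1) + new_id + match.group(2)

    # The style definition itself: w:styleId="old_id"
    xml_text = re.sub(rf'(\bw:styleId="){old}(")', swap, xml_text)
    # Elements that refer to a style by its styleId.
    refs = rf'(<w:(?:basedOn|next|link|pStyle)\b[^>]*?\bw:val="){old}(")'
    return re.sub(refs, swap, xml_text)


def patch_reference_docx(src, dst, old_id, new_id):
    """Copy the docx at src to dst, renaming old_id to new_id on the way."""
    targets = ("word/styles.xml", "word/document.xml")
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename in targets:
                text = rename_style_id(data.decode("utf-8"), old_id, new_id)
                data = text.encode("utf-8")
            zout.writestr(item, data)
```

For the file discussed in this thread, a call might look like `patch_reference_docx("my-reference-3_5.docx", "my-reference-fixed.docx", "para4", "Title")`, where the output filename is made up. The `<w:name>` element is deliberately left untouched, since only the styleId was wrong.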

@jgm
Owner

jgm commented Oct 16, 2024

It looks like the linked commit may have been focused on just the table caption; maybe we didn't make a general change to looking up styles by name instead of styleId.

@jgm
Owner

jgm commented Oct 16, 2024

It's quite counterintuitive that Word works this way -- it's the name, not the styleId, that stays constant across localized versions -- but such is MS.

@sschaenz
Author

I have narrowed down the problem to a specific Pandoc version. When using Pandoc 3.2, titles and subtitles in the Word file are still displayed correctly according to my customised template. The changes made are properly applied.

However, a deviation occurs as of version 3.2.1: titles and subtitles only appear according to Word's default settings, regardless of the Pandoc or custom templates. My adjustments to the style sheet are ignored, but the content remains correct. Titles and subtitles are displayed with the correct content, but without the intended formatting.

I tested various Pandoc versions to investigate. The problem first appeared in version 3.2.1, while in version 3.2 the templates worked as expected. The tests were carried out with the binary versions of Pandoc, which I downloaded directly from the GitHub page (https://github.com/jgm/pandoc/releases).

I hope this helps to narrow down the error and find a solution.

I will stick with the older version for the time being until the problem is fixed.

@jgm
Owner

jgm commented Oct 24, 2024

In the 3.2.1 changelog for the docx writer, we have two items that might be relevant:

  • Allow OpenXML templates to be used with docx (#8338, #9069, #7256, #2928). commit db559e1

  • Clean up Abstract Title and Subtitle in default reference docx. Center Subtitle, remove color. commit c26211b
    (This just makes Subtitle depend on Title rather than Normal, so I don't think it's the issue.)

@jgm
Owner

jgm commented Oct 24, 2024

The OpenXML template contains:

+$if(title)$
+    <w:p>
+      <w:pPr>
+        <w:pStyle w:val="Title" />
+      </w:pPr>
+      $title$
+    </w:p>
+$endif$
+$if(subtitle)$
+    <w:p>
+      <w:pPr>
+        <w:pStyle w:val="Subtitle" />
+      </w:pPr>
+      $subtitle$
+    </w:p>
+$endif$

@jgm
Owner

jgm commented Oct 24, 2024

[EDITED] We produce a docx with

<w:pStyle w:val="Title" />

and the way Word deals with this is to look up the style with styleId = "Title". The default pandoc reference.docx has such a style. When you edit it with your localized Word, change the Title style, and save it again, you get (my-reference-3_5.docx):

<w:style w:type="paragraph" w:styleId="Title">
<w:name w:val="Title"/>
<w:qFormat/>
<w:basedOn w:val="para0"/>
<w:next w:val="para1"/>
<w:pPr>
<w:spacing w:before="1600" w:after="80"/>
<w:contextualSpacing/>
<w:jc w:val="center"/>
<w:pBdr>
<w:top w:val="nil" w:sz="0" w:space="3" w:color="000000" tmln="20, 20, 20, 0, 60"/>
etc.

So far so good, although it's odd that Normal seems to have changed to para0.

Then, when you use this reference docx to create a new docx (my-word-3_5.docx), styles.xml contains:

<w:style w:styleId="para4" w:type="paragraph">
<w:name w:val="Title" />
<w:qFormat />
<w:basedOn w:val="para0" />
<w:next w:val="para1" />
<w:pPr>
<w:spacing w:after="80" w:before="1600" />
<w:contextualSpacing />
<w:jc w:val="center" />
<w:pBdr>
<w:top tmln="20, 20, 20, 0, 60" w:color="000000" w:space="3" w:sz="0" w:val="nil" />
etc.

I just don't get this. When I use pandoc to do the same thing you described, using your own my-reference-3_5.docx, I don't get this result. And although your Word may be localized, your pandoc is not. So that is not the issue.

I ought to be able to use pandoc on the same inputs and get the same result as you. This has nothing to do with Word. So, I'm wondering whether we can repeat the entire process carefully.

Take your file linked above, my-reference-3_5.docx, and run this exact command:

echo "% Title" | pandoc --reference-doc my-reference-3_5.docx -o output.docx

And then upload output.docx.
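Before uploading, the generated file can be checked for exactly the mismatch discussed in this thread: a `w:pStyle` value in word/document.xml that has no style with the same styleId in word/styles.xml. This is a minimal sketch with a made-up function name, `unresolved_paragraph_styles`; it relies only on the point made above that Word resolves `w:pStyle` by styleId.

```python
import zipfile
import xml.etree.ElementTree as ET

# Clark-notation prefix for the WordprocessingML (w:) namespace.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def unresolved_paragraph_styles(docx_path):
    """Return pStyle values used in the body that no styleId defines."""
    with zipfile.ZipFile(docx_path) as zf:
        styles = ET.fromstring(zf.read("word/styles.xml"))
        body = ET.fromstring(zf.read("word/document.xml"))
    defined = {s.get(W + "styleId") for s in styles.iter(W + "style")}
    used = {p.get(W + "val") for p in body.iter(W + "pStyle")}
    used.discard(None)  # ignore malformed pStyle elements without w:val
    return sorted(used - defined)
```

Calling `unresolved_paragraph_styles("output.docx")` on a file with the problem would be expected to list Title, and Subtitle if a subtitle is set; an empty list means every paragraph style reference resolves.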
