diff --git a/_posts/2024-10-05-max-data-uri-size.adoc b/_posts/2024-10-05-max-data-uri-size.adoc index 368a53d5..b34cdf08 100644 --- a/_posts/2024-10-05-max-data-uri-size.adoc +++ b/_posts/2024-10-05-max-data-uri-size.adoc @@ -11,7 +11,7 @@ authors: - https://github.com/opoudjis excerpt: >- - This post describes the how Metanorma leverages Data URIs for media files and + This post describes how Metanorma leverages Data URIs for media files and document attachments to create a single, unified XML document for seamless distribution, and when it is necessary to disable Data URI encoding in cases. --- diff --git a/_posts/2024-10-26-i18n-cjk.adoc b/_posts/2024-10-26-i18n-cjk.adoc new file mode 100644 index 00000000..80ea9e9d --- /dev/null +++ b/_posts/2024-10-26-i18n-cjk.adoc @@ -0,0 +1,212 @@ +--- +layout: post +title: "Support for Japanese internationalisation" +date: 2024-10-26 +categories: documentation + +authors: + - name: Nick Nicholas + email: nick.nicholas@ribose.com + social_links: + - https://github.com/opoudjis + +excerpt: >- + This post describes how Metanorma internationalises content, specifically + for Japanese, in light of Metanorma supporting JIS and Plateau as new flavours. +--- + +== Introduction + +Metanorma supports a number of flavours of standardisation documents, many of +which use languages other than English. As a result, internationalisation of content +is a core concern of Metanorma -- particularly with automatically generated content, +such as captions, cross-references, and autonumbering. + +Scripts other than Latin pose their own challenges in internationalisation, including +RTL (right-to-left) scripts like Arabic and Hebrew, and the ideographic CJK (Chinese, Japanese, Korean) +scripts. Our +recent move to support documents in Japanese has led to a good deal of effort in CJK scripts +specifically. We have already written here on the work we have done with +link:/blog/2023-12-19/ruby-in-metanorma/[Ruby annotation].
We summarise here some of our recent work. + +== JIS and Plateau + +Metanorma has done some work in support of Guóbiāo (Chinese national) standards in the past, +and it supports Chinese as one of the six working languages of the ITU, alongside Arabic, +English, Spanish, French, and Russian. However, the most extensive work we have done on +internationalisation has been with Japanese, prompted by our expansion of Metanorma to two +flavours primarily using Japanese: + +* link:/author/jis/[`metanorma-jis`] supports Japanese Industrial Standards (JIS), published +by the Japanese Standards Association (JSA). As a national standards body, the JSA coordinates +with ISO and IEC, and its format is closely aligned with ISO's. Its documents are published in both +Japanese and English. + +* link:https://github.com/metanorma/metanorma-plateau[`metanorma-plateau`] supports the +https://www.mlit.go.jp/plateau/[Plateau] project of the Japanese Ministry of Land, Infrastructure, Transport and Tourism. +The flavour is derived from `metanorma-jis`, but overrides its formatting in several +instances. + +== Vertical printing + +The default for Japanese standardisation documents follows the Western convention of writing text +left-to-right, top-down; this is particularly preferred because standardisation documents typically +include mathematical formulas and Western-language text. However, the +https://en.wikipedia.org/wiki/Horizontal_and_vertical_writing_in_East_Asian_scripts[traditional Japanese practice of writing Japanese top-to-bottom, right-to-left] +remains common, particularly in legal text. Metanorma is currently implementing vertical +writing for CJK in the PDF output of JIS documents, as a rendering option. + +== Japanese calendar + +Japan uses the Western Gregorian calendar alongside the traditional https://en.wikipedia.org/wiki/Japanese_calendar[Japanese calendar], +which uses regnal years rather than Anno Domini dating.
In official contexts, the Japanese calendar +is used, including to indicate when documents were created and published. + +Metanorma uses ISO 8601, which is founded on the Gregorian calendar, to enter its dates as metadata; so the date +a document was published will be indicated as something like `created-date: 2020-10-11`. When the date +is displayed in the document frontispiece, it is rendered in the Japanese calendar, as 令和二年10月11日 +[Year 2 of the Reiwa era -- the reign of Emperor Naruhito; month 10, day 11]. + +== Japanese numbering + +As with vertical printing, Japanese standardisation documents typically use Arabic numerals +for automated numbering (clause numbers, ordered list numbers) and in metadata (edition numbers, +dates of publication). However, more conservatively formatted documents, such as legal documents, +which tend to use vertical writing, also tend to use Japanese numbering (properly speaking, Chinese +numbering) in those contexts. Metanorma has recently added functionality in JIS and Plateau +to use Japanese numbering instead of Arabic numbering in those contexts. So by default, a Japanese +document equivalent to + +____ +Published: 2020-10-11 + +*1. Introduction.* + +*1.1. Scope.* + +The following topics are in scope of this document: + +1. Japanese numbers. +2. Arabic numbers. +3. Conversion between Japanese and Arabic numbers. +____ + +would be: + +____ + +公開日: 令和二年10月11日 + +*1 はじめに。* + +*1.1 範囲。* + +このドキュメントの範囲は以下のトピックです: + +1. 日本語の数字。 +2. アラビア数字。 +3. 日本語とアラビア数字の変換。 +____ + +If Japanese numbering is set: + +[source,asciidoc] +---- +:presentation-metadata-autonumbering-style: japanese +---- + +the document will instead look like: + +____ + +公開日: 令和二年十月十一日 + +*一 はじめに。* + +*一・一 範囲。* + +このドキュメントの範囲は以下のトピックです: + +一. 日本語の数字。 +二. アラビア数字。 +三. 日本語とアラビア数字の変換。 +____ + +NOTE: The separator between numbers in clause numbers is a middle dot (・) in Japanese numbering.
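The date and numbering conversions described above can be sketched in Ruby. This is a minimal illustration under stated assumptions, not the implementation in `metanorma-jis`: the method names `to_kanji`, `reiwa_date` and `kanji_clause_number` are hypothetical, only integers from 0 to 9999 are converted, and only dates from the start of the Reiwa era (1 May 2019) are handled. It reproduces the renderings 令和二年10月11日 and 令和二年十月十一日 given in the examples above.

```ruby
require "date"

# Hypothetical helpers illustrating the conversions described above;
# not taken from metanorma-jis.
KANJI_DIGITS = %w[〇 一 二 三 四 五 六 七 八 九].freeze
KANJI_UNITS = [[1000, "千"], [100, "百"], [10, "十"]].freeze
REIWA_START = Date.new(2019, 5, 1) # first day of the Reiwa era

# Convert 0..9999 to Japanese (Chinese-style) numerals, e.g. 11 -> "十一".
def to_kanji(number)
  raise ArgumentError, "only 0..9999 supported" unless (0..9999).cover?(number)
  return KANJI_DIGITS[0] if number.zero?

  out = +""
  rest = number
  KANJI_UNITS.each do |value, unit|
    digit, rest = rest.divmod(value)
    next if digit.zero?

    out << KANJI_DIGITS[digit] unless digit == 1 # omit 一 before a unit: 十, 百, 千
    out << unit
  end
  out << KANJI_DIGITS[rest] if rest.positive?
  out
end

# Render a Gregorian date in the Japanese calendar (Reiwa era only).
# The era year is always in kanji; month and day optionally so.
def reiwa_date(date, kanji_numbers: false)
  raise ArgumentError, "pre-Reiwa dates not handled" if date < REIWA_START

  era_year = date.year - 2018
  year = era_year == 1 ? "元" : to_kanji(era_year) # year 1 is written 元年
  num = ->(n) { kanji_numbers ? to_kanji(n) : n.to_s }
  "令和#{year}年#{num.call(date.month)}月#{num.call(date.day)}日"
end

# Clause numbers such as "1.1" use a middle dot in Japanese numbering.
def kanji_clause_number(clause)
  clause.split(".").map { |part| to_kanji(Integer(part)) }.join("・")
end

puts reiwa_date(Date.iso8601("2020-10-11"))                      # 令和二年10月11日
puts reiwa_date(Date.iso8601("2020-10-11"), kanji_numbers: true) # 令和二年十月十一日
puts kanji_clause_number("1.1")                                  # 一・一
```

Keeping the era year in kanji while leaving month and day as Arabic numerals by default mirrors the two renderings shown above; switching `kanji_numbers` on corresponds to the Japanese autonumbering style.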
+ +== Full-width punctuation + +Punctuation in CJK scripts is different from that of Roman script, even where CJK has adopted +Western punctuation. In order to fit in with the ideographic characters of Chinese, Japanese and +Korean, the punctuation of CJK needs to be of the same width as an ideographic character +("full-width punctuation"). Automatically generated text in Metanorma is +populated by default with Roman punctuation; any such punctuation needs to be adjusted to +be full-width. The context where this is most apparent is in bibliographic references, which +are populated by template out of a bibliographic database; but this also applies to cross-references +and captions. + +For instance, a list item cross-reference will by default end in a closing parenthesis: _1)_. +If Japanese numbering is being used, that needs to be rendered not as _一)_, but as _一)_, +with a full-width parenthesis. + +Often in technical documents, Roman text and Arabic numerals are interspersed with CJK text. +In such cases, full-width punctuation should be applied only where the punctuation adjoins CJK text, +not where it adjoins Roman text. So in the example above, if Arabic numbering is used for lists, _1)_ should +be left alone, and not converted to _1)_. + +== CJK carriage return + +In Roman text, the carriage return at the end of a line in Asciidoc is interpreted as a space; so +a text entered as + +[source,asciidoc] +---- +Now is the time for all good men +to come to the aid of the party. +---- + +is reflowed in Metanorma XML (and thus Metanorma outputs) as + +[source,xml] +---- +
<p>Now is the time for all good men to come to the aid of the party.</p>
+---- + +Space is used much more sparingly in CJK; as a result, a carriage return in CJK Asciidoc text +is *not* interpreted as a space; so + +[source,asciidoc] +---- +今こそ、すべての善良な人々が +政党を支援する時です。 +---- + +is reflowed in Metanorma XML as + +[source,xml] +---- +<p>今こそ、すべての善良な人々が政党を支援する時です。</p>
+---- + +with no Roman or CJK space introduced between 人々が and 政党を. + +However, as with punctuation, a line break after Roman text is still interpreted as a space: + +[source,asciidoc] +---- +実施は中村秀子氏と John +Smith 氏の間で交渉されました。 +---- + +reflows to + +[source,xml] +---- +<p>実施は中村秀子氏と John Smith 氏の間で交渉されました。</p>
+---- + +== Extended space + +In CJK scripts, titles consisting of only a few characters are rendered with extended spacing; +so _Foreword_ as a title is not rendered as 序文, but as 序 文. This behaviour has been implemented +in Metanorma for all section titles of four characters or fewer. diff --git a/_posts/2024-11-07-iso-historical.adoc b/_posts/2024-11-07-iso-historical.adoc new file mode 100644 index 00000000..142cbfee --- /dev/null +++ b/_posts/2024-11-07-iso-historical.adoc @@ -0,0 +1,135 @@ +--- +layout: post +title: "Support for historical ISO versions" +date: 2024-11-07 +categories: documentation + +authors: + - name: Nick Nicholas + email: nick.nicholas@ribose.com + social_links: + - https://github.com/opoudjis + +excerpt: >- + This post describes how Metanorma supports legacy versions of ISO standards. +--- + +== Introduction + +Metanorma is mostly used to prepare new standards for publication, but it is also starting +to be used to format and generate older versions of standards, particularly as +those standards are integrated with data. In particular, we have been working on the +https://www.iso.org/standard/7472.html[ISO 2533] standard (the +https://en.wikipedia.org/wiki/International_Standard_Atmosphere[International Standard Atmosphere]), +a data-intensive standard published in 1975, with addenda in 1985 and 1997, and with a +new version currently in preparation. This work has required us to regenerate the standards +as they were originally published, using the same data, and with the look-and-feel ISO documents +had at the time, rather than the look-and-feel those documents would have now. + +This work has motivated us to support older versions of how ISO standards were specified and formatted, +so that such regenerated standards do not look anachronistic.
There have been several different iterations +of document layout for ISO standards over the years: + +`2024`::: (default) The latest document layout, as of 2024. +`2013`::: Document layout used from 2013 to early 2024. +`2012`::: Document layout used from mid-2012 to 2013. It is equivalent to the `1989` layout with a logo change. +`1989`::: Document layout used from 1989 to mid-2012. +`1987`::: Document layout used from 1987 to 1989. +`1972`::: Document layout used from 1972 to 1987. +`1951`::: Document layout used from 1951 to 1971. The first available published ISO layout. + +Metanorma selects the appropriate layout in which to render a given document. This is done in +two ways: + +* By default, if the `:copyright-year:` document attribute is specified for a document, that year +is compared to the ranges given above, and the corresponding document layout is applied for the document. +(Dates are assumed to apply from January 1 of each year, except for the `2012` format, which applies from +`2012-07-01`.) +* If the user specifies one of the given years as the `:document-scheme:` document attribute, that +year's layout is applied to the document, overriding any layout chosen through the `:copyright-year:`. + +At this time, the document layout is only applied to PDF output: HTML and Word output use the latest +layout, regardless of the document scheme.
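The selection rules above (copyright year compared against the layout ranges, with an explicit `:document-scheme:` taking precedence) can be sketched as a small Ruby function. The method `document_scheme` and the `SCHEME_STARTS` table are hypothetical, built only from the ranges listed in this post rather than from the metanorma-iso source. A bare copyright year is treated as 1 January of that year, so under this reading a copyright year of 2012 maps to the `1989` layout; falling back to the `1951` layout for earlier years is also an assumption.

```ruby
require "date"

# Hypothetical table of layout start dates, taken from the list above.
SCHEME_STARTS = {
  "2024" => Date.new(2024, 1, 1),
  "2013" => Date.new(2013, 1, 1),
  "2012" => Date.new(2012, 7, 1),
  "1989" => Date.new(1989, 1, 1),
  "1987" => Date.new(1987, 1, 1),
  "1972" => Date.new(1972, 1, 1),
  "1951" => Date.new(1951, 1, 1),
}.freeze

# An explicit document scheme wins; otherwise pick the latest layout that
# started on or before 1 January of the copyright year; default to 2024.
def document_scheme(copyright_year: nil, scheme: nil)
  return scheme if SCHEME_STARTS.key?(scheme)
  return "2024" if copyright_year.nil?

  date = Date.new(Integer(copyright_year), 1, 1)
  started = SCHEME_STARTS.select { |_, start| start <= date }
  return "1951" if started.empty? # assumption: earliest layout as fallback

  started.max_by { |_, start| start }.first
end

puts document_scheme(copyright_year: 2016)                 # 2013 (the "Rice document")
puts document_scheme(copyright_year: 2016, scheme: "1951") # 1951
puts document_scheme                                       # 2024
```

Storing start dates rather than year ranges keeps the mid-2012 boundary expressible; a real implementation could compare a full publication date instead of a bare year to reach it.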
+ +.The "Rice document" PDF cover page, as it appears in the 2013 document scheme (inferred from its copyright year 2016) +image::/assets/blog/2024-11-07a.png[] + +.The "Rice document" PDF cover page, as it appears in the 2024 document scheme +image::/assets/blog/2024-11-07b.png[] + +.The "Rice document" PDF cover page, as it appears in the 1951 document scheme +image::/assets/blog/2024-11-07c.png[] + +The differences between schemes are mostly a matter of visual presentation, but before 2013, ISO documents +allowed Scope, Normative References and Terms and Definitions to be subclauses of an initial General +clause, rather than requiring them to be separate clauses at the start of the document body. + +Document attributes are mostly the same across the document schemes, with the following exceptions: + +* In the `2024` scheme, the attribute `:semantic-metadata-feedback-link:`, which specifies a URL for readers to provide +feedback on a specific document, is used to generate a QR code on the cover page of the document PDF. +* From 1994, ISO has used the +https://en.wikipedia.org/wiki/International_Classification_for_Standards[International Classification for Standards (ICS)] +number to classify documents, and the ICS number appears on document cover pages; +it is specified as comma-delimited values of the `:library-ics:` document attribute (e.g. `:library-ics: 43.040.20,35.220.20-10`). +Prior to 1994, ISO instead used the +https://en.wikipedia.org/wiki/Universal_Decimal_Classification[Universal Decimal Classification (UDC)], +and this is supported through the generic `:classification:` document attribute; values are comma-delimited, +and each UDC value must be prefixed with `UDC:` (e.g. `:classification: UDC:663.971/.976:620.1:551.511.12, UDC:535.643.2`).
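As a sketch of the attribute syntax just described, the following Ruby splits the comma-delimited values of `:library-ics:` and extracts the `UDC:`-prefixed entries of `:classification:`. The helper names `ics_codes` and `udc_codes` are hypothetical; this is not how Metanorma parses these attributes internally. Only the leading `UDC:` is removed, because UDC notation itself uses colons, and entries without that prefix are simply skipped here.

```ruby
# Hypothetical helpers, not part of Metanorma.

# Split a :library-ics: value into individual ICS codes.
def ics_codes(value)
  value.split(",").map(&:strip).reject(&:empty?)
end

# Extract UDC codes from a :classification: value, dropping the "UDC:" prefix;
# entries without that prefix are ignored.
def udc_codes(value)
  value.split(",")
       .map(&:strip)
       .select { |entry| entry.start_with?("UDC:") }
       .map { |entry| entry.delete_prefix("UDC:") }
end

p ics_codes("43.040.20,35.220.20-10")
# ["43.040.20", "35.220.20-10"]
p udc_codes("UDC:663.971/.976:620.1:551.511.12, UDC:535.643.2")
# ["663.971/.976:620.1:551.511.12", "535.643.2"]
```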
+ +In addition, our support for legacy ISO formats means we now support not only Amendments and Technical Corrigenda +of documents (`:doctype: amendment`, `:doctype: technical-corrigendum`), but also Addenda (`:doctype: addendum`), +which ISO published until the 2000s. Addenda are marked up in the same way as Amendments and Technical Corrigenda: +they are updates of documents (whose identifier is given under `:updates:`), and they have distinct titles, +indicated through `:title-addendum-{en,fr}:`. For example, the following is how ISO 2533:1975/ADD 1:1985 (Addendum 1 of ISO 2533) +is marked up: + +[source,asciidoc] +---- += Standard atmosphere +:docnumber: 2533 +:edition: 1 +:copyright-year: 1975 +:revdate: 1985-02-15 +:language: en +:title-main-en: Standard atmosphere +:title-main-fr: Atmosphère type +:updates: ISO 2533:1975 +:has-draft: ISO 2533:1975/DAD 1 +:updates-document-type: international-standard +:addendum-number: 1 +:doctype: addendum +:docstage: 60 +:docsubstage: 60 +---- + +Note the use of `:has-draft:`, which gives the identifier of the pre-publication version of the addendum +(`ISO 2533:1975/DAD 1`: Draft Addendum 1). + +And this is how ISO 2533:1975/ADD 2:1997 is marked up: + +[source,asciidoc] +---- += Standard atmosphere +:docnumber: 2533 +:edition: 1 +:copyright-year: 1985 +:revdate: 1997-11-01 +:language: en +:title-main-en: Standard atmosphere +:title-main-fr: Atmosphère type +:title-main-ru: Стандартная атмосфера +:title-addendum-en: Extension to -- 5000 m and standard atmosphere as a function of altitude in feet +:title-addendum-fr: Extension à -- 5000 m, et atmosphère type en fonction de l'altitude, en feet +:title-addendum-ru: Расширение до -- 5000 м и стандартная атмосфера в функции от высоты в футах +:updates: ISO 2533:1975 +:updates-document-type: international-standard +:addendum-number: 2 +:doctype: addendum +:docstage: 60 +:docsubstage: 60 +---- + +Addendum 2 does not give a pre-publication version identifier, but it does
provide a title of the addendum +specifically. + diff --git a/assets/blog/2024-11-07a.png b/assets/blog/2024-11-07a.png new file mode 100644 index 00000000..d5405eef Binary files /dev/null and b/assets/blog/2024-11-07a.png differ diff --git a/assets/blog/2024-11-07b.png b/assets/blog/2024-11-07b.png new file mode 100644 index 00000000..ed1f4d1e Binary files /dev/null and b/assets/blog/2024-11-07b.png differ diff --git a/assets/blog/2024-11-07c.png b/assets/blog/2024-11-07c.png new file mode 100644 index 00000000..6638ced2 Binary files /dev/null and b/assets/blog/2024-11-07c.png differ diff --git a/author/iso/ref/document-attributes.adoc b/author/iso/ref/document-attributes.adoc index 84dbc47d..0af5e1f6 100644 --- a/author/iso/ref/document-attributes.adoc +++ b/author/iso/ref/document-attributes.adoc @@ -65,7 +65,7 @@ updating [added in https://github.com/metanorma/isodoc/releases/tag/v1.3.25]. `:docsubtype:`:: A subclass of doctype for which special processing rules apply. -`vocabulary`::: +`:vocabulary`::: The "vocabulary" document type is defined in the https://www.iso.org/ISO-house-style.html[ISO House Rules] and title requirements defined in the ISO/IEC Directives, Part 2, 2018, 11.5.2. @@ -139,7 +139,7 @@ There may be more than one ICS for a document; if so, they should be comma-delim `:classification:`:: + -- -(for `document-scheme` values of `1989` and prior, and a publication date of 1994 onwards) +(for `document-scheme` values of `1989` and prior, and a publication date before 1994) The https://en.wikipedia.org/wiki/Universal_Decimal_Classification[Universal Decimal Classification (UDC)] @@ -196,6 +196,9 @@ as CIE uses UDC. 
==== -- +`:price-code:`:: Price code group of the publication, as documented in the +https://www.iec.ch/members_experts/tools/pdf/IEC_DATA_FEEDS.pdf[IEC Data Feeds: Technical documentation document] [added in https://github.com/metanorma/metanorma-iso/releases/tag/v2.8.10] + === Document identifier ==== General diff --git a/author/topics/collections/configuration.adoc b/author/topics/collections/configuration.adoc index d35f1695..2cf9eb92 100644 --- a/author/topics/collections/configuration.adoc +++ b/author/topics/collections/configuration.adoc @@ -593,6 +593,10 @@ Prefatory content from the collection manifest [added in https://github.com/metanorma/metanorma/releases/tag/v1.5.6]. `final-content`:: Final content from the collection manifest [added in https://github.com/metanorma/metanorma/releases/tag/v1.5.6]. +`bibdata`:: +A hash representation of the `bibdata` element, which holds the bibliographic metadata +of the manifest [added in https://github.com/metanorma/metanorma/releases/tag/v2.0.8]. + == Multilingual documents diff --git a/author/topics/sections.adoc b/author/topics/sections.adoc index 86a2eb69..238c28c2 100644 --- a/author/topics/sections.adoc +++ b/author/topics/sections.adoc @@ -122,7 +122,16 @@ For these types of sections, enter them without * Index: `[index]` * Annexes: `[appendix]` -NOTE: Documents can contain only one Abstract, one Acknowledgements section, and one Index. +[NOTE] +-- +Documents can contain only one Abstract, one Acknowledgements section, and one Index. +In most flavours and document types, documents can only contain one Terms and Definitions section. +If a second matching clause is found, it is treated as a normal clause. + +However, this can be overridden by specifying the section type in a `heading` attribute: +this is interpreted as the user explicitly wanting that section type to +apply [added in https://github.com/metanorma/metanorma-standoc/releases/tag/v2.10.0]. +-- The following example indicates usage of the section titles.
diff --git a/develop/topics/adopting-toolchain.adoc b/develop/topics/adopting-toolchain.adoc index ed115c0d..b41c97c0 100644 --- a/develop/topics/adopting-toolchain.adoc +++ b/develop/topics/adopting-toolchain.adoc @@ -41,8 +41,8 @@ To customise behaviour further than Simple Adoption, you need to create a custom The toolchains currently available proceed in two steps: -. map an input markup language (currently AsciiDoc only) into Metanorma Standoc XML; and -. map Metanorma Standoc XML into various output formats (currently Word doc, HTML, PDF via HTML). +. map an input markup language (currently AsciiDoc only) into Metanorma Standoc Semantic XML; and +. map Metanorma Standoc XML into various output formats (currently Word doc, HTML, PDF, all via Metanorma Standoc Presentation XML). Running the `metanorma` CLI tool involves a third step, of exposing the capabilities available in the first two in a consistent format. @@ -50,13 +50,15 @@ These two steps are represented as three separate modules, which are included in Your adaptation of the toolchain will need to instantiate these three modules. The connection between the two first steps is taken care of in the toolchain, and metanorma explicitly invokes the two steps, feeding the XML output of the first step as input into the second. The metanorma-sample gem outputs both Word and HTML; you can choose to output only Word, or only HTML, and you can choose to generate PDF as well. -The modules involve classes which rely on inheritance from other classes; the current gems use `Metanorma::{Standoc, ISO, Generic}::Converter`, `Isodoc::{Metadata, HtmlConvert, WordConvert}`, and `Metanorma::Processor` as their base classes. This allows the standards-specific classes to be quite succinct, as most of their behaviour is inherited from other classes; but it also means that you need to be familiar with the underlying gems, in order to do most customization. 
+The modules involve classes which rely on inheritance from other classes; the current gems use `Metanorma::{Standoc, ISO, Generic}::Converter`, `Isodoc::{PresentationXMLConvert, Metadata, HtmlConvert, WordConvert}`, and `Metanorma::Processor` as their base classes. This allows the standards-specific classes to be quite succinct, as most of their behaviour is inherited from other classes; but it also means that you need to be familiar with the underlying gems, in order to do most customization. In the case of `Metanorma::X` classes, the changes you will need to make involve the intermediate XML representation of your document, which is built up through Nokogiri Builder; e.g. adding different enums, or adding new elements. The adaptations in `Metanorma::Generic::Converter` are limited (and are almost all to do with reading in properties from a config file), and most projects can take them across as is. The customizations needed for `Metanorma::Generic::Processor` are minor, and involve invoking methods specific to the gem for document generation. -The customizations needed for `Isodoc::Generic` are more extensive. Three base classes are involved: +The customizations needed for `Isodoc::Generic` are more extensive. Five base classes are involved: + +* `Isodoc::PresentationXMLConvert` converts Metanorma Standoc Semantic XML to Metanorma Standoc Presentation XML. * `Isodoc::Metadata` processes the metadata about the document stored in `//bibdata`. This information typically ends up in the document @@ -65,15 +67,21 @@ is extracted into a `Hash`, which is passed to document output (title page, Word header) via the https://shopify.github.io/liquid/[Liquid template language]. See link:/develop/topics/metadata-and-boilerplate/[Metadata and Predefined text] for more information. +The Metadata class is only used for HTML and Word (DOC) output; PDF output parses the Presentation XML +in one go. -* `Isodoc::HtmlConvert` converts Metanorma Standoc XML to HTML.
+* `Isodoc::HtmlConvert` converts Metanorma Standoc Presentation XML to HTML. -* `Isodoc::PDFConvert` converts Metanorma Standoc XML to HTML. +* `Isodoc::PDFConvert` converts Metanorma Standoc Presentation XML to PDF. -* `Isodoc::WordConvert` converts Metanorma Standoc XML to Word HTML; the https://github.com/metanorma/html2doc[html2doc] gem then converts this to a .doc document. +* `Isodoc::WordConvert` converts Metanorma Standoc Presentation XML to Word HTML; the https://github.com/metanorma/html2doc[html2doc] gem then converts this to a .doc document. The `Isodoc::HtmlConvert` and `Isodoc::WordConvert` are expected to be near-identical, since any rendering differences between the two are addressed in the HTML CSS stylesheet. The `Isodoc::HtmlConvert` and `Isodoc::WordConvert` overlap substantially, as both use variants of HTML. However there is no reason not to make substantially different rendering choices in the HTML and Word branches of the code. +In addition to these, `Relaton::Render` is used to provide a rendering of the Relaton XML bibliographic references for the document +in the Presentation XML, based on a stylesheet provided for the flavour. The `Relaton::Render` methods may themselves be +customised for the current flavour. The configuration of `Relaton::Render` is described in https://relaton.org/specs/relaton-render[relaton.org]. + === Metanorma::Standoc customization examples In the following snippets, the parameter `node` represents the current node of the AsciiDoc document, and `xml` represents the Nokogiri Builder node of the XML output.