Remove non-ascii letters escaping from BibtexUtil #1115

Open
kosarko opened this issue May 23, 2024 · 4 comments

Member

kosarko commented May 23, 2024

"Þ" is not escaped correctly
"ð" is not escaped at all

UTF-8 seems to have been the default input encoding since 2019 (Overleaf, TeX Live).
Even if they are escaped correctly, you need something like \usepackage[T1]{fontenc} to render them properly (this applies even to Czech characters).

We can drop the escaping of accented letters, and the other parts of BibtexUtil should be checked to see whether they are still relevant.
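To make the proposal concrete, here is a minimal sketch of what "no escaping of non-ASCII letters" could look like. It is written in Python purely for illustration and is not the actual BibtexUtil code; the function name, the table of special characters, and the decision to keep escaping TeX specials at all are assumptions.

```python
# Hypothetical sketch, not the real BibtexUtil: escape only characters that
# are special to (La)TeX and pass every non-ASCII letter through unchanged.
TEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_bibtex_value(text: str) -> str:
    """Escape TeX special characters; leave letters such as Þ, ð, ž as they are."""
    # Mapping character by character avoids double-escaping the backslashes
    # that the replacements themselves introduce.
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)


print(escape_bibtex_value("Þórr & ðæt 100%"))
```

The sketch assumes the input is a raw metadata value. It would also escape braces that were added on purpose for case protection, so those would have to be added after this step.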

@kosarko kosarko self-assigned this May 23, 2024
Member

stranak commented May 26, 2024

Completely agree. It seems that since at least 2018, TeX has assumed the source is in UTF-8. Still, the documentation says that BibTeX supports only a subset of UTF-8. To be safe, one should apparently use BibLaTeX, which in turn uses Biber (written in Perl) rather than BibTeX.

I don't know how commonly used BibLaTeX is (I like it and tend to use it), but maybe we can add an informational note recommending it.

Another benefit is that BibLaTeX has more publication types, including @dataset and @software, and they should display the DOI and/or URL fields correctly. So we could generate the citations directly in BibLaTeX style, avoiding the problematic @misc (#859).


jakoble commented Oct 21, 2024

In my presentation at last week's conference, I claimed that @dataset (not usable with BibTeX, as pointed out by Pavel) outputs the same thing as @book. I have now looked into this in more detail. It turns out that while this is definitely the case for the APA and authoryear citation styles, it isn't for e.g. ieee or even MLA. The most obvious difference is that the italics of the title are sometimes lost (e.g., with MLA). @software doesn't even work with e.g. the chicago-authordate style. In other words, with @dataset you also get inconsistent citations between the different styles (sometimes the title is in italics, sometimes it is in parentheses), all of which suggests that datasets don't fit neatly into existing bibliographic styles.

So, to my mind, the only way to ensure consistency with the yellow box of the LINDAT-style repositories is to define the bib entry as follows:

@book{parlamint-book,
  title = {Multilingual comparable corpora of parliamentary debates {ParlaMint} 4.1},
  author = {Erjavec, Toma{\v z} and others},
  url = {http://hdl.handle.net/11356/1912},
  publisher = {Slovenian language resource repository {CLARIN}.{SI}},
  copyright = {Creative Commons - Attribution 4.0 International ({CC} {BY} 4.0)},
  issn = {2820-4042},
  year = {2024}
}

So the main changes from the current exports are the switch from @misc to @book and from the field note = to publisher =.

As an obvious example, the above will always give you italic titles, which is in line with the citation box.

While it might seem intuitively odd to treat datasets as books, it might not be that odd from the point of view of bibliographic (rather than real-world) ontology. Note that datasets have the same elements as books (titles, publishers, years) rather than those of anything else (e.g., journal articles, which are defined in terms of things like journal name, volume, issue, and so forth, none of which is relevant for datasets or software).
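The change described above (entry type @misc to @book, field note to publisher) could be sketched roughly as follows. This is an illustrative Python sketch only, not the repository's export code; `misc_to_book` and `format_entry` are hypothetical names, and field values are assumed to be already escaped and brace-protected.

```python
def misc_to_book(fields: dict[str, str]) -> dict[str, str]:
    """Return a copy of the fields with `note` renamed to `publisher`."""
    return {
        ("publisher" if name == "note" else name): value
        for name, value in fields.items()
    }


def format_entry(entry_type: str, key: str, fields: dict[str, str]) -> str:
    """Render one BibTeX entry; values are assumed to be already escaped."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"


# Field values taken from the entry above; `note` stands in for the current export.
current_export = {
    "title": "Multilingual comparable corpora of parliamentary debates {ParlaMint} 4.1",
    "note": "Slovenian language resource repository {CLARIN}.{SI}",
    "year": "2024",
}
print(format_entry("book", "parlamint-book", misc_to_book(current_export)))
```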


jakoble commented Oct 21, 2024

As an addendum: I know I'm harping on about italics, but it is the case that the APA guidelines define precisely such a format for datasets:

https://apastyle.apa.org/style-grammar-guidelines/references/examples/data-set-references

For Chicago as well, although this isn't an official CMOS site: https://libguides.murdoch.edu.au/Chicago/dataset. But I think CMOS doesn't officially define the dataset format.

MLA as well: https://library.webster.edu/data/mla

Member

stranak commented Oct 21, 2024

See texdoc biblatex https://texdoc.org/serve/biblatex.pdf/0 for documentation. @dataset is a bit like @book, but not really. It has the same minimal set of fields, but the supported fields differ (for example, unlike @book, it has a "version"). You can also find in the same PDF that while the @software type exists, it is currently just an alias for @misc. So I would treat @software as not really supported yet.

@dataset, though, does seem to be supported. E.g. in MLA: https://mirrors.nic.cz/tex-archive/macros/latex/contrib/biblatex-contrib/biblatex-mla/doc/biblatex-mla.pdf see section 3.4 (p. 19) and the example linked from there. I see similar support in other biblatex style packages (not the same as bibtex styles!), so I would leave it up to them what should be in italics, etc.

The key question for me is whether we can switch to biblatex. Or maybe switch, but also produce bibtex as a fallback option? And for bibtex, can we go with UTF-8, and should we use @book as a better approximation of @dataset than @misc?
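If both formats were produced, the export could pick the entry type per flavour. A rough Python sketch of that idea follows; nothing here has been decided, and `export_citation`, the flavour names, and the mapping itself are assumptions for illustration only.

```python
# Hypothetical mapping: which entry type to emit for which output flavour.
ENTRY_TYPE_BY_FLAVOR = {
    "biblatex": "dataset",  # biblatex has a @dataset type
    "bibtex": "book",       # fallback discussed in this thread instead of @misc
}


def export_citation(key: str, fields: dict[str, str], flavor: str = "biblatex") -> str:
    """Render the same metadata as @dataset (biblatex) or @book (bibtex)."""
    entry_type = ENTRY_TYPE_BY_FLAVOR[flavor]
    # Values are written as-is, i.e. UTF-8 without accent escaping.
    lines = [f"  {name} = {{{value}}}" for name, value in fields.items()]
    return "@" + entry_type + "{" + key + ",\n" + ",\n".join(lines) + "\n}"


metadata = {"title": "ParlaMint 4.1", "author": "Erjavec, Tomaž and others", "year": "2024"}
print(export_citation("parlamint", metadata, "biblatex"))
print(export_citation("parlamint", metadata, "bibtex"))
```

An unknown flavour raises a KeyError, which seems preferable to silently falling back to @misc.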
