Problems inserting bib-entries encoded in UTF-8 to PDF files #61

Open
kamapu opened this issue Dec 28, 2018 · 11 comments

@kamapu

kamapu commented Dec 28, 2018

I am trying to generate a report from database data, including references for the data sources. For convenience, I use print.BibEntry to insert references from a BibTeX file into a PDF. The reference is stored in biblio.bib and looks like this:

@ARTICLE{SanMartin2004,
  author = {San Mart\'{i}n, Cristina and Ram\'{i}rez, Carlos and Alvarez, Miguel},
  title = {Estudio de la vegetaci\'{o}n de "mallines" y "campa\~{n}as" en la
	{Cordillera Pelada} ({Valdivia, Chile})},
  journal = {Revista Geogr\'{a}fica de Valpara\'{i}so},
  year = {2004},
  volume = {35},
  pages = {261--273},
  file = {SanMartin2004.pdf:SanMartin2004.pdf:PDF},
  groups = {My Publications},
  keywords = {Vegetation und Flora Chiles 42},
  owner = {m_alvarez},
  refid = {36},
  timestamp = {2013.12.30}
}

I then write an R Markdown document (example.Rmd):

---
title: Example Text
author: Miguel
output: pdf_document
---

```{r echo=FALSE, message=FALSE}
library(RefManageR)
biblio <- ReadBib("biblio.bib", check = "warn")
```

**`r capture.output(print(biblio["SanMartin2004"]))`**

Blabla

Then I use rmarkdown to render the document:

library(rmarkdown)
render("example.Rmd", encoding="UTF-8")

[Image: example]

Unfortunately, I am struggling with special characters. I have tried many options, but I cannot get rid of the strange symbols produced by the printing function (the article title seems to be handled better, but not the authors or the journal name).

Is there a way to deal with this issue?

@mwmclean
Collaborator

What happens if you print biblio at the console? What is the result of Sys.getlocale()?

@kamapu
Author

kamapu commented Feb 15, 2019

The output of Sys.getlocale() is:

[1] "LC_COLLATE=German_Germany.1252;LC_CTYPE=German_Germany.1252;LC_MONETARY=German_Germany.1252;LC_NUMERIC=C;LC_TIME=German_Germany.1252"

When I print biblio at the console, I get:

[Image: r_screen]

In the meantime, I switched to the LaTeX package bibentry and put the required commands in the preamble of the R Markdown document. The latter option works perfectly, but the RefManageR alternative would make the coding much simpler.

@mwmclean
Collaborator

What about l10n_info()? You will probably need to change your system locale to get this working. Figuring out what the proper locale is on Windows can be annoying.

@kamapu
Author

kamapu commented Apr 1, 2019

Sorry @mwmclean, I have been busy preparing a field trip to Kenya and was away. Now I'm back.

Here is the output from my console:

l10n_info()
$MBCS
[1] FALSE

$`UTF-8`
[1] FALSE

$`Latin-1`
[1] TRUE

$codepage
[1] 1252
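Reading this output: UTF-8 is FALSE and codepage 1252 is active, so R is running in a single-byte Windows-1252 locale. The screenshots are not reproduced here, so it is only an assumption that the "strange symbols" are mojibake, i.e. the two UTF-8 bytes of an accented letter displayed as two Windows-1252 characters. The base-R sketch below reproduces that effect on names from biblio.bib using iconv(); it does not call RefManageR.

```r
# Sketch (assumption): what UTF-8 text looks like when its bytes are
# reinterpreted as Latin-1/Windows-1252, as can happen in a single-byte locale.
names_utf8 <- c("San Mart\u00edn", "Ram\u00edrez", "Valpara\u00edso")

# iconv() converts raw bytes; declaring UTF-8 input as "latin1" turns each
# two-byte sequence (0xC3 0xAD for an i with acute accent) into two characters.
garbled <- iconv(names_utf8, from = "latin1", to = "UTF-8")

# Compare the bytes of the correct and the garbled spelling.
print(charToRaw(names_utf8[1]))
print(charToRaw(garbled[1]))

# Undo the mistake: convert back to single bytes, then relabel them as UTF-8.
repaired <- iconv(garbled, from = "UTF-8", to = "latin1")
Encoding(repaired) <- "UTF-8"
print(identical(repaired, names_utf8))
```

If the PDF shows an "Ã" followed by another symbol where "í" should be, this double interpretation would be a likely explanation.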

@kamapu
Author

kamapu commented Apr 1, 2019

One day, on the bus home after meeting some friends at a bar, I saw this on the bus's stop display:

[Image: no_utf8]

I was convinced I was going mad...

@mwmclean
Collaborator

mwmclean commented Apr 2, 2019

Haha, are you on a recent version of Windows? There is beta support for UTF-8 in Windows 10 build 17133. Try following the instructions here; hopefully you will see a checkbox that lets you enable UTF-8 and fixes your problem.

@kamapu
Author

kamapu commented Apr 3, 2019

I found the checkbox and checked it, but nothing else has changed.

mwmclean added a commit that referenced this issue Apr 4, 2019
* In single-byte locales, the gsub call in collapseF() can cause multi-byte chars to be converted to single-byte ones, so only perform it when necessary
* collapseF only used when bib.style = 'authoryear'
* Affects #62, #61
mwmclean added a commit that referenced this issue Apr 4, 2019
* In single-byte locales, the gsub call in collapseF() can cause multi-byte chars to be converted to single-byte ones, so only perform it when necessary
* collapseF only used when bib.style = 'authoryear'
* Additionally, a period could be removed from the last initial in the first author's given name when first.inits = TRUE. This has been corrected.
* Possibly affects #62, #61
@mwmclean
Collaborator

mwmclean commented Apr 4, 2019

Can you please install the latest version from GitHub with devtools::install_github("ropensci/RefManageR") and see if that helps?

@kamapu
Author

kamapu commented Apr 11, 2019

I had to uncheck the UTF-8 box in Windows because I was getting a "wrong charset" warning message when starting PostgreSQL...

@kamapu
Author

kamapu commented Apr 11, 2019

Dear @mwmclean: I fear I may be wasting your time on a problem without a solution (sorry for that). I would be happy if this issue were solved, but I have already switched to an alternative. The decision is in your hands now.

About the last recommendation:

  • I installed the latest version of RefManageR from GitHub, but I am still getting the same output.
  • I had to uncheck the UTF-8 box again because it conflicted with PostgreSQL (which I also use frequently).

What I'm wondering about:

  • The title is properly formatted (LaTeX accent commands are converted to accented characters)
  • In the journal name, only the first accented word (Geografica) is properly formatted, but not the second (Valparaiso)
  • The authors are not formatted at all
  • If I use the BibTeX engine instead of markdown, everything is properly formatted
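One possible workaround, not suggested in the thread and untested with RefManageR, is to replace the LaTeX accent commands with the accented characters themselves before the entries are formatted, so the printing code never has to convert them. delatex_accents() and its lookup table are hypothetical and only cover the commands that appear in this issue; whether this helps in a Windows-1252 locale is also untested.

```r
# Hypothetical helper (not part of RefManageR): replace the LaTeX accent
# commands used in this issue's .bib entries with the characters they encode.
latex_accents <- c(
  "\\'{a}"   = "\u00e1",  # a with acute
  "\\'{e}"   = "\u00e9",  # e with acute
  "\\'{i}"   = "\u00ed",  # i with acute
  "\\'{\\i}" = "\u00ed",  # i with acute, dotless-i spelling
  "\\'{o}"   = "\u00f3",  # o with acute
  "\\~{n}"   = "\u00f1",  # n with tilde
  "\\v{e}"   = "\u011b"   # e with caron
)

delatex_accents <- function(x) {
  # Literal (fixed = TRUE) replacement of each command by its character.
  for (cmd in names(latex_accents)) {
    x <- gsub(cmd, latex_accents[[cmd]], x, fixed = TRUE)
  }
  # Drop braces that now wrap a single character, e.g. "Lis{a}" -> "Lisa".
  gsub("\\{([^{}])\\}", "\\1", x, perl = TRUE)
}

print(delatex_accents(c(
  "San Mart\\'{i}n, Cristina and Ram\\'{i}rez, Carlos",
  "Revista Geogr\\'{a}fica de Valpara\\'{i}so",
  "Lenka Lis{\\'{a}}"
)))
```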

@RLumSK

RLumSK commented Jul 18, 2019

Observation

I think I have encountered the same problem. Here is my example BibTeX file:

@article{Fuchs_2012,
	doi = {10.1111/j.1502-3885.2012.00299.x},
	url = {https://doi.org/10.1111%2Fj.1502-3885.2012.00299.x},
	year = 2012,
	month = {nov},
	publisher = {Wiley},
	volume = {42},
	number = {3},
	pages = {664--677},
	author = {Markus Fuchs and Sebastian Kreutzer and Denis-Didier Rousseau and Pierre Antoine and Christine Hatt{\'{e}} and France Lagroix and Olivier Moine and Caroline Gauthier and Jiri Svoboda and Lenka Lis{\'{a}}},
	title = {The loess sequence of Doln{\'{\i}} V{\v{e}}stonice, Czech Republic: A new {OSL}-based chronology of the Last Climatic Cycle},
	journal = {Boreas}
}

The entry was extracted via rorcid::orcid_citations(). When I then use RefManageR::ReadBib(), I can see that something goes wrong with the last author's name (excerpt):

str(RefManageR::ReadBib(bib_file))

 .. .. ..$ :List of 5
  .. .. .. ..$ given  : chr [1:2] "Lenka" "Li"
  .. .. .. ..$ family : chr "á"
  .. .. .. ..$ role   : NULL
  .. .. .. ..$ email  : NULL
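In the str() output above, the last author comes out as given "Lenka" "Li" and family "á". For comparison, BibTeX splits the author field on "and" and each name on whitespace only at brace depth 0, so Lis{\'{a}} should stay one token and become the family name, with "Lenka" as the given name. The base-R sketch below illustrates that brace-depth rule; split_top_level() is a hypothetical helper, not RefManageR's own name parser.

```r
# Hypothetical helper (not RefManageR's parser): split `x` at matches of
# `pattern` that occur outside any {...} group, i.e. at brace depth 0.
split_top_level <- function(x, pattern) {
  chars <- strsplit(x, "")[[1]]
  # Brace depth after each character; a separator at depth 0 is top level.
  depth <- cumsum(chars == "{") - cumsum(chars == "}")
  m <- gregexpr(pattern, x, perl = TRUE)[[1]]
  starts <- as.integer(m)
  lens <- attr(m, "match.length")
  # gregexpr() returns -1 when nothing matches; pmax() keeps the index valid.
  keep <- starts > 0 & depth[pmax(starts, 1)] == 0
  starts <- starts[keep]
  lens <- lens[keep]
  if (length(starts) == 0) return(trimws(x))
  trimws(substring(x, c(1, starts + lens), c(starts - 1, nchar(x))))
}

authors <- "Olivier Moine and Jiri Svoboda and Lenka Lis{\\'{a}}"
people <- split_top_level(authors, "\\s+and\\s+")
print(people)
# Split the last name into words the same way: the braced accent stays attached.
print(split_top_level(people[3], "\\s+"))
```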

System

  • R-devel
  • 'RefManageR' (1.2.1.3) from GitHub
  • System macOS 10.14.5
l10n_info()
$MBCS
[1] TRUE

$`UTF-8`
[1] TRUE

$`Latin-1`
[1] FALSE

Thanks for your efforts and please let me know if I can contribute with further details.
