texttechnologylab/GerParCor

GerParCor

German Parliamentary Corpus (GerParCor)

Abstract

In 2022, GerParCor was collected and published: the largest German-language corpus of parliamentary protocols, spanning three centuries and covering the national and federal level in Germany, Austria, Switzerland and Liechtenstein. GerParCor made available for the first time various parliamentary protocols that had not been accessible digitally and, moreover, could not be retrieved and processed in a uniform manner. The corpus was also preprocessed using NLP methods and made available in XMI format. In this paper, GerParCor is significantly updated by including all new parliamentary protocols, as well as by adding and preprocessing further protocols not previously covered, so that the corpus now reaches back to 1797. Besides integrating a new, state-of-the-art NLP preprocessing suited to large text corpora, this update also gives an overview of the further reuse of GerParCor by presenting various provisioning capabilities such as APIs, among others.

GerParCor is available via https://gerparcor.texttechnologylab.org

GerParCor 2022

GerParCor 2022 is available via http://lrec2022.gerparcor.texttechnologylab.org

# Parliament Sessions From Until Status / Download
1 Reichstag (NG + Zoll) 1990 02/25/1867 05/24/1895 Download
2 Reichstag (Empire) 2183 12/03/1895 10/26/1918 Download
3 Weimar Republic 1328 02/06/1919 12/09/1932 Download
4 Third Reich 20 03/21/1933 04/24/1942 Download
5 Bundesrat 1008 09/07/1949 10/08/2021 Download
6 Bundestag 4158 09/07/1949 09/07/2021 Download
7 Baden-Württemberg 412 06/05/1984 09/29/2021 Download
8 Bayern 2221 12/16/1946 10/14/2021 Download
9 Berlin 582 04/02/1989 09/16/2021 Download
10 Brandenburg 442 10/26/1990 08/27/2021 Download
11 Bremen 1102 07/04/1995 09/16/2021 Download
12 Hamburg 586 10/08/1997 11/03/2021 Download
13 Hessen 1297 02/04/1947 09/29/2021 Download
14 Mecklenburg-Vorpommern 659 10/26/1990 06/11/2021 Download
15 Niedersachsen 1109 06/22/1982 09/15/2021 Download
16 Nordrhein-Westfalen 2041 05/21/1947 10/08/2021 Download
17 Rheinland-Pfalz 1562 07/24/1947 09/22/2021 Download
18 Saarland 876 07/23/1959 09/15/2021 Download
19 Sachsen 690 10/27/1990 11/18/2021 Download
20 Sachsen-Anhalt 607 10/28/1990 09/17/2021 Download
21 Schleswig-Holstein 1776 02/26/1946 02/11/2021 Download
22 Thüringen 761 10/25/1990 11/19/2021 Download
23 Liechtenstein 504 03/13/1997 11/06/2021 Download
24 Nationalrat (AT) 4267 10/21/1918 05/17/2021 Download
25 Nationalrat (CH) 368 12/06/1999 12/09/2021 Download
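
The rows of the table above follow a regular pattern (index, parliament name, number of sessions, first and last session date as MM/DD/YYYY, download link), so they can be read programmatically. The following is a minimal sketch, not part of the GerParCor tooling: the names `Row`, `ROW_RE` and `parse_row` are hypothetical, and the sample lines are copied from the table.

```python
import re
from datetime import date, datetime
from typing import NamedTuple


class Row(NamedTuple):
    """One table row; field names are illustrative, not an official schema."""
    index: int
    parliament: str
    sessions: int
    first: date
    last: date


# index, name (may contain spaces), session count, two MM/DD/YYYY dates, link text
ROW_RE = re.compile(
    r"^(\d+)\s+(.+?)\s+(\d+)\s+(\d{2}/\d{2}/\d{4})\s+(\d{2}/\d{2}/\d{4})\s+Download$"
)


def parse_row(line: str) -> Row:
    """Parse one row; raise ValueError if the line does not match the pattern."""
    match = ROW_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised row: {line!r}")
    idx, name, sessions, first, last = match.groups()

    def to_date(text: str) -> date:
        # The table uses month/day/year order.
        return datetime.strptime(text, "%m/%d/%Y").date()

    return Row(int(idx), name, int(sessions), to_date(first), to_date(last))


SAMPLE = [
    "1 Reichstag (NG + Zoll) 1990 02/25/1867 05/24/1895 Download",
    "6 Bundestag 4158 09/07/1949 09/07/2021 Download",
    "23 Liechtenstein 504 03/13/1997 11/06/2021 Download",
]

rows = [parse_row(line) for line in SAMPLE]
total_sessions = sum(row.sessions for row in rows)
```

A row whose dates do not follow the MM/DD/YYYY pattern is rejected with a `ValueError` rather than silently skipped, which makes formatting slips in the table easy to spot.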

Cite

If you use the project or the corpus, please cite it as follows:

  • G. Abrami, M. Bagci, L. Hammerla, and A. Mehler, “German Parliamentary Corpus (GerParCor),” in Proceedings of the Language Resources and Evaluation Conference, Marseille, France, 2022, pp. 1900-1906. [Link] [PDF]

  • G. Abrami, M. Bagci, and A. Mehler, “German Parliamentary Corpus (GerParCor) Reloaded,” in Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 2024, pp. 7707-7716. [Link] [PDF]

BibTeX

@InProceedings{Abrami:Bagci:Hammerla:Mehler:2022,
  author         = {Abrami, Giuseppe and Bagci, Mevl\"{u}t and Hammerla, Leon and Mehler, Alexander},
  title          = {German Parliamentary Corpus (GerParCor)},
  booktitle      = {Proceedings of the Language Resources and Evaluation Conference},
  month          = {June},
  year           = {2022},
  address        = {Marseille, France},
  publisher      = {European Language Resources Association},
  pages          = {1900--1906},
  url            = {https://aclanthology.org/2022.lrec-1.202}
}

@inproceedings{Abrami:et:al:2024,
    address   = {Torino, Italy},
    author    = {Abrami, Giuseppe and Bagci, Mevl{\"u}t and Mehler, Alexander},
    booktitle = {Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)},
    editor    = {Calzolari, Nicoletta and Kan, Min-Yen and Hoste, Veronique and Lenci, Alessandro and Sakti, Sakriani and Xue, Nianwen},
    month     = {may},
    pages     = {7707--7716},
    publisher = {ELRA and ICCL},
    title     = {{G}erman Parliamentary Corpus ({G}er{P}ar{C}or) Reloaded},
    url       = {https://aclanthology.org/2024.lrec-main.681},
    year      = {2024}
}