Language Models as Knowledge Bases?, Petroni et al., 2019

Paper, Tags: #nlp, #linguistics

While learning linguistic knowledge, language models may also be storing relational knowledge present in the training data. We present an in-depth analysis of the relational knowledge already present in a wide range of pretrained LMs. We find that:

  1. Without fine-tuning, BERT contains relational knowledge competitive with traditional NLP methods that have some access to oracle knowledge
  2. BERT also does remarkably well on open-domain question answering against a supervised baseline
  3. Certain types of factual knowledge are learned much more readily than others by standard LM pretraining approaches

Code can be found on GitHub.

Knowledge bases are solutions for accessing annotated gold-standard relational data by enabling queries such as (Dante, born-in, X). But in practice we need to extract relational data from text or other modalities to populate these knowledge bases. This involves entity extraction, coreference resolution, entity linking and relation extraction, which need supervised data and fixed schemas. Errors can easily propagate and accumulate throughout the pipeline.

We could instead query neural LMs for relational data by asking them to fill in masked tokens in sequences such as "Dante was born in [MASK]". LMs require no schema engineering or human annotations, and they support an open set of queries.
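A minimal sketch of such a cloze query, using the Hugging Face transformers fill-mask pipeline rather than the paper's original fairseq/BERT setup (the model name and example sentence are illustrative):

```python
from transformers import pipeline

# BERT fills in the [MASK] token directly; no fine-tuning or schema is needed.
fill_mask = pipeline("fill-mask", model="bert-base-cased")

for prediction in fill_mask("Dante was born in [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```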

These qualities make LMs promising candidates for representing relational knowledge.

LAMA

The LAnguage Model Analysis (LAMA) probe consists of a set of knowledge sources, each comprising a set of facts. A pretrained LM knows a fact (subject, relation, object) such as (Dante, born-in, Florence) if it can successfully predict the masked object in cloze sentences such as "Dante was born in ___" expressing the fact.

We test for a variety of types of knowledge:

  • relations between entities stored in Wikidata
  • common sense relations between concepts from ConceptNet
  • knowledge necessary to answer natural language questions in SQuAD

Our investigation reveals that:

  1. The largest BERT model captures accurate relational knowledge comparable to that of a knowledge base
  2. Factual knowledge can be recovered surprisingly well from pretrained LMs, but performance is very poor for N-to-M relations
  3. BERT outperforms other LMs in recovering factual and commonsense knowledge and is more robust to the phrasing of a query
  4. BERT achieves good results for open-domain QA

Background: LMs

In our study we use pretrained LMs, including models from the fairseq library and Transformer-XL. Transformer-XL can take into account a longer history by caching previous outputs and by using relative instead of absolute positional encodings.

Related work

Existing work focuses on understanding linguistic and semantic properties of word representations, or on how well pretrained sentence representations and LMs transfer linguistic knowledge to downstream tasks.

Our investigation aims to determine to what extent pretrained LMs store factual and commonsense knowledge, by comparing them with symbolic knowledge bases populated by traditional relation extraction approaches.

The LAMA Probe

The probe tests the factual and commonsense knowledge in LMs. It provides a corpus of facts, where each fact is either a subject-relation-object triple or a question-answer pair. Each fact is converted into a cloze statement used to query the LM for a missing token.
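A hedged sketch of this conversion step; the relation templates below are illustrative placeholders, not the hand-written templates used in LAMA:

```python
# Illustrative relation templates for turning triples into cloze statements.
TEMPLATES = {
    "born-in": "[X] was born in [Y].",
    "place-of-death": "[X] died in [Y].",
}

def to_cloze(subject: str, relation: str, mask_token: str = "[MASK]") -> str:
    """Fill the subject slot [X] and mask the object slot [Y]."""
    return TEMPLATES[relation].replace("[X]", subject).replace("[Y]", mask_token)

print(to_cloze("Dante", "born-in"))  # Dante was born in [MASK].
```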

Our assumption is that models which rank ground-truth tokens high for these cloze statements have more factual knowledge. We use the Google-RE corpus of ~60k facts manually extracted from Wikipedia. It covers five relations, but we consider only three: 'date of birth', 'place of birth' and 'place of death'.
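A minimal sketch of this ranking assumption as a precision@k check, again using the Hugging Face fill-mask pipeline; the facts listed are illustrative examples, not LAMA data:

```python
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-cased")

# Illustrative (cloze statement, ground-truth object) pairs.
facts = [
    ("Dante was born in [MASK].", "Florence"),
    ("The capital of France is [MASK].", "Paris"),
]

hits = 0
for cloze, gold in facts:
    # Top-10 predictions for the masked token.
    predictions = [p["token_str"].strip() for p in fill_mask(cloze, top_k=10)]
    hits += int(gold in predictions)

print(f"P@10 = {hits / len(facts):.2f}")
```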

The T-REx knowledge source is derived from the T-REx dataset and is much larger than Google-RE, with a broader set of relations.

We also use ConceptNet, a multilingual knowledge base.

We also use SQuAD, a popular question answering dataset.

Results

BERT outperforms all other models by a substantial margin, and BERT-large is better than BERT-base. The performance of BERT in retrieving factual knowledge is close to the performance obtained by automatically building a knowledge base with an off-the-shelf relation extraction system and oracle-based entity linking.

When BERT has a high confidence in its prediction, it is often correct.

The results on the ConceptNet corpus are in line with Google-RE and T-REx. The BERT-large model consistently achieves the best performance, and it is able to retrieve commonsense knowledge at a similar level to factual knowledge.

Conclusion

BERT-large is able to recall factual and commonsense knowledge better than its competitors and at a level remarkably competitive with non-neural and supervised alternatives.

It is non-trivial to extract a knowledge base from text that performs on par with directly using pretrained BERT-large. We suspected BERT might have an advantage due to the larger amount of data it has processed, so we added Wikitext-103 as additional data for the relation extraction system and observed no significant change in performance.