FrameNet Querying Tutorial

@author: Sean Trott ([email protected])

The framenet module is intended to give linguists and other researchers easy access to the data stored in FrameNet. The version I've tested with is v1.6, but the code should be stable across other versions, assuming the XML markup doesn't change.

System Requirements
Getting Started
- [FrameNet Data] (#framenet-data)
- [Installation] (#installation)
- [Initialization and Building] (#initialization-and-building)
- [Initial Queries] (#initial-queries)
  - [Individual Frames] (#querying-individual-frames)
  - [FrameNet Utilities] (#framenet-class-tools)
  - [Lexical Units] (#lexical-units)
  - [Annotations] (#annotations)
- [Summary of Introduction] (#summary-of-introduction)
[More Advanced Queries] (#more-advanced-queries)
- [Using List Comprehensions] (#using-list-comprehensions)
  - [Individual Valences] (#individual-valences)
  - [FE Group Realizations] (#fe-group-realizations)
[Annotations and Valences for all Frames] (#annotations-and-valences-for-all-frames)

System Requirements

You should have:

A version of the FrameNet data
A working installation of Python (2.7+ should work, though I've only done extensive testing with 3)
An interactive environment for running Python code (Terminal, Command Prompt, Python notebook, etc.)

Getting Started

FrameNet Data

The FrameNet data is distributed as a data dump, which you can obtain by sending a request. Once you've unzipped the file, you should have a folder that looks something like:

fndata-1.6

(Alternatively, if you're following along in the workshop, you might have obtained the data from a flash drive.)

Installation

First, you'll need to download the module. Using Terminal, Git Bash, or whichever Git-adorned tool you prefer, navigate to the directory you plan to install the module in:

$ cd projects

Then, use git clone to copy the remote repository onto your machine:

$ git clone https://github.com/icsi-berkeley/framenet.git

If all goes well, you will now have a copy of the framenet module on your computer. The code only relies on the xml library, which is included in most standard Python distributions, so you should be able to run the scripts out of the box.

Initialization and Building

The module comes with multiple build scripts. The python command to run the main.py file is not complicated, but the shell/batch scripts make it somewhat easier.

Note: if you want to run the build.sh or build.bat scripts as is, you'll need to move your FrameNet data dump into the framenet folder you just created with the git clone command. You can do this manually by dragging the fndata-1.6 folder into the framenet folder, or by using the following bash commands:

$ cd projects/framenet
$ mv {path}fndata-1.6 .

(You may need to preface the mv command with the sudo keyword, and then fill in the password prompt.)

Alternatively, you can alter the path specified in the build files:

python -i main fndata-1.6/

Change the last argument to the full path of wherever the data is stored on your computer, e.g.:

python -i main /Users/sean/projects/fndata-1.6/

Once this is done, run the build script:

$ sh build.sh

Or, on Windows:

$ START build.bat

If you prefer simply running the contents of the script, you can do this as well:

$ python -i main fndata-1.6/

Initial Queries

Note: If the purpose of a class or function is unclear, recall that you can use the help function, and the shell will print out documentation I've written for that class, e.g.:

>>> help(fn)

The build script will open up an interactive Python shell. From here, you have access to two primary classes:

fn: A "FrameNet" (src.framenet) object.
fnb: A "FramenetBuilder" (src.builder) object.

Querying Individual Frames

The FrameNet object can be used to retrieve information about frames. The most basic command is retrieving a Frame from an input string:

>>> frame = fn.get_frame("Motion")

This new frame object can now be queried to learn basic information about the frame:

>>> frame.ID
7
>>> frame.name
Motion

Other useful information includes: children, definition, elements, fe_relations, lexicalUnits, name, parents, and relations. The "parents" of a frame are the frames it inherits from. In this case "Motion" only inherits from the "Event" frame:

>>> frame.parents
['Event']

Within a given frame, you can also retrieve a specific FrameElement object (src.frames) by passing in its name:

>>> fe = frame.get_element("Goal")

The FrameElement object has some useful information of its own, including which FEs it requires and which it excludes. In this case, the 'Goal' FE excludes the 'Area' FE:

>>> fe.excludes
['Area']
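
The excludes information can be used to check whether a set of FEs is mutually compatible. The following is a minimal, self-contained sketch that does not use the module: the `EXCLUDES` map and the `conflicts` helper are hypothetical, and the entries are toy data modelled on the 'Goal' / 'Area' example above rather than the full FrameNet relations.

```python
# Toy excludes map (illustrative only): FE name -> FE names it cannot co-occur with.
EXCLUDES = {
    "Goal": ["Area"],
    "Area": ["Goal"],
    "Theme": [],
}

def conflicts(fe_names):
    """Return sorted pairs of FEs in `fe_names` that exclude one another."""
    present = set(fe_names)
    pairs = set()
    for fe in present:
        for other in EXCLUDES.get(fe, []):
            if other in present:
                # Store each pair in a canonical order so it is reported once.
                pairs.add(tuple(sorted((fe, other))))
    return sorted(pairs)

print(conflicts(["Theme", "Goal", "Area"]))
print(conflicts(["Theme", "Goal"]))
```

With real data, a map like this could presumably be filled from the `excludes` field of each FrameElement in a frame.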

You can also view the lexical units associated with a frame:

>>> frame.lexicalUnits
[move.v, go.v, drift.v, glide.v, ...]
>>> lu = frame.get_lu("move.v")
>>> lu.status
Finished_initial

Note: By default, the build script does not read in any valence data, and thus the lexical units for a frame only contain shallow information. This data can be loaded on demand, as covered in the Lexical Units section.

[See the documentation in src.frames for more ideas on what information you can gather with an individual Frame object.]

FrameNet Class Tools

The FrameNet object (src.framenet) has some useful tools built in as well. The help documentation contains more info, but below is a list of basic sample queries. Note that these queries could be combined or elaborated on to perform more complex queries.

>>> fn.get_root(frame).name
Event
>>> fn.subtype_s("Motion", "Event")
True
>>> frame2, frame3 = fn.get_frame("Self_motion"), fn.get_frame("Execute_plan")
>>> fn.common_supertype(frame2, frame3)
Intentionally_act
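
To make the idea behind these hierarchy queries concrete, here is a minimal, self-contained sketch of subtype and common-supertype lookups over a parent map. It does not use the module and is not its actual implementation: `PARENTS`, `ancestors`, `is_subtype`, and `common_supertype` are hypothetical names, and the small hierarchy is toy data chosen to mirror the examples above.

```python
from collections import deque

# Toy parent map (illustrative only): frame name -> names of frames it inherits from.
PARENTS = {
    "Event": [],
    "Motion": ["Event"],
    "Intentionally_act": ["Event"],
    "Self_motion": ["Motion", "Intentionally_act"],
    "Execute_plan": ["Intentionally_act"],
}

def ancestors(name):
    """Return the ancestors of `name`, nearest first (breadth-first order)."""
    seen, order = set(), []
    queue = deque(PARENTS[name])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        order.append(current)
        queue.extend(PARENTS[current])
    return order

def is_subtype(child, parent):
    """True if `child` is `parent` or inherits from it, directly or indirectly."""
    return child == parent or parent in ancestors(child)

def common_supertype(a, b):
    """Return `a` or its nearest ancestor that `b` also is or inherits from."""
    b_line = set(ancestors(b)) | {b}
    for candidate in [a] + ancestors(a):
        if candidate in b_line:
            return candidate
    return None

print(is_subtype("Motion", "Event"))
print(common_supertype("Self_motion", "Execute_plan"))
```

The breadth-first order means the closest shared ancestor is found before more distant ones such as a root frame.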

You can also retrieve a list of frames from a given lexical unit string, such as:

>>> frames = fn.get_frames_from_lu("stream.v")
>>> len(frames)
2
>>> frames[0].name
Fluidic_motion

Lexical Units

The initial information gathered about lexical units for a frame is not particularly rich. This is because the LU information is stored in separate XML files, and reading it all in at initialization takes too long. Instead, the module lets you build richer LexicalUnit objects on demand using the FrameNetBuilder:

>>> fnb.build_lus_for_frame("Motion", fn)

As the code snippet demonstrates, the function takes as arguments the name of the desired frame, as well as the FrameNet object ("fn"). Now, all of the lexical units for the "Motion" frame (which we've already retrieved and set to the frame variable) will contain useful valence information.

Additionally, you can access the entire set of valences in a single list directly from the frame:

>>> len(frame.individual_valences)
421

You can also view the individual "Frame Element Realizations" (see below) aggregated for a frame, as well as the "Group Frame Element Realizations" (see below):

>>> frame.fe_realizations
>>> frame.group_realizations

If we want to look at the valence patterns for a given lu, we can do so:

>>> lu = frame.lexicalUnits[0]
>>> lu.name
move.v
>>> lu.individual_valences
[Frame: Motion, GF: Dep, PT: PP[around], FE: Area, total: 2,
Frame: Motion, GF: Dep, PT: PP[in], FE: Carrier, total: 1,
...]

Valences

Each LU has a field called individual_valences which is bound to a list of Valence objects (src.lexical_units). A valence object contains the following information:

Associated frame
GF (Grammatical Function)
PT (Phrase Type)
FE (Frame Element)
Total (# Annotations)
Annotations (sentences containing this valence unit)

>>> v1 = lu.individual_valences[0]
>>> v1.pt
PP[around]
>>> v1.annotations[1]
'It's comfortable and not too restrictive and the box foot gives ample room for your feet to move around freely.'

Valence objects also have a "lexeme" field:

>>> v1.lexeme
move.v

Frame Element Realizations

Each LU also has a list of FE realizations associated with it. A FERealization object (src.lexical_units) contains the following information:

Frame
Total
Lexeme (lu)
Valences
Frame Element
Annotations

The valences field maps onto a list of Valence objects (see above). This is simply an alternative way of accessing and grouping similar underlying data. One of my goals with the module was to capture all the information available in the XML files, and there are many ways of storing and grouping this information.

>>> lu.fe_realizations
[Total: 2, lexeme: move.v, fe: Area,
Total: 1, lexeme: move.v, fe: Carrier,
...]

You can index into the valences field to view the valences associated with that FE realization. Sometimes, there is only one valence associated with a given FE realization; sometimes, there are many.

>>> lu.fe_realizations[0].valences
[Frame: Motion, GF: Dep, PT: PP[around], FE: Area, total: 2]

This is the same valence we viewed above, the first element stored in the individual_valences field. Again, there are multiple ways of accessing this information.
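
The relationship between individual valences and FE realizations can be pictured as a grouping step: valences that share a frame element are collected together. The sketch below illustrates that idea with a hypothetical `Valence` stand-in and toy values. The attribute names `gf`, `total`, and `frame` are assumptions (only `pt`, `fe`, `lexeme`, and `annotations` appear in the examples above), and this is not the module's own grouping code.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the module's Valence objects; only the fields used here.
Valence = namedtuple("Valence", ["frame", "gf", "pt", "fe", "total"])

# Toy values; the third entry is invented so that one FE has two valences.
valences = [
    Valence("Motion", "Dep", "PP[around]", "Area", 2),
    Valence("Motion", "Dep", "PP[in]", "Carrier", 1),
    Valence("Motion", "Dep", "PP[in]", "Area", 3),
]

# Group the valences by frame element.
by_fe = defaultdict(list)
for v in valences:
    by_fe[v.fe].append(v)

# Sum the annotation totals for each frame element.
totals = {fe: sum(v.total for v in group) for fe, group in by_fe.items()}
print(totals)
```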

Frame Element Group Realizations

This is a representation of multiple frame elements occurring together in a particular piece of data (e.g., "Theme" and "Path"). This object (src.lexical_units) contains:

Frame
Total
Valence Patterns
LU

Just as a Frame Element Realization has 1-N associated valences, an FE-Group-Realization can have 1-N associated valence patterns. A valence pattern is a set of valences occurring together. Thus, a group realization has a total, which may or may not be the same as the valence pattern's total.

>>> group_fe = lu.valences[0]
>>> group_fe
Total: 1
Valence Patterns: [Total: 1
Valences: [Frame: Motion, GF: Dep, PT: PP[around], FE: Area
Frame: Motion, GF: Dep, PT: AVP, FE: Manner
Frame: Motion, GF: Ext, PT: NP, FE: Theme]
LU: move.v]

This is a significant amount of information, but if you look at each part separately, it makes sense. It's saying that there is 1 FE Group realization of the elements:

Area
Manner
Theme

Furthermore, there is 1 valence pattern associated with this group realization, and it contains the following valences:

A PP[around] for the Area FE
An AVP for the Manner FE
An NP for the Theme FE

The valence pattern can be further queried to access the individual valences:

>>> group_fe.valencePatterns[0].valenceUnits[0]
Frame: Motion, GF: Dep, PT: PP[around], FE: Area
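
The nesting (group realization, then valence patterns, then valence units) can be walked with two loops. The following self-contained sketch uses hypothetical namedtuple stand-ins that borrow the `valencePatterns` and `valenceUnits` field names from the query above. The other attribute names (`total`, `gf`, `pt`, `fe`) and the `describe` helper are assumptions for illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins for the module's objects; only the fields used here.
ValenceUnit = namedtuple("ValenceUnit", ["gf", "pt", "fe"])
ValencePattern = namedtuple("ValencePattern", ["total", "valenceUnits"])
GroupRealization = namedtuple("GroupRealization", ["total", "valencePatterns"])

group_fe = GroupRealization(
    total=1,
    valencePatterns=[
        ValencePattern(
            total=1,
            valenceUnits=[
                ValenceUnit("Dep", "PP[around]", "Area"),
                ValenceUnit("Dep", "AVP", "Manner"),
                ValenceUnit("Ext", "NP", "Theme"),
            ],
        )
    ],
)

def describe(group):
    """Return one summary line per valence pattern: its total and FE=PT pairs."""
    lines = []
    for pattern in group.valencePatterns:
        parts = ["{}={}".format(unit.fe, unit.pt) for unit in pattern.valenceUnits]
        lines.append("{} x {}".format(pattern.total, ", ".join(parts)))
    return lines

for line in describe(group_fe):
    print(line)
```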

Annotations

The Annotation object now also contains mappings from each excerpt of text to the corresponding valence pattern. Annotations can be accessed from the Frame object:

>>> frame = fn.get_frame("Ingestion")
>>> fnb.build_lus_for_frame("Ingestion", fn)
>>> frame.annotations[0]
In Madagascar, geckos lap nectar from palm flowers.

To see the mappings from text to valence pattern, see below:

>>> frame.annotations[0].text_to_valence
{'from palm flowers': [FE: Source, PT: PP[from], GF: Dep], 'nectar': [FE: Ingestibles, PT: NP, GF: Obj], 'geckos': [FE: Ingestor, PT: NP, GF: Ext]}
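
Because the mapping is a dictionary keyed by text span, it can be reordered or inverted with ordinary Python. The sketch below lists the FEs in the order their text appears in the sentence. It is self-contained and uses a toy stand-in: the values here are plain dictionaries with assumed keys `fe`, `pt`, and `gf`, whereas the module's real values are its own objects.

```python
# Toy stand-in for annotation.text_to_valence: text span -> FE/PT/GF labels.
text_to_valence = {
    "from palm flowers": {"fe": "Source", "pt": "PP[from]", "gf": "Dep"},
    "nectar": {"fe": "Ingestibles", "pt": "NP", "gf": "Obj"},
    "geckos": {"fe": "Ingestor", "pt": "NP", "gf": "Ext"},
}

sentence = "In Madagascar, geckos lap nectar from palm flowers."

# Order the spans by where they first occur in the sentence,
# then pair each frame element with its text.
spans = sorted(text_to_valence, key=sentence.find)
fe_to_text = [(text_to_valence[span]["fe"], span) for span in spans]
print(fe_to_text)
```

Note that `str.find` returns the first match only, so this simple ordering assumes each span occurs once in the sentence.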

Summary of Introduction

While the documentation and code contain more in-depth descriptions of the functions and aspects of these objects and fields, here is a "cheat sheet" for relevant fields of Frames and Lexical Units, which will probably be the primary "objects" of interest.

A Frame object contains the following fields:

Name: self.name
List of Frame Elements: self.elements
List of lexical units: self.lexicalUnits
List of frame relations: self.relations
Children: self.children
Parents: self.parents
List of Frame Element relations: self.fe_relations
Frame definition: self.definition
ID: self.ID
List of individual Valences associated with frame: self.individual_valences
List of FE group realizations: self.group_realizations
List of FE realizations: self.fe_realizations
List of annotations: self.annotations

Note: the last four fields are empty lists, by default, until the lexical units for the frame have been built, using:

fnb.build_lus_for_frame("{Frame_name}", fn)

A Lexical Unit object contains the following fields:

POS: self.pos
Name: self.name
Frame name: self.frame
ID: self.ID
Definition: self.definition
Group FE Realizations: self.valences
Semtype (if applicable): self.semtype
FE Realizations: self.fe_realizations
Individual valences: self.individual_valences
Lexeme: self.lexeme
Annotations: self.annotations

More Advanced Queries

While the built-in methods and classes provide useful tools for querying FrameNet data, combining them with some relatively simple Python tools will greatly enhance their effectiveness. I have included a number of built-in scripts (scripts.py), many of which can be modified or used as a base for further functions and manipulations of the data.

Using List Comprehensions

A particularly useful tool in Python is the list comprehension. This is a way to define and construct a list or set of data-points in a single line of code. The link provided gives excellent examples of everyday applications of the list comprehension.

A list comprehension could be used for something as simple as collecting all the frames that inherit directly from "Cause_motion" (note that this could also be done by accessing the children field of the Cause_motion frame):

>>> inherit = [frame for frame in fn.frames if "Cause_motion" in frame.parents]

Note that if all you want are the frame names, as opposed to the objects, you can store that data-point instead:

>>> inherit = [frame.name for frame in fn.frames if "Cause_motion" in frame.parents]
>>> inherit
[Cause_fluidic_motion, Passing, Shoot_projectiles]

We can compare this against the children of Cause_motion to make sure we're correct:

>>> fn.get_frame("Cause_motion").children
[Cause_fluidic_motion, Passing, Shoot_projectiles]

List comprehensions can also be embedded within each other, just as "for-loops" can be embedded normally. Perhaps we are interested in all the frame names of frames that contain the "Interlocutors" Frame Element:

>>> list_frames = [frame.name for frame in fn.frames if "Interlocutors" in [element.name for element in frame.elements]]
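
An equivalent way to write a nested membership test like this is with `any()` and a generator expression, which stops at the first match instead of building the inner list for every frame. The sketch below is self-contained: `Frame` and `Element` are hypothetical stand-ins exposing only the `name` and `elements` fields, and the frame contents are toy data.

```python
from collections import namedtuple

# Hypothetical stand-ins for the module's Frame and FrameElement objects.
Element = namedtuple("Element", ["name"])
Frame = namedtuple("Frame", ["name", "elements"])

# Toy data (illustrative only).
frames = [
    Frame("Chatting", [Element("Interlocutors"), Element("Topic")]),
    Frame("Motion", [Element("Theme"), Element("Path")]),
    Frame("Toy_discussion", [Element("Interlocutors")]),
]

list_frames = [
    frame.name
    for frame in frames
    if any(element.name == "Interlocutors" for element in frame.elements)
]
print(list_frames)
```

In the interactive shell, the same pattern should apply with `fn.frames` in place of the toy `frames` list.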

Individual Valences

These list comprehensions are especially useful for querying valence patterns in a given frame (or set of frames). For example, if you're interested in recovering all of the valences with a "DNI" (definite null instantiation) phrase type in a frame, you can use the following list comprehension:

>>> frame = fn.get_frame("Motion")
>>> fnb.build_lus_for_frame("Motion", fn)
>>> dni = [valence for valence in frame.individual_valences if valence.pt=="DNI"]

You can then inspect the list of DNI valences:

>>> dni[0]
Frame: Motion, GF: , PT: DNI, FE: Goal, total: 1
>>> dni[0].annotations
[To counter this, the populations moved from their homes on the coast and built settlements inland, out of sight of the raiding parties.]
>>> dni[4].lexeme
snake.v
>>> dni[4].annotations
[The palm groves were full of brick kilns and trails of black smoke snaked between the trees.]

If you're interested in which Frame Elements are DNI, you can filter out just that information as well:

>>> dni_fe = [v.fe for v in dni]
>>> dni_fe
[Goal, Goal, Source, Area, Area]

You can use the Counter class from Python's collections module to count the occurrences of each FE:

>>> from collections import Counter
>>> Counter(dni_fe)
Counter(Goal: 2, Area: 2, Source: 1)
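
A Counter can also rank the results and turn them into proportions. The sketch below is self-contained and assumes the FEs are represented as plain strings, copied from the example list above. With the module, the items would be whatever `v.fe` returns.

```python
from collections import Counter

# FE names copied from the example above, as plain strings (an assumption).
dni_fe = ["Goal", "Goal", "Source", "Area", "Area"]

counts = Counter(dni_fe)

# most_common() yields (item, count) pairs, highest count first.
for fe, n in counts.most_common():
    share = float(n) / len(dni_fe)
    print("{}: {} ({:.0%})".format(fe, n, share))
```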

FE Group Realizations

If you're interested in FE Group Realizations - data on which frame elements show up together in which constructional patterns - you can also use list comprehensions to extract relevant information. For example, if you're interested in FE Group Realizations containing both the "Theme" and "Path" elements in the "Motion" frame, you can query the group_realizations field:

>>> frame = fn.get_frame("Motion")
>>> fnb.build_lus_for_frame("Motion", fn)
>>> theme_path = [r for r in frame.group_realizations if set(['Theme', 'Path']).issubset(r.elements)]
>>> theme_path[0].elements
[Carrier, Manner, Path, Theme]
>>> theme_path[0].annotations
In the garden the leaves shone in the sunlight, and the flowers moved gently in the summer wind.

(Path, in the annotation above, is classified as an "indefinite null instantiation".)
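
Once the matching group realizations are collected, a natural follow-up is to ask which other FEs tend to accompany "Theme" and "Path". The sketch below shows one way to tally that. It is self-contained and uses a hypothetical `GroupRealization` stand-in with toy data: `elements` matches the field used above, while the `total` attribute name and all of the counts are assumptions.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for the module's group realization objects.
GroupRealization = namedtuple("GroupRealization", ["elements", "total"])

# Toy data (illustrative only).
group_realizations = [
    GroupRealization(["Carrier", "Manner", "Path", "Theme"], 1),
    GroupRealization(["Path", "Theme"], 4),
    GroupRealization(["Goal", "Theme"], 2),
    GroupRealization(["Manner", "Path", "Theme"], 3),
]

required = {"Theme", "Path"}
theme_path = [r for r in group_realizations if required.issubset(r.elements)]

# Tally the additional FEs, weighted by how often each realization is attested.
co_occurring = Counter()
for r in theme_path:
    for fe in set(r.elements) - required:
        co_occurring[fe] += r.total
print(co_occurring.most_common())
```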

Annotations and Valences for all Frames

If you'd like to view and query annotation and valence data across all FrameNet frames, you can simply use the build_lus_for_frame function on each frame:

for frame in fn.frames:
    fnb.build_lus_for_frame(frame.name, fn)

This takes some more time (~5-10 minutes), but afterwards, you'll have access to all of the frame exemplar annotation data, which you can use to make interesting queries.
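
Since the full build is long-running, it can help to time it and to keep going if a single frame fails. The following is a sketch of such a wrapper, not part of the module. `build_all` is a hypothetical helper, whether `build_lus_for_frame` raises on unreadable data is an assumption, and the stub classes exist only so the example runs without the module or the data.

```python
import time
from collections import namedtuple

def build_all(fn, fnb):
    """Build LUs for every frame; return (seconds elapsed, list of failures)."""
    failed = []
    start = time.time()
    for frame in fn.frames:
        try:
            fnb.build_lus_for_frame(frame.name, fn)
        except Exception as error:
            # Record the problem and continue with the remaining frames.
            failed.append((frame.name, str(error)))
    return time.time() - start, failed

# Stand-in objects (hypothetical) so the sketch runs on its own.
StubFrame = namedtuple("StubFrame", ["name"])

class StubFrameNet(object):
    frames = [StubFrame("Motion"), StubFrame("Ingestion"), StubFrame("Broken")]

class StubBuilder(object):
    def build_lus_for_frame(self, name, fn):
        if name == "Broken":
            raise IOError("missing LU file")

elapsed, failed = build_all(StubFrameNet(), StubBuilder())
print(failed)
```

In the interactive shell, the call would presumably be `elapsed, failed = build_all(fn, fnb)`, using the objects created by the build script.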
