This is a repository of openly available hypergraph datasets in JSON format, together with documentation that describes each dataset in more detail. The datasets are hosted in the XGI Community on Zenodo, and a table of statistics can be found on Read the Docs. There is also a rudimentary inspection script for checking that datasets are in the proper format. This is loosely inspired by Datasheets for Datasets by Gebru et al.
The xgi-data format for hypergraph datasets is a JSON data structure with the following tags:
hypergraph-data
: This tag accesses the attributes of the entire hypergraph dataset such as the authors or dataset name.
node-data
: This tag accesses the nodes of the hypergraph and their associated properties as a dictionary where the keys are node IDs and the corresponding values are dictionaries. If a node doesn't have any properties, the associated dictionary is empty.
  - name: This tag accesses the node's name if there is one that is different from the ID specified in the hyperedges.
  - Other tags are user-specified based on the particular attributes provided by the dataset.
edge-data
: This tag accesses the hyperedges of the hypergraph and their associated attributes.
  - name: This tag accesses the edge's name if one is provided.
  - timestamp: This is the tag specifying the time associated with the hyperedge if it is given. All times are stored in ISO 8601 standard.
  - Other tags are user-specified based on the particular attributes provided by the dataset.
edge-dict
: This tag accesses the edge IDs and the corresponding nodes which participate in that hyperedge.
All IDs are strings but can be converted to other types if desired.
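To make the layout concrete, here is a minimal sketch of a toy dataset that uses the four tags described above, followed by one way to convert the string IDs to integers and to parse the ISO 8601 timestamps. The dataset contents (node names, the "authors" value, the timestamp) are invented for illustration and do not come from any real xgi-data file.

```python
import json
from datetime import datetime

# A made-up dataset using the four tags of the xgi-data format.
# Every value below is illustrative only.
toy = {
    "hypergraph-data": {"name": "toy-example", "authors": "hypothetical"},
    "node-data": {
        "1": {"name": "alice"},
        "2": {},  # a node without properties has an empty dictionary
        "3": {"name": "carol"},
    },
    "edge-data": {
        "0": {"name": "meeting", "timestamp": "2020-01-01T09:00:00"},
        "1": {},
    },
    "edge-dict": {"0": ["1", "2", "3"], "1": ["2", "3"]},
}

# Round-trip through JSON text, as the data would be stored in a file.
data = json.loads(json.dumps(toy))

# IDs are stored as strings; convert them to integers if that is more convenient.
edges = {
    int(edge_id): [int(node_id) for node_id in members]
    for edge_id, members in data["edge-dict"].items()
}

# Parse ISO 8601 timestamps for the hyperedges that have one.
times = {
    int(edge_id): datetime.fromisoformat(attrs["timestamp"])
    for edge_id, attrs in data["edge-data"].items()
    if "timestamp" in attrs
}

print(edges)
print(times)
```

Converting IDs this way only works when every ID is a numeric string; datasets whose IDs are names should keep them as strings.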
Currently available datasets are:
- coauth-dblp
- coauth-mag-geology
- coauth-mag-history
- congress-bills
- contact-high-school
- contact-primary-school
- dawn
- diseasome
- disgenenet
- email-enron
- email-eu
- eventernote-events
- eventernote-places
- hospital-lyon
- house-bills
- house-committees
- hypertext-conference
- hyperbard
- invs13
- invs15
- kaggle-whats-cooking
- malawi-village
- ndc-classes
- ndc-substances
- plant-pollinator-mpl-015
- plant-pollinator-mpl-016
- plant-pollinator-mpl-049
- plant-pollinator-mpl-062
- science-gallery
- senate-bills
- senate-committees
- sfhh-conference
- tags-ask-ubuntu
- tags-math-sx
- tags-stack-overflow
- threads-ask-ubuntu
- threads-math-sx
- threads-stack-overflow
These datasets can be loaded with xgi using the following lines:
import xgi
H = xgi.load_xgi_data("<dataset_name>")
where <dataset_name> is chosen from the list above.
These datasets have been taken from the following sources:
index.json is a dictionary of the datasets that are currently available on xgi-data and the URLs where they are hosted.
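As a rough illustration of how such an index could be used, the sketch below looks up the hosting URL for a dataset name. The layout assumed here (each dataset name mapping to a record with a "url" field), the dataset name, and the URL are all assumptions made for the example; check the actual index.json before relying on them. The helper dataset_url is hypothetical and not part of xgi.

```python
import json

# Hypothetical excerpt of an index file. The layout (name -> {"url": ...})
# and the URL are assumptions for illustration, not copied from index.json.
index_text = """
{
  "toy-dataset": {"url": "https://example.org/toy-dataset.json"}
}
"""


def dataset_url(index, name):
    """Return the hosting URL for `name` (hypothetical helper)."""
    if name not in index:
        known = ", ".join(sorted(index))
        raise KeyError(f"unknown dataset {name!r}; available: {known}")
    return index[name]["url"]


index = json.loads(index_text)
url = dataset_url(index, "toy-dataset")
print(url)
```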
The code folder contains the scripts used to convert hypergraph datasets into a more standard format, along with the JSON inspection script. This code can be adapted to convert datasets that are not currently part of xgi-data into the xgi-data format.
To check if a file has the xgi-data format, run the following command:
python inspect_json.py filepath.json
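For readers who want to see what a format check might involve, the following is a minimal sketch based only on the format description in this README. It is not the repository's inspect_json.py and may check different things. It assumes that all four tags are required, that hyperedge members are stored as lists, and that every node appearing in a hyperedge is also listed under node-data. The function name check_xgi_format is hypothetical.

```python
REQUIRED_TAGS = ("hypergraph-data", "node-data", "edge-data", "edge-dict")


def check_xgi_format(data):
    """Return a list of problems; an empty list means all checks passed.

    Hypothetical helper: the rules below are assumptions drawn from the
    format description, not the behavior of inspect_json.py.
    """
    if not isinstance(data, dict):
        return ["top-level JSON value is not a dictionary"]

    problems = []
    for tag in REQUIRED_TAGS:
        if tag not in data:
            problems.append(f"missing tag: {tag}")
        elif not isinstance(data[tag], dict):
            problems.append(f"tag is not a dictionary: {tag}")
    if problems:
        return problems

    nodes = data["node-data"]
    for edge_id, members in data["edge-dict"].items():
        if not isinstance(members, list):
            problems.append(f"edge {edge_id}: members are not a list")
            continue
        for node_id in members:
            if not isinstance(node_id, str):
                problems.append(f"edge {edge_id}: node ID {node_id!r} is not a string")
            elif node_id not in nodes:
                problems.append(f"edge {edge_id}: node {node_id} missing from node-data")
    return problems


# A small valid example and a deliberately broken one.
good = {
    "hypergraph-data": {"name": "toy"},
    "node-data": {"1": {}, "2": {}},
    "edge-data": {"0": {}},
    "edge-dict": {"0": ["1", "2"]},
}
bad = {
    "hypergraph-data": {"name": "toy"},
    "node-data": {"1": {}},
    "edge-data": {},
    "edge-dict": {"0": ["1", 2]},
}

good_problems = check_xgi_format(good)
bad_problems = check_xgi_format(bad)
print(good_problems)
print(bad_problems)
```

Returning a list of problems rather than raising on the first one makes it easier to fix a converted dataset in a single pass.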
The XGI-DATA package has been supported by NSF Grant 2121905, "HNDS-I: Using Hypergraphs to Study Spreading Processes in Complex Social Networks".