This repository contains datasets and code for the paper "HINT3: Raising the bar for Intent Detection in the Wild", accepted at the Insights workshop at EMNLP 2020.
The published paper is available here.
Update Feb 2021: While analysing the results, we noticed that a few ground-truth labels are incorrect. We are therefore releasing a corrected version, v2, of the dataset in the dataset/v2 folder. All results in the paper were obtained on the earlier version in dataset/v1, which should be used to reproduce the paper's results exactly.
- Train and Test sets for SOFMattress, Curekart and Powerplay11 are available in the dataset folder for both Full and Subset variations.
- You can also use the prepare_subset_of_data.ipynb notebook to generate Subset variations of the Full datasets. All the entailment assets generated can be downloaded from here.
Our exploratory data analysis (EDA) of the datasets is available in the data_exploration folder.
Predictions from BERT and the 4 NLU platforms on the test sets, as used for the analysis in the paper, are in the preds folder. Feel free to analyse these predictions further.
All the metrics from BERT and the 4 NLU platforms on the test sets are in the results folder for further analysis. The graphs plotted in the paper can be reproduced using the analysis/plot_metrics_graph.ipynb notebook.
The scripts to generate training data and predict intents on the test data for all 4 platforms and the BERT-based classifier are inside the platforms folder, each within the directory named after it.
- The training_data_conversion.ipynb notebook converts the training set into the JSON format that Rasa requires to train its model. The generated JSON file is created inside the data directory.
- To train a model for one particular bot, keep only that bot's JSON file inside the data directory.
- Train the model using this command: rasa train nlu
- Once the model is trained, its tar.gz file is stored inside the models directory, named after the current timestamp.
- To start the NLU server, run the following command: rasa run --enable-api -m models/nlu-<timestamp>.tar.gz where nlu-<timestamp>.tar.gz is the name of the model file created in the previous step.
- To generate a report against a test set file, run the generate_preds.ipynb notebook after specifying the name of the bot. Generated predictions are stored inside the preds folder.
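The Rasa conversion and prediction steps above can be sketched roughly as follows. This is a minimal illustration, not the notebooks' actual code: it assumes the training CSV has `sentence` and `label` columns, uses Rasa's legacy JSON training layout (`rasa_nlu_data` / `common_examples`), and assumes the server started by `rasa run --enable-api` listens on the default `http://localhost:5005`. The function names and the sample intents are hypothetical.

```python
import csv
import io
import json
import urllib.request


def rows_to_rasa_nlu(rows):
    """Build a Rasa NLU JSON document from rows with 'sentence' and 'label' keys."""
    examples = [
        {"text": row["sentence"], "intent": row["label"], "entities": []}
        for row in rows
    ]
    return {"rasa_nlu_data": {"common_examples": examples}}


def build_parse_request(text, server="http://localhost:5005"):
    """Prepare (but do not send) a POST to the Rasa server's /model/parse endpoint."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/model/parse",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical two-row training set standing in for a real train CSV.
sample_csv = (
    "sentence,label\n"
    "where is my order,ORDER_STATUS\n"
    "cancel my order,CANCEL_ORDER\n"
)
rasa_doc = rows_to_rasa_nlu(csv.DictReader(io.StringIO(sample_csv)))
print(json.dumps(rasa_doc, indent=2))
```

With a server running, passing the prepared request to `urllib.request.urlopen` and decoding the JSON response would give the predicted intent and its confidence for one utterance.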
- The training_data_conversion.ipynb file converts the training set into the set of JSON files that Dialogflow requires to train its model. The generated JSON files are stored inside the intents directory.
- Visit https://dialogflow.cloud.google.com and log in to the Dialogflow dashboard using a Gmail account.
- Dialogflow allows bulk upload of the training set by importing a zip file with a predefined folder structure. To create it, make a copy of the agent_template directory and rename it after your bot. Then copy all the JSON files created in the first step into the intents folder of your agent directory. Next, open the agent.json file and edit the displayName property to specify the name of your bot's agent. An agent is analogous to an app or a bot. Once these changes are done, compress the agent directory into a zip file.
- Create a new agent on the Dialogflow dashboard here: https://dialogflow.cloud.google.com/?authuser=1#/newAgent
- Delete Default Fallback Intent from the intents dashboard.
- Edit the agent: https://dialogflow.cloud.google.com/?authuser=1#/editAgent/mt11-agent-ugmx/ -> Export & Import -> Import from zip -> upload the agent zip file. This bulk uploads all intents along with their respective utterances.
- Go to Edit agent -> ML settings. The default threshold value is 0.3. Change it to 0.05 and train the model.
- Copy the cURL request from the API playground. The authentication token and the model's API endpoint can be read from this cURL request.
- The generate_preds.ipynb file will help generate predictions for the bot.
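The agent packaging step described above (copy the template, add the intent files, set displayName, zip) can be scripted. The sketch below is illustrative only and makes several assumptions: `displayName` is a top-level key in agent.json, the zip should have agent.json at its root, and the helper name `package_agent` plus all paths and file contents in the demo are hypothetical.

```python
import json
import shutil
import tempfile
import zipfile
from pathlib import Path


def package_agent(template_dir, intent_dir, bot_name, out_dir):
    """Copy the template, add intent files, set displayName, and zip the result."""
    agent_dir = Path(out_dir) / bot_name
    shutil.copytree(template_dir, agent_dir)

    # Copy every generated intent JSON file into the agent's intents folder.
    target = agent_dir / "intents"
    target.mkdir(exist_ok=True)
    for path in Path(intent_dir).glob("*.json"):
        shutil.copy(path, target / path.name)

    # Assumption: displayName is a top-level key in agent.json.
    agent_file = agent_dir / "agent.json"
    config = json.loads(agent_file.read_text(encoding="utf-8"))
    config["displayName"] = bot_name
    agent_file.write_text(json.dumps(config, indent=2), encoding="utf-8")

    # make_archive appends ".zip"; root_dir places agent.json at the archive root.
    return shutil.make_archive(str(agent_dir), "zip", root_dir=str(agent_dir))


# Demo on throwaway files standing in for agent_template and the generated intents.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "agent_template").mkdir()
    (tmp / "agent_template" / "agent.json").write_text('{"displayName": "template"}')
    (tmp / "intents").mkdir()
    (tmp / "intents" / "greeting.json").write_text('{"name": "greeting"}')

    zip_path = package_agent(tmp / "agent_template", tmp / "intents", "my_bot", tmp / "build")
    with zipfile.ZipFile(zip_path) as archive:
        archive_names = archive.namelist()
        packaged_name = json.loads(archive.read("agent.json"))["displayName"]

print(archive_names, packaged_name)
```

The resulting zip file is what would be uploaded through Export & Import -> Import from zip.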
- The training_data_conversion.ipynb file generates a JSON file from the training set's CSV file.
- Log in to luis.ai, go to https://www.luis.ai/applications and click on New app for conversation -> Import as JSON. Upload the JSON file generated in the first step.
- Once all the intents are uploaded, click on the Train button to train the model. Once the model is trained, click on Publish and then select Production slot.
- Now go to the Manage section of the app and copy the App ID. This App ID is used in the generate_preds.ipynb file to generate the prediction reports.
- Go to the settings page of your account to get the PREDICTION_KEY and PREDICTION_ENDPOINT used in the generate_preds.ipynb file.
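To show how the App ID, PREDICTION_KEY and PREDICTION_ENDPOINT fit together, here is a minimal sketch of a prediction URL for the published production slot. It assumes the LUIS v3 prediction REST route and response shape, which may differ from what generate_preds.ipynb actually uses; the helper names and the placeholder values are hypothetical.

```python
from urllib.parse import urlencode


def build_luis_url(endpoint, app_id, prediction_key, query, slot="production"):
    """Build a LUIS v3 prediction URL for one utterance (assumed route)."""
    base = endpoint.rstrip("/")
    params = urlencode({"subscription-key": prediction_key, "query": query})
    return f"{base}/luis/prediction/v3.0/apps/{app_id}/slots/{slot}/predict?{params}"


def top_intent(response):
    """Extract the top-scoring intent from a parsed v3 prediction response."""
    return response["prediction"]["topIntent"]


# Placeholder values; substitute the App ID, key and endpoint from your account.
url = build_luis_url(
    endpoint="https://example.cognitiveservices.azure.com/",
    app_id="APP_ID",
    prediction_key="PREDICTION_KEY",
    query="where is my order",
)
print(url)
```

Sending a GET request to this URL and passing the decoded JSON to `top_intent` would give the predicted intent for the utterance.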
- Access requests for signup on Haptik are processed via the contact form at https://haptik.ai/contact-us/
- Once you get access, you'll be able to create bots and run predictions using the scripts provided in platforms/haptik
- Results on BERT can be reproduced using the scripts in the platforms/bert folder.
- The folder also contains the config for each of the models trained on the Full and Subset variations of the datasets.
If you use this in your research, please consider citing:
@inproceedings{arora-etal-2020-hint3,
title = "{HINT}3: Raising the bar for Intent Detection in the Wild",
author = "Arora, Gaurav and
Jain, Chirag and
Chaturvedi, Manas and
Modi, Krupal",
booktitle = "Proceedings of the First Workshop on Insights from Negative Results in NLP",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.insights-1.16",
doi = "10.18653/v1/2020.insights-1.16",
pages = "100--105",
abstract = "Intent Detection systems in the real world are exposed to complexities of imbalanced datasets containing varying perception of intent, unintended correlations and domain-specific aberrations. To facilitate benchmarking which can reflect near real-world scenarios, we introduce 3 new datasets created from live chatbots in diverse domains. Unlike most existing datasets that are crowdsourced, our datasets contain real user queries received by the chatbots and facilitates penalising unwanted correlations grasped during the training process. We evaluate 4 NLU platforms and a BERT based classifier and find that performance saturates at inadequate levels on test sets because all systems latch on to unintended patterns in training data.",
}