Zabaan is a prototype translation platform that combines human-in-the-loop (HITL) review with Neural Machine Translation (NMT) to suggest domain-specific translations of fire, electrical, and life safety texts. Zabaan was trained extensively on NFPA datasets such as Codes & Standards, Research, and Public Education & Outreach material, and currently focuses on bidirectional English (EN) and Spanish (ES) translation. It was developed by NFPA's Data Analytics team in collaboration with NFPA's International Operations team, with support from WPI's GQP Program. The platform ships with a lightweight UI front end that provides instant translations and lets users correct suggestions from the NMT model.
- OpenNMT-tf - A general purpose sequence learning toolkit using TensorFlow
- Tensorflow Serving - A flexible, high-performance serving system for machine learning models
- Tornado Web Framework - A Python web framework and asynchronous networking library
- MongoDB - A document-based, distributed database built for modern application developers
- Clone the Repository
git clone https://github.com/NFPA/Zabaan.git
cd Zabaan/Serving
- Activate the Python 3.6 environment (assuming you're using an EC2 instance with the Deep Learning AMI)
source activate tensorflow_p36
- Install packages. This installs the Python packages for tokenization, the TF Serving API 1.x, and PyMongo (a quick tokenizer check follows the install command).
pip install -r requirements.txt
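To sanity-check the tokenization dependencies, here is a minimal round-trip sketch; it assumes the tokenization package installed above is pyonmttok (the OpenNMT Tokenizer bindings):

```python
# Tokenization round-trip check (assumes pyonmttok from requirements.txt;
# "aggressive" mode with joiner annotation is a common OpenNMT setup, not
# necessarily the exact configuration Zabaan's models were trained with).
import pyonmttok

tokenizer = pyonmttok.Tokenizer("aggressive", joiner_annotate=True)
tokens, _ = tokenizer.tokenize("Fire extinguishers must be inspected monthly.")
print(tokens)
print(tokenizer.detokenize(tokens))  # should reproduce the input sentence
```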
- Pull the latest MongoDB Docker image and map it to a directory. Change the volume path and port number as needed.
mkdir mongodb
docker run --name gqp-mongo -d -v /home/ubuntu/Zabaan/Serving/mongodb:/data -p 27017:27017 mongo:latest
Once the MongoDB container exists, you only need to start it by container name/ID on subsequent runs; there is no need to pull the image again. A quick connectivity check is sketched after the start command below.
docker start <container_name/id>
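Before moving on, you can confirm the container is reachable from Python with PyMongo; the database and collection names below are placeholders, not Zabaan's actual schema:

```python
# Quick MongoDB connectivity check (sketch; "zabaan" and "translations"
# are hypothetical names used only for illustration).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
print(client.server_info()["version"])  # raises if the server is unreachable

db = client["zabaan"]
db["translations"].insert_one({"src": "hello", "tgt": "hola"})
print(db["translations"].count_documents({}))
```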
- Copy all the serving models into the models folder. You can download the EuroParl model and the NFPA model.
cp /home/ubuntu/demo/models/* /home/ubuntu/Zabaan/Serving/nfpa_models/
All models were trained with OpenNMT-tf. For more details on the OpenNMT-tf SavedModel format and on creating and serving OpenNMT models, see OpenNMT Serving. For more details on serving TensorFlow models in general, see TensorFlow Serving. The expected on-disk layout is sketched below.
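TensorFlow Serving expects each model directory to contain numbered version subdirectories in the standard SavedModel layout. The tree below is illustrative (the version number is taken from the sample log further down):

```
nfpa_models/
└── euro_attention/
    └── 1564872567/
        ├── assets/           # vocabulary files exported by OpenNMT-tf
        ├── saved_model.pb
        └── variables/
```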
- Check that models.config (passed to the serving command below) contains the configuration for the model you want to serve. Note that base_path is the model's path as seen inside the serving container, and TensorFlow Serving expects the entries wrapped in a model_config_list:
model_config_list {
  config {
    name: "name_of_the_model"
    base_path: "/models/nfpa_models/name_of_the_model"
    model_platform: "tensorflow"
  }
}
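To serve several models at once (e.g. both the EuroParl and NFPA models), list one config entry per model; the second model name below is a placeholder:

```
model_config_list {
  config {
    name: "euro_attention"
    base_path: "/models/nfpa_models/euro_attention"
    model_platform: "tensorflow"
  }
  config {
    name: "nfpa_attention"  # placeholder name for the NFPA model
    base_path: "/models/nfpa_models/nfpa_attention"
    model_platform: "tensorflow"
  }
}
```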
- With the MongoDB container running, start a TensorFlow Serving GPU instance in the background. Note: change the source path accordingly; it must be an absolute path. After the container starts, you can check its logs to confirm it came up successfully (a gRPC status check is also sketched below).
nvidia-docker run --name tf_server -d --rm -p 8500:8500 --mount type=bind,source=/home/ubuntu/Zabaan/Serving/nfpa_models/,target=/models/nfpa_models -t tensorflow/serving:1.11.0-gpu --model_config_file=/models/nfpa_models/models.config
Verify that the TF server started by checking the container logs:
docker container logs tf_server
The end of the log should look something like this:
2020-11-16 20:50:35.246427: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: euro_attention version: 1564872567}
2020-11-16 20:50:35.251353: I tensorflow_serving/model_servers/server.cc:285] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2020-11-16 20:50:35.255347: I tensorflow_serving/model_servers/server.cc:301] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 235] RAW: Entering the event loop ...
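You can also query the model's load state over gRPC with the tensorflow-serving-api package installed earlier (a sketch, assuming the default gRPC port 8500 from the docker command above):

```python
# Ask TF Serving for the load status of a served model over gRPC.
import grpc
from tensorflow_serving.apis import get_model_status_pb2
from tensorflow_serving.apis import model_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")
stub = model_service_pb2_grpc.ModelServiceStub(channel)

request = get_model_status_pb2.GetModelStatusRequest()
request.model_spec.name = "euro_attention"
print(stub.GetModelStatus(request, timeout=10.0))  # state: AVAILABLE when loaded
```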
- Start the server. The endpoints mapped in this file call the required functions and models from the client file.
python server.py --port 8500 --model_name euro_attention
- The application should now be running at localhost:8080.
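Once it is up, you can exercise the API from Python; the route and JSON fields below are hypothetical guesses for illustration only, so check the handler definitions in server.py for the actual contract:

```python
# Hypothetical request against the Tornado app; "/translate" and the
# payload keys are NOT confirmed by this README -- see server.py.
import requests

resp = requests.post(
    "http://localhost:8080/translate",
    json={"text": "Install smoke alarms on every level of the home.",
          "source": "en", "target": "es"},
)
print(resp.json())
```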
BLEU scores on NFPA content, before and after domain adaptation to NFPA data:

| | En-Es | Es-En | No. of Sentences (Train/Dev/Test) |
|---|---|---|---|
| Before Domain Adaptation | 35.98 | 41.3 | 1.7M / 1000 / 500 |
| After Domain Adaptation | 65.89 | 73.25 | 93k / 1000 / 1000 |
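For reference, corpus-level BLEU can be computed with a library such as sacrebleu; this is only a sketch, and the README does not state which BLEU implementation produced the numbers above:

```python
# Corpus BLEU sketch with sacrebleu (not necessarily the scorer used
# for the table above); hyps and refs are toy examples.
import sacrebleu

hyps = ["Instale alarmas de humo en cada nivel de la casa."]
refs = [["Instale alarmas de humo en cada nivel del hogar."]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hyps, refs)
print(f"BLEU = {bleu.score:.2f}")
```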
- NFPA Data Analytics Team for testing and providing feedback.
- WPI Data Science Graduate Qualifying Project (GQP) Initiative.