zycalice/domain-adaptation-nlp


domain-adaptation-nlp

Dataset

Our Amazon dataset (Blitzer et al., 2007) can be downloaded here. Put the file in a folder called "data/amazon_reviews".
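A minimal sketch of a sanity check for this setup step follows. It confirms that the pickle is where the loading code expects it. The file name `amazon_4.pickle` comes from the loading example in this README. The helper name `find_amazon_pickle` and the assumption that `data/` sits directly under the repository root are illustrative only.

```python
from pathlib import Path


def find_amazon_pickle(root="."):
    """Return the path to <root>/data/amazon_reviews/amazon_4.pickle.

    Hypothetical helper: assumes the data folder lives directly under `root`.
    Raises FileNotFoundError with a hint if the file is missing.
    """
    path = Path(root) / "data" / "amazon_reviews" / "amazon_4.pickle"
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected the downloaded dataset at {path}; "
            "create data/amazon_reviews/ and move the file there."
        )
    return path
```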

This data contains 2000 samples from each of the four Amazon review categories:

  • Books
  • Electronics
  • Home and Kitchen (Kitchen)
  • Movies and TV (DVDs)

We chose these categories because they are frequently used in domain adaptation papers on NLP sentiment analysis.

You can load the data (for example, the Amazon data) with the following code, although every script you need to run should already include this step.

import pickle

# Load the pickled Amazon review samples
with open("../data/amazon_reviews/amazon_4.pickle", "rb") as fr:
    all_data = pickle.load(fr)

Each element of the Amazon data (and of the movie data) has the following structure:

  • [0] BERT embedding (the [CLS] token representation)
  • [1] y labels (0 means negative and 1 means positive)
  • [2] domain name
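With that layout, the samples can be grouped into one feature matrix and one label vector per domain. This is a minimal sketch. It assumes each element is an indexable `(embedding, label, domain)` triple and that every embedding has the same length. `split_by_domain` is a hypothetical helper, not a function from this repository.

```python
from collections import defaultdict

import numpy as np


def split_by_domain(all_data):
    """Group (embedding, label, domain) samples into per-domain arrays.

    Returns {domain: (X, y)} where X has shape (n_samples, embedding_dim).
    """
    features = defaultdict(list)
    labels = defaultdict(list)
    for sample in all_data:
        embedding, label, domain = sample[0], sample[1], sample[2]
        features[domain].append(np.asarray(embedding, dtype=np.float32))
        labels[domain].append(int(label))  # 0 = negative, 1 = positive
    return {
        domain: (np.stack(features[domain]), np.asarray(labels[domain]))
        for domain in features
    }


# Usage with toy 4-dimensional vectors standing in for real BERT embeddings.
toy = [
    (np.ones(4), 1, "Books"),
    (np.zeros(4), 0, "Books"),
    (np.full(4, 0.5), 1, "Electronics"),
]
by_domain = split_by_domain(toy)
X_books, y_books = by_domain["Books"]
print(X_books.shape, y_books.tolist())  # (2, 4) [1, 0]
```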

Instructions to run

Balanced Conf Model and Few Labels Models

  • Create an output folder under the root directory if it does not already exist.
  • Run src/sentiment_classification_amazon.py from the root directory.
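You can also script these two steps. The sketch below is minimal. It assumes the output folder is literally named `output`, that the script takes no command-line arguments, and that `root` points at the repository root. `prepare_output_dir` and `run_sentiment_experiments` are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path


def prepare_output_dir(root="."):
    """Create <root>/output if it does not exist and return its path."""
    out = Path(root) / "output"
    out.mkdir(exist_ok=True)
    return out


def run_sentiment_experiments(root="."):
    """Run src/sentiment_classification_amazon.py with `root` as the working directory."""
    prepare_output_dir(root)
    script = Path("src") / "sentiment_classification_amazon.py"
    # cwd=root mirrors "run from the root directory" in the steps above.
    subprocess.run([sys.executable, str(script)], cwd=root, check=True)
```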

Householder Transformation

  • Set n_fold to the number of folds you want to use (default: 1000).
  • Run src/domain_space_alignment.py from the root directory.
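For reference, a Householder transformation is the reflection H = I - 2 v v^T / (v^T v). This matrix is both symmetric and orthogonal. If you choose v = a/||a|| - b/||b||, the reflection maps the direction of a onto the direction of b.

The sketch below shows that construction in NumPy. The usage example maps a toy source-domain mean onto a toy target-domain direction. It is an assumption that `src/domain_space_alignment.py` chooses its vectors this way. `householder_reflection` is a hypothetical name.

```python
import numpy as np


def householder_reflection(a, b):
    """Return the Householder matrix H that reflects a/||a|| onto b/||b||.

    H = I - 2 v v^T / (v^T v) with v = a/||a|| - b/||b||.
    H is symmetric and orthogonal, so it preserves lengths and angles.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    v = a / np.linalg.norm(a) - b / np.linalg.norm(b)
    norm_sq = v @ v
    if np.isclose(norm_sq, 0.0):
        # a and b already point the same way, so no reflection is needed.
        return np.eye(a.size)
    return np.eye(a.size) - 2.0 * np.outer(v, v) / norm_sq


# Usage: reflect a toy "source" mean vector onto a toy "target" direction.
rng = np.random.default_rng(0)
source_mean = rng.normal(size=5)
target_mean = rng.normal(size=5)
H = householder_reflection(source_mean, target_mean)
aligned = H @ source_mean  # same length as source_mean, pointing along target_mean
```

Because H is orthogonal, applying the same H to every embedding of a domain moves the whole space rigidly. Distances between samples within that domain are unchanged.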

About

Research Project on Domain Adaptation in NLP
