Report on a study that compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks, analysing the inner workings of T5 and the related Transformer model architecture on multiple standard NLP datasets.

Lakshmi-bashyam/Analysis_of_transfer_learning_methods

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Report for the Machine Learning for NLP seminar offered by the Language Technology department @ UdS

Abstract

Transfer learning has become a widely popular learning technique in many applications, particularly in natural language processing. There are many ways to transfer knowledge from one setting to another. This report discusses in detail the popular technique of building pre-trained models and their performance on downstream tasks.

This type of transfer learning involves two steps. The first is to train a machine learning model on a data-rich task. Here, a data-rich task refers to a task for which data is easily available, such as language modelling, where data can simply be scraped from the abundant web pages. The resulting model, called the pre-trained model, captures low-level properties of the language such as its semantics and grammar. The second step reuses this general-purpose knowledge by fine-tuning the pre-trained model on a specific NLP task such as named entity recognition or sentiment classification.
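The second step can be illustrated with a minimal sketch. The example below assumes the Hugging Face transformers and torch packages and the publicly available t5-small checkpoint; the training sentence and label are made-up placeholders rather than anything from the report itself.

```python
# Minimal fine-tuning sketch (assumes the Hugging Face `transformers`
# and `torch` packages; the example sentence and label are placeholders).
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")  # step 1: reuse pre-trained weights

# Step 2: fine-tune on a downstream task (sentiment classification,
# expressed as text-to-text with a task prefix and a text label).
inputs = tokenizer("sst2 sentence: the movie was wonderful", return_tensors="pt")
labels = tokenizer("positive", return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = model(input_ids=inputs.input_ids,
             attention_mask=inputs.attention_mask,
             labels=labels).loss  # sequence-to-sequence cross-entropy
loss.backward()
optimizer.step()
```

In practice this single update step would be wrapped in a training loop over a labelled dataset for the downstream task.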

The paper “Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer” extends this idea into a text-in, text-out model that can be used across many NLP tasks. It also presents a detailed comparative study of different transfer learning techniques and incorporates the best-performing methods into the final model, T5, along with the C4 dataset used to train it. Together they achieved state-of-the-art performance on many NLP tasks at the time of release.
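As a rough illustration of the text-in, text-out interface, the sketch below feeds T5 several different tasks as plain-text prompts with task prefixes and decodes each answer as text. It again assumes the Hugging Face transformers package and the public t5-small checkpoint; the prompts are illustrative only.

```python
# Text-in, text-out sketch: every task becomes a text prompt with a prefix,
# and the model's answer is generated as text.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

prompts = [
    "translate English to German: The house is wonderful.",  # translation
    "cola sentence: The course is jumping well.",             # grammatical acceptability
    "summarize: state authorities dispatched emergency crews to survey the damage.",  # summarization
]
for prompt in prompts:
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because every task shares this single text-to-text format, the same model, loss, and decoding procedure can be reused across translation, classification, and summarization without task-specific heads.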
