SkimLit: Categorizing Medical Research Paper Abstracts

Overview

SkimLit is a project designed to automatically categorize sections of medical research paper abstracts into five distinct categories:

Background
Methods
Objective
Results
Conclusions

This is achieved using a multimodal model that combines word-level and character-level tokenization of each line with a feature encoding the line's position in the abstract.

Dataset

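The dataset is PubMed 200k RCT, a collection of roughly 200,000 abstracts of randomized controlled trials in which every sentence is labeled with one of the five categories listed above.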
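
For orientation, the sketch below parses the plain-text layout in which PubMed 200k RCT is commonly distributed. It assumes each abstract starts with a `###<id>` line, continues with `LABEL<TAB>sentence` lines, and ends with a blank line. The function name `parse_abstracts`, the record keys, and the sample text are illustrative and are not taken from the notebook.

```python
from typing import Dict, List


def parse_abstracts(text: str) -> List[Dict[str, object]]:
    """Turn raw PubMed RCT-style text into one record per labeled line."""
    records: List[Dict[str, object]] = []
    current: List[str] = []  # labeled lines of the abstract being read

    def flush() -> None:
        # Emit one record per line, now that the abstract length is known.
        for number, raw in enumerate(current):
            label, sentence = raw.split("\t", 1)
            records.append({
                "target": label,
                "text": sentence.strip(),
                "line_number": number,          # 0-based position
                "total_lines": len(current),    # lines in this abstract
            })
        current.clear()

    for line in text.splitlines():
        if line.startswith("###"):   # assumed abstract-ID marker
            flush()
        elif not line.strip():       # blank line closes the abstract
            flush()
        elif "\t" in line:           # assumed LABEL<TAB>sentence layout
            current.append(line)
    flush()  # handle input without a trailing blank line
    return records


# Made-up sample text, for illustration only.
sample = (
    "###0001\n"
    "OBJECTIVE\tTo test a hypothetical treatment .\n"
    "METHODS\tPatients were randomized into two groups .\n"
    "RESULTS\tPain scores improved in the treatment group .\n"
    "\n"
)
rows = parse_abstracts(sample)
print(rows[1]["target"], rows[1]["line_number"], rows[1]["total_lines"])
```

Keeping the line number and the abstract length in each record means the position feature is available later without re-reading the file.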

Features

Multimodal Tokenization: Uses both word-level and character-level tokenization to capture textual features at two granularities.
Line Number Model: Uses the position of each line within the abstract as additional context.
Category Prediction: Predicts one of the five categories for each line in the abstract.
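
As a rough illustration of the features listed above, the sketch below derives them for every line of one abstract using only the Python standard library. Splitting on whitespace, keeping spaces as characters, and the helper name `line_features` are assumptions made for the example; the notebook may tokenize differently.

```python
from typing import Dict, List


def line_features(abstract_lines: List[str]) -> List[Dict[str, object]]:
    """Build word, character, and position features for each line."""
    features = []
    for number, line in enumerate(abstract_lines):
        features.append({
            "words": line.split(),      # word-level tokens (whitespace split)
            "chars": list(line),        # character-level tokens, spaces kept
            "line_number": number,      # 0-based position in the abstract
            "total_lines": len(abstract_lines),
        })
    return features


# Made-up abstract, for illustration only.
abstract = ["We studied aspirin .", "Pain decreased ."]
for item in line_features(abstract):
    print(item["words"], len(item["chars"]), item["line_number"])
```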

Model Details

The multimodal model combines:

Word-Level Tokenization: Splits each line of the abstract into words and extracts features from them.
Character-Level Tokenization: Splits each line of the abstract into characters and extracts features from them.
Line Number Model: Encodes the position of each line within the abstract.
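
Before a neural network can use these inputs, tokens have to become integers and line numbers have to become fixed-size vectors. The following is a minimal standard-library sketch of that step, standing in for what a framework layer would do. The reserved ids (0 for padding, 1 for unknown tokens), the sequence lengths, the one-hot depth, and all function names are assumed values chosen for illustration.

```python
from typing import Dict, Iterable, List


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign an integer id to each token; 0 = padding, 1 = unknown."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for token in tokens:
            # New tokens get the next free id, starting at 2.
            vocab.setdefault(token, len(vocab) + 2)
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], length: int) -> List[int]:
    """Map tokens to ids, then truncate or right-pad with 0 to `length`."""
    ids = [vocab.get(token, 1) for token in tokens][:length]
    return ids + [0] * (length - len(ids))


def one_hot_position(line_number: int, depth: int) -> List[int]:
    """One-hot encode a line number; positions >= depth share the last slot."""
    index = min(line_number, depth - 1)
    return [1 if i == index else 0 for i in range(depth)]


# Made-up lines, for illustration only.
lines = ["pain decreased", "pain increased after treatment"]
word_vocab = build_vocab(line.split() for line in lines)
char_vocab = build_vocab(list(line) for line in lines)

for number, line in enumerate(lines):
    model_input = {
        "word_ids": encode(line.split(), word_vocab, length=4),
        "char_ids": encode(list(line), char_vocab, length=10),
        "line_number": one_hot_position(number, depth=3),
    }
    print(model_input)
```

In a model like the one described here, each of the three inputs would typically feed its own branch before the branches are combined for the final category prediction.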

Training

The model is trained with TensorFlow; the code is in the skimlit.ipynb notebook.
