Substance-Information-Extractor-FH

Extract patient tobacco and alcohol usage information from medical documents.

Information Identified

Substance usage status:

Current user
Former user
History of use
User (details indeterminate)
Non-user / Never-user
Unknown

Usage attributes:

Type (e.g. cigarette, cigar)
Amount of Use (e.g. 1 pack per week)
Duration of Use
Quit Date
Time Passed since Quitting
Age at Time of Quitting

Method

The project uses machine learning. Most classification steps use scikit-learn's LinearSVC (a linear support vector machine). The exception is substance usage attributes, which are extracted with Stanford NER.
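As an illustration of the LinearSVC step, here is a minimal sketch, assuming word unigram and bigram counts as features and three simplified status labels. The training sentences are invented for the example and are not project data; the project's actual feature extraction and label set may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy, hand-written sentences for illustration only (not project data).
train_sentences = [
    "Patient smokes one pack per day.",
    "He currently smokes cigars.",
    "Quit smoking ten years ago.",
    "Former smoker, stopped in 2005.",
    "Denies any tobacco use.",
    "She has never smoked.",
]
# Simplified labels assumed for this sketch.
train_labels = ["current", "current", "former", "former", "never", "never"]

# Unigram and bigram counts feed a linear SVM.
status_classifier = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), lowercase=True),
    LinearSVC(),
)
status_classifier.fit(train_sentences, train_labels)

# Predict a status label for an unseen sentence.
predicted = status_classifier.predict(["Patient quit smoking in 2010."])
```

With so few training sentences the prediction is not meaningful; the point is only the shape of the vectorizer-plus-classifier pipeline.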

For each substance:

Identify sentences that contain substance information
Classify usage status of patient
Identify usage attributes within identified sentences
Link attributes to the appropriate substance
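The data flow of these four steps can be sketched as below. This is only a skeleton under stated assumptions: the keyword lists, the status rules, and the quit-year regex are hypothetical stand-ins for the trained classifiers and the NER model, and all function names are invented for illustration.

```python
import re

# Hypothetical keyword lists standing in for the trained sentence classifiers.
SUBSTANCE_KEYWORDS = {
    "tobacco": ("smok", "cigar", "tobacco"),
    "alcohol": ("alcohol", "drink", "beer", "wine"),
}


def find_sentences(text, substance):
    """Step 1: keep sentences that mention the substance."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    keywords = SUBSTANCE_KEYWORDS[substance]
    return [s for s in sentences if any(k in s.lower() for k in keywords)]


def classify_status(sentences):
    """Step 2: placeholder rules where the project would use a classifier."""
    if not sentences:
        return "Unknown"
    joined = " ".join(sentences).lower()
    if "never" in joined or "denies" in joined:
        return "Non-user / Never-user"
    if "quit" in joined or "former" in joined:
        return "Former user"
    return "Current user"


def extract_attributes(sentences):
    """Step 3: placeholder regex where the project would use an NER model."""
    attributes = {}
    for sentence in sentences:
        match = re.search(r"quit\b.*?\b((?:19|20)\d{2})\b", sentence.lower())
        if match:
            attributes["Quit Date"] = match.group(1)
    return attributes


def extract(text):
    """Run the steps once per substance.

    Step 4 (linking) happens by construction here: attributes are only
    searched for in the sentences already assigned to that substance.
    """
    findings = {}
    for substance in SUBSTANCE_KEYWORDS:
        sentences = find_sentences(text, substance)
        findings[substance] = {
            "status": classify_status(sentences),
            "attributes": extract_attributes(sentences),
            "sentences": sentences,
        }
    return findings


note = "Patient quit smoking in 2010. Drinks wine on weekends. No fever."
findings = extract(note)
```

Linking by sentence is the simplest possible choice; a sentence that mentions both substances would need a finer-grained linking step.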

Features

The project currently uses n-grams as features. The n-grams are normalized, and unigrams of the following kinds are each replaced with a single standardized label (a word class) to reduce dimensionality:

integers
decimals
USD money values
percentages
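A minimal sketch of this kind of normalization is shown here. The label strings, the regular expressions, the whitespace tokenization, and the lowercasing are all assumptions made for illustration; the project's actual rules are not described in this README.

```python
import re

# Hypothetical word-class labels and patterns (not taken from the project).
# Order matters: more specific patterns are tried before plain numbers.
WORD_CLASSES = [
    (re.compile(r"^\$\d[\d,]*(\.\d+)?$"), "<MONEY>"),
    (re.compile(r"^\d+(\.\d+)?%$"), "<PERCENT>"),
    (re.compile(r"^\d+\.\d+$"), "<DECIMAL>"),
    (re.compile(r"^\d+$"), "<INTEGER>"),
]


def normalize_token(token):
    """Replace a token with its word-class label, if it belongs to one."""
    for pattern, label in WORD_CLASSES:
        if pattern.match(token):
            return label
    # Lowercasing is an extra assumption of this sketch.
    return token.lower()


def ngrams(tokens, n):
    """Return all contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = [normalize_token(t) for t in "Smokes 2 packs per week".split()]
bigrams = ngrams(tokens, 2)
```

After normalization, "2 packs" and "3 packs" both map to the same bigram, which is how the replacement shrinks the feature space.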
