Improved paragraph algorithm #118

Merged: 88 commits merged on Aug 1, 2024


@jenniferjiangkells (Member) commented Apr 22, 2024

This PR implements an improved paragraph detection algorithm.

  1. Implements a refine option in the .process method of Note, configurable through the refine_paragraphs option in AnnotatorConfig. Refining merges any consecutive prose sections into one paragraph, and merges a header with an empty body into the next prose section. Addresses the point below:

Header + blank line: if a paragraph has a header but no body and no detected concepts, and the following paragraph is prose (i.e. does not match a header type), the paragraph type should also apply to the following paragraph. e.g.

Problems:

- Aortic stenosis
- Diabetes
- IHD 
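A minimal sketch of the two refine rules, assuming a simplified, hypothetical Paragraph dataclass with type, heading, body and concepts fields (the actual Note and paragraph classes are not shown in this PR). Running the prose merge before the header merge is also an assumption; it lets an empty header absorb a whole run of prose rather than only its first section.

```python
from dataclasses import dataclass, field, replace
from typing import List

PROSE = "prose"  # hypothetical label for paragraphs that match no header type


@dataclass
class Paragraph:
    # Hypothetical, simplified stand-in for a note paragraph.
    type: str
    heading: str = ""
    body: str = ""
    concepts: List[str] = field(default_factory=list)


def merge_consecutive_prose(paragraphs: List[Paragraph]) -> List[Paragraph]:
    """Merge each run of consecutive prose paragraphs into one paragraph."""
    merged: List[Paragraph] = []
    for para in paragraphs:
        if merged and para.type == PROSE and merged[-1].type == PROSE:
            prev = merged[-1]
            prev.body = "\n".join(s for s in (prev.body, para.body) if s)
            prev.concepts.extend(para.concepts)
        else:
            # Copy so the caller's paragraphs are never mutated.
            merged.append(replace(para, concepts=list(para.concepts)))
    return merged


def absorb_prose_into_empty_headers(paragraphs: List[Paragraph]) -> List[Paragraph]:
    """Let a header with no body and no concepts take over the prose paragraph after it."""
    result: List[Paragraph] = []
    for para in paragraphs:
        prev = result[-1] if result else None
        if (
            prev is not None
            and para.type == PROSE
            and prev.type != PROSE
            and not prev.body.strip()
            and not prev.concepts
        ):
            # The following prose inherits the header's paragraph type.
            prev.body = para.body
            prev.concepts.extend(para.concepts)
        else:
            result.append(replace(para, concepts=list(para.concepts)))
    return result


def refine_paragraphs(paragraphs: List[Paragraph]) -> List[Paragraph]:
    return absorb_prose_into_empty_headers(merge_consecutive_prose(paragraphs))


note = [
    Paragraph("problems", heading="Problems:"),
    Paragraph(PROSE, body="- Aortic stenosis\n- Diabetes"),
    Paragraph(PROSE, body="- IHD"),
]
print([(p.type, p.body) for p in refine_paragraphs(note)])
# [('problems', '- Aortic stenosis\n- Diabetes\n- IHD')]
```

In this sketch a header that already has a body, or that has concepts of its own, is left alone, matching the "no body and no concepts detected" condition above.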
  2. Implements .filter_concepts_in_numbered_list() in Annotator. This pipeline component (list_cleaner) filters out extraneous concepts in a numbered list and returns only the first concept detected on each numbered list item's line. Addresses the point below (in this example only CCF, IHD and Gallstones would be returned):

Numbered list: users may number their entries, with varying paragraph sizes and line-break positions. If a numbered list is detected, use only the numbered items and keep only the first concept per numbered item. e.g.

Problems:

1. CCF -
- had echo on 15/6
- on diuretics

- awaiting pacemaker

2). IHD
3. Diabetes type 2

HbA1c = 78mmol/L
4 Gallstones
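The numbered-list rule could look roughly like the sketch below. It is not the list_cleaner implementation: the marker regex, the (name, offset) concept tuples and the choice to keep only the first concept on each numbered line are assumptions made for illustration. The pattern accepts the marker styles from the example ("1.", "2).", and a bare "4").

```python
import re
from typing import List, Tuple

# Hypothetical marker pattern: a number at the start of a line, optionally followed
# by ".", ")" or ")." and then spaces/tabs, e.g. "1. CCF", "2). IHD", "4 Gallstones".
NUMBERED_ITEM = re.compile(r"^[ \t]*\d+(?:\)\.|\.|\))?[ \t]+", re.MULTILINE)

Concept = Tuple[str, int]  # (concept name, character offset into the text)


def filter_concepts_in_numbered_list(text: str, concepts: List[Concept]) -> List[Concept]:
    """Keep only the first concept on each numbered-item line.

    Text without any numbered item is returned unfiltered.
    """
    item_lines = []
    for match in NUMBERED_ITEM.finditer(text):
        line_end = text.find("\n", match.end())
        item_lines.append((match.end(), len(text) if line_end == -1 else line_end))
    if not item_lines:
        return list(concepts)

    kept: List[Concept] = []
    for start, end in item_lines:
        on_line = [c for c in concepts if start <= c[1] < end]
        if on_line:
            kept.append(min(on_line, key=lambda c: c[1]))  # earliest concept wins
    return kept


text = "Problems:\n1. Asthma - on inhalers\n   reviewed by GP\n2). COPD\n3 Hypertension"
concepts = [(name, text.index(name)) for name in ["Asthma", "inhalers", "GP", "COPD", "Hypertension"]]
print([name for name, _ in filter_concepts_in_numbered_list(text, concepts)])
# ['Asthma', 'COPD', 'Hypertension']
```

Accepting a bare number followed by a space also matches lines such as "2 tablets daily", so a fuller version would probably need extra evidence (for example several consecutive items) before treating a paragraph as a numbered list.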
  3. Makes loading of the paragraph regex configurable along with the rest of the lookups.
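As an illustration of config-driven paragraph regexes, the sketch below reads header patterns from a CSV lookup and classifies a heading by the first matching pattern. The column names (paragraph_type, regex), the paragraph types and the patterns themselves are hypothetical; the lookup format the project actually uses is not described in this PR.

```python
import csv
import io
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical lookup contents; column names, paragraph types and patterns are illustrative.
PARAGRAPH_REGEX_CSV = r"""paragraph_type,regex
problems,^(problems?|diagnos[ie]s|pmh)\b
medications,^(medications?|drugs|current meds)\b
allergies,^allerg(y|ies)\b
"""


def load_paragraph_regex(lines: Iterable[str]) -> List[Tuple[str, re.Pattern]]:
    """Compile one case-insensitive pattern per row of the lookup."""
    return [
        (row["paragraph_type"], re.compile(row["regex"], re.IGNORECASE))
        for row in csv.DictReader(lines)
    ]


def classify_heading(heading: str, patterns: List[Tuple[str, re.Pattern]]) -> Optional[str]:
    """Return the paragraph type of the first matching pattern, or None."""
    for paragraph_type, pattern in patterns:
        if pattern.search(heading.strip()):
            return paragraph_type
    return None


patterns = load_paragraph_regex(io.StringIO(PARAGRAPH_REGEX_CSV))
print(classify_heading("Problems:", patterns))       # problems
print(classify_heading("Current meds", patterns))    # medications
print(classify_heading("Social history", patterns))  # None
```

In this sketch, keeping the patterns in a lookup means new header types can be added without code changes, and the row order in the file decides which type wins when several patterns match.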

jenniferajiang and others added 30 commits January 9, 2023 16:10
@jenniferjiangkells jenniferjiangkells changed the base branch from dev to master July 30, 2024 17:14
@jenniferjiangkells jenniferjiangkells merged commit 77dfb39 into master Aug 1, 2024
2 checks passed
@jenniferjiangkells jenniferjiangkells deleted the improved-paragraph-algorithm branch August 1, 2024 10:20
@jenniferjiangkells jenniferjiangkells added the type: enhancement New feature or request label Aug 1, 2024