From 191cb607dd2f19709c6077aba7941ac14b27d402 Mon Sep 17 00:00:00 2001
From: Angele Zamarron
Date: Tue, 15 Aug 2023 14:03:59 -0700
Subject: [PATCH 1/4] add return...

---
 .../e5910c027af0ee9c1901c57f6579d903aedee7f4.xml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/fixtures/grobid_augment_existing_document_parser/e5910c027af0ee9c1901c57f6579d903aedee7f4.xml b/tests/fixtures/grobid_augment_existing_document_parser/e5910c027af0ee9c1901c57f6579d903aedee7f4.xml
index 3d8cb808..3c453adf 100644
--- a/tests/fixtures/grobid_augment_existing_document_parser/e5910c027af0ee9c1901c57f6579d903aedee7f4.xml
+++ b/tests/fixtures/grobid_augment_existing_document_parser/e5910c027af0ee9c1901c57f6579d903aedee7f4.xml
@@ -88,7 +88,8 @@ xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.co
G-pooling and state-of-the-art methods

In order to verify that our proposed G-pooling is able to improve state-of-the-art segmentation approaches, we select DeepLab [6] and SegNet [3] as additional network architectures to test G-pooling. As mentioned above, the models in Section 5 use FCN as the network architecture and VGG-16 as the backbone. For a fair comparison with FCN, VGG-16 is also used as the backbone in DeepLab and SegNet.

DeepLab [6] uses large receptive fields through dilated convolution. For the baseline DeepLab itself, pool4 and pool5 from the backbone VGG-16 are removed and, following [32], the conv5 layers are replaced with dilated conv layers with a dilation rate of 2. For the G-pooling version, pool1 and pool2 are replaced with G-pooling and we keep pool3. Thus there are three max pooling layers in the baseline, and one G-pooling layer and one max pooling layer in our proposed version. SegNet uses an encoder-decoder architecture and preserves the max pooling indices for unpooling in the decoder. Similar to DeepLab, there are 5 max pooling layers in total in the encoder of SegNet, so pool1 and pool2 are replaced with the proposed G-pool1, pool3 and pool4 are replaced with G-pool2, and pool5 is kept. This leads us to use a 4 × 4 unpooling window to recover the spatial resolution where the original ones are just 2 × 2. Thus there are two G-pooling layers and one max pooling layer in our SegNet version.
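The index-preserving pooling and unpooling that SegNet relies on can be sketched in a few lines. The following is a minimal numpy illustration of the mechanism (shown with a 2 × 2 window for brevity; the helper names are ours, not the authors' code):

```python
import numpy as np

def max_pool_with_indices(x: np.ndarray, k: int):
    """k x k non-overlapping max pooling that also returns the flat index of
    each max within its window (SegNet-style)."""
    h, w = x.shape
    # split into (h//k, w//k) windows of size k*k, row-major within each window
    patches = (x.reshape(h // k, k, w // k, k)
                .transpose(0, 2, 1, 3)
                .reshape(h // k, w // k, k * k))
    idx = patches.argmax(axis=-1)
    return patches.max(axis=-1), idx

def unpool_with_indices(pooled: np.ndarray, idx: np.ndarray, k: int) -> np.ndarray:
    """Scatter pooled values back to their recorded positions; zeros elsewhere."""
    h, w = pooled.shape
    out = np.zeros((h, w, k * k))
    rows, cols = np.indices((h, w))
    out[rows, cols, idx] = pooled
    # undo the window flattening to recover an (h*k, w*k) map
    return (out.reshape(h, w, k, k)
               .transpose(0, 2, 1, 3)
               .reshape(h * k, w * k))
```

With a 4 × 4 window, as in the SegNet variant described above, the same mechanism recovers a 4× larger spatial resolution per unpooling step instead of 2×.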

As can be seen in Table 4, G-pooling is able to improve the model accuracy for Potsdam, 67.97% → 68.33%. The improvement on the generalization test Potsdam→Vaihingen is even more obvious: G-pooling improves mIoU from 38.57 to 40.04. Similar observations can be made for SegNet and FCN. For Vaihingen, even though the model accuracy is not as high as the baseline, the difference is small: the mIoU of our versions of DeepLab, SegNet and FCN is less than 1% lower. We note that Vaihingen is an easier dataset than Potsdam, since it only includes urban scenes while Potsdam includes both urban and non-urban. However, the generalizability of our model using G-pooling is much better. As shown, when testing on Potsdam using a model trained on Vaihingen, FCN with G-pooling is able to achieve 23.02% mIoU, an improvement of 7.54% IoU. The same observations can be made for DeepLab and SegNet.

Discussion

Incorporating knowledge is not a novel approach for neural networks. Before deep learning, there was work on rule-based neural networks which required expert knowledge to design the network for specific applications. Due to the large capacity of deep models, deep learning has become the primary approach to address vision problems. However, deep learning is a data-driven approach whose success relies significantly on the amount of training data. If the model is trained with a large amount of data then it will generalize well. But often, particularly in overhead image segmentation, the dataset is not as large as ImageNet or Cityscapes. This causes overfitting. Early stopping, cross-validation, etc. can help to avoid overfitting. Still, if domain shift exists between the training and test sets, deep models do not perform well. In this work, we propose a knowledge-incorporated approach to reduce overfitting. We address the question of how to incorporate knowledge directly into deep models by proposing a novel pooling method for overhead image segmentation. Some issues still need discussing, as follows.

Scenarios using G-pooling As mentioned in Section 3, G-pooling is developed using Getis-Ord Gi* analysis, which quantifies how spatial convergence occurs. This is a simulated process designed for geospatial data downsampling. Thus it is not necessarily appropriate for other image datasets. This is a more general restriction of incorporating knowledge. Getis-Ord Gi* analysis provides a method to identify spatial clusters while training. The effect is similar to conditional random fields/Markov random fields in standard computer vision post-processing methods. However, it is different from them since the spatial clustering changes dynamically based on the feature maps and the geospatial location, while post-processing methods rely on the predictions of the models.
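As a rough illustration of the statistic underlying G-pooling, here is a minimal numpy sketch of a Getis-Ord Gi*-style z-score computed for one cell of a 2-D grid, assuming binary weights over a square neighborhood (the function name and these simplifications are ours; the paper's exact formulation may differ, e.g. in its weighting scheme):

```python
import numpy as np

def getis_ord_g_star(x: np.ndarray, i: int, j: int, radius: int = 1) -> float:
    """Simplified Getis-Ord Gi* z-score for cell (i, j) of a 2-D grid.

    Uses binary spatial weights: 1 inside the (2*radius+1)^2 window (including
    the center cell), 0 outside. Windows are truncated at grid edges, and
    constant-valued grids (zero variance) are not handled.
    """
    n = x.size
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)  # population std deviation
    window = x[max(0, i - radius): i + radius + 1,
               max(0, j - radius): j + radius + 1]
    w_sum = window.size       # sum of binary weights
    local_sum = window.sum()  # sum of weighted values
    denom = s * np.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
    return (local_sum - x_bar * w_sum) / denom
```

A large positive z-score marks a hotspot (a spatial cluster of high values), which is the signal the paper's pooling step exploits.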

Local geospatial pattern

We now explain how G-pooling works in deep neural networks. Getis-Ord Gi* analysis is usually used for global-region hotspot detection, which describes geospatial convergence. As shown in Figure 3, G-pooling is applied twice to downsample the feature map. The spatial sizes of the feature maps after G-pooling are 64 × 64 and 16 × 16 respectively. Max pooling reduces the size of the feature map by 1/2, while ours reduces it by 1/4. This is because we want to compute Gi* over a larger region.

Even though Gi* is usually computed over a larger region than in our framework, it still captures spatial convergence within a small region. Also, the two G-pooling operations are applied at different scales of the feature map, so a larger region of the input image is effectively considered. Specifically, the first 4 × 4 pooling window is slid over the 256 × 256 feature map and the output feature map has size 64 × 64. This is fed through the next conv layers and a second G-pooling is applied. At this stage, the input feature map is 64 × 64, so when a 4 × 4 sliding window is used, a region of 16 × 16 is effectively considered, which is 1/16 of the whole image.
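The window arithmetic above can be sketched with plain max pooling standing in for G-pooling (the helper is a hypothetical stand-in, only meant to show the 256 → 64 → 16 size reduction and the growing effective receptive field):

```python
import numpy as np

def pool_4x4(x: np.ndarray) -> np.ndarray:
    """Non-overlapping 4x4 pooling: each side shrinks by a factor of 4.
    Max is used here purely as a placeholder for the G-pooling operator."""
    h, w = x.shape
    return x.reshape(h // 4, 4, w // 4, 4).max(axis=(1, 3))

fmap = np.random.rand(256, 256)   # input-resolution feature map
first = pool_4x4(fmap)            # 256 -> 64
second = pool_4x4(first)          # 64 -> 16
# A 4x4 window on the second map spans a 16x16 region of the original input,
# i.e. 1/16 of the image side length, matching the text above.
```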

Limitations There are some limitations to our work. For example, we did not investigate the optimal window size for performing Getis-Ord Gi* analysis. We also only consider one kind of spatial pattern, clusters. And there might be better places than pooling to incorporate knowledge in CNN architectures.

-
Conclusion

In this paper, we investigate how geospatial knowledge can be incorporated into deep learning for geospatial image analysis. We demonstrate that incorporating geospatial rules improves performance. We realize, though, that ours is just preliminary work into geospatially guided deep learning. We note the limitations of our approach, for example, that the prior distribution does not provide benefits for classes for which this prior knowledge is not relevant. Our proposed approach does not show much improvement in the single-dataset case, especially for a small dataset. ISPRS Vaihingen is a very small dataset which contains only around 500 images of size 256 × 256. In the future, we will explore other ways to encode geographic rules so they can be incorporated into deep learning models.

Figure 2 :

Figure 2: Given a feature map as input, max pooling (top right) and the proposed G-pooling (bottom right) create different downsampled output feature maps based on the characteristics of spatial clusters. The feature map within the sliding window (blue dotted line) indicates a spatial cluster. Max pooling takes the max value, ignoring the spatial cluster, while our G-pooling takes the interpolated value at the center location. (White, gray and black represent three values ranging from low to high.)

+
Conclusion

In this paper, we investigate how geospatial knowledge can be incorporated into deep learning for geospatial image analysis. We demonstrate that incorporating geospatial rules improves performance. We realize, though, that ours is just preliminary work into geospatially guided deep learning. We note the limitations of our approach, for example, that the prior distribution does not provide benefits for classes for which this prior knowledge is not relevant. Our proposed approach does not show much improvement in the single-dataset case, especially for a small dataset. ISPRS Vaihingen is a very small dataset which contains only around 500 images of size 256 × 256. In the future, we will explore other ways to encode geographic rules so they can be incorporated into deep learning models.

+
Figure 2 :

Figure 2: Given a feature map as input, max pooling (top right) and the proposed G-pooling (bottom right) create different downsampled output feature maps based on the characteristics of spatial clusters. The feature map within the sliding window (blue dotted line) indicates a spatial cluster. Max pooling takes the max value, ignoring the spatial cluster, while our G-pooling takes the interpolated value at the center location. (White, gray and black represent three values ranging from low to high.)

Figure 3 :

Figure 3: An FCN network architecture with G-pooling.

Figure 4 :

Figure 4: Qualitative results of ISPRS Potsdam.White: road, blue: building, cyan: low vegetation, green: trees, yellow: cars, red: clutter.

Table 1 :

Experimental results of FCN using VGG-16 as the backbone. Stride conv, P-pooling and our G-pooling are used to replace the standard max/average pooling.

Potsdam
Methods               Roads  Buildings  Low Veg.  Trees  Cars   mIoU   Pixel Acc.
Max                   70.62  74.28      65.94     61.36  61.40  66.72  79.55
Average               69.34  74.49      63.94     60.06  60.28  65.62  78.08
Stride conv           67.22  73.97      63.01     60.09  59.39  64.74  77.54
P-pooling             71.97  75.55      66.80     62.03  62.39  67.75  81.02
G-pooling-1.0 (ours)  68.59  77.39      67.48     55.56  62.18  66.24  79.43
G-pooling-1.5 (ours)  70.06  76.12      67.67     62.12  63.91  67.98  81.63
G-pooling-2.0 (ours)  70.99  74.89      65.34     61.57  60.77  66.71  79.46

Vaihingen
Methods               Roads  Buildings  Low Veg.  Trees  Cars   mIoU   Pixel Acc.
Max                   70.63  80.42      51.57     70.12  55.32  65.61  81.88
Average               70.54  79.86      50.49     69.18  54.83  64.98  79.98
Stride conv           68.36  77.65      49.21     67.34  53.29  63.17  79.44
P-pooling             71.06  80.52      51.70     70.93  53.65  65.57  82.44
G-pooling-1.0 (ours)  72.15  79.69      53.28     70.89  53.72  65.95  81.78
G-pooling-1.5 (ours)  71.61  78.74      48.18     68.53  55.64  64.54  80.42
G-pooling-2.0 (ours)  71.09  78.88      50.62     68.32  54.01  64.58  80.75
From 63ede452b78a0bef7d6686a6541c2763309e4f4e Mon Sep 17 00:00:00 2001
From: Angele Zamarron
Date: Tue, 15 Aug 2023 14:17:15 -0700
Subject: [PATCH 2/4] body sections paragraphs sentences and test

---
 ...grobid_augment_existing_document_parser.py | 77 +++++++++++++++++++
 ...grobid_augment_existing_document_parser.py | 23 ++++++
 2 files changed, 100 insertions(+)

diff --git a/src/mmda/parsers/grobid_augment_existing_document_parser.py b/src/mmda/parsers/grobid_augment_existing_document_parser.py
index 3f73284d..452ee5f3 100644
--- a/src/mmda/parsers/grobid_augment_existing_document_parser.py
+++ b/src/mmda/parsers/grobid_augment_existing_document_parser.py
@@ -99,6 +99,34 @@ def _parse_xml_onto_doc(self, xml: str, doc: Document) -> Document:
             )
         )
 
+        # sections
+        # Grobid provides coordinates and number attributes for section headers, and coordinates for
+        # sentences within the body text, also tagged by paragraphs.
+        # We use these to annotate the document in order to provide a hierarchical structure:
+        # e.g. doc.sections.header, doc.sections[0].paragraphs[0].sentences[0]
+        section_box_groups, heading_box_groups, paragraph_box_groups, sentence_box_groups = \
+            self._get_structured_body_text_box_groups(xml_root)
+        doc.annotate(
+            sections=box_groups_to_span_groups(
+                section_box_groups, doc, center=True
+            )
+        )
+        doc.annotate(
+            headings=box_groups_to_span_groups(
+                heading_box_groups, doc, center=False
+            )
+        )
+        doc.annotate(
+            paragraphs=box_groups_to_span_groups(
+                paragraph_box_groups, doc, center=True
+            )
+        )
+        doc.annotate(
+            sentences=box_groups_to_span_groups(
+                sentence_box_groups, doc, center=True
+            )
+        )
+
         return doc
 
     def _xml_coords_to_boxes(self, coords_attribute: str):
@@ -172,3 +200,52 @@ def _get_box_groups(
         else:
             box_groups.append(BoxGroup(boxes=boxes))
         return box_groups
+
+    def _get_heading_box_group(
+        self,
+        section_div: et.Element
+    ) -> Optional[BoxGroup]:
+        box_group = None
+        heading_element = section_div.find(f".//tei:head", NS)
+        if heading_element is not None:  # elements evaluate as False if no children
+            coords_string = heading_element.attrib["coords"]
+            boxes = self._xml_coords_to_boxes(coords_string)
+            number = heading_element.attrib["n"] if "n" in heading_element.keys() else None
+            section_title = heading_element.text
+            box_group = BoxGroup(
+                boxes=boxes,
+                metadata=Metadata(number=number, title=section_title),
+            )
+        return box_group
+
+    def _get_structured_body_text_box_groups(
+        self,
+        root: et.Element
+    ) -> (List[BoxGroup], List[BoxGroup], List[BoxGroup], List[BoxGroup]):
+        section_list_root = root.find(f".//tei:body", NS)
+
+        body_sections: List[BoxGroup] = []
+        body_headings: List[BoxGroup] = []
+        body_paragraphs: List[BoxGroup] = []
+        body_sentences: List[BoxGroup] = []
+
+        section_divs = section_list_root.findall(f"./tei:div", NS)
+        for div in section_divs:
+            section_boxes: List[Box] = []
+            heading_box_group = self._get_heading_box_group(div)
+            if heading_box_group:
+                body_headings.append(heading_box_group)
+                section_boxes.extend(heading_box_group.boxes)
+            for p in div.findall(f"./tei:p", NS):
+                paragraph_boxes: List[Box] = []
+                paragraph_sentences: List[BoxGroup] = []
+                for s in p.findall(f"./tei:s", NS):
+                    sentence_boxes = self._xml_coords_to_boxes(s.attrib["coords"])
+                    paragraph_sentences.append(BoxGroup(boxes=sentence_boxes))
+                    paragraph_boxes.extend(sentence_boxes)
+                body_paragraphs.append(BoxGroup(boxes=paragraph_boxes))
+                section_boxes.extend(paragraph_boxes)
+                body_sentences.extend(paragraph_sentences)
+            body_sections.append(BoxGroup(boxes=section_boxes))
+
+        return body_sections, body_headings, body_paragraphs, body_sentences
diff --git a/tests/test_parsers/test_grobid_augment_existing_document_parser.py b/tests/test_parsers/test_grobid_augment_existing_document_parser.py
index 73031283..f16256f6 100644
--- a/tests/test_parsers/test_grobid_augment_existing_document_parser.py
+++ b/tests/test_parsers/test_grobid_augment_existing_document_parser.py
@@ -89,6 +89,29 @@ def test_processes_full_text(self, mock_request):
             assert m.box_group.metadata.target_id in bib_entry_grobid_ids
         assert mentions_with_targets == 66
 
+        # structured body text (sections, paragraphs, sentences)
+        assert len(augmented_doc.sections) == 20
+        assert len(augmented_doc.paragraphs) == 40
+        assert len(augmented_doc.sentences) == 249
+
+        for section in augmented_doc.sections:
+            assert len(section.headings) == 1
+            if section.id == 0:
+                assert section.headings[0].text == "1. Introduction"
+                assert section.headings[0].box_group.metadata.number == "1."
+                assert section.headings[0].box_group.metadata.title == "Introduction"
+            for paragraph in section.paragraphs:
+                if paragraph.id == 0:
+                    assert paragraph.text.startswith(
+                        "Research in remote sensing has been steadily increasing"
+                    )
+                    assert paragraph.sentences[-1].text.endswith(", etc.")
+                for sentence in paragraph.sentences:
+                    if sentence.id == 0:
+                        assert sentence.text.startswith(
+                            "Research in remote sensing has been steadily increasing"
+                        )
+
     @um.patch("requests.request", side_effect=mock_request)
     def test_passes_if_xml_missing_authors(self, mock_request):
         with open(PDFPLUMBER_DOC_PATH) as f_in:

From c102d355650f2680e9ab39675e2ef773c10f07a9 Mon Sep 17 00:00:00 2001
From: Angele Zamarron
Date: Tue, 15 Aug 2023 14:26:22 -0700
Subject: [PATCH 3/4] versionne

---
 pyproject.toml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pyproject.toml b/pyproject.toml
index 8a0523ea..a1371dbb 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = 'mmda'
-version = '0.9.10'
+version = '0.9.11'
 description = 'MMDA - multimodal document analysis'
 authors = [
     {name = 'Allen Institute for Artificial Intelligence', email = 'contact@allenai.org'},

From 396e739c948c7e02d282d03e14d4a67fab23d3e8 Mon Sep 17 00:00:00 2001
From: Angele Zamarron
Date: Tue, 15 Aug 2023 15:16:12 -0700
Subject: [PATCH 4/4] meant for this to be True

---
 src/mmda/parsers/grobid_augment_existing_document_parser.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mmda/parsers/grobid_augment_existing_document_parser.py b/src/mmda/parsers/grobid_augment_existing_document_parser.py
index 452ee5f3..24f3ca27 100644
--- a/src/mmda/parsers/grobid_augment_existing_document_parser.py
+++ b/src/mmda/parsers/grobid_augment_existing_document_parser.py
@@ -113,7 +113,7 @@ def _parse_xml_onto_doc(self, xml: str, doc: Document) -> Document:
         )
         doc.annotate(
             headings=box_groups_to_span_groups(
-                heading_box_groups, doc, center=False
+                heading_box_groups, doc, center=True
            )
         )
         doc.annotate(
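For readers unfamiliar with Grobid's TEI output, the `coords` attributes consumed by `_xml_coords_to_boxes` in the patches above are semicolon-separated `page,x,y,w,h` groups. A standalone sketch of that parsing step follows; the `Box` dataclass is a stand-in for mmda's own type, and the shift from Grobid's 1-indexed pages to 0-indexed pages is our assumption about the conversion:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Box:
    # stand-in for mmda's Box: page index plus left/top/width/height
    page: int
    l: float
    t: float
    w: float
    h: float

def xml_coords_to_boxes(coords: str) -> List[Box]:
    """Parse a Grobid TEI @coords attribute ("page,x,y,w,h;page,x,y,w,h;...")
    into one Box per semicolon-separated group.

    Assumes Grobid's 1-indexed page numbers map to 0-indexed pages.
    """
    boxes = []
    for group in coords.split(";"):
        page, x, y, w, h = group.split(",")
        boxes.append(Box(page=int(page) - 1,
                         l=float(x), t=float(y),
                         w=float(w), h=float(h)))
    return boxes
```

Each TEI `<head>`, `<p>`, and `<s>` element carries such an attribute, which is why the patch can build section, paragraph, and sentence box groups from the same helper.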