360AILABNLP/FlowCE

FlowCE: First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models

Code License Data License

As Multimodal Large Language Models (MLLMs) have advanced, their general capabilities have grown increasingly powerful, and numerous evaluation systems have emerged to assess their various abilities. However, there is still no comprehensive method for evaluating MLLMs on flowchart-related tasks, even though flowcharts are very important in daily life and work. We propose FlowCE, the first comprehensive method for assessing MLLMs across multiple dimensions of flowchart-related tasks. It evaluates MLLMs' abilities in Reasoning, Localization Recognition, Information Extraction, Logical Verification, and Summarization on flowcharts. We find that even GPT-4o achieves a score of only 56.63. Among open-source models, Phi-3-Vision obtains the highest score, 49.97. We hope that FlowCE can contribute to future research on MLLMs for flowchart-based tasks.

Release

Usage and License Notices: The data, code, and checkpoint are intended and licensed for research use only. Their use is also restricted to uses that comply with the license agreements of LLaMA, Vicuna, GPT-4, Qwen, and LLaVA.

FlowCE Benchmark Dataset

FlowCE is built upon 500 real-world flowcharts, chosen to ensure ample diversity among the charts. It covers five dimensions of real flowchart scenarios: reasoning, information extraction, localization recognition, summarization, and logical verification.

The full datasets can be obtained from the following address: https://github.com/360AILABNLP/FlowCE/tree/main/flowce_benchmark

Note: The image data used in this work is solely for scientific research purposes, and only the source URL of each image is made available. To obtain the images, download them from the image URL files provided for each of the 5 tasks.
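The download step can be sketched in Python as follows. This is a minimal sketch, not part of the FlowCE repository: it assumes each URL file is plain text with one image URL per line, which the repository may or may not follow, and the function names `read_urls`, `local_name`, and `download_all` are hypothetical helpers introduced here for illustration.

```python
import os
import urllib.request
from urllib.parse import urlparse


def read_urls(path):
    """Return the URLs in a text file, ASSUMING one URL per line.

    Blank lines and lines starting with '#' are skipped.
    """
    urls = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                urls.append(line)
    return urls


def local_name(url, index):
    """Derive a local file name from the URL path.

    The zero-padded index prefix avoids collisions between URLs
    that end in the same base name.
    """
    base = os.path.basename(urlparse(url).path) or "image"
    return f"{index:04d}_{base}"


def download_all(url_file, out_dir):
    """Download every listed URL into out_dir; return the URLs that failed."""
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    for i, url in enumerate(read_urls(url_file)):
        target = os.path.join(out_dir, local_name(url, i))
        if os.path.exists(target):
            continue  # already downloaded in an earlier run
        try:
            urllib.request.urlretrieve(url, target)
        except (OSError, ValueError) as err:
            # Source links can go stale; record the failure and move on.
            print(f"skip {url}: {err}")
            failed.append(url)
    return failed
```

Calling `download_all(url_file, out_dir)` once per task, with the path of that task's URL file and an output directory of your choice, would fetch the images for all five tasks. Because the images are hosted at their original sources, some links may no longer resolve, so the returned list of failed URLs is worth checking.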

Process of creating and evaluating FlowCE

[Figure: process of creating and evaluating FlowCE]

Data samples of FlowCE

[Figure: data samples of FlowCE]

FlowCE covers 5 evaluation dimensions. Each evaluation dimension contains human-annotated question-answer pairs.

Result

1. Statistics of compared API-based and open-source MLLMs

[Figure: statistics of the compared API-based and open-source MLLMs]

2. Detailed evaluation results on FlowCE across different models

[Figure: detailed evaluation results on FlowCE across different models]

License

The content of this project itself is licensed under LICENSE.
