# Evaluation of the table extraction

## Evaluate extractions with the streamlit eval app

To get started, run the streamlit eval app as:

```
streamlit run eval/eval_app.py eval/data/data_step2_before-currency-unit_eval.csv
```

This app allows you to visually compare tables extracted via multiple methodologies and for multiple reports. It needs two input files (only one is mandatory):
- *[Optional]* The REF data file `data_step2_before-currency-unit_eval.csv` is a cleaned-up version of `data_step2_before-currency-unit.csv`. The latter contains reference data extracted and manually cleaned up by the TaxObservatory team, and lets you benchmark the extractions against it.
- *[Mandatory]* At launch, the app will ask you to provide a pickle file with extracted data. Select `eval_20240408_200249.plk` in the `eval/data/` directory to get started easily, without having to generate evaluation data yourself!
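The exact layout of the pickled data is whatever the evaluation script serializes; as a minimal sketch of the idea (the report-to-algorithm-to-table mapping below is a hypothetical layout, not the documented schema), such a file can be written and inspected like this:

```python
import os
import pickle
import tempfile

# Stand-in for a pickle produced by the evaluation pipeline: a mapping from
# report name to the tables each extraction algorithm produced.
# NOTE: this layout and the algorithm names are invented for illustration.
assets = {
    "report_a.pdf": {
        "algo_one": [["country", "profit"], ["FR", "100"]],
        "algo_two": [["country", "profit"], ["FR", "100"]],
    }
}

path = os.path.join(tempfile.mkdtemp(), "eval_demo.pkl")
with open(path, "wb") as f:
    pickle.dump(assets, f)

# Load it back, as the eval app does with the file you select at launch.
with open(path, "rb") as f:
    loaded = pickle.load(f)

for report, extractions in loaded.items():
    for algo, table in extractions.items():
        print(report, algo, "rows:", len(table))
```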

## Generate your own evaluation data

You can instead generate your own pickle file containing extracted data.

### Setup

Install the following package, which is used to generate the PDF output files.

```
apt-get install wkhtmltopdf
```

### Data generation

Run the `eval_table_extraction.py` script. It iterates through several reports and applies the set of table extraction algorithms provided in your yaml configuration. Check out `configs/eval_table_extraction.yaml` for a suitable yaml configuration.
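The authoritative schema is the `configs/eval_table_extraction.yaml` file itself. Purely as a hypothetical illustration of the idea (every key and algorithm name below is invented, not taken from the project), such a configuration pairs extraction algorithms with their parameters:

```yaml
# Hypothetical sketch only -- see configs/eval_table_extraction.yaml
# for the real schema. Keys and algorithm names here are invented.
table_extraction:
  - type: AlgorithmA
    params:
      flavor: stream
  - type: AlgorithmB
    params:
      model: default
```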

You can run the script as:

```
python eval/eval_table_extraction.py configs/eval_table_extraction.yaml ./example_set/inputs/ ./example_set/extractions
```

`data_step2_before-currency-unit_eval.csv` is a cleaned up version of the `data_step2_before-currency-unit.csv` file which contains reference data extracted and manually cleaned up by the TaxObservatory team. | ||
This will apply the pipeline for all the reports in the `./example_set/inputs` directory and save : | ||
|
||
- the extracted tables from all the algorithms, in one output PDF file per input report, in the `./example_set/extractions` directory
- all the extracted assets, in a pickle file `eval_xxxx.pkl` located in the `eval/data/` directory
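With a pickle of extractions and the REF data in hand, benchmarking comes down to comparing each extracted table against its reference cell by cell. A minimal, self-contained sketch of that idea (the metric and the table layout are illustrative assumptions, not the eval app's actual logic):

```python
def cell_accuracy(extracted, reference):
    """Fraction of reference cells reproduced at the same position.

    Tables are represented as lists of rows (lists of strings) --
    an illustrative layout, not the project's actual data structure.
    """
    if not reference:
        return 0.0
    total = sum(len(row) for row in reference)
    hits = 0
    for i, row in enumerate(reference):
        for j, cell in enumerate(row):
            try:
                if extracted[i][j] == cell:
                    hits += 1
            except IndexError:
                # Extracted table is missing this row or column.
                pass
    return hits / total

reference = [["country", "profit"], ["FR", "100"], ["DE", "250"]]
extracted = [["country", "profit"], ["FR", "100"], ["DE", "205"]]

print(cell_accuracy(extracted, reference))  # 5 of 6 cells match
```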