Public repository for tracking permits published by the U.S. Army Corps in certain districts.


Wetlands Tracker

Overview

The U.S. Army Corps of Engineers (USACE) evaluates permit applications for any work, including construction and dredging, in the nation's navigable waters. The Wetlands Impact Tracker compiles USACE public notices of those permit applications, which are typically published as PDFs. The data pulled from these notices can help users better understand the impact of development projects on sensitive areas by surfacing and summarizing individual notices and aggregating data across them. We encourage you to use this tool to explore how development projects are affecting the communities you work with and live in.

Data

All data powering the dashboard is located in an AWS S3 bucket with public read access. Here are the links to the CSVs:

Prerequisites and Installation

  • Install Python: If you don't have Python installed, download and install it from the official Python website.
  • Clone or Download the Repository:
    • If Git is installed, clone the repository using the following command in Git Bash on Windows or terminal in other systems:
      cd the-path-you-would-like-to-hold-the-repository
      git clone https://github.com/AtlasPublicPolicy/wetlands-tracker.git
      
    • If you don't have Git installed, you can download the repository as a ZIP file from the GitHub page. Click on the "Code" button and select "Download ZIP." After downloading, extract the ZIP file to the directory of your choice.
  • Set Up and Activate a Virtual Environment in PowerShell or Command Prompt on Windows or in terminal in Other Systems:
    • Create a new virtual environment:
      # Navigate to the project directory:
      cd the-path-you-hold-the-repository
      
      # Create a virtual environment:
      virtualenv venv
      
      # or
      python -m venv venv
      
    • Activate the virtual environment:
      Windows:
      .\venv\Scripts\activate
      
      macOS and Linux
      source venv/bin/activate
      
  • Set up an AWS S3 bucket:
    • A folder to place the scraped data: dashboard-data
    • A folder to place notice PDFs: full-pdf
      NOTE: If you do not want to use an AWS S3 bucket, you can store the data locally instead: uncomment the 8th parameter in the configuration (self.directory = "data_schema/") and the export call in main() ([main_extractor.dataframe_to_csv(main_tbls[df_name], df_name, config.directory) for df_name in main_tbls]).
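
The local-storage fallback can be sketched as follows. This is a hypothetical, stdlib-only illustration of what an export helper like main_extractor.dataframe_to_csv might do; the function signature and the table shape (a list of dicts) are assumptions, not the project's actual implementation.

```python
import csv
import os

def dataframe_to_csv(rows, name, directory="data_schema/"):
    """Write one table to <directory>/<name>.csv (hypothetical sketch).

    rows is assumed to be a list of dicts sharing the same keys;
    the real project likely passes DataFrame-like objects instead.
    """
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"{name}.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return path
```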

Usage

In an active virtual environment:

  1. Set up the configuration in main.py:

    • Create a file named "api_key.env" and provide the following keys:

      • AZURE_API_KEY=your_azure_api_key
      • AZURE_ENDPOINT=https://your-azure-endpoint.com
      • AWS_ACCESS_KEY_ID=your_aws_access_key_id
      • AWS_SECRET_ACCESS_KEY=your_aws_secret_access_key
      • OPENAI_API_KEY=your_openai_api_key
      • REDIVIS_API_KEY=your_redivis_api_key
    • Modify the following parameters as needed:

      • update: Whether to scrape all historical notices or only recently updated ones: 0, first-time scraping (all historical notices); 1, update only; default is 1. Note: Be cautious when setting update to 0 (scraping all historical notices), as the run may take an extended period and incur high costs for Azure and LLM services.

      • n_days: How many days in the past to search for updated notices: a number from 0 to 500; default is 14.

      • max_notices: Maximum number of notices (sorted by date) to download.

      • district: Which district to scrape: "New Orleans", "Galveston", "Jacksonville", "Mobile", or "all"; default is "all".

      • tbl_to_upload: Which table(s) to upload to Redivis: any of the tables in the list ["main", "manager", "character", "mitigation", "location", "fulltext", "summary", "wetland", "embed", "validation", "aws", "geocoded"], "none", or "all"; default is "all".

      • price_cap: Price cap (in $) for Azure summarization; default is 5.

      • n_sentences: Number of sentences in each summary; default is 4.

      • directory: File directory; default is "data_schema/".

      • overwrite_redivis: Overwrite files with the same name on Redivis: 1, yes; 0, no; default is 0 (no).

      • skipPaid: Skip paid services, including OpenAI and Azure summaries: 1, skip; 0, do not skip; default is 0.

      • tesseract_path: If you have problems running OCR (Optical Character Recognition), specify the path to tesseract.exe, such as "C:/Program Files/Tesseract-OCR/tesseract.exe"; default is None.

      • GPT_MODEL_SET: GPT model to use; default is "gpt-3.5-turbo-0613".
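
Taken together, the parameters above might be collected into a configuration object along these lines. This is a hedged sketch using a dataclass; the class name Config and the max_notices default are assumptions (the field names and other defaults follow the list above), and the actual main.py may structure its configuration differently.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Config:
    # Defaults mirror the parameter list above.
    update: int = 1                  # 1 = recently updated only; 0 = first-time scrape
    n_days: int = 14                 # look-back window, 0 to 500 days
    max_notices: int = 100           # assumed example cap; no default is documented
    district: str = "all"
    tbl_to_upload: str = "all"
    price_cap: float = 5.0           # dollars, for Azure summarization
    n_sentences: int = 4
    directory: str = "data_schema/"
    overwrite_redivis: int = 0
    skipPaid: int = 0
    tesseract_path: Optional[str] = None
    GPT_MODEL_SET: str = "gpt-3.5-turbo-0613"
```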

  2. Run main.py in the virtual environment:

    (venv) $ python main.py
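
At startup, main.py needs the keys from api_key.env available in its environment. A minimal stdlib-only sketch of that loading step (the project's real code may instead use a library such as python-dotenv; the function name here is hypothetical):

```python
import os

def load_env_file(path="api_key.env"):
    """Load simple KEY=VALUE lines into os.environ (hypothetical sketch)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and malformed lines
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault so variables already in the environment take precedence
            os.environ.setdefault(key.strip(), value.strip())
```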
    

File Descriptions

  • requirements.txt: Lists all Python dependencies required for running the project.
  • Other scripts: workflow

Troubleshooting

  • log.txt: You can find messages, warnings, and errors here.
  • error_report.md: This file captures potential problems with the PDF reading process, special notices, regex patterns, and LLM performance. These issues do not stop the run and are not reported in log.txt.

Contributing

Users are encouraged to report issues directly in the GitHub repository. We plan to maintain this repository at least through 2024. While we welcome pull requests, we cannot guarantee that they will be reviewed or accepted in a timely manner.

Contact

Please reach out to us at [email protected] with any questions or comments.
