Collecte/fix #72

Merged
merged 20 commits into from
Apr 25, 2024

Collaborator

@FS-CS FS-CS commented Apr 22, 2024

Fixes:

New features:

  • CLI args to run the download script with desired parameters
  • URL search cache when running the script with identical search parameters to reduce Google JSON API credit usage
  • Improved Google API usage for more relevant results
  • Detailed logs and results summary of the download run

The README.md was updated to reflect these changes and guide the user.
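To illustrate the URL search cache listed above, here is a minimal sketch of how identical search parameters could be mapped to previously fetched URLs so that the Google JSON API is not queried twice. The names `SearchQuery`, `UrlCache`, `search_with_cache`, and the field names are hypothetical and are not taken from the PR; the actual implementation in `collecte/pdf_downloader.py` may differ.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass(frozen=True)
class SearchQuery:
    """Hypothetical search parameters; the field names are illustrative only."""
    company: str
    keywords: str
    max_results: int = 10

    def key(self) -> str:
        # Stable string key so identical parameters map to the same cache entry.
        return json.dumps(asdict(self), sort_keys=True)


class UrlCache:
    """Small JSON-file cache mapping a query key to a list of result URLs."""

    def __init__(self, path: Path):
        self.path = path
        # Reload earlier results if a cache file already exists.
        self._data = json.loads(path.read_text()) if path.exists() else {}

    def get(self, query: SearchQuery):
        return self._data.get(query.key())

    def put(self, query: SearchQuery, urls) -> None:
        self._data[query.key()] = list(urls)
        self.path.write_text(json.dumps(self._data, indent=2))


def search_with_cache(query: SearchQuery, cache: UrlCache, search_fn):
    """Return cached URLs when available; otherwise call search_fn once and store the result."""
    cached = cache.get(query)
    if cached is not None:
        return cached
    urls = search_fn(query)
    cache.put(query, urls)
    return urls
```

Here `search_fn` stands in for whatever function performs the real API call. A second run with the same parameters returns the stored URLs without calling it again.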

FS-CS and others added 15 commits April 11, 2024 16:08
# Aider chat conversation:

USER: add tqdm==4.66.1 to the "tool.poetry.dependencies" section

ASSISTANT: No changes are needed.
…e PDF is in "www.shell.com" the folder name will be "shell") and add some parameters to the Google search (restrict time and restrict search to PDF)
…the website hosting the PDF. Changes to select results for the last year and only search PDFs
…taclass for query comparison and URL cache reloads
…taclass for query comparison and URL cache reloads
Implement the new destination folder structure based on the URL source website in order to address issue no. 52 + other API improvements.
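The folder naming described in the commit messages above (a PDF hosted on "www.shell.com" goes into a folder named "shell") could be sketched as follows. The function name `folder_name_from_url` and the fallback value `"unknown"` are assumptions for illustration, and the heuristic is deliberately naive: it takes the label just before the top-level domain, so it would mishandle hosts such as "example.co.uk".

```python
from urllib.parse import urlparse


def folder_name_from_url(url: str) -> str:
    """Derive a folder name from the host of a URL (naive heuristic, see caveat above)."""
    host = urlparse(url).hostname or ""
    parts = [p for p in host.split(".") if p]
    # Drop a leading "www" so "www.shell.com" and "shell.com" behave the same.
    if parts and parts[0] == "www":
        parts = parts[1:]
    if len(parts) >= 2:
        # Label immediately before the top-level domain, e.g. "shell" in "shell.com".
        return parts[-2]
    # Single-label hosts are used as-is; missing hosts fall back to a placeholder.
    return parts[0] if parts else "unknown"
```

With this sketch, `folder_name_from_url("https://www.shell.com/report.pdf")` returns `"shell"`.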
README.md (review comment resolved)
collecte/pdf_downloader.py (review comment outdated, resolved)
Contributor

RonanMorgan commented Apr 22, 2024

Could you resolve the conflicts before I merge, please? (If that's OK with you, @FS-CS @Phiphigengen.)

FS-CS and others added 5 commits April 24, 2024 22:06
- Add browser-like headers to GET request to avoid some timeouts
- Add test/data/company_names.csv for newcomers' comfort
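As a rough illustration of the "browser-like headers" commit above, the sketch below attaches typical browser headers to a GET request. It uses the standard library's `urllib.request` only to stay self-contained; the PR may well use a different HTTP client, and the header values, `build_request`, and `fetch` are assumptions rather than the code that was merged.

```python
import urllib.request

# Illustrative header values; the ones used in the PR are not shown in this conversation.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"
    ),
    "Accept": "application/pdf,text/html;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that carries browser-like headers."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS, method="GET")


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download the body of url; the timeout is in seconds."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

Some servers stall or reject requests that carry a default client User-Agent, which is one plausible reason such headers reduce timeouts.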
Collaborator Author

FS-CS commented Apr 24, 2024

PR ready to merge.

@RonanMorgan RonanMorgan merged commit 233f116 into main Apr 25, 2024
2 checks passed
@RonanMorgan RonanMorgan deleted the collecte/fix branch April 25, 2024 08:22
4 participants