PDF Text Extractor

A simple Python script that extracts the text from a PDF file and saves it as a plain text file. It uses the PyMuPDF library (imported in Python as fitz) for text extraction.

Features

Extracts text from each page of a PDF.
Saves the extracted text to a .txt file.
Relies on PyMuPDF to handle different text encodings and complex layouts.
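
As a rough illustration of the features above, page-by-page extraction with PyMuPDF might look like the following. This is a minimal sketch, not the repository's actual script: the function names extract_pages and save_text are hypothetical, and separating pages with a blank line is an assumption.

```python
from pathlib import Path


def extract_pages(pdf_path):
    """Return a list with the plain text of each page, in page order."""
    # PyMuPDF is imported lazily so save_text() can be used without it.
    import fitz

    with fitz.open(pdf_path) as doc:
        return [page.get_text() for page in doc]


def save_text(pages, out_path="extracted_text.txt"):
    """Write the page texts to one UTF-8 file and return the joined text."""
    # Assumption: pages are separated by a blank line in the output.
    text = "\n\n".join(pages)
    Path(out_path).write_text(text, encoding="utf-8")
    return text


# Example use (needs PyMuPDF and a real PDF file):
#     save_text(extract_pages("your_file.pdf"))
```

Writing the file with an explicit UTF-8 encoding avoids depending on the platform default, which matters for PDFs that contain non-ASCII text.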

Prerequisites

Before running the script, make sure you have the following installed:

Python 3.x
PyMuPDF library

Install PyMuPDF with pip:

pip install PyMuPDF
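
To confirm the prerequisites are in place before running the script, a small check along these lines may help. It is only a sketch: the helper name has_module is hypothetical, and it only reports whether Python can locate the package, not whether it works.

```python
import importlib.util
import sys


def has_module(name):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    # PyMuPDF is importable as fitz; recent releases also provide pymupdf.
    for name in ("fitz", "pymupdf"):
        print(name, "found" if has_module(name) else "not found")
```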

Usage

Clone the Repository:

git clone https://github.com/yourusername/pdf-text-extractor.git
cd pdf-text-extractor

Place Your PDF File: Place the PDF file you want to extract text from in the root directory of the repository.

Run the Script: After updating the PDF path in the script, run it with:

python extract_text.py

The path to update is the pdf_path variable in the script:

pdf_path = r'C:\path\to\your\PDF\file.pdf'

Check the Output: The extracted text will be saved in a file named extracted_text.txt in the same directory.
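
A missing or mistyped PDF path is the most likely way for these steps to fail, so the script could validate the input and build the output location up front. The sketch below assumes "the same directory" means the current working directory. The helper resolve_paths is hypothetical and not part of the repository.

```python
from pathlib import Path

OUTPUT_NAME = "extracted_text.txt"  # output file name stated in the README


def resolve_paths(pdf_path, out_dir="."):
    """Check the input PDF and return (pdf_path, output_path) as Path objects."""
    pdf = Path(pdf_path)
    if pdf.suffix.lower() != ".pdf":
        raise ValueError(f"Not a .pdf file: {pdf}")
    if not pdf.is_file():
        raise FileNotFoundError(f"PDF not found: {pdf}")
    # Assumption: the output goes into out_dir (current directory by default).
    return pdf, Path(out_dir) / OUTPUT_NAME
```

Using pathlib also sidesteps the backslash escaping that makes the raw string prefix necessary for Windows paths.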
