python将pdf转换成word

发布于:2025-03-07 ⋅ 阅读:(71) ⋅ 点赞:(0)

说明:
我计划用python,把pdf文件转换成word文件
step1:把python环境安装好,然后把helloworld跑起来

step2:安装依赖: 首先需要安装必要的Python库,在终端中运行,会开始下载依赖包,等待下载完成

C:\Users\Administrator>pip --version
pip 25.0.1 from C:\Users\Administrator\AppData\Local\Programs\Python\Python312\Lib\site-packages\pip (python 3.12)

C:\Users\Administrator>pip install pdf2docx python-docx

Collecting pdf2docx
  Using cached pdf2docx-0.5.8-py3-none-any.whl.metadata (3.2 kB)
  Downloading fonttools-4.56.0-cp311-cp311-win_amd64.whl.metadata (103 kB)
Collecting numpy>=1.17.2 (from pdf2docx)
  Downloading numpy-2.2.3-cp311-cp311-win_amd64.whl.metadata (60 kB)
Collecting opencv-python-headless>=4.5 (from pdf2docx)
  Using cached opencv_python_headless-4.11.0.86-cp37-abi3-win_amd64.whl.metadata (20 kB)
Collecting fire>=0.3.0 (from pdf2docx)
  Using cached fire-0.7.0.tar.gz (87 kB)
  Preparing metadata (setup.py) ... done
Collecting lxml>=3.1.0 (from python-docx)
  Downloading lxml-5.3.1-cp311-cp311-win_amd64.whl.metadata (3.8 kB)
Collecting typing-extensions>=4.9.0 (from python-docx)
  Using cached typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
Collecting termcolor (from fire>=0.3.0->pdf2docx)
  Using cached termcolor-2.5.0-py3-none-any.whl.metadata (6.1 kB)
Using cached pdf2docx-0.5.8-py3-none-any.whl (132 kB)
Using cached python_docx-1.1.2-py3-none-any.whl (244 kB)
Downloading fonttools-4.56.0-cp311-cp311-win_amd64.whl (2.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.2/2.2 MB 104.8 kB/s eta 0:00:00
Downloading lxml-5.3.1-cp311-cp311-win_amd64.whl (3.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.8/3.8 MB 91.1 kB/s eta 0:00:00
Downloading numpy-2.2.3-cp311-cp311-win_amd64.whl (12.9 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.9/12.9 MB 95.3 kB/s eta 0:00:00
Downloading opencv_python_headless-4.11.0.86-cp37-abi3-win_amd64.whl (39.4 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 39.4/39.4 MB 94.9 kB/s eta 0:00:00
Downloading pymupdf-1.25.3-cp39-abi3-win_amd64.whl (16.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.5/16.5 MB 91.1 kB/s eta 0:00:00
Downloading typing_extensions-4.12.2-py3-none-any.whl (37 kB)
Downloading termcolor-2.5.0-py3-none-any.whl (7.8 kB)
Building wheels for collected packages: fire
  Building wheel for fire (setup.py) ... done
  Created wheel for fire: filename=fire-0.7.0-py3-none-any.whl size=114261 sha256=d635b5056afde4d2d8f1b04a79ad6c18207aef0d16c499c683ef7eac73194231
  Stored in directory: c:\users\administrator\appdata\local\pip\cache\wheels\46\54\24\1624fd5b8674eb1188623f7e8e17cdf7c0f6c24b609dfb8a89
Successfully built fire
Installing collected packages: typing-extensions, termcolor, PyMuPDF, numpy, lxml, fonttools, python-docx, opencv-python-headless, fire, pdf2docx
Successfully installed PyMuPDF-1.25.3 fire-0.7.0 fonttools-4.56.0 lxml-5.3.1 numpy-2.2.3 opencv-python-headless-4.11.0.86 pdf2docx-0.5.8 python-docx-1.1.2 termcolor-2.5.0 typing-extensions-4.12.2


step3:C:\Users\Administrator\PycharmProjects\PythonProject2.venv\Scripts\activate_this.py

# pdf_to_word.py
import os
from pdf2docx import Converter
from datetime import datetime


def convert_pdf_to_word(input_path, output_folder):
    """
    将PDF转换为Word文档

    参数:
    input_path (str): 输入的PDF文件路径
    output_folder (str): 输出文件夹路径

    返回:
    str: 生成的Word文件路径
    """
    try:
        # 生成输出文件名(带时间戳)
        filename = os.path.basename(input_path)
        output_name = f"{os.path.splitext(filename)[0]}_converted_{datetime.now().strftime('%Y%m%d%H%M%S')}.docx"
        output_path = os.path.join(output_folder, output_name)
        # 初始化转换器
        cv = Converter(input_path)

        # 设置进度回调函数
        def progress_callback(percentage):
            print(f"\r转换进度: {percentage:.0f}%", end='', flush=True)

        # 执行转换(多线程优化)
        cv.convert(output_path,
                   start=0,
                   end=None,
                   progress_callback=progress_callback)

        cv.close()
        print(f"\n转换成功!文件已保存至:{output_path}")
        return output_path
    except Exception as e:
        print(f"\n转换失败:{str(e)}")
        return None


if __name__ == "__main__":
    # 输入输出路径配置
    desktop = os.path.join(os.path.expanduser("~"), "Desktop")
    input_pdf = r"C:\Users\Administrator\Desktop\qe.pdf"
    output_dir = desktop
    # 检查输入文件
    if not os.path.exists(input_pdf):
        print(f"错误:输入的PDF文件不存在 - {input_pdf}")
        exit(1)
    # 创建输出目录(如果不存在)
    os.makedirs(output_dir, exist_ok=True)
    # 执行转换
    print("开始PDF转换...")
    result = convert_pdf_to_word(input_pdf, output_dir)

    if result:
        print("操作完成!")
    else:
        print("转换过程中出现问题。")

step4:点击run运行

C:\Users\Administrator\PycharmProjects\PythonProject2\.venv\Scripts\python.exe C:\Users\Administrator\PycharmProjects\PythonProject2\.venv\Scripts\activate_this.py 
[INFO] Start to convert C:\Users\Administrator\Desktop\files.pdf
[INFO] [1/4] Opening document...
[INFO] [2/4] Analyzing document...
开始PDF转换...
[INFO] [3/4] Parsing pages...
[INFO] (1/1) Page 1
[INFO] [4/4] Creating pages...
[INFO] (1/1) Page 1
[INFO] Terminated in 0.10s.

转换成功!文件已保存至:C:\Users\Administrator\Desktop\files_converted_20250306142813.docx
操作完成!

Process finished with exit code 0

end


网站公告

今日签到

点亮在社区的每一天
去签到