Python全功能PDF工具箱GUI:支持转换、加密、旋转、图片提取、日志记录等多功能操作

发布于:2025-04-20 ⋅ 阅读:(35) ⋅ 点赞:(0)

使用Python打造一款集成 PDF转换、编辑、加密、解密、图片提取、日志追踪 等多个功能于一体的桌面工具应用(Tkinter + ttkbootstrap + PyPDF2 等库)。

✨项目背景与开发动机

在日常办公或学习中,我们经常会遇到各种关于PDF文件的操作需求,比如:

  • 将 PDF 转换成 Word 文档编辑

  • 合并多个PDF为一个

  • 分割PDF,按页导出

  • 去掉PDF水印、加密、解密

  • 从PDF中提取图片

  • 将PDF每一页转为图片

  • 旋转/删除指定页面

  • 记录最近处理文件和操作日志

虽然市场上存在很多PDF软件,但免费+轻量+跨平台的桌面工具仍较少。于是我决定用 Python 打造一个集所有功能于一身的 PDF 工具箱,方便自己和他人使用!

🛠️主要技术栈与依赖

模块 功能
tkinter + ttkbootstrap GUI 构建
tkinterDnD2 拖拽支持
PyPDF2, pdfplumber, fitz(PyMuPDF) PDF编辑核心
pdf2image, PIL PDF转图片
docx Word文档生成
threading, json, os 多线程处理和状态记录

📂功能结构总览

该程序结构如下图(部分简化):

PDFToolboxApp
├── PDF 转 Word(支持去水印)
├── PDF 分割(每页单独导出)
├── PDF 合并
├── PDF 加密 / 解密
├── 提取 PDF 中的图片
├── 页面旋转
├── 删除指定页面
├── PDF 转图片
└── 查看日志记录

接下来我们分模块详细介绍每个功能实现。


🧾基础组件与通用功能

🔹 日志与历史文件记录

  • log_action(action, file)
    每次用户进行操作(如打开、转换文件)都记录到日志文件 pdf_toolbox_logs.txt 中。

  • add_to_history(filepath)
    最近打开文件记录在 pdf_toolbox_history.json 中,便于下次快速访问。

  • get_recent_history()
    读取最近10个操作过的文件。

🔹 去水印功能

def remove_watermark(text):
    WATERMARK_KEYWORDS = ["Confidential", "Watermark", "Do Not Copy", "内部资料", "机密"]
    return '\n'.join(line for line in text.split('\n') if not any(wm in line for wm in WATERMARK_KEYWORDS))

📄PDF 转 Word(支持去水印)

pdfplumber.open(path) → page.extract_text() → remove_watermark() → Word文档

  • 使用 pdfplumber 提取 PDF 每一页文本内容

  • 可勾选“去水印”选项过滤掉机密/水印文本

  • 保存为 .docx 格式,支持拖拽上传PDF


✂️PDF 分割

每页导出为独立文件:

for i, page in enumerate(reader.pages): 
    writer.add_page(page) 
    writer.write(f"page_{i+1}.pdf")

用户选择PDF并设置导出目录后,将每一页保存为独立PDF文件。


➕PDF 合并

多文件合并为一个:

for f in files: 
    for p in PdfReader(f).pages:     
        writer.add_page(p)

支持多选文件,按照选择顺序合并。


🔐PDF 加密 / 解密

  • 加密: 输入密码 → 写入新PDF

  • 解密: 输入正确密码 → 导出解密版本

writer.encrypt(password) reader.decrypt(password)


🖼️提取PDF图片

使用 fitz(PyMuPDF)逐页扫描PDF,提取嵌入图片并保存为 PNG:

for page_index in range(len(doc)): 
    for img in doc[page_index].get_images(full=True): 
        pix = fitz.Pixmap(doc, img[0]) pix.save(...)

🔄页面旋转

用户输入:

  • 页码:1,2,3

  • 角度:90

代码示例:

if i + 1 in pages: 
    page.rotate(angle)


🗑️删除指定页面

输入页码列表,如 2,5,7,将这些页从PDF中删除并保存新版本。


🖼️PDF 转图片(每页转 PNG)

images = convert_from_path(file) 
for i, img in enumerate(images): 
    img.save(f"page_{i+1}.png")

适合需要提取PDF为PPT使用或网页插图等。


🧾日志查看

简单读取 pdf_toolbox_logs.txt 文件,展示操作记录,便于溯源。

with open(LOG_FILE) as f: self.log_text.insert(tk.END, f.read())


🧵多线程支持

通过 @threaded 装饰器包装耗时操作(如转换)避免 GUI 卡死:

def threaded(fn): 
    def wrapper(*args, **kwargs):     
        threading.Thread(target=fn, args=args, kwargs=kwargs).start()


🎬运行入口

if __name__ == '__main__': 
    app = TkinterDnD.Tk() 
    PDFToolboxApp(app) 
    app.mainloop()

使用 TkinterDnD2 支持拖拽,整体体验更为现代化。


✅总结亮点

  • 支持所有常用 PDF 操作(转换、编辑、处理、日志)

  • 界面清晰、操作直观、支持拖拽上传

  • 轻量级、不依赖大型软件,可本地私有化使用

  • 拥有日志与历史记录,适合办公自动化场景


📦部署建议

安装依赖:

pip install tkinterdnd2 ttkbootstrap PyPDF2 pdfplumber python-docx pymupdf pdf2image pillow

如果你要将其打包为桌面可执行程序:

pyinstaller -F -w pdf_toolbox.py

如果你觉得这个项目实用,欢迎点赞、评论支持 🧡
如需源码或扩展功能可以私信我,我会继续更新!

以下为完整代码:

import tkinter as tk
from tkinter import filedialog, messagebox, simpledialog
from tkinterdnd2 import TkinterDnD, DND_FILES
import ttkbootstrap as ttk
from ttkbootstrap.constants import *
from PyPDF2 import PdfReader, PdfWriter
import pdfplumber
from docx import Document
import fitz  # PyMuPDF
from pdf2image import convert_from_path
from PIL import Image
import os
import threading
import json
from datetime import datetime

# 日志 & 历史记录配置
LOG_FILE = "pdf_toolbox_logs.txt"
HISTORY_FILE = "pdf_toolbox_history.json"

def log_action(action: str, file: str = ""):
    with open(LOG_FILE, "a", encoding="utf-8") as f:
        now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        f.write(f"[{now}] {action}: {file}\n")

def add_to_history(filepath: str):
    filepath = os.path.abspath(filepath)
    history = []
    if os.path.exists(HISTORY_FILE):
        with open(HISTORY_FILE, "r", encoding="utf-8") as f:
            try:
                data = json.load(f)
                history = data.get("recent_files", [])
            except:
                history = []
    if filepath not in history:
        history.insert(0, filepath)
        history = history[:10]
    with open(HISTORY_FILE, "w", encoding="utf-8") as f:
        json.dump({"recent_files": history}, f, indent=2, ensure_ascii=False)

def get_recent_history():
    if os.path.exists(HISTORY_FILE):
        with open(HISTORY_FILE, "r", encoding="utf-8") as f:
            try:
                data = json.load(f)
                return data.get("recent_files", [])
            except:
                return []
    return []

def remove_watermark(text):
    WATERMARK_KEYWORDS = ["Confidential", "Watermark", "Do Not Copy", "内部资料", "机密"]
    lines = text.split('\n')
    return '\n'.join(line for line in lines if not any(wm in line for wm in WATERMARK_KEYWORDS))

def threaded(fn):
    def wrapper(*args, **kwargs):
        threading.Thread(target=fn, args=args, kwargs=kwargs).start()
    return wrapper
class PDFToolboxApp:
    def __init__(self, root):
        self.root = root
        self.root.title("PDF 工具箱 🧰 全功能版")
        self.root.geometry("900x650")
        self.root.resizable(True, True)
        self.build_tabs()

    def build_tabs(self):
        self.notebook = ttk.Notebook(self.root, bootstyle="primary")
        self.notebook.pack(fill=tk.BOTH, expand=True, padx=10, pady=10)
        self.pages = {}
        features = {
            "PDF 转 Word": self.build_pdf_to_word_tab,
            "PDF 分割": self.build_split_tab,
            "PDF 合并": self.build_merge_tab,
            "PDF 加密": self.build_encrypt_tab,
            "PDF 解密": self.build_decrypt_tab,
            "提取图片": self.build_extract_images_tab,
            "页面旋转": self.build_rotate_tab,
            "删除页面": self.build_delete_pages_tab,
            "PDF 转图片": self.build_pdf_to_image_tab,
            "查看日志": self.build_log_viewer_tab
        }
        for title, builder in features.items():
            frame = ttk.Frame(self.notebook, padding=20)
            self.pages[title] = frame
            self.notebook.add(frame, text=title)
            builder(frame)

    def select_file(self, variable):
        file = filedialog.askopenfilename(filetypes=[("PDF files", "*.pdf")])
        if file:
            variable.set(file)
            add_to_history(file)
            log_action("选择文件", file)

    def on_drop_file(self, event, variable):
        filepath = event.data.strip('{}')
        if os.path.isfile(filepath):
            variable.set(filepath)
            add_to_history(filepath)
            log_action("拖拽导入", filepath)
    def build_pdf_to_word_tab(self, frame):
        ttk.Label(frame, text="PDF 转 Word(支持拖拽、可去水印)", font=("Arial", 12, "bold")).pack(pady=10)
        self.pdf_path_var = tk.StringVar()
        entry = ttk.Entry(frame, textvariable=self.pdf_path_var)
        entry.pack(fill='x', expand=True, padx=10, pady=5)

        # 拖拽支持
        entry.drop_target_register(DND_FILES)
        entry.dnd_bind('<<Drop>>', lambda e: self.on_drop_file(e, self.pdf_path_var))

        ttk.Button(frame, text="选择 PDF 文件", command=lambda: self.select_file(self.pdf_path_var)).pack(pady=5)

        self.remove_wm_var = tk.BooleanVar(value=True)
        ttk.Checkbutton(frame, text="去除常见水印", variable=self.remove_wm_var).pack(pady=5)

        self.progress = ttk.Progressbar(frame, bootstyle="info-striped")
        self.progress.pack(fill='x', expand=True, padx=10, pady=10)

        ttk.Button(frame, text="转换为 Word", command=self.run_pdf_to_word).pack(pady=5)

    @threaded
    def run_pdf_to_word(self):
        path = self.pdf_path_var.get()
        if not path.endswith(".pdf"):
            messagebox.showwarning("警告", "请选择 PDF 文件")
            return
        save_path = filedialog.asksaveasfilename(defaultextension=".docx")
        if not save_path:
            return
        def update_progress(val): self.progress['value'] = val
        try:
            document = Document()
            with pdfplumber.open(path) as pdf:
                for i, page in enumerate(pdf.pages):
                    text = page.extract_text() or ""
                    if self.remove_wm_var.get():
                        text = remove_watermark(text)
                    document.add_paragraph(text)
                    update_progress((i + 1) / len(pdf.pages) * 100)
            document.save(save_path)
            messagebox.showinfo("完成", "Word 文件已保存!")
            log_action("PDF 转 Word", path)
        except Exception as e:
            messagebox.showerror("错误", str(e))
    def build_split_tab(self, frame):
        ttk.Label(frame, text="PDF 分割(每页导出为独立文件)", font=("Arial", 12)).pack(pady=10)
        ttk.Button(frame, text="选择 PDF 文件", command=self.split_pdf_gui).pack(pady=5)

    def split_pdf_gui(self):
        file = filedialog.askopenfilename(filetypes=[("PDF Files", "*.pdf")])
        if not file:
            return
        output_dir = filedialog.askdirectory()
        if not output_dir:
            return
        try:
            reader = PdfReader(file)
            for i, page in enumerate(reader.pages):
                writer = PdfWriter()
                writer.add_page(page)
                out_path = os.path.join(output_dir, f"page_{i+1}.pdf")
                with open(out_path, "wb") as f:
                    writer.write(f)
            messagebox.showinfo("完成", "分割成功!")
            log_action("PDF 分割", file)
        except Exception as e:
            messagebox.showerror("错误", str(e))

    def build_merge_tab(self, frame):
        ttk.Label(frame, text="PDF 合并(多个 PDF 合为一个)", font=("Arial", 12)).pack(pady=10)
        ttk.Button(frame, text="选择多个 PDF 文件", command=self.merge_pdfs_gui).pack(pady=5)

    def merge_pdfs_gui(self):
        files = filedialog.askopenfilenames(filetypes=[("PDF Files", "*.pdf")])
        if not files:
            return
        output = filedialog.asksaveasfilename(defaultextension=".pdf")
        if not output:
            return
        try:
            writer = PdfWriter()
            for f in files:
                reader = PdfReader(f)
                for p in reader.pages:
                    writer.add_page(p)
            with open(output, "wb") as f:
                writer.write(f)
            messagebox.showinfo("完成", "合并成功!")
            log_action("PDF 合并", ",".join(files))
        except Exception as e:
            messagebox.showerror("错误", str(e))

    def build_encrypt_tab(self, frame):
        ttk.Label(frame, text="PDF 加密(设置访问密码)", font=("Arial", 12)).pack(pady=10)
        ttk.Button(frame, text="选择 PDF 文件", command=self.encrypt_pdf_gui).pack(pady=5)

    def encrypt_pdf_gui(self):
        file = filedialog.askopenfilename(filetypes=[("PDF Files", "*.pdf")])
        if not file:
            return
        password = simpledialog.askstring("加密密码", "请输入加密密码:", show="*")
        if not password:
            return
        output = filedialog.asksaveasfilename(defaultextension=".pdf")
        if not output:
            return
        try:
            reader = PdfReader(file)
            writer = PdfWriter()
            for page in reader.pages:
                writer.add_page(page)
            writer.encrypt(password)
            with open(output, "wb") as f:
                writer.write(f)
            messagebox.showinfo("完成", "加密成功!")
            log_action("PDF 加密", file)
        except Exception as e:
            messagebox.showerror("错误", str(e))

    def build_decrypt_tab(self, frame):
        ttk.Label(frame, text="PDF 解密(需要密码)", font=("Arial", 12)).pack(pady=10)
        ttk.Button(frame, text="选择加密 PDF 文件", command=self.decrypt_pdf_gui).pack(pady=5)

    def decrypt_pdf_gui(self):
        file = filedialog.askopenfilename(filetypes=[("PDF Files", "*.pdf")])
        if not file:
            return
        password = simpledialog.askstring("解密密码", "请输入密码:", show="*")
        if not password:
            return
        output = filedialog.asksaveasfilename(defaultextension=".pdf")
        if not output:
            return
        try:
            reader = PdfReader(file)
            if reader.is_encrypted:
                reader.decrypt(password)
            writer = PdfWriter()
            for page in reader.pages:
                writer.add_page(page)
            with open(output, "wb") as f:
                writer.write(f)
            messagebox.showinfo("完成", "解密成功!")
            log_action("PDF 解密", file)
        except Exception as e:
            messagebox.showerror("错误", str(e))

    def build_extract_images_tab(self, frame):
        ttk.Label(frame, text="提取 PDF 中的所有图片", font=("Arial", 12)).pack(pady=10)
        ttk.Button(frame, text="选择 PDF 文件", command=self.extract_images_gui).pack(pady=5)

    def extract_images_gui(self):
        file = filedialog.askopenfilename(filetypes=[("PDF Files", "*.pdf")])
        if not file:
            return
        output_dir = filedialog.askdirectory()
        if not output_dir:
            return
        try:
            doc = fitz.open(file)
            for page_index in range(len(doc)):
                for img_index, img in enumerate(doc[page_index].get_images(full=True)):
                    xref = img[0]
                    pix = fitz.Pixmap(doc, xref)
                    output_path = os.path.join(output_dir, f"page{page_index+1}_img{img_index+1}.png")
                    if pix.n < 5:
                        pix.save(output_path)
                    else:
                        pix = fitz.Pixmap(fitz.csRGB, pix)
                        pix.save(output_path)
                    pix = None
            messagebox.showinfo("完成", "图片提取成功!")
            log_action("提取图片", file)
        except Exception as e:
            messagebox.showerror("错误", str(e))

    def build_rotate_tab(self, frame):
        ttk.Label(frame, text="页面旋转(输入页码如 1,2,5)", font=("Arial", 12)).pack(pady=10)
        ttk.Button(frame, text="选择 PDF 文件", command=self.choose_rotate_file).pack(pady=5)
        self.rotate_pages_var = tk.StringVar()
        ttk.Entry(frame, textvariable=self.rotate_pages_var).pack(fill='x', padx=10, pady=5)
        self.rotate_angle_var = tk.StringVar(value="90")
        ttk.Entry(frame, textvariable=self.rotate_angle_var).pack(fill='x', padx=10, pady=5)
        ttk.Button(frame, text="执行旋转", command=self.rotate_pdf_gui).pack(pady=5)

    def choose_rotate_file(self):
        self.rotate_file = filedialog.askopenfilename(filetypes=[("PDF Files", "*.pdf")])

    def rotate_pdf_gui(self):
        if not hasattr(self, "rotate_file"):
            return
        pages = list(map(int, self.rotate_pages_var.get().split(",")))
        angle = int(self.rotate_angle_var.get())
        output = filedialog.asksaveasfilename(defaultextension=".pdf")
        if not output:
            return
        try:
            reader = PdfReader(self.rotate_file)
            writer = PdfWriter()
            for i, page in enumerate(reader.pages):
                if i + 1 in pages:
                    page.rotate(angle)
                writer.add_page(page)
            with open(output, "wb") as f:
                writer.write(f)
            messagebox.showinfo("完成", "旋转成功!")
            log_action("页面旋转", self.rotate_file)
        except Exception as e:
            messagebox.showerror("错误", str(e))
    def build_delete_pages_tab(self, frame):
        ttk.Label(frame, text="删除指定页(如 2,4,5)", font=("Arial", 12)).pack(pady=10)
        ttk.Button(frame, text="选择 PDF 文件", command=self.choose_delete_file).pack(pady=5)
        self.delete_pages_var = tk.StringVar()
        ttk.Entry(frame, textvariable=self.delete_pages_var).pack(fill='x', padx=10, pady=5)
        ttk.Button(frame, text="删除并保存", command=self.delete_pages_gui).pack(pady=5)

    def choose_delete_file(self):
        self.delete_file = filedialog.askopenfilename(filetypes=[("PDF Files", "*.pdf")])

    def delete_pages_gui(self):
        if not hasattr(self, "delete_file"):
            return
        pages_to_delete = list(map(int, self.delete_pages_var.get().split(",")))
        output = filedialog.asksaveasfilename(defaultextension=".pdf")
        if not output:
            return
        try:
            reader = PdfReader(self.delete_file)
            writer = PdfWriter()
            for i, page in enumerate(reader.pages):
                if i + 1 not in pages_to_delete:
                    writer.add_page(page)
            with open(output, "wb") as f:
                writer.write(f)
            messagebox.showinfo("完成", "删除成功!")
            log_action("删除页面", self.delete_file)
        except Exception as e:
            messagebox.showerror("错误", str(e))

    def build_pdf_to_image_tab(self, frame):
        ttk.Label(frame, text="PDF 转图片(每页保存为 PNG)", font=("Arial", 12)).pack(pady=10)
        ttk.Button(frame, text="选择 PDF 文件", command=self.convert_pdf_to_images).pack(pady=5)

    def convert_pdf_to_images(self):
        file = filedialog.askopenfilename(filetypes=[("PDF Files", "*.pdf")])
        if not file:
            return
        outdir = filedialog.askdirectory()
        if not outdir:
            return
        try:
            images = convert_from_path(file)
            for i, img in enumerate(images):
                img.save(os.path.join(outdir, f"page_{i+1}.png"), "PNG")
            messagebox.showinfo("完成", "转换成功!")
            log_action("PDF 转图片", file)
        except Exception as e:
            messagebox.showerror("错误", str(e))

    def build_log_viewer_tab(self, frame):
        ttk.Label(frame, text="操作日志查看", font=("Arial", 12)).pack(pady=10)
        self.log_text = tk.Text(frame, wrap="word", height=30)
        self.log_text.pack(fill='both', expand=True)
        ttk.Button(frame, text="刷新日志", command=self.load_logs).pack(pady=5)

    def load_logs(self):
        if os.path.exists(LOG_FILE):
            with open(LOG_FILE, "r", encoding="utf-8") as f:
                self.log_text.delete(1.0, tk.END)
                self.log_text.insert(tk.END, f.read())
        else:
            self.log_text.insert(tk.END, "暂无日志记录。")

# 主程序入口
if __name__ == '__main__':
    app = TkinterDnD.Tk()
    PDFToolboxApp(app)
    app.mainloop()

网站公告

今日签到

点亮在社区的每一天
去签到