使用Python打造一款集成 PDF转换、编辑、加密、解密、图片提取、日志追踪 等多个功能于一体的桌面工具应用(Tkinter + ttkbootstrap + PyPDF2 等库)。
✨项目背景与开发动机
在日常办公或学习中,我们经常会遇到各种关于PDF文件的操作需求,比如:
将 PDF 转换成 Word 文档编辑
合并多个PDF为一个
分割PDF,按页导出
去掉PDF水印、加密、解密
从PDF中提取图片
将PDF每一页转为图片
旋转/删除指定页面
记录最近处理文件和操作日志
虽然市场上存在很多PDF软件,但免费+轻量+跨平台的桌面工具仍较少。于是我决定用 Python 打造一个集所有功能于一身的 PDF 工具箱,方便自己和他人使用!
🛠️主要技术栈与依赖
模块 | 功能 |
---|---|
tkinter + ttkbootstrap |
GUI 构建 |
tkinterDnD2 |
拖拽支持 |
PyPDF2 , pdfplumber , fitz(PyMuPDF) |
PDF编辑核心 |
pdf2image , PIL |
PDF转图片 |
docx |
Word文档生成 |
threading , json , os |
多线程处理和状态记录 |
📂功能结构总览
该程序结构如下图(部分简化):
PDFToolboxApp
├── PDF 转 Word(支持去水印)
├── PDF 分割(每页单独导出)
├── PDF 合并
├── PDF 加密 / 解密
├── 提取 PDF 中的图片
├── 页面旋转
├── 删除指定页面
├── PDF 转图片
└── 查看日志记录
接下来我们分模块详细介绍每个功能实现。
🧾基础组件与通用功能
🔹 日志与历史文件记录
log_action(action, file)
每次用户进行操作(如打开、转换文件)都记录到日志文件pdf_toolbox_logs.txt
中。add_to_history(filepath)
最近打开文件记录在pdf_toolbox_history.json
中,便于下次快速访问。get_recent_history()
读取最近10个操作过的文件。
🔹 去水印功能
def remove_watermark(text):
WATERMARK_KEYWORDS = ["Confidential", "Watermark", "Do Not Copy", "内部资料", "机密"]
return '\n'.join(line for line in text.split('\n') if not any(wm in line for wm in WATERMARK_KEYWORDS))
📄PDF 转 Word(支持去水印)
pdfplumber.open(path) → page.extract_text() → remove_watermark() → Word文档
使用
pdfplumber
提取 PDF 每一页文本内容可勾选“去水印”选项过滤掉机密/水印文本
保存为
.docx
格式,支持拖拽上传PDF
✂️PDF 分割
每页导出为独立文件:
for i, page in enumerate(reader.pages):
writer.add_page(page)
writer.write(f"page_{i+1}.pdf")
用户选择PDF并设置导出目录后,将每一页保存为独立PDF文件。
➕PDF 合并
多文件合并为一个:
for f in files:
for p in PdfReader(f).pages:
writer.add_page(p)
支持多选文件,按照选择顺序合并。
🔐PDF 加密 / 解密
加密: 输入密码 → 写入新PDF
解密: 输入正确密码 → 导出解密版本
writer.encrypt(password) reader.decrypt(password)
🖼️提取PDF图片
使用 fitz
(PyMuPDF)逐页扫描PDF,提取嵌入图片并保存为 PNG:
for page_index in range(len(doc)):
for img in doc[page_index].get_images(full=True):
pix = fitz.Pixmap(doc, img[0]) pix.save(...)
🔄页面旋转
用户输入:
页码:
1,2,3
角度:
90
代码示例:
if i + 1 in pages:
page.rotate(angle)
🗑️删除指定页面
输入页码列表,如 2,5,7
,将这些页从PDF中删除并保存新版本。
🖼️PDF 转图片(每页转 PNG)
images = convert_from_path(file)
for i, img in enumerate(images):
img.save(f"page_{i+1}.png")
适合需要提取PDF为PPT使用或网页插图等。
🧾日志查看
简单读取 pdf_toolbox_logs.txt
文件,展示操作记录,便于溯源。
with open(LOG_FILE) as f: self.log_text.insert(tk.END, f.read())
🧵多线程支持
通过 @threaded
装饰器包装耗时操作(如转换)避免 GUI 卡死:
def threaded(fn):
def wrapper(*args, **kwargs):
threading.Thread(target=fn, args=args, kwargs=kwargs).start()
🎬运行入口
if __name__ == '__main__':
app = TkinterDnD.Tk()
PDFToolboxApp(app)
app.mainloop()
使用 TkinterDnD2
支持拖拽,整体体验更为现代化。
✅总结亮点
支持所有常用 PDF 操作(转换、编辑、处理、日志)
界面清晰、操作直观、支持拖拽上传
轻量级、不依赖大型软件,可本地私有化使用
拥有日志与历史记录,适合办公自动化场景
📦部署建议
安装依赖:
pip install tkinterdnd2 ttkbootstrap PyPDF2 pdfplumber python-docx pymupdf pdf2image pillow
如果你要将其打包为桌面可执行程序:
pyinstaller -F -w pdf_toolbox.py
如果你觉得这个项目实用,欢迎点赞、评论支持 🧡
如需源码或扩展功能可以私信我,我会继续更新!
以下为完整代码:
import tkinter as tk
from tkinter import filedialog, messagebox, simpledialog
from tkinterdnd2 import TkinterDnD, DND_FILES
import ttkbootstrap as ttk
from ttkbootstrap.constants import *
from PyPDF2 import PdfReader, PdfWriter
import pdfplumber
from docx import Document
import fitz # PyMuPDF
from pdf2image import convert_from_path
from PIL import Image
import os
import threading
import json
from datetime import datetime
# 日志 & 历史记录配置
LOG_FILE = "pdf_toolbox_logs.txt"
HISTORY_FILE = "pdf_toolbox_history.json"
def log_action(action: str, file: str = ""):
with open(LOG_FILE, "a", encoding="utf-8") as f:
now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
f.write(f"[{now}] {action}: {file}\n")
def add_to_history(filepath: str):
filepath = os.path.abspath(filepath)
history = []
if os.path.exists(HISTORY_FILE):
with open(HISTORY_FILE, "r", encoding="utf-8") as f:
try:
data = json.load(f)
history = data.get("recent_files", [])
except:
history = []
if filepath not in history:
history.insert(0, filepath)
history = history[:10]
with open(HISTORY_FILE, "w", encoding="utf-8") as f:
json.dump({"recent_files": history}, f, indent=2, ensure_ascii=False)
def get_recent_history():
if os.path.exists(HISTORY_FILE):
with open(HISTORY_FILE, "r", encoding="utf-8") as f:
try:
data = json.load(f)
return data.get("recent_files", [])
except:
return []
return []
def remove_watermark(text):
WATERMARK_KEYWORDS = ["Confidential", "Watermark", "Do Not Copy", "内部资料", "机密"]
lines = text.split('\n')
return '\n'.join(line for line in lines if not any(wm in line for wm in WATERMARK_KEYWORDS))
def threaded(fn):
def wrapper(*args, **kwargs):
threading.Thread(target=fn, args=args, kwargs=kwargs).start()
return wrapper
class PDFToolboxApp:
def __init__(self, root):
self.root = root
self.root.title("PDF 工具箱 🧰 全功能版")
self.root.geometry("900x650")
self.root.resizable(True, True)
self.build_tabs()
def build_tabs(self):
self.notebook = ttk.Notebook(self.root, bootstyle="primary")
self.notebook.pack(fill=tk.BOTH, expand=True, padx=10, pady=10)
self.pages = {}
features = {
"PDF 转 Word": self.build_pdf_to_word_tab,
"PDF 分割": self.build_split_tab,
"PDF 合并": self.build_merge_tab,
"PDF 加密": self.build_encrypt_tab,
"PDF 解密": self.build_decrypt_tab,
"提取图片": self.build_extract_images_tab,
"页面旋转": self.build_rotate_tab,
"删除页面": self.build_delete_pages_tab,
"PDF 转图片": self.build_pdf_to_image_tab,
"查看日志": self.build_log_viewer_tab
}
for title, builder in features.items():
frame = ttk.Frame(self.notebook, padding=20)
self.pages[title] = frame
self.notebook.add(frame, text=title)
builder(frame)
def select_file(self, variable):
file = filedialog.askopenfilename(filetypes=[("PDF files", "*.pdf")])
if file:
variable.set(file)
add_to_history(file)
log_action("选择文件", file)
def on_drop_file(self, event, variable):
filepath = event.data.strip('{}')
if os.path.isfile(filepath):
variable.set(filepath)
add_to_history(filepath)
log_action("拖拽导入", filepath)
def build_pdf_to_word_tab(self, frame):
ttk.Label(frame, text="PDF 转 Word(支持拖拽、可去水印)", font=("Arial", 12, "bold")).pack(pady=10)
self.pdf_path_var = tk.StringVar()
entry = ttk.Entry(frame, textvariable=self.pdf_path_var)
entry.pack(fill='x', expand=True, padx=10, pady=5)
# 拖拽支持
entry.drop_target_register(DND_FILES)
entry.dnd_bind('<<Drop>>', lambda e: self.on_drop_file(e, self.pdf_path_var))
ttk.Button(frame, text="选择 PDF 文件", command=lambda: self.select_file(self.pdf_path_var)).pack(pady=5)
self.remove_wm_var = tk.BooleanVar(value=True)
ttk.Checkbutton(frame, text="去除常见水印", variable=self.remove_wm_var).pack(pady=5)
self.progress = ttk.Progressbar(frame, bootstyle="info-striped")
self.progress.pack(fill='x', expand=True, padx=10, pady=10)
ttk.Button(frame, text="转换为 Word", command=self.run_pdf_to_word).pack(pady=5)
@threaded
def run_pdf_to_word(self):
path = self.pdf_path_var.get()
if not path.endswith(".pdf"):
messagebox.showwarning("警告", "请选择 PDF 文件")
return
save_path = filedialog.asksaveasfilename(defaultextension=".docx")
if not save_path:
return
def update_progress(val): self.progress['value'] = val
try:
document = Document()
with pdfplumber.open(path) as pdf:
for i, page in enumerate(pdf.pages):
text = page.extract_text() or ""
if self.remove_wm_var.get():
text = remove_watermark(text)
document.add_paragraph(text)
update_progress((i + 1) / len(pdf.pages) * 100)
document.save(save_path)
messagebox.showinfo("完成", "Word 文件已保存!")
log_action("PDF 转 Word", path)
except Exception as e:
messagebox.showerror("错误", str(e))
def build_split_tab(self, frame):
ttk.Label(frame, text="PDF 分割(每页导出为独立文件)", font=("Arial", 12)).pack(pady=10)
ttk.Button(frame, text="选择 PDF 文件", command=self.split_pdf_gui).pack(pady=5)
def split_pdf_gui(self):
file = filedialog.askopenfilename(filetypes=[("PDF Files", "*.pdf")])
if not file:
return
output_dir = filedialog.askdirectory()
if not output_dir:
return
try:
reader = PdfReader(file)
for i, page in enumerate(reader.pages):
writer = PdfWriter()
writer.add_page(page)
out_path = os.path.join(output_dir, f"page_{i+1}.pdf")
with open(out_path, "wb") as f:
writer.write(f)
messagebox.showinfo("完成", "分割成功!")
log_action("PDF 分割", file)
except Exception as e:
messagebox.showerror("错误", str(e))
def build_merge_tab(self, frame):
ttk.Label(frame, text="PDF 合并(多个 PDF 合为一个)", font=("Arial", 12)).pack(pady=10)
ttk.Button(frame, text="选择多个 PDF 文件", command=self.merge_pdfs_gui).pack(pady=5)
def merge_pdfs_gui(self):
files = filedialog.askopenfilenames(filetypes=[("PDF Files", "*.pdf")])
if not files:
return
output = filedialog.asksaveasfilename(defaultextension=".pdf")
if not output:
return
try:
writer = PdfWriter()
for f in files:
reader = PdfReader(f)
for p in reader.pages:
writer.add_page(p)
with open(output, "wb") as f:
writer.write(f)
messagebox.showinfo("完成", "合并成功!")
log_action("PDF 合并", ",".join(files))
except Exception as e:
messagebox.showerror("错误", str(e))
def build_encrypt_tab(self, frame):
ttk.Label(frame, text="PDF 加密(设置访问密码)", font=("Arial", 12)).pack(pady=10)
ttk.Button(frame, text="选择 PDF 文件", command=self.encrypt_pdf_gui).pack(pady=5)
def encrypt_pdf_gui(self):
file = filedialog.askopenfilename(filetypes=[("PDF Files", "*.pdf")])
if not file:
return
password = simpledialog.askstring("加密密码", "请输入加密密码:", show="*")
if not password:
return
output = filedialog.asksaveasfilename(defaultextension=".pdf")
if not output:
return
try:
reader = PdfReader(file)
writer = PdfWriter()
for page in reader.pages:
writer.add_page(page)
writer.encrypt(password)
with open(output, "wb") as f:
writer.write(f)
messagebox.showinfo("完成", "加密成功!")
log_action("PDF 加密", file)
except Exception as e:
messagebox.showerror("错误", str(e))
def build_decrypt_tab(self, frame):
ttk.Label(frame, text="PDF 解密(需要密码)", font=("Arial", 12)).pack(pady=10)
ttk.Button(frame, text="选择加密 PDF 文件", command=self.decrypt_pdf_gui).pack(pady=5)
def decrypt_pdf_gui(self):
file = filedialog.askopenfilename(filetypes=[("PDF Files", "*.pdf")])
if not file:
return
password = simpledialog.askstring("解密密码", "请输入密码:", show="*")
if not password:
return
output = filedialog.asksaveasfilename(defaultextension=".pdf")
if not output:
return
try:
reader = PdfReader(file)
if reader.is_encrypted:
reader.decrypt(password)
writer = PdfWriter()
for page in reader.pages:
writer.add_page(page)
with open(output, "wb") as f:
writer.write(f)
messagebox.showinfo("完成", "解密成功!")
log_action("PDF 解密", file)
except Exception as e:
messagebox.showerror("错误", str(e))
def build_extract_images_tab(self, frame):
ttk.Label(frame, text="提取 PDF 中的所有图片", font=("Arial", 12)).pack(pady=10)
ttk.Button(frame, text="选择 PDF 文件", command=self.extract_images_gui).pack(pady=5)
def extract_images_gui(self):
file = filedialog.askopenfilename(filetypes=[("PDF Files", "*.pdf")])
if not file:
return
output_dir = filedialog.askdirectory()
if not output_dir:
return
try:
doc = fitz.open(file)
for page_index in range(len(doc)):
for img_index, img in enumerate(doc[page_index].get_images(full=True)):
xref = img[0]
pix = fitz.Pixmap(doc, xref)
output_path = os.path.join(output_dir, f"page{page_index+1}_img{img_index+1}.png")
if pix.n < 5:
pix.save(output_path)
else:
pix = fitz.Pixmap(fitz.csRGB, pix)
pix.save(output_path)
pix = None
messagebox.showinfo("完成", "图片提取成功!")
log_action("提取图片", file)
except Exception as e:
messagebox.showerror("错误", str(e))
def build_rotate_tab(self, frame):
ttk.Label(frame, text="页面旋转(输入页码如 1,2,5)", font=("Arial", 12)).pack(pady=10)
ttk.Button(frame, text="选择 PDF 文件", command=self.choose_rotate_file).pack(pady=5)
self.rotate_pages_var = tk.StringVar()
ttk.Entry(frame, textvariable=self.rotate_pages_var).pack(fill='x', padx=10, pady=5)
self.rotate_angle_var = tk.StringVar(value="90")
ttk.Entry(frame, textvariable=self.rotate_angle_var).pack(fill='x', padx=10, pady=5)
ttk.Button(frame, text="执行旋转", command=self.rotate_pdf_gui).pack(pady=5)
def choose_rotate_file(self):
self.rotate_file = filedialog.askopenfilename(filetypes=[("PDF Files", "*.pdf")])
def rotate_pdf_gui(self):
if not hasattr(self, "rotate_file"):
return
pages = list(map(int, self.rotate_pages_var.get().split(",")))
angle = int(self.rotate_angle_var.get())
output = filedialog.asksaveasfilename(defaultextension=".pdf")
if not output:
return
try:
reader = PdfReader(self.rotate_file)
writer = PdfWriter()
for i, page in enumerate(reader.pages):
if i + 1 in pages:
page.rotate(angle)
writer.add_page(page)
with open(output, "wb") as f:
writer.write(f)
messagebox.showinfo("完成", "旋转成功!")
log_action("页面旋转", self.rotate_file)
except Exception as e:
messagebox.showerror("错误", str(e))
def build_delete_pages_tab(self, frame):
ttk.Label(frame, text="删除指定页(如 2,4,5)", font=("Arial", 12)).pack(pady=10)
ttk.Button(frame, text="选择 PDF 文件", command=self.choose_delete_file).pack(pady=5)
self.delete_pages_var = tk.StringVar()
ttk.Entry(frame, textvariable=self.delete_pages_var).pack(fill='x', padx=10, pady=5)
ttk.Button(frame, text="删除并保存", command=self.delete_pages_gui).pack(pady=5)
def choose_delete_file(self):
self.delete_file = filedialog.askopenfilename(filetypes=[("PDF Files", "*.pdf")])
def delete_pages_gui(self):
if not hasattr(self, "delete_file"):
return
pages_to_delete = list(map(int, self.delete_pages_var.get().split(",")))
output = filedialog.asksaveasfilename(defaultextension=".pdf")
if not output:
return
try:
reader = PdfReader(self.delete_file)
writer = PdfWriter()
for i, page in enumerate(reader.pages):
if i + 1 not in pages_to_delete:
writer.add_page(page)
with open(output, "wb") as f:
writer.write(f)
messagebox.showinfo("完成", "删除成功!")
log_action("删除页面", self.delete_file)
except Exception as e:
messagebox.showerror("错误", str(e))
def build_pdf_to_image_tab(self, frame):
ttk.Label(frame, text="PDF 转图片(每页保存为 PNG)", font=("Arial", 12)).pack(pady=10)
ttk.Button(frame, text="选择 PDF 文件", command=self.convert_pdf_to_images).pack(pady=5)
def convert_pdf_to_images(self):
file = filedialog.askopenfilename(filetypes=[("PDF Files", "*.pdf")])
if not file:
return
outdir = filedialog.askdirectory()
if not outdir:
return
try:
images = convert_from_path(file)
for i, img in enumerate(images):
img.save(os.path.join(outdir, f"page_{i+1}.png"), "PNG")
messagebox.showinfo("完成", "转换成功!")
log_action("PDF 转图片", file)
except Exception as e:
messagebox.showerror("错误", str(e))
def build_log_viewer_tab(self, frame):
ttk.Label(frame, text="操作日志查看", font=("Arial", 12)).pack(pady=10)
self.log_text = tk.Text(frame, wrap="word", height=30)
self.log_text.pack(fill='both', expand=True)
ttk.Button(frame, text="刷新日志", command=self.load_logs).pack(pady=5)
def load_logs(self):
if os.path.exists(LOG_FILE):
with open(LOG_FILE, "r", encoding="utf-8") as f:
self.log_text.delete(1.0, tk.END)
self.log_text.insert(tk.END, f.read())
else:
self.log_text.insert(tk.END, "暂无日志记录。")
# 主程序入口
if __name__ == '__main__':
app = TkinterDnD.Tk()
PDFToolboxApp(app)
app.mainloop()