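1. PDF Basics
1.1 What Is PDF
PDF (Portable Document Format) is an electronic file format developed by Adobe. It presents a document with a fixed layout, independent of application software, hardware, and operating system. Its main characteristics are:
- Cross-platform compatibility: a file can be viewed on any operating system and keeps the same appearance
- Self-contained documents: text, images, tables, fonts, and all other document elements are carried in the file
- Compactness: several compression techniques are supported
- Security: passwords and permissions can be set
- Interactivity: hyperlinks, forms, multimedia, and other interactive elements are supported
1.2 PDF File Structure
A PDF file has four main parts:
- Header: identifies the PDF version (for example %PDF-1.7)
- Body: the objects that make up the document content (text, images, and so on)
- Cross-reference table: an index of where each object is located in the file
- Trailer: points to the cross-reference table (through the startxref offset) and references other key objects
Understanding these basics makes it easier to see how PDF libraries work.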
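The four-part layout can be observed directly in the raw bytes of a file. The following is a minimal sketch, not a real parser: it assumes the whole file is already in memory, that the header sits at byte 0, and that the startxref keyword appears within the last 1024 bytes. The class and method names (PdfStructureProbe, headerVersion, startXrefOffset) are hypothetical and belong to no library.

```java
import java.nio.charset.StandardCharsets;

final class PdfStructureProbe {

    /** Returns the version from a "%PDF-x.y" header, or null if the bytes do not start like a PDF. */
    static String headerVersion(byte[] data) {
        String head = new String(data, 0, Math.min(data.length, 16), StandardCharsets.ISO_8859_1);
        if (!head.startsWith("%PDF-")) {
            return null;
        }
        int end = 5;
        while (end < head.length()
                && (Character.isDigit(head.charAt(end)) || head.charAt(end) == '.')) {
            end++;
        }
        return head.substring(5, end);
    }

    /** Returns the byte offset written after the last "startxref" keyword, or -1 if none is found. */
    static long startXrefOffset(byte[] data) {
        // Assumption: the trailer lies within the last 1024 bytes of the file
        int tailLength = Math.min(data.length, 1024);
        String tail = new String(data, data.length - tailLength, tailLength, StandardCharsets.ISO_8859_1);
        int keyword = tail.lastIndexOf("startxref");
        if (keyword < 0) {
            return -1;
        }
        String rest = tail.substring(keyword + "startxref".length()).trim();
        int end = 0;
        while (end < rest.length() && Character.isDigit(rest.charAt(end))) {
            end++;
        }
        return end == 0 ? -1 : Long.parseLong(rest.substring(0, end));
    }
}
```

A header check of this kind is also a cheap first sanity test for uploaded files, since it does not depend on the file name or on anything the client declares.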
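2. PDF Libraries for Spring Boot
Several popular Java libraries can handle PDF files in a Spring Boot application:
2.1 iText
iText is a powerful PDF library for generating, modifying, and analyzing PDF documents.
Maven dependency (choose one of the two; the iText examples in this article use the iText 5 API in the com.itextpdf.text package):
<!-- iText 5 core library -->
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.5.13.3</version>
</dependency>
<!-- iText 7 (newer version with a different API) -->
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itext7-core</artifactId>
    <version>7.2.5</version>
    <type>pom</type>
</dependency>
Note: iText is available under an open-source license (AGPL) and under a commercial license. Check the license requirements before using it in a commercial project.
2.2 Apache PDFBox
Apache PDFBox is the Apache Software Foundation's open-source PDF library. It is full-featured and has a more permissive license (Apache License 2.0).
Maven dependency (the PDFBox examples in this article use the 2.x API, such as PDDocument.load):
<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.27</version>
</dependency>
2.3 OpenPDF
OpenPDF is the open-source successor to iText 4.2.0 and offers more flexible licensing (LGPL/MPL).
Maven dependency:
<dependency>
    <groupId>com.github.librepdf</groupId>
    <artifactId>openpdf</artifactId>
    <version>1.3.30</version>
</dependency>
2.4 JasperReports
JasperReports is a high-level library for generating PDF reports and is particularly suited to complex reports.
Maven dependency:
<dependency>
    <groupId>net.sf.jasperreports</groupId>
    <artifactId>jasperreports</artifactId>
    <version>6.20.0</version>
</dependency>
2.5 Which Library to Choose?
- iText: the most complete feature set, suited to highly customized output, but the license restrictions need attention
- Apache PDFBox: permissively licensed, suited to basic PDF operations, with a relatively low-level API
- OpenPDF: suited to projects that want iText-style functionality but are concerned about licensing
- JasperReports: best suited to complex report generation, with a steeper learning curve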
3. Generating PDF Files
3.1 Generating a PDF with iText
Basic document generation
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import org.springframework.stereotype.Service;

@Service
public class PdfGenerationService {
    public void generateSimplePdf(String outputPath) throws DocumentException, IOException {
        // Create the document
        Document document = new Document();
        // Bind a PdfWriter to the document and the output file
        PdfWriter.getInstance(document, new FileOutputStream(outputPath));
        // Open the document before adding content
        document.open();
        // Add content (the default font has no CJK glyphs, so Chinese text would need a CJK-capable font)
        document.add(new Paragraph("Hello World! This is my first PDF document generated with iText."));
        document.add(new Paragraph("Generated at: " + new java.util.Date()));
        // Close the document; this also flushes and closes the output stream
        document.close();
        System.out.println("PDF created: " + outputPath);
    }
}
Adding a table
import com.itextpdf.text.BaseColor;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Element;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.Phrase;
import com.itextpdf.text.pdf.PdfPCell;
import com.itextpdf.text.pdf.PdfPTable;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import org.springframework.stereotype.Service;

@Service
public class PdfTableService {
    public void generatePdfWithTable(String outputPath) throws DocumentException, IOException {
        Document document = new Document();
        PdfWriter.getInstance(document, new FileOutputStream(outputPath));
        document.open();
        // Add a title paragraph
        document.add(new Paragraph("User Data"));
        // Create a table with 3 columns
        PdfPTable table = new PdfPTable(3);
        // Table width as a percentage of the available page width
        table.setWidthPercentage(100);
        // Relative column widths
        table.setWidths(new float[]{2, 5, 3});
        // Add the header row
        addTableHeader(table);
        // Add the data rows
        addTableRows(table);
        // Add the table to the document
        document.add(table);
        document.close();
    }

    private void addTableHeader(PdfPTable table) {
        // addCell(PdfPCell) stores a copy of the cell, so one styled cell can be reused for each header
        PdfPCell header = new PdfPCell();
        header.setBackgroundColor(BaseColor.LIGHT_GRAY);
        header.setBorderWidth(2);
        header.setHorizontalAlignment(Element.ALIGN_CENTER);
        header.setPhrase(new Phrase("ID"));
        table.addCell(header);
        header.setPhrase(new Phrase("Name"));
        table.addCell(header);
        header.setPhrase(new Phrase("Role"));
        table.addCell(header);
    }

    private void addTableRows(PdfPTable table) {
        // Row 1
        table.addCell("1001");
        table.addCell("Zhang San");
        table.addCell("Administrator");
        // Row 2
        table.addCell("1002");
        table.addCell("Li Si");
        table.addCell("User");
        // Row 3
        table.addCell("1003");
        table.addCell("Wang Wu");
        table.addCell("Reviewer");
    }
}
Adding an image
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import org.springframework.stereotype.Service;

@Service
public class PdfImageService {
    public void generatePdfWithImage(String outputPath, String imagePath)
            throws DocumentException, IOException {
        Document document = new Document();
        PdfWriter.getInstance(document, new FileOutputStream(outputPath));
        document.open();
        // Add text
        document.add(new Paragraph("A PDF document containing an image"));
        // Load the image
        Image image = Image.getInstance(imagePath);
        // Scale it to fit within 400 x 300 points, keeping the aspect ratio
        image.scaleToFit(400, 300);
        // Center the image horizontally
        image.setAlignment(Image.MIDDLE);
        document.add(image);
        // Add a caption below the image
        document.add(new Paragraph("Figure 1: Sample image"));
        document.close();
    }
}
3.2 Generating a PDF with Apache PDFBox
Basic document generation
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import java.io.File;
import java.io.IOException;
import org.springframework.stereotype.Service;

@Service
public class PdfBoxService {
    public void generateSimplePdf(String outputPath) throws IOException {
        // Create a new document; try-with-resources closes it even if an exception is thrown
        try (PDDocument document = new PDDocument()) {
            // Add a blank page (US Letter by default)
            PDPage page = new PDPage();
            document.addPage(page);
            // Load a font with Chinese glyph coverage. The path is resolved against the working
            // directory, so a packaged application should load the font from the classpath instead.
            PDType0Font font = PDType0Font.load(document,
                    new File("src/main/resources/fonts/SimSun.ttf"));
            // Create a content stream for drawing on the page
            try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
                // Begin a text block
                contentStream.beginText();
                // Set the font and size
                contentStream.setFont(font, 12);
                // Set the text position (measured from the lower-left corner of the page, in points)
                contentStream.newLineAtOffset(25, 700);
                // Write text
                contentStream.showText("Hello World! This PDF document was generated with PDFBox.");
                // Move down to the next line
                contentStream.newLineAtOffset(0, -15);
                contentStream.showText("Generated at: " + new java.util.Date());
                // End the text block
                contentStream.endText();
            }
            // Save the document
            document.save(outputPath);
        }
        System.out.println("PDFBox created PDF: " + outputPath);
    }
}
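The coordinates passed to newLineAtOffset above are in points, measured from the lower-left corner of the page. One point is 1/72 inch, and a default PDPage is US Letter (612 x 792 points). Layouts are often specified in millimetres from the top edge, so a small conversion helper can be convenient. The following is a sketch with hypothetical names (PdfUnits, mmToPoints, fromTop); it is not part of PDFBox.

```java
final class PdfUnits {
    static final float POINTS_PER_INCH = 72f;
    static final float MM_PER_INCH = 25.4f;

    /** Converts millimetres to PDF points (1 pt = 1/72 inch). */
    static float mmToPoints(float mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    /** Converts a distance measured from the top edge into a PDF y coordinate (origin at the bottom). */
    static float fromTop(float pageHeight, float distanceFromTop) {
        return pageHeight - distanceFromTop;
    }
}
```

For example, on a 792-point-high page, text placed 92 points below the top edge has y = PdfUnits.fromTop(792f, 92f), which is the value 700 used in the example above.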
Adding a table (more involved in PDFBox, which has no table abstraction)
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import java.io.File;
import java.io.IOException;
import org.springframework.stereotype.Service;

@Service
public class PdfBoxTableService {
    public void generatePdfWithTable(String outputPath) throws IOException {
        // Create the document and a page
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage();
            document.addPage(page);
            // Load the font
            PDType0Font font = PDType0Font.load(document,
                    new File("src/main/resources/fonts/SimSun.ttf"));
            // Table content (the first row is the header)
            String[][] content = {
                {"ID", "Name", "Role"},
                {"1001", "Zhang San", "Administrator"},
                {"1002", "Li Si", "User"},
                {"1003", "Wang Wu", "Reviewer"}
            };
            // Table position and size
            float margin = 50;
            float y = page.getMediaBox().getHeight() - margin;
            float tableWidth = page.getMediaBox().getWidth() - 2 * margin;
            try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
                // Draw the title
                contentStream.beginText();
                contentStream.setFont(font, 16);
                contentStream.newLineAtOffset(margin, y);
                contentStream.showText("User Data");
                contentStream.endText();
                y -= 30;
                // Compute the row and column sizes
                final int rows = content.length;
                final int cols = content[0].length;
                final float rowHeight = 20f;
                final float tableHeight = rowHeight * rows;
                final float colWidth = tableWidth / (float) cols;
                // Draw the outer border
                contentStream.setLineWidth(1f);
                contentStream.addRect(margin, y - tableHeight, tableWidth, tableHeight);
                contentStream.stroke();
                // Draw the horizontal lines
                for (int i = 0; i < rows; i++) {
                    contentStream.moveTo(margin, y - i * rowHeight);
                    contentStream.lineTo(margin + tableWidth, y - i * rowHeight);
                }
                contentStream.stroke();
                // Draw the vertical lines
                for (int i = 0; i <= cols; i++) {
                    contentStream.moveTo(margin + i * colWidth, y);
                    contentStream.lineTo(margin + i * colWidth, y - tableHeight);
                }
                contentStream.stroke();
                // Write the cell text (all rows, including the header, use the same font here)
                contentStream.setFont(font, 12);
                float textx = margin + 5;
                float texty = y - 15;
                for (int i = 0; i < rows; i++) {
                    for (int j = 0; j < cols; j++) {
                        contentStream.beginText();
                        contentStream.newLineAtOffset(textx + j * colWidth, texty - i * rowHeight);
                        contentStream.showText(content[i][j]);
                        contentStream.endText();
                    }
                }
            }
            document.save(outputPath);
        }
    }
}
3.3 Generating a PDF with OpenPDF
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import org.springframework.stereotype.Service;

@Service
public class OpenPdfService {
    public void generateSimplePdf(String outputPath) throws DocumentException, IOException {
        // Create the document
        Document document = new Document();
        // Create the writer
        PdfWriter.getInstance(document, new FileOutputStream(outputPath));
        // Open the document
        document.open();
        // Add content
        document.add(new Paragraph("Hello World! This document was generated with OpenPDF."));
        document.add(new Paragraph("Generated at: " + new java.util.Date()));
        // Close the document
        document.close();
        System.out.println("OpenPDF created PDF: " + outputPath);
    }
}
3.4 Generating and Downloading a PDF from a Spring Boot Controller
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ContentDisposition;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;
import java.io.ByteArrayOutputStream;

@RestController
@RequestMapping("/api/pdf")
public class PdfController {
    @Autowired
    private PdfGenerationService pdfService;

    @GetMapping("/download")
    public ResponseEntity<byte[]> downloadPdf() {
        try {
            // Use an in-memory stream instead of a file
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            // Generate the PDF into the in-memory stream
            pdfService.generatePdf(baos);
            // Set the HTTP headers
            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_PDF);
            // Mark the response as a file download. setContentDispositionFormData is meant for
            // multipart form fields and would emit "form-data" rather than "attachment".
            String filename = "generated_document.pdf";
            headers.setContentDisposition(
                    ContentDisposition.builder("attachment").filename(filename).build());
            // Return the PDF as a byte array
            return new ResponseEntity<>(baos.toByteArray(), headers, HttpStatus.OK);
        } catch (Exception e) {
            e.printStackTrace();
            return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
        }
    }
}
The corresponding service class (a stream-based version of the PdfGenerationService from section 3.1):
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.IOException;
import java.io.OutputStream;
import org.springframework.stereotype.Service;

@Service
public class PdfGenerationService {
    public void generatePdf(OutputStream outputStream) throws DocumentException, IOException {
        // Create the document
        Document document = new Document();
        // Write to the supplied output stream
        PdfWriter.getInstance(document, outputStream);
        // Open the document
        document.open();
        // Add content
        document.add(new Paragraph("Dynamically generated PDF content"));
        document.add(new Paragraph("This PDF was generated by a Spring Boot application"));
        document.add(new Paragraph("Generated at: " + new java.util.Date()));
        // Close the document
        document.close();
    }
}
4. Reading and Parsing PDF Files
4.1 Reading PDF Text with PDFBox
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import java.io.File;
import java.io.IOException;
import org.springframework.stereotype.Service;

@Service
public class PdfReaderService {
    public String extractTextFromPdf(String pdfPath) throws IOException {
        // Load the PDF document; try-with-resources guarantees it is closed
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            // Create the text extractor
            PDFTextStripper stripper = new PDFTextStripper();
            // Extract the text of the whole document
            return stripper.getText(document);
        }
    }

    // Extract the text of a single page (pageNumber is 1-based)
    public String extractTextFromPage(String pdfPath, int pageNumber) throws IOException {
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            PDFTextStripper stripper = new PDFTextStripper();
            // Restrict extraction to the requested page
            stripper.setStartPage(pageNumber);
            stripper.setEndPage(pageNumber);
            return stripper.getText(document);
        }
    }
}
4.2 Parsing a PDF with iText
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.springframework.stereotype.Service;

@Service
public class ITextPdfReaderService {
    public String extractTextFromPdf(String pdfPath) throws IOException {
        PdfReader reader = new PdfReader(pdfPath);
        StringBuilder textBuilder = new StringBuilder();
        try {
            int pages = reader.getNumberOfPages();
            // Iterate over all pages (iText page numbers are 1-based)
            for (int i = 1; i <= pages; i++) {
                TextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                String pageText = PdfTextExtractor.getTextFromPage(reader, i, strategy);
                textBuilder.append(pageText).append("\n");
            }
            return textBuilder.toString();
        } finally {
            reader.close();
        }
    }

    // Read the PDF metadata (entries missing from the file map to null)
    public Map<String, String> getPdfMetadata(String pdfPath) throws IOException {
        PdfReader reader = new PdfReader(pdfPath);
        Map<String, String> metadata = new HashMap<>();
        try {
            // Read the document information dictionary once
            Map<String, String> info = reader.getInfo();
            metadata.put("Title", info.get("Title"));
            metadata.put("Author", info.get("Author"));
            metadata.put("Subject", info.get("Subject"));
            metadata.put("Keywords", info.get("Keywords"));
            metadata.put("Creator", info.get("Creator"));
            metadata.put("Producer", info.get("Producer"));
            metadata.put("Creation Date", info.get("CreationDate"));
            metadata.put("Modification Date", info.get("ModDate"));
            metadata.put("Page Count", String.valueOf(reader.getNumberOfPages()));
            return metadata;
        } finally {
            reader.close();
        }
    }
}
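The CreationDate and ModDate values returned above are raw PDF date strings, typically of the form D:YYYYMMDDHHmmSSOHH'mm' (for example D:20230115103000+08'00'), where everything after the year is optional. The following is a rough sketch of turning such a string into a java.time value. PdfDateSketch is a hypothetical helper, not an iText class; it treats a missing offset as UTC, which is an assumption made here for simplicity, and it does not validate malformed input.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

final class PdfDateSketch {

    // Parses the two digits starting at 'start', or returns 'fallback' if the string is too short
    private static int twoDigits(String s, int start, int fallback) {
        return s.length() >= start + 2 ? Integer.parseInt(s.substring(start, start + 2)) : fallback;
    }

    static OffsetDateTime parse(String raw) {
        String s = raw.startsWith("D:") ? raw.substring(2) : raw;
        int year = Integer.parseInt(s.substring(0, 4));
        int month = twoDigits(s, 4, 1);
        int day = twoDigits(s, 6, 1);
        int hour = twoDigits(s, 8, 0);
        int minute = twoDigits(s, 10, 0);
        int second = twoDigits(s, 12, 0);
        // Assumption: no offset (or 'Z') is treated as UTC
        ZoneOffset offset = ZoneOffset.UTC;
        if (s.length() > 14 && (s.charAt(14) == '+' || s.charAt(14) == '-')) {
            int sign = s.charAt(14) == '-' ? -1 : 1;
            int offsetHours = twoDigits(s, 15, 0);
            int offsetMinutes = twoDigits(s, 18, 0); // index 17 holds the apostrophe
            offset = ZoneOffset.ofTotalSeconds(sign * (offsetHours * 3600 + offsetMinutes * 60));
        }
        return OffsetDateTime.of(year, month, day, hour, minute, second, 0, offset);
    }
}
```

A caller could apply this to the "Creation Date" entry of the metadata map after checking it for null.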
4.3 Extracting Table Data from a PDF
Extracting tables from a PDF is a complex task. A specialized library such as Tabula-Java can help:
<dependency>
    <groupId>technology.tabula</groupId>
    <artifactId>tabula</artifactId>
    <version>1.0.5</version>
</dependency>
import org.apache.pdfbox.pdmodel.PDDocument;
import technology.tabula.ObjectExtractor;
import technology.tabula.Page;
import technology.tabula.PageIterator;
import technology.tabula.Table;
import technology.tabula.extractors.SpreadsheetExtractionAlgorithm;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.springframework.stereotype.Service;

@Service
public class PdfTableExtractorService {
    public List<String[][]> extractTablesFromPdf(String pdfPath) throws IOException {
        List<String[][]> allTables = new ArrayList<>();
        // Open the PDF file
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            // Create the ObjectExtractor
            ObjectExtractor extractor = new ObjectExtractor(document);
            // Iterate over all pages
            PageIterator iterator = extractor.extract();
            // Table extraction algorithm (works best on tables with ruling lines)
            SpreadsheetExtractionAlgorithm sea = new SpreadsheetExtractionAlgorithm();
            // Process each page
            while (iterator.hasNext()) {
                Page page = iterator.next();
                // Extract the tables on this page
                List<Table> tables = sea.extract(page);
                // Process each table
                for (Table table : tables) {
                    int rowCount = table.getRowCount();
                    int colCount = table.getColCount();
                    String[][] tableData = new String[rowCount][colCount];
                    // Copy the cell text; rows shorter than colCount are padded with empty strings
                    for (int i = 0; i < rowCount; i++) {
                        for (int j = 0; j < colCount; j++) {
                            if (j < table.getRows().get(i).size()) {
                                tableData[i][j] = table.getRows().get(i).get(j).getText();
                            } else {
                                tableData[i][j] = "";
                            }
                        }
                    }
                    allTables.add(tableData);
                }
            }
        }
        return allTables;
    }

    // Print table data (for testing)
    public void printTableData(String[][] tableData) {
        for (String[] row : tableData) {
            for (String cell : row) {
                System.out.print(cell + " | ");
            }
            System.out.println();
        }
    }
}
5. Modifying Existing PDF Files
Modifying an existing PDF is a common requirement. Typical operations include adding new content, modifying text, deleting pages, and merging documents.
5.1 Adding Watermarks and Page Numbers
Adding a watermark with iText
import com.itextpdf.text.BaseColor;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Element;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfGState;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import java.io.FileOutputStream;
import java.io.IOException;
import org.springframework.stereotype.Service;

@Service
public class PdfWatermarkService {
    public void addWatermark(String inputPath, String outputPath, String watermarkText)
            throws IOException, DocumentException {
        // Open the existing PDF
        PdfReader reader = new PdfReader(inputPath);
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(outputPath));
        // Create a base font with Chinese support (STSong-Light requires the itext-asian artifact on the classpath)
        BaseFont baseFont = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
        // Graphics state with 30% fill opacity, shared by all pages
        PdfGState gs = new PdfGState();
        gs.setFillOpacity(0.3f);
        // Number of pages in the PDF
        int pageCount = reader.getNumberOfPages();
        // Add the watermark to every page
        for (int i = 1; i <= pageCount; i++) {
            // Page dimensions
            Rectangle pageRect = reader.getPageSize(i);
            float width = pageRect.getWidth();
            float height = pageRect.getHeight();
            // Content layer underneath the existing page content
            PdfContentByte under = stamper.getUnderContent(i);
            // Save the graphics state first so that restoreState() also undoes the opacity change
            under.saveState();
            // Apply the transparency
            under.setGState(gs);
            // Set the font and color
            under.setFontAndSize(baseFont, 30);
            under.setColorFill(BaseColor.GRAY);
            // Draw the watermark text centered on the page, rotated by 45 degrees
            under.beginText();
            under.showTextAligned(Element.ALIGN_CENTER, watermarkText,
                    width / 2, height / 2, 45);
            under.endText();
            // Restore the graphics state
            under.restoreState();
        }
        // Release resources
        stamper.close();
        reader.close();
        System.out.println("Watermark added: " + outputPath);
    }
}
Adding page numbers with PDFBox
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDPageContentStream.AppendMode;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import java.io.File;
import java.io.IOException;
import org.springframework.stereotype.Service;

@Service
public class PdfPageNumberService {
    public void addPageNumbers(String inputPath, String outputPath) throws IOException {
        // Open the PDF document
        try (PDDocument document = PDDocument.load(new File(inputPath))) {
            // Number of pages
            int pageCount = document.getNumberOfPages();
            final float fontSize = 10;
            // Add a page number to every page
            for (int i = 0; i < pageCount; i++) {
                PDPage page = document.getPage(i);
                // Page number text. Helvetica covers Latin characters only: showText throws an
                // exception for Chinese text, which needs a font loaded with PDType0Font.load.
                String pageNumberText = "Page " + (i + 1) + " of " + pageCount;
                // Page width, used for centering
                float pageWidth = page.getMediaBox().getWidth();
                // getStringWidth returns 1/1000 text-space units, so scale by the font size
                float textWidth = PDType1Font.HELVETICA.getStringWidth(pageNumberText) / 1000 * fontSize;
                float xPosition = (pageWidth - textWidth) / 2;
                // AppendMode.APPEND draws after the existing page content
                try (PDPageContentStream contentStream = new PDPageContentStream(
                        document, page, AppendMode.APPEND, true, true)) {
                    contentStream.setFont(PDType1Font.HELVETICA, fontSize);
                    // Write the text centered at the bottom of the page
                    contentStream.beginText();
                    contentStream.newLineAtOffset(xPosition, 20); // 20 points above the bottom edge
                    contentStream.showText(pageNumberText);
                    contentStream.endText();
                }
            }
            // Save the modified document
            document.save(outputPath);
        }
        System.out.println("Page numbers added: " + outputPath);
    }
}
5.2 Merging Multiple PDF Files
Merging PDFs with PDFBox
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.springframework.stereotype.Service;

@Service
public class PdfMergeService {
    public void mergePdfFiles(List<String> inputPaths, String outputPath) throws IOException {
        // Create the merge utility
        PDFMergerUtility merger = new PDFMergerUtility();
        // Set the destination file
        merger.setDestinationFileName(outputPath);
        // Add the source files
        for (String path : inputPaths) {
            merger.addSource(new File(path));
        }
        // Run the merge, buffering in main memory
        merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
        System.out.println("PDF merge complete: " + outputPath);
    }

    // Merge selected pages from several documents
    public void mergeWithPageRanges(String outputPath) throws IOException {
        // The source documents must stay open until the merged document has been saved,
        // because imported pages still refer to the source streams.
        try (PDDocument mergedDocument = new PDDocument();
             PDDocument doc1 = PDDocument.load(new File("pdf1.pdf"));
             PDDocument doc2 = PDDocument.load(new File("pdf2.pdf"));
             PDDocument doc3 = PDDocument.load(new File("pdf3.pdf"))) {
            // From the first PDF, take only pages 1 and 3 (page indexes are 0-based)
            mergedDocument.importPage(doc1.getPage(0)); // index 0 is page 1
            mergedDocument.importPage(doc1.getPage(2)); // index 2 is page 3
            // From the second PDF, take all pages
            for (int i = 0; i < doc2.getNumberOfPages(); i++) {
                mergedDocument.importPage(doc2.getPage(i));
            }
            // From the third PDF, take only the last page
            mergedDocument.importPage(doc3.getPage(doc3.getNumberOfPages() - 1));
            // Save the merged document
            mergedDocument.save(outputPath);
        }
        System.out.println("Selective page merge complete: " + outputPath);
    }
}
Merging PDFs with iText
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import org.springframework.stereotype.Service;

@Service
public class ITextPdfMergeService {
    public void mergePdfFiles(List<String> inputPaths, String outputPath)
            throws IOException, DocumentException {
        // Create a document object
        Document document = new Document();
        // Create the PdfCopy instance that writes the merged file
        PdfCopy copy = new PdfCopy(document, new FileOutputStream(outputPath));
        // Open the document
        document.open();
        try {
            // Process each input PDF
            for (String path : inputPaths) {
                // Create a PdfReader for the input
                PdfReader reader = new PdfReader(path);
                // Number of pages
                int pageCount = reader.getNumberOfPages();
                // Copy every page (iText page numbers are 1-based)
                for (int i = 1; i <= pageCount; i++) {
                    PdfImportedPage page = copy.getImportedPage(reader, i);
                    copy.addPage(page);
                }
                // Flush the copied pages to the output, then close the reader
                copy.freeReader(reader);
                reader.close();
            }
        } finally {
            // Close the document
            if (document.isOpen()) {
                document.close();
            }
        }
        System.out.println("iText PDF merge complete: " + outputPath);
    }
}
5.3 Splitting PDF Files
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.springframework.stereotype.Service;

@Service
public class PdfSplitService {
    // Split a PDF into single-page files
    public void splitPdfToSinglePages(String inputPath, String outputFolder) throws IOException {
        // Load the PDF
        try (PDDocument document = PDDocument.load(new File(inputPath))) {
            // Create the splitter
            Splitter splitter = new Splitter();
            // Split into one document per page (the default)
            List<PDDocument> pages = splitter.split(document);
            // Input file name without its extension
            String fileNameWithoutExt = new File(inputPath).getName();
            if (fileNameWithoutExt.contains(".")) {
                fileNameWithoutExt = fileNameWithoutExt.substring(0,
                        fileNameWithoutExt.lastIndexOf('.'));
            }
            // Create the output directory if it does not exist
            File outputDir = new File(outputFolder);
            if (!outputDir.exists()) {
                outputDir.mkdirs();
            }
            // Save each page as a separate file, while the source document is still open
            int pageNumber = 1;
            for (PDDocument pd : pages) {
                String outputPath = outputFolder + File.separator
                        + fileNameWithoutExt + "_page_" + pageNumber + ".pdf";
                pd.save(outputPath);
                pd.close();
                pageNumber++;
            }
            System.out.println("PDF split complete, " + (pageNumber - 1) + " pages saved to: " + outputFolder);
        }
    }

    // Extract a page range (1-based, inclusive) into a new PDF
    public void splitPdfByPageRange(String inputPath, String outputPath, int startPage, int endPage)
            throws IOException {
        // Load the PDF
        try (PDDocument document = PDDocument.load(new File(inputPath));
             PDDocument newDocument = new PDDocument()) {
            // Validate the page range
            int totalPages = document.getNumberOfPages();
            if (startPage < 1 || endPage > totalPages || startPage > endPage) {
                throw new IllegalArgumentException("Invalid page range: " + startPage + "-" + endPage
                        + ", the document has " + totalPages + " pages");
            }
            // Copy the requested pages
            for (int i = startPage; i <= endPage; i++) {
                // PDFBox page indexes start at 0
                newDocument.importPage(document.getPage(i - 1));
            }
            // Save the new document before the source document is closed
            newDocument.save(outputPath);
            System.out.println("Extracted pages " + startPage + " to " + endPage + " and saved to: " + outputPath);
        }
    }
}
5.4 Encrypting and Decrypting PDF Files
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;
import java.io.File;
import java.io.IOException;
import org.springframework.stereotype.Service;

@Service
public class PdfEncryptionService {
    // Encrypt a PDF file
    public void encryptPdf(String inputPath, String outputPath,
            String userPassword, String ownerPassword) throws IOException {
        // Load the PDF
        try (PDDocument document = PDDocument.load(new File(inputPath))) {
            // Configure the access permissions
            AccessPermission accessPermission = new AccessPermission();
            // Disallow printing
            accessPermission.setCanPrint(false);
            // Disallow content modification
            accessPermission.setCanModify(false);
            // Disallow content extraction (copying)
            accessPermission.setCanExtractContent(false);
            // Disallow adding or modifying annotations
            accessPermission.setCanModifyAnnotations(false);
            // Create the protection policy; the argument order is owner password, user password, permissions
            StandardProtectionPolicy policy = new StandardProtectionPolicy(
                    ownerPassword, userPassword, accessPermission);
            // Set the encryption key length in bits (40, 128, or 256)
            policy.setEncryptionKeyLength(128);
            // Apply the encryption
            document.protect(policy);
            // Save the encrypted document
            document.save(outputPath);
            System.out.println("PDF encryption complete: " + outputPath);
        }
    }

    // Decrypt a PDF file (the password is required)
    public void decryptPdf(String inputPath, String outputPath, String password) throws IOException {
        // Load the encrypted PDF with its password. Removing the protection is intended for the
        // owner; document.getCurrentAccessPermission().isOwnerPermission() can be checked first.
        try (PDDocument document = PDDocument.load(new File(inputPath), password)) {
            // Check whether the document is encrypted
            if (document.isEncrypted()) {
                // Remove the encryption
                document.setAllSecurityToBeRemoved(true);
                // Save the decrypted document
                document.save(outputPath);
                System.out.println("PDF decryption complete: " + outputPath);
            } else {
                System.out.println("PDF is not encrypted, nothing to decrypt");
                // Save an unchanged copy to the output path
                document.save(outputPath);
            }
        }
    }
}
5.5 Deleting and Reordering Pages
import org.apache.pdfbox.pdmodel.PDDocument;
import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;
import org.springframework.stereotype.Service;

@Service
public class PdfPageManipulationService {
    // Delete the given pages (1-based page numbers)
    public void removePages(String inputPath, String outputPath, List<Integer> pagesToRemove)
            throws IOException {
        // Load the PDF
        try (PDDocument document = PDDocument.load(new File(inputPath))) {
            // Sort the page numbers in descending order so that deleting from the back does not
            // shift the indexes of pages still to be removed. A separate sorted set leaves the
            // caller's list untouched and drops duplicate page numbers.
            TreeSet<Integer> descending = new TreeSet<>(Collections.reverseOrder());
            descending.addAll(pagesToRemove);
            // Delete the pages
            for (int pageNum : descending) {
                // Validate the page number
                if (pageNum >= 1 && pageNum <= document.getNumberOfPages()) {
                    // PDFBox page indexes start at 0
                    document.removePage(pageNum - 1);
                } else {
                    System.out.println("Warning: page " + pageNum + " is out of range and was ignored");
                }
            }
            // Save the modified document
            document.save(outputPath);
            System.out.println("Pages deleted, saved to: " + outputPath);
        }
    }

    // Reorder pages; newOrder[i] is the 1-based number of the original page that goes to position i
    public void reorderPages(String inputPath, String outputPath, int[] newOrder) throws IOException {
        // Load the PDF and create the target document
        try (PDDocument document = PDDocument.load(new File(inputPath));
             PDDocument newDocument = new PDDocument()) {
            // Original page count
            int pageCount = document.getNumberOfPages();
            // Validate the length of the new order
            if (newOrder.length != pageCount) {
                throw new IllegalArgumentException(
                        "New order length (" + newOrder.length + ") does not match page count (" + pageCount + ")");
            }
            // Add the pages in the new order
            for (int i = 0; i < newOrder.length; i++) {
                // Convert the 1-based page number to a 0-based index
                int oldIndex = newOrder[i] - 1;
                // Validate the index
                if (oldIndex < 0 || oldIndex >= pageCount) {
                    throw new IllegalArgumentException("New order contains an invalid page number: " + (oldIndex + 1));
                }
                // Import the page
                newDocument.importPage(document.getPage(oldIndex));
            }
            // Save the new document before the source document is closed
            newDocument.save(outputPath);
            System.out.println("Pages reordered, saved to: " + outputPath);
        }
    }
}
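The reason for deleting from the back can be shown without any PDF library. The following sketch uses a plain list of labels as a stand-in for the page tree; DescendingRemovalDemo and its method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class DescendingRemovalDemo {

    /** Removes the given 1-based page numbers from a copy of 'pages', highest number first. */
    static List<String> removePages(List<String> pages, Collection<Integer> oneBasedPages) {
        List<String> result = new ArrayList<>(pages);
        // Descending order, duplicates dropped
        TreeSet<Integer> ordered = new TreeSet<>(Comparator.reverseOrder());
        ordered.addAll(oneBasedPages);
        for (int page : ordered) {
            if (page >= 1 && page <= result.size()) {
                // Removing a later page never changes the index of an earlier one
                result.remove(page - 1);
            }
        }
        return result;
    }
}
```

With pages p1 to p5 and the request to delete pages 2 and 4, removing in ascending order would delete p2 and then p5 (because p4 has moved to index 2 after the first removal), whereas the descending order removes p4 and then p2 as intended.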
5.6 Filling PDF Forms
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.Map;
import org.springframework.stereotype.Service;

@Service
public class PdfFormFillingService {
    // Fill a PDF form
    public void fillPdfForm(String templatePath, String outputPath,
            Map<String, String> formData) throws IOException {
        // Load the PDF template
        try (PDDocument document = PDDocument.load(new File(templatePath))) {
            // Get the form
            PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
            if (acroForm != null) {
                // Fill in each supplied field
                for (Map.Entry<String, String> entry : formData.entrySet()) {
                    String fieldName = entry.getKey();
                    String fieldValue = entry.getValue();
                    // Look up the form field
                    PDField field = acroForm.getField(fieldName);
                    if (field != null) {
                        // Set the field value
                        field.setValue(fieldValue);
                    } else {
                        System.out.println("Warning: form field '" + fieldName + "' not found");
                    }
                }
                // Ask viewers to regenerate the field appearances (optional). This does not make
                // the form read-only; acroForm.flatten() would be needed for that.
                acroForm.setNeedAppearances(true);
                // Save the filled form
                document.save(outputPath);
                System.out.println("PDF form filled, saved to: " + outputPath);
            } else {
                System.out.println("Error: the PDF document does not contain a form");
            }
        }
    }

    // List the form fields in a PDF document (for debugging)
    public void listFormFields(String pdfPath) throws IOException {
        // Load the PDF
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            // Get the form
            PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
            if (acroForm != null) {
                // Get the top-level fields
                List<PDField> fields = acroForm.getFields();
                System.out.println("The PDF document contains " + fields.size() + " top-level form fields:");
                // Print each field's name and type
                for (PDField field : fields) {
                    System.out.println("Field name: " + field.getFullyQualifiedName() +
                            ", type: " + field.getClass().getSimpleName());
                }
            } else {
                System.out.println("The PDF document does not contain a form");
            }
        }
    }
}
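6. PDF Handling in Web Applications
PDF handling is a common requirement in Spring Boot web applications: generating and downloading PDFs, previewing them online, and uploading and parsing them. This section shows how to implement these features.
6.1 Implementing PDF Download
Web applications often need to offer PDF downloads, such as reports or certificates. Below is a basic download controller. It assumes that PdfGenerationService also provides generateSimplePdf() and generateReportPdf(Long id) methods that return the PDF as a byte array; these are not shown in this article.
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ContentDisposition;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/pdf")
public class PdfDownloadController {
    private final PdfGenerationService pdfService;

    @Autowired
    public PdfDownloadController(PdfGenerationService pdfService) {
        this.pdfService = pdfService;
    }

    /**
     * Generate and download a simple PDF
     */
    @GetMapping("/download/simple")
    public ResponseEntity<byte[]> downloadSimplePdf() {
        try {
            byte[] pdfBytes = pdfService.generateSimplePdf();
            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_PDF);
            headers.setContentDisposition(
                    ContentDisposition.builder("attachment").filename("document.pdf").build());
            headers.setCacheControl("must-revalidate, post-check=0, pre-check=0");
            return new ResponseEntity<>(pdfBytes, headers, HttpStatus.OK);
        } catch (Exception e) {
            return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
        }
    }

    /**
     * Generate and download a report by ID
     */
    @GetMapping("/download/report/{id}")
    public ResponseEntity<byte[]> downloadReportById(@PathVariable("id") Long id) {
        try {
            byte[] pdfBytes = pdfService.generateReportPdf(id);
            if (pdfBytes == null) {
                return new ResponseEntity<>(HttpStatus.NOT_FOUND);
            }
            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_PDF);
            headers.setContentDisposition(
                    ContentDisposition.builder("attachment").filename("report-" + id + ".pdf").build());
            headers.setCacheControl("must-revalidate, post-check=0, pre-check=0");
            return new ResponseEntity<>(pdfBytes, headers, HttpStatus.OK);
        } catch (Exception e) {
            return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
        }
    }
}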
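The following example generates a certificate from a template and offers it for download. PdfTemplateService, UserService, and UserDto are application classes that are not shown here.
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ContentDisposition;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
@RequestMapping("/api/pdf/templates")
public class PdfTemplateController {
    private final PdfTemplateService templateService;
    // The original snippet used userService without declaring it; it is injected here
    private final UserService userService;

    @Autowired
    public PdfTemplateController(PdfTemplateService templateService, UserService userService) {
        this.templateService = templateService;
        this.userService = userService;
    }

    /**
     * Generate a certificate and download it
     */
    @GetMapping("/certificate/{userId}")
    public ResponseEntity<byte[]> generateCertificate(@PathVariable Long userId) {
        try {
            // Look up the user
            UserDto user = userService.findById(userId);
            if (user == null) {
                return new ResponseEntity<>(HttpStatus.NOT_FOUND);
            }
            // Generate the certificate
            byte[] certificateBytes = templateService.generateCertificateFromTemplate(user);
            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_PDF);
            headers.setContentDisposition(ContentDisposition.builder("attachment")
                    .filename("certificate-" + user.getUsername() + ".pdf").build());
            return new ResponseEntity<>(certificateBytes, headers, HttpStatus.OK);
        } catch (Exception e) {
            return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
        }
    }
}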
6.2 在线PDF预览实现
为了提供更好的用户体验,有时需要在浏览器中直接预览PDF,而不是下载到本地。下面是一个实现在线PDF预览的控制器:
@Controller
@RequestMapping("/pdf/view")
public class PdfViewController {
private final DocumentService documentService;
@Autowired
public PdfViewController(DocumentService documentService) {
this.documentService = documentService;
}
/**
* 返回PDF预览页面
*/
@GetMapping("/{documentId}")
public String viewPdf(@PathVariable Long documentId, Model model) {
// 检查文档是否存在
if (!documentService.exists(documentId)) {
return "error/404";
}
model.addAttribute("documentId", documentId);
model.addAttribute("documentName", documentService.getName(documentId));
return "pdf/viewer"; // 返回Thymeleaf模板
}
/**
* 提供PDF数据的端点
*/
@GetMapping("/data/{documentId}")
@ResponseBody
public ResponseEntity<byte[]> getPdfData(@PathVariable Long documentId) {
try {
byte[] pdfData = documentService.getPdfContent(documentId);
if (pdfData == null) {
return new ResponseEntity<>(HttpStatus.NOT_FOUND);
}
HttpHeaders headers = new HttpHeaders();
headers.setContentType(MediaType.APPLICATION_PDF);
// Use inline rather than attachment so the browser renders the PDF instead of downloading it
headers.add("Content-Disposition", "inline; filename=\"document-" + documentId + ".pdf\"");
return new ResponseEntity<>(pdfData, headers, HttpStatus.OK);
} catch (Exception e) {
return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
}
}
}
Example Thymeleaf template (viewer.html):
<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">
<head>
<title th:text="${documentName} + ' - PDF Preview'"></title>
<meta charset="UTF-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.11.338/pdf.min.js"></script>
<style>
body { margin: 0; padding: 0; }
#pdf-container { width: 100%; height: 100vh; overflow: auto; }
#pdf-viewer { width: 100%; height: 100%; }
</style>
</head>
<body>
<div id="pdf-container">
<iframe id="pdf-viewer" th:src="@{'/pdf/view/data/' + ${documentId}}" frameborder="0"></iframe>
</div>
</body>
</html>
6.3 Uploading and Processing PDF Files
Many applications need to let users upload PDF files for processing. The following controller handles PDF uploads:
@RestController
@RequestMapping("/api/pdf/upload")
public class PdfUploadController {
private final PdfAnalysisService pdfAnalysisService;
@Autowired
public PdfUploadController(PdfAnalysisService pdfAnalysisService) {
this.pdfAnalysisService = pdfAnalysisService;
}
/**
* 处理PDF上传并分析内容
*/
@PostMapping("/analyze")
public ResponseEntity<?> uploadAndAnalyzePdf(@RequestParam("file") MultipartFile file) {
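// Reject empty uploads
if (file.isEmpty()) {
return ResponseEntity.badRequest().body("Please select a PDF file to upload");
}
// Check the declared content type; the null-safe comparison avoids a NullPointerException when the client sends none
if (!"application/pdf".equals(file.getContentType())) {
return ResponseEntity.badRequest().body("Only PDF files are supported");
}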
try {
// 读取文件内容
byte[] pdfBytes = file.getBytes();
// 分析PDF内容
PdfAnalysisResult result = pdfAnalysisService.analyzePdf(pdfBytes);
return ResponseEntity.ok(result);
} catch (IOException e) {
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
.body("PDF处理失败: " + e.getMessage());
}
}
/**
* 上传并保存PDF文件
*/
@PostMapping("/save")
public ResponseEntity<?> uploadAndSavePdf(
@RequestParam("file") MultipartFile file,
@RequestParam("title") String title,
@RequestParam("description") String description) {
if (file.isEmpty()) {
return ResponseEntity.badRequest().body("请选择要上传的PDF文件");
}
try {
// 保存PDF文件并获取文档ID
Long documentId = pdfAnalysisService.savePdfDocument(
file.getBytes(),
file.getOriginalFilename(),
title,
description);
Map<String, Object> response = new HashMap<>();
response.put("success", true);
response.put("documentId", documentId);
response.put("message", "PDF文件上传成功");
return ResponseEntity.ok(response);
} catch (Exception e) {
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
.body("PDF上传失败: " + e.getMessage());
}
}
}
6.4 Batch PDF Processing
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
@RestController
@RequestMapping("/api/pdf/batch")
public class PdfBatchController {
@Autowired
private PdfBatchService pdfBatchService;
// 临时目录用于保存上传的文件
private final Path tempDir = Paths.get(System.getProperty("java.io.tmpdir"));
@PostMapping("/merge")
public ResponseEntity<?> mergePdfs(@RequestParam("files") MultipartFile[] files) {
Map<String, Object> response = new HashMap<>();
List<String> tempFilePaths = new ArrayList<>();
try {
// 检查文件
if (files.length < 2) {
response.put("success", false);
response.put("message", "请至少提供两个PDF文件进行合并");
return ResponseEntity.badRequest().body(response);
}
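// Validate every file first, so a rejected upload leaves no temp files behind
for (MultipartFile file : files) {
// Null-safe comparison: getContentType() may return null
if (!"application/pdf".equals(file.getContentType())) {
response.put("success", false);
response.put("message", "File " + file.getOriginalFilename() + " is not a PDF");
return ResponseEntity.badRequest().body(response);
}
}
// Save the uploaded files to the temp directory
for (MultipartFile file : files) {
// Create a unique file name
String tempFileName = UUID.randomUUID().toString() + ".pdf";
Path tempFile = tempDir.resolve(tempFileName);
// Register the path before writing so the cleanup code also removes a partially written file
tempFilePaths.add(tempFile.toString());
// Save the file; transferTo manages the upload stream itself
file.transferTo(tempFile.toFile());
}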
// 生成合并后的PDF文件名
String outputFileName = "merged_" + UUID.randomUUID().toString() + ".pdf";
Path outputPath = tempDir.resolve(outputFileName);
// 执行合并
pdfBatchService.mergePdfFiles(tempFilePaths, outputPath.toString());
// 读取合并后的文件
byte[] mergedPdfBytes = Files.readAllBytes(outputPath);
// 清理临时文件
for (String path : tempFilePaths) {
Files.deleteIfExists(Paths.get(path));
}
Files.deleteIfExists(outputPath);
// 设置响应
response.put("success", true);
response.put("message", "PDF文件已成功合并");
response.put("mergedFileSize", mergedPdfBytes.length);
// 可以返回合并后的文件作为Base64字符串(适用于小文件)
response.put("mergedPdfBase64",
java.util.Base64.getEncoder().encodeToString(mergedPdfBytes));
return ResponseEntity.ok(response);
} catch (Exception e) {
// 清理临时文件
for (String path : tempFilePaths) {
try {
Files.deleteIfExists(Paths.get(path));
} catch (IOException ex) {
// 忽略清理错误
}
}
response.put("success", false);
response.put("message", "合并PDF时出错: " + e.getMessage());
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(response);
}
}
}
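The controller above repeats its cleanup loop on the success and failure paths, and the merged output file is only deleted when everything succeeds. A `try`/`finally` block keeps the cleanup in one place. The sketch below illustrates the pattern with plain JDK file operations; `merge` is a hypothetical stand-in for `pdfBatchService.mergePdfFiles` and simply concatenates bytes, so it does not produce a valid PDF.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

final class TempFileCleanupSketch {

    /** Hypothetical stand-in for the real merge step: appends each input to the output file. */
    static void merge(List<Path> inputs, Path output) throws IOException {
        for (Path input : inputs) {
            Files.write(output, Files.readAllBytes(input), StandardOpenOption.APPEND);
        }
    }

    /** Stores each upload in a temp file, merges them, and removes every temp file on all paths. */
    static byte[] mergeWithCleanup(List<byte[]> uploads) throws IOException {
        List<Path> created = new ArrayList<>();
        try {
            List<Path> inputs = new ArrayList<>();
            for (byte[] upload : uploads) {
                Path temp = Files.createTempFile("upload-", ".pdf");
                created.add(temp); // register before writing so cleanup covers partial files
                Files.write(temp, upload);
                inputs.add(temp);
            }
            Path output = Files.createTempFile("merged-", ".pdf");
            created.add(output);
            merge(inputs, output);
            return Files.readAllBytes(output);
        } finally {
            // Runs after a normal return and after an exception alike
            for (Path path : created) {
                try {
                    Files.deleteIfExists(path);
                } catch (IOException ignored) {
                    // Best-effort cleanup; a leftover temp file must not mask the real error
                }
            }
        }
    }
}
```

Because the output path is registered in the same list as the inputs, a failure in the middle of the merge no longer leaves the partially written result behind.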
The batch processing service:
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.springframework.stereotype.Service;
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
@Service
public class PdfBatchService {
// 用于异步处理的线程池
private final ExecutorService executor = Executors.newFixedThreadPool(5);
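// Merge PDF files
public void mergePdfFiles(List<String> inputPaths, String outputPath) throws IOException {
PDFMergerUtility merger = new PDFMergerUtility();
merger.setDestinationFileName(outputPath);
for (String path : inputPaths) {
merger.addSource(new File(path));
}
// Pass an explicit memory setting rather than null; setupMainMemoryOnly() buffers in RAM
merger.mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting.setupMainMemoryOnly());
}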
// 异步处理多个PDF
public CompletableFuture<List<String>> processMultiplePdfsAsync(
List<String> inputPaths, String outputDir, String operation) {
// 创建CompletableFuture列表
List<CompletableFuture<String>> futures = inputPaths.stream()
.map(path -> CompletableFuture.supplyAsync(() -> {
try {
// 根据操作类型处理PDF
String outputPath = outputDir + File.separator
+ new File(path).getName().replace(".pdf", "_processed.pdf");
switch (operation) {
case "watermark":
addWatermark(path, outputPath, "机密文档");
break;
case "compress":
compressPdf(path, outputPath);
break;
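case "encrypt":
// Hard-coded passwords are for demonstration only; load real ones from configuration or a secret store
encryptPdf(path, outputPath, "password123", "owner456");
break;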
// 添加其他操作...
default:
throw new IllegalArgumentException("不支持的操作: " + operation);
}
return outputPath;
} catch (Exception e) {
throw new RuntimeException("处理文件失败: " + path, e);
}
}, executor))
.collect(Collectors.toList());
// 组合所有Future
CompletableFuture<Void> allOf = CompletableFuture.allOf(
futures.toArray(new CompletableFuture[0]));
// 处理完成后收集结果
return allOf.thenApply(v ->
futures.stream()
.map(CompletableFuture::join)
.collect(Collectors.toList())
);
}
// 添加水印方法
private void addWatermark(String inputPath, String outputPath, String watermarkText)
throws IOException {
// 实现与PdfWatermarkService类似
// ...
}
// 压缩PDF方法
private void compressPdf(String inputPath, String outputPath) throws IOException {
// 实现PDF压缩
// ...
}
// 加密PDF方法
private void encryptPdf(String inputPath, String outputPath,
String userPassword, String ownerPassword) throws IOException {
// 实现与PdfEncryptionService类似
// ...
}
}
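7. PDF Security and Advanced Features
Security is an important consideration when handling PDF documents, especially those containing sensitive information. Advanced features such as watermarks and digital signatures add further practical capabilities.
7.1 PDF Encryption and Permission Control
With iText (the examples here use the iText 7 API), encrypting a PDF and setting its permissions is straightforward: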
@Service
public class PdfSecurityService {
/**
* Create a password-protected PDF
*/
public byte[] createEncryptedPdf(String content, String userPassword, String ownerPassword) throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// Encryption options must be passed to the PdfWriter constructor; they cannot be applied to an existing writer
WriterProperties writerProperties = new WriterProperties();
// User password (needed to open the document) and owner password (needed to change permissions)
writerProperties.setStandardEncryption(
userPassword.getBytes(java.nio.charset.StandardCharsets.UTF_8),
ownerPassword.getBytes(java.nio.charset.StandardCharsets.UTF_8),
EncryptionConstants.ALLOW_PRINTING, // allow printing
EncryptionConstants.ENCRYPTION_AES_128 // AES 128-bit encryption
);
PdfWriter writer = new PdfWriter(baos, writerProperties);
PdfDocument pdf = new PdfDocument(writer);
// Create the document content
Document document = new Document(pdf);
document.add(new Paragraph(content));
document.close();
return baos.toByteArray();
}
/**
* Create a PDF with permission restrictions
*/
public byte[] createPermissionControlledPdf(String content, String ownerPassword, int permissions) throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
WriterProperties writerProperties = new WriterProperties();
writerProperties.setStandardEncryption(
null, // no user password: anyone can open the document
ownerPassword.getBytes(java.nio.charset.StandardCharsets.UTF_8),
permissions, // custom permission flags
EncryptionConstants.ENCRYPTION_AES_256 // AES 256-bit encryption
);
PdfWriter writer = new PdfWriter(baos, writerProperties);
PdfDocument pdf = new PdfDocument(writer);
Document document = new Document(pdf);
document.add(new Paragraph(content));
document.close();
return baos.toByteArray();
}
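/**
* Check whether a PDF is encrypted
*/
public boolean isPdfEncrypted(byte[] pdfData) throws IOException {
PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
// The encryption state lives on the reader and is only known once the document has been opened
try (PdfDocument pdf = new PdfDocument(reader)) {
return reader.isEncrypted();
} catch (BadPasswordException e) {
// Opening failed because a password is required, so the file is encrypted
return true;
}
}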
}
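The `permissions` argument of `createPermissionControlledPdf` is a bit mask, so several rights are granted by combining flags with bitwise OR, for example `EncryptionConstants.ALLOW_PRINTING | EncryptionConstants.ALLOW_COPY`. The sketch below shows the mechanics with local, illustrative constants. The bit positions follow the standard security handler table of the PDF specification as far as I know it, and the iText constants may set additional bits, so real code should use the library constants rather than these values.

```java
final class PdfPermissionBits {

    // Illustrative constants: bit n of the specification maps to 1 << (n - 1)
    static final int PRINT = 1 << 2;              // bit 3: print the document
    static final int MODIFY_CONTENTS = 1 << 3;    // bit 4: modify the contents
    static final int COPY = 1 << 4;               // bit 5: copy or extract text and graphics
    static final int MODIFY_ANNOTATIONS = 1 << 5; // bit 6: add or modify annotations
    static final int FILL_FORMS = 1 << 8;         // bit 9: fill in form fields

    /** Returns true when every bit of the flag is set in the mask. */
    static boolean allows(int permissions, int flag) {
        return (permissions & flag) == flag;
    }

    /** Returns the mask with the given flag cleared. */
    static int revoke(int permissions, int flag) {
        return permissions & ~flag;
    }

    public static void main(String[] args) {
        int permissions = PRINT | COPY;
        System.out.println("copy allowed: " + allows(permissions, COPY));
        System.out.println("copy allowed after revoke: " + allows(revoke(permissions, COPY), COPY));
    }
}
```

Keep in mind that these flags are enforced by the viewing application; they restrict cooperative readers rather than protecting the content cryptographically.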
7.2 Adding Watermarks
Watermarks are a common advanced PDF feature, used to protect a document's copyright or to mark its status:
@Service
public class PdfWatermarkService {
/**
* 添加文本水印
*/
public byte[] addTextWatermark(byte[] originalPdf, String watermarkText) throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfReader reader = new PdfReader(new ByteArrayInputStream(originalPdf));
PdfWriter writer = new PdfWriter(baos);
PdfDocument pdf = new PdfDocument(reader, writer);
// 获取页数
int numberOfPages = pdf.getNumberOfPages();
// Create the semi-transparent watermark text
// Note: standard Helvetica has no CJK glyphs; supply a font that covers the script of the watermark text
PdfFont font = PdfFontFactory.createFont(StandardFonts.HELVETICA);
Paragraph watermark = new Paragraph(watermarkText)
.setFont(font)
.setFontSize(30)
.setFontColor(new DeviceRgb(0.5f, 0.5f, 0.5f), 0.3f); // gray, 30% opacity
// Add the watermark to every page
for (int i = 1; i <= numberOfPages; i++) {
PdfPage page = pdf.getPage(i);
Rectangle pageSize = page.getPageSize();
float x = (pageSize.getLeft() + pageSize.getRight()) / 2;
float y = (pageSize.getBottom() + pageSize.getTop()) / 2;
// Create a canvas that draws on top of the existing page content
PdfCanvas canvas = new PdfCanvas(page);
canvas.saveState();
// Set the fill color and opacity for this graphics state
canvas.setFillColor(new DeviceRgb(0.5f, 0.5f, 0.5f));
canvas.setExtGState(new PdfExtGState().setFillOpacity(0.3f));
// Lay out the paragraph centered on the page, rotated by 30 degrees (PI / 6 radians)
// iText 7.2 uses the two-argument Canvas constructor; the variant taking a PdfDocument belongs to older releases
Canvas watermarkCanvas = new Canvas(canvas, pageSize);
watermarkCanvas.showTextAligned(watermark, x, y, i, TextAlignment.CENTER, VerticalAlignment.MIDDLE, (float) Math.PI / 6);
watermarkCanvas.close();
canvas.restoreState();
}
pdf.close();
return baos.toByteArray();
}
/**
* 添加图片水印
*/
public byte[] addImageWatermark(byte[] originalPdf, byte[] imageData) throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfReader reader = new PdfReader(new ByteArrayInputStream(originalPdf));
PdfWriter writer = new PdfWriter(baos);
PdfDocument pdf = new PdfDocument(reader, writer);
// Wrap the raw image bytes in an ImageData object
ImageData imageDataObj = ImageDataFactory.create(imageData);
// Watermark size in points
float width = 100;
float height = 100;
// Add the watermark to every page
int numberOfPages = pdf.getNumberOfPages();
for (int i = 1; i <= numberOfPages; i++) {
PdfPage page = pdf.getPage(i);
Rectangle pageSize = page.getPageSize();
// Compute the image position (centered)
float x = (pageSize.getLeft() + pageSize.getRight()) / 2 - width / 2;
float y = (pageSize.getBottom() + pageSize.getTop()) / 2 - height / 2;
// Size and opacity set on a layout Image are ignored when ImageData is drawn directly,
// so apply the opacity through the graphics state and the size through the target rectangle
PdfCanvas canvas = new PdfCanvas(page);
canvas.saveState();
canvas.setExtGState(new PdfExtGState().setFillOpacity(0.3f));
canvas.addImageFittedIntoRectangle(imageDataObj, new Rectangle(x, y, width, height), false);
canvas.restoreState();
}
pdf.close();
return baos.toByteArray();
}
}
7.3 Digital Signatures
Digital signatures are an important means of ensuring the integrity and authenticity of a PDF document:
@Service
public class PdfSignatureService {
/**
* 使用数字证书签署PDF
*/
public byte[] signPdf(byte[] pdfData, KeyStore keystore, String alias, char[] password) throws Exception {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
PdfSigner signer = new PdfSigner(reader, baos, new StampingProperties());
// 配置签名外观
PdfSignatureAppearance appearance = signer.getSignatureAppearance();
appearance.setReason("证明文档真实性")
.setLocation("北京")
.setSignatureCreator("PDF签名系统")
.setReuseAppearance(false);
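// Signature rectangle in the lower-left corner of the last page
Rectangle rect = new Rectangle(36, 36, 200, 50);
// The iText 7 PdfReader does not expose a page count; ask the signer's document instead
appearance.setPageRect(rect)
.setPageNumber(signer.getDocument().getNumberOfPages());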
// 设置签名信息
PrivateKey pk = (PrivateKey) keystore.getKey(alias, password);
Certificate[] chain = keystore.getCertificateChain(alias);
// 创建签名者
IExternalSignature pks = new PrivateKeySignature(pk, DigestAlgorithms.SHA256, null);
IExternalDigest digest = new BouncyCastleDigest();
// 执行签名
signer.signDetached(digest, pks, chain, null, null, null, 0, PdfSigner.CryptoStandard.CMS);
return baos.toByteArray();
}
/**
* 创建自签名证书(仅用于测试)
*/
public KeyStore createSelfSignedCertificate() throws Exception {
// 生成密钥对
KeyPairGenerator keyGen = KeyPairGenerator.getInstance("RSA");
keyGen.initialize(2048);
KeyPair keyPair = keyGen.generateKeyPair();
// 创建自签名证书
X509Certificate cert = generateSelfSignedCertificate(keyPair);
// 创建KeyStore并存储证书
KeyStore keyStore = KeyStore.getInstance(KeyStore.getDefaultType());
keyStore.load(null, null);
keyStore.setKeyEntry("pdf-signer", keyPair.getPrivate(), "password".toCharArray(),
new java.security.cert.Certificate[]{cert});
return keyStore;
}
/**
* 生成自签名证书
*/
private X509Certificate generateSelfSignedCertificate(KeyPair keyPair) throws Exception {
// 使用Bouncy Castle实现
Security.addProvider(new org.bouncycastle.jce.provider.BouncyCastleProvider());
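// Current time
long now = System.currentTimeMillis();
// The certificate is valid for one year; the L suffix forces long arithmetic, because the int product overflows
Date startDate = new Date(now);
Date endDate = new Date(now + 365L * 24 * 60 * 60 * 1000);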
// 证书序列号
BigInteger serialNumber = BigInteger.valueOf(now);
// 证书信息
X500Name subject = new X500Name("CN=PDF Signer, O=Example Organization, L=Beijing, C=CN");
// 证书生成
X509v3CertificateBuilder builder = new JcaX509v3CertificateBuilder(
subject,
serialNumber,
startDate,
endDate,
subject,
keyPair.getPublic()
);
// 签名算法
ContentSigner contentSigner = new JcaContentSignerBuilder("SHA256WithRSAEncryption")
.setProvider("BC").build(keyPair.getPrivate());
// 生成证书
X509CertificateHolder holder = builder.build(contentSigner);
X509Certificate cert = new JcaX509CertificateConverter()
.setProvider("BC").getCertificate(holder);
return cert;
}
/**
* 验证PDF签名
*/
public List<SignatureVerificationResult> verifyPdfSignatures(byte[] pdfData) throws IOException {
List<SignatureVerificationResult> results = new ArrayList<>();
PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
PdfDocument pdf = new PdfDocument(reader);
SignatureUtil signUtil = new SignatureUtil(pdf);
List<String> sigNames = signUtil.getSignatureNames();
for (String name : sigNames) {
PdfPKCS7 pkcs7 = signUtil.readSignatureData(name);
// 获取签名时间
Calendar cal = pkcs7.getSignDate();
// 获取签名信息
String reason = pkcs7.getReason();
String location = pkcs7.getLocation();
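// Verify the signature
boolean isSignatureValid = false;
boolean coversWholeDocument = false;
try {
isSignatureValid = pkcs7.verifySignatureIntegrityAndAuthenticity();
// True when the signed byte range spans the entire file, i.e. nothing was appended after signing
coversWholeDocument = signUtil.signatureCoversWholeDocument(name);
} catch (Exception e) {
// Verification failed; both flags keep their pessimistic defaults
}
// Record the result; content outside the signed range is reported as "modified"
SignatureVerificationResult result = new SignatureVerificationResult(
name, cal.getTime(), reason, location, isSignatureValid, !coversWholeDocument);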
results.add(result);
}
pdf.close();
return results;
}
// 签名验证结果类
public static class SignatureVerificationResult {
private String name;
private Date date;
private String reason;
private String location;
private boolean valid;
private boolean modified;
// Getters and setters omitted
public SignatureVerificationResult(String name, Date date, String reason, String location,
boolean valid, boolean modified) {
this.name = name;
this.date = date;
this.reason = reason;
this.location = location;
this.valid = valid;
this.modified = modified;
}
}
}
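The validity calculation in `generateSelfSignedCertificate` is easy to get wrong: written with plain `int` literals, one year in milliseconds exceeds the 32-bit range and silently wraps around. The small demonstration below compares the wrapped value with the correct one. The class name is hypothetical, and `java.time.Duration` is used here as one way to avoid the manual multiplication.

```java
import java.time.Duration;

final class ValidityPeriodSketch {

    /** One year in milliseconds computed with int arithmetic, which overflows. */
    static long oneYearMillisOverflowing() {
        int product = 365 * 24 * 60 * 60 * 1000; // evaluated entirely in 32-bit arithmetic
        return product;
    }

    /** One year in milliseconds computed without overflow. */
    static long oneYearMillis() {
        return Duration.ofDays(365).toMillis();
    }

    public static void main(String[] args) {
        System.out.println(oneYearMillisOverflowing()); // 1471228928, roughly 17 days
        System.out.println(oneYearMillis());            // 31536000000
    }
}
```

With the overflowing value, a certificate intended to last a year would expire after about 17 days, which is why the multiplication needs a `long` operand such as `365L`.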
7.4 PDF Forms and Interactive Features
PDF forms make it possible to create interactive documents that users can fill in and submit:
@Service
public class PdfFormService {
/**
* 创建包含表单的PDF
*/
public byte[] createPdfWithForm() throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfWriter writer = new PdfWriter(baos);
PdfDocument pdf = new PdfDocument(writer);
Document document = new Document(pdf);
// 添加标题
document.add(new Paragraph("用户注册表单").setFontSize(20).setBold());
document.add(new Paragraph("请填写以下信息:").setFontSize(12));
document.add(new Paragraph("\n"));
// 创建表单
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
// 姓名字段
document.add(new Paragraph("姓名:"));
Rectangle nameRect = new Rectangle(100, 700, 200, 20);
PdfTextFormField nameField = PdfTextFormField.createText(pdf, nameRect, "name", "");
form.addField(nameField);
// 邮箱字段
document.add(new Paragraph("邮箱:"));
Rectangle emailRect = new Rectangle(100, 650, 200, 20);
PdfTextFormField emailField = PdfTextFormField.createText(pdf, emailRect, "email", "");
form.addField(emailField);
// 性别单选按钮
document.add(new Paragraph("性别:"));
// 创建单选按钮组
PdfButtonFormField genderGroup = PdfFormField.createRadioGroup(pdf, "gender", "");
// 男性选项
Rectangle maleRect = new Rectangle(100, 600, 20, 20);
PdfFormField male = PdfFormField.createRadioButton(pdf, maleRect, genderGroup, "男");
form.addField(male);
document.add(new Paragraph("男").setFixedPosition(125, 600, 50));
// 女性选项
Rectangle femaleRect = new Rectangle(160, 600, 20, 20);
PdfFormField female = PdfFormField.createRadioButton(pdf, femaleRect, genderGroup, "女");
form.addField(female);
document.add(new Paragraph("女").setFixedPosition(185, 600, 50));
form.addField(genderGroup);
// 兴趣复选框
document.add(new Paragraph("兴趣爱好:"));
// 阅读选项
Rectangle readingRect = new Rectangle(100, 550, 20, 20);
PdfFormField reading = PdfFormField.createCheckBox(pdf, readingRect, "reading", "Yes", PdfFormField.TYPE_CHECK);
form.addField(reading);
document.add(new Paragraph("阅读").setFixedPosition(125, 550, 50));
// 旅行选项
Rectangle travelRect = new Rectangle(180, 550, 20, 20);
PdfFormField travel = PdfFormField.createCheckBox(pdf, travelRect, "travel", "Yes", PdfFormField.TYPE_CHECK);
form.addField(travel);
document.add(new Paragraph("旅行").setFixedPosition(205, 550, 50));
// 音乐选项
Rectangle musicRect = new Rectangle(260, 550, 20, 20);
PdfFormField music = PdfFormField.createCheckBox(pdf, musicRect, "music", "Yes", PdfFormField.TYPE_CHECK);
form.addField(music);
document.add(new Paragraph("音乐").setFixedPosition(285, 550, 50));
// Submit button
Rectangle submitRect = new Rectangle(100, 500, 100, 30);
PdfButtonFormField submit = PdfFormField.createPushButton(pdf, submitRect, "submit", "Submit");
// createSubmitForm takes the target URL, the field names to include (null means all), and the submit flags
submit.setAction(PdfAction.createSubmitForm("/submit-form", null, PdfAction.SUBMIT_HTML_FORMAT));
form.addField(submit);
document.close();
return baos.toByteArray();
}
/**
* 从提交的表单中提取数据
*/
public Map<String, Object> extractFormData(byte[] pdfData) throws IOException {
Map<String, Object> formData = new HashMap<>();
PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
PdfDocument pdf = new PdfDocument(reader);
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, false);
if (form != null) {
// 获取所有表单字段
Map<String, PdfFormField> fields = form.getFormFields();
// 提取每个字段的值
for (Map.Entry<String, PdfFormField> entry : fields.entrySet()) {
String fieldName = entry.getKey();
PdfFormField field = entry.getValue();
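// Handle each form element according to its field type
// PdfName values are compared with equals(): the parsed name is not the same instance as the constant
PdfName formType = field.getFormType();
if (PdfName.Tx.equals(formType)) {
// Text field
formData.put(fieldName, field.getValueAsString());
} else if (PdfName.Btn.equals(formType) && field instanceof PdfButtonFormField) {
// Button family: push button, radio group, or check box
PdfButtonFormField button = (PdfButtonFormField) field;
if (button.isRadio()) {
formData.put(fieldName, field.getValueAsString());
} else if (!button.isPushButton()) {
// Neither a radio group nor a push button, so this is a check box
formData.put(fieldName, "Yes".equals(field.getValueAsString()));
}
} else if (PdfName.Ch.equals(formType)) {
// Choice field (drop-down list or list box)
formData.put(fieldName, field.getValueAsString());
}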
}
}
pdf.close();
return formData;
}
/**
* 填充PDF表单
*/
public byte[] fillPdfForm(byte[] pdfTemplate, Map<String, Object> formData) throws IOException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfTemplate));
PdfWriter writer = new PdfWriter(baos);
PdfDocument pdf = new PdfDocument(reader, writer);
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
// iText generates the field appearances itself, so viewers are not asked to regenerate them
form.setNeedAppearances(false);
// 填充表单字段
for (Map.Entry<String, Object> entry : formData.entrySet()) {
String fieldName = entry.getKey();
Object value = entry.getValue();
if (form.getField(fieldName) != null) {
if (value instanceof String) {
form.getField(fieldName).setValue((String) value);
} else if (value instanceof Boolean) {
boolean checked = (Boolean) value;
form.getField(fieldName).setValue(checked ? "Yes" : "Off");
}
}
}
// Flatten the form: field values become static page content and can no longer be edited
form.flattenFields();
pdf.close();
return baos.toByteArray();
}
}
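The examples above show how to implement PDF security controls (encryption and permissions), watermarks, digital signatures, and the creation and processing of PDF forms. These features can be combined and customized to suit different business scenarios.
8. PDF Processing Best Practices
When handling PDF files in a SpringBoot application, following a few best practices makes the application more efficient, more secure, and easier to maintain.
8.1 Performance Optimization
Performance is an important consideration when processing PDF files, especially large ones.
8.1.1 Memory Management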
@Service
public class PdfMemoryOptimizationService {
/**
* Process a large PDF without holding the whole file on the heap
*/
public void processLargePdf(String inputPath, String outputPath) throws IOException {
// Let PDFBox buffer document data in temporary files instead of main memory
MemoryUsageSetting memorySetting = MemoryUsageSetting.setupTempFileOnly();
// try-with-resources closes the document even if processing fails
try (PDDocument document = PDDocument.load(new File(inputPath), memorySetting)) {
// Visit the pages one at a time rather than collecting them all up front
int pageCount = document.getNumberOfPages();
for (int i = 0; i < pageCount; i++) {
PDPage page = document.getPage(i);
// Process each page...
processPage(page);
}
// Save the processed document
document.save(outputPath);
}
}
private void processPage(PDPage page) {
// 页面处理逻辑...
}
/**
* Memory settings for PDFBox
*/
public MemoryUsageSetting configureMemorySettings() {
// Keep up to 50 MB in main memory and spill anything beyond that to temporary files.
// PDFBox 2.x takes this setting per document, e.g. PDDocument.load(file, setting),
// rather than through global system properties.
return MemoryUsageSetting.setupMixed(50L * 1024 * 1024); // 50MB
}
}
8.1.2 Parallel Processing
@Service
public class PdfParallelProcessingService {
private final ExecutorService executor = Executors.newFixedThreadPool(
Runtime.getRuntime().availableProcessors());
/**
* 并行处理多个PDF文件
*/
public List<CompletableFuture<ProcessingResult>> processPdfFilesInParallel(List<String> filePaths) {
return filePaths.stream()
.map(path -> CompletableFuture.supplyAsync(() -> {
try {
// 处理单个PDF文件
return processSinglePdf(path);
} catch (Exception e) {
throw new CompletionException(e);
}
}, executor))
.collect(Collectors.toList());
}
/**
* Process the pages of a single PDF in parallel.
* Caution: PDFBox does not support concurrent access to one PDDocument, so treat this as a
* structural sketch; synchronize access to the document or process pages sequentially in production.
*/
public ProcessingResult processMultiPagePdfInParallel(String pdfPath) throws IOException {
// try-with-resources closes the document even if a page task fails
try (PDDocument document = PDDocument.load(new File(pdfPath))) {
int pageCount = document.getNumberOfPages();
List<CompletableFuture<PageResult>> futures = new ArrayList<>();
// Submit one task per page
for (int i = 0; i < pageCount; i++) {
final int pageNum = i;
futures.add(CompletableFuture.supplyAsync(() -> {
try {
PDPage page = document.getPage(pageNum);
return processPageContent(page, pageNum);
} catch (Exception e) {
throw new CompletionException(e);
}
}, executor));
}
// Wait for every page to finish before the document is closed
List<PageResult> results = futures.stream()
.map(CompletableFuture::join)
.collect(Collectors.toList());
return new ProcessingResult(pdfPath, results);
}
}
private ProcessingResult processSinglePdf(String path) throws IOException {
// 单个PDF文件处理逻辑
// ...
return new ProcessingResult(path, new ArrayList<>());
}
private PageResult processPageContent(PDPage page, int pageNum) throws IOException {
// 单页处理逻辑
// ...
return new PageResult(pageNum, "Processed");
}
// 结果类
@Data
@AllArgsConstructor
public static class ProcessingResult {
private String filePath;
private List<PageResult> pageResults;
}
@Data
@AllArgsConstructor
public static class PageResult {
private int pageNumber;
private String result;
}
}
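The services above create fixed thread pools as fields and never shut them down, which can keep worker threads alive after the application stops. The sketch below shows the same `CompletableFuture` fan-out with an explicit executor lifecycle, using only the JDK. `processAll` and its parameters are hypothetical names; in a Spring bean the shutdown call would more likely live in a `@PreDestroy` method than in a `finally` block.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

final class ParallelBatchSketch {

    /** Applies the task to every input on a bounded pool and returns results in input order. */
    static <T, R> List<R> processAll(List<T> inputs, Function<T, R> task, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            // Submit everything first so the tasks actually run concurrently
            List<CompletableFuture<R>> futures = inputs.stream()
                    .map(input -> CompletableFuture.supplyAsync(() -> task.apply(input), executor))
                    .collect(Collectors.toList());
            // join() rethrows a task failure wrapped in CompletionException
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            // Release the worker threads whether or not a task failed
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processAll(List.of("a.pdf", "bb.pdf"), String::length, 2));
    }
}
```

Collecting the futures into a list before joining matters: joining inside the first stream would wait for each task before submitting the next one.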
8.2 Exception Handling and Logging
Sound exception handling and logging are essential for diagnosing PDF processing problems.
@Service
@Slf4j // 使用Lombok的日志注解
public class PdfProcessingService {
/**
* 处理PDF文件,包含完善的异常处理和日志
*/
public ProcessingResult processPdf(String inputPath) {
log.info("开始处理PDF文件: {}", inputPath);
PDDocument document = null;
ProcessingResult result = new ProcessingResult();
result.setFilePath(inputPath);
try {
// 验证文件存在
File file = new File(inputPath);
if (!file.exists() || !file.isFile()) {
throw new FileNotFoundException("找不到PDF文件: " + inputPath);
}
log.debug("文件验证通过,开始加载PDF");
// 加载文档
try {
document = PDDocument.load(file);
} catch (InvalidPasswordException e) {
log.error("PDF文件受密码保护: {}", inputPath, e);
result.setStatus(ProcessingStatus.PASSWORD_PROTECTED);
return result;
} catch (IOException e) {
log.error("无法加载PDF文件: {}", inputPath, e);
result.setStatus(ProcessingStatus.LOAD_ERROR);
result.setErrorMessage("无法加载PDF文件: " + e.getMessage());
return result;
}
// 检查是否为空文档
if (document.getNumberOfPages() <= 0) {
log.warn("PDF文件不包含任何页面: {}", inputPath);
result.setStatus(ProcessingStatus.EMPTY_DOCUMENT);
return result;
}
log.info("成功加载PDF,共{}页", document.getNumberOfPages());
// 处理文档内容
try {
processDocumentContent(document, result);
result.setStatus(ProcessingStatus.SUCCESS);
log.info("PDF文件处理成功: {}", inputPath);
} catch (Exception e) {
log.error("处理PDF内容时出错: {}", inputPath, e);
result.setStatus(ProcessingStatus.PROCESSING_ERROR);
result.setErrorMessage("处理内容出错: " + e.getMessage());
}
} catch (Exception e) {
log.error("处理PDF时发生未预期的异常: {}", inputPath, e);
result.setStatus(ProcessingStatus.UNEXPECTED_ERROR);
result.setErrorMessage("未预期的错误: " + e.getMessage());
} finally {
// 确保资源释放
if (document != null) {
try {
document.close();
log.debug("PDF文档已关闭");
} catch (IOException e) {
log.warn("关闭PDF文档时出错", e);
}
}
}
return result;
}
private void processDocumentContent(PDDocument document, ProcessingResult result) {
// 文档处理逻辑...
}
// 处理结果类
@Data
public static class ProcessingResult {
private String filePath;
private ProcessingStatus status;
private String errorMessage;
private Map<String, Object> extractedData = new HashMap<>();
}
// 处理状态枚举
public enum ProcessingStatus {
SUCCESS,
PASSWORD_PROTECTED,
LOAD_ERROR,
EMPTY_DOCUMENT,
PROCESSING_ERROR,
UNEXPECTED_ERROR
}
}
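8.3 Security Recommendations
8.3.1 Securing File Uploads
@Service
@Slf4j // provides the log field used in scanForMaliciousContent
// In a single project this class needs a name distinct from the PdfSecurityService in section 7.1
public class PdfSecurityService {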
// 允许的最大PDF文件大小
private static final long MAX_FILE_SIZE = 10 * 1024 * 1024; // 10MB
// 文件类型验证
public boolean isValidPdfFile(MultipartFile file) {
// 检查文件大小
if (file.getSize() > MAX_FILE_SIZE) {
throw new FileValidationException("文件大小超过限制");
}
// 检查内容类型
String contentType = file.getContentType();
if (contentType == null || !contentType.equals("application/pdf")) {
throw new FileValidationException("文件类型必须是PDF");
}
// 检查文件扩展名
String originalFilename = file.getOriginalFilename();
if (originalFilename == null || !originalFilename.toLowerCase().endsWith(".pdf")) {
throw new FileValidationException("文件必须是.pdf格式");
}
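// Check the file content (magic bytes)
try (InputStream is = file.getInputStream()) {
byte[] header = new byte[5];
int bytesRead = is.read(header);
// Decode with an explicit charset so the comparison does not depend on the platform default
if (bytesRead < 5 || !new String(header, java.nio.charset.StandardCharsets.US_ASCII).equals("%PDF-")) {
throw new FileValidationException("Invalid PDF file content");
}
} catch (IOException e) {
throw new FileValidationException("Unable to verify the file content");
}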
return true;
}
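// Securely process an uploaded PDF
public File securelyProcessUploadedPdf(MultipartFile file) throws IOException {
// Validate the file
isValidPdfFile(file);
// Create a temporary file
File tempFile = File.createTempFile("secure-pdf-", ".pdf");
try {
// Write the upload to disk without first buffering it in a byte array
file.transferTo(tempFile);
// Scan the file for malicious content
scanForMaliciousContent(tempFile);
return tempFile;
} catch (IOException | RuntimeException e) {
// Do not leave a rejected or half-written upload on disk
tempFile.delete();
throw e;
}
}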
// 扫描恶意内容
private void scanForMaliciousContent(File pdfFile) throws IOException {
try (PDDocument document = PDDocument.load(pdfFile)) {
// 检查JavaScript
if (hasJavaScript(document)) {
throw new SecurityException("PDF包含可能不安全的JavaScript");
}
// 检查外部链接
if (hasExternalLinks(document)) {
// 可以选择警告而不是阻止
log.warn("PDF包含外部链接: {}", pdfFile.getName());
}
// 检查嵌入式文件
if (hasEmbeddedFiles(document)) {
throw new SecurityException("PDF包含嵌入式文件,可能存在安全风险");
}
// 更多安全检查...
}
}
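// Check for JavaScript
private boolean hasJavaScript(PDDocument document) {
PDDocumentCatalog catalog = document.getDocumentCatalog();
PDAcroForm acroForm = catalog.getAcroForm();
if (acroForm != null) {
// Inspect JavaScript attached to the form
// ...
return false; // placeholder: no check is implemented yet, so nothing is ever reported
}
return false;
}
// Check for external links
private boolean hasExternalLinks(PDDocument document) {
// Walk the pages and annotations, looking for external URLs
// ...
return false; // placeholder: no check is implemented yet
}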
// 检查嵌入式文件
private boolean hasEmbeddedFiles(PDDocument document) {
PDDocumentNameDictionary names = document.getDocumentCatalog().getNames();
if (names != null) {
PDEmbeddedFilesNameTreeNode embeddedFiles = names.getEmbeddedFiles();
return embeddedFiles != null && !embeddedFiles.getNames().isEmpty();
}
return false;
}
// 自定义验证异常
public static class FileValidationException extends RuntimeException {
public FileValidationException(String message) {
super(message);
}
}
}
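The magic-byte check in `isValidPdfFile` relies on a single `read` call, which is allowed to return fewer bytes than requested even when more are available. A minimal, JDK-only sketch of a stricter variant follows. `PdfMagicBytes` and `looksLikePdf` are hypothetical names, and `InputStream.readNBytes(int)` assumes Java 11 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PdfMagicBytes {

    // Every PDF file starts with this ASCII marker, followed by the version number
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns true when the stream starts with the PDF header marker. */
    static boolean looksLikePdf(InputStream in) throws IOException {
        // readNBytes keeps reading until it has the requested count or the stream ends
        byte[] header = in.readNBytes(MAGIC.length);
        return Arrays.equals(header, MAGIC);
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikePdf(new ByteArrayInputStream(sample)));
    }
}
```

A matching header only shows that the file begins like a PDF; it says nothing about whether the rest of the file is well formed or safe, so the content scan in `scanForMaliciousContent` is still needed.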
8.3.2 Protecting Sensitive Information
@Service
public class PdfDataProtectionService {
// Add a confidentiality watermark (iText 5 API)
public byte[] addConfidentialWatermark(byte[] pdfData) throws IOException, DocumentException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
PdfStamper stamper = new PdfStamper(reader, baos);
int pageCount = reader.getNumberOfPages();
// Helvetica with CP1252 covers Latin text only; Chinese watermark text needs a CJK-capable font
BaseFont baseFont = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
for (int i = 1; i <= pageCount; i++) {
PdfContentByte content = stamper.getUnderContent(i);
content.beginText();
content.setFontAndSize(baseFont, 60);
content.setColorFill(BaseColor.LIGHT_GRAY);
// showTextAligned positions the text itself, so no separate setTextMatrix call is needed
content.showTextAligned(Element.ALIGN_CENTER, "CONFIDENTIAL - DO NOT DISTRIBUTE",
reader.getPageSize(i).getWidth()/2,
reader.getPageSize(i).getHeight()/2, 45);
content.endText();
}
stamper.close();
reader.close();
return baos.toByteArray();
}
// 文档脱敏
public byte[] redactSensitiveInformation(byte[] pdfData, List<String> patternsToRedact)
throws IOException {
// 注意:真正的PDF编辑和脱敏需要更复杂的处理
// 这里仅做示例
// 1. 提取文本
PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
int pageCount = reader.getNumberOfPages();
// 2. 遍历每页,查找并标记匹配的模式
List<PDFRedactionInfo> redactions = new ArrayList<>();
for (int i = 1; i <= pageCount; i++) {
String pageText = PdfTextExtractor.getTextFromPage(reader, i);
for (String pattern : patternsToRedact) {
Pattern regex = Pattern.compile(pattern);
Matcher matcher = regex.matcher(pageText);
while (matcher.find()) {
// 这里需要实际位置信息,简化版只记录页码
redactions.add(new PDFRedactionInfo(i, matcher.start(), matcher.end()));
}
}
}
reader.close();
// 3. 应用脱敏
// 实际实现需要使用PDFBox的PDFRedactor或iText的PdfCleanUp
// 下面只是概念演示
// 创建脱敏后的PDF
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// ... 复杂的脱敏处理 ...
return baos.toByteArray();
}
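// Encrypt a sensitive PDF before storing it
public byte[] encryptForStorage(byte[] pdfData) throws DocumentException, IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
    PdfStamper stamper = new PdfStamper(reader, baos);
    // Generate separate random passwords: anyone who opens the file with the owner password
    // is not bound by the permission flags, so it must differ from the user password
    String userPassword = generateSecureRandomPassword(16);
    String ownerPassword = generateSecureRandomPassword(16);
    // AES-256 encryption; the user password allows opening and printing only
    stamper.setEncryption(userPassword.getBytes(StandardCharsets.UTF_8),
            ownerPassword.getBytes(StandardCharsets.UTF_8),
            PdfWriter.ALLOW_PRINTING,
            PdfWriter.ENCRYPTION_AES_256);
    stamper.close();
    reader.close();
    // Note: a real application must store both passwords securely
    storePasswordSecurely(userPassword);
    storePasswordSecurely(ownerPassword);
    return baos.toByteArray();
}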
private String generateSecureRandomPassword(int length) {
SecureRandom random = new SecureRandom();
String chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()";
StringBuilder sb = new StringBuilder();
for (int i = 0; i < length; i++) {
int randomIndex = random.nextInt(chars.length());
sb.append(chars.charAt(randomIndex));
}
return sb.toString();
}
private void storePasswordSecurely(String password) {
// In a real application, use a proper secret management system,
// for example HashiCorp Vault or AWS KMS
}
// Record of one redaction match
@Data
@AllArgsConstructor
private static class PDFRedactionInfo {
private int pageNumber;
private int startPosition;
private int endPosition;
}
}
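The matching step of the redaction method can be tried without any PDF library. The sketch below is a simplified, standalone version under stated assumptions: the page texts are passed in as plain strings (in the service they would come from `PdfTextExtractor`), and the class name `RedactionMatchFinder` and its `Match` type are hypothetical names introduced for illustration. It also compiles each pattern once instead of once per page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds regex matches per page in already-extracted text.
public class RedactionMatchFinder {

    /** One match: 1-based page number plus [start, end) offsets in that page's text. */
    public static final class Match {
        public final int page;
        public final int start;
        public final int end;

        Match(int page, int start, int end) {
            this.page = page;
            this.start = start;
            this.end = end;
        }
    }

    public static List<Match> findMatches(List<String> pageTexts, List<String> patterns) {
        // Compile every pattern once, not once per page
        List<Pattern> compiled = new ArrayList<>();
        for (String pattern : patterns) {
            compiled.add(Pattern.compile(pattern));
        }
        List<Match> matches = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            for (Pattern regex : compiled) {
                Matcher matcher = regex.matcher(pageTexts.get(i));
                while (matcher.find()) {
                    matches.add(new Match(i + 1, matcher.start(), matcher.end()));
                }
            }
        }
        return matches;
    }
}
```

The offsets refer to positions in the extracted text, not to coordinates on the page, so they are enough for reporting what would be redacted but not for actually removing it.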
8.4 Testing PDF Processing
Thorough tests are essential for making sure that PDF processing is correct and reliable.
@SpringBootTest
public class PdfGenerationServiceTest {
@Autowired
private PdfGenerationService pdfService;
@TempDir
Path tempDir;
@Test
public void testGenerateSimplePdf() throws Exception {
// Arrange
String outputPath = tempDir.resolve("test-output.pdf").toString();
// Act
pdfService.generateSimplePdf(outputPath);
// Assert
File outputFile = new File(outputPath);
assertTrue(outputFile.exists(), "The generated PDF file should exist");
assertTrue(outputFile.length() > 0, "The PDF file should not be empty");
// Verify the PDF content; try-with-resources closes the document even if an assertion fails
try (PDDocument document = PDDocument.load(outputFile)) {
assertEquals(1, document.getNumberOfPages(), "The PDF should have 1 page");
// Verify the text content
PDFTextStripper stripper = new PDFTextStripper();
String text = stripper.getText(document);
assertTrue(text.contains("Hello World"), "The PDF should contain the expected text");
}
}
@Test
public void testGeneratePdfWithTable() throws Exception {
// 安排
String outputPath = tempDir.resolve("table-output.pdf").toString();
List<UserDto> users = Arrays.asList(
new UserDto(1L, "张三", "admin"),
new UserDto(2L, "李四", "user"),
new UserDto(3L, "王五", "editor")
);
// 执行
pdfService.generatePdfWithTable(outputPath, users);
// Assert
File outputFile = new File(outputPath);
assertTrue(outputFile.exists());
// Verify the content (checking table structure is complex, so only basic checks are made here);
// try-with-resources closes the document even if an assertion fails
try (PDDocument document = PDDocument.load(outputFile)) {
assertTrue(document.getNumberOfPages() > 0);
// Verify that the text contains the user names
PDFTextStripper stripper = new PDFTextStripper();
String text = stripper.getText(document);
assertTrue(text.contains("张三"));
assertTrue(text.contains("李四"));
assertTrue(text.contains("王五"));
}
}
@Test
public void testPdfGeneration_withInvalidInput_shouldThrowException() {
// 安排
String outputPath = tempDir.resolve("invalid-output.pdf").toString();
// 断言
assertThrows(IllegalArgumentException.class, () -> {
// 执行
pdfService.generatePdfWithInvalidInput(outputPath);
});
}
}
8.5 Deployment Best Practices
@Configuration
public class PdfServiceConfig {
@Bean
public PdfGenerationService pdfGenerationService(
@Value("${pdf.fonts.directory:/app/fonts}") String fontsDirectory,
@Value("${pdf.output.directory:/app/output}") String outputDirectory) {
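// Create the directories if they are missing, and fail fast when that is not possible
File fontsDir = new File(fontsDirectory);
if (!fontsDir.exists() && !fontsDir.mkdirs()) {
throw new IllegalStateException("Cannot create fonts directory: " + fontsDirectory);
}
File outputDir = new File(outputDirectory);
if (!outputDir.exists() && !outputDir.mkdirs()) {
throw new IllegalStateException("Cannot create output directory: " + outputDirectory);
}
// Return the configured service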
return new PdfGenerationService(fontsDirectory, outputDirectory);
}
@Bean
public PdfProcessingTaskExecutor pdfTaskExecutor(
@Value("${pdf.processing.thread-pool-size:4}") int threadPoolSize,
@Value("${pdf.processing.queue-capacity:100}") int queueCapacity) {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(threadPoolSize);
executor.setMaxPoolSize(threadPoolSize * 2);
executor.setQueueCapacity(queueCapacity);
executor.setThreadNamePrefix("pdf-proc-");
executor.initialize();
return new PdfProcessingTaskExecutor(executor);
}
// PDF处理健康检查
@Bean
public HealthIndicator pdfServiceHealthIndicator(PdfGenerationService pdfService) {
return () -> {
try {
// 尝试生成一个简单的PDF来验证服务是否正常
ByteArrayOutputStream baos = new ByteArrayOutputStream();
pdfService.generateTestPdf(baos);
if (baos.size() > 0) {
return Health.up().build();
} else {
return Health.down()
.withDetail("reason", "PDF生成输出为空")
.build();
}
} catch (Exception e) {
return Health.down()
.withDetail("reason", "PDF生成失败")
.withDetail("error", e.getMessage())
.build();
}
};
}
}
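The executor settings above (core size, maximum size, queue capacity) interact in a way that is easy to misread: extra threads beyond the core size are only started once the queue is full. The following sketch illustrates this with the JDK's `ThreadPoolExecutor`, which Spring's `ThreadPoolTaskExecutor` wraps. The class and method names (`PoolSizingDemo`, `submitBlockingTasks`) are hypothetical and exist only for this illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: shows when a pool grows, queues, or rejects work.
public class PoolSizingDemo {

    /** Returns {poolSize, queuedTasks, rejectedTasks} after submitting blocking tasks. */
    public static int[] submitBlockingTasks(int core, int max, int queueCapacity, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(queueCapacity));
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    // Every task blocks until the latch opens, so no worker becomes free
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // queue full and maximum pool size reached
                }
            }
            return new int[] {pool.getPoolSize(), pool.getQueue().size(), rejected};
        } finally {
            release.countDown();
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

With the defaults from the configuration (core 4, maximum 8, queue capacity 100), 50 simultaneous long-running jobs still run on only 4 threads while 46 wait in the queue. If PDF jobs should use all 8 threads sooner, a smaller queue capacity or a larger core size is the setting to change.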
8.6 Error Handling and Recovery
@Service
@Slf4j
public class PdfProcessingErrorHandler {
/**
* 处理PDF文件,包含重试机制
*/
@Retryable(
value = {IOException.class, TemporaryPdfProcessingException.class},
maxAttempts = 3,
backoff = @Backoff(delay = 1000, multiplier = 2)
)
public ProcessingResult processPdfWithRetry(String pdfPath) throws IOException {
log.info("尝试处理PDF文件: {}", pdfPath);
// PDF处理逻辑,可能抛出异常
return doProcessPdf(pdfPath);
}
/**
* 重试失败后的恢复处理
*/
@Recover
public ProcessingResult recoverFromFailure(Exception e, String pdfPath) {
log.error("处理PDF文件失败,无法恢复: {}", pdfPath, e);
// 创建失败结果
ProcessingResult result = new ProcessingResult();
result.setStatus(ProcessingStatus.FAILED_WITH_RECOVERY);
result.setMessage("处理失败: " + e.getMessage());
// 记录故障
recordFailure(pdfPath, e);
// 发送警报
sendAlert(pdfPath, e);
return result;
}
/**
* 尝试修复损坏的PDF
*/
public byte[] attemptToRepairCorruptedPdf(byte[] corruptedPdfData) {
ByteArrayOutputStream repairedOutput = new ByteArrayOutputStream();
try {
// 使用PDFBox的修复功能尝试恢复
PDFParser parser = new PDFParser(new RandomAccessBufferedFileInputStream(
new ByteArrayInputStream(corruptedPdfData)));
parser.setLenient(true); // 宽容模式
parser.parse();
PDDocument document = parser.getPDDocument();
// 添加空白页(如果文档为空)
if (document.getNumberOfPages() == 0) {
document.addPage(new PDPage());
}
// 保存修复后的文档
document.save(repairedOutput);
document.close();
log.info("成功修复损坏的PDF文件");
return repairedOutput.toByteArray();
} catch (Exception e) {
log.error("无法修复损坏的PDF", e);
// 如果仍然无法修复,返回null或抛出异常
return null;
}
}
/**
* 记录文件处理故障
*/
private void recordFailure(String pdfPath, Exception e) {
// 将故障信息记录到数据库或日志系统
PdfProcessingFailure failure = new PdfProcessingFailure();
failure.setFilePath(pdfPath);
failure.setTimestamp(new Date());
failure.setErrorMessage(e.getMessage());
failure.setStackTrace(ExceptionUtils.getStackTrace(e));
// 保存到数据库
// pdfFailureRepository.save(failure);
}
/**
* 发送警报
*/
private void sendAlert(String pdfPath, Exception e) {
// 当处理重要文件失败时发送警报
// alertService.sendAlert("PDF处理失败", "文件 " + pdfPath + " 处理失败: " + e.getMessage());
}
/**
* 实际的PDF处理逻辑
*/
private ProcessingResult doProcessPdf(String pdfPath) throws IOException {
// 实现PDF处理逻辑...
return new ProcessingResult();
}
// 临时处理异常(可重试)
public static class TemporaryPdfProcessingException extends RuntimeException {
public TemporaryPdfProcessingException(String message) {
super(message);
}
}
// 处理结果类
@Data
public static class ProcessingResult {
private ProcessingStatus status;
private String message;
// 其他字段...
}
// 处理状态枚举
public enum ProcessingStatus {
SUCCESS, FAILED, FAILED_WITH_RECOVERY
}
// 故障记录实体
@Data
public static class PdfProcessingFailure {
private String filePath;
private Date timestamp;
private String errorMessage;
private String stackTrace;
}
}
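The retry settings used above (`maxAttempts = 3`, `delay = 1000`, `multiplier = 2`) mean one initial attempt plus up to two retries, with waits of 1000 ms and 2000 ms between them. The plain-Java sketch below mirrors that behavior without Spring Retry, which can help when reasoning about worst-case latency. `BackoffRetry` and its methods are hypothetical names, and the code is an approximation of the policy, not Spring Retry's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical helper: retry with exponential backoff, modelled on the annotation values above.
public class BackoffRetry {

    /** Waits between attempts: (maxAttempts - 1) values, each multiplied by the multiplier. */
    public static List<Long> backoffDelays(int maxAttempts, long initialDelayMs, double multiplier) {
        List<Long> delays = new ArrayList<>();
        double delay = initialDelayMs;
        for (int attempt = 1; attempt < maxAttempts; attempt++) {
            delays.add((long) delay);
            delay *= multiplier;
        }
        return delays;
    }

    /** Runs the task up to maxAttempts times; throws IllegalStateException if all attempts fail. */
    public static <T> T retry(Callable<T> task, int maxAttempts, long initialDelayMs, double multiplier) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        List<Long> delays = backoffDelays(maxAttempts, initialDelayMs, multiplier);
        Exception last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt < delays.size()) {
                    try {
                        Thread.sleep(delays.get(attempt)); // wait before the next attempt
                    } catch (InterruptedException interrupted) {
                        Thread.currentThread().interrupt();
                        break; // stop retrying when the thread is interrupted
                    }
                }
            }
        }
        // Plays the role of the @Recover method: all attempts failed
        throw new IllegalStateException("All " + maxAttempts + " attempts failed", last);
    }
}
```

For the values in this section, a request that fails every time spends about three seconds waiting in addition to the processing time of the three attempts.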
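9. Common Problems and Solutions
Developers run into a recurring set of problems when handling PDF files in SpringBoot. This section collects the most common ones together with their solutions.
9.1 Garbled Text and Font Problems
Chinese or other special characters showing up as garbled text is one of the most common problems in PDF processing.
Problem: Chinese text in the PDF appears as boxes or garbled characters
Cause: the standard fonts that many PDF libraries use by default contain no glyphs for Chinese characters.
Solution: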
// 使用iText解决中文显示问题
public byte[] generatePdfWithChineseText() throws IOException, DocumentException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// 创建文档
Document document = new Document();
PdfWriter.getInstance(document, baos);
document.open();
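// Option 1: use a predefined CJK font (no font file needed, but the itext-asian dependency is required; the font is not embedded)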
BaseFont baseFont = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
Font chineseFont = new Font(baseFont, 12, Font.NORMAL);
document.add(new Paragraph("这是中文内容", chineseFont));
// Option 2: embed a font file (increases file size)
String fontPath = "path/to/fonts/msyh.ttf"; // Microsoft YaHei font
BaseFont customFont = BaseFont.createFont(fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Font embeddedFont = new Font(customFont, 12, Font.NORMAL);
document.add(new Paragraph("这是使用嵌入字体的中文", embeddedFont));
document.close();
return baos.toByteArray();
}
// Displaying Chinese text with PDFBox
public void addChineseTextWithPdfBox(String pdfPath) throws IOException {
    // try-with-resources closes the document even if writing fails
    try (PDDocument document = new PDDocument()) {
        PDPage page = new PDPage();
        document.addPage(page);
        // Load a font that contains Chinese glyphs
        PDType0Font font = PDType0Font.load(document, new File("path/to/fonts/msyh.ttf"));
        // Write the text through a content stream
        try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
            contentStream.beginText();
            contentStream.setFont(font, 12);
            contentStream.newLineAtOffset(25, 700);
            contentStream.showText("这是PDFBox生成的中文内容");
            contentStream.endText();
        }
        document.save(pdfPath);
    }
}
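Best practices:
- Ship commonly used Chinese fonts with the application
- Embed font subsets to reduce file size
- Create a font factory class to manage and reuse font instances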
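The last item can be sketched as a small cache that loads each font once and hands back the same instance afterwards. This is a minimal illustration under stated assumptions: `FontCache` is a hypothetical class, it is generic so that it does not depend on a particular PDF library, and the loader function is supplied by the caller (with iText it would wrap `BaseFont.createFont` and convert its checked exceptions to unchecked ones).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical font factory: loads each font path once and reuses the instance.
public class FontCache<F> {

    private final ConcurrentMap<String, F> cache = new ConcurrentHashMap<>();
    private final Function<String, F> loader;

    public FontCache(Function<String, F> loader) {
        this.loader = loader;
    }

    /** Returns the cached font for the path, loading it on first use. */
    public F get(String fontPath) {
        return cache.computeIfAbsent(fontPath, loader);
    }

    /** Number of distinct fonts loaded so far. */
    public int size() {
        return cache.size();
    }
}
```

A cache of this kind suits font objects that are independent of a single document. Fonts that are bound to one document, such as a `PDType0Font` loaded into a specific `PDDocument`, should not be shared across documents this way.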
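9.2 Image Problems
Problem: images in the PDF look blurry or distorted
Cause: the image has too few pixels for the size at which it is placed (low effective DPI), or it is scaled without keeping its aspect ratio.
Solution: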
public void addHighQualityImage(Document document, String imagePath) throws IOException, DocumentException {
    // Load the image; sharpness depends on how many pixels it has for the printed size
    Image image = Image.getInstance(imagePath);
    // Available box: page width minus a 40pt margin on each side,
    // and at most 2/3 of the page height
    float maxWidth = document.getPageSize().getWidth() - 80;
    float maxHeight = document.getPageSize().getHeight() * 2 / 3;
    // scaleToFit keeps the original aspect ratio, so no manual ratio math is needed
    image.scaleToFit(maxWidth, maxHeight);
    // Center the image horizontally
    image.setAlignment(Image.MIDDLE);
    document.add(image);
}
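The arithmetic behind fitting an image and judging its sharpness can be worked through in isolation. The sketch below assumes PDF user-space units of 1 point = 1/72 inch; `ImageFit`, `fit`, and `effectiveDpi` are hypothetical names for illustration, not iText API.

```java
// Hypothetical helper: aspect-ratio-preserving fit and effective DPI of a placed image.
public class ImageFit {

    /** Returns {width, height} in points after scaling to fit the box, keeping the aspect ratio. */
    public static float[] fit(float imgWidth, float imgHeight, float maxWidth, float maxHeight) {
        // The tighter of the two constraints decides the scale factor
        float scale = Math.min(maxWidth / imgWidth, maxHeight / imgHeight);
        return new float[] {imgWidth * scale, imgHeight * scale};
    }

    /** Effective resolution when the given pixels are printed across the given points. */
    public static double effectiveDpi(int pixels, float points) {
        return pixels / (points / 72.0);
    }
}
```

For example, a 2000 x 1000 pixel image placed in a 515 x 561 pt box (roughly the box computed above for an A4 page) ends up 515 pt wide and 257.5 pt high, which is about 280 DPI. An 800-pixel-wide image in the same box would come out at roughly 112 DPI and look soft in print.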
Problem: the PDF file is too large
Cause: the images are uncompressed or stored in a lossless format.
Solution:
public byte[] compressImage(byte[] imageData, String format) throws IOException {
ByteArrayInputStream bais = new ByteArrayInputStream(imageData);
BufferedImage image = ImageIO.read(bais);
// 创建输出流
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// 对于JPEG格式,设置压缩质量
if ("jpg".equalsIgnoreCase(format) || "jpeg".equalsIgnoreCase(format)) {
Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName(format);
if (writers.hasNext()) {
ImageWriter writer = writers.next();
ImageWriteParam param = writer.getDefaultWriteParam();
param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
param.setCompressionQuality(0.7f); // 70%质量,调整以平衡大小和质量
ImageOutputStream ios = ImageIO.createImageOutputStream(baos);
writer.setOutput(ios);
writer.write(null, new IIOImage(image, null, null), param);
ios.close();
writer.dispose();
}
} else {
// 对于其他格式,使用默认压缩
ImageIO.write(image, format, baos);
}
return baos.toByteArray();
}
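One caveat when re-encoding to JPEG: the format has no alpha channel, and handing an image with transparency (for example one decoded from a PNG) to the JDK's JPEG writer commonly fails or yields wrong colors. A possible workaround, sketched here with only JDK classes, is to paint the image onto an opaque white RGB canvas first. `JpegFlatten` and `toJpeg` are hypothetical names, and the white background is an assumption that may not suit every document.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.MemoryCacheImageOutputStream;

// Hypothetical helper: flattens transparency, then writes a JPEG at the given quality.
public class JpegFlatten {

    public static byte[] toJpeg(BufferedImage source, float quality) {
        // Paint onto an opaque white RGB canvas so that no alpha channel remains
        BufferedImage rgb = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (MemoryCacheImageOutputStream ios = new MemoryCacheImageOutputStream(out)) {
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality); // 0.0 = smallest file, 1.0 = best quality
            writer.setOutput(ios);
            writer.write(null, new IIOImage(rgb, null, null), param);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
        return out.toByteArray();
    }
}
```

Images that must keep their transparency are better left as PNG, accepting the larger file size.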
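9.3 Form and Interactivity Problems
Problem: field values are not displayed after filling a PDF form
Cause: the field appearances were not regenerated, or the field font cannot display the characters.
Solution: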
public byte[] fillPdfForm(byte[] templateBytes, Map<String, String> formData)
        throws IOException, DocumentException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    // Open the PDF template
    PdfReader reader = new PdfReader(new ByteArrayInputStream(templateBytes));
    PdfStamper stamper = new PdfStamper(reader, baos);
    // Get the form
    AcroFields form = stamper.getAcroFields();
    // Regenerate field appearances so that the new values are actually drawn
    form.setGenerateAppearances(true);
    // Flatten the form: fields become static page content and can no longer be edited
    stamper.setFormFlattening(true);
    // Substitution font for Chinese text (STSong-Light needs the itext-asian dependency)
    BaseFont bf = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
    form.addSubstitutionFont(bf);
    // Fill in the form fields
    for (Map.Entry<String, String> entry : formData.entrySet()) {
        form.setField(entry.getKey(), entry.getValue());
    }
    // Closing the stamper writes the result to baos
    stamper.close();
    reader.close();
    return baos.toByteArray();
}
Problem: adding interactive elements to a PDF
Solution:
public byte[] createInteractivePdf() throws IOException, DocumentException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// 创建文档
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, baos);
document.open();
// 添加正文内容
document.add(new Paragraph("这是一个包含交互元素的PDF"));
document.add(new Paragraph("请点击下方链接或按钮:"));
document.add(new Paragraph(" "));
// 添加超链接
Anchor anchor = new Anchor("点击访问官网");
anchor.setReference("https://www.example.com");
document.add(anchor);
document.add(new Paragraph(" "));
// 创建一个按钮
Rectangle rect = new Rectangle(100, 100, 200, 130);
PushbuttonField button = new PushbuttonField(writer, rect, "submitButton");
button.setText("提交表单");
button.setBackgroundColor(new BaseColor(0, 122, 204));
button.setTextColor(BaseColor.WHITE);
button.setVisibility(PushbuttonField.VISIBLE);
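// Build the form field and attach the JavaScript action to it
// (the action belongs to the PdfFormField; PushbuttonField has no setAction method)
PdfFormField field = button.getField();
field.setAction(PdfAction.javaScript(
"app.alert('按钮被点击了!');", writer));
// Add the button to the document
writer.addAnnotation(field);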
document.close();
return baos.toByteArray();
}
9.4 Performance and Memory Problems
Problem: OutOfMemoryError when processing large PDF files
Cause: the whole PDF file is loaded into memory at once.
Solution:
public void processLargePdfMemoryEfficient(String inputPath, String outputPath) throws IOException {
    // 1. Let PDFBox buffer parsed streams in temp files instead of on the heap
    MemoryUsageSetting memory = MemoryUsageSetting.setupTempFileOnly();
    // 2. try-with-resources closes both documents even when processing fails
    try (PDDocument document = PDDocument.load(new File(inputPath), memory);
         PDDocument outputDocument = new PDDocument(memory)) {
        // 3. Handle one page at a time instead of all pages at once
        for (PDPage page : document.getPages()) {
            // Per-page work goes here, for example extracting text or modifying content
            // 4. Copy the processed page into the output document
            outputDocument.importPage(page);
        }
        // 5. Save while the source document is still open, because imported pages
        //    still reference its data; explicit System.gc() calls are not needed
        outputDocument.save(outputPath);
    }
}
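Problem: PDF processing is slow
Solution:
// Process several PDFs in parallel
public void parallelPdfProcessing(List<String> pdfPaths) {
// Use at least one thread: newFixedThreadPool(0) throws IllegalArgumentException for an empty list
int parallelism = Math.max(1, Math.min(Runtime.getRuntime().availableProcessors(), pdfPaths.size()));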
ExecutorService executor = Executors.newFixedThreadPool(parallelism);
try {
// 创建任务列表
List<Future<ProcessingResult>> futures = new ArrayList<>();
for (String path : pdfPaths) {
futures.add(executor.submit(() -> processSinglePdf(path)));
}
// 收集结果
for (Future<ProcessingResult> future : futures) {
try {
ProcessingResult result = future.get();
// 处理结果...
System.out.println("处理完成: " + result.getFilePath());
} catch (Exception e) {
// 处理异常...
e.printStackTrace();
}
}
} finally {
executor.shutdown();
}
}
private ProcessingResult processSinglePdf(String path) {
// 单个PDF处理逻辑
// ...
return new ProcessingResult(path, true);
}
@Data
@AllArgsConstructor
private static class ProcessingResult {
private String filePath;
private boolean success;
}
9.5 Security Problems
Problem: how to prevent PDF injection attacks
Solution:
public void securePdfGeneration(String content, String outputPath) {
// 1. 内容验证和清理
content = cleanContent(content);
try {
// 2. 创建PDF
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(outputPath));
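// 3. Restrict permissions to printing and copying
// Note: encryption does not disable JavaScript; this code simply never adds a script to the document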
writer.setEncryption(null, null,
PdfWriter.ALLOW_PRINTING | PdfWriter.ALLOW_COPY,
PdfWriter.ENCRYPTION_AES_128);
document.open();
document.add(new Paragraph(content));
document.close();
} catch (Exception e) {
throw new SecurityException("PDF生成失败", e);
}
}
private String cleanContent(String content) {
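// Strip potentially malicious content before the text is placed in the PDF,
// for example JavaScript code or restricted special characters
// Simple example: remove <script> blocks ((?s) lets '.' match across line breaks)
content = content.replaceAll("(?is)<script.*?>.*?</script>", "");
// Remove PDF structure keywords (%PDF-, startxref, xref, trailer); this also alters legitimate text that contains them
content = content.replaceAll("(?i)(%PDF-|startxref|xref|trailer)", "");
// Apply stricter, application-specific cleaning here...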
return content;
}
Problem: how to handle uploaded PDF files safely
Solution:
@Service
public class SecurePdfUploadService {
public boolean validateAndProcessPdfUpload(MultipartFile file) throws IOException {
// 1. Check the declared MIME type (getContentType() may return null)
if (!"application/pdf".equals(file.getContentType())) {
throw new SecurityException("只接受PDF文件");
}
// 2. 验证文件扩展名
String filename = file.getOriginalFilename();
if (filename == null || !filename.toLowerCase().endsWith(".pdf")) {
throw new SecurityException("文件必须是PDF格式");
}
// 3. 检查文件大小
if (file.getSize() > 10 * 1024 * 1024) { // 10MB
throw new SecurityException("PDF文件大小不能超过10MB");
}
// 4. 验证PDF文件头
byte[] content = file.getBytes();
if (content.length < 5 || !isPdfHeader(content)) {
throw new SecurityException("无效的PDF文件格式");
}
// 5. 扫描PDF内容是否安全
if (!scanPdfForThreats(content)) {
throw new SecurityException("PDF文件可能包含恶意内容");
}
// 6. 处理文件内容
return processPdfContent(content);
}
private boolean isPdfHeader(byte[] content) {
// Check the PDF header (%PDF-); decode as ASCII rather than the platform default charset
String header = new String(content, 0, 5, StandardCharsets.US_ASCII);
return header.equals("%PDF-");
}
private boolean scanPdfForThreats(byte[] content) {
    // try-with-resources closes the document even if one of the checks throws
    try (PDDocument document = PDDocument.load(new ByteArrayInputStream(content))) {
        // Does the document contain JavaScript?
        boolean hasJavaScript = checkForJavaScript(document);
        // Does it contain external links?
        boolean hasExternalLinks = checkForExternalLinks(document);
        // Does it contain embedded files?
        boolean hasEmbeddedFiles = checkForEmbeddedFiles(document);
        // Decide according to the security policy; external links may be acceptable
        return !hasJavaScript && !hasEmbeddedFiles;
    } catch (Exception e) {
        // Parsing failed: the file may be corrupted or malicious
        return false;
    }
}
private boolean checkForJavaScript(PDDocument document) {
// 检查文档中的JavaScript代码
// ...
return false; // 示例返回
}
private boolean checkForExternalLinks(PDDocument document) {
// 检查外部链接
// ...
return false; // 示例返回
}
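private boolean checkForEmbeddedFiles(PDDocument document) throws IOException {
    // Look for an EmbeddedFiles name tree in the document catalog
    PDDocumentNameDictionary names = document.getDocumentCatalog().getNames();
    if (names == null) {
        return false;
    }
    PDEmbeddedFilesNameTreeNode embeddedFiles = names.getEmbeddedFiles();
    if (embeddedFiles == null) {
        return false;
    }
    // getNames() throws IOException, may return null, and covers only entries stored
    // directly on this node; entries can also live in child nodes, so any kids
    // are conservatively treated as "has embedded files"
    boolean hasDirectEntries = embeddedFiles.getNames() != null && !embeddedFiles.getNames().isEmpty();
    boolean hasKids = embeddedFiles.getKids() != null && !embeddedFiles.getKids().isEmpty();
    return hasDirectEntries || hasKids;
}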
private boolean processPdfContent(byte[] content) {
// 安全地处理PDF内容
// ...
return true; // 处理成功
}
}
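The header check in step 4 can be complemented by a cheap structural test before the file is handed to a full parser. The sketch below is a heuristic only, written with JDK classes: it checks that the bytes start with `%PDF-` and that an `%%EOF` marker appears within the last 1024 bytes. `PdfSniffer` and `looksLikePdf` are hypothetical names, and the 1024-byte window is an assumption chosen for illustration. Passing this check does not make a file safe.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical pre-check: rejects obvious non-PDF uploads before full parsing.
public class PdfSniffer {

    private static final byte[] HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);
    private static final int TAIL_WINDOW = 1024; // assumption: search window for %%EOF

    public static boolean looksLikePdf(byte[] content) {
        if (content == null || content.length < HEADER.length + EOF_MARKER.length) {
            return false;
        }
        // The file must start with the header bytes
        for (int i = 0; i < HEADER.length; i++) {
            if (content[i] != HEADER[i]) {
                return false;
            }
        }
        // The end-of-file marker must appear near the end
        int from = Math.max(0, content.length - TAIL_WINDOW);
        return indexOf(content, EOF_MARKER, from) >= 0;
    }

    /** Returns the first index at or after 'from' where needle occurs in data, or -1. */
    static int indexOf(byte[] data, byte[] needle, int from) {
        for (int i = from; i <= data.length - needle.length; i++) {
            boolean match = true;
            for (int j = 0; j < needle.length; j++) {
                if (data[i + j] != needle[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return i;
            }
        }
        return -1;
    }
}
```

Working on bytes avoids the charset question altogether, and the check costs almost nothing compared with parsing, so it is a reasonable first filter for truncated or mislabeled uploads.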
9.6 Layout and Pagination Problems
Problem: content breaks across pages incorrectly or pagination is poor
Solution:
public void createDocumentWithProperPagination(String outputPath) throws IOException, DocumentException {
Document document = new Document(PageSize.A4, 50, 50, 50, 50);
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(outputPath));
// 添加分页事件监听器
writer.setPageEvent(new PaginationHandler());
document.open();
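// Fonts: the built-in Times Roman has no CJK glyphs, so the Chinese text below
// needs a CJK font (STSong-Light requires the itext-asian dependency)
BaseFont cjkFont = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
Font normalFont = new Font(cjkFont, 12);
Font headingFont = new Font(cjkFont, 16, Font.BOLD);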
// 添加标题
Paragraph title = new Paragraph("文档标题", headingFont);
title.setAlignment(Element.ALIGN_CENTER);
title.setSpacingAfter(20);
document.add(title);
// 添加内容段落
for (int i = 1; i <= 5; i++) {
Paragraph heading = new Paragraph("章节 " + i, headingFont);
heading.setSpacingBefore(20);
heading.setSpacingAfter(10);
// Keep the heading's own lines on one page (this alone does not tie the heading to the paragraph that follows)
heading.setKeepTogether(true);
document.add(heading);
// 添加段落
for (int j = 1; j <= 3; j++) {
Paragraph para = new Paragraph("这是第" + i + "章节的第" + j + "个段落。" +
"这是示例文本,用于展示分页效果。这是示例文本,用于展示分页效果。" +
"这是示例文本,用于展示分页效果。", normalFont);
para.setAlignment(Element.ALIGN_JUSTIFIED);
para.setSpacingAfter(10);
document.add(para);
}
// 添加表格(确保表格不会被拆分到两页)
if (i == 3) {
PdfPTable table = new PdfPTable(3);
table.setWidthPercentage(100);
table.setKeepTogether(true); // 保持表格不被分页
// 添加表头
table.addCell(new PdfPCell(new Phrase("列 1", headingFont)));
table.addCell(new PdfPCell(new Phrase("列 2", headingFont)));
table.addCell(new PdfPCell(new Phrase("列 3", headingFont)));
// 添加表格数据
for (int k = 1; k <= 5; k++) {
table.addCell("数据 " + k + "-1");
table.addCell("数据 " + k + "-2");
table.addCell("数据 " + k + "-3");
}
table.setSpacingBefore(15);
table.setSpacingAfter(15);
document.add(table);
}
}
document.close();
}
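// Page event handler that draws the page header and the page number
private static class PaginationHandler extends PdfPageEventHelper {
    private final BaseFont font;

    PaginationHandler() {
        try {
            // Created once instead of on every page; BaseFont.createFont throws checked
            // exceptions that onEndPage cannot declare. The default Helvetica has no CJK
            // glyphs, so a CJK font is used (needs the itext-asian dependency).
            font = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
        } catch (DocumentException | IOException e) {
            throw new IllegalStateException("Cannot load the header/footer font", e);
        }
    }

    @Override
    public void onEndPage(PdfWriter writer, Document document) {
        PdfContentByte cb = writer.getDirectContent();
        // Horizontal center of the text area
        float x = (document.left() + document.right()) / 2;
        cb.beginText();
        cb.setFontAndSize(font, 10);
        // Page number, centered in the footer
        String pageText = "第 " + writer.getPageNumber() + " 页";
        cb.showTextAligned(PdfContentByte.ALIGN_CENTER, pageText, x, document.bottom() - 20, 0);
        // Header, if needed
        String headerText = "文档标题";
        cb.showTextAligned(PdfContentByte.ALIGN_CENTER, headerText, x, document.top() + 10, 0);
        cb.endText();
    }
}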
9.7 PDF Conversion Problems
Problem: how to convert HTML to PDF
Solution:
public byte[] convertHtmlToPdf(String htmlContent) throws IOException, DocumentException {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// 使用iText的XMLWorkerHelper
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, baos);
document.open();
// 转换HTML为PDF
XMLWorkerHelper.getInstance().parseXHtml(writer, document,
new ByteArrayInputStream(htmlContent.getBytes(StandardCharsets.UTF_8)));
document.close();
return baos.toByteArray();
}
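// HTML to PDF with fuller CSS support (using Flying Saucer)
@Service
public class HtmlToPdfService {
// addFont and createPDF also throw DocumentException (its package depends on the Flying Saucer version)
public byte[] convertHtmlToPdfWithCss(String htmlContent, String baseUrl) throws IOException, DocumentException {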
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
try {
// 准备HTML内容
String xHtml = convertToXhtml(htmlContent);
// 创建渲染器
ITextRenderer renderer = new ITextRenderer();
// 设置字体解析器(支持中文)
renderer.getFontResolver().addFont("fonts/simsun.ttc",
BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
// 设置基础URL,用于解析相对路径的资源(CSS、图片等)
if (baseUrl != null) {
renderer.setDocumentFromString(xHtml, baseUrl);
} else {
renderer.setDocumentFromString(xHtml);
}
// 布局文档
renderer.layout();
// 渲染PDF
renderer.createPDF(outputStream);
return outputStream.toByteArray();
} finally {
outputStream.close();
}
}
private String convertToXhtml(String html) {
// 转换普通HTML为XHTML
Tidy tidy = new Tidy();
tidy.setInputEncoding("UTF-8");
tidy.setOutputEncoding("UTF-8");
tidy.setXHTML(true);
ByteArrayInputStream inputStream = new ByteArrayInputStream(
html.getBytes(StandardCharsets.UTF_8));
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
tidy.parse(inputStream, outputStream);
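// ByteArrayOutputStream.toString(Charset) needs Java 10+; this form also works on Java 8
return new String(outputStream.toByteArray(), StandardCharsets.UTF_8);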
}
}
Problem: how to convert a PDF to images
Solution:
public List<BufferedImage> convertPdfToImages(byte[] pdfData, int dpi) throws IOException {
List<BufferedImage> images = new ArrayList<>();
// 加载PDF文档
PDDocument document = PDDocument.load(new ByteArrayInputStream(pdfData));
try {
// 创建PDF渲染器
PDFRenderer renderer = new PDFRenderer(document);
// 逐页转换为图片
for (int i = 0; i < document.getNumberOfPages(); i++) {
// 渲染图片(RGB模式,指定DPI)
BufferedImage image = renderer.renderImageWithDPI(i, dpi, ImageType.RGB);
images.add(image);
}
} finally {
document.close();
}
return images;
}
// 保存为图片文件
public void savePdfPagesAsImages(byte[] pdfData, String outputDir, String format)
throws IOException {
List<BufferedImage> images = convertPdfToImages(pdfData, 300);
// 确保输出目录存在
File dir = new File(outputDir);
if (!dir.exists()) {
dir.mkdirs();
}
// 保存每一页为单独的图片文件
for (int i = 0; i < images.size(); i++) {
BufferedImage image = images.get(i);
File outputFile = new File(dir, "page_" + (i + 1) + "." + format);
ImageIO.write(image, format, outputFile);
}
}
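Because `convertPdfToImages` keeps every rendered page in a list, memory use grows with both page count and DPI. A rough estimate can be made with simple arithmetic, sketched below. The assumptions are that a rendered RGB page is backed by 4 bytes per pixel and that pixel dimensions are the page size in points scaled by DPI/72 and rounded down; `RenderMemoryEstimate` is a hypothetical helper, not part of PDFBox.

```java
// Hypothetical helper: estimates the memory needed to hold rendered pages.
public class RenderMemoryEstimate {

    /** Pixel length for a side given in points (1 pt = 1/72 inch), at least 1 pixel. */
    public static int pixels(float points, float dpi) {
        return (int) Math.max(Math.floor(points * dpi / 72f), 1);
    }

    /** Approximate bytes needed for one rendered page. */
    public static long bytesPerPage(float widthPt, float heightPt, float dpi, int bytesPerPixel) {
        return (long) pixels(widthPt, dpi) * pixels(heightPt, dpi) * bytesPerPixel;
    }
}
```

For a 595 x 842 pt page (approximately A4) at 300 DPI this gives 2479 x 3508 pixels, or about 35 MB per page, so a 100-page document held entirely in memory would need around 3.5 GB. Writing each page to disk right after rendering it, rather than collecting all pages first, keeps the footprint near that of a single page.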
9.8 Spring Boot Integration Problems
Problem: how to handle PDF generation failures gracefully in Spring Boot
Solution:
@RestController
@RequestMapping("/api/pdf")
public class PdfController {
private final PdfService pdfService;
private final Logger logger = LoggerFactory.getLogger(PdfController.class);
@Autowired
public PdfController(PdfService pdfService) {
this.pdfService = pdfService;
}
@GetMapping("/generate/{id}")
public ResponseEntity<?> generatePdf(@PathVariable Long id) {
try {
// 尝试生成PDF
byte[] pdfData = pdfService.generatePdf(id);
// 设置响应头
HttpHeaders headers = new HttpHeaders();
headers.setContentType(MediaType.APPLICATION_PDF);
String filename = "document-" + id + ".pdf";
headers.setContentDispositionFormData("attachment", filename);
return new ResponseEntity<>(pdfData, headers, HttpStatus.OK);
} catch (ResourceNotFoundException e) {
// 资源不存在
logger.warn("尝试生成不存在的资源PDF: {}", id);
return ResponseEntity.status(HttpStatus.NOT_FOUND)
.body(new ErrorResponse("资源不存在", e.getMessage()));
} catch (PdfGenerationException e) {
// PDF生成错误
logger.error("PDF生成失败: {}", e.getMessage(), e);
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
.body(new ErrorResponse("PDF生成错误", e.getMessage()));
} catch (Exception e) {
// 未预期的错误
logger.error("处理PDF请求时发生未预期错误", e);
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
.body(new ErrorResponse("系统错误", "处理请求时发生错误"));
}
}
// 自定义异常
public static class PdfGenerationException extends RuntimeException {
public PdfGenerationException(String message) {
super(message);
}
public PdfGenerationException(String message, Throwable cause) {
super(message, cause);
}
}
// 错误响应DTO
@Data
@AllArgsConstructor
public static class ErrorResponse {
private String error;
private String message;
}
}
// 全局异常处理器
@ControllerAdvice
public class GlobalExceptionHandler {
private final Logger logger = LoggerFactory.getLogger(GlobalExceptionHandler.class);
@ExceptionHandler(PdfController.PdfGenerationException.class)
public ResponseEntity<PdfController.ErrorResponse> handlePdfGenerationException(
PdfController.PdfGenerationException e) {
logger.error("PDF生成异常被全局处理器捕获", e);
return ResponseEntity
.status(HttpStatus.INTERNAL_SERVER_ERROR)
.body(new PdfController.ErrorResponse("PDF生成错误", e.getMessage()));
}
}
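The solutions above should cover most of the common problems encountered when processing PDFs in a SpringBoot application. More complex cases may call for a combination of several techniques, or for a dedicated PDF processing service.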