A Complete Guide to PDF Processing in Spring Boot

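1. PDF Basics

1.1 What Is PDF

PDF (Portable Document Format) is a file format developed by Adobe that presents a document with a fixed layout, independent of application software, hardware, and operating system. Its main characteristics are:

Cross-platform compatibility: a PDF can be viewed on any operating system and looks the same everywhere
Document completeness: the file carries the text, images, tables, fonts, and other elements of the document
Compactness: several compression techniques are supported
Security: passwords and permissions can be set
Interactivity: hyperlinks, forms, multimedia, and other interactive elements are supported

1.2 PDF File Structure

A PDF file has four main parts:

Header: identifies the PDF version
Body: holds the document content (text, images, and so on) as objects
Cross-reference table: an index of where each object is located in the file
Trailer: records where the cross-reference table starts and references key objects

Knowing these basics makes it easier to understand how PDF libraries work.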
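
The four parts listed above can be seen directly in the bytes of a file. The following is a minimal sketch that uses only the JDK; it is not a real parser. The class and method names (PdfStructureSketch, headerVersion, startXrefOffset) are made up for illustration, and the sample in main is a hand-written fragment rather than a valid PDF. It assumes the header starts at byte 0 and that the last startxref keyword is followed by a decimal offset.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch: reads the header version and the startxref offset from raw PDF bytes. */
final class PdfStructureSketch {

    /** Returns the version from a header such as "%PDF-1.7", or null if the header is missing. */
    static String headerVersion(byte[] pdf) {
        // ISO-8859-1 maps every byte to one char, so binary content does not break decoding
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        if (!text.startsWith("%PDF-")) {
            return null;
        }
        int end = 5;
        while (end < text.length() && !Character.isWhitespace(text.charAt(end))) {
            end++;
        }
        return text.substring(5, end);
    }

    /** Returns the offset written after the last "startxref" keyword, or -1 if none is found. */
    static long startXrefOffset(byte[] pdf) {
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int idx = text.lastIndexOf("startxref");
        if (idx < 0) {
            return -1;
        }
        String tail = text.substring(idx + "startxref".length()).trim();
        int end = 0;
        while (end < tail.length() && Character.isDigit(tail.charAt(end))) {
            end++;
        }
        return end == 0 ? -1 : Long.parseLong(tail.substring(0, end));
    }

    public static void main(String[] args) {
        // Hand-written fragment for illustration only; "..." stands in for real xref entries
        String body = "%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n";
        String sample = body + "xref\n...\ntrailer\n<< /Root 1 0 R >>\nstartxref\n"
                + body.length() + "\n%%EOF\n";
        byte[] bytes = sample.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("version   = " + headerVersion(bytes));
        System.out.println("startxref = " + startXrefOffset(bytes));
    }
}
```

A library such as PDFBox or iText does the same lookup first: it finds the trailer at the end of the file, follows the offset to the cross-reference data, and only then loads individual objects.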

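2. PDF Libraries for Spring Boot

Several popular Java libraries can be used to process PDF files in a Spring Boot application:

2.1 iText

iText is a powerful PDF library for generating, modifying, and analyzing PDF documents.

Maven dependency:

<!-- iText 5 core library (the com.itextpdf.text API used in the iText examples of this guide) -->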
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.5.13.3</version>
</dependency>

<!-- iText 7 (newer major version; its packages and API differ from iText 5) -->
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itext7-core</artifactId>
    <version>7.2.5</version>
    <type>pom</type>
</dependency>

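Note: iText is available under the open-source AGPL license and under a commercial license. Check the license requirements before using it in a commercial project.

2.2 Apache PDFBox

Apache PDFBox is an open-source PDF library from the Apache Software Foundation. It is full-featured and uses the more permissive Apache License 2.0.

Maven dependency: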

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.27</version>
</dependency>

2.3 OpenPDF

OpenPDF is an open-source fork of iText 4.2.0 and is distributed under the more flexible LGPL/MPL licenses.

Maven dependency:

<dependency>
    <groupId>com.github.librepdf</groupId>
    <artifactId>openpdf</artifactId>
    <version>1.3.30</version>
</dependency>

2.4 JasperReports

JasperReports is a high-level reporting library that can export to PDF. It is particularly suited to complex reports.

Maven dependency:

<dependency>
    <groupId>net.sf.jasperreports</groupId>
    <artifactId>jasperreports</artifactId>
    <version>6.20.0</version>
</dependency>

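2.5 Which Library to Choose?

iText: the most full-featured option, suited to highly customized output, but mind the license restrictions
Apache PDFBox: open-source friendly, suited to basic PDF operations, with a relatively low-level API
OpenPDF: suited to projects that want an iText-style API but are concerned about licensing
JasperReports: the best fit for complex report generation, with a steeper learning curve

3. Generating PDF Files

3.1 Generating PDFs with iText

Basic document generation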

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;

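import java.io.FileOutputStream;
import java.io.IOException;

import org.springframework.stereotype.Service;

@Service
public class PdfGenerationService {

    public void generateSimplePdf(String outputPath) throws DocumentException, IOException {
        // Create the document
        Document document = new Document();
        
        // Bind a PdfWriter to the document and the output file
        PdfWriter.getInstance(document, new FileOutputStream(outputPath));
        
        // Open the document before adding content
        document.open();
        
        // Add content. The default font (Helvetica) has no CJK glyphs,
        // so this first sample sticks to Latin text.
        document.add(new Paragraph("Hello World! This is my first PDF generated with iText."));
        document.add(new Paragraph("Generated at: " + new java.util.Date()));
        
        // Closing the document also flushes and closes the output stream
        document.close();
        
        System.out.println("PDF created: " + outputPath);
    }
}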

Adding a table

import com.itextpdf.text.BaseColor;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Element;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.Phrase;
import com.itextpdf.text.pdf.PdfPCell;
import com.itextpdf.text.pdf.PdfPTable;
import com.itextpdf.text.pdf.PdfWriter;

import java.io.FileOutputStream;
import java.io.IOException;

import org.springframework.stereotype.Service;

@Service
public class PdfTableService {

    public void generatePdfWithTable(String outputPath) throws DocumentException, IOException {
        Document document = new Document();
        PdfWriter.getInstance(document, new FileOutputStream(outputPath));
        
        document.open();
        
        // Title paragraph
        document.add(new Paragraph("User Data"));
        
        // Create a table with 3 columns
        PdfPTable table = new PdfPTable(3);
        // Use the full available page width
        table.setWidthPercentage(100);
        // Relative column widths
        table.setWidths(new float[]{2, 5, 3});
        
        // Header row
        addTableHeader(table);
        
        // Data rows
        addTableRows(table);
        
        // Add the table to the document
        document.add(table);
        
        document.close();
    }
    
    private void addTableHeader(PdfPTable table) {
        // One styled cell is reused: addCell copies it, so only the phrase changes
        PdfPCell header = new PdfPCell();
        header.setBackgroundColor(BaseColor.LIGHT_GRAY);
        header.setBorderWidth(2);
        header.setHorizontalAlignment(Element.ALIGN_CENTER);
        
        header.setPhrase(new Phrase("ID"));
        table.addCell(header);
        
        header.setPhrase(new Phrase("Name"));
        table.addCell(header);
        
        header.setPhrase(new Phrase("Role"));
        table.addCell(header);
    }
    
    private void addTableRows(PdfPTable table) {
        // Row 1
        table.addCell("1001");
        table.addCell("Zhang San");
        table.addCell("Administrator");
        
        // Row 2
        table.addCell("1002");
        table.addCell("Li Si");
        table.addCell("User");
        
        // Row 3
        table.addCell("1003");
        table.addCell("Wang Wu");
        table.addCell("Reviewer");
    }
}

Adding an image

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;

import java.io.FileOutputStream;
import java.io.IOException;

import org.springframework.stereotype.Service;

@Service
public class PdfImageService {

    public void generatePdfWithImage(String outputPath, String imagePath) 
            throws DocumentException, IOException {
        Document document = new Document();
        PdfWriter.getInstance(document, new FileOutputStream(outputPath));
        
        document.open();
        
        // Add text
        document.add(new Paragraph("A PDF document with an image"));
        
        // Load the image
        Image image = Image.getInstance(imagePath);
        // Scale it to fit inside a 400 x 300 point box, keeping the aspect ratio
        image.scaleToFit(400, 300);
        // Center the image horizontally
        image.setAlignment(Image.MIDDLE);
        
        document.add(image);
        
        // Caption below the image
        document.add(new Paragraph("Figure 1: Sample image"));
        
        document.close();
    }
}

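3.2 Generating PDFs with Apache PDFBox

Basic document generation

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

import java.io.File;
import java.io.IOException;

import org.springframework.stereotype.Service;

@Service
public class PdfBoxService {

    public void generateSimplePdf(String outputPath) throws IOException {
        // try-with-resources closes the document even if an exception is thrown
        try (PDDocument document = new PDDocument()) {
            // Add a blank page
            PDPage page = new PDPage();
            document.addPage(page);
            
            // Load a font with Chinese glyphs; the built-in standard fonts cannot
            // render the Chinese sample text below. This path works when running
            // from the project root; in a packaged jar, load the font from the
            // classpath as an InputStream instead.
            PDType0Font font = PDType0Font.load(document, 
                    new File("src/main/resources/fonts/SimSun.ttf"));
            
            // The content stream receives the drawing and text operations
            try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
                // Begin a text block and select font and size
                contentStream.beginText();
                contentStream.setFont(font, 12);
                
                // Position is measured in points from the bottom-left corner of the page
                contentStream.newLineAtOffset(25, 700);
                
                // Chinese sample text, kept to demonstrate the CJK font
                contentStream.showText("Hello World! 这是我用PDFBox生成的PDF文档。");
                
                // Move down one line (offsets are relative to the previous position)
                contentStream.newLineAtOffset(0, -15);
                contentStream.showText("Generated at: " + new java.util.Date());
                
                // End the text block
                contentStream.endText();
            }
            
            // Save after the content stream has been closed
            document.save(outputPath);
        }
        
        System.out.println("PDFBox created PDF: " + outputPath);
    }
}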

Adding a table (more involved in PDFBox, which has no table API)

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType0Font;

import java.io.File;
import java.io.IOException;

import org.springframework.stereotype.Service;

@Service
public class PdfBoxTableService {

    public void generatePdfWithTable(String outputPath) throws IOException {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage();
            document.addPage(page);
            
            // Font with Chinese glyphs, needed for the sample data below
            PDType0Font font = PDType0Font.load(document, 
                    new File("src/main/resources/fonts/SimSun.ttf"));
            
            // Table content (Chinese sample data: ID / name / role)
            String[][] content = {
                    {"ID", "姓名", "角色"},
                    {"1001", "张三", "管理员"},
                    {"1002", "李四", "用户"},
                    {"1003", "王五", "审核员"}
            };
            
            // Table position and size
            float margin = 50;
            float y = page.getMediaBox().getHeight() - margin;
            float tableWidth = page.getMediaBox().getWidth() - 2 * margin;
            
            try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
                // Draw the title
                contentStream.beginText();
                contentStream.setFont(font, 16);
                contentStream.newLineAtOffset(margin, y);
                contentStream.showText("用户数据表");
                contentStream.endText();
                
                y -= 30;
                
                // Grid geometry: equal-width columns, fixed row height
                final int rows = content.length;
                final int cols = content[0].length;
                final float rowHeight = 20f;
                final float tableHeight = rowHeight * rows;
                final float colWidth = tableWidth / (float) cols;
                
                // Outer frame
                contentStream.setLineWidth(1f);
                contentStream.addRect(margin, y - tableHeight, tableWidth, tableHeight);
                contentStream.stroke();
                
                // Horizontal lines (the bottom edge is already part of the frame)
                for (int i = 0; i < rows; i++) {
                    contentStream.moveTo(margin, y - i * rowHeight);
                    contentStream.lineTo(margin + tableWidth, y - i * rowHeight);
                }
                contentStream.stroke();
                
                // Vertical lines
                for (int i = 0; i <= cols; i++) {
                    contentStream.moveTo(margin + i * colWidth, y);
                    contentStream.lineTo(margin + i * colWidth, y - tableHeight);
                }
                contentStream.stroke();
                
                // Cell text; the header row uses the same font as the data rows
                contentStream.setFont(font, 12);
                float textx = margin + 5;
                float texty = y - 15;
                
                for (int i = 0; i < rows; i++) {
                    for (int j = 0; j < cols; j++) {
                        contentStream.beginText();
                        contentStream.newLineAtOffset(textx + j * colWidth, texty - i * rowHeight);
                        contentStream.showText(content[i][j]);
                        contentStream.endText();
                    }
                }
            }
            
            document.save(outputPath);
        }
    }
}

3.3 Generating PDFs with OpenPDF

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;

import java.io.FileOutputStream;
import java.io.IOException;

import org.springframework.stereotype.Service;

@Service
public class OpenPdfService {

    public void generateSimplePdf(String outputPath) throws DocumentException, IOException {
        // Create the document
        Document document = new Document();
        
        // Bind a writer to the document and the output file
        PdfWriter.getInstance(document, new FileOutputStream(outputPath));
        
        // Open the document
        document.open();
        
        // Add content (Latin text only; the default font has no CJK glyphs)
        document.add(new Paragraph("Hello World! This document was generated with OpenPDF."));
        document.add(new Paragraph("Generated at: " + new java.util.Date()));
        
        // Close the document
        document.close();
        
        System.out.println("OpenPDF created PDF: " + outputPath);
    }
}

3.4 Generating and Downloading a PDF from a Spring Boot Controller

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.ByteArrayOutputStream;

@RestController
@RequestMapping("/api/pdf")
public class PdfController {

    @Autowired
    private PdfGenerationService pdfService;
    
    @GetMapping("/download")
    public ResponseEntity<byte[]> downloadPdf() {
        try {
            // Write to an in-memory stream instead of a file
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            
            // Generate the PDF into the stream
            pdfService.generatePdf(baos);
            
            // Set the HTTP headers
            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_PDF);
            
            // Content-Disposition: attachment makes the browser download the file
            String filename = "generated_document.pdf";
            headers.setContentDispositionFormData("attachment", filename);
            
            // Return the PDF bytes
            return new ResponseEntity<>(baos.toByteArray(), headers, HttpStatus.OK);
            
        } catch (Exception e) {
            e.printStackTrace();
            return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
        }
    }
}

The corresponding service class:

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;

import java.io.IOException;
import java.io.OutputStream;

import org.springframework.stereotype.Service;

@Service
public class PdfGenerationService {

    public void generatePdf(OutputStream outputStream) throws DocumentException, IOException {
        // Create the document
        Document document = new Document();
        
        // Write to the stream supplied by the caller
        PdfWriter.getInstance(document, outputStream);
        
        // Open the document
        document.open();
        
        // Add content
        document.add(new Paragraph("Dynamically generated PDF content"));
        document.add(new Paragraph("This PDF was generated by a Spring Boot application"));
        document.add(new Paragraph("Generated at: " + new java.util.Date()));
        
        // Close the document so the PDF is completely written
        document.close();
    }
}
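
The controller above uses an ASCII file name. When the download name contains non-ASCII characters (Chinese, for example), one common approach is the filename* form described in RFC 6266 and RFC 5987, which percent-encodes the UTF-8 bytes of the name. The following JDK-only sketch shows how such a header value could be built. The class and method names are hypothetical, and this is not how the controller above builds its header.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch: builds a Content-Disposition value for a non-ASCII file name. */
final class ContentDispositionSketch {

    /** Returns "attachment; filename*=UTF-8''" followed by the percent-encoded name. */
    static String attachmentHeader(String filename) {
        try {
            String encoded = URLEncoder.encode(filename, StandardCharsets.UTF_8.name())
                    .replace("+", "%20")   // URLEncoder writes spaces as '+'
                    .replace("*", "%2A");  // URLEncoder leaves '*' as-is; encode it as well
            return "attachment; filename*=UTF-8''" + encoded;
        } catch (UnsupportedEncodingException e) {
            // Cannot happen: every JVM supports UTF-8
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(attachmentHeader("monthly report.pdf"));
        // \u62A5\u544A is a two-character Chinese word meaning "report"
        System.out.println(attachmentHeader("\u62A5\u544A.pdf"));
    }
}
```

The resulting string could be set with headers.set(HttpHeaders.CONTENT_DISPOSITION, value) in place of the setContentDispositionFormData call.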

4. Reading and Parsing PDFs

4.1 Reading PDF Text with PDFBox

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.File;
import java.io.IOException;

import org.springframework.stereotype.Service;

@Service
public class PdfReaderService {

    public String extractTextFromPdf(String pdfPath) throws IOException {
        // Load the document; try-with-resources guarantees it is closed
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            // Create the text extractor
            PDFTextStripper stripper = new PDFTextStripper();
            
            // Extract the text of all pages
            return stripper.getText(document);
        }
    }
    
    // Extract the text of a single page (pageNumber is 1-based)
    public String extractTextFromPage(String pdfPath, int pageNumber) throws IOException {
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            PDFTextStripper stripper = new PDFTextStripper();
            
            // Start and end page are both inclusive
            stripper.setStartPage(pageNumber);
            stripper.setEndPage(pageNumber);
            
            return stripper.getText(document);
        }
    }
}

4.2 Parsing PDFs with iText

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.springframework.stereotype.Service;

@Service
public class ITextPdfReaderService {

    public String extractTextFromPdf(String pdfPath) throws IOException {
        PdfReader reader = new PdfReader(pdfPath);
        StringBuilder textBuilder = new StringBuilder();
        
        try {
            int pages = reader.getNumberOfPages();
            
            // iText page numbers start at 1
            for (int i = 1; i <= pages; i++) {
                TextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                String pageText = PdfTextExtractor.getTextFromPage(reader, i, strategy);
                textBuilder.append(pageText).append("\n");
            }
            
            return textBuilder.toString();
        } finally {
            reader.close();
        }
    }
    
    // Read the PDF metadata; entries missing from the file are stored as null
    public Map<String, String> getPdfMetadata(String pdfPath) throws IOException {
        PdfReader reader = new PdfReader(pdfPath);
        Map<String, String> metadata = new HashMap<>();
        
        try {
            // getInfo() returns the document information dictionary as a map
            Map<String, String> info = reader.getInfo();
            metadata.put("Title", info.get("Title"));
            metadata.put("Author", info.get("Author"));
            metadata.put("Subject", info.get("Subject"));
            metadata.put("Keywords", info.get("Keywords"));
            metadata.put("Creator", info.get("Creator"));
            metadata.put("Producer", info.get("Producer"));
            metadata.put("Creation Date", info.get("CreationDate"));
            metadata.put("Modification Date", info.get("ModDate"));
            metadata.put("Page Count", String.valueOf(reader.getNumberOfPages()));
            
            return metadata;
        } finally {
            reader.close();
        }
    }
}

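4.3 Extracting Table Data from a PDF

Extracting tables from a PDF is a hard problem, because a PDF stores positioned text and lines rather than table structure. A dedicated library such as Tabula-Java can help: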

<dependency>
    <groupId>technology.tabula</groupId>
    <artifactId>tabula</artifactId>
    <version>1.0.5</version>
</dependency>

import org.apache.pdfbox.pdmodel.PDDocument;

import technology.tabula.ObjectExtractor;
import technology.tabula.Page;
import technology.tabula.PageIterator;
import technology.tabula.Table;
import technology.tabula.extractors.SpreadsheetExtractionAlgorithm;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.springframework.stereotype.Service;

@Service
public class PdfTableExtractorService {

    public List<String[][]> extractTablesFromPdf(String pdfPath) throws IOException {
        List<String[][]> allTables = new ArrayList<>();
        
        // Tabula works on a PDFBox document
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            // Create the ObjectExtractor
            ObjectExtractor extractor = new ObjectExtractor(document);
            
            // Iterate over all pages
            PageIterator iterator = extractor.extract();
            
            // Extraction algorithm for tables drawn with ruling lines
            SpreadsheetExtractionAlgorithm sea = new SpreadsheetExtractionAlgorithm();
            
            // Process each page
            while (iterator.hasNext()) {
                Page page = iterator.next();
                
                // Extract the tables on this page
                List<Table> tables = sea.extract(page);
                
                // Convert each table to a String matrix
                for (Table table : tables) {
                    int rowCount = table.getRowCount();
                    int colCount = table.getColCount();
                    
                    String[][] tableData = new String[rowCount][colCount];
                    
                    // Copy the cell text; short rows are padded with empty strings
                    for (int i = 0; i < rowCount; i++) {
                        for (int j = 0; j < colCount; j++) {
                            if (j < table.getRows().get(i).size()) {
                                tableData[i][j] = table.getRows().get(i).get(j).getText();
                            } else {
                                tableData[i][j] = "";
                            }
                        }
                    }
                    
                    allTables.add(tableData);
                }
            }
        }
        
        return allTables;
    }
    
    // Print table data (for testing)
    public void printTableData(String[][] tableData) {
        for (String[] row : tableData) {
            for (String cell : row) {
                System.out.print(cell + " | ");
            }
            System.out.println();
        }
    }
}

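5. Modifying Existing PDF Files

Modifying an existing PDF is a common requirement. Typical operations include adding new content, changing text, deleting pages, and merging documents.

5.1 Adding Watermarks and Page Numbers

Adding a watermark with iText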

import com.itextpdf.text.BaseColor;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Element;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfGState;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;

import java.io.FileOutputStream;
import java.io.IOException;

import org.springframework.stereotype.Service;

@Service
public class PdfWatermarkService {

    public void addWatermark(String inputPath, String outputPath, String watermarkText) 
            throws IOException, DocumentException {
        // Open the existing PDF and attach a stamper that writes the modified copy
        PdfReader reader = new PdfReader(inputPath);
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(outputPath));
        
        try {
            // STSong-Light is a CJK font; it needs the itext-asian artifact on the classpath
            BaseFont baseFont = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
            
            // Graphics state with 30% fill opacity, shared by all pages
            PdfGState gs = new PdfGState();
            gs.setFillOpacity(0.3f);
            
            // Number of pages in the PDF
            int pageCount = reader.getNumberOfPages();
            
            // Add the watermark to every page (iText pages are 1-based)
            for (int i = 1; i <= pageCount; i++) {
                // Page size
                Rectangle pageRect = reader.getPageSize(i);
                float width = pageRect.getWidth();
                float height = pageRect.getHeight();
                
                // Canvas that is drawn underneath the existing page content
                PdfContentByte under = stamper.getUnderContent(i);
                
                // Save the graphics state first so the opacity is undone by restoreState
                under.saveState();
                under.setGState(gs);
                
                // Font and fill color
                under.setFontAndSize(baseFont, 30);
                under.setColorFill(BaseColor.GRAY);
                
                // Draw the text centered on the page, rotated by 45 degrees
                under.beginText();
                under.showTextAligned(Element.ALIGN_CENTER, watermarkText, 
                                     width / 2, height / 2, 45);
                under.endText();
                
                // Restore the graphics state
                under.restoreState();
            }
        } finally {
            // Closing the stamper writes the output file
            stamper.close();
            reader.close();
        }
        
        System.out.println("Watermark added: " + outputPath);
    }
}

Adding page numbers with PDFBox

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDPageContentStream.AppendMode;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

import java.io.File;
import java.io.IOException;

import org.springframework.stereotype.Service;

@Service
public class PdfPageNumberService {

    public void addPageNumbers(String inputPath, String outputPath) throws IOException {
        // Open the PDF document
        try (PDDocument document = PDDocument.load(new File(inputPath))) {
            // Number of pages
            int pageCount = document.getNumberOfPages();
            
            // Add a page number to every page (PDFBox pages are 0-based)
            for (int i = 0; i < pageCount; i++) {
                PDPage page = document.getPage(i);
                
                // Page width, used to center the text
                float pageWidth = page.getMediaBox().getWidth();
                
                // Helvetica only covers Latin characters, and showText throws for
                // glyphs it lacks, so the label is Latin text. For a Chinese label,
                // load a CJK-capable font with PDType0Font.load instead.
                String pageNumberText = "Page " + (i + 1) + " of " + pageCount;
                
                // AppendMode.APPEND adds the new content after the existing content
                try (PDPageContentStream contentStream = new PDPageContentStream(
                        document, page, AppendMode.APPEND, true, true)) {
                    contentStream.setFont(PDType1Font.HELVETICA, 10);
                    
                    // getStringWidth returns 1/1000 units; scale by the font size (10)
                    float textWidth = PDType1Font.HELVETICA.getStringWidth(pageNumberText) / 1000 * 10;
                    float xPosition = (pageWidth - textWidth) / 2;
                    
                    // Draw the text centered, 20 points above the bottom edge
                    contentStream.beginText();
                    contentStream.newLineAtOffset(xPosition, 20);
                    contentStream.showText(pageNumberText);
                    contentStream.endText();
                }
            }
            
            // Save the modified document
            document.save(outputPath);
        }
        
        System.out.println("Page numbers added: " + outputPath);
    }
}
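
The centering arithmetic in addPageNumbers may be easier to follow in isolation. PDFBox reports string widths in thousandths of a text-space unit, so the value is divided by 1000 and multiplied by the font size to obtain points. The following is a small JDK-only sketch of that calculation. The class and method names are hypothetical, and the 5000-unit width in main is an assumed figure, not a measured one.

```java
/** Illustrative sketch of the text-centering arithmetic used for page numbers. */
final class TextCenteringSketch {

    /** Converts a width in 1/1000 text-space units to points for the given font size. */
    static float widthInPoints(float widthUnits, float fontSize) {
        return widthUnits / 1000f * fontSize;
    }

    /** Returns the x coordinate at which text of the given width is horizontally centered. */
    static float centeredX(float pageWidth, float textWidthPoints) {
        return (pageWidth - textWidthPoints) / 2f;
    }

    public static void main(String[] args) {
        float pageWidth = 612f;    // width of a US Letter page in points
        float widthUnits = 5000f;  // assumed string width in 1/1000 units
        float textWidth = widthInPoints(widthUnits, 10f);
        System.out.println("text width = " + textWidth + " pt");
        System.out.println("x position = " + centeredX(pageWidth, textWidth) + " pt");
    }
}
```

The same two steps apply to any font and size. Only the width lookup depends on the library.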

5.2 Merging Multiple PDF Files

Merging PDFs with PDFBox

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.springframework.stereotype.Service;

@Service
public class PdfMergeService {

    public void mergePdfFiles(List<String> inputPaths, String outputPath) throws IOException {
        // Create the merge utility
        PDFMergerUtility merger = new PDFMergerUtility();
        
        // Set the destination file
        merger.setDestinationFileName(outputPath);
        
        // Add the source files in order
        for (String path : inputPaths) {
            merger.addSource(new File(path));
        }
        
        // Run the merge, buffering in main memory
        merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
        
        System.out.println("PDF merge finished: " + outputPath);
    }
    
    // Merge selected pages from several documents
    public void mergeWithPageRanges(String outputPath) throws IOException {
        // The source documents must stay open until the merged document has been
        // saved, because the added pages still reference their content.
        try (PDDocument mergedDocument = new PDDocument();
             PDDocument doc1 = PDDocument.load(new File("pdf1.pdf"));
             PDDocument doc2 = PDDocument.load(new File("pdf2.pdf"));
             PDDocument doc3 = PDDocument.load(new File("pdf3.pdf"))) {
            
            // From the first PDF, add only pages 1 and 3 (indexes are 0-based)
            mergedDocument.addPage(doc1.getPage(0));
            mergedDocument.addPage(doc1.getPage(2));
            
            // From the second PDF, add all pages
            for (int i = 0; i < doc2.getNumberOfPages(); i++) {
                mergedDocument.addPage(doc2.getPage(i));
            }
            
            // From the third PDF, add only the last page
            mergedDocument.addPage(doc3.getPage(doc3.getNumberOfPages() - 1));
            
            // Save the merged document while the sources are still open
            mergedDocument.save(outputPath);
        }
        
        System.out.println("Selective page merge finished: " + outputPath);
    }
}

Merging PDFs with iText

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;

import org.springframework.stereotype.Service;

@Service
public class ITextPdfMergeService {

    public void mergePdfFiles(List<String> inputPaths, String outputPath) 
            throws IOException, DocumentException {
        // Create a document object
        Document document = new Document();
        
        // PdfCopy copies pages from the readers into the output file
        PdfCopy copy = new PdfCopy(document, new FileOutputStream(outputPath));
        
        // Open the document
        document.open();
        
        try {
            // Process each input PDF in order
            for (String path : inputPaths) {
                PdfReader reader = new PdfReader(path);
                
                // Number of pages in this input
                int pageCount = reader.getNumberOfPages();
                
                // Copy every page (iText pages are 1-based)
                for (int i = 1; i <= pageCount; i++) {
                    PdfImportedPage page = copy.getImportedPage(reader, i);
                    copy.addPage(page);
                }
                
                // Close the reader
                reader.close();
            }
        } finally {
            // Closing the document finishes the output file
            if (document.isOpen()) {
                document.close();
            }
        }
        
        System.out.println("iText PDF merge finished: " + outputPath);
    }
}

5.3 Splitting PDF Files

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.springframework.stereotype.Service;

@Service
public class PdfSplitService {

    // Split a PDF into single-page files
    public void splitPdfToSinglePages(String inputPath, String outputFolder) throws IOException {
        // Load the PDF
        try (PDDocument document = PDDocument.load(new File(inputPath))) {
            // The default Splitter produces one document per page
            Splitter splitter = new Splitter();
            List<PDDocument> pages = splitter.split(document);
            
            // Input file name without its extension
            String fileNameWithoutExt = new File(inputPath).getName();
            if (fileNameWithoutExt.contains(".")) {
                fileNameWithoutExt = fileNameWithoutExt.substring(0, 
                                    fileNameWithoutExt.lastIndexOf('.'));
            }
            
            // Create the output directory if it does not exist
            File outputDir = new File(outputFolder);
            if (!outputDir.exists()) {
                outputDir.mkdirs();
            }
            
            // Save each page as its own file
            int pageNumber = 1;
            for (PDDocument pd : pages) {
                String outputPath = outputFolder + File.separator 
                                   + fileNameWithoutExt + "_page_" + pageNumber + ".pdf";
                pd.save(outputPath);
                pd.close();
                pageNumber++;
            }
            
            System.out.println("PDF split finished, " + (pageNumber - 1) 
                               + " pages saved to: " + outputFolder);
        }
    }
    
    // Extract a page range (1-based, inclusive) into a new PDF
    public void splitPdfByPageRange(String inputPath, String outputPath, int startPage, int endPage) 
            throws IOException {
        // Load the PDF; the source stays open until the new document is saved
        try (PDDocument document = PDDocument.load(new File(inputPath));
             PDDocument newDocument = new PDDocument()) {
            // Validate the page range
            int totalPages = document.getNumberOfPages();
            if (startPage < 1 || endPage > totalPages || startPage > endPage) {
                throw new IllegalArgumentException("Invalid page range: " + startPage + "-" + endPage 
                                                  + ", the document has " + totalPages + " pages");
            }
            
            // Copy the requested pages
            for (int i = startPage; i <= endPage; i++) {
                // PDFBox page indexes start at 0
                newDocument.addPage(document.getPage(i - 1));
            }
            
            // Save the new document
            newDocument.save(outputPath);
            
            System.out.println("Extracted pages " + startPage + " to " + endPage 
                               + " and saved to: " + outputPath);
        }
    }
}
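
The range check and the shift from 1-based page numbers to 0-based indexes in splitPdfByPageRange can also be tried without a PDF library. The following is a minimal JDK-only sketch of that logic. The class and method names are hypothetical and are not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: validates a 1-based inclusive page range and maps it to 0-based indexes. */
final class PageRangeSketch {

    static List<Integer> toZeroBasedIndexes(int startPage, int endPage, int totalPages) {
        // Same validation rule as in splitPdfByPageRange
        if (startPage < 1 || endPage > totalPages || startPage > endPage) {
            throw new IllegalArgumentException("Invalid page range: " + startPage + "-" + endPage
                    + ", the document has " + totalPages + " pages");
        }
        List<Integer> indexes = new ArrayList<>();
        for (int page = startPage; page <= endPage; page++) {
            indexes.add(page - 1); // page 1 is index 0
        }
        return indexes;
    }

    public static void main(String[] args) {
        // Pages 2 to 4 of a 10-page document
        System.out.println(toZeroBasedIndexes(2, 4, 10));
    }
}
```

Keeping the conversion in one place avoids off-by-one mistakes when the same range is used with libraries that count pages differently, such as iText (1-based) and PDFBox (0-based).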

5.4 加密与解密PDF

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;

import java.io.File;
import java.io.IOException;

@Service
public class PdfEncryptionService {

    // Encrypt a PDF file
    public void encryptPdf(String inputPath, String outputPath,
                         String userPassword, String ownerPassword) throws IOException {
        // Load the PDF; try-with-resources closes it on every path
        try (PDDocument document = PDDocument.load(new File(inputPath))) {
            // Configure access permissions
            AccessPermission accessPermission = new AccessPermission();

            // Disallow printing
            accessPermission.setCanPrint(false);

            // Disallow modifying the content
            accessPermission.setCanModify(false);

            // Disallow copying (extracting) the content
            accessPermission.setCanExtractContent(false);

            // Disallow adding or modifying annotations
            accessPermission.setCanModifyAnnotations(false);

            // Create the protection policy; note the argument order:
            // owner password first, then user password, then permissions
            StandardProtectionPolicy policy = new StandardProtectionPolicy(
                    ownerPassword, userPassword, accessPermission);

            // Set the encryption key length (128 bits)
            policy.setEncryptionKeyLength(128);

            // Apply the encryption
            document.protect(policy);

            // Save the encrypted document
            document.save(outputPath);

            System.out.println("PDF encrypted: " + outputPath);
        }
    }
    
    // Decrypt a PDF file (the password must be supplied)
    public void decryptPdf(String inputPath, String outputPath, String password) throws IOException {
        // Load the encrypted PDF with the password
        try (PDDocument document = PDDocument.load(new File(inputPath), password)) {
            // Check whether the document is encrypted
            if (document.isEncrypted()) {
                // Remove the encryption when the document is saved
                document.setAllSecurityToBeRemoved(true);

                // Save the decrypted document
                document.save(outputPath);

                System.out.println("PDF decrypted: " + outputPath);
            } else {
                System.out.println("The PDF is not encrypted; nothing to decrypt");
                // Save an unchanged copy so the output file always exists
                document.save(outputPath);
            }
        }
    }
}

5.5 Removing and Reordering Pages

import org.apache.pdfbox.pdmodel.PDDocument;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

@Service
public class PdfPageManipulationService {

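    // Remove the given pages (page numbers are 1-based)
    public void removePages(String inputPath, String outputPath, List<Integer> pagesToRemove)
            throws IOException {
        // Load the PDF; try-with-resources closes it on every path
        try (PDDocument document = PDDocument.load(new File(inputPath))) {
            // Sort a copy in descending order: removing from the back keeps the
            // remaining indices stable, and the caller's list is left unmodified
            List<Integer> sortedPages = new ArrayList<>(pagesToRemove);
            Collections.sort(sortedPages, Collections.reverseOrder());

            Integer previous = null;
            for (Integer pageNum : sortedPages) {
                // Skip duplicates so the same position is not removed twice
                if (pageNum.equals(previous)) {
                    continue;
                }
                previous = pageNum;

                // Validate the page number
                if (pageNum >= 1 && pageNum <= document.getNumberOfPages()) {
                    // PDFBox page indices are zero-based
                    document.removePage(pageNum - 1);
                } else {
                    System.out.println("Warning: page " + pageNum + " is out of range and was ignored");
                }
            }

            // Save the modified document
            document.save(outputPath);

            System.out.println("Pages removed; saved to: " + outputPath);
        }
    }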
    
    // Reorder pages; newOrder lists the 1-based source page for each output position
    public void reorderPages(String inputPath, String outputPath, int[] newOrder) throws IOException {
        // Both documents are closed automatically, the new one first
        try (PDDocument document = PDDocument.load(new File(inputPath));
             PDDocument newDocument = new PDDocument()) {
            // Number of pages in the source document
            int pageCount = document.getNumberOfPages();

            // The new order must name exactly one source page per output page
            if (newOrder.length != pageCount) {
                throw new IllegalArgumentException(
                    "Length of the new order (" + newOrder.length + ") does not match the page count (" + pageCount + ")");
            }

            // Add the pages in the new order
            for (int i = 0; i < newOrder.length; i++) {
                // Convert the 1-based page number to a 0-based index
                int oldIndex = newOrder[i] - 1;

                // Validate the index
                if (oldIndex < 0 || oldIndex >= pageCount) {
                    throw new IllegalArgumentException("The new order contains an invalid page number: " + (oldIndex + 1));
                }

                // importPage copies the page into the new document
                newDocument.importPage(document.getPage(oldIndex));
            }

            // Save the new document while the source document is still open
            newDocument.save(outputPath);

            System.out.println("Pages reordered; saved to: " + outputPath);
        }
    }
}
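
The reorderPages method checks the array length and the range of each entry, but a new order such as {1, 1, 3} would still pass and silently drop page 2. A stricter check can require the array to be a true permutation of 1..pageCount. The sketch below shows one way to do that in plain Java; PageOrderValidator and requirePermutation are hypothetical names introduced only for illustration.

```java
// Hypothetical helper: verifies that newOrder contains every page number
// from 1 to pageCount exactly once.
final class PageOrderValidator {

    static void requirePermutation(int[] newOrder, int pageCount) {
        if (newOrder.length != pageCount) {
            throw new IllegalArgumentException("Expected " + pageCount
                    + " entries but got " + newOrder.length);
        }
        boolean[] seen = new boolean[pageCount];
        for (int page : newOrder) {
            if (page < 1 || page > pageCount) {
                throw new IllegalArgumentException("Invalid page number: " + page);
            }
            if (seen[page - 1]) {
                throw new IllegalArgumentException("Page listed more than once: " + page);
            }
            seen[page - 1] = true;
        }
    }
}
```

Calling such a check before any page is copied keeps the output document from being written with missing or repeated pages.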

5.6 Filling PDF Forms

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.Map;

@Service
public class PdfFormFillingService {

    // Fill in a PDF form
    public void fillPdfForm(String templatePath, String outputPath,
                           Map<String, String> formData) throws IOException {
        // Load the PDF template; try-with-resources closes it on every path
        try (PDDocument document = PDDocument.load(new File(templatePath))) {
            // Get the form
            PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();

            if (acroForm != null) {
                // Walk through the supplied data and fill the matching fields
                for (Map.Entry<String, String> entry : formData.entrySet()) {
                    String fieldName = entry.getKey();
                    String fieldValue = entry.getValue();

                    // Look up the form field by its fully qualified name
                    PDField field = acroForm.getField(fieldName);

                    if (field != null) {
                        // Set the field value
                        field.setValue(fieldValue);
                    } else {
                        System.out.println("Warning: form field '" + fieldName + "' not found");
                    }
                }

                // Ask PDF viewers to regenerate the field appearances (optional).
                // This does not make the form read-only; use acroForm.flatten() for that.
                acroForm.setNeedAppearances(true);

                // Save the filled form
                document.save(outputPath);

                System.out.println("PDF form filled; saved to: " + outputPath);
            } else {
                System.out.println("Error: the PDF document does not contain a form");
            }
        }
    }
    
    // List the form fields of a PDF document (for debugging)
    public void listFormFields(String pdfPath) throws IOException {
        // Load the PDF; try-with-resources closes it on every path
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            // Get the form
            PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();

            if (acroForm != null) {
                // getFields() returns the top-level fields; nested child fields are not included
                List<PDField> fields = acroForm.getFields();

                System.out.println("The PDF document contains " + fields.size() + " top-level form fields:");

                // Print the name and type of each field
                for (PDField field : fields) {
                    System.out.println("Field name: " + field.getFullyQualifiedName() +
                                      ", type: " + field.getClass().getSimpleName());
                }
            } else {
                System.out.println("The PDF document does not contain a form");
            }
        }
    }
}

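6. PDF Handling in Web Applications

PDF handling is a common requirement in SpringBoot web applications: generating and downloading PDFs, previewing them online, and uploading and parsing them. This section shows how to implement each of these in a SpringBoot application.

6.1 Implementing PDF Download

Web applications often need to offer PDFs for download, for example reports or certificates. The following is a basic PDF download controller: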

@RestController
@RequestMapping("/api/pdf")
public class PdfDownloadController {

    private final PdfGenerationService pdfService;
    
    @Autowired
    public PdfDownloadController(PdfGenerationService pdfService) {
        this.pdfService = pdfService;
    }
    
    /**
     * Generate and download a simple PDF
     */
    @GetMapping("/download/simple")
    public ResponseEntity<byte[]> downloadSimplePdf() {
        try {
            byte[] pdfBytes = pdfService.generateSimplePdf();
            
            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_PDF);
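            // setContentDispositionFormData would emit a "form-data" disposition; build a real attachment header instead
            headers.setContentDisposition(ContentDisposition.attachment().filename("document.pdf").build());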
            headers.setCacheControl("must-revalidate, post-check=0, pre-check=0");
            
            return new ResponseEntity<>(pdfBytes, headers, HttpStatus.OK);
        } catch (Exception e) {
            return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
        }
    }
    
    /**
     * Generate and download a report by ID
     */
    @GetMapping("/download/report/{id}")
    public ResponseEntity<byte[]> downloadReportById(@PathVariable("id") Long id) {
        try {
            byte[] pdfBytes = pdfService.generateReportPdf(id);
            if (pdfBytes == null) {
                return new ResponseEntity<>(HttpStatus.NOT_FOUND);
            }
            
            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_PDF);
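            // setContentDispositionFormData would emit a "form-data" disposition; build a real attachment header instead
            headers.setContentDisposition(
                    ContentDisposition.attachment().filename("report-" + id + ".pdf").build());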
            headers.setCacheControl("must-revalidate, post-check=0, pre-check=0");
            
            return new ResponseEntity<>(pdfBytes, headers, HttpStatus.OK);
        } catch (Exception e) {
            return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
        }
    }
}

The following example generates a certificate from a template and offers it for download:

@Controller
@RequestMapping("/api/pdf/templates")
public class PdfTemplateController {

    private final PdfTemplateService templateService;
    // The original snippet used userService without declaring it;
    // UserService is assumed to be an application service that looks up users
    private final UserService userService;

    @Autowired
    public PdfTemplateController(PdfTemplateService templateService, UserService userService) {
        this.templateService = templateService;
        this.userService = userService;
    }

    /**
     * Generate a certificate and download it
     */
    @GetMapping("/certificate/{userId}")
    public ResponseEntity<byte[]> generateCertificate(@PathVariable Long userId) {
        try {
            // Look up the user
            UserDto user = userService.findById(userId);
            if (user == null) {
                return new ResponseEntity<>(HttpStatus.NOT_FOUND);
            }

            // Generate the certificate
            byte[] certificateBytes = templateService.generateCertificateFromTemplate(user);

            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_PDF);
            // Build an "attachment" disposition rather than a "form-data" one
            headers.setContentDisposition(ContentDisposition.attachment()
                    .filename("certificate-" + user.getUsername() + ".pdf").build());

            return new ResponseEntity<>(certificateBytes, headers, HttpStatus.OK);
        } catch (Exception e) {
            return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
        }
    }
}
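
File names such as "certificate-" + username may contain spaces or non-ASCII characters, which a plain filename parameter does not carry reliably. One common approach is the filename* parameter with percent-encoded UTF-8 (RFC 5987 / RFC 6266). The sketch below builds such a header value with the JDK only; the class ContentDispositionValue and its method are hypothetical, and a real application may prefer the charset-aware filename method of Spring's ContentDisposition builder.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a Content-Disposition value whose file name
// is percent-encoded as UTF-8 in the filename* parameter.
final class ContentDispositionValue {

    static String attachment(String filename) {
        // URLEncoder targets HTML forms: it writes spaces as '+' and leaves '*'
        // unescaped, so both are corrected for use in an HTTP header parameter
        String encoded = URLEncoder.encode(filename, StandardCharsets.UTF_8)
                .replace("+", "%20")
                .replace("*", "%2A");
        return "attachment; filename*=UTF-8''" + encoded;
    }
}
```

The resulting string can be set with headers.add("Content-Disposition", value) in a controller like the ones above.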

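6.2 Implementing Online PDF Preview

For a better user experience it is sometimes preferable to preview a PDF directly in the browser instead of downloading it. The following controller implements an online PDF preview: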

@Controller
@RequestMapping("/pdf/view")
public class PdfViewController {

    private final DocumentService documentService;
    
    @Autowired
    public PdfViewController(DocumentService documentService) {
        this.documentService = documentService;
    }
    
    /**
     * Return the PDF preview page
     */
    @GetMapping("/{documentId}")
    public String viewPdf(@PathVariable Long documentId, Model model) {
        // Check that the document exists
        if (!documentService.exists(documentId)) {
            return "error/404";
        }
        
        model.addAttribute("documentId", documentId);
        model.addAttribute("documentName", documentService.getName(documentId));
        return "pdf/viewer";  // name of the Thymeleaf template
    }
    
    /**
     * Endpoint that serves the PDF data
     */
    @GetMapping("/data/{documentId}")
    @ResponseBody
    public ResponseEntity<byte[]> getPdfData(@PathVariable Long documentId) {
        try {
            byte[] pdfData = documentService.getPdfContent(documentId);
            if (pdfData == null) {
                return new ResponseEntity<>(HttpStatus.NOT_FOUND);
            }
            
            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_PDF);
            // Use inline rather than attachment so the browser displays the PDF instead of downloading it
            headers.setContentDisposition(
                    ContentDisposition.inline().filename("document-" + documentId + ".pdf").build());
            
            return new ResponseEntity<>(pdfData, headers, HttpStatus.OK);
        } catch (Exception e) {
            return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
        }
    }
}

Example Thymeleaf template (viewer.html):

<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">
<head>
    <title th:text="${documentName} + ' - PDF Preview'"></title>
    <meta charset="UTF-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.11.338/pdf.min.js"></script>
    <style>
        body { margin: 0; padding: 0; }
        #pdf-container { width: 100%; height: 100vh; overflow: auto; }
        #pdf-viewer { width: 100%; height: 100%; }
    </style>
</head>
<body>
    <div id="pdf-container">
        <iframe id="pdf-viewer" th:src="@{'/pdf/view/data/' + ${documentId}}" frameborder="0"></iframe>
    </div>
</body>
</html>

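6.3 Uploading and Processing PDF Files

Many applications need to let users upload PDF files and then process them. The following controller handles PDF uploads: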

@RestController
@RequestMapping("/api/pdf/upload")
public class PdfUploadController {

    private final PdfAnalysisService pdfAnalysisService;
    
    @Autowired
    public PdfUploadController(PdfAnalysisService pdfAnalysisService) {
        this.pdfAnalysisService = pdfAnalysisService;
    }
    
    /**
     * Upload a PDF and analyze its content
     */
    @PostMapping("/analyze")
    public ResponseEntity<?> uploadAndAnalyzePdf(@RequestParam("file") MultipartFile file) {
        // Reject empty uploads
        if (file.isEmpty()) {
            return ResponseEntity.badRequest().body("Please select a PDF file to upload");
        }

        // Check the declared content type; the constant-first comparison is
        // null-safe because getContentType() may return null
        if (!"application/pdf".equals(file.getContentType())) {
            return ResponseEntity.badRequest().body("Only PDF files can be uploaded");
        }

        try {
            // Read the file content
            byte[] pdfBytes = file.getBytes();

            // Analyze the PDF content
            PdfAnalysisResult result = pdfAnalysisService.analyzePdf(pdfBytes);

            return ResponseEntity.ok(result);
        } catch (IOException e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                    .body("PDF processing failed: " + e.getMessage());
        }
    }
    
    /**
     * Upload a PDF file and save it
     */
    @PostMapping("/save")
    public ResponseEntity<?> uploadAndSavePdf(
            @RequestParam("file") MultipartFile file,
            @RequestParam("title") String title,
            @RequestParam("description") String description) {

        if (file.isEmpty()) {
            return ResponseEntity.badRequest().body("Please select a PDF file to upload");
        }

        try {
            // Save the PDF file and obtain the document ID
            Long documentId = pdfAnalysisService.savePdfDocument(
                    file.getBytes(),
                    file.getOriginalFilename(),
                    title,
                    description);

            Map<String, Object> response = new HashMap<>();
            response.put("success", true);
            response.put("documentId", documentId);
            response.put("message", "PDF file uploaded successfully");

            return ResponseEntity.ok(response);
        } catch (Exception e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                    .body("PDF upload failed: " + e.getMessage());
        }
    }
}
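
The content type checked above is supplied by the client and can be wrong or forged. Because a PDF file starts with a header that identifies the PDF version (section 1.2), the uploaded bytes can additionally be checked for the "%PDF-" signature. The following is a minimal sketch under that assumption; PdfSniffer and looksLikePdf are hypothetical names, and the check is deliberately strict (signature at offset 0), so files with leading junk bytes would be rejected.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: checks whether the data starts with the PDF header signature.
final class PdfSniffer {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (data[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In the controller this could be called on file.getBytes() right after the content-type check, returning a 400 response when it yields false.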

6.4 Batch PDF Processing

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

@RestController
@RequestMapping("/api/pdf/batch")
public class PdfBatchController {

    @Autowired
    private PdfBatchService pdfBatchService;
    
    // Temporary directory used to hold the uploaded files
    private final Path tempDir = Paths.get(System.getProperty("java.io.tmpdir"));

    @PostMapping("/merge")
    public ResponseEntity<?> mergePdfs(@RequestParam("files") MultipartFile[] files) {

        Map<String, Object> response = new HashMap<>();
        List<String> tempFilePaths = new ArrayList<>();
        Path outputPath = null;

        try {
            // At least two files are needed for a merge
            if (files.length < 2) {
                response.put("success", false);
                response.put("message", "Please provide at least two PDF files to merge");
                return ResponseEntity.badRequest().body(response);
            }

            // Save the uploaded files to the temporary directory
            for (MultipartFile file : files) {
                // Validate the declared content type (null-safe comparison)
                if (!"application/pdf".equals(file.getContentType())) {
                    response.put("success", false);
                    response.put("message", "File " + file.getOriginalFilename() + " is not a PDF");
                    return ResponseEntity.badRequest().body(response);
                }

                // Create a unique file name
                String tempFileName = UUID.randomUUID().toString() + ".pdf";
                Path tempFile = tempDir.resolve(tempFileName);

                // Register the path first so it is cleaned up even if the transfer fails
                tempFilePaths.add(tempFile.toString());
                file.transferTo(tempFile.toFile());
            }

            // Name of the merged PDF file
            String outputFileName = "merged_" + UUID.randomUUID().toString() + ".pdf";
            outputPath = tempDir.resolve(outputFileName);

            // Perform the merge
            pdfBatchService.mergePdfFiles(tempFilePaths, outputPath.toString());

            // Read the merged file
            byte[] mergedPdfBytes = Files.readAllBytes(outputPath);

            // Build the response
            response.put("success", true);
            response.put("message", "PDF files merged successfully");
            response.put("mergedFileSize", mergedPdfBytes.length);

            // Return the merged file as a Base64 string (suitable for small files only)
            response.put("mergedPdfBase64",
                    java.util.Base64.getEncoder().encodeToString(mergedPdfBytes));

            return ResponseEntity.ok(response);

        } catch (Exception e) {
            response.put("success", false);
            response.put("message", "Error while merging PDFs: " + e.getMessage());
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(response);
        } finally {
            // Clean up the temporary files on every path, including early returns
            List<String> toDelete = new ArrayList<>(tempFilePaths);
            if (outputPath != null) {
                toDelete.add(outputPath.toString());
            }
            for (String path : toDelete) {
                try {
                    Files.deleteIfExists(Paths.get(path));
                } catch (IOException ex) {
                    // Ignore cleanup errors
                }
            }
        }
    }
}

The batch processing service:

import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

@Service
public class PdfBatchService {

    // Thread pool used for asynchronous processing
    private final ExecutorService executor = Executors.newFixedThreadPool(5);

    // Merge PDF files
    public void mergePdfFiles(List<String> inputPaths, String outputPath) throws IOException {
        PDFMergerUtility merger = new PDFMergerUtility();
        merger.setDestinationFileName(outputPath);

        for (String path : inputPaths) {
            merger.addSource(new File(path));
        }

        // State the memory strategy explicitly instead of passing null
        merger.mergeDocuments(org.apache.pdfbox.io.MemoryUsageSetting.setupMainMemoryOnly());
    }
    
    // Process several PDFs asynchronously
    public CompletableFuture<List<String>> processMultiplePdfsAsync(
            List<String> inputPaths, String outputDir, String operation) {

        // Create one CompletableFuture per input file
        List<CompletableFuture<String>> futures = inputPaths.stream()
            .map(path -> CompletableFuture.supplyAsync(() -> {
                try {
                    // Derive the output path, then process the PDF according to the operation
                    String outputPath = outputDir + File.separator
                                      + new File(path).getName().replace(".pdf", "_processed.pdf");

                    switch (operation) {
                        case "watermark":
                            addWatermark(path, outputPath, "Confidential");
                            break;
                        case "compress":
                            compressPdf(path, outputPath);
                            break;
                        case "encrypt":
                            // Placeholder passwords for the example only; load real ones from configuration
                            encryptPdf(path, outputPath, "password123", "owner456");
                            break;
                        // Add further operations here...
                        default:
                            throw new IllegalArgumentException("Unsupported operation: " + operation);
                    }

                    return outputPath;
                } catch (Exception e) {
                    throw new RuntimeException("Failed to process file: " + path, e);
                }
            }, executor))
            .collect(Collectors.toList());

        // Combine all futures
        CompletableFuture<Void> allOf = CompletableFuture.allOf(
                futures.toArray(new CompletableFuture[0]));

        // Collect the results once everything has completed
        return allOf.thenApply(v ->
            futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList())
        );
    }
    
    // Add a watermark
    private void addWatermark(String inputPath, String outputPath, String watermarkText)
            throws IOException {
        // Implementation similar to PdfWatermarkService
        // ...
    }

    // Compress a PDF
    private void compressPdf(String inputPath, String outputPath) throws IOException {
        // Implement PDF compression here
        // ...
    }

    // Encrypt a PDF
    private void encryptPdf(String inputPath, String outputPath,
                          String userPassword, String ownerPassword) throws IOException {
        // Implementation similar to PdfEncryptionService
        // ...
    }
}
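
The fan-out pattern used in processMultiplePdfsAsync (one supplyAsync per input, allOf to wait, join to collect) does not depend on PDF processing and can be tried out in isolation. The sketch below uses only the JDK; FanOut and mapAsync are hypothetical names, and the task passed in stands in for the real per-file operation.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: runs a task for every input on the given executor
// and returns a future holding the results in input order.
final class FanOut {

    static <T, R> CompletableFuture<List<R>> mapAsync(
            List<T> inputs, Function<T, R> task, ExecutorService executor) {

        // Start one asynchronous computation per input
        List<CompletableFuture<R>> futures = inputs.stream()
                .map(input -> CompletableFuture.supplyAsync(() -> task.apply(input), executor))
                .collect(Collectors.toList());

        // allOf completes when every future has completed; join() then returns
        // immediately, or rethrows the failure of a failed task
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> futures.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }
}
```

If any task throws, the combined future completes exceptionally, so callers should handle that case (for example with exceptionally or handle) rather than assume a full result list. The executor should also be shut down when the application stops.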

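7. PDF Security and Advanced Features

Security is an important consideration when handling PDF documents, especially those containing sensitive information. Advanced features such as watermarks and digital signatures add further practical capabilities.

7.1 PDF Encryption and Permission Control

With iText 7, encrypting a PDF document and setting its permissions is straightforward: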

@Service
public class PdfSecurityService {

    /**
     * Create a password-protected PDF
     */
    public byte[] createEncryptedPdf(String content, String userPassword, String ownerPassword) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        // The encryption options must be configured before the writer is created
        WriterProperties writerProperties = new WriterProperties();

        // User password (needed to open the document) and owner password (needed to change permissions)
        writerProperties.setStandardEncryption(
                userPassword.getBytes(StandardCharsets.UTF_8),
                ownerPassword.getBytes(StandardCharsets.UTF_8),
                EncryptionConstants.ALLOW_PRINTING, // allow printing
                EncryptionConstants.ENCRYPTION_AES_128 // AES with a 128-bit key
        );

        // Pass the properties to the constructor; they cannot be applied to an existing writer
        PdfWriter writer = new PdfWriter(baos, writerProperties);
        PdfDocument pdf = new PdfDocument(writer);

        // Create the document content
        Document document = new Document(pdf);
        document.add(new Paragraph(content));
        document.close();

        return baos.toByteArray();
    }
    
    /**
     * Create a PDF with permission control
     */
    public byte[] createPermissionControlledPdf(String content, String ownerPassword, int permissions) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        WriterProperties writerProperties = new WriterProperties();
        writerProperties.setStandardEncryption(
                null, // no user password
                ownerPassword.getBytes(StandardCharsets.UTF_8),
                permissions, // custom permissions
                EncryptionConstants.ENCRYPTION_AES_256 // AES with a 256-bit key
        );

        PdfWriter writer = new PdfWriter(baos, writerProperties);
        PdfDocument pdf = new PdfDocument(writer);
        Document document = new Document(pdf);
        document.add(new Paragraph(content));
        document.close();

        return baos.toByteArray();
    }
    
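    /**
     * Check whether a PDF is encrypted
     */
    public boolean isPdfEncrypted(byte[] pdfData) throws IOException {
        PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
        // The reader knows the encryption state only after the document has been opened.
        // A file protected by a user password cannot be opened without that password,
        // so the PdfDocument constructor throws for such files instead of returning.
        try (PdfDocument pdf = new PdfDocument(reader)) {
            return reader.isEncrypted();
        }
    }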
}

7.2 Adding Watermarks

Watermarks are a common advanced PDF feature, used to protect a document's copyright or to mark its status:

@Service
public class PdfWatermarkService {

    /**
     * Add a text watermark
     */
    public byte[] addTextWatermark(byte[] originalPdf, String watermarkText) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        PdfReader reader = new PdfReader(new ByteArrayInputStream(originalPdf));
        PdfWriter writer = new PdfWriter(baos);
        PdfDocument pdf = new PdfDocument(reader, writer);

        // Number of pages
        int numberOfPages = pdf.getNumberOfPages();

        // Create the semi-transparent watermark text
        PdfFont font = PdfFontFactory.createFont(StandardFonts.HELVETICA);
        Paragraph watermark = new Paragraph(watermarkText)
                .setFont(font)
                .setFontSize(30)
                .setFontColor(new DeviceRgb(0.5f, 0.5f, 0.5f), 0.3f); // gray at 30% opacity

        // Add the watermark to every page
        for (int i = 1; i <= numberOfPages; i++) {
            PdfPage page = pdf.getPage(i);
            Rectangle pageSize = page.getPageSize();
            float x = (pageSize.getLeft() + pageSize.getRight()) / 2;
            float y = (pageSize.getBottom() + pageSize.getTop()) / 2;

            // Create a canvas on the page for drawing the watermark
            PdfCanvas canvas = new PdfCanvas(page);
            canvas.saveState();

            // Set the fill color and opacity (the rotation is applied by showTextAligned below)
            canvas.setFillColor(new DeviceRgb(0.5f, 0.5f, 0.5f));
            canvas.setExtGState(new PdfExtGState().setFillOpacity(0.3f));

            // Draw the text centered on the page, rotated by 30 degrees (PI / 6 radians).
            // iText 7.2 uses the two-argument Canvas constructor; the variant that also
            // takes the PdfDocument belongs to earlier 7.x releases.
            Canvas watermarkCanvas = new Canvas(canvas, pageSize);
            watermarkCanvas.showTextAligned(watermark, x, y, i, TextAlignment.CENTER, VerticalAlignment.MIDDLE, (float) Math.PI / 6);
            watermarkCanvas.close();

            canvas.restoreState();
        }

        pdf.close();
        return baos.toByteArray();
    }
    
    /**
     * Add an image watermark
     */
    public byte[] addImageWatermark(byte[] originalPdf, byte[] imageData) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        PdfReader reader = new PdfReader(new ByteArrayInputStream(originalPdf));
        PdfWriter writer = new PdfWriter(baos);
        PdfDocument pdf = new PdfDocument(reader, writer);

        // Convert the raw image bytes to ImageData
        ImageData imageDataObj = ImageDataFactory.create(imageData);

        // Watermark size in points and opacity. The original code set these on a layout
        // Image object but then drew the raw ImageData, so neither took effect.
        float width = 100;
        float height = 100;
        PdfExtGState transparency = new PdfExtGState().setFillOpacity(0.3f);

        // Add the watermark to every page
        int numberOfPages = pdf.getNumberOfPages();
        for (int i = 1; i <= numberOfPages; i++) {
            PdfPage page = pdf.getPage(i);
            Rectangle pageSize = page.getPageSize();

            // Compute the image position (centered)
            float x = (pageSize.getLeft() + pageSize.getRight()) / 2 - width / 2;
            float y = (pageSize.getBottom() + pageSize.getTop()) / 2 - height / 2;

            // Draw the image scaled into the target rectangle with the opacity applied
            // (addImageFittedIntoRectangle is the iText 7.2 name for this operation)
            PdfCanvas canvas = new PdfCanvas(page);
            canvas.saveState();
            canvas.setExtGState(transparency);
            canvas.addImageFittedIntoRectangle(imageDataObj, new Rectangle(x, y, width, height), false);
            canvas.restoreState();
        }

        pdf.close();
        return baos.toByteArray();
    }
}

7.3 Digital Signatures

Digital signatures are an important means of ensuring the integrity and authenticity of a PDF document:

@Service
public class PdfSignatureService {

    /**
     * Sign a PDF with a digital certificate
     */
    public byte[] signPdf(byte[] pdfData, KeyStore keystore, String alias, char[] password) throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
        PdfSigner signer = new PdfSigner(reader, baos, new StampingProperties());

        // Configure the signature appearance
        PdfSignatureAppearance appearance = signer.getSignatureAppearance();
        appearance.setReason("Certify document authenticity")
                 .setLocation("Beijing")
                 .setSignatureCreator("PDF signing system")
                 .setReuseAppearance(false);

        // Place the signature rectangle in the lower-left corner of the last page.
        // The page count comes from the signer's document; the iText 7 PdfReader
        // has no getNumberOfPages() method.
        Rectangle rect = new Rectangle(36, 36, 200, 50);
        appearance.setPageRect(rect)
                 .setPageNumber(signer.getDocument().getNumberOfPages());

        // Load the private key and certificate chain
        PrivateKey pk = (PrivateKey) keystore.getKey(alias, password);
        Certificate[] chain = keystore.getCertificateChain(alias);

        // Create the signature and digest implementations
        IExternalSignature pks = new PrivateKeySignature(pk, DigestAlgorithms.SHA256, null);
        IExternalDigest digest = new BouncyCastleDigest();

        // Sign the document
        signer.signDetached(digest, pks, chain, null, null, null, 0, PdfSigner.CryptoStandard.CMS);

        return baos.toByteArray();
    }
    
    /**
     * Create a self-signed certificate (for testing only)
     */
    public KeyStore createSelfSignedCertificate() throws Exception {
        // Generate the key pair
        KeyPairGenerator keyGen = KeyPairGenerator.getInstance("RSA");
        keyGen.initialize(2048);
        KeyPair keyPair = keyGen.generateKeyPair();

        // Create the self-signed certificate
        X509Certificate cert = generateSelfSignedCertificate(keyPair);

        // Create the KeyStore and store the key with its certificate
        KeyStore keyStore = KeyStore.getInstance(KeyStore.getDefaultType());
        keyStore.load(null, null);
        keyStore.setKeyEntry("pdf-signer", keyPair.getPrivate(), "password".toCharArray(),
                new java.security.cert.Certificate[]{cert});

        return keyStore;
    }
    
    /**
     * Generate a self-signed certificate
     */
    private X509Certificate generateSelfSignedCertificate(KeyPair keyPair) throws Exception {
        // Use the Bouncy Castle implementation
        Security.addProvider(new org.bouncycastle.jce.provider.BouncyCastleProvider());

        // Current time
        long now = System.currentTimeMillis();

        // The certificate is valid for one year. The multiplication must be done in long
        // arithmetic: as an int expression, 365 * 24 * 60 * 60 * 1000 overflows and
        // yields a validity of only about 17 days.
        Date startDate = new Date(now);
        Date endDate = new Date(now + 365L * 24 * 60 * 60 * 1000);

        // Certificate serial number
        BigInteger serialNumber = BigInteger.valueOf(now);

        // Certificate subject (also used as the issuer, since it is self-signed)
        X500Name subject = new X500Name("CN=PDF Signer, O=Example Organization, L=Beijing, C=CN");

        // Certificate builder
        X509v3CertificateBuilder builder = new JcaX509v3CertificateBuilder(
                subject,
                serialNumber,
                startDate,
                endDate,
                subject,
                keyPair.getPublic()
        );

        // Signature algorithm
        ContentSigner contentSigner = new JcaContentSignerBuilder("SHA256WithRSAEncryption")
                .setProvider("BC").build(keyPair.getPrivate());

        // Build the certificate
        X509CertificateHolder holder = builder.build(contentSigner);
        X509Certificate cert = new JcaX509CertificateConverter()
                .setProvider("BC").getCertificate(holder);

        return cert;
    }
    
    /**
     * 验证PDF签名
     */
    public List<SignatureVerificationResult> verifyPdfSignatures(byte[] pdfData) throws IOException {
        List<SignatureVerificationResult> results = new ArrayList<>();
        
        PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
        PdfDocument pdf = new PdfDocument(reader);
        
        SignatureUtil signUtil = new SignatureUtil(pdf);
        List<String> sigNames = signUtil.getSignatureNames();
        
        for (String name : sigNames) {
            PdfPKCS7 pkcs7 = signUtil.readSignatureData(name);
            
            // 获取签名时间
            Calendar cal = pkcs7.getSignDate();
            
            // 获取签名信息
            String reason = pkcs7.getReason();
            String location = pkcs7.getLocation();
            
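            // Verify the signature
            boolean isSignatureValid = false;
            boolean coversWholeDocument = false;
            
            try {
                isSignatureValid = pkcs7.verifySignatureIntegrityAndAuthenticity();
                // true only if the signature covers the whole file; false means bytes were
                // appended after signing (for example a later signature or an edit)
                coversWholeDocument = signUtil.signatureCoversWholeDocument(name);
            } catch (Exception e) {
                // Verification failed: both flags stay false, so the result is reported
                // as invalid and modified. Log the exception in real code.
            }
            
            // Record the result (modified = the signature does not cover the whole document)
            SignatureVerificationResult result = new SignatureVerificationResult(
                    name, cal.getTime(), reason, location, isSignatureValid, !coversWholeDocument);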
            results.add(result);
        }
        
        pdf.close();
        return results;
    }
    
    // 签名验证结果类
    public static class SignatureVerificationResult {
        private String name;
        private Date date;
        private String reason;
        private String location;
        private boolean valid;
        private boolean modified;
        
        // Getters omitted for brevity
        
        public SignatureVerificationResult(String name, Date date, String reason, String location, 
                                           boolean valid, boolean modified) {
            this.name = name;
            this.date = date;
            this.reason = reason;
            this.location = location;
            this.valid = valid;
            this.modified = modified;
        }
    }
}
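
When computing a validity window such as the one-year period used in generateSelfSignedCertificate, the arithmetic has to run in long (or be delegated to java.time), because a product of int literals is evaluated in 32 bits before it is widened. The following is a minimal sketch of that pitfall; CertificateValidity and its method names are illustrative assumptions, not part of iText or Bouncy Castle.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Date;

// Illustrative helper; the class and method names are assumptions for this sketch.
final class CertificateValidity {

    // All operands are int, so the product wraps around before it is widened to long.
    static long oneYearMillisOverflowing() {
        return 365 * 24 * 60 * 60 * 1000;
    }

    // Making the first operand a long forces 64-bit arithmetic for the whole product.
    static long oneYearMillis() {
        return 365L * 24 * 60 * 60 * 1000;
    }

    // java.time avoids the hand-written multiplication altogether.
    static Date notAfter(Date notBefore, int days) {
        Instant end = notBefore.toInstant().plus(days, ChronoUnit.DAYS);
        return Date.from(end);
    }
}
```

The java.time variant is usually the easier one to read, and its result can be passed directly as the end date of the certificate builder.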

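7.4 PDF Forms and Interactive Features

PDF forms make a document interactive: users can fill in the fields and submit them. The service below uses the iText 7 form API to create a form, read submitted values, and fill a template: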

@Service
public class PdfFormService {

    /**
     * 创建包含表单的PDF
     */
    public byte[] createPdfWithForm() throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        
        PdfWriter writer = new PdfWriter(baos);
        PdfDocument pdf = new PdfDocument(writer);
        Document document = new Document(pdf);
        
        // 添加标题
        document.add(new Paragraph("用户注册表单").setFontSize(20).setBold());
        document.add(new Paragraph("请填写以下信息：").setFontSize(12));
        document.add(new Paragraph("\n"));
        
        // 创建表单
        PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
        
        // 姓名字段
        document.add(new Paragraph("姓名:"));
        Rectangle nameRect = new Rectangle(100, 700, 200, 20);
        PdfTextFormField nameField = PdfTextFormField.createText(pdf, nameRect, "name", "");
        form.addField(nameField);
        
        // 邮箱字段
        document.add(new Paragraph("邮箱:"));
        Rectangle emailRect = new Rectangle(100, 650, 200, 20);
        PdfTextFormField emailField = PdfTextFormField.createText(pdf, emailRect, "email", "");
        form.addField(emailField);
        
        // 性别单选按钮
        document.add(new Paragraph("性别:"));
        // 创建单选按钮组
        PdfButtonFormField genderGroup = PdfFormField.createRadioGroup(pdf, "gender", "");
        
        // 男性选项
        Rectangle maleRect = new Rectangle(100, 600, 20, 20);
        PdfFormField male = PdfFormField.createRadioButton(pdf, maleRect, genderGroup, "男");
        form.addField(male);
        document.add(new Paragraph("男").setFixedPosition(125, 600, 50));
        
        // 女性选项
        Rectangle femaleRect = new Rectangle(160, 600, 20, 20);
        PdfFormField female = PdfFormField.createRadioButton(pdf, femaleRect, genderGroup, "女");
        form.addField(female);
        document.add(new Paragraph("女").setFixedPosition(185, 600, 50));
        
        form.addField(genderGroup);
        
        // 兴趣复选框
        document.add(new Paragraph("兴趣爱好:"));
        
        // 阅读选项
        Rectangle readingRect = new Rectangle(100, 550, 20, 20);
        PdfFormField reading = PdfFormField.createCheckBox(pdf, readingRect, "reading", "Yes", PdfFormField.TYPE_CHECK);
        form.addField(reading);
        document.add(new Paragraph("阅读").setFixedPosition(125, 550, 50));
        
        // 旅行选项
        Rectangle travelRect = new Rectangle(180, 550, 20, 20);
        PdfFormField travel = PdfFormField.createCheckBox(pdf, travelRect, "travel", "Yes", PdfFormField.TYPE_CHECK);
        form.addField(travel);
        document.add(new Paragraph("旅行").setFixedPosition(205, 550, 50));
        
        // 音乐选项
        Rectangle musicRect = new Rectangle(260, 550, 20, 20);
        PdfFormField music = PdfFormField.createCheckBox(pdf, musicRect, "music", "Yes", PdfFormField.TYPE_CHECK);
        form.addField(music);
        document.add(new Paragraph("音乐").setFixedPosition(285, 550, 50));
        
        // Submit button. createSubmitForm takes (url, field names, flags);
        // replace "/submit-form" with the URL of your own endpoint.
        Rectangle submitRect = new Rectangle(100, 500, 100, 30);
        PdfButtonFormField submit = PdfFormField.createPushButton(pdf, submitRect, "submit", "提交");
        submit.setAction(PdfAction.createSubmitForm("/submit-form", null, PdfAction.SUBMIT_HTML_FORMAT));
        form.addField(submit);
        
        document.close();
        return baos.toByteArray();
    }
    
    /**
     * 从提交的表单中提取数据
     */
    public Map<String, Object> extractFormData(byte[] pdfData) throws IOException {
        Map<String, Object> formData = new HashMap<>();
        
        PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
        PdfDocument pdf = new PdfDocument(reader);
        PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, false);
        
        if (form != null) {
            // 获取所有表单字段
            Map<String, PdfFormField> fields = form.getFormFields();
            
            // 提取每个字段的值
            for (Map.Entry<String, PdfFormField> entry : fields.entrySet()) {
                String fieldName = entry.getKey();
                PdfFormField field = entry.getValue();
                
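                // Handle each field type; compare PdfName values with equals(), since
                // names parsed from a file are not the same instances as the constants
                PdfName formType = field.getFormType();
                if (PdfName.Tx.equals(formType)) {
                    // Text field
                    formData.put(fieldName, field.getValueAsString());
                } else if (PdfName.Btn.equals(formType) && field instanceof PdfButtonFormField) {
                    // Button: push button, radio group, or check box
                    PdfButtonFormField button = (PdfButtonFormField) field;
                    if (button.isPushButton()) {
                        continue; // push buttons carry no value
                    } else if (button.isRadio()) {
                        formData.put(fieldName, field.getValueAsString());
                    } else {
                        // Check box: "Yes" is the on-state used when the form was created
                        formData.put(fieldName, "Yes".equals(field.getValueAsString()));
                    }
                } else if (PdfName.Ch.equals(formType)) {
                    // Choice field (drop-down or list box)
                    formData.put(fieldName, field.getValueAsString());
                }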
            }
        }
        
        pdf.close();
        return formData;
    }
    
    /**
     * 填充PDF表单
     */
    public byte[] fillPdfForm(byte[] pdfTemplate, Map<String, Object> formData) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        
        PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfTemplate));
        PdfWriter writer = new PdfWriter(baos);
        PdfDocument pdf = new PdfDocument(reader, writer);
        PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
        
        // Clear the NeedAppearances flag so the viewer is not asked to regenerate field appearances
        form.setNeedAppearances(false);
        
        // 填充表单字段
        for (Map.Entry<String, Object> entry : formData.entrySet()) {
            String fieldName = entry.getKey();
            Object value = entry.getValue();
            
            if (form.getField(fieldName) != null) {
                if (value instanceof String) {
                    form.getField(fieldName).setValue((String) value);
                } else if (value instanceof Boolean) {
                    boolean checked = (Boolean) value;
                    form.getField(fieldName).setValue(checked ? "Yes" : "Off");
                }
            }
        }
        
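        // Flatten the form: field values are drawn into the page content and the
        // fields themselves are removed, so the result can no longer be edited
        form.flattenFields();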
        
        pdf.close();
        return baos.toByteArray();
    }
}
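
fillPdfForm only handles String and Boolean values, so any other type in the map (a number, for instance) is silently skipped. One way to avoid that is to normalise every value to a string before calling setValue. The helper below is a plain-Java sketch; FormValues and its methods are hypothetical names, and the "Yes"/"Off" convention simply mirrors the check box values used in this section.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper; class and method names are assumptions for this sketch.
final class FormValues {

    // Converts an application value to the string stored in a form field.
    static String toFieldValue(Object value) {
        if (value == null) {
            return "";
        }
        if (value instanceof Boolean) {
            return ((Boolean) value) ? "Yes" : "Off";
        }
        return String.valueOf(value);
    }

    // Converts a whole data map, preserving the iteration order of the input.
    static Map<String, String> toFieldValues(Map<String, ?> formData) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, ?> e : formData.entrySet()) {
            out.put(e.getKey(), toFieldValue(e.getValue()));
        }
        return out;
    }
}
```

With such a helper the fill loop needs a single branch: look up the field, and if it exists, set the converted string.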

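The examples in this chapter covered security controls (encryption and permissions), watermarks, digital signatures, and creating and processing PDF forms. These features can be combined and adapted to fit the requirements of a particular application.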

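8. Best Practices for PDF Processing

Following a few best practices makes PDF handling in a SpringBoot application more efficient, more secure, and easier to maintain.

8.1 Performance Optimization

Performance matters when processing PDF files, large ones in particular.

8.1.1 Memory Management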

@Service
public class PdfMemoryOptimizationService {

    /**
     * Process a large PDF without holding the whole file in heap memory
     */
    public void processLargePdf(String inputPath, String outputPath) throws IOException {
        // Let PDFBox (2.x) buffer up to 50 MB in main memory and spill the rest to temp files
        MemoryUsageSetting memory = MemoryUsageSetting.setupMixed(50L * 1024 * 1024);
        
        // try-with-resources closes the document even if processing fails
        try (PDDocument document = PDDocument.load(new File(inputPath), memory)) {
            // Work through the document one page at a time
            for (PDPage page : document.getPages()) {
                processPage(page);
            }
            
            // Save the processed document
            document.save(outputPath);
        }
    }
    
    private void processPage(PDPage page) {
        // 页面处理逻辑...
    }
    
    /**
     * Build a PDFBox (2.x) memory configuration
     */
    public MemoryUsageSetting configureMemorySettings() {
        // Keep at most 50 MB in main memory; anything beyond that goes to temp files
        MemoryUsageSetting memory = MemoryUsageSetting.setupMixed(50L * 1024 * 1024);
        
        // Directory for those temp files (placeholder path)
        memory.setTempDir(new File("/path/to/temp/directory"));
        
        // Pass the result to PDDocument.load(file, memory)
        return memory;
    }
}

8.1.2 Parallel Processing

@Service
public class PdfParallelProcessingService {

    private final ExecutorService executor = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors());
    
    /**
     * 并行处理多个PDF文件
     */
    public List<CompletableFuture<ProcessingResult>> processPdfFilesInParallel(List<String> filePaths) {
        return filePaths.stream()
                .map(path -> CompletableFuture.supplyAsync(() -> {
                    try {
                        // 处理单个PDF文件
                        return processSinglePdf(path);
                    } catch (Exception e) {
                        throw new CompletionException(e);
                    }
                }, executor))
                .collect(Collectors.toList());
    }
    
    /**
     * Process the pages of a single PDF in parallel.
     * PDDocument is not thread-safe, so each task opens its own instance
     * instead of sharing one document across threads.
     */
    public ProcessingResult processMultiPagePdfInParallel(String pdfPath) throws IOException {
        File file = new File(pdfPath);
        int pageCount;
        try (PDDocument document = PDDocument.load(file)) {
            pageCount = document.getNumberOfPages();
        }
        
        List<CompletableFuture<PageResult>> futures = new ArrayList<>();
        
        // One task per page. Re-parsing the file per task is costly; for large
        // files, give each worker a contiguous range of pages instead.
        for (int i = 0; i < pageCount; i++) {
            final int pageNum = i;
            futures.add(CompletableFuture.supplyAsync(() -> {
                try (PDDocument document = PDDocument.load(file)) {
                    return processPageContent(document.getPage(pageNum), pageNum);
                } catch (Exception e) {
                    throw new CompletionException(e);
                }
            }, executor));
        }
        
        // Wait for all pages to finish
        List<PageResult> results = futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
        
        return new ProcessingResult(pdfPath, results);
    }
    
    private ProcessingResult processSinglePdf(String path) throws IOException {
        // 单个PDF文件处理逻辑
        // ...
        return new ProcessingResult(path, new ArrayList<>());
    }
    
    private PageResult processPageContent(PDPage page, int pageNum) throws IOException {
        // 单页处理逻辑
        // ...
        return new PageResult(pageNum, "Processed");
    }
    
    // 结果类
    @Data
    @AllArgsConstructor
    public static class ProcessingResult {
        private String filePath;
        private List<PageResult> pageResults;
    }
    
    @Data
    @AllArgsConstructor
    public static class PageResult {
        private int pageNumber;
        private String result;
    }
}
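
When a batch of futures is joined one by one, a single failed file makes join() throw and the results of the other files are lost. A caller that wants partial results can separate successes from failures while joining. The following is a minimal sketch using only the JDK; FutureCollector and joinSuccessful are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// Illustrative helper; class and method names are assumptions for this sketch.
final class FutureCollector {

    // Waits for every future. Values of successful futures are returned in order;
    // the causes of failed or cancelled futures are appended to 'errors'.
    static <T> List<T> joinSuccessful(List<CompletableFuture<T>> futures, List<Throwable> errors) {
        List<T> results = new ArrayList<>();
        for (CompletableFuture<T> future : futures) {
            try {
                results.add(future.join());
            } catch (CompletionException | CancellationException e) {
                // join() wraps the original exception; unwrap it when a cause is present
                errors.add(e.getCause() != null ? e.getCause() : e);
            }
        }
        return results;
    }
}
```

The list returned by processPdfFilesInParallel can be passed to such a helper directly, and the collected errors can then be logged or reported per file.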

8.2 Exception Handling and Logging

Thorough exception handling and logging are essential for diagnosing PDF processing problems.

@Service
@Slf4j  // 使用Lombok的日志注解
public class PdfProcessingService {

    /**
     * 处理PDF文件，包含完善的异常处理和日志
     */
    public ProcessingResult processPdf(String inputPath) {
        log.info("开始处理PDF文件: {}", inputPath);
        
        PDDocument document = null;
        ProcessingResult result = new ProcessingResult();
        result.setFilePath(inputPath);
        
        try {
            // 验证文件存在
            File file = new File(inputPath);
            if (!file.exists() || !file.isFile()) {
                throw new FileNotFoundException("找不到PDF文件: " + inputPath);
            }
            
            log.debug("文件验证通过，开始加载PDF");
            
            // 加载文档
            try {
                document = PDDocument.load(file);
            } catch (InvalidPasswordException e) {
                log.error("PDF文件受密码保护: {}", inputPath, e);
                result.setStatus(ProcessingStatus.PASSWORD_PROTECTED);
                return result;
            } catch (IOException e) {
                log.error("无法加载PDF文件: {}", inputPath, e);
                result.setStatus(ProcessingStatus.LOAD_ERROR);
                result.setErrorMessage("无法加载PDF文件: " + e.getMessage());
                return result;
            }
            
            // 检查是否为空文档
            if (document.getNumberOfPages() <= 0) {
                log.warn("PDF文件不包含任何页面: {}", inputPath);
                result.setStatus(ProcessingStatus.EMPTY_DOCUMENT);
                return result;
            }
            
            log.info("成功加载PDF，共{}页", document.getNumberOfPages());
            
            // 处理文档内容
            try {
                processDocumentContent(document, result);
                result.setStatus(ProcessingStatus.SUCCESS);
                log.info("PDF文件处理成功: {}", inputPath);
            } catch (Exception e) {
                log.error("处理PDF内容时出错: {}", inputPath, e);
                result.setStatus(ProcessingStatus.PROCESSING_ERROR);
                result.setErrorMessage("处理内容出错: " + e.getMessage());
            }
            
        } catch (Exception e) {
            log.error("处理PDF时发生未预期的异常: {}", inputPath, e);
            result.setStatus(ProcessingStatus.UNEXPECTED_ERROR);
            result.setErrorMessage("未预期的错误: " + e.getMessage());
        } finally {
            // 确保资源释放
            if (document != null) {
                try {
                    document.close();
                    log.debug("PDF文档已关闭");
                } catch (IOException e) {
                    log.warn("关闭PDF文档时出错", e);
                }
            }
        }
        
        return result;
    }
    
    private void processDocumentContent(PDDocument document, ProcessingResult result) {
        // 文档处理逻辑...
    }
    
    // 处理结果类
    @Data
    public static class ProcessingResult {
        private String filePath;
        private ProcessingStatus status;
        private String errorMessage;
        private Map<String, Object> extractedData = new HashMap<>();
    }
    
    // 处理状态枚举
    public enum ProcessingStatus {
        SUCCESS, 
        PASSWORD_PROTECTED, 
        LOAD_ERROR, 
        EMPTY_DOCUMENT, 
        PROCESSING_ERROR, 
        UNEXPECTED_ERROR
    }
}

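8.3 Security Recommendations

8.3.1 Securing File Uploads

@Service
@Slf4j  // needed for the log.warn(...) call in scanForMaliciousContent
public class PdfSecurityService {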

    // 允许的最大PDF文件大小
    private static final long MAX_FILE_SIZE = 10 * 1024 * 1024; // 10MB
    
    // 文件类型验证
    public boolean isValidPdfFile(MultipartFile file) {
        // 检查文件大小
        if (file.getSize() > MAX_FILE_SIZE) {
            throw new FileValidationException("文件大小超过限制");
        }
        
        // Check the declared content type (supplied by the client, so it is only a first filter)
        String contentType = file.getContentType();
        if (contentType == null || !contentType.equals("application/pdf")) {
            throw new FileValidationException("文件类型必须是PDF");
        }
        
        // 检查文件扩展名
        String originalFilename = file.getOriginalFilename();
        if (originalFilename == null || !originalFilename.toLowerCase().endsWith(".pdf")) {
            throw new FileValidationException("文件必须是.pdf格式");
        }
        
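        // Check the file signature (magic bytes). A single read() may return fewer bytes
        // than requested, so use readNBytes (Java 9+) and decode with an explicit charset.
        try (InputStream is = file.getInputStream()) {
            byte[] header = new byte[5];
            int bytesRead = is.readNBytes(header, 0, header.length);
            if (bytesRead < 5 || !new String(header, StandardCharsets.US_ASCII).equals("%PDF-")) {
                throw new FileValidationException("无效的PDF文件内容");
            }
        } catch (IOException e) {
            throw new FileValidationException("无法验证文件内容");
        }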
        
        return true;
    }
    
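    // Safely handle an uploaded PDF
    public File securelyProcessUploadedPdf(MultipartFile file) throws IOException {
        // Validate the upload first
        isValidPdfFile(file);
        
        // Use a generated temp file name; never build a path from the client-supplied filename
        File tempFile = File.createTempFile("secure-pdf-", ".pdf");
        
        try {
            // Stream the upload to disk instead of loading it fully with getBytes()
            try (InputStream in = file.getInputStream()) {
                Files.copy(in, tempFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
            }
            
            // Scan the file for potentially malicious content
            scanForMaliciousContent(tempFile);
            return tempFile;
        } catch (IOException | RuntimeException e) {
            // Do not leave a rejected file on disk
            Files.deleteIfExists(tempFile.toPath());
            throw e;
        }
    }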
    
    // 扫描恶意内容
    private void scanForMaliciousContent(File pdfFile) throws IOException {
        try (PDDocument document = PDDocument.load(pdfFile)) {
            // 检查JavaScript
            if (hasJavaScript(document)) {
                throw new SecurityException("PDF包含可能不安全的JavaScript");
            }
            
            // 检查外部链接
            if (hasExternalLinks(document)) {
                // 可以选择警告而不是阻止
                log.warn("PDF包含外部链接: {}", pdfFile.getName());
            }
            
            // 检查嵌入式文件
            if (hasEmbeddedFiles(document)) {
                throw new SecurityException("PDF包含嵌入式文件，可能存在安全风险");
            }
            
            // 更多安全检查...
        }
    }
    
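    // Check for JavaScript.
    // NOTE: placeholder that always returns false. A real check has to inspect the
    // document open action, document-level scripts, and field and annotation actions.
    private boolean hasJavaScript(PDDocument document) {
        PDDocumentCatalog catalog = document.getDocumentCatalog();
        PDAcroForm acroForm = catalog.getAcroForm();
        if (acroForm != null) {
            // Inspect the form fields for JavaScript actions here
            // ...
            return false; // placeholder
        }
        return false;
    }
    
    // Check for external links.
    // NOTE: placeholder that always returns false.
    private boolean hasExternalLinks(PDDocument document) {
        // Walk the pages and their annotations and look for external URLs
        // ...
        return false; // placeholder
    }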
    
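    // Check for embedded files
    private boolean hasEmbeddedFiles(PDDocument document) throws IOException {
        PDDocumentNameDictionary names = document.getDocumentCatalog().getNames();
        if (names == null) {
            return false;
        }
        PDEmbeddedFilesNameTreeNode embeddedFiles = names.getEmbeddedFiles();
        if (embeddedFiles == null) {
            return false;
        }
        // getNames() returns null when the entries live in child nodes (Kids),
        // so a non-empty names map or any kids count as attachments
        Map<String, PDComplexFileSpecification> direct = embeddedFiles.getNames();
        boolean hasDirect = direct != null && !direct.isEmpty();
        boolean hasKids = embeddedFiles.getKids() != null && !embeddedFiles.getKids().isEmpty();
        return hasDirect || hasKids;
    }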
    
    // 自定义验证异常
    public static class FileValidationException extends RuntimeException {
        public FileValidationException(String message) {
            super(message);
        }
    }
}
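
The magic-byte check is the part of upload validation that does not depend on anything the client declares, so it is worth isolating and testing on its own. The following is a minimal sketch using only the JDK; PdfHeaderCheck and its method names are hypothetical, and readNBytes requires Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative helper; class and method names are assumptions for this sketch.
final class PdfHeaderCheck {

    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Returns true if the array starts with the bytes "%PDF-".
    static boolean startsWithPdfMagic(byte[] data) {
        return data != null
                && data.length >= PDF_MAGIC.length
                && Arrays.equals(Arrays.copyOf(data, PDF_MAGIC.length), PDF_MAGIC);
    }

    // Stream variant: readNBytes keeps reading until the buffer is full or the
    // stream ends, unlike read(byte[]), which may return after a partial read.
    static boolean startsWithPdfMagic(InputStream in) throws IOException {
        byte[] header = new byte[PDF_MAGIC.length];
        int read = in.readNBytes(header, 0, header.length);
        return read == PDF_MAGIC.length && Arrays.equals(header, PDF_MAGIC);
    }
}
```

A matching header only shows that the file claims to be a PDF; the structural checks in scanForMaliciousContent are still needed afterwards.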

8.3.2 Protecting Sensitive Information

@Service
public class PdfDataProtectionService {

    // Add a "confidential" watermark to every page
    public byte[] addConfidentialWatermark(byte[] pdfData) throws IOException, DocumentException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        
        PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
        PdfStamper stamper = new PdfStamper(reader, baos);
        
        int pageCount = reader.getNumberOfPages();
        // The watermark text is Chinese, which Helvetica with CP1252 cannot render.
        // STSong-Light with UniGB-UCS2-H is a CJK font that needs the itext-asian dependency.
        BaseFont baseFont = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
        
        for (int i = 1; i <= pageCount; i++) {
            PdfContentByte content = stamper.getUnderContent(i);
            content.beginText();
            content.setFontAndSize(baseFont, 60);
            content.setColorFill(BaseColor.LIGHT_GRAY);
            // showTextAligned sets the position itself: page center, rotated 45 degrees
            content.showTextAligned(Element.ALIGN_CENTER, "机密文件 - 请勿传播", 
                                  reader.getPageSize(i).getWidth()/2, 
                                  reader.getPageSize(i).getHeight()/2, 45);
            content.endText();
        }
        
        stamper.close();
        reader.close();
        
        return baos.toByteArray();
    }
    
    // 文档脱敏
    public byte[] redactSensitiveInformation(byte[] pdfData, List<String> patternsToRedact) 
            throws IOException {
        // 注意：真正的PDF编辑和脱敏需要更复杂的处理
        // 这里仅做示例
        
        // 1. 提取文本
        PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
        int pageCount = reader.getNumberOfPages();
        
        // 2. 遍历每页，查找并标记匹配的模式
        List<PDFRedactionInfo> redactions = new ArrayList<>();
        
        for (int i = 1; i <= pageCount; i++) {
            String pageText = PdfTextExtractor.getTextFromPage(reader, i);
            
            for (String pattern : patternsToRedact) {
                Pattern regex = Pattern.compile(pattern);
                Matcher matcher = regex.matcher(pageText);
                
                while (matcher.find()) {
                    // 这里需要实际位置信息，简化版只记录页码
                    redactions.add(new PDFRedactionInfo(i, matcher.start(), matcher.end()));
                }
            }
        }
        reader.close();
        
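        // 3. Apply the redactions
        // Real redaction has to remove the matched content from the page content streams,
        // not merely cover it, and it needs page coordinates that plain text extraction
        // does not provide. iText offers PdfCleanUp tooling for this purpose.
        // This example stops here rather than return a PDF that has not been redacted.
        throw new UnsupportedOperationException("Redaction is not implemented in this example");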
    }
    
    // 保存敏感PDF时加密
    public byte[] encryptForStorage(byte[] pdfData) throws DocumentException, IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        
        PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
        PdfStamper stamper = new PdfStamper(reader, baos);
        
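        // Generate separate user and owner passwords. If both are identical, anyone who
        // can open the file holds owner rights and the permission flags have no effect.
        String userPassword = generateSecureRandomPassword(16);
        String ownerPassword = generateSecureRandomPassword(16);
        
        // AES-256 encryption; the user password allows opening and printing only
        stamper.setEncryption(userPassword.getBytes(StandardCharsets.UTF_8), 
                            ownerPassword.getBytes(StandardCharsets.UTF_8), 
                            PdfWriter.ALLOW_PRINTING, 
                            PdfWriter.ENCRYPTION_AES_256);
        
        stamper.close();
        reader.close();
        
        // In a real application both passwords must be stored securely
        storePasswordSecurely(userPassword);
        storePasswordSecurely(ownerPassword);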
        
        return baos.toByteArray();
    }
    
    private String generateSecureRandomPassword(int length) {
        SecureRandom random = new SecureRandom();
        String chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()";
        
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            int randomIndex = random.nextInt(chars.length());
            sb.append(chars.charAt(randomIndex));
        }
        
        return sb.toString();
    }
    
    private void storePasswordSecurely(String password) {
        // 实际应用中，应使用安全的密钥管理系统
        // 例如：HashiCorp Vault, AWS KMS等
    }
    
    // 脱敏信息记录类
    @Data
    @AllArgsConstructor
    private static class PDFRedactionInfo {
        private int pageNumber;
        private int startPosition;
        private int endPosition;
    }
}
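
The pattern-matching step of the redaction workflow can be tried out on extracted text alone, without any PDF library. The sketch below masks every match of a set of regular expressions in a string. It operates on plain text only and does not redact a PDF; TextMasker and mask are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper; class and method names are assumptions for this sketch.
final class TextMasker {

    // Replaces every character covered by any pattern match with '*'.
    // Matches are located in the original text, so overlapping patterns are safe.
    static String mask(String text, List<Pattern> patterns) {
        char[] chars = text.toCharArray();
        for (Pattern pattern : patterns) {
            Matcher matcher = pattern.matcher(text);
            while (matcher.find()) {
                Arrays.fill(chars, matcher.start(), matcher.end(), '*');
            }
        }
        return new String(chars);
    }
}
```

Compiling each Pattern once and reusing it for all pages avoids recompiling the same expression inside the page loop.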

8.4 Testing PDF Processing

Comprehensive tests are essential for keeping PDF processing correct and reliable.

@SpringBootTest
public class PdfGenerationServiceTest {

    @Autowired
    private PdfGenerationService pdfService;
    
    @TempDir
    Path tempDir;
    
    @Test
    public void testGenerateSimplePdf() throws Exception {
        // 安排
        String outputPath = tempDir.resolve("test-output.pdf").toString();
        
        // 执行
        pdfService.generateSimplePdf(outputPath);
        
        // 断言
        File outputFile = new File(outputPath);
        assertTrue(outputFile.exists(), "生成的PDF文件应该存在");
        assertTrue(outputFile.length() > 0, "PDF文件不应为空");
        
        // Verify the PDF content; try-with-resources closes the document even when an assertion fails
        try (PDDocument document = PDDocument.load(outputFile)) {
            assertEquals(1, document.getNumberOfPages(), "PDF应该有1页");
            
            // Verify the text content
            PDFTextStripper stripper = new PDFTextStripper();
            String text = stripper.getText(document);
            assertTrue(text.contains("Hello World"), "PDF应包含预期文本");
        }
    }
    
    @Test
    public void testGeneratePdfWithTable() throws Exception {
        // 安排
        String outputPath = tempDir.resolve("table-output.pdf").toString();
        List<UserDto> users = Arrays.asList(
            new UserDto(1L, "张三", "admin"),
            new UserDto(2L, "李四", "user"),
            new UserDto(3L, "王五", "editor")
        );
        
        // 执行
        pdfService.generatePdfWithTable(outputPath, users);
        
        // 断言
        File outputFile = new File(outputPath);
        assertTrue(outputFile.exists());
        
        // Verify the content (table layout is hard to check, so only basic checks are made);
        // try-with-resources closes the document even when an assertion fails
        try (PDDocument document = PDDocument.load(outputFile)) {
            assertTrue(document.getNumberOfPages() > 0);
            
            // Verify that the text contains the user names
            PDFTextStripper stripper = new PDFTextStripper();
            String text = stripper.getText(document);
            assertTrue(text.contains("张三"));
            assertTrue(text.contains("李四"));
            assertTrue(text.contains("王五"));
        }
    }
    
    @Test
    public void testPdfGeneration_withInvalidInput_shouldThrowException() {
        // 安排
        String outputPath = tempDir.resolve("invalid-output.pdf").toString();
        
        // 断言
        assertThrows(IllegalArgumentException.class, () -> {
            // 执行
            pdfService.generatePdfWithInvalidInput(outputPath);
        });
    }
}

8.5 Deployment Best Practices

@Configuration
public class PdfServiceConfig {

    @Bean
    public PdfGenerationService pdfGenerationService(
            @Value("${pdf.fonts.directory:/app/fonts}") String fontsDirectory,
            @Value("${pdf.output.directory:/app/output}") String outputDirectory) {
        
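        // Make sure both directories exist; fail at startup if one cannot be created
        File fontsDir = new File(fontsDirectory);
        if (!fontsDir.isDirectory() && !fontsDir.mkdirs()) {
            throw new IllegalStateException("Cannot create fonts directory: " + fontsDirectory);
        }
        
        File outputDir = new File(outputDirectory);
        if (!outputDir.isDirectory() && !outputDir.mkdirs()) {
            throw new IllegalStateException("Cannot create output directory: " + outputDirectory);
        }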
        
        // 返回配置好的服务
        return new PdfGenerationService(fontsDirectory, outputDirectory);
    }
    
    @Bean
    public PdfProcessingTaskExecutor pdfTaskExecutor(
            @Value("${pdf.processing.thread-pool-size:4}") int threadPoolSize,
            @Value("${pdf.processing.queue-capacity:100}") int queueCapacity) {
        
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(threadPoolSize);
        executor.setMaxPoolSize(threadPoolSize * 2);
        executor.setQueueCapacity(queueCapacity);
        executor.setThreadNamePrefix("pdf-proc-");
        executor.initialize();
        
        return new PdfProcessingTaskExecutor(executor);
    }
    
    // PDF处理健康检查
    @Bean
    public HealthIndicator pdfServiceHealthIndicator(PdfGenerationService pdfService) {
        return () -> {
            try {
                // 尝试生成一个简单的PDF来验证服务是否正常
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                pdfService.generateTestPdf(baos);
                
                if (baos.size() > 0) {
                    return Health.up().build();
                } else {
                    return Health.down()
                            .withDetail("reason", "PDF生成输出为空")
                            .build();
                }
            } catch (Exception e) {
                return Health.down()
                        .withDetail("reason", "PDF生成失败")
                        .withDetail("error", e.getMessage())
                        .build();
            }
        };
    }
}
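
The configured directories are used for reading fonts and writing output, so it helps to verify at startup that they can actually be created and written to. The following is a minimal sketch based on java.nio.file; DirectoryGuard and ensureWritableDirectory are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative helper; class and method names are assumptions for this sketch.
final class DirectoryGuard {

    // Creates the directory (and any missing parents) and checks that it is writable.
    static Path ensureWritableDirectory(String dir) {
        Path path = Paths.get(dir).toAbsolutePath().normalize();
        try {
            // No-op if the directory already exists; throws if the path is a regular file
            Files.createDirectories(path);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot create directory: " + path, e);
        }
        if (!Files.isWritable(path)) {
            throw new IllegalStateException("Directory is not writable: " + path);
        }
        return path;
    }
}
```

Calling such a helper from the bean method makes a misconfigured volume fail application startup instead of the first PDF request.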

8.6 Error Handling and Recovery

@Service
@Slf4j
public class PdfProcessingErrorHandler {

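    /**
     * Process a PDF file with automatic retries.
     * Requires spring-retry on the classpath and @EnableRetry on a configuration class.
     */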
    @Retryable(
        value = {IOException.class, TemporaryPdfProcessingException.class},
        maxAttempts = 3,
        backoff = @Backoff(delay = 1000, multiplier = 2)
    )
    public ProcessingResult processPdfWithRetry(String pdfPath) throws IOException {
        log.info("尝试处理PDF文件: {}", pdfPath);
        
        // PDF处理逻辑，可能抛出异常
        return doProcessPdf(pdfPath);
    }
    
    /**
     * 重试失败后的恢复处理
     */
    @Recover
    public ProcessingResult recoverFromFailure(Exception e, String pdfPath) {
        log.error("处理PDF文件失败，无法恢复: {}", pdfPath, e);
        
        // 创建失败结果
        ProcessingResult result = new ProcessingResult();
        result.setStatus(ProcessingStatus.FAILED_WITH_RECOVERY);
        result.setMessage("处理失败: " + e.getMessage());
        
        // 记录故障
        recordFailure(pdfPath, e);
        
        // 发送警报
        sendAlert(pdfPath, e);
        
        return result;
    }
    
    /**
     * 尝试修复损坏的PDF
     */
    public byte[] attemptToRepairCorruptedPdf(byte[] corruptedPdfData) {
        ByteArrayOutputStream repairedOutput = new ByteArrayOutputStream();
        
        try {
            // 使用PDFBox的修复功能尝试恢复
            PDFParser parser = new PDFParser(new RandomAccessBufferedFileInputStream(
                                             new ByteArrayInputStream(corruptedPdfData)));
            parser.setLenient(true); // 宽容模式
            parser.parse();
            
            PDDocument document = parser.getPDDocument();
            
            // 添加空白页(如果文档为空)
            if (document.getNumberOfPages() == 0) {
                document.addPage(new PDPage());
            }
            
            // 保存修复后的文档
            document.save(repairedOutput);
            document.close();
            
            log.info("成功修复损坏的PDF文件");
            return repairedOutput.toByteArray();
            
        } catch (Exception e) {
            log.error("无法修复损坏的PDF", e);
            // 如果仍然无法修复，返回null或抛出异常
            return null;
        }
    }
    
    /**
     * 记录文件处理故障
     */
    private void recordFailure(String pdfPath, Exception e) {
        // 将故障信息记录到数据库或日志系统
        PdfProcessingFailure failure = new PdfProcessingFailure();
        failure.setFilePath(pdfPath);
        failure.setTimestamp(new Date());
        failure.setErrorMessage(e.getMessage());
        failure.setStackTrace(ExceptionUtils.getStackTrace(e));
        
        // 保存到数据库
        // pdfFailureRepository.save(failure);
    }
    
    /**
     * 发送警报
     */
    private void sendAlert(String pdfPath, Exception e) {
        // 当处理重要文件失败时发送警报
        // alertService.sendAlert("PDF处理失败", "文件 " + pdfPath + " 处理失败: " + e.getMessage());
    }
    
    /**
     * 实际的PDF处理逻辑
     */
    private ProcessingResult doProcessPdf(String pdfPath) throws IOException {
        // 实现PDF处理逻辑...
        return new ProcessingResult();
    }
    
    // 临时处理异常(可重试)
    public static class TemporaryPdfProcessingException extends RuntimeException {
        public TemporaryPdfProcessingException(String message) {
            super(message);
        }
    }
    
    // 处理结果类
    @Data
    public static class ProcessingResult {
        private ProcessingStatus status;
        private String message;
        // 其他字段...
    }
    
    // 处理状态枚举
    public enum ProcessingStatus {
        SUCCESS, FAILED, FAILED_WITH_RECOVERY
    }
    
    // 故障记录实体
    @Data
    public static class PdfProcessingFailure {
        private String filePath;
        private Date timestamp;
        private String errorMessage;
        private String stackTrace;
    }
}
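
The retry settings used here (three attempts, an initial delay of 1000 ms, a multiplier of 2) amount to a loop with exponential backoff. The plain-Java sketch below shows that behaviour without Spring Retry. BackoffRetry and call are hypothetical names, and unlike the annotation, this version retries on every exception type rather than a selected list.

```java
import java.util.concurrent.Callable;

// Illustrative helper; class and method names are assumptions for this sketch.
final class BackoffRetry {

    // Runs the task up to maxAttempts times, sleeping between attempts.
    // The delay starts at initialDelayMs and is multiplied after each failure.
    static <T> T call(Callable<T> task, int maxAttempts, long initialDelayMs, double multiplier)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // no sleep after the final attempt
                }
                Thread.sleep(delay);
                delay = (long) (delay * multiplier);
            }
        }
        throw last;
    }
}
```

With the values from the annotation, the waits between attempts would be 1000 ms and then 2000 ms before the last exception is rethrown.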

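9. Common Problems and Solutions

Developers run into a recurring set of problems when processing PDF files in SpringBoot. This chapter collects the most common ones together with their solutions.

9.1 Garbled Text and Font Problems

Garbled Chinese or special characters are among the most common PDF processing problems.

Problem: Chinese text in the PDF appears as boxes or garbled characters

Cause: The standard fonts that many PDF libraries use by default do not contain Chinese glyphs.

Solution: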

// 使用iText解决中文显示问题
public byte[] generatePdfWithChineseText() throws IOException, DocumentException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    
    // 创建文档
    Document document = new Document();
    PdfWriter.getInstance(document, baos);
    document.open();
    
    // 方法1：使用中文字体(需要字体文件)
    BaseFont baseFont = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
    Font chineseFont = new Font(baseFont, 12, Font.NORMAL);
    
    document.add(new Paragraph("这是中文内容", chineseFont));
    
    // 方法2：使用嵌入字体(会增加文件大小)
    String fontPath = "path/to/fonts/msyh.ttf"; // 微软雅黑字体
    BaseFont customFont = BaseFont.createFont(fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
    Font embeddedFont = new Font(customFont, 12, Font.NORMAL);
    
    document.add(new Paragraph("这是使用嵌入字体的中文", embeddedFont));
    
    document.close();
    return baos.toByteArray();
}

// Fixing Chinese text rendering with PDFBox
public void addChineseTextWithPdfBox(String pdfPath) throws IOException {
    // try-with-resources closes the document even if an exception is thrown
    try (PDDocument document = new PDDocument()) {
        PDPage page = new PDPage();
        document.addPage(page);
        
        // Load a font that contains Chinese glyphs
        PDType0Font font = PDType0Font.load(document, new File("path/to/fonts/msyh.ttf"));
        
        // Write the text through a content stream
        try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
            contentStream.beginText();
            contentStream.setFont(font, 12);
            contentStream.newLineAtOffset(25, 700);
            contentStream.showText("这是PDFBox生成的中文内容");
            contentStream.endText();
        }
        
        document.save(pdfPath);
    }
}

Best practices:

Bundle the Chinese fonts you rely on with the application
Embed font subsets to reduce file size
Create a font factory class to manage and reuse font instances
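
The last recommendation can be sketched without committing to a particular PDF library. The class below is a hypothetical helper (the name `FontCache` and its methods are not part of iText or PDFBox): it caches one font object per path so that each font file is parsed only once. The type parameter `F` stands for the library's font type, and the loader function is supplied by the caller.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical helper: caches one font object per font path.
 * F is the PDF library's font type; the loader is supplied by the caller.
 */
final class FontCache<F> {
    private final Map<String, F> cache = new ConcurrentHashMap<>();
    private final Function<String, F> loader;

    FontCache(Function<String, F> loader) {
        this.loader = loader;
    }

    /** Returns the cached font for this path, loading it on first use. */
    F get(String fontPath) {
        return cache.computeIfAbsent(fontPath, loader);
    }

    /** Number of distinct fonts loaded so far. */
    int size() {
        return cache.size();
    }
}
```

With iText 5 the loader would wrap `BaseFont.createFont(...)` and convert its checked exceptions into an unchecked one. A PDFBox `PDType0Font` is bound to the `PDDocument` it was loaded for, so a cache like this would have to live per document there rather than application-wide.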

9.2 Image Problems

Problem: Images in the PDF are blurry or distorted

Cause: The image DPI is wrong for the size at which the image is drawn, or the image is scaled with the wrong ratio.

Solution:

public void addHighQualityImage(Document document, String imagePath) throws IOException, DocumentException {
    // Load the image
    Image image = Image.getInstance(imagePath);
    
    // Record the intended DPI on the image
    image.setDpi(300, 300);
    
    // Target width: page width minus a 40 pt margin on each side
    float width = document.getPageSize().getWidth() - 80;
    float aspectRatio = image.getWidth() / image.getHeight();
    float height = width / aspectRatio;
    
    // Cap the height at two thirds of the page height
    float maxHeight = document.getPageSize().getHeight() * 2 / 3;
    if (height > maxHeight) {
        height = maxHeight;
        width = height * aspectRatio;
    }
    
    // scaleToFit keeps the aspect ratio inside the given box
    image.scaleToFit(width, height);
    
    // Center the image horizontally
    image.setAlignment(Image.MIDDLE);
    
    document.add(image);
}
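
The relationship between pixels, points, and DPI behind this fix can be checked with plain arithmetic. The sketch below assumes only that PDF user space uses 72 points per inch; the class and method names are made up for illustration and belong to no library.

```java
/** Illustrative helpers for reasoning about image size in a PDF. */
final class ImageScaling {
    /** PDF user space uses 72 points per inch. */
    static final float POINTS_PER_INCH = 72f;

    /** Largest drawn size, in points, at which the image still reaches targetDpi. */
    static float maxPointsAtDpi(int pixels, float targetDpi) {
        return pixels * POINTS_PER_INCH / targetDpi;
    }

    /** Effective DPI when an image that is `pixels` wide is drawn `widthPoints` wide. */
    static float effectiveDpi(int pixels, float widthPoints) {
        return pixels * POINTS_PER_INCH / widthPoints;
    }

    /** Uniform scale factor that fits (w, h) inside (maxW, maxH) without distortion. */
    static float fitScale(float w, float h, float maxW, float maxH) {
        return Math.min(maxW / w, maxH / h);
    }
}
```

For example, a 1500-pixel-wide image can be drawn up to 360 pt (5 inches) wide at 300 DPI. Stretched across 515 pt, the usable width of an A4 page with 40 pt margins, it drops to roughly 210 DPI, which may look soft in print.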

Problem: The PDF file is too large

Cause: Images are embedded uncompressed or in a lossless format.

Solution:

public byte[] compressImage(byte[] imageData, String format) throws IOException {
    BufferedImage image = ImageIO.read(new ByteArrayInputStream(imageData));
    if (image == null) {
        // ImageIO.read returns null when no registered reader understands the data
        throw new IOException("Unsupported or corrupt image data");
    }
    
    // Create the output stream
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    
    // For JPEG, set the compression quality explicitly
    if ("jpg".equalsIgnoreCase(format) || "jpeg".equalsIgnoreCase(format)) {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName(format);
        if (!writers.hasNext()) {
            throw new IOException("No ImageWriter available for format: " + format);
        }
        ImageWriter writer = writers.next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(0.7f); // 70% quality; tune to balance size against quality
        
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(baos)) {
            writer.setOutput(ios);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
    } else {
        // For other formats, use the writer's default settings
        ImageIO.write(image, format, baos);
    }
    
    return baos.toByteArray();
}
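
Lowering JPEG quality is one lever; reducing the pixel dimensions before embedding is another, since an image rarely needs more pixels than its drawn size requires. The following is a minimal sketch using only the JDK. The class name `ImageDownscaler` and the width-based limit are assumptions made for illustration.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Illustrative helper: shrinks an image to a maximum width, keeping its aspect ratio. */
final class ImageDownscaler {
    static BufferedImage downscaleToWidth(BufferedImage src, int maxWidth) {
        if (src.getWidth() <= maxWidth) {
            return src; // already small enough, never upscale
        }
        float scale = maxWidth / (float) src.getWidth();
        int newHeight = Math.max(1, Math.round(src.getHeight() * scale));

        // TYPE_INT_RGB drops any alpha channel, which suits JPEG output
        BufferedImage dst = new BufferedImage(maxWidth, newHeight, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(src, 0, 0, maxWidth, newHeight, null);
        } finally {
            g.dispose();
        }
        return dst;
    }
}
```

A reasonable order is to downscale first and compress afterwards, so the JPEG encoder works on the smaller image.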

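9.3 Form and Interactivity Problems

Problem: Field values do not show after filling a PDF form

Cause: The field appearances were not regenerated, or the font cannot render the values.

Solution:

public byte[] fillPdfForm(byte[] templateBytes, Map<String, String> formData)
        throws IOException, DocumentException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    
    // Open the PDF template
    PdfReader reader = new PdfReader(new ByteArrayInputStream(templateBytes));
    PdfStamper stamper = new PdfStamper(reader, baos);
    
    // Get the form
    AcroFields form = stamper.getAcroFields();
    
    // Regenerate the field appearances so the new values are drawn
    form.setGenerateAppearances(true);
    // Flatten the form: values become page content and can no longer be edited.
    // Remove this call if the form must stay fillable.
    stamper.setFormFlattening(true);
    
    // Add a substitution font for Chinese values
    BaseFont bf = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
    form.addSubstitutionFont(bf);
    
    // Fill in the form fields
    for (Map.Entry<String, String> entry : formData.entrySet()) {
        form.setField(entry.getKey(), entry.getValue());
    }
    
    // Close the stamper first (it writes the output), then the reader
    stamper.close();
    reader.close();
    
    return baos.toByteArray();
}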

Problem: Interactive elements cannot be added to the PDF

Solution:

public byte[] createInteractivePdf() throws IOException, DocumentException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    
    // Create the document
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, baos);
    document.open();
    
    // Body text (English here, because the default font has no Chinese glyphs; see 9.1)
    document.add(new Paragraph("This PDF contains interactive elements"));
    document.add(new Paragraph("Click the link or the button below:"));
    document.add(new Paragraph(" "));
    
    // Add a hyperlink
    Anchor anchor = new Anchor("Visit the website");
    anchor.setReference("https://www.example.com");
    document.add(anchor);
    document.add(new Paragraph(" "));
    
    // Create a push button
    Rectangle rect = new Rectangle(100, 100, 200, 130);
    PushbuttonField button = new PushbuttonField(writer, rect, "submitButton");
    button.setText("Submit");
    button.setBackgroundColor(new BaseColor(0, 122, 204));
    button.setTextColor(BaseColor.WHITE);
    button.setVisibility(PushbuttonField.VISIBLE);
    
    // Attach a JavaScript action to the underlying form field
    // (the action is set on PdfFormField, not on PushbuttonField)
    PdfFormField field = button.getField();
    field.setAction(PdfAction.javaScript(
        "app.alert('Button clicked!');", writer));
    
    // Add the button to the document
    writer.addAnnotation(field);
    
    document.close();
    return baos.toByteArray();
}

9.4 Performance and Memory Problems

Problem: OutOfMemoryError when processing large PDF files

Cause: The entire PDF file is loaded into memory at once.

Solution:

public void processLargePdfMemoryEfficient(String inputPath, String outputPath) throws IOException {
    // 1. Let PDFBox buffer in temporary files instead of main memory, and load
    //    from a File rather than from a byte array (PDFBox 2.x API)
    MemoryUsageSetting memory = MemoryUsageSetting.setupTempFileOnly();
    
    // 2. try-with-resources closes both documents even if processing fails
    try (PDDocument document = PDDocument.load(new File(inputPath), memory);
         PDDocument outputDocument = new PDDocument(memory)) {
        
        // 3. Process page by page rather than all pages at once
        PDPageTree pages = document.getPages();
        
        for (int i = 0; i < pages.getCount(); i++) {
            // Current page
            PDPage page = pages.get(i);
            
            // Per-page work goes here,
            // e.g. extracting text or modifying content
            
            // Add a page of the same size to the output document
            PDPage newPage = new PDPage(page.getMediaBox());
            outputDocument.addPage(newPage);
            
            // Copy the content (simplified: the stream is left empty here)
            try (PDPageContentStream contentStream = new PDPageContentStream(
                    outputDocument, newPage, PDPageContentStream.AppendMode.OVERWRITE, true)) {
                // ...process and copy content...
            }
        }
        
        // 4. Save before the documents are closed
        outputDocument.save(outputPath);
    }
}

Problem: PDF processing is slow

Solution:

// Process PDFs in parallel on a thread pool
public void parallelPdfProcessing(List<String> pdfPaths) {
    if (pdfPaths.isEmpty()) {
        return; // newFixedThreadPool rejects a pool size of 0
    }
    int parallelism = Math.min(Runtime.getRuntime().availableProcessors(), pdfPaths.size());
    ExecutorService executor = Executors.newFixedThreadPool(parallelism);
    
    try {
        // Submit one task per file
        List<Future<ProcessingResult>> futures = new ArrayList<>();
        
        for (String path : pdfPaths) {
            futures.add(executor.submit(() -> processSinglePdf(path)));
        }
        
        // Collect the results
        for (Future<ProcessingResult> future : futures) {
            try {
                ProcessingResult result = future.get();
                // Handle the result...
                System.out.println("Finished: " + result.getFilePath());
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting
                Thread.currentThread().interrupt();
                break;
            } catch (ExecutionException e) {
                // The task itself failed; the original exception is the cause
                e.getCause().printStackTrace();
            }
        }
    } finally {
        executor.shutdown();
    }
}

private ProcessingResult processSinglePdf(String path) {
    // Processing logic for a single PDF
    // ...
    return new ProcessingResult(path, true);
}

@Data
@AllArgsConstructor
private static class ProcessingResult {
    private String filePath;
    private boolean success;
}

9.5 Security Problems

Problem: How to prevent PDF injection attacks

Solution:

public void securePdfGeneration(String content, String outputPath) {
    // 1. Validate and clean the content
    content = cleanContent(content);
    
    try {
        // 2. Create the PDF
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(outputPath));
        
        // 3. Encrypt with AES-128 and allow only printing and copying.
        //    This restricts what viewers permit; it does not disable JavaScript.
        writer.setEncryption(null, null, 
                        PdfWriter.ALLOW_PRINTING | PdfWriter.ALLOW_COPY, 
                        PdfWriter.ENCRYPTION_AES_128);
        
        document.open();
        document.add(new Paragraph(content));
        document.close();
        
    } catch (Exception e) {
        throw new SecurityException("PDF generation failed", e);
    }
}

private String cleanContent(String content) {
    // Remove potentially malicious content,
    // e.g. strip JavaScript and restrict special characters
    
    // Simple example: remove <script> elements ((?s) lets them span several lines)
    content = content.replaceAll("(?is)<script.*?>.*?</script>", "");
    
    // Remove PDF structural keywords (%PDF-, startxref, xref, trailer).
    // This is a blunt filter: it also alters legitimate text containing these words.
    content = content.replaceAll("(?i)(%PDF-|startxref|xref|trailer)", "");
    
    // Apply more thorough cleaning here...
    
    return content;
}
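
Keyword stripping aside, injection into a PDF literal string works by closing the string early with an unescaped parenthesis or backslash. High-level APIs such as iText's `Paragraph` generally handle this escaping themselves, so the sketch below matters mainly when PDF syntax is assembled by hand. The class and method names are hypothetical.

```java
/** Illustrative helper: escapes text for use inside a PDF literal string "( ... )". */
final class PdfStringEscaper {
    static String escapeLiteral(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break; // backslash starts an escape sequence
                case '(':  sb.append("\\(");  break; // parentheses delimit the string
                case ')':  sb.append("\\)");  break;
                case '\r': sb.append("\\r");  break;
                case '\n': sb.append("\\n");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Escaping at the point where the string is written tends to be more reliable than removing suspicious keywords beforehand, because it leaves legitimate text intact.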

Problem: How to handle uploaded PDF files safely

Solution:

@Service
public class SecurePdfUploadService {

    public boolean validateAndProcessPdfUpload(MultipartFile file) throws IOException {
        // 1. Check the MIME type (constant first, because getContentType() may return null)
        if (!"application/pdf".equals(file.getContentType())) {
            throw new SecurityException("Only PDF files are accepted");
        }
        
        // 2. Check the file extension
        String filename = file.getOriginalFilename();
        if (filename == null || !filename.toLowerCase().endsWith(".pdf")) {
            throw new SecurityException("The file must be a PDF");
        }
        
        // 3. Check the file size
        if (file.getSize() > 10 * 1024 * 1024) { // 10MB
            throw new SecurityException("PDF files must not exceed 10MB");
        }
        
        // 4. Check the PDF header
        byte[] content = file.getBytes();
        if (content.length < 5 || !isPdfHeader(content)) {
            throw new SecurityException("Invalid PDF file format");
        }
        
        // 5. Scan the PDF for unsafe content
        if (!scanPdfForThreats(content)) {
            throw new SecurityException("The PDF file may contain malicious content");
        }
        
        // 6. Process the file content
        return processPdfContent(content);
    }
    
    private boolean isPdfHeader(byte[] content) {
        // Check for the PDF header (%PDF-) with an explicit charset
        String header = new String(content, 0, 5, StandardCharsets.US_ASCII);
        return header.equals("%PDF-");
    }
    
    private boolean scanPdfForThreats(byte[] content) {
        // try-with-resources closes the document on every path
        try (PDDocument document = PDDocument.load(new ByteArrayInputStream(content))) {
            
            // Does the document contain JavaScript?
            boolean hasJavaScript = checkForJavaScript(document);
            
            // Does it contain external links?
            boolean hasExternalLinks = checkForExternalLinks(document);
            
            // Does it contain embedded files?
            boolean hasEmbeddedFiles = checkForEmbeddedFiles(document);
            
            // Decide according to the security policy
            return !hasJavaScript && !hasEmbeddedFiles; // external links may be allowed
            
        } catch (Exception e) {
            // Parsing failed: the file may be corrupt or malicious
            return false;
        }
    }
    
    private boolean checkForJavaScript(PDDocument document) {
        // Look for JavaScript in the document
        // ...
        return false; // placeholder: always reports no JavaScript
    }
    
    private boolean checkForExternalLinks(PDDocument document) {
        // Look for external links
        // ...
        return false; // placeholder
    }
    
    private boolean checkForEmbeddedFiles(PDDocument document) throws IOException {
        // Look for embedded files
        PDDocumentCatalog catalog = document.getDocumentCatalog();
        PDDocumentNameDictionary names = catalog.getNames();
        if (names == null) {
            return false;
        }
        PDEmbeddedFilesNameTreeNode embeddedFiles = names.getEmbeddedFiles();
        if (embeddedFiles == null) {
            return false;
        }
        // getNames() returns null when the entries live in child nodes (Kids)
        Map<String, PDComplexFileSpecification> direct = embeddedFiles.getNames();
        if (direct != null && !direct.isEmpty()) {
            return true;
        }
        // Conservatively treat any child nodes as containing files
        List<?> kids = embeddedFiles.getKids();
        return kids != null && !kids.isEmpty();
    }
    
    private boolean processPdfContent(byte[] content) {
        // Process the PDF content safely
        // ...
        return true; // processed successfully
    }
}
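
Before handing the bytes to a full parser, a cheap structural pre-check can reject obvious non-PDF uploads. The sketch below uses only the JDK and looks for the `%PDF-` header and a `%%EOF` marker near the end of the file. The class name `PdfSniffer` and the window parameter are assumptions for illustration, and passing this check says nothing about whether the file is safe.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative pre-check: looks only at the first and last bytes of an upload. */
final class PdfSniffer {
    private static final byte[] HEADER = "%PDF-".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** True if the data begins with the "%PDF-" header. */
    static boolean startsWithPdfHeader(byte[] data) {
        return data != null && data.length >= HEADER.length && matchesAt(data, 0, HEADER);
    }

    /** True if "%%EOF" starts within the last `window` bytes of the data. */
    static boolean hasEofMarkerNearEnd(byte[] data, int window) {
        if (data == null) {
            return false;
        }
        int from = Math.max(0, data.length - window);
        for (int i = data.length - EOF_MARKER.length; i >= from; i--) {
            if (matchesAt(data, i, EOF_MARKER)) {
                return true;
            }
        }
        return false;
    }

    // Caller guarantees offset + pattern.length <= data.length
    private static boolean matchesAt(byte[] data, int offset, byte[] pattern) {
        for (int j = 0; j < pattern.length; j++) {
            if (data[offset + j] != pattern[j]) {
                return false;
            }
        }
        return true;
    }
}
```

Comparing bytes directly avoids any dependence on the platform default charset. A window of about 1024 bytes is a common choice, since some writers append whitespace or other data after the marker.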

9.6 Layout and Pagination Problems

Problem: Content breaks across pages incorrectly or pagination is poor

Solution:

public void createDocumentWithProperPagination(String outputPath) throws IOException, DocumentException {
    Document document = new Document(PageSize.A4, 50, 50, 50, 50);
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(outputPath));
    
    // Register the page event handler
    writer.setPageEvent(new PaginationHandler());
    
    document.open();
    
    // Fonts and paragraph spacing. Times Roman has no Chinese glyphs,
    // so the sample text is in English; see 9.1 for Chinese fonts.
    Font normalFont = new Font(Font.FontFamily.TIMES_ROMAN, 12);
    Font headingFont = new Font(Font.FontFamily.TIMES_ROMAN, 16, Font.BOLD);
    
    // Add the title
    Paragraph title = new Paragraph("Document Title", headingFont);
    title.setAlignment(Element.ALIGN_CENTER);
    title.setSpacingAfter(20);
    document.add(title);
    
    // Add the chapters
    for (int i = 1; i <= 5; i++) {
        Paragraph heading = new Paragraph("Chapter " + i, headingFont);
        heading.setSpacingBefore(20);
        heading.setSpacingAfter(10);
        
        // Keep the heading itself from being split across pages
        // (this alone does not tie it to the paragraph that follows)
        heading.setKeepTogether(true);
        
        document.add(heading);
        
        // Add the paragraphs
        for (int j = 1; j <= 3; j++) {
            Paragraph para = new Paragraph("This is paragraph " + j + " of chapter " + i + ". " +
                "This is sample text that demonstrates pagination. This is sample text that demonstrates pagination. " +
                "This is sample text that demonstrates pagination.", normalFont);
            
            para.setAlignment(Element.ALIGN_JUSTIFIED);
            para.setSpacingAfter(10);
            document.add(para);
        }
        
        // Add a table that must not be split across two pages
        if (i == 3) {
            PdfPTable table = new PdfPTable(3);
            table.setWidthPercentage(100);
            table.setKeepTogether(true); // keep the table on one page
            
            // Header row
            table.addCell(new PdfPCell(new Phrase("Column 1", headingFont)));
            table.addCell(new PdfPCell(new Phrase("Column 2", headingFont)));
            table.addCell(new PdfPCell(new Phrase("Column 3", headingFont)));
            
            // Data rows
            for (int k = 1; k <= 5; k++) {
                table.addCell("Data " + k + "-1");
                table.addCell("Data " + k + "-2");
                table.addCell("Data " + k + "-3");
            }
            
            table.setSpacingBefore(15);
            table.setSpacingAfter(15);
            document.add(table);
        }
    }
    
    document.close();
}

// Page event handler that draws the header and the page number
private static class PaginationHandler extends PdfPageEventHelper {
    private BaseFont font;
    
    @Override
    public void onOpenDocument(PdfWriter writer, Document document) {
        try {
            // Create the font once; createFont() throws checked exceptions
            font = BaseFont.createFont();
        } catch (DocumentException | IOException e) {
            throw new ExceptionConverter(e);
        }
    }
    
    @Override
    public void onEndPage(PdfWriter writer, Document document) {
        PdfContentByte cb = writer.getDirectContent();
        
        // Page number text
        String pageText = "Page " + writer.getPageNumber();
        
        // Horizontal center of the text area
        float x = (document.left() + document.right()) / 2;
        float y = document.bottom() - 20;
        
        // Draw the page number centered in the footer
        cb.beginText();
        cb.setFontAndSize(font, 10);
        cb.showTextAligned(PdfContentByte.ALIGN_CENTER, pageText, x, y, 0);
        cb.endText();
        
        // Optionally draw a header
        String headerText = "Document Title";
        cb.beginText();
        cb.setFontAndSize(font, 10);
        cb.showTextAligned(PdfContentByte.ALIGN_CENTER, headerText, x, document.top() + 10, 0);
        cb.endText();
    }
}

9.7 PDF Conversion Problems

Problem: How to convert HTML to PDF

Solution:

public byte[] convertHtmlToPdf(String htmlContent) throws IOException, DocumentException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    
    // XMLWorkerHelper comes from iText's XML Worker add-on (the separate xmlworker artifact)
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, baos);
    document.open();
    
    // Convert the HTML to PDF; the input must be well-formed XHTML
    XMLWorkerHelper.getInstance().parseXHtml(writer, document,
            new ByteArrayInputStream(htmlContent.getBytes(StandardCharsets.UTF_8)));
    
    document.close();
    
    return baos.toByteArray();
}

// More complex HTML to PDF conversion (using Flying Saucer)
@Service
public class HtmlToPdfService {

    public byte[] convertHtmlToPdfWithCss(String htmlContent, String baseUrl) throws IOException {
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        
        try {
            // Prepare the HTML content
            String xHtml = convertToXhtml(htmlContent);
            
            // Create the renderer
            ITextRenderer renderer = new ITextRenderer();
            
            // Register a font for Chinese text
            renderer.getFontResolver().addFont("fonts/simsun.ttc", 
                    BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
            
            // The base URL resolves relative resources (CSS, images, and so on)
            if (baseUrl != null) {
                renderer.setDocumentFromString(xHtml, baseUrl);
            } else {
                renderer.setDocumentFromString(xHtml);
            }
            
            // Lay out the document
            renderer.layout();
            
            // Render the PDF
            renderer.createPDF(outputStream);
            
            return outputStream.toByteArray();
            
        } catch (IOException e) {
            throw e;
        } catch (Exception e) {
            // Flying Saucer may throw a DocumentException whose package
            // depends on the artifact in use; wrap it uniformly
            throw new IOException("HTML to PDF conversion failed", e);
        }
    }
    
    private String convertToXhtml(String html) {
        // Convert ordinary HTML to XHTML with JTidy
        Tidy tidy = new Tidy();
        tidy.setInputEncoding("UTF-8");
        tidy.setOutputEncoding("UTF-8");
        tidy.setXHTML(true);
        
        ByteArrayInputStream inputStream = new ByteArrayInputStream(
                html.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        
        tidy.parse(inputStream, outputStream);
        
        // new String(byte[], Charset) also works on Java 8
        return new String(outputStream.toByteArray(), StandardCharsets.UTF_8);
    }
}

Problem: How to convert a PDF to images

Solution:

public List<BufferedImage> convertPdfToImages(byte[] pdfData, int dpi) throws IOException {
    List<BufferedImage> images = new ArrayList<>();
    
    // Load the PDF document; try-with-resources closes it afterwards
    try (PDDocument document = PDDocument.load(new ByteArrayInputStream(pdfData))) {
        // Create the PDF renderer
        PDFRenderer renderer = new PDFRenderer(document);
        
        // Render page by page. All images are held in memory at once,
        // so prefer writing each page out immediately for long documents.
        for (int i = 0; i < document.getNumberOfPages(); i++) {
            // Render in RGB at the requested DPI
            BufferedImage image = renderer.renderImageWithDPI(i, dpi, ImageType.RGB);
            images.add(image);
        }
    }
    
    return images;
}

// Save the pages as image files
public void savePdfPagesAsImages(byte[] pdfData, String outputDir, String format) 
        throws IOException {
    List<BufferedImage> images = convertPdfToImages(pdfData, 300);
    
    // Make sure the output directory exists
    File dir = new File(outputDir);
    if (!dir.exists() && !dir.mkdirs()) {
        throw new IOException("Could not create output directory: " + outputDir);
    }
    
    // Save each page as a separate image file
    for (int i = 0; i < images.size(); i++) {
        BufferedImage image = images.get(i);
        
        File outputFile = new File(dir, "page_" + (i + 1) + "." + format);
        ImageIO.write(image, format, outputFile);
    }
}

9.8 Spring Boot Integration Problems

Problem: How to handle PDF generation failures gracefully in Spring Boot

Solution:

@RestController
@RequestMapping("/api/pdf")
public class PdfController {

    private final PdfService pdfService;
    private final Logger logger = LoggerFactory.getLogger(PdfController.class);
    
    @Autowired
    public PdfController(PdfService pdfService) {
        this.pdfService = pdfService;
    }
    
    @GetMapping("/generate/{id}")
    public ResponseEntity<?> generatePdf(@PathVariable Long id) {
        try {
            // Try to generate the PDF
            byte[] pdfData = pdfService.generatePdf(id);
            
            // Set the response headers
            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_PDF);
            String filename = "document-" + id + ".pdf";
            headers.setContentDispositionFormData("attachment", filename);
            
            return new ResponseEntity<>(pdfData, headers, HttpStatus.OK);
            
        } catch (ResourceNotFoundException e) {
            // The resource does not exist (ResourceNotFoundException is application-defined)
            logger.warn("PDF requested for a resource that does not exist: {}", id);
            return ResponseEntity.status(HttpStatus.NOT_FOUND)
                    .body(new ErrorResponse("Resource not found", e.getMessage()));
                    
        } catch (PdfGenerationException e) {
            // PDF generation error
            logger.error("PDF generation failed: {}", e.getMessage(), e);
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                    .body(new ErrorResponse("PDF generation error", e.getMessage()));
                    
        } catch (Exception e) {
            // Unexpected error
            logger.error("Unexpected error while handling the PDF request", e);
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                    .body(new ErrorResponse("System error", "An error occurred while processing the request"));
        }
    }
    
    // Custom exception
    public static class PdfGenerationException extends RuntimeException {
        public PdfGenerationException(String message) {
            super(message);
        }
        
        public PdfGenerationException(String message, Throwable cause) {
            super(message, cause);
        }
    }
    
    // Error response DTO
    @Data
    @AllArgsConstructor
    public static class ErrorResponse {
        private String error;
        private String message;
    }
}

// Global exception handler. It only sees exceptions that escape the controller,
// so it covers endpoints that do not catch PdfGenerationException themselves.
@ControllerAdvice
public class GlobalExceptionHandler {

    private final Logger logger = LoggerFactory.getLogger(GlobalExceptionHandler.class);
    
    @ExceptionHandler(PdfController.PdfGenerationException.class)
    public ResponseEntity<PdfController.ErrorResponse> handlePdfGenerationException(
            PdfController.PdfGenerationException e) {
        
        logger.error("PDF generation exception caught by the global handler", e);
        
        return ResponseEntity
                .status(HttpStatus.INTERNAL_SERVER_ERROR)
                .body(new PdfController.ErrorResponse("PDF generation error", e.getMessage()));
    }
}


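With the solutions above you should be able to resolve most of the common problems that arise when processing PDFs in a SpringBoot application. More complex cases may call for a combination of these techniques, or for a dedicated PDF processing service.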
