A Complete Guide to PDF Processing in Spring Boot

Published: 2025-05-01

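1. PDF Basics

1.1 What Is PDF

PDF (Portable Document Format) is an electronic file format developed by Adobe. It presents a document with a fixed layout, independent of application software, hardware, and operating system. Its main characteristics are:

  • Cross-platform compatibility: a document looks the same on any operating system
  • Self-containment: a file carries its text, images, tables, fonts, and other document elements
  • Compactness: several compression techniques are supported
  • Security: passwords and permissions can be set
  • Interactivity: hyperlinks, forms, multimedia, and other interactive elements are supported

1.2 PDF File Structure

A PDF file has four main parts:

  1. Header: identifies the PDF version
  2. Body: holds the objects that make up the document content (text, images, and so on)
  3. Cross-reference table: an index of the byte offset of each object in the file
  4. Trailer: records where the cross-reference table starts and references key objects such as the document catalog

Knowing these basics makes it easier to understand how PDF libraries work.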
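To make the header and trailer concrete, here is a minimal sketch in plain Java that reads the version from the `%PDF-x.y` header and the cross-reference offset written after the `startxref` keyword near the end of the file. The class and method names (`PdfStructureSniffer`, `headerVersion`, `startXrefOffset`) are hypothetical, and the code only scans bytes; it does not validate the file the way a real PDF library would.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of iText, PDFBox, or OpenPDF. */
final class PdfStructureSniffer {

    private PdfStructureSniffer() {}

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /** Returns the version from a leading "%PDF-x.y" header, or null if there is none. */
    static String headerVersion(byte[] data) {
        String head = new String(data, 0, Math.min(data.length, 16), StandardCharsets.ISO_8859_1);
        if (!head.startsWith("%PDF-")) {
            return null;
        }
        int end = 5;
        while (end < head.length() && (isAsciiDigit(head.charAt(end)) || head.charAt(end) == '.')) {
            end++;
        }
        return end > 5 ? head.substring(5, end) : null;
    }

    /** Returns the offset written after the last "startxref" keyword, or -1 if not found. */
    static long startXrefOffset(byte[] data) {
        // The keyword sits in the trailer, so only the last 1024 bytes are scanned
        int from = Math.max(0, data.length - 1024);
        String tail = new String(data, from, data.length - from, StandardCharsets.ISO_8859_1);
        int idx = tail.lastIndexOf("startxref");
        if (idx < 0) {
            return -1;
        }
        int pos = idx + "startxref".length();
        while (pos < tail.length() && Character.isWhitespace(tail.charAt(pos))) {
            pos++;
        }
        int end = pos;
        while (end < tail.length() && isAsciiDigit(tail.charAt(end))) {
            end++;
        }
        return end > pos ? Long.parseLong(tail.substring(pos, end)) : -1;
    }
}
```

A caller could pass the result of `java.nio.file.Files.readAllBytes(path)` to both methods; a `null` version is a quick hint that an upload is not a PDF at all.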

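2. PDF Libraries for Spring Boot

Several popular Java libraries can handle PDF files in a Spring Boot application. Add only the ones you actually use.

2.1 iText

iText is a powerful PDF library for generating, modifying, and analyzing PDF documents.

Maven dependency (iText 5 and iText 7 have different APIs, so pick one; the iText examples in this guide use the iText 5 API in the com.itextpdf.text package):

<!-- iText 5 core library -->
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itextpdf</artifactId>
    <version>5.5.13.3</version>
</dependency>

<!-- iText 7 (newer generation) -->
<dependency>
    <groupId>com.itextpdf</groupId>
    <artifactId>itext7-core</artifactId>
    <version>7.2.5</version>
    <type>pom</type>
</dependency>

Note: iText is available under an open-source license (AGPL) and under a commercial license. Check the license requirements before using it in a commercial project.

2.2 Apache PDFBox

Apache PDFBox is the Apache Software Foundation's open-source PDF library. It is full-featured and uses the more permissive Apache License 2.0. The PDFBox examples in this guide use the 2.x API.

Maven dependency:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.27</version>
</dependency>

2.3 OpenPDF

OpenPDF is an open-source fork that continues iText 4.2.0 under the more flexible LGPL/MPL licenses.

Maven dependency:

<dependency>
    <groupId>com.github.librepdf</groupId>
    <artifactId>openpdf</artifactId>
    <version>1.3.30</version>
</dependency>

2.4 JasperReports

JasperReports is a high-level library for producing PDF reports and is particularly suited to complex report layouts.

Maven dependency:

<dependency>
    <groupId>net.sf.jasperreports</groupId>
    <artifactId>jasperreports</artifactId>
    <version>6.20.0</version>
</dependency>

2.5 Which Library Should You Choose?

  • iText: the most complete feature set, suited to highly customized output, but watch the license restrictions
  • Apache PDFBox: permissively licensed and suited to basic PDF operations; the API is relatively low-level
  • OpenPDF: suited to projects that want an iText-style API but are concerned about licensing
  • JasperReports: the best fit for complex reports, with a steeper learning curve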

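3. Generating PDF Files

3.1 Generating PDFs with iText

Basic document generation
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;
import org.springframework.stereotype.Service;

import java.io.FileOutputStream;
import java.io.IOException;

@Service
public class PdfGenerationService {

    public void generateSimplePdf(String outputPath) throws DocumentException, IOException {
        // Create the document
        Document document = new Document();

        // Bind a PdfWriter to the document and the output file
        PdfWriter.getInstance(document, new FileOutputStream(outputPath));

        // Open the document before adding content
        document.open();

        // Add content. The default font (Helvetica) has no CJK glyphs, so Chinese
        // text needs a CJK-capable font such as the BaseFont used in section 5.1.
        document.add(new Paragraph("Hello World! This is my first PDF document generated with iText."));
        document.add(new Paragraph("Generated at: " + new java.util.Date()));

        // Closing the document also flushes and closes the writer
        document.close();

        System.out.println("PDF created: " + outputPath);
    }
}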
Adding a table
import com.itextpdf.text.BaseColor;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Element;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.Phrase;
import com.itextpdf.text.pdf.PdfPCell;
import com.itextpdf.text.pdf.PdfPTable;
import com.itextpdf.text.pdf.PdfWriter;
import org.springframework.stereotype.Service;

import java.io.FileOutputStream;
import java.io.IOException;

@Service
public class PdfTableService {

    public void generatePdfWithTable(String outputPath) throws DocumentException, IOException {
        Document document = new Document();
        PdfWriter.getInstance(document, new FileOutputStream(outputPath));

        document.open();

        // Add a title paragraph
        document.add(new Paragraph("User Data Table"));

        // Create a table with 3 columns
        PdfPTable table = new PdfPTable(3);
        // Make the table span the full width between the margins
        table.setWidthPercentage(100);
        // Relative column widths
        table.setWidths(new float[]{2, 5, 3});

        // Add the header row
        addTableHeader(table);

        // Add the data rows
        addTableRows(table);

        // Add the table to the document
        document.add(table);

        document.close();
    }

    private void addTableHeader(PdfPTable table) {
        // One styled cell is reused: addCell copies the cell, so changing the phrase
        // afterwards does not affect cells that were already added
        PdfPCell header = new PdfPCell();
        header.setBackgroundColor(BaseColor.LIGHT_GRAY);
        header.setBorderWidth(2);
        header.setHorizontalAlignment(Element.ALIGN_CENTER);

        header.setPhrase(new Phrase("ID"));
        table.addCell(header);

        header.setPhrase(new Phrase("Name"));
        table.addCell(header);

        header.setPhrase(new Phrase("Role"));
        table.addCell(header);
    }

    private void addTableRows(PdfPTable table) {
        // Row 1
        table.addCell("1001");
        table.addCell("Zhang San");
        table.addCell("Administrator");

        // Row 2
        table.addCell("1002");
        table.addCell("Li Si");
        table.addCell("User");

        // Row 3
        table.addCell("1003");
        table.addCell("Wang Wu");
        table.addCell("Reviewer");
    }
}
Adding an image
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Image;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;
import org.springframework.stereotype.Service;

import java.io.FileOutputStream;
import java.io.IOException;

@Service
public class PdfImageService {

    public void generatePdfWithImage(String outputPath, String imagePath)
            throws DocumentException, IOException {
        Document document = new Document();
        PdfWriter.getInstance(document, new FileOutputStream(outputPath));

        document.open();

        // Add a text paragraph
        document.add(new Paragraph("A PDF document with an image"));

        // Load the image
        Image image = Image.getInstance(imagePath);
        // Scale it to fit within 400 x 300 points, keeping the aspect ratio
        image.scaleToFit(400, 300);
        // Center the image horizontally
        image.setAlignment(Image.MIDDLE);

        document.add(image);

        // Add a caption below the image
        document.add(new Paragraph("Figure 1: Sample image"));

        document.close();
    }
}

3.2 Generating PDFs with Apache PDFBox

Basic document generation
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.IOException;

@Service
public class PdfBoxService {

    public void generateSimplePdf(String outputPath) throws IOException {
        // try-with-resources closes the document even if an exception is thrown
        try (PDDocument document = new PDDocument()) {
            // Add a blank page
            PDPage page = new PDPage();
            document.addPage(page);

            // Load a font with Chinese glyphs. This file path resolves only when the
            // application runs from the project root; in a packaged jar, load the
            // font from the classpath instead.
            PDType0Font font = PDType0Font.load(document,
                    new File("src/main/resources/fonts/SimSun.ttf"));

            // The content stream must be closed before the document is saved
            try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
                contentStream.beginText();
                contentStream.setFont(font, 12);

                // Position is measured from the bottom-left corner of the page, in points
                contentStream.newLineAtOffset(25, 700);

                // The sample text stays in Chinese to exercise the CJK font
                contentStream.showText("Hello World! 这是我用PDFBox生成的PDF文档。");

                // Move down one line (the offset is relative to the previous line start)
                contentStream.newLineAtOffset(0, -15);
                contentStream.showText("Generated at: " + new java.util.Date());

                contentStream.endText();
            }

            document.save(outputPath);
        }

        System.out.println("PDF created with PDFBox: " + outputPath);
    }
}
Adding a table (more involved in PDFBox, which has no table API)
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.IOException;

@Service
public class PdfBoxTableService {

    public void generatePdfWithTable(String outputPath) throws IOException {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage();
            document.addPage(page);

            // Load a font with Chinese glyphs (see the path caveat in the previous example)
            PDType0Font font = PDType0Font.load(document,
                    new File("src/main/resources/fonts/SimSun.ttf"));

            // Table content; the first row is the header
            String[][] content = {
                    {"ID", "姓名", "角色"},
                    {"1001", "张三", "管理员"},
                    {"1002", "李四", "用户"},
                    {"1003", "王五", "审核员"}
            };

            // Table position and size
            float margin = 50;
            float y = page.getMediaBox().getHeight() - margin;
            float tableWidth = page.getMediaBox().getWidth() - 2 * margin;

            try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
                // Draw the title
                contentStream.beginText();
                contentStream.setFont(font, 16);
                contentStream.newLineAtOffset(margin, y);
                contentStream.showText("用户数据表");
                contentStream.endText();

                y -= 30;

                // Grid dimensions
                final int rows = content.length;
                final int cols = content[0].length;
                final float rowHeight = 20f;
                final float tableHeight = rowHeight * rows;
                final float colWidth = tableWidth / (float) cols;

                // Outer border
                contentStream.setLineWidth(1f);
                contentStream.addRect(margin, y - tableHeight, tableWidth, tableHeight);
                contentStream.stroke();

                // Inner horizontal rules (moveTo/lineTo replace the deprecated addLine)
                for (int i = 1; i < rows; i++) {
                    contentStream.moveTo(margin, y - i * rowHeight);
                    contentStream.lineTo(margin + tableWidth, y - i * rowHeight);
                }
                contentStream.stroke();

                // Inner vertical rules
                for (int i = 1; i < cols; i++) {
                    contentStream.moveTo(margin + i * colWidth, y);
                    contentStream.lineTo(margin + i * colWidth, y - tableHeight);
                }
                contentStream.stroke();

                // Cell text; the header row uses the same font as the data rows
                contentStream.setFont(font, 12);
                float textx = margin + 5;
                float texty = y - 15;

                for (int i = 0; i < rows; i++) {
                    for (int j = 0; j < cols; j++) {
                        contentStream.beginText();
                        contentStream.newLineAtOffset(textx + j * colWidth, texty - i * rowHeight);
                        contentStream.showText(content[i][j]);
                        contentStream.endText();
                    }
                }
            }

            document.save(outputPath);
        }
    }
}
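The grid arithmetic in the table example can be isolated from PDFBox and checked on its own. The sketch below is plain Java with hypothetical names (`TableGrid`, `horizontalRules`, `verticalRules`). It computes the y-coordinate of every horizontal rule and the x-coordinate of every vertical rule, assuming equal column widths and the PDF convention that y grows upward from the bottom of the page.

```java
/** Hypothetical helper for illustration; computes rule positions for an evenly spaced grid. */
final class TableGrid {

    private TableGrid() {}

    /** y-coordinates of the rows + 1 horizontal rules, from the top edge downward. */
    static float[] horizontalRules(float topY, int rows, float rowHeight) {
        float[] ys = new float[rows + 1];
        for (int i = 0; i <= rows; i++) {
            // y decreases as we move down the page
            ys[i] = topY - i * rowHeight;
        }
        return ys;
    }

    /** x-coordinates of the cols + 1 vertical rules, from the left edge rightward. */
    static float[] verticalRules(float leftX, int cols, float tableWidth) {
        float colWidth = tableWidth / cols;
        float[] xs = new float[cols + 1];
        for (int i = 0; i <= cols; i++) {
            xs[i] = leftX + i * colWidth;
        }
        return xs;
    }
}
```

With a top edge at y = 700, four rows of 20 points, and three columns across 300 points starting at x = 50, the horizontal rules fall at 700, 680, 660, 640, 620 and the vertical rules at 50, 150, 250, 350.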

3.3 Generating PDFs with OpenPDF

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;
import org.springframework.stereotype.Service;

import java.io.FileOutputStream;
import java.io.IOException;

@Service
public class OpenPdfService {

    public void generateSimplePdf(String outputPath) throws DocumentException, IOException {
        // Create the document
        Document document = new Document();

        // Bind a writer to the document and the output file
        PdfWriter.getInstance(document, new FileOutputStream(outputPath));

        // Open the document before adding content
        document.open();

        // Add content (as with iText, the default font has no CJK glyphs)
        document.add(new Paragraph("Hello World! This document was generated with OpenPDF."));
        document.add(new Paragraph("Generated at: " + new java.util.Date()));

        // Close the document
        document.close();

        System.out.println("PDF created with OpenPDF: " + outputPath);
    }
}

3.4 Generating and Downloading a PDF from a Spring Boot Controller

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ContentDisposition;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

import java.io.ByteArrayOutputStream;

@RestController
@RequestMapping("/api/pdf")
public class PdfController {

    @Autowired
    private PdfGenerationService pdfService;

    @GetMapping("/download")
    public ResponseEntity<byte[]> downloadPdf() {
        try {
            // Write to an in-memory stream instead of a file
            ByteArrayOutputStream baos = new ByteArrayOutputStream();

            // Generate the PDF into the stream
            pdfService.generatePdf(baos);

            // Set the content type
            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_PDF);

            // Ask the browser to download the response as a file.
            // setContentDispositionFormData would emit "form-data", not "attachment".
            String filename = "generated_document.pdf";
            headers.setContentDisposition(
                    ContentDisposition.builder("attachment").filename(filename).build());

            // Return the PDF bytes
            return new ResponseEntity<>(baos.toByteArray(), headers, HttpStatus.OK);

        } catch (Exception e) {
            e.printStackTrace();
            return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
        }
    }
}

The corresponding service class, which writes to an OutputStream instead of a file:

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.io.OutputStream;

@Service
public class PdfGenerationService {

    public void generatePdf(OutputStream outputStream) throws DocumentException, IOException {
        // Create the document
        Document document = new Document();

        // Write to the supplied output stream
        PdfWriter.getInstance(document, outputStream);

        // Open the document before adding content
        document.open();

        // Add content
        document.add(new Paragraph("Dynamically generated PDF content"));
        document.add(new Paragraph("This PDF was generated by a Spring Boot application"));
        document.add(new Paragraph("Generated at: " + new java.util.Date()));

        // Close the document so that all content is flushed to the stream
        document.close();
    }
}
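The download example uses an ASCII file name. If the name contains Chinese or other non-ASCII characters, the Content-Disposition header usually needs an encoded `filename*` parameter next to a plain ASCII fallback. The sketch below builds such a header value by hand in plain Java so the encoding is visible. The class and method names (`DownloadHeaders`, `attachment`) are hypothetical, and Spring's own `ContentDisposition` builder may already cover this when given a charset, so treat this as an illustration rather than a required step.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; builds a Content-Disposition header value. */
final class DownloadHeaders {

    // Punctuation allowed unencoded in an extended parameter value, besides letters and digits
    private static final String ATTR_CHARS_EXTRA = "!#$&+-.^_`|~";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private DownloadHeaders() {}

    /** Percent-encodes the UTF-8 bytes of the value, leaving only safe ASCII characters as-is. */
    static String percentEncode(String value) {
        StringBuilder sb = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int u = b & 0xFF;
            char c = (char) u;
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || ATTR_CHARS_EXTRA.indexOf(c) >= 0;
            if (safe) {
                sb.append(c);
            } else {
                sb.append('%').append(HEX[u >> 4]).append(HEX[u & 0xF]);
            }
        }
        return sb.toString();
    }

    /** Replaces non-ASCII characters, control characters, quotes, and backslashes with '_'. */
    static String asciiFallback(String value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean safe = c >= 0x20 && c <= 0x7E && c != '"' && c != '\\';
            sb.append(safe ? c : '_');
        }
        return sb.toString();
    }

    /** Builds an "attachment" value with an ASCII fallback and a UTF-8 encoded file name. */
    static String attachment(String filename) {
        return "attachment; filename=\"" + asciiFallback(filename) + "\""
                + "; filename*=UTF-8''" + percentEncode(filename);
    }
}
```

In the controller, the result could be set with `headers.set(HttpHeaders.CONTENT_DISPOSITION, DownloadHeaders.attachment(filename))` in place of the `ContentDisposition` builder call.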

4. Reading and Parsing PDFs

4.1 Extracting Text with PDFBox

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.IOException;

@Service
public class PdfReaderService {

    public String extractTextFromPdf(String pdfPath) throws IOException {
        // Load the document; try-with-resources guarantees it is closed
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            // Create the text extractor
            PDFTextStripper stripper = new PDFTextStripper();

            // Extract the text of the whole document
            return stripper.getText(document);
        }
    }

    // Extract the text of a single page (pageNumber is 1-based)
    public String extractTextFromPage(String pdfPath, int pageNumber) throws IOException {
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            PDFTextStripper stripper = new PDFTextStripper();

            // Restrict extraction to one page by setting the same start and end page
            stripper.setStartPage(pageNumber);
            stripper.setEndPage(pageNumber);

            return stripper.getText(document);
        }
    }
}

4.2 Parsing PDFs with iText

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;
import org.springframework.stereotype.Service;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

@Service
public class ITextPdfReaderService {

    public String extractTextFromPdf(String pdfPath) throws IOException {
        PdfReader reader = new PdfReader(pdfPath);
        StringBuilder textBuilder = new StringBuilder();

        try {
            int pages = reader.getNumberOfPages();

            // Iterate over all pages (iText page numbers are 1-based)
            for (int i = 1; i <= pages; i++) {
                TextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                String pageText = PdfTextExtractor.getTextFromPage(reader, i, strategy);
                textBuilder.append(pageText).append("\n");
            }

            return textBuilder.toString();
        } finally {
            reader.close();
        }
    }

    // Read the PDF metadata; entries missing from the file map to null
    public Map<String, String> getPdfMetadata(String pdfPath) throws IOException {
        PdfReader reader = new PdfReader(pdfPath);
        Map<String, String> metadata = new HashMap<>();

        try {
            // Read the info dictionary once instead of once per key
            Map<String, String> info = reader.getInfo();
            metadata.put("Title", info.get("Title"));
            metadata.put("Author", info.get("Author"));
            metadata.put("Subject", info.get("Subject"));
            metadata.put("Keywords", info.get("Keywords"));
            metadata.put("Creator", info.get("Creator"));
            metadata.put("Producer", info.get("Producer"));
            metadata.put("Creation Date", info.get("CreationDate"));
            metadata.put("Modification Date", info.get("ModDate"));
            metadata.put("Page Count", String.valueOf(reader.getNumberOfPages()));

            return metadata;
        } finally {
            reader.close();
        }
    }
}

4.3 Extracting Table Data from PDFs

Extracting tables from a PDF is a hard problem; a dedicated library such as Tabula-Java can help:

<dependency>
    <groupId>technology.tabula</groupId>
    <artifactId>tabula</artifactId>
    <version>1.0.5</version>
</dependency>

import org.apache.pdfbox.pdmodel.PDDocument;
import org.springframework.stereotype.Service;
import technology.tabula.ObjectExtractor;
import technology.tabula.Page;
import technology.tabula.PageIterator;
import technology.tabula.Table;
import technology.tabula.extractors.SpreadsheetExtractionAlgorithm;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

@Service
public class PdfTableExtractorService {

    public List<String[][]> extractTablesFromPdf(String pdfPath) throws IOException {
        List<String[][]> allTables = new ArrayList<>();

        // Open the PDF file; try-with-resources guarantees it is closed
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            // Create the ObjectExtractor
            ObjectExtractor extractor = new ObjectExtractor(document);

            // Iterate over all pages
            PageIterator iterator = extractor.extract();

            // Extraction algorithm for tables with ruling lines
            SpreadsheetExtractionAlgorithm sea = new SpreadsheetExtractionAlgorithm();

            while (iterator.hasNext()) {
                Page page = iterator.next();

                // Extract the tables on this page
                List<Table> tables = sea.extract(page);

                for (Table table : tables) {
                    int rowCount = table.getRowCount();
                    int colCount = table.getColCount();

                    String[][] tableData = new String[rowCount][colCount];

                    // Copy the cell text; rows shorter than colCount are padded with ""
                    for (int i = 0; i < rowCount; i++) {
                        for (int j = 0; j < colCount; j++) {
                            if (j < table.getRows().get(i).size()) {
                                tableData[i][j] = table.getRows().get(i).get(j).getText();
                            } else {
                                tableData[i][j] = "";
                            }
                        }
                    }

                    allTables.add(tableData);
                }
            }
        }

        return allTables;
    }

    // Print table data (for testing)
    public void printTableData(String[][] tableData) {
        for (String[] row : tableData) {
            for (String cell : row) {
                System.out.print(cell + " | ");
            }
            System.out.println();
        }
    }
}

5. Modifying Existing PDF Files

Modifying an existing PDF is a common requirement. Typical operations include adding content, changing text, deleting pages, and merging documents.

5.1 Adding Watermarks and Page Numbers

Adding a watermark with iText
import com.itextpdf.text.BaseColor;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.Element;
import com.itextpdf.text.Rectangle;
import com.itextpdf.text.pdf.BaseFont;
import com.itextpdf.text.pdf.PdfContentByte;
import com.itextpdf.text.pdf.PdfGState;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import org.springframework.stereotype.Service;

import java.io.FileOutputStream;
import java.io.IOException;

@Service
public class PdfWatermarkService {

    public void addWatermark(String inputPath, String outputPath, String watermarkText)
            throws IOException, DocumentException {
        // Open the existing PDF
        PdfReader reader = new PdfReader(inputPath);
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(outputPath));

        // CJK-capable base font; STSong-Light requires the itext-asian artifact on the classpath
        BaseFont baseFont = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);

        // Number of pages in the PDF
        int pageCount = reader.getNumberOfPages();

        // Add the watermark to every page (iText page numbers are 1-based)
        for (int i = 1; i <= pageCount; i++) {
            // Page size
            Rectangle pageRect = reader.getPageSize(i);
            float width = pageRect.getWidth();
            float height = pageRect.getHeight();

            // Content layer underneath the existing page content
            PdfContentByte under = stamper.getUnderContent(i);

            // Save the graphics state first so that the opacity set below
            // is undone by restoreState()
            under.saveState();

            // Set the opacity
            PdfGState gs = new PdfGState();
            gs.setFillOpacity(0.3f);
            under.setGState(gs);

            // Set the fill color
            under.setColorFill(BaseColor.GRAY);

            // Draw the watermark text centered on the page, rotated 45 degrees
            under.beginText();
            under.setFontAndSize(baseFont, 30);
            under.showTextAligned(Element.ALIGN_CENTER, watermarkText,
                                 width / 2, height / 2, 45);
            under.endText();

            // Restore the graphics state
            under.restoreState();
        }

        // Close resources
        stamper.close();
        reader.close();

        System.out.println("Watermark added: " + outputPath);
    }
}
Adding page numbers with PDFBox
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.PDPageContentStream.AppendMode;
import org.apache.pdfbox.pdmodel.font.PDType1Font;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.IOException;

@Service
public class PdfPageNumberService {

    public void addPageNumbers(String inputPath, String outputPath) throws IOException {
        // Open the PDF document
        try (PDDocument document = PDDocument.load(new File(inputPath))) {
            int pageCount = document.getNumberOfPages();

            // Add a page number to every page (PDFBox page indices are 0-based)
            for (int i = 0; i < pageCount; i++) {
                PDPage page = document.getPage(i);

                // The label is ASCII because the built-in Helvetica font cannot encode
                // Chinese characters; showText would throw for them. For a Chinese label,
                // load a CJK font with PDType0Font.load as in section 3.2.
                String pageNumberText = "Page " + (i + 1) + " of " + pageCount;

                // Text width: glyph widths are in 1/1000 of the font size
                float fontSize = 10;
                float textWidth = PDType1Font.HELVETICA.getStringWidth(pageNumberText) / 1000 * fontSize;
                float pageWidth = page.getMediaBox().getWidth();
                float xPosition = (pageWidth - textWidth) / 2;

                // AppendMode.APPEND adds the new content after the existing page content
                try (PDPageContentStream contentStream = new PDPageContentStream(
                        document, page, AppendMode.APPEND, true, true)) {
                    contentStream.beginText();
                    contentStream.setFont(PDType1Font.HELVETICA, fontSize);
                    // Centered horizontally, 20 points above the bottom edge
                    contentStream.newLineAtOffset(xPosition, 20);
                    contentStream.showText(pageNumberText);
                    contentStream.endText();
                }
            }

            // Save the modified document
            document.save(outputPath);
        }

        System.out.println("Page numbers added: " + outputPath);
    }
}
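The centering calculation is worth spelling out, because the width that PDFBox's `getStringWidth` returns is expressed in thousandths of the font size rather than in points. The sketch below repeats the same arithmetic in plain Java with hypothetical names (`TextCentering`, `textWidthPoints`, `centeredX`); the glyph-unit width is passed in as a number so the example does not depend on any font.

```java
/** Hypothetical helper for illustration; mirrors the centering arithmetic of the page-number example. */
final class TextCentering {

    private TextCentering() {}

    /** Converts a string width in 1/1000 font units into points for the given font size. */
    static float textWidthPoints(float widthInGlyphUnits, float fontSize) {
        return widthInGlyphUnits / 1000f * fontSize;
    }

    /** x-coordinate at which the text must start to be horizontally centered on the page. */
    static float centeredX(float pageWidth, float widthInGlyphUnits, float fontSize) {
        return (pageWidth - textWidthPoints(widthInGlyphUnits, fontSize)) / 2f;
    }
}
```

For example, a string measuring 5000 glyph units at font size 10 is 50 points wide, so on a 612-point-wide page it starts at x = 281.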

5.2 Merging Multiple PDF Files

Merging with PDFBox
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.IOException;
import java.util.List;

@Service
public class PdfMergeService {

    public void mergePdfFiles(List<String> inputPaths, String outputPath) throws IOException {
        // Create the merge utility
        PDFMergerUtility merger = new PDFMergerUtility();

        // Set the destination file
        merger.setDestinationFileName(outputPath);

        // Add the source files in order
        for (String path : inputPaths) {
            merger.addSource(new File(path));
        }

        // Merge, buffering in main memory only
        merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());

        System.out.println("PDF merge complete: " + outputPath);
    }

    // Merge selected pages from several documents
    public void mergeWithPageRanges(String outputPath) throws IOException {
        // Imported pages still reference data in their source documents, so every
        // source must stay open until the merged document has been saved.
        // try-with-resources closes all four documents afterwards.
        try (PDDocument mergedDocument = new PDDocument();
             PDDocument doc1 = PDDocument.load(new File("pdf1.pdf"));
             PDDocument doc2 = PDDocument.load(new File("pdf2.pdf"));
             PDDocument doc3 = PDDocument.load(new File("pdf3.pdf"))) {

            // From the first PDF, take only pages 1 and 3 (indices are 0-based)
            mergedDocument.importPage(doc1.getPage(0));
            mergedDocument.importPage(doc1.getPage(2));

            // From the second PDF, take all pages
            for (int i = 0; i < doc2.getNumberOfPages(); i++) {
                mergedDocument.importPage(doc2.getPage(i));
            }

            // From the third PDF, take only the last page
            mergedDocument.importPage(doc3.getPage(doc3.getNumberOfPages() - 1));

            // Save while the source documents are still open
            mergedDocument.save(outputPath);
        }

        System.out.println("Selective page merge complete: " + outputPath);
    }
}
Merging with iText
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfImportedPage;
import com.itextpdf.text.pdf.PdfReader;
import org.springframework.stereotype.Service;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;

@Service
public class ITextPdfMergeService {

    public void mergePdfFiles(List<String> inputPaths, String outputPath)
            throws IOException, DocumentException {
        // Create the target document
        Document document = new Document();

        // PdfCopy writes copied pages to the output file
        PdfCopy copy = new PdfCopy(document, new FileOutputStream(outputPath));

        // Open the document before adding pages
        document.open();

        try {
            // Process each input PDF in order
            for (String path : inputPaths) {
                PdfReader reader = new PdfReader(path);

                int pageCount = reader.getNumberOfPages();

                // Copy every page (iText page numbers are 1-based)
                for (int i = 1; i <= pageCount; i++) {
                    PdfImportedPage page = copy.getImportedPage(reader, i);
                    copy.addPage(page);
                }

                // Close the reader once its pages have been copied
                reader.close();
            }
        } finally {
            // Close the document, which also finishes the output file
            if (document.isOpen()) {
                document.close();
            }
        }

        System.out.println("iText PDF merge complete: " + outputPath);
    }
}

5.3 Splitting PDF Files

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.IOException;
import java.util.List;

@Service
public class PdfSplitService {

    // Split a PDF into single-page files
    public void splitPdfToSinglePages(String inputPath, String outputFolder) throws IOException {
        // Load the PDF; the source stays open until all parts have been saved
        try (PDDocument document = PDDocument.load(new File(inputPath))) {
            // Split into one document per page (the Splitter default)
            Splitter splitter = new Splitter();
            List<PDDocument> pages = splitter.split(document);

            // Input file name without its extension
            String fileNameWithoutExt = new File(inputPath).getName();
            if (fileNameWithoutExt.contains(".")) {
                fileNameWithoutExt = fileNameWithoutExt.substring(0,
                                    fileNameWithoutExt.lastIndexOf('.'));
            }

            // Create the output directory if it does not exist
            File outputDir = new File(outputFolder);
            if (!outputDir.exists()) {
                outputDir.mkdirs();
            }

            // Save each page as its own file
            int pageNumber = 1;
            for (PDDocument pd : pages) {
                String outputPath = outputFolder + File.separator
                                   + fileNameWithoutExt + "_page_" + pageNumber + ".pdf";
                pd.save(outputPath);
                pd.close();
                pageNumber++;
            }

            System.out.println("PDF split complete, " + (pageNumber - 1) + " pages saved to: " + outputFolder);
        }
    }

    // Extract a page range (1-based, inclusive) into a new PDF
    public void splitPdfByPageRange(String inputPath, String outputPath, int startPage, int endPage)
            throws IOException {
        try (PDDocument document = PDDocument.load(new File(inputPath))) {
            // Validate the page range
            int totalPages = document.getNumberOfPages();
            if (startPage < 1 || endPage > totalPages || startPage > endPage) {
                throw new IllegalArgumentException("Invalid page range: " + startPage + "-" + endPage
                                                  + ", the document has " + totalPages + " pages");
            }

            // Create the new document and copy the requested pages into it
            try (PDDocument newDocument = new PDDocument()) {
                for (int i = startPage; i <= endPage; i++) {
                    // PDFBox page indices are 0-based
                    newDocument.importPage(document.getPage(i - 1));
                }

                // Save while the source document is still open
                newDocument.save(outputPath);
            }

            System.out.println("Extracted pages " + startPage + " to " + endPage + " and saved to: " + outputPath);
        }
    }
}
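The range check and the shift from 1-based page numbers to 0-based indices are easy to get wrong by one, so it can help to keep them in a small function that is testable without any PDF. The following is a sketch in plain Java; `PageRanges` and `toZeroBasedIndices` are hypothetical names, and the validation rule is the same one used in `splitPdfByPageRange`.

```java
/** Hypothetical helper for illustration; converts a 1-based inclusive page range to 0-based indices. */
final class PageRanges {

    private PageRanges() {}

    static int[] toZeroBasedIndices(int startPage, int endPage, int totalPages) {
        // Same validation rule as splitPdfByPageRange
        if (startPage < 1 || endPage > totalPages || startPage > endPage) {
            throw new IllegalArgumentException("Invalid page range: " + startPage + "-" + endPage
                    + ", the document has " + totalPages + " pages");
        }
        int[] indices = new int[endPage - startPage + 1];
        for (int i = 0; i < indices.length; i++) {
            // Page n in user terms is index n - 1
            indices[i] = startPage - 1 + i;
        }
        return indices;
    }
}
```

For pages 2 to 4 of a 10-page document this yields the indices 1, 2, 3, which can be passed one by one to `document.getPage(index)`.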

5.4 Encrypting and Decrypting PDFs

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.IOException;

@Service
public class PdfEncryptionService {

    // Encrypt a PDF file
    public void encryptPdf(String inputPath, String outputPath,
                         String userPassword, String ownerPassword) throws IOException {
        // Load the PDF
        try (PDDocument document = PDDocument.load(new File(inputPath))) {
            // Configure the access permissions
            AccessPermission accessPermission = new AccessPermission();

            // Disallow printing
            accessPermission.setCanPrint(false);

            // Disallow modifying the content
            accessPermission.setCanModify(false);

            // Disallow copying the content
            accessPermission.setCanExtractContent(false);

            // Disallow adding or modifying annotations
            accessPermission.setCanModifyAnnotations(false);

            // Create the protection policy. Note the argument order:
            // owner password first, then user password, then permissions.
            StandardProtectionPolicy policy = new StandardProtectionPolicy(
                    ownerPassword, userPassword, accessPermission);

            // Set the encryption key length (128 bits)
            policy.setEncryptionKeyLength(128);

            // Apply the encryption
            document.protect(policy);

            // Save the encrypted document
            document.save(outputPath);

            System.out.println("PDF encryption complete: " + outputPath);
        }
    }

    // Decrypt a PDF file (the password is required)
    public void decryptPdf(String inputPath, String outputPath, String password) throws IOException {
        // Load the encrypted PDF with its password
        try (PDDocument document = PDDocument.load(new File(inputPath), password)) {
            if (document.isEncrypted()) {
                // Remove all security settings on save
                document.setAllSecurityToBeRemoved(true);

                // Save the decrypted document
                document.save(outputPath);

                System.out.println("PDF decryption complete: " + outputPath);
            } else {
                System.out.println("The PDF is not encrypted; nothing to decrypt");
                // Save an unchanged copy to the output path
                document.save(outputPath);
            }
        }
    }
}

5.5 Deleting and Reordering Pages

import org.apache.pdfbox.pdmodel.PDDocument;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

@Service
public class PdfPageManipulationService {

    // Delete the given pages (1-based page numbers)
    public void removePages(String inputPath, String outputPath, List<Integer> pagesToRemove)
            throws IOException {
        // Load the PDF; try-with-resources closes it on every path
        try (PDDocument document = PDDocument.load(new File(inputPath))) {
            // Copy into a descending set: removing from the back keeps earlier
            // indexes stable, drops duplicates, and leaves the caller's list untouched
            TreeSet<Integer> descending = new TreeSet<>(Collections.reverseOrder());
            descending.addAll(pagesToRemove);

            // Validate against the original page count, not the shrinking one
            int pageCount = document.getNumberOfPages();

            // Delete the requested pages
            for (int pageNum : descending) {
                if (pageNum >= 1 && pageNum <= pageCount) {
                    // PDFBox page indexes are 0-based
                    document.removePage(pageNum - 1);
                } else {
                    System.out.println("Warning: page " + pageNum + " is out of range and was ignored");
                }
            }

            // Save the modified document
            document.save(outputPath);

            System.out.println("Pages removed; saved to: " + outputPath);
        }
    }

    // Reorder pages; newOrder holds 1-based page numbers in their new sequence
    public void reorderPages(String inputPath, String outputPath, int[] newOrder) throws IOException {
        // Both documents are closed automatically, including on failure
        try (PDDocument document = PDDocument.load(new File(inputPath));
             PDDocument newDocument = new PDDocument()) {
            // Original page count
            int pageCount = document.getNumberOfPages();

            // The new order must list exactly as many pages as the document has
            if (newOrder.length != pageCount) {
                throw new IllegalArgumentException(
                    "New order length (" + newOrder.length + ") does not match page count (" + pageCount + ")");
            }

            // Add the pages in the new order
            for (int pageNum : newOrder) {
                // Convert the 1-based page number to a 0-based index
                int oldIndex = pageNum - 1;

                // Validate the index
                if (oldIndex < 0 || oldIndex >= pageCount) {
                    throw new IllegalArgumentException("New order contains an invalid page number: " + pageNum);
                }

                // Import (copy) the page into the new document
                newDocument.importPage(document.getPage(oldIndex));
            }

            // Save while the source document is still open
            newDocument.save(outputPath);

            System.out.println("Pages reordered; saved to: " + outputPath);
        }
    }
}
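
The page-number bookkeeping in the service above (1-based input, 0-based PDFBox indexes, deleting from the back) can be tested without loading a PDF. The following is a minimal sketch using only the JDK; `PageSelection`, `removalOrder`, and `isPermutation` are hypothetical names introduced here for illustration and are not part of PDFBox.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper for illustration; not a PDFBox class.
final class PageSelection {

    private PageSelection() {}

    /**
     * Converts 1-based page numbers into 0-based indexes that are safe to
     * remove one by one: duplicates dropped, out-of-range numbers skipped,
     * highest index first. The caller's list is not modified.
     */
    static List<Integer> removalOrder(List<Integer> oneBasedPages, int pageCount) {
        TreeSet<Integer> descending = new TreeSet<>(Collections.reverseOrder());
        for (int page : oneBasedPages) {
            if (page >= 1 && page <= pageCount) {
                descending.add(page - 1);
            }
        }
        return new ArrayList<>(descending);
    }

    /** Returns true when newOrder lists every page 1..pageCount exactly once. */
    static boolean isPermutation(int[] newOrder, int pageCount) {
        if (newOrder.length != pageCount) {
            return false;
        }
        boolean[] seen = new boolean[pageCount];
        for (int page : newOrder) {
            if (page < 1 || page > pageCount || seen[page - 1]) {
                return false;
            }
            seen[page - 1] = true;
        }
        return true;
    }
}
```

For a five-page document, `removalOrder(List.of(2, 5, 2, 9), 5)` yields the indexes 4 and then 1. Whether a reorder request must be a strict permutation is a design choice: the service above also accepts repeated page numbers, so `isPermutation` is only needed if duplicates should be rejected.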

5.6 Filling PDF Forms

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.Map;

@Service
public class PdfFormFillingService {

    // Fill a PDF form
    public void fillPdfForm(String templatePath, String outputPath,
                           Map<String, String> formData) throws IOException {
        // Load the PDF template; try-with-resources closes it on every path
        try (PDDocument document = PDDocument.load(new File(templatePath))) {
            // Get the form
            PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();

            if (acroForm != null) {
                // Fill in each supplied field
                for (Map.Entry<String, String> entry : formData.entrySet()) {
                    String fieldName = entry.getKey();
                    String fieldValue = entry.getValue();

                    // Look up the form field
                    PDField field = acroForm.getField(fieldName);

                    if (field != null) {
                        // Set the field value
                        field.setValue(fieldValue);
                    } else {
                        System.out.println("Warning: form field '" + fieldName + "' not found");
                    }
                }

                // Ask viewers to regenerate field appearances (optional).
                // This does not lock the form; call acroForm.flatten() to make it non-editable
                acroForm.setNeedAppearances(true);

                // Save the filled form
                document.save(outputPath);

                System.out.println("PDF form filled; saved to: " + outputPath);
            } else {
                System.out.println("Error: the PDF document contains no form");
            }
        }
    }

    // List the form fields in a PDF document (for debugging)
    public void listFormFields(String pdfPath) throws IOException {
        // Load the PDF
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            // Get the form
            PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();

            if (acroForm != null) {
                // getFields() returns the top-level fields only;
                // use getFieldTree() to walk nested fields as well
                List<PDField> fields = acroForm.getFields();

                System.out.println("The PDF document contains " + fields.size() + " top-level form fields:");

                // Print each field's name and type
                for (PDField field : fields) {
                    System.out.println("Field name: " + field.getFullyQualifiedName() +
                                      ", type: " + field.getClass().getSimpleName());
                }
            } else {
                System.out.println("The PDF document contains no form");
            }
        }
    }
}

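6. PDF Handling in Web Applications

PDF handling is a common requirement in Spring Boot web applications: generating and downloading PDFs, previewing them online, and uploading and parsing them. This section shows how to implement each of these.

6.1 Implementing PDF Download

Web applications often need to offer PDF downloads, such as reports or certificates. A basic download controller looks like this: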

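import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ContentDisposition;
import org.springframework.http.HttpHeaders;
import org.springframework.http.HttpStatus;
import org.springframework.http.MediaType;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/pdf")
public class PdfDownloadController {

    private final PdfGenerationService pdfService;

    @Autowired
    public PdfDownloadController(PdfGenerationService pdfService) {
        this.pdfService = pdfService;
    }

    /**
     * Generate and download a simple PDF
     */
    @GetMapping("/download/simple")
    public ResponseEntity<byte[]> downloadSimplePdf() {
        try {
            byte[] pdfBytes = pdfService.generateSimplePdf();

            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_PDF);
            // setContentDispositionFormData would emit "form-data; name=..." rather
            // than "attachment", so build the header explicitly
            headers.setContentDisposition(
                    ContentDisposition.builder("attachment").filename("document.pdf").build());
            headers.setCacheControl("must-revalidate, post-check=0, pre-check=0");

            return new ResponseEntity<>(pdfBytes, headers, HttpStatus.OK);
        } catch (Exception e) {
            return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
        }
    }

    /**
     * Generate and download a report by ID
     */
    @GetMapping("/download/report/{id}")
    public ResponseEntity<byte[]> downloadReportById(@PathVariable("id") Long id) {
        try {
            byte[] pdfBytes = pdfService.generateReportPdf(id);
            if (pdfBytes == null) {
                return new ResponseEntity<>(HttpStatus.NOT_FOUND);
            }

            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_PDF);
            headers.setContentDisposition(
                    ContentDisposition.builder("attachment").filename("report-" + id + ".pdf").build());
            headers.setCacheControl("must-revalidate, post-check=0, pre-check=0");

            return new ResponseEntity<>(pdfBytes, headers, HttpStatus.OK);
        } catch (Exception e) {
            return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
        }
    }
}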
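
Download file names often contain spaces or non-ASCII characters (a user name, a report title). As a rough sketch of what a `Content-Disposition` value can look like in that case, the JDK-only helper below emits both a plain ASCII `filename` fallback and an RFC 5987 style `filename*` parameter. `ContentDispositionSketch` and `headerValue` are hypothetical names used only for this illustration; in a Spring application the framework's own `ContentDisposition` builder would normally be preferred.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not a Spring class.
final class ContentDispositionSketch {

    private ContentDispositionSketch() {}

    /**
     * Builds a Content-Disposition value such as
     * attachment; filename="a.pdf"; filename*=UTF-8''a.pdf
     *
     * @param type     "attachment" to force a download, "inline" to display in the browser
     * @param filename the desired file name, possibly non-ASCII
     */
    static String headerValue(String type, String filename) {
        // ASCII fallback: replace quotes, backslashes and anything
        // outside printable ASCII with an underscore
        StringBuilder fallback = new StringBuilder();
        for (char c : filename.toCharArray()) {
            boolean safe = c >= 0x20 && c < 0x7F && c != '"' && c != '\\';
            fallback.append(safe ? c : '_');
        }

        // Percent-encoded UTF-8 form. URLEncoder targets HTML forms, so it
        // writes '+' for a space and leaves '*' as-is; adjust both.
        String encoded = URLEncoder.encode(filename, StandardCharsets.UTF_8)
                .replace("+", "%20")
                .replace("*", "%2A");

        return type + "; filename=\"" + fallback + "\"; filename*=UTF-8''" + encoded;
    }
}
```

Clients that understand `filename*` use the encoded name; others fall back to the ASCII `filename`.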

The following example generates a certificate from a template and offers it for download:

@Controller
@RequestMapping("/api/pdf/templates")
public class PdfTemplateController {

    private final PdfTemplateService templateService;
    // Assumed to be the application's own user lookup bean; the original
    // listing used userService without declaring or injecting it
    private final UserService userService;

    @Autowired
    public PdfTemplateController(PdfTemplateService templateService, UserService userService) {
        this.templateService = templateService;
        this.userService = userService;
    }

    /**
     * Generate a certificate and download it
     */
    @GetMapping("/certificate/{userId}")
    public ResponseEntity<byte[]> generateCertificate(@PathVariable Long userId) {
        try {
            // Look up the user
            UserDto user = userService.findById(userId);
            if (user == null) {
                return new ResponseEntity<>(HttpStatus.NOT_FOUND);
            }

            // Generate the certificate
            byte[] certificateBytes = templateService.generateCertificateFromTemplate(user);

            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_PDF);
            // Build a real "attachment" disposition (org.springframework.http.ContentDisposition)
            headers.setContentDisposition(ContentDisposition.builder("attachment")
                    .filename("certificate-" + user.getUsername() + ".pdf").build());

            return new ResponseEntity<>(certificateBytes, headers, HttpStatus.OK);
        } catch (Exception e) {
            return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
        }
    }
}

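6.2 Implementing Online PDF Preview

For a better user experience, it is sometimes preferable to preview a PDF directly in the browser rather than download it. The following controller implements online preview: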

@Controller
@RequestMapping("/pdf/view")
public class PdfViewController {

    private final DocumentService documentService;
    
    @Autowired
    public PdfViewController(DocumentService documentService) {
        this.documentService = documentService;
    }
    
    /**
     * Return the PDF preview page
     */
    @GetMapping("/{documentId}")
    public String viewPdf(@PathVariable Long documentId, Model model) {
        // Check that the document exists
        if (!documentService.exists(documentId)) {
            return "error/404";
        }
        
        model.addAttribute("documentId", documentId);
        model.addAttribute("documentName", documentService.getName(documentId));
        return "pdf/viewer";  // name of the Thymeleaf template
    }
    
    /**
     * Endpoint that serves the PDF data
     */
    @GetMapping("/data/{documentId}")
    @ResponseBody
    public ResponseEntity<byte[]> getPdfData(@PathVariable Long documentId) {
        try {
            byte[] pdfData = documentService.getPdfContent(documentId);
            if (pdfData == null) {
                return new ResponseEntity<>(HttpStatus.NOT_FOUND);
            }
            
            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_PDF);
            // Use "inline" rather than "attachment" so the browser displays the PDF instead of downloading it
            headers.add("Content-Disposition", "inline; filename=document-" + documentId + ".pdf");
            
            return new ResponseEntity<>(pdfData, headers, HttpStatus.OK);
        } catch (Exception e) {
            return new ResponseEntity<>(HttpStatus.INTERNAL_SERVER_ERROR);
        }
    }
}

Example Thymeleaf template (viewer.html):

<!DOCTYPE html>
<html xmlns:th="http://www.thymeleaf.org">
<head>
    <title th:text="${documentName} + ' - PDF Preview'"></title>
    <meta charset="UTF-8"/>
    <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
    <style>
        body { margin: 0; padding: 0; }
        #pdf-container { width: 100%; height: 100vh; overflow: auto; }
        #pdf-viewer { width: 100%; height: 100%; }
    </style>
</head>
<body>
    <div id="pdf-container">
        <iframe id="pdf-viewer" th:src="@{'/pdf/view/data/' + ${documentId}}" frameborder="0"></iframe>
    </div>
</body>
</html>

6.3 Uploading and Processing PDF Files

Many applications need to let users upload PDF files for processing. The following controller handles PDF uploads:

@RestController
@RequestMapping("/api/pdf/upload")
public class PdfUploadController {

    private final PdfAnalysisService pdfAnalysisService;
    
    @Autowired
    public PdfUploadController(PdfAnalysisService pdfAnalysisService) {
        this.pdfAnalysisService = pdfAnalysisService;
    }
    
    /**
     * Handle a PDF upload and analyze its content
     */
    @PostMapping("/analyze")
    public ResponseEntity<?> uploadAndAnalyzePdf(@RequestParam("file") MultipartFile file) {
        // Reject empty uploads
        if (file.isEmpty()) {
            return ResponseEntity.badRequest().body("Please choose a PDF file to upload");
        }
        
        // Check the declared type; constant-first equals avoids a
        // NullPointerException when the part carries no Content-Type
        if (!"application/pdf".equals(file.getContentType())) {
            return ResponseEntity.badRequest().body("Only PDF uploads are supported");
        }
        
        try {
            // Read the file content
            byte[] pdfBytes = file.getBytes();
            
            // Analyze the PDF content
            PdfAnalysisResult result = pdfAnalysisService.analyzePdf(pdfBytes);
            
            return ResponseEntity.ok(result);
        } catch (IOException e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                    .body("PDF processing failed: " + e.getMessage());
        }
    }
    
    /**
     * Upload and store a PDF file
     */
    @PostMapping("/save")
    public ResponseEntity<?> uploadAndSavePdf(
            @RequestParam("file") MultipartFile file,
            @RequestParam("title") String title,
            @RequestParam("description") String description) {
        
        if (file.isEmpty()) {
            return ResponseEntity.badRequest().body("Please choose a PDF file to upload");
        }
        
        try {
            // Store the PDF and obtain the document ID
            Long documentId = pdfAnalysisService.savePdfDocument(
                    file.getBytes(), 
                    file.getOriginalFilename(), 
                    title, 
                    description);
            
            Map<String, Object> response = new HashMap<>();
            response.put("success", true);
            response.put("documentId", documentId);
            response.put("message", "PDF file uploaded successfully");
            
            return ResponseEntity.ok(response);
        } catch (Exception e) {
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                    .body("PDF upload failed: " + e.getMessage());
        }
    }
}
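
The `Content-Type` of an uploaded part is supplied by the client, so it is only a hint. A cheap additional check, sketched below with the JDK alone, is to confirm that the bytes begin with the `%PDF-` header described in section 1.2. `PdfSniffer` and `looksLikePdf` are hypothetical names for this illustration, and the check is deliberately strict: it expects the header at offset 0 and does not prove the rest of the file is a valid PDF.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Spring or PDFBox.
final class PdfSniffer {

    // Every PDF header starts with these five ASCII bytes, followed by the version
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfSniffer() {}

    /** Returns true when the data starts with the "%PDF-" header. */
    static boolean looksLikePdf(byte[] data) {
        if (data == null || data.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (data[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In an upload handler this could run on `file.getBytes()` right after the content-type check, before the bytes are handed to a PDF library.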

6.4 Batch PDF Processing

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

@RestController
@RequestMapping("/api/pdf/batch")
public class PdfBatchController {

    @Autowired
    private PdfBatchService pdfBatchService;
    
    // Temporary directory for uploaded files
    private final Path tempDir = Paths.get(System.getProperty("java.io.tmpdir"));
    
    @PostMapping("/merge")
    public ResponseEntity<?> mergePdfs(@RequestParam("files") MultipartFile[] files) {
        
        Map<String, Object> response = new HashMap<>();
        List<String> tempFilePaths = new ArrayList<>();
        Path outputPath = null;
        
        try {
            // At least two files are needed for a merge
            if (files.length < 2) {
                response.put("success", false);
                response.put("message", "Please provide at least two PDF files to merge");
                return ResponseEntity.badRequest().body(response);
            }
            
            // Save the uploaded files to the temporary directory
            for (MultipartFile file : files) {
                // Check the declared type; constant-first equals avoids a
                // NullPointerException when no Content-Type was sent
                if (!"application/pdf".equals(file.getContentType())) {
                    response.put("success", false);
                    response.put("message", "File " + file.getOriginalFilename() + " is not a PDF");
                    return ResponseEntity.badRequest().body(response);
                }
                
                // Create a unique file name
                String tempFileName = UUID.randomUUID().toString() + ".pdf";
                Path tempFile = tempDir.resolve(tempFileName);
                
                // Save the file
                Files.copy(file.getInputStream(), tempFile);
                tempFilePaths.add(tempFile.toString());
            }
            
            // Choose a name for the merged PDF
            String outputFileName = "merged_" + UUID.randomUUID().toString() + ".pdf";
            outputPath = tempDir.resolve(outputFileName);
            
            // Merge
            pdfBatchService.mergePdfFiles(tempFilePaths, outputPath.toString());
            
            // Read the merged file
            byte[] mergedPdfBytes = Files.readAllBytes(outputPath);
            
            // Build the response
            response.put("success", true);
            response.put("message", "PDF files merged successfully");
            response.put("mergedFileSize", mergedPdfBytes.length);
            
            // Return the merged file as a Base64 string (suitable for small files only)
            response.put("mergedPdfBase64", 
                    java.util.Base64.getEncoder().encodeToString(mergedPdfBytes));
            
            return ResponseEntity.ok(response);
            
        } catch (Exception e) {
            response.put("success", false);
            response.put("message", "Error while merging PDFs: " + e.getMessage());
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).body(response);
        } finally {
            // Clean up temporary files on every path, including early returns
            for (String path : tempFilePaths) {
                deleteQuietly(Paths.get(path));
            }
            if (outputPath != null) {
                deleteQuietly(outputPath);
            }
        }
    }
    
    // Delete a temporary file, ignoring cleanup errors
    private void deleteQuietly(Path path) {
        try {
            Files.deleteIfExists(path);
        } catch (IOException ex) {
            // Ignore cleanup errors
        }
    }
}

The batch processing service:

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

@Service
public class PdfBatchService {

    // Thread pool for asynchronous processing
    private final ExecutorService executor = Executors.newFixedThreadPool(5);
    
    // Merge PDF files
    public void mergePdfFiles(List<String> inputPaths, String outputPath) throws IOException {
        PDFMergerUtility merger = new PDFMergerUtility();
        merger.setDestinationFileName(outputPath);
        
        for (String path : inputPaths) {
            merger.addSource(new File(path));
        }
        
        // State the memory strategy explicitly instead of passing null
        merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
    }
    
    // Process several PDFs asynchronously
    public CompletableFuture<List<String>> processMultiplePdfsAsync(
            List<String> inputPaths, String outputDir, String operation) {
        
        // One CompletableFuture per input file
        List<CompletableFuture<String>> futures = inputPaths.stream()
            .map(path -> CompletableFuture.supplyAsync(() -> {
                try {
                    // Derive the output path from the input file name
                    String outputPath = outputDir + File.separator 
                                      + new File(path).getName().replace(".pdf", "_processed.pdf");
                    
                    // Dispatch on the requested operation
                    switch (operation) {
                        case "watermark":
                            addWatermark(path, outputPath, "Confidential");
                            break;
                        case "compress":
                            compressPdf(path, outputPath);
                            break;
                        case "encrypt":
                            // Demo passwords only; load real ones from configuration
                            encryptPdf(path, outputPath, "password123", "owner456");
                            break;
                        // Add other operations here...
                        default:
                            throw new IllegalArgumentException("Unsupported operation: " + operation);
                    }
                    
                    return outputPath;
                } catch (Exception e) {
                    throw new RuntimeException("Failed to process file: " + path, e);
                }
            }, executor))
            .collect(Collectors.toList());
        
        // Combine all futures
        CompletableFuture<Void> allOf = CompletableFuture.allOf(
                futures.toArray(new CompletableFuture[0]));
        
        // Collect the results once every task has completed
        return allOf.thenApply(v ->
            futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList())
        );
    }
    
    // Add a watermark
    private void addWatermark(String inputPath, String outputPath, String watermarkText) 
            throws IOException {
        // Implementation similar to PdfWatermarkService
        // ...
    }
    
    // Compress a PDF
    private void compressPdf(String inputPath, String outputPath) throws IOException {
        // PDF compression goes here
        // ...
    }
    
    // Encrypt a PDF
    private void encryptPdf(String inputPath, String outputPath, 
                          String userPassword, String ownerPassword) throws IOException {
        // Implementation similar to PdfEncryptionService
        // ...
    }
}
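
The asynchronous method above follows a common fan-out pattern: start one `CompletableFuture` per input, wait for all of them with `allOf`, then collect the individual results in input order. The sketch below isolates that pattern with the JDK alone so it can be run without any PDF library. `FanOutSketch` and `processAll` are hypothetical names, and the per-file work is passed in as a plain function.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; it blocks until all tasks finish.
final class FanOutSketch {

    private FanOutSketch() {}

    /** Applies task to every input on a fixed thread pool and returns the results in input order. */
    static <T, R> List<R> processAll(List<T> inputs, Function<T, R> task, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            // Start one future per input
            List<CompletableFuture<R>> futures = inputs.stream()
                    .map(input -> CompletableFuture.supplyAsync(() -> task.apply(input), executor))
                    .collect(Collectors.toList());

            // Wait until every future has completed; a failed task makes join() throw
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();

            // All futures are done, so these join() calls return immediately
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            // Release the pool's threads once the batch is finished
            executor.shutdown();
        }
    }
}
```

A long-lived service would typically keep one shared pool and shut it down when the application stops, rather than creating a pool per call as this sketch does.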

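7. PDF Security and Advanced Features

Security is an important consideration when handling PDF documents, especially those containing sensitive information. Advanced features such as watermarks and digital signatures add further practical capabilities.

7.1 PDF Encryption and Permission Control

With iText, encrypting a PDF document and setting its permissions is straightforward: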

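import com.itextpdf.kernel.pdf.EncryptionConstants;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.WriterProperties;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.element.Paragraph;
import org.springframework.stereotype.Service;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

@Service
public class PdfSecurityService {

    /**
     * Create a password-protected PDF
     */
    public byte[] createEncryptedPdf(String content, String userPassword, String ownerPassword) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        
        // Encryption options must be configured before the writer is created
        WriterProperties writerProperties = new WriterProperties();
        
        // User password (needed to open the document) and owner password (needed to change permissions)
        writerProperties.setStandardEncryption(
                userPassword.getBytes(StandardCharsets.UTF_8),
                ownerPassword.getBytes(StandardCharsets.UTF_8),
                EncryptionConstants.ALLOW_PRINTING, // allow printing
                EncryptionConstants.ENCRYPTION_AES_128 // AES 128-bit encryption
        );
        
        // Pass the properties to the writer's constructor; PdfWriter has no setter for them
        PdfWriter writer = new PdfWriter(baos, writerProperties);
        PdfDocument pdf = new PdfDocument(writer);
        
        // Create the document content
        Document document = new Document(pdf);
        document.add(new Paragraph(content));
        document.close();
        
        return baos.toByteArray();
    }
    
    /**
     * Create a PDF with permission control
     */
    public byte[] createPermissionControlledPdf(String content, String ownerPassword, int permissions) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        
        WriterProperties writerProperties = new WriterProperties();
        writerProperties.setStandardEncryption(
                null, // no user password
                ownerPassword.getBytes(StandardCharsets.UTF_8),
                permissions, // custom permissions
                EncryptionConstants.ENCRYPTION_AES_256 // AES 256-bit encryption
        );
        
        PdfWriter writer = new PdfWriter(baos, writerProperties);
        PdfDocument pdf = new PdfDocument(writer);
        Document document = new Document(pdf);
        document.add(new Paragraph(content));
        document.close();
        
        return baos.toByteArray();
    }
    
    /**
     * Check whether a PDF is encrypted.
     * A document that needs a user password cannot be opened this way and
     * makes the PdfDocument constructor throw instead of returning true.
     */
    public boolean isPdfEncrypted(byte[] pdfData) throws IOException {
        PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
        // Closing the PdfDocument also closes the reader
        try (PdfDocument pdf = new PdfDocument(reader)) {
            return reader.isEncrypted();
        }
    }
}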
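
The `permissions` argument passed to `createPermissionControlledPdf` is a bit mask: each allowed action corresponds to one bit, and several actions are combined with bitwise OR. The sketch below illustrates the idea with locally defined constants that follow the bit positions of the PDF standard security handler (bit 3 print, bit 4 modify, bit 5 copy, bit 6 annotate). `PermissionBits` and its members are hypothetical names for this illustration; real code should use the named constants in iText's `EncryptionConstants` rather than raw numbers.

```java
// Hypothetical constants for illustration; prefer the library's own named constants.
final class PermissionBits {

    // The PDF specification numbers bits from 1, so "bit 3" is 1 << 2
    static final int PRINT = 1 << 2;     // bit 3: print the document
    static final int MODIFY = 1 << 3;    // bit 4: modify the content
    static final int COPY = 1 << 4;      // bit 5: copy or extract text and graphics
    static final int ANNOTATE = 1 << 5;  // bit 6: add or modify annotations

    private PermissionBits() {}

    /** Returns true when every bit of flag is set in permissions. */
    static boolean allows(int permissions, int flag) {
        return (permissions & flag) == flag;
    }
}
```

For example, `PermissionBits.PRINT | PermissionBits.COPY` allows printing and copying while leaving modification and annotation disabled. These flags are enforced by conforming viewers, so they restrict cooperative clients rather than providing strong protection on their own.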

7.2 Adding Watermarks

Watermarks are a common advanced PDF feature, used to protect copyright or mark a document's status:

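import com.itextpdf.io.font.constants.StandardFonts;
import com.itextpdf.io.image.ImageData;
import com.itextpdf.io.image.ImageDataFactory;
import com.itextpdf.kernel.colors.DeviceRgb;
import com.itextpdf.kernel.font.PdfFont;
import com.itextpdf.kernel.font.PdfFontFactory;
import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfPage;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.canvas.PdfCanvas;
import com.itextpdf.kernel.pdf.extgstate.PdfExtGState;
import com.itextpdf.layout.Canvas;
import com.itextpdf.layout.element.Paragraph;
import com.itextpdf.layout.properties.TextAlignment;
import com.itextpdf.layout.properties.VerticalAlignment;
import org.springframework.stereotype.Service;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

@Service
public class PdfWatermarkService {

    /**
     * Add a text watermark
     */
    public byte[] addTextWatermark(byte[] originalPdf, String watermarkText) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        
        PdfReader reader = new PdfReader(new ByteArrayInputStream(originalPdf));
        PdfWriter writer = new PdfWriter(baos);
        PdfDocument pdf = new PdfDocument(reader, writer);
        
        // Number of pages
        int numberOfPages = pdf.getNumberOfPages();
        
        // Semi-transparent watermark text. Helvetica covers Latin text only;
        // embed a CJK-capable font for Chinese watermark text
        PdfFont font = PdfFontFactory.createFont(StandardFonts.HELVETICA);
        Paragraph watermark = new Paragraph(watermarkText)
                .setFont(font)
                .setFontSize(30)
                .setFontColor(new DeviceRgb(0.5f, 0.5f, 0.5f), 0.3f); // gray, 30% opacity
        
        // Add the watermark to every page
        for (int i = 1; i <= numberOfPages; i++) {
            PdfPage page = pdf.getPage(i);
            Rectangle pageSize = page.getPageSize();
            float x = (pageSize.getLeft() + pageSize.getRight()) / 2;
            float y = (pageSize.getBottom() + pageSize.getTop()) / 2;
            
            // Low-level canvas that draws on top of the page content
            PdfCanvas canvas = new PdfCanvas(page);
            canvas.saveState();
            
            // Set the fill color and transparency
            canvas.setFillColor(new DeviceRgb(0.5f, 0.5f, 0.5f));
            canvas.setExtGState(new PdfExtGState().setFillOpacity(0.3f));
            
            // Draw the text centered and rotated by 30 degrees (PI / 6 radians).
            // iText 7.2 uses the two-argument Canvas constructor
            Canvas watermarkCanvas = new Canvas(canvas, pageSize);
            watermarkCanvas.showTextAligned(watermark, x, y, i, TextAlignment.CENTER, VerticalAlignment.MIDDLE, (float) Math.PI / 6);
            watermarkCanvas.close();
            
            canvas.restoreState();
        }
        
        pdf.close();
        return baos.toByteArray();
    }
    
    /**
     * Add an image watermark
     */
    public byte[] addImageWatermark(byte[] originalPdf, byte[] imageData) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        
        PdfReader reader = new PdfReader(new ByteArrayInputStream(originalPdf));
        PdfWriter writer = new PdfWriter(baos);
        PdfDocument pdf = new PdfDocument(reader, writer);
        
        // Convert the raw bytes to ImageData
        ImageData imageDataObj = ImageDataFactory.create(imageData);
        
        // Target size and opacity. These are applied on the canvas below, because
        // settings on a layout Image object are ignored when raw ImageData is drawn
        float width = 100;
        float height = 100;
        PdfExtGState transparent = new PdfExtGState().setFillOpacity(0.3f);
        
        // Add the watermark to every page
        int numberOfPages = pdf.getNumberOfPages();
        for (int i = 1; i <= numberOfPages; i++) {
            PdfPage page = pdf.getPage(i);
            Rectangle pageSize = page.getPageSize();
            
            // Compute the image position (centered)
            float x = (pageSize.getLeft() + pageSize.getRight()) / 2 - width / 2;
            float y = (pageSize.getBottom() + pageSize.getTop()) / 2 - height / 2;
            
            // Draw the scaled, semi-transparent image
            PdfCanvas canvas = new PdfCanvas(page);
            canvas.saveState();
            canvas.setExtGState(transparent);
            canvas.addImageFittedIntoRectangle(imageDataObj, new Rectangle(x, y, width, height), false);
            canvas.restoreState();
        }
        
        pdf.close();
        return baos.toByteArray();
    }
}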
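
Both watermark methods place their content at the center of the page, and the text variant rotates it by a fixed π/6 radians (30 degrees). As a small worked example of that geometry, the JDK-only sketch below computes the center point of a page rectangle and, as one possible alternative to a fixed angle, the angle of the page diagonal. `WatermarkGeometry` and its methods are hypothetical names, and the A4 size of 595 × 842 points used in the note afterwards is an assumption for illustration.

```java
// Hypothetical helper for illustration; units are PDF points (1/72 inch).
final class WatermarkGeometry {

    private WatermarkGeometry() {}

    /** Returns {x, y} of the center of a page whose lower-left corner is (left, bottom). */
    static float[] center(float left, float bottom, float width, float height) {
        return new float[] { left + width / 2f, bottom + height / 2f };
    }

    /** Angle in radians of the diagonal running from the lower-left to the upper-right corner. */
    static double diagonalAngle(float width, float height) {
        return Math.atan2(height, width);
    }
}
```

For an A4 portrait page the center is (297.5, 421) and the diagonal angle is a little under 55 degrees, so a watermark rotated by that angle runs corner to corner, whereas the fixed 30 degrees used above gives a flatter slant.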

7.3 Digital Signatures

Digital signatures are an important means of ensuring the integrity and authenticity of a PDF document:

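import com.itextpdf.kernel.geom.Rectangle;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.StampingProperties;
import com.itextpdf.signatures.BouncyCastleDigest;
import com.itextpdf.signatures.DigestAlgorithms;
import com.itextpdf.signatures.IExternalDigest;
import com.itextpdf.signatures.IExternalSignature;
import com.itextpdf.signatures.PdfPKCS7;
import com.itextpdf.signatures.PdfSignatureAppearance;
import com.itextpdf.signatures.PdfSigner;
import com.itextpdf.signatures.PrivateKeySignature;
import com.itextpdf.signatures.SignatureUtil;
import org.springframework.stereotype.Service;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.security.KeyStore;
import java.security.PrivateKey;
import java.security.cert.Certificate;

@Service
public class PdfSignatureService {

    /**
     * Sign a PDF with a digital certificate
     */
    public byte[] signPdf(byte[] pdfData, KeyStore keystore, String alias, char[] password) throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        
        PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
        PdfSigner signer = new PdfSigner(reader, baos, new StampingProperties());
        
        // Configure the signature appearance
        PdfSignatureAppearance appearance = signer.getSignatureAppearance();
        appearance.setReason("Certify document authenticity")
                 .setLocation("Beijing")
                 .setSignatureCreator("PDF Signing System")
                 .setReuseAppearance(false);
        
        // Signature rectangle in the lower-left corner of the last page.
        // The page count comes from the signer's document; iText 7's PdfReader does not expose it
        Rectangle rect = new Rectangle(36, 36, 200, 50);
        appearance.setPageRect(rect)
                 .setPageNumber(signer.getDocument().getNumberOfPages());
        
        // Load the private key and certificate chain
        PrivateKey pk = (PrivateKey) keystore.getKey(alias, password);
        Certificate[] chain = keystore.getCertificateChain(alias);
        
        // Create the signature and digest implementations
        IExternalSignature pks = new PrivateKeySignature(pk, DigestAlgorithms.SHA256, null);
        IExternalDigest digest = new BouncyCastleDigest();
        
        // Sign
        signer.signDetached(digest, pks, chain, null, null, null, 0, PdfSigner.CryptoStandard.CMS);
        
        return baos.toByteArray();
    }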
    
    /**
     * Create a self-signed certificate (for testing only)
     */
    public KeyStore createSelfSignedCertificate() throws Exception {
        // Generate a key pair
        KeyPairGenerator keyGen = KeyPairGenerator.getInstance("RSA");
        keyGen.initialize(2048);
        KeyPair keyPair = keyGen.generateKeyPair();
        
        // Create the self-signed certificate
        X509Certificate cert = generateSelfSignedCertificate(keyPair);
        
        // Create a KeyStore and store the key with its certificate
        KeyStore keyStore = KeyStore.getInstance(KeyStore.getDefaultType());
        keyStore.load(null, null);
        keyStore.setKeyEntry("pdf-signer", keyPair.getPrivate(), "password".toCharArray(), 
                new java.security.cert.Certificate[]{cert});
        
        return keyStore;
    }
    
    /**
     * Generate a self-signed certificate
     */
    private X509Certificate generateSelfSignedCertificate(KeyPair keyPair) throws Exception {
        // Use the Bouncy Castle provider
        Security.addProvider(new org.bouncycastle.jce.provider.BouncyCastleProvider());
        
        // Current time
        long now = System.currentTimeMillis();
        
        // The certificate is valid for one year. The 365L forces long arithmetic;
        // with int operands the product overflows and yields about 17 days
        Date startDate = new Date(now);
        Date endDate = new Date(now + 365L * 24 * 60 * 60 * 1000);
        
        // Certificate serial number
        BigInteger serialNumber = BigInteger.valueOf(now);
        
        // Certificate subject (also used as issuer, since it is self-signed)
        X500Name subject = new X500Name("CN=PDF Signer, O=Example Organization, L=Beijing, C=CN");
        
        // Certificate builder
        X509v3CertificateBuilder builder = new JcaX509v3CertificateBuilder(
                subject, 
                serialNumber, 
                startDate, 
                endDate, 
                subject, 
                keyPair.getPublic()
        );
        
        // Signature algorithm
        ContentSigner contentSigner = new JcaContentSignerBuilder("SHA256WithRSAEncryption")
                .setProvider("BC").build(keyPair.getPrivate());
        
        // Build the certificate
        X509CertificateHolder holder = builder.build(contentSigner);
        X509Certificate cert = new JcaX509CertificateConverter()
                .setProvider("BC").getCertificate(holder);
        
        return cert;
    }
    
    /**
     * Verify the signatures in a PDF
     */
    public List<SignatureVerificationResult> verifyPdfSignatures(byte[] pdfData) throws IOException {
        List<SignatureVerificationResult> results = new ArrayList<>();
        
        PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
        PdfDocument pdf = new PdfDocument(reader);
        
        SignatureUtil signUtil = new SignatureUtil(pdf);
        List<String> sigNames = signUtil.getSignatureNames();
        
        for (String name : sigNames) {
            PdfPKCS7 pkcs7 = signUtil.readSignatureData(name);
            
            // Signing time
            Calendar cal = pkcs7.getSignDate();
            
            // Signature metadata
            String reason = pkcs7.getReason();
            String location = pkcs7.getLocation();
            
            // Verify the signature
            boolean isSignatureValid = false;
            boolean coversWholeDocument = false;
            
            try {
                isSignatureValid = pkcs7.verifySignatureIntegrityAndAuthenticity();
                coversWholeDocument = signUtil.signatureCoversWholeDocument(name);
            } catch (Exception e) {
                // Verification failed; both flags keep their pessimistic defaults
            }
            
            // Record the result. "modified" means the signature does not cover the
            // whole file, i.e. content was added after signing
            SignatureVerificationResult result = new SignatureVerificationResult(
                    name, cal.getTime(), reason, location, isSignatureValid, !coversWholeDocument);
            results.add(result);
        }
        
        pdf.close();
        return results;
    }
    
    // 签名验证结果类
    public static class SignatureVerificationResult {
        private String name;
        private Date date;
        private String reason;
        private String location;
        private boolean valid;
        private boolean modified;
        
        // Getters and setters omitted
        
        public SignatureVerificationResult(String name, Date date, String reason, String location, 
                                           boolean valid, boolean modified) {
            this.name = name;
            this.date = date;
            this.reason = reason;
            this.location = location;
            this.valid = valid;
            this.modified = modified;
        }
    }
}
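
A certificate validity window computed in milliseconds is easy to get wrong when the product is evaluated with `int` arithmetic. The following JDK-only sketch shows the wrapped value and a `java.time` alternative. The class `CertificateValidity` and its methods are hypothetical names introduced for illustration; they are not part of the service above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Date;

/** Hypothetical helper: computes certificate validity dates without int overflow. */
final class CertificateValidity {

    private CertificateValidity() {
    }

    /** The int product wraps around because the final multiplication exceeds the int range. */
    static int overflowedYearMillis() {
        return 365 * 24 * 60 * 60 * 1000;
    }

    /** 365 days in milliseconds, computed with long arithmetic. */
    static long yearMillis() {
        return Duration.ofDays(365).toMillis();
    }

    /** Returns the notAfter date for a certificate whose validity starts at notBefore. */
    static Date notAfter(Instant notBefore, int validDays) {
        return Date.from(notBefore.plus(Duration.ofDays(validDays)));
    }
}
```

The wrapped `int` value corresponds to roughly 17 days, so a certificate built from it would expire long before the intended year is over.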

7.4 PDF Forms and Interactive Features

PDF forms let you create interactive documents that users can fill in and submit:

@Service
public class PdfFormService {

    /**
     * 创建包含表单的PDF
     */
    public byte[] createPdfWithForm() throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        
        PdfWriter writer = new PdfWriter(baos);
        PdfDocument pdf = new PdfDocument(writer);
        Document document = new Document(pdf);
        
        // 添加标题
        document.add(new Paragraph("用户注册表单").setFontSize(20).setBold());
        document.add(new Paragraph("请填写以下信息:").setFontSize(12));
        document.add(new Paragraph("\n"));
        
        // 创建表单
        PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
        
        // 姓名字段
        document.add(new Paragraph("姓名:"));
        Rectangle nameRect = new Rectangle(100, 700, 200, 20);
        PdfTextFormField nameField = PdfTextFormField.createText(pdf, nameRect, "name", "");
        form.addField(nameField);
        
        // 邮箱字段
        document.add(new Paragraph("邮箱:"));
        Rectangle emailRect = new Rectangle(100, 650, 200, 20);
        PdfTextFormField emailField = PdfTextFormField.createText(pdf, emailRect, "email", "");
        form.addField(emailField);
        
        // Gender radio buttons
        document.add(new Paragraph("性别:"));
        // Create the radio group; each button created below is attached to it as a child
        PdfButtonFormField genderGroup = PdfFormField.createRadioGroup(pdf, "gender", "");
        
        // Male option
        Rectangle maleRect = new Rectangle(100, 600, 20, 20);
        PdfFormField.createRadioButton(pdf, maleRect, genderGroup, "男");
        document.add(new Paragraph("男").setFixedPosition(125, 600, 50));
        
        // Female option
        Rectangle femaleRect = new Rectangle(160, 600, 20, 20);
        PdfFormField.createRadioButton(pdf, femaleRect, genderGroup, "女");
        document.add(new Paragraph("女").setFixedPosition(185, 600, 50));
        
        // Register only the group: adding the individual buttons to the form as well
        // would turn them into separate top-level fields
        form.addField(genderGroup);
        
        // 兴趣复选框
        document.add(new Paragraph("兴趣爱好:"));
        
        // 阅读选项
        Rectangle readingRect = new Rectangle(100, 550, 20, 20);
        PdfFormField reading = PdfFormField.createCheckBox(pdf, readingRect, "reading", "Yes", PdfFormField.TYPE_CHECK);
        form.addField(reading);
        document.add(new Paragraph("阅读").setFixedPosition(125, 550, 50));
        
        // 旅行选项
        Rectangle travelRect = new Rectangle(180, 550, 20, 20);
        PdfFormField travel = PdfFormField.createCheckBox(pdf, travelRect, "travel", "Yes", PdfFormField.TYPE_CHECK);
        form.addField(travel);
        document.add(new Paragraph("旅行").setFixedPosition(205, 550, 50));
        
        // 音乐选项
        Rectangle musicRect = new Rectangle(260, 550, 20, 20);
        PdfFormField music = PdfFormField.createCheckBox(pdf, musicRect, "music", "Yes", PdfFormField.TYPE_CHECK);
        form.addField(music);
        document.add(new Paragraph("音乐").setFixedPosition(285, 550, 50));
        
        // Submit button
        Rectangle submitRect = new Rectangle(100, 500, 100, 30);
        PdfButtonFormField submit = PdfFormField.createPushButton(pdf, submitRect, "submit", "提交");
        // createSubmitForm takes (url, field names, flags); null field names submits all fields
        submit.setAction(PdfAction.createSubmitForm("/submit-form", null, PdfAction.SUBMIT_HTML_FORMAT));
        form.addField(submit);
        
        document.close();
        return baos.toByteArray();
    }
    
    /**
     * 从提交的表单中提取数据
     */
    public Map<String, Object> extractFormData(byte[] pdfData) throws IOException {
        Map<String, Object> formData = new HashMap<>();
        
        PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
        PdfDocument pdf = new PdfDocument(reader);
        PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, false);
        
        if (form != null) {
            // 获取所有表单字段
            Map<String, PdfFormField> fields = form.getFormFields();
            
            // 提取每个字段的值
            for (Map.Entry<String, PdfFormField> entry : fields.entrySet()) {
                String fieldName = entry.getKey();
                PdfFormField field = entry.getValue();
                
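                // Handle each form element according to its field type.
                // PdfName values are compared with equals(): names parsed from a file
                // are not the same instances as the PdfName constants.
                PdfName type = field.getFormType();
                if (PdfName.Tx.equals(type) || PdfName.Ch.equals(type)) {
                    // Text field or choice field (drop-down list or list box)
                    formData.put(fieldName, field.getValueAsString());
                } else if (PdfName.Btn.equals(type) && field instanceof PdfButtonFormField) {
                    // Button: radio group, check box, or push button
                    PdfButtonFormField button = (PdfButtonFormField) field;
                    if (button.isRadio()) {
                        formData.put(fieldName, field.getValueAsString());
                    } else if (!button.isPushButton()) {
                        // Check box: "Yes" is the on-value used in createPdfWithForm
                        formData.put(fieldName, "Yes".equals(field.getValueAsString()));
                    }
                }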
            }
        }
        
        pdf.close();
        return formData;
    }
    
    /**
     * 填充PDF表单
     */
    public byte[] fillPdfForm(byte[] pdfTemplate, Map<String, Object> formData) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        
        PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfTemplate));
        PdfWriter writer = new PdfWriter(baos);
        PdfDocument pdf = new PdfDocument(reader, writer);
        PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
        
        // Do not ask the viewer to regenerate field appearances;
        // iText generates them when the values are set
        form.setNeedAppearances(false);
        
        // 填充表单字段
        for (Map.Entry<String, Object> entry : formData.entrySet()) {
            String fieldName = entry.getKey();
            Object value = entry.getValue();
            
            if (form.getField(fieldName) != null) {
                if (value instanceof String) {
                    form.getField(fieldName).setValue((String) value);
                } else if (value instanceof Boolean) {
                    boolean checked = (Boolean) value;
                    form.getField(fieldName).setValue(checked ? "Yes" : "Off");
                }
            }
        }
        
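        // Flatten the form: field values become static page content
        // and the fields themselves are removed, so they can no longer be edited
        form.flattenFields();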
        
        pdf.close();
        return baos.toByteArray();
    }
}

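The examples above covered PDF security controls (encryption and permissions), watermarks, digital signatures, and creating and processing PDF forms. These features can be combined and adapted to fit different business scenarios.

8. PDF Processing Best Practices

Following a few best practices when processing PDF files in a SpringBoot application makes the application more efficient, more secure, and easier to maintain.

8.1 Performance Optimization

Performance is an important consideration when processing PDF files, especially large ones.

8.1.1 Memory Management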
@Service
public class PdfMemoryOptimizationService {

    /**
     * Process a large PDF without holding the whole file in main memory
     */
    public void processLargePdf(String inputPath, String outputPath) throws IOException {
        // Let PDFBox buffer parsed data in temporary files instead of on the heap
        MemoryUsageSetting memoryUsage = MemoryUsageSetting.setupTempFileOnly();
        
        // try-with-resources closes the document and releases its scratch file,
        // even if processing or saving throws
        try (PDDocument document = PDDocument.load(new File(inputPath), memoryUsage)) {
            // Work through the document page by page instead of collecting all pages first
            int pageCount = document.getNumberOfPages();
            for (int i = 0; i < pageCount; i++) {
                processPage(document.getPage(i));
            }
            
            // Save the processed document
            document.save(outputPath);
        }
    }
    
    private void processPage(PDPage page) {
        // 页面处理逻辑...
    }
    
    /**
     * Build the PDFBox memory settings.
     * Pass the result to PDDocument.load(file, memoryUsageSetting).
     */
    public MemoryUsageSetting configureMemorySettings() {
        // Keep at most 50 MB of PDFBox buffers in main memory and spill the rest
        // to temporary files in the given directory
        return MemoryUsageSetting.setupMixed(50L * 1024 * 1024)
                .setTempDir(new File("/path/to/temp/directory"));
    }
}
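
For documents with many pages, one way to bound the amount of work in flight is to walk the document in fixed-size page ranges, for example to flush intermediate results after each batch. The sketch below only computes the ranges and touches no PDF library. `PageBatcher` and `Range` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a page count into fixed-size, zero-based page ranges. */
final class PageBatcher {

    /** Half-open range [fromInclusive, toExclusive), zero-based like PDDocument.getPage(int). */
    static final class Range {
        final int fromInclusive;
        final int toExclusive;

        Range(int fromInclusive, int toExclusive) {
            this.fromInclusive = fromInclusive;
            this.toExclusive = toExclusive;
        }
    }

    private PageBatcher() {
    }

    static List<Range> batches(int pageCount, int batchSize) {
        if (pageCount < 0 || batchSize <= 0) {
            throw new IllegalArgumentException("pageCount must be >= 0 and batchSize > 0");
        }
        List<Range> ranges = new ArrayList<>();
        int from = 0;
        while (from < pageCount) {
            // The last batch may be shorter than batchSize
            int to = from + Math.min(batchSize, pageCount - from);
            ranges.add(new Range(from, to));
            from = to;
        }
        return ranges;
    }
}
```

A caller would loop over each range and call `document.getPage(i)` for `i` from `fromInclusive` up to, but not including, `toExclusive`.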
8.1.2 Parallel Processing
@Service
public class PdfParallelProcessingService {

    private final ExecutorService executor = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors());
    
    /**
     * 并行处理多个PDF文件
     */
    public List<CompletableFuture<ProcessingResult>> processPdfFilesInParallel(List<String> filePaths) {
        return filePaths.stream()
                .map(path -> CompletableFuture.supplyAsync(() -> {
                    try {
                        // 处理单个PDF文件
                        return processSinglePdf(path);
                    } catch (Exception e) {
                        throw new CompletionException(e);
                    }
                }, executor))
                .collect(Collectors.toList());
    }
    
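    /**
     * Process the pages of a single PDF in parallel.
     * A PDDocument must not be used by several threads at once, so every task
     * loads its own instance. Parsing the file once per page is costly; for large
     * documents, give each task a range of pages instead of a single page.
     */
    public ProcessingResult processMultiPagePdfInParallel(String pdfPath) throws IOException {
        File file = new File(pdfPath);
        
        // Open the document once just to read the page count
        int pageCount;
        try (PDDocument document = PDDocument.load(file)) {
            pageCount = document.getNumberOfPages();
        }
        
        List<CompletableFuture<PageResult>> futures = new ArrayList<>();
        
        // Process every page in parallel
        for (int i = 0; i < pageCount; i++) {
            final int pageNum = i;
            futures.add(CompletableFuture.supplyAsync(() -> {
                try (PDDocument document = PDDocument.load(file)) {
                    return processPageContent(document.getPage(pageNum), pageNum);
                } catch (Exception e) {
                    throw new CompletionException(e);
                }
            }, executor));
        }
        
        // Wait for all pages to finish
        List<PageResult> results = futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
        
        return new ProcessingResult(pdfPath, results);
    }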
    
    private ProcessingResult processSinglePdf(String path) throws IOException {
        // 单个PDF文件处理逻辑
        // ...
        return new ProcessingResult(path, new ArrayList<>());
    }
    
    private PageResult processPageContent(PDPage page, int pageNum) throws IOException {
        // 单页处理逻辑
        // ...
        return new PageResult(pageNum, "Processed");
    }
    
    // 结果类
    @Data
    @AllArgsConstructor
    public static class ProcessingResult {
        private String filePath;
        private List<PageResult> pageResults;
    }
    
    @Data
    @AllArgsConstructor
    public static class PageResult {
        private int pageNumber;
        private String result;
    }
}
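
When many files are processed in parallel, a single failing file should usually not abort the whole batch, and the thread pool should be shut down once the work is done. The following JDK-only sketch shows one way to collect a per-file outcome with `CompletableFuture.handle`. `ParallelBatch`, `Outcome`, and `runAll` are hypothetical names, and the `task` function stands in for the real per-file PDF logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Hypothetical helper: runs one task per input and records success or failure per input. */
final class ParallelBatch {

    static final class Outcome<R> {
        final String input;
        final R value;          // null when the task failed
        final Throwable error;  // null when the task succeeded

        Outcome(String input, R value, Throwable error) {
            this.input = input;
            this.value = value;
            this.error = error;
        }

        boolean isSuccess() {
            return error == null;
        }
    }

    private ParallelBatch() {
    }

    static <R> List<Outcome<R>> runAll(List<String> inputs, Function<String, R> task, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Outcome<R>>> futures = new ArrayList<>();
            for (String input : inputs) {
                futures.add(CompletableFuture
                        .supplyAsync(() -> task.apply(input), pool)
                        // handle() turns an exception into a normal result for this input
                        .handle((value, error) -> new Outcome<R>(input, value, unwrap(error))));
            }
            List<Outcome<R>> outcomes = new ArrayList<>();
            for (CompletableFuture<Outcome<R>> future : futures) {
                outcomes.add(future.join()); // results keep the order of the inputs
            }
            return outcomes;
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }

    private static Throwable unwrap(Throwable error) {
        if (error instanceof CompletionException && error.getCause() != null) {
            return error.getCause();
        }
        return error;
    }
}
```

In a Spring service the pool would more likely be a managed bean that is shut down when the application context closes, rather than being created per call as in this sketch.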

8.2 Exception Handling and Logging

Good exception handling and logging are essential for troubleshooting PDF processing problems.

@Service
@Slf4j  // 使用Lombok的日志注解
public class PdfProcessingService {

    /**
     * 处理PDF文件,包含完善的异常处理和日志
     */
    public ProcessingResult processPdf(String inputPath) {
        log.info("开始处理PDF文件: {}", inputPath);
        
        PDDocument document = null;
        ProcessingResult result = new ProcessingResult();
        result.setFilePath(inputPath);
        
        try {
            // 验证文件存在
            File file = new File(inputPath);
            if (!file.exists() || !file.isFile()) {
                throw new FileNotFoundException("找不到PDF文件: " + inputPath);
            }
            
            log.debug("文件验证通过,开始加载PDF");
            
            // 加载文档
            try {
                document = PDDocument.load(file);
            } catch (InvalidPasswordException e) {
                log.error("PDF文件受密码保护: {}", inputPath, e);
                result.setStatus(ProcessingStatus.PASSWORD_PROTECTED);
                return result;
            } catch (IOException e) {
                log.error("无法加载PDF文件: {}", inputPath, e);
                result.setStatus(ProcessingStatus.LOAD_ERROR);
                result.setErrorMessage("无法加载PDF文件: " + e.getMessage());
                return result;
            }
            
            // 检查是否为空文档
            if (document.getNumberOfPages() <= 0) {
                log.warn("PDF文件不包含任何页面: {}", inputPath);
                result.setStatus(ProcessingStatus.EMPTY_DOCUMENT);
                return result;
            }
            
            log.info("成功加载PDF,共{}页", document.getNumberOfPages());
            
            // 处理文档内容
            try {
                processDocumentContent(document, result);
                result.setStatus(ProcessingStatus.SUCCESS);
                log.info("PDF文件处理成功: {}", inputPath);
            } catch (Exception e) {
                log.error("处理PDF内容时出错: {}", inputPath, e);
                result.setStatus(ProcessingStatus.PROCESSING_ERROR);
                result.setErrorMessage("处理内容出错: " + e.getMessage());
            }
            
        } catch (Exception e) {
            log.error("处理PDF时发生未预期的异常: {}", inputPath, e);
            result.setStatus(ProcessingStatus.UNEXPECTED_ERROR);
            result.setErrorMessage("未预期的错误: " + e.getMessage());
        } finally {
            // 确保资源释放
            if (document != null) {
                try {
                    document.close();
                    log.debug("PDF文档已关闭");
                } catch (IOException e) {
                    log.warn("关闭PDF文档时出错", e);
                }
            }
        }
        
        return result;
    }
    
    private void processDocumentContent(PDDocument document, ProcessingResult result) {
        // 文档处理逻辑...
    }
    
    // 处理结果类
    @Data
    public static class ProcessingResult {
        private String filePath;
        private ProcessingStatus status;
        private String errorMessage;
        private Map<String, Object> extractedData = new HashMap<>();
    }
    
    // 处理状态枚举
    public enum ProcessingStatus {
        SUCCESS, 
        PASSWORD_PROTECTED, 
        LOAD_ERROR, 
        EMPTY_DOCUMENT, 
        PROCESSING_ERROR, 
        UNEXPECTED_ERROR
    }
}

8.3 Security Recommendations

8.3.1 File Upload Security
@Service
@Slf4j  // provides the log field used in scanForMaliciousContent
public class PdfSecurityService {

    // 允许的最大PDF文件大小
    private static final long MAX_FILE_SIZE = 10 * 1024 * 1024; // 10MB
    
    // 文件类型验证
    public boolean isValidPdfFile(MultipartFile file) {
        // 检查文件大小
        if (file.getSize() > MAX_FILE_SIZE) {
            throw new FileValidationException("文件大小超过限制");
        }
        
        // 检查内容类型
        String contentType = file.getContentType();
        if (contentType == null || !contentType.equals("application/pdf")) {
            throw new FileValidationException("文件类型必须是PDF");
        }
        
        // 检查文件扩展名
        String originalFilename = file.getOriginalFilename();
        if (originalFilename == null || !originalFilename.toLowerCase().endsWith(".pdf")) {
            throw new FileValidationException("文件必须是.pdf格式");
        }
        
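        // Check the file content (magic bytes)
        try (InputStream is = file.getInputStream()) {
            byte[] header = new byte[5];
            // readNBytes (Java 9+) keeps reading until the buffer is full or the stream ends;
            // a single read() call may return fewer bytes than requested
            int bytesRead = is.readNBytes(header, 0, header.length);
            if (bytesRead < header.length
                    || !"%PDF-".equals(new String(header, StandardCharsets.US_ASCII))) {
                throw new FileValidationException("无效的PDF文件内容");
            }
        } catch (IOException e) {
            throw new FileValidationException("无法验证文件内容");
        }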
        
        return true;
    }
    
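    // Securely process an uploaded PDF
    public File securelyProcessUploadedPdf(MultipartFile file) throws IOException {
        // Validate the file
        isValidPdfFile(file);
        
        // Create a temporary file
        File tempFile = File.createTempFile("secure-pdf-", ".pdf");
        
        try {
            // Stream the upload to the temporary file instead of buffering it all in memory
            try (InputStream in = file.getInputStream()) {
                Files.copy(in, tempFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
            }
            
            // Scan the file for malicious content
            scanForMaliciousContent(tempFile);
            return tempFile;
        } catch (IOException | RuntimeException e) {
            // Do not leave rejected uploads on disk
            Files.deleteIfExists(tempFile.toPath());
            throw e;
        }
    }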
    
    // 扫描恶意内容
    private void scanForMaliciousContent(File pdfFile) throws IOException {
        try (PDDocument document = PDDocument.load(pdfFile)) {
            // 检查JavaScript
            if (hasJavaScript(document)) {
                throw new SecurityException("PDF包含可能不安全的JavaScript");
            }
            
            // 检查外部链接
            if (hasExternalLinks(document)) {
                // 可以选择警告而不是阻止
                log.warn("PDF包含外部链接: {}", pdfFile.getName());
            }
            
            // 检查嵌入式文件
            if (hasEmbeddedFiles(document)) {
                throw new SecurityException("PDF包含嵌入式文件,可能存在安全风险");
            }
            
            // 更多安全检查...
        }
    }
    
    // 检查JavaScript
    private boolean hasJavaScript(PDDocument document) {
        PDDocumentCatalog catalog = document.getDocumentCatalog();
        PDAcroForm acroForm = catalog.getAcroForm();
        if (acroForm != null) {
            // 检查表单中的JavaScript
            // ...
            return false; // 假设实现
        }
        return false;
    }
    
    // 检查外部链接
    private boolean hasExternalLinks(PDDocument document) {
        // 遍历页面和注释,检查外部URL
        // ...
        return false; // 假设实现
    }
    
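    // Check for embedded files
    private boolean hasEmbeddedFiles(PDDocument document) throws IOException {
        PDDocumentNameDictionary names = document.getDocumentCatalog().getNames();
        if (names == null) {
            return false;
        }
        PDEmbeddedFilesNameTreeNode embeddedFiles = names.getEmbeddedFiles();
        if (embeddedFiles == null) {
            return false;
        }
        // getNames() returns null when the entries are stored in child nodes (Kids),
        // so check both levels; any child node is conservatively treated as a hit
        Map<String, PDComplexFileSpecification> direct = embeddedFiles.getNames();
        if (direct != null && !direct.isEmpty()) {
            return true;
        }
        return embeddedFiles.getKids() != null && !embeddedFiles.getKids().isEmpty();
    }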
    
    // 自定义验证异常
    public static class FileValidationException extends RuntimeException {
        public FileValidationException(String message) {
            super(message);
        }
    }
}
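
The magic-byte check is the part of upload validation that is easiest to get subtly wrong: `InputStream.read(byte[])` may return fewer bytes than requested, and decoding with the platform default charset is unnecessary. Below is a JDK-only sketch of a strict check at offset 0, the same position the service above inspects. `PdfMagicBytes` is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: checks that a stream starts with the PDF header "%PDF-". */
final class PdfMagicBytes {

    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private PdfMagicBytes() {
    }

    static boolean startsWithPdfHeader(InputStream in) throws IOException {
        byte[] header = new byte[MAGIC.length];
        int total = 0;
        // Loop because a single read() may deliver only part of the requested bytes
        while (total < header.length) {
            int n = in.read(header, total, header.length - total);
            if (n < 0) {
                return false; // stream ended before five bytes were available
            }
            total += n;
        }
        // Compare raw bytes; no charset decoding is involved
        return Arrays.equals(header, MAGIC);
    }
}
```

A matching header only shows that the file claims to be a PDF. It does not replace parsing the file and scanning it, as `scanForMaliciousContent` does.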
8.3.2 Protecting Sensitive Information
@Service
public class PdfDataProtectionService {

    // Add a "confidential" watermark
    public byte[] addConfidentialWatermark(byte[] pdfData) throws IOException, DocumentException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        
        PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
        PdfStamper stamper = new PdfStamper(reader, baos);
        
        int pageCount = reader.getNumberOfPages();
        // Helvetica with CP1252 has no CJK glyphs, so the Chinese watermark text needs
        // a CJK font; STSong-Light requires the itext-asian dependency
        BaseFont baseFont = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
        
        for (int i = 1; i <= pageCount; i++) {
            PdfContentByte content = stamper.getUnderContent(i);
            content.beginText();
            content.setFontAndSize(baseFont, 60);
            content.setColorFill(BaseColor.LIGHT_GRAY);
            // showTextAligned positions the text itself: centered on the page, rotated 45 degrees
            content.showTextAligned(Element.ALIGN_CENTER, "机密文件 - 请勿传播", 
                                  reader.getPageSize(i).getWidth()/2, 
                                  reader.getPageSize(i).getHeight()/2, 45);
            content.endText();
        }
        
        stamper.close();
        reader.close();
        
        return baos.toByteArray();
    }
    
    // 文档脱敏
    public byte[] redactSensitiveInformation(byte[] pdfData, List<String> patternsToRedact) 
            throws IOException {
        // 注意:真正的PDF编辑和脱敏需要更复杂的处理
        // 这里仅做示例
        
        // 1. 提取文本
        PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
        int pageCount = reader.getNumberOfPages();
        
        // 2. 遍历每页,查找并标记匹配的模式
        List<PDFRedactionInfo> redactions = new ArrayList<>();
        
        for (int i = 1; i <= pageCount; i++) {
            String pageText = PdfTextExtractor.getTextFromPage(reader, i);
            
            for (String pattern : patternsToRedact) {
                Pattern regex = Pattern.compile(pattern);
                Matcher matcher = regex.matcher(pageText);
                
                while (matcher.find()) {
                    // 这里需要实际位置信息,简化版只记录页码
                    redactions.add(new PDFRedactionInfo(i, matcher.start(), matcher.end()));
                }
            }
        }
        reader.close();
        
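        // 3. Apply the redactions
        // A real implementation has to remove the matched content from the page content
        // streams, for example with iText's clean-up tooling (PdfCleanUpProcessor in
        // itext-xtra for iText 5, pdfSweep for iText 7). Drawing a box over the text is
        // not enough. The code below is only a placeholder for that step.
        
        // Create the redacted PDF
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // ... redaction processing goes here; until it is implemented, the result is empty ...
        
        return baos.toByteArray();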
    }
    
    // 保存敏感PDF时加密
    public byte[] encryptForStorage(byte[] pdfData) throws DocumentException, IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        
        PdfReader reader = new PdfReader(new ByteArrayInputStream(pdfData));
        PdfStamper stamper = new PdfStamper(reader, baos);
        
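        // Generate random passwords. The owner password must differ from the user
        // password: anyone who opens the file with the owner password has full rights,
        // so the permission flags would not restrict them
        String userPassword = generateSecureRandomPassword(16);
        String ownerPassword = generateSecureRandomPassword(16);
        
        // Apply AES-256 encryption and allow only opening and printing
        stamper.setEncryption(userPassword.getBytes(StandardCharsets.UTF_8), 
                            ownerPassword.getBytes(StandardCharsets.UTF_8), 
                            PdfWriter.ALLOW_PRINTING, 
                            PdfWriter.ENCRYPTION_AES_256);
        
        stamper.close();
        reader.close();
        
        // Note: in a real application the password must be stored securely;
        // store the owner password the same way if it is needed later
        storePasswordSecurely(userPassword);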
        
        return baos.toByteArray();
    }
    
    private String generateSecureRandomPassword(int length) {
        SecureRandom random = new SecureRandom();
        String chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*()";
        
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            int randomIndex = random.nextInt(chars.length());
            sb.append(chars.charAt(randomIndex));
        }
        
        return sb.toString();
    }
    
    private void storePasswordSecurely(String password) {
        // 实际应用中,应使用安全的密钥管理系统
        // 例如:HashiCorp Vault, AWS KMS等
    }
    
    // 脱敏信息记录类
    @Data
    @AllArgsConstructor
    private static class PDFRedactionInfo {
        private int pageNumber;
        private int startPosition;
        private int endPosition;
    }
}
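
The pattern-matching step of redaction can be tried out on plain text before any PDF library is involved. The sketch below finds all regex matches in a page's extracted text, merges overlapping hits, and masks them. It works on character offsets in the extracted text only; mapping those offsets to page coordinates and removing the content from the PDF is a separate step that this sketch does not attempt. `RedactionMatcher`, `Span`, `findSpans`, and `mask` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: locates and masks sensitive patterns in extracted page text. */
final class RedactionMatcher {

    /** Half-open character range [start, end) in the extracted text. */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    private RedactionMatcher() {
    }

    static List<Span> findSpans(String text, List<String> patterns) {
        List<Span> spans = new ArrayList<>();
        for (String pattern : patterns) {
            Matcher matcher = Pattern.compile(pattern).matcher(text);
            while (matcher.find()) {
                if (matcher.end() > matcher.start()) { // ignore empty matches
                    spans.add(new Span(matcher.start(), matcher.end()));
                }
            }
        }
        spans.sort(Comparator.comparingInt((Span s) -> s.start));

        // Merge overlapping or touching spans so each character is covered once
        List<Span> merged = new ArrayList<>();
        for (Span span : spans) {
            if (!merged.isEmpty() && span.start <= merged.get(merged.size() - 1).end) {
                Span last = merged.remove(merged.size() - 1);
                merged.add(new Span(last.start, Math.max(last.end, span.end)));
            } else {
                merged.add(span);
            }
        }
        return merged;
    }

    static String mask(String text, List<Span> spans) {
        StringBuilder masked = new StringBuilder(text);
        for (Span span : spans) {
            for (int i = span.start; i < span.end; i++) {
                masked.setCharAt(i, '*');
            }
        }
        return masked.toString();
    }
}
```

Compiling each pattern once outside the page loop would avoid repeated `Pattern.compile` calls when the same patterns are applied to many pages.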

8.4 Testing PDF Processing

Thorough tests are essential for making sure PDF processing is correct and reliable.

@SpringBootTest
public class PdfGenerationServiceTest {

    @Autowired
    private PdfGenerationService pdfService;
    
    @TempDir
    Path tempDir;
    
    @Test
    public void testGenerateSimplePdf() throws Exception {
        // 安排
        String outputPath = tempDir.resolve("test-output.pdf").toString();
        
        // 执行
        pdfService.generateSimplePdf(outputPath);
        
        // 断言
        File outputFile = new File(outputPath);
        assertTrue(outputFile.exists(), "生成的PDF文件应该存在");
        assertTrue(outputFile.length() > 0, "PDF文件不应为空");
        
        // 验证PDF内容
        PDDocument document = PDDocument.load(outputFile);
        assertEquals(1, document.getNumberOfPages(), "PDF应该有1页");
        
        // 验证文本内容
        PDFTextStripper stripper = new PDFTextStripper();
        String text = stripper.getText(document);
        assertTrue(text.contains("Hello World"), "PDF应包含预期文本");
        
        document.close();
    }
    
    @Test
    public void testGeneratePdfWithTable() throws Exception {
        // 安排
        String outputPath = tempDir.resolve("table-output.pdf").toString();
        List<UserDto> users = Arrays.asList(
            new UserDto(1L, "张三", "admin"),
            new UserDto(2L, "李四", "user"),
            new UserDto(3L, "王五", "editor")
        );
        
        // 执行
        pdfService.generatePdfWithTable(outputPath, users);
        
        // 断言
        File outputFile = new File(outputPath);
        assertTrue(outputFile.exists());
        
        // 验证内容(表格验证比较复杂,这里只做基本检查)
        PDDocument document = PDDocument.load(outputFile);
        assertTrue(document.getNumberOfPages() > 0);
        
        // 验证文本是否包含用户名
        PDFTextStripper stripper = new PDFTextStripper();
        String text = stripper.getText(document);
        assertTrue(text.contains("张三"));
        assertTrue(text.contains("李四"));
        assertTrue(text.contains("王五"));
        
        document.close();
    }
    
    @Test
    public void testPdfGeneration_withInvalidInput_shouldThrowException() {
        // 安排
        String outputPath = tempDir.resolve("invalid-output.pdf").toString();
        
        // 断言
        assertThrows(IllegalArgumentException.class, () -> {
            // 执行
            pdfService.generatePdfWithInvalidInput(outputPath);
        });
    }
}

8.5 Deployment Best Practices

@Configuration
public class PdfServiceConfig {

    @Bean
    public PdfGenerationService pdfGenerationService(
            @Value("${pdf.fonts.directory:/app/fonts}") String fontsDirectory,
            @Value("${pdf.output.directory:/app/output}") String outputDirectory) {
        
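        // Make sure the directories exist and fail fast if they cannot be created
        File fontsDir = new File(fontsDirectory);
        if (!fontsDir.isDirectory() && !fontsDir.mkdirs()) {
            throw new IllegalStateException("Cannot create fonts directory: " + fontsDirectory);
        }
        
        File outputDir = new File(outputDirectory);
        if (!outputDir.isDirectory() && !outputDir.mkdirs()) {
            throw new IllegalStateException("Cannot create output directory: " + outputDirectory);
        }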
        
        // 返回配置好的服务
        return new PdfGenerationService(fontsDirectory, outputDirectory);
    }
    
    @Bean
    public PdfProcessingTaskExecutor pdfTaskExecutor(
            @Value("${pdf.processing.thread-pool-size:4}") int threadPoolSize,
            @Value("${pdf.processing.queue-capacity:100}") int queueCapacity) {
        
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(threadPoolSize);
        executor.setMaxPoolSize(threadPoolSize * 2);
        executor.setQueueCapacity(queueCapacity);
        executor.setThreadNamePrefix("pdf-proc-");
        executor.initialize();
        
        return new PdfProcessingTaskExecutor(executor);
    }
    
    // PDF处理健康检查
    @Bean
    public HealthIndicator pdfServiceHealthIndicator(PdfGenerationService pdfService) {
        return () -> {
            try {
                // 尝试生成一个简单的PDF来验证服务是否正常
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                pdfService.generateTestPdf(baos);
                
                if (baos.size() > 0) {
                    return Health.up().build();
                } else {
                    return Health.down()
                            .withDetail("reason", "PDF生成输出为空")
                            .build();
                }
            } catch (Exception e) {
                return Health.down()
                        .withDetail("reason", "PDF生成失败")
                        .withDetail("error", e.getMessage())
                        .build();
            }
        };
    }
}
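
Directory checks at startup can also be written with `java.nio.file`, which reports why a directory could not be created instead of returning `false`. The following sketch creates the directory if needed and probes that it is writable. `DirectoryChecks` and `requireWritableDirectory` are hypothetical names, and failing with an unchecked exception is one possible design choice.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: ensures a configured directory exists and is writable. */
final class DirectoryChecks {

    private DirectoryChecks() {
    }

    static Path requireWritableDirectory(Path dir) {
        try {
            // Creates missing parents too; throws if a regular file is in the way
            Files.createDirectories(dir);
            // Probe write access by creating and deleting a temporary file
            Path probe = Files.createTempFile(dir, "probe-", ".tmp");
            Files.delete(probe);
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException("Directory is not usable: " + dir, e);
        }
    }
}
```

Throwing from a `@Bean` method stops the application at startup, which is usually preferable to discovering a missing output directory when the first PDF is generated.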

8.6 Error Handling and Recovery

@Service
@Slf4j
public class PdfProcessingErrorHandler {

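    /**
     * Process a PDF file with a retry mechanism.
     * Requires the spring-retry dependency and @EnableRetry on a configuration
     * class; without them the annotation has no effect.
     */
    @Retryable(
        value = {IOException.class, TemporaryPdfProcessingException.class},
        maxAttempts = 3,
        backoff = @Backoff(delay = 1000, multiplier = 2)
    )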
    public ProcessingResult processPdfWithRetry(String pdfPath) throws IOException {
        log.info("尝试处理PDF文件: {}", pdfPath);
        
        // PDF处理逻辑,可能抛出异常
        return doProcessPdf(pdfPath);
    }
    
    /**
     * 重试失败后的恢复处理
     */
    @Recover
    public ProcessingResult recoverFromFailure(Exception e, String pdfPath) {
        log.error("处理PDF文件失败,无法恢复: {}", pdfPath, e);
        
        // 创建失败结果
        ProcessingResult result = new ProcessingResult();
        result.setStatus(ProcessingStatus.FAILED_WITH_RECOVERY);
        result.setMessage("处理失败: " + e.getMessage());
        
        // 记录故障
        recordFailure(pdfPath, e);
        
        // 发送警报
        sendAlert(pdfPath, e);
        
        return result;
    }
    
    /**
     * Try to repair a corrupted PDF
     */
    public byte[] attemptToRepairCorruptedPdf(byte[] corruptedPdfData) {
        ByteArrayOutputStream repairedOutput = new ByteArrayOutputStream();
        
        // PDFBox 2.x parses leniently by default and tries to rebuild a damaged
        // cross-reference table while loading; try-with-resources closes the
        // document even if saving fails
        try (PDDocument document = PDDocument.load(corruptedPdfData)) {
            // Add a blank page if the document is empty
            if (document.getNumberOfPages() == 0) {
                document.addPage(new PDPage());
            }
            
            // Save the repaired document
            document.save(repairedOutput);
            
            log.info("成功修复损坏的PDF文件");
            return repairedOutput.toByteArray();
            
        } catch (Exception e) {
            log.error("无法修复损坏的PDF", e);
            // Still unrecoverable: return null (callers must check for it) or throw instead
            return null;
        }
    }
    
    /**
     * 记录文件处理故障
     */
    private void recordFailure(String pdfPath, Exception e) {
        // 将故障信息记录到数据库或日志系统
        PdfProcessingFailure failure = new PdfProcessingFailure();
        failure.setFilePath(pdfPath);
        failure.setTimestamp(new Date());
        failure.setErrorMessage(e.getMessage());
        failure.setStackTrace(ExceptionUtils.getStackTrace(e));
        
        // 保存到数据库
        // pdfFailureRepository.save(failure);
    }
    
    /**
     * 发送警报
     */
    private void sendAlert(String pdfPath, Exception e) {
        // 当处理重要文件失败时发送警报
        // alertService.sendAlert("PDF处理失败", "文件 " + pdfPath + " 处理失败: " + e.getMessage());
    }
    
    /**
     * 实际的PDF处理逻辑
     */
    private ProcessingResult doProcessPdf(String pdfPath) throws IOException {
        // 实现PDF处理逻辑...
        return new ProcessingResult();
    }
    
    // Temporary processing failure (safe to retry)
    public static class TemporaryPdfProcessingException extends RuntimeException {
        public TemporaryPdfProcessingException(String message) {
            super(message);
        }
    }
    
    // Processing result
    @Data
    public static class ProcessingResult {
        private ProcessingStatus status;
        private String message;
        // Other fields...
    }
    
    // Processing status
    public enum ProcessingStatus {
        SUCCESS, FAILED, FAILED_WITH_RECOVERY
    }
    
    // Failure record entity
    @Data
    public static class PdfProcessingFailure {
        private String filePath;
        private Date timestamp;
        private String errorMessage;
        private String stackTrace;
    }
}

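9. Common Problems and Solutions

Developers run into a recurring set of problems when handling PDF files in Spring Boot. This section collects the most common ones together with their solutions.

9.1 Garbled Text and Font Problems

Chinese or other non-Latin characters showing up as garbled text is one of the most common problems in PDF processing.

Problem: Chinese text in the PDF appears as boxes or garbled characters

Cause: the standard fonts that many PDF libraries use by default contain no glyphs for Chinese characters.

Solution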

// Rendering Chinese text with iText 5
public byte[] generatePdfWithChineseText() throws IOException, DocumentException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    
    // Create the document
    Document document = new Document();
    PdfWriter.getInstance(document, baos);
    document.open();
    
    // Option 1: built-in CJK font (needs the itext-asian jar on the classpath).
    // The font is not embedded, so the viewer must supply a matching font.
    BaseFont baseFont = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
    Font chineseFont = new Font(baseFont, 12, Font.NORMAL);
    
    document.add(new Paragraph("这是中文内容", chineseFont));
    
    // Option 2: embed a TrueType font (larger file, but renders the same everywhere)
    String fontPath = "path/to/fonts/msyh.ttf"; // Microsoft YaHei
    BaseFont customFont = BaseFont.createFont(fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
    Font embeddedFont = new Font(customFont, 12, Font.NORMAL);
    
    document.add(new Paragraph("这是使用嵌入字体的中文", embeddedFont));
    
    document.close();
    return baos.toByteArray();
}

// Rendering Chinese text with PDFBox
public void addChineseTextWithPdfBox(String pdfPath) throws IOException {
    // try-with-resources closes the document even if an exception is thrown
    try (PDDocument document = new PDDocument()) {
        PDPage page = new PDPage();
        document.addPage(page);
        
        // Load a font with Chinese glyphs; PDType0Font embeds it in the document
        PDType0Font font = PDType0Font.load(document, new File("path/to/fonts/msyh.ttf"));
        
        // Write the text through a content stream
        try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
            contentStream.beginText();
            contentStream.setFont(font, 12);
            contentStream.newLineAtOffset(25, 700);
            contentStream.showText("这是PDFBox生成的中文内容");
            contentStream.endText();
        }
        
        document.save(pdfPath);
    }
}

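Best practices

  1. Bundle the Chinese fonts you rely on with the application
  2. Embed font subsets to keep the file size down
  3. Create a font factory class to manage and reuse font instances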
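
The third practice can be sketched as a small cache that loads each font once and hands out the same instance afterwards. This is a minimal sketch, not part of any PDF library: `FontCache` and its `loader` parameter are hypothetical names, and the font type is left generic so the loader can wrap whatever call your library uses. It suits fonts that are independent of a single document; fonts that are bound to one document instance should not be shared this way.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper: caches font objects by path so each font file is parsed only once. */
final class FontCache<F> {
    private final Map<String, F> cache = new ConcurrentHashMap<>();
    private final Function<String, F> loader;

    /** @param loader assumed to wrap the library call that turns a font path into a font object */
    FontCache(Function<String, F> loader) {
        this.loader = loader;
    }

    /** Returns the cached font for the path, loading it on first use (thread-safe). */
    F get(String fontPath) {
        return cache.computeIfAbsent(fontPath, loader);
    }

    /** Number of distinct fonts loaded so far. */
    int size() {
        return cache.size();
    }
}
```

With iText 5 the loader would typically call `BaseFont.createFont(...)` and rethrow its checked exceptions as an unchecked one, since `Function` cannot throw checked exceptions.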

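9.2 Image Problems

Problem: images in the PDF look blurry or distorted

Cause: the image DPI is wrong for its printed size (too few pixels per inch), or the image is scaled without preserving its aspect ratio.

Solution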

public void addHighQualityImage(Document document, String imagePath) throws IOException, DocumentException {
    // Load the image
    Image image = Image.getInstance(imagePath);
    
    // Set the DPI recorded on the Image object; the pixel data is not resampled
    image.setDpi(300, 300);
    
    // Target width: page width minus a 40 pt margin on each side
    float width = document.getPageSize().getWidth() - 80;
    float aspectRatio = image.getWidth() / image.getHeight();
    float height = width / aspectRatio;
    
    // Cap the height at two thirds of the page height
    float maxHeight = document.getPageSize().getHeight() * 2 / 3;
    if (height > maxHeight) {
        height = maxHeight;
        width = height * aspectRatio;
    }
    
    // scaleToFit keeps the aspect ratio inside the given box
    image.scaleToFit(width, height);
    
    // Center horizontally
    image.setAlignment(Image.MIDDLE);
    
    document.add(image);
}
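
The arithmetic behind the scaling step can be checked without any PDF library. The sketch below uses hypothetical helper names (`ImageFit`, `fit`, `effectiveDpi`) and assumes sizes in PDF points, where 72 points equal one inch. It computes the largest size that fits a box while keeping the aspect ratio, and the resolution the image ends up with once it is printed at that size.

```java
/** Hypothetical helper illustrating aspect-ratio fitting and effective resolution. */
final class ImageFit {

    /** Returns {width, height} of the largest size that fits the box and keeps the aspect ratio. */
    static float[] fit(float imageWidth, float imageHeight, float maxWidth, float maxHeight) {
        // The smaller of the two scale factors is the one that keeps both sides inside the box
        float scale = Math.min(maxWidth / imageWidth, maxHeight / imageHeight);
        return new float[] { imageWidth * scale, imageHeight * scale };
    }

    /** Pixels per inch when pixelWidth pixels are drawn across widthPt points (72 pt = 1 inch). */
    static float effectiveDpi(int pixelWidth, float widthPt) {
        return pixelWidth / (widthPt / 72f);
    }
}
```

For example, a 2000 x 1000 pixel image fitted into a 500 x 400 pt box comes out at 500 x 250 pt, which is 288 pixels per inch. An image with only 500 pixels across the same width would land at 72 pixels per inch and look soft in print, whatever DPI value is set on it.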

Problem: the PDF file is too large

Cause: images are uncompressed or stored in a lossless format.

Solution

public byte[] compressImage(byte[] imageData, String format) throws IOException {
    BufferedImage image = ImageIO.read(new ByteArrayInputStream(imageData));
    if (image == null) {
        throw new IOException("Unsupported or corrupt image data");
    }
    
    // Output buffer
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    
    // For JPEG, set an explicit compression quality
    if ("jpg".equalsIgnoreCase(format) || "jpeg".equalsIgnoreCase(format)) {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName(format);
        if (!writers.hasNext()) {
            throw new IOException("No ImageWriter available for format: " + format);
        }
        ImageWriter writer = writers.next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(0.7f); // 70% quality; tune to balance size against quality
        
        // Note: the JPEG writer may reject images with an alpha channel; convert those to RGB first
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(baos)) {
            writer.setOutput(ios);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
    } else if (!ImageIO.write(image, format, baos)) {
        // Other formats use the writer's default compression; false means no writer was found
        throw new IOException("No ImageWriter available for format: " + format);
    }
    
    return baos.toByteArray();
}
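
Compression quality is only one lever. Reducing the pixel dimensions before embedding often saves more, because a photo straight from a camera usually has far more pixels than the page can show. The following is a minimal sketch using only `java.awt`; the class name `ImageDownscaler` and the `maxWidth` parameter are assumptions made for illustration, and the output is opaque RGB, so transparency is not preserved.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Hypothetical helper: shrinks an image to a maximum width before it is embedded in a PDF. */
final class ImageDownscaler {

    /** Returns the source unchanged if it is already narrow enough, otherwise a scaled RGB copy. */
    static BufferedImage downscale(BufferedImage source, int maxWidth) {
        if (source.getWidth() <= maxWidth) {
            return source;
        }
        // Keep the aspect ratio; never let the height collapse to zero
        float scale = maxWidth / (float) source.getWidth();
        int height = Math.max(1, Math.round(source.getHeight() * scale));

        // TYPE_INT_RGB has no alpha channel, which also makes the result safe for JPEG output
        BufferedImage target = new BufferedImage(maxWidth, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = target.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, maxWidth, height, null);
        } finally {
            g.dispose();
        }
        return target;
    }
}
```

A reasonable `maxWidth` follows from the printed width: for a 500 pt wide image at 150 pixels per inch, about 1040 pixels are enough.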

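9.3 Form and Interactivity Problems

Problem: field values do not show up after filling a PDF form

Cause: the field appearances were not regenerated, or the font cannot render the values.

Solution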

public byte[] fillPdfForm(byte[] templateBytes, Map<String, String> formData)
        throws IOException, DocumentException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    
    // Open the PDF template
    PdfReader reader = new PdfReader(new ByteArrayInputStream(templateBytes));
    PdfStamper stamper = new PdfStamper(reader, baos);
    
    // Get the form
    AcroFields form = stamper.getAcroFields();
    
    // Regenerate field appearances so the new values are drawn
    form.setGenerateAppearances(true);
    // Flatten the form: values become page content and can no longer be edited
    stamper.setFormFlattening(true);
    
    // Chinese support: substitution font (STSong-Light needs the itext-asian jar)
    BaseFont bf = BaseFont.createFont("STSong-Light", "UniGB-UCS2-H", BaseFont.NOT_EMBEDDED);
    form.addSubstitutionFont(bf);
    
    // Fill in the form fields
    for (Map.Entry<String, String> entry : formData.entrySet()) {
        form.setField(entry.getKey(), entry.getValue());
    }
    
    // Close the stamper first so the output is written, then the reader
    stamper.close();
    reader.close();
    
    return baos.toByteArray();
}

Problem: unable to add interactive elements to a PDF

Solution

public byte[] createInteractivePdf() throws IOException, DocumentException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    
    // Create the document
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, baos);
    document.open();
    
    // Add the body text (default font: Latin text only)
    document.add(new Paragraph("This PDF contains interactive elements"));
    document.add(new Paragraph("Click the link or the button below:"));
    document.add(new Paragraph(" "));
    
    // Add a hyperlink
    Anchor anchor = new Anchor("Visit the website");
    anchor.setReference("https://www.example.com");
    document.add(anchor);
    document.add(new Paragraph(" "));
    
    // Create a push button
    Rectangle rect = new Rectangle(100, 100, 200, 130);
    PushbuttonField button = new PushbuttonField(writer, rect, "submitButton");
    button.setText("Submit");
    button.setBackgroundColor(new BaseColor(0, 122, 204));
    button.setTextColor(BaseColor.WHITE);
    button.setVisibility(PushbuttonField.VISIBLE);
    
    // Build the form field, then attach the JavaScript action to it
    // (the action is set on PdfFormField, not on PushbuttonField)
    PdfFormField field = button.getField();
    field.setAction(PdfAction.javaScript(
        "app.alert('Button clicked!');", writer));
    
    // Add the button to the document
    writer.addAnnotation(field);
    
    document.close();
    return baos.toByteArray();
}

9.4 Performance and Memory Problems

Problem: OutOfMemoryError when processing large PDF files

Cause: the entire PDF file is loaded into memory at once.

Solution

public void processLargePdfMemoryEfficient(String inputPath, String outputPath) throws IOException {
    // 1. Have PDFBox 2.x buffer to temporary files instead of holding everything on the heap
    MemoryUsageSetting memory = MemoryUsageSetting.setupTempFileOnly();
    
    // 2. try-with-resources closes both documents even if processing fails
    try (PDDocument document = PDDocument.load(new File(inputPath), memory);
         PDDocument outputDocument = new PDDocument(memory)) {
        
        // 3. Handle one page at a time rather than all pages together
        PDPageTree pages = document.getPages();
        for (int i = 0; i < pages.getCount(); i++) {
            PDPage page = pages.get(i);
            
            // Per-page work goes here, e.g. extracting text or modifying content
            
            // Add a page of the same size to the output document
            PDPage newPage = new PDPage(page.getMediaBox());
            outputDocument.addPage(newPage);
            
            // Copy content (simplified example)
            try (PDPageContentStream contentStream = new PDPageContentStream(
                    outputDocument, newPage, PDPageContentStream.AppendMode.OVERWRITE, true)) {
                // ...process and copy content...
            }
            // No explicit System.gc() call: garbage collection is left to the JVM
        }
        
        // 4. Save while both documents are still open
        outputDocument.save(outputPath);
    }
}

Problem: PDF processing is slow

Solution

// Process PDFs in parallel on a thread pool
public void parallelPdfProcessing(List<String> pdfPaths) {
    if (pdfPaths.isEmpty()) {
        return; // newFixedThreadPool(0) would throw IllegalArgumentException
    }
    int parallelism = Math.min(Runtime.getRuntime().availableProcessors(), pdfPaths.size());
    ExecutorService executor = Executors.newFixedThreadPool(parallelism);
    
    try {
        // Submit one task per file
        List<Future<ProcessingResult>> futures = new ArrayList<>();
        
        for (String path : pdfPaths) {
            futures.add(executor.submit(() -> processSinglePdf(path)));
        }
        
        // Collect the results
        for (Future<ProcessingResult> future : futures) {
            try {
                ProcessingResult result = future.get();
                // Handle the result...
                System.out.println("Finished: " + result.getFilePath());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and stop waiting
                return;
            } catch (ExecutionException e) {
                // The task itself failed; the cause holds the original exception
                e.getCause().printStackTrace();
            }
        }
    } finally {
        executor.shutdown();
    }
}

private ProcessingResult processSinglePdf(String path) {
    // Processing logic for a single PDF
    // ...
    return new ProcessingResult(path, true);
}

@Data
@AllArgsConstructor
private static class ProcessingResult {
    private String filePath;
    private boolean success;
}
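
In batch jobs it is usually worth making sure that one corrupt file does not hide the outcome of the others. The sketch below is one possible variation built on `ExecutorService.invokeAll`, which waits for all tasks and returns the futures in submission order. The names `ParallelRunner` and `processAll` and the `Predicate` standing in for the real per-file work are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/** Hypothetical helper: runs per-file work in parallel and records success or failure per path. */
final class ParallelRunner {

    /** Returns path -> outcome in input order; a task that throws is recorded as false. */
    static Map<String, Boolean> processAll(List<String> paths, Predicate<String> processor,
                                           int maxThreads) throws InterruptedException {
        Map<String, Boolean> outcome = new LinkedHashMap<>();
        if (paths.isEmpty()) {
            return outcome; // a pool of size 0 is not allowed
        }
        int threads = Math.max(1, Math.min(maxThreads, paths.size()));
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (String path : paths) {
                tasks.add(() -> processor.test(path));
            }
            // invokeAll blocks until every task is done and keeps the submission order
            List<Future<Boolean>> futures = pool.invokeAll(tasks);
            for (int i = 0; i < paths.size(); i++) {
                try {
                    outcome.put(paths.get(i), futures.get(i).get());
                } catch (ExecutionException e) {
                    outcome.put(paths.get(i), false); // this file failed; the others are unaffected
                }
            }
        } finally {
            pool.shutdown();
        }
        return outcome;
    }
}
```

PDF work tends to be memory-hungry, so `maxThreads` is better sized from the available heap than from the CPU count alone.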

9.5 Security Problems

Problem: how to prevent PDF injection attacks

Solution

public void securePdfGeneration(String content, String outputPath) {
    // 1. Validate and sanitize the content
    content = cleanContent(content);
    
    // try-with-resources closes the file even if generation fails
    try (FileOutputStream out = new FileOutputStream(outputPath)) {
        // 2. Create the PDF
        Document document = new Document();
        PdfWriter writer = PdfWriter.getInstance(document, out);
        
        // 3. Encrypt and restrict permissions to printing and copying.
        //    This does not disable JavaScript; the document simply never adds any.
        writer.setEncryption(null, null, 
                        PdfWriter.ALLOW_PRINTING | PdfWriter.ALLOW_COPY, 
                        PdfWriter.ENCRYPTION_AES_128);
        
        document.open();
        document.add(new Paragraph(content));
        document.close();
        
    } catch (Exception e) {
        throw new SecurityException("PDF generation failed", e);
    }
}

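private String cleanContent(String content) {
    // Strip potentially malicious content,
    // e.g. remove JavaScript code and restrict special characters
    
    // Simple example: remove <script> elements ((?s) lets '.' match line breaks)
    content = content.replaceAll("(?is)<script.*?>.*?</script>", "");
    
    // Remove PDF structure keywords (%PDF-, startxref, xref, trailer).
    // This is blunt: it also strips these words from legitimate text.
    content = content.replaceAll("(?i)(%PDF-|startxref|xref|trailer)", "");
    
    // Apply more thorough sanitizing here...
    
    return content;
}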
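
Injection becomes a real concern mainly when untrusted text is written into raw PDF syntax by hand, for example into a literal string of the form `(text)` inside a content stream. In that syntax the backslash and the two parentheses are the characters that can end the string early, so they need a backslash escape. The sketch below shows that escaping in isolation; `PdfStringEscaper` is a hypothetical helper, and it is only relevant to hand-written PDF syntax, since high-level calls such as `new Paragraph(content)` take plain text.

```java
/** Hypothetical helper: escapes text for use inside a PDF literal string, i.e. between ( and ). */
final class PdfStringEscaper {

    static String escapeLiteral(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break; // backslash starts an escape sequence
                case '(':  sb.append("\\(");  break; // parentheses delimit the string
                case ')':  sb.append("\\)");  break;
                case '\r': sb.append("\\r");  break; // keep line breaks explicit
                case '\n': sb.append("\\n");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Without the escaping, an input such as `a) Tj (b` would close the string and let the rest be read as PDF operators. After escaping, the whole input stays inside one string. Characters outside the string's encoding need separate handling, which this sketch does not cover.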

Problem: how to handle uploaded PDF files safely

Solution

@Service
public class SecurePdfUploadService {

    public boolean validateAndProcessPdfUpload(MultipartFile file) throws IOException {
        // 1. Check the MIME type (constant first: getContentType() may return null)
        if (!"application/pdf".equals(file.getContentType())) {
            throw new SecurityException("Only PDF files are accepted");
        }
        
        // 2. Check the file extension
        String filename = file.getOriginalFilename();
        if (filename == null || !filename.toLowerCase().endsWith(".pdf")) {
            throw new SecurityException("The file must be a PDF");
        }
        
        // 3. Check the file size
        if (file.getSize() > 10 * 1024 * 1024) { // 10MB
            throw new SecurityException("The PDF file must not exceed 10MB");
        }
        
        // 4. Check the PDF header
        byte[] content = file.getBytes();
        if (content.length < 5 || !isPdfHeader(content)) {
            throw new SecurityException("Invalid PDF file format");
        }
        
        // 5. Scan the PDF content for threats
        if (!scanPdfForThreats(content)) {
            throw new SecurityException("The PDF file may contain malicious content");
        }
        
        // 6. Process the file content
        return processPdfContent(content);
    }
    
    private boolean isPdfHeader(byte[] content) {
        // Check for the PDF header (%PDF-), decoding with an explicit charset
        String header = new String(content, 0, 5, StandardCharsets.US_ASCII);
        return header.equals("%PDF-");
    }
    
    private boolean scanPdfForThreats(byte[] content) {
        // try-with-resources closes the document even if a check throws
        try (PDDocument document = PDDocument.load(new ByteArrayInputStream(content))) {
            
            // Does the document contain JavaScript?
            boolean hasJavaScript = checkForJavaScript(document);
            
            // Does it contain external links?
            boolean hasExternalLinks = checkForExternalLinks(document);
            
            // Does it contain embedded files?
            boolean hasEmbeddedFiles = checkForEmbeddedFiles(document);
            
            // Decide according to the security policy; external links may be allowed
            return !hasJavaScript && !hasEmbeddedFiles;
            
        } catch (Exception e) {
            // Parsing failed: the file may be damaged or malicious
            return false;
        }
    }
    
    private boolean checkForJavaScript(PDDocument document) {
        // Check the document for JavaScript code
        // ...
        return false; // placeholder result
    }
    
    private boolean checkForExternalLinks(PDDocument document) {
        // Check for external links
        // ...
        return false; // placeholder result
    }
    
    private boolean checkForEmbeddedFiles(PDDocument document) throws IOException {
        // Check for embedded files
        PDDocumentCatalog catalog = document.getDocumentCatalog();
        PDDocumentNameDictionary names = catalog.getNames();
        if (names != null) {
            PDEmbeddedFilesNameTreeNode embeddedFiles = names.getEmbeddedFiles();
            if (embeddedFiles != null) {
                // getNames() covers this node only and is null when the entries sit in kid nodes,
                // so the presence of kids is conservatively treated as "has embedded files"
                Map<String, PDComplexFileSpecification> files = embeddedFiles.getNames();
                return (files != null && !files.isEmpty()) || embeddedFiles.getKids() != null;
            }
        }
        return false;
    }
    
    private boolean processPdfContent(byte[] content) {
        // Process the PDF content safely
        // ...
        return true; // processed successfully
    }
}
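
The header check in step 4 needs only the first five bytes, so it can run on the stream before the whole upload is copied into a byte array. The sketch below shows that variant with plain `java.io`; `PdfMagic` and `hasPdfHeader` are hypothetical names. It reads in a loop because a single `read` call is allowed to return fewer bytes than requested.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: checks the "%PDF-" signature at the start of a stream. */
final class PdfMagic {
    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Consumes up to five bytes from the stream and reports whether they equal "%PDF-". */
    static boolean hasPdfHeader(InputStream in) throws IOException {
        byte[] head = new byte[MAGIC.length];
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                return false; // stream ended before five bytes were available
            }
            read += n;
        }
        return Arrays.equals(head, MAGIC);
    }
}
```

With a `MultipartFile`, the stream would come from `file.getInputStream()`. The check consumes the bytes it reads, so open a fresh stream for the actual parsing. A matching header only shows that the file claims to be a PDF; the content scan is still needed.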

9.6 Layout and Pagination Problems

Problem: content breaks across pages incorrectly or at awkward places

Solution

public void createDocumentWithProperPagination(String outputPath) throws IOException, DocumentException {
    Document document = new Document(PageSize.A4, 50, 50, 50, 50);
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(outputPath));
    
    // Register the page event handler
    writer.setPageEvent(new PaginationHandler());
    
    document.open();
    
    // Fonts and paragraph spacing (Times Roman covers Latin text only)
    Font normalFont = new Font(Font.FontFamily.TIMES_ROMAN, 12);
    Font headingFont = new Font(Font.FontFamily.TIMES_ROMAN, 16, Font.BOLD);
    
    // Add the title
    Paragraph title = new Paragraph("Document Title", headingFont);
    title.setAlignment(Element.ALIGN_CENTER);
    title.setSpacingAfter(20);
    document.add(title);
    
    // Add the sections
    for (int i = 1; i <= 5; i++) {
        Paragraph heading = new Paragraph("Section " + i, headingFont);
        heading.setSpacingBefore(20);
        heading.setSpacingAfter(10);
        
        // Keep the heading's own lines on one page
        // (this alone does not tie the heading to the paragraph that follows)
        heading.setKeepTogether(true);
        
        document.add(heading);
        
        // Add the paragraphs
        for (int j = 1; j <= 3; j++) {
            Paragraph para = new Paragraph("This is paragraph " + j + " of section " + i + ". " +
                "This is sample text that shows the pagination. This is sample text that shows the pagination. " +
                "This is sample text that shows the pagination.", normalFont);
            
            para.setAlignment(Element.ALIGN_JUSTIFIED);
            para.setSpacingAfter(10);
            document.add(para);
        }
        
        // Add a table that should not be split across two pages
        if (i == 3) {
            PdfPTable table = new PdfPTable(3);
            table.setWidthPercentage(100);
            table.setKeepTogether(true); // keep the table on one page
            
            // Header row
            table.addCell(new PdfPCell(new Phrase("Column 1", headingFont)));
            table.addCell(new PdfPCell(new Phrase("Column 2", headingFont)));
            table.addCell(new PdfPCell(new Phrase("Column 3", headingFont)));
            
            // Data rows
            for (int k = 1; k <= 5; k++) {
                table.addCell("Data " + k + "-1");
                table.addCell("Data " + k + "-2");
                table.addCell("Data " + k + "-3");
            }
            
            table.setSpacingBefore(15);
            table.setSpacingAfter(15);
            document.add(table);
        }
    }
    
    document.close();
}

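// Page event handler that draws the header and the footer
private static class PaginationHandler extends PdfPageEventHelper {
    // Created once; BaseFont.createFont() returns Helvetica, which covers Latin text only
    private final BaseFont baseFont;
    
    PaginationHandler() {
        try {
            baseFont = BaseFont.createFont();
        } catch (DocumentException | IOException e) {
            // createFont() throws checked exceptions that onEndPage cannot declare
            throw new IllegalStateException("Cannot create the header/footer font", e);
        }
    }
    
    @Override
    public void onEndPage(PdfWriter writer, Document document) {
        PdfContentByte cb = writer.getDirectContent();
        
        // Horizontal center of the text area
        float x = (document.left() + document.right()) / 2;
        
        // Footer: page number, centered below the bottom margin
        String pageText = "Page " + writer.getPageNumber();
        cb.beginText();
        cb.setFontAndSize(baseFont, 10);
        cb.showTextAligned(PdfContentByte.ALIGN_CENTER, pageText, x, document.bottom() - 20, 0);
        cb.endText();
        
        // Header, if needed
        String headerText = "Document Title";
        cb.beginText();
        cb.setFontAndSize(baseFont, 10);
        cb.showTextAligned(PdfContentByte.ALIGN_CENTER, headerText, x, document.top() + 10, 0);
        cb.endText();
    }
}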

9.7 PDF Conversion Problems

Problem: how to convert HTML to PDF

Solution

public byte[] convertHtmlToPdf(String htmlContent) throws IOException, DocumentException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    
    // Uses iText 5 with XMLWorkerHelper (shipped in the separate xmlworker artifact)
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, baos);
    document.open();
    
    // Parse the XHTML into the document; the input must be well-formed
    XMLWorkerHelper.getInstance().parseXHtml(writer, document,
            new ByteArrayInputStream(htmlContent.getBytes(StandardCharsets.UTF_8)));
    
    document.close();
    
    return baos.toByteArray();
}

// More complex HTML to PDF conversion (using Flying Saucer)
@Service
public class HtmlToPdfService {

    public byte[] convertHtmlToPdfWithCss(String htmlContent, String baseUrl)
            throws IOException, DocumentException {
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        
        // Flying Saucer needs well-formed XHTML
        String xHtml = convertToXhtml(htmlContent);
        
        // Create the renderer
        ITextRenderer renderer = new ITextRenderer();
        
        // Register a font with Chinese glyphs; Identity-H fonts have to be embedded.
        // The CSS must also select this font through font-family.
        renderer.getFontResolver().addFont("fonts/simsun.ttc", 
                BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        
        // The base URL resolves relative resources (CSS, images, etc.)
        if (baseUrl != null) {
            renderer.setDocumentFromString(xHtml, baseUrl);
        } else {
            renderer.setDocumentFromString(xHtml);
        }
        
        // Lay out the document
        renderer.layout();
        
        // Render the PDF
        renderer.createPDF(outputStream);
        
        return outputStream.toByteArray();
    }
    
    private String convertToXhtml(String html) {
        // Convert ordinary HTML to XHTML with JTidy
        Tidy tidy = new Tidy();
        tidy.setInputEncoding("UTF-8");
        tidy.setOutputEncoding("UTF-8");
        tidy.setXHTML(true);
        
        ByteArrayInputStream inputStream = new ByteArrayInputStream(
                html.getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        
        tidy.parse(inputStream, outputStream);
        
        // new String(...) also works on Java 8; toString(Charset) needs Java 10 or later
        return new String(outputStream.toByteArray(), StandardCharsets.UTF_8);
    }
}

Problem: how to convert a PDF to images

Solution

public List<BufferedImage> convertPdfToImages(byte[] pdfData, int dpi) throws IOException {
    List<BufferedImage> images = new ArrayList<>();
    
    // Load the PDF; try-with-resources closes it once rendering is done
    try (PDDocument document = PDDocument.load(new ByteArrayInputStream(pdfData))) {
        // Create the PDF renderer
        PDFRenderer renderer = new PDFRenderer(document);
        
        // Render each page to an image
        for (int i = 0; i < document.getNumberOfPages(); i++) {
            // RGB mode at the requested DPI
            BufferedImage image = renderer.renderImageWithDPI(i, dpi, ImageType.RGB);
            images.add(image);
        }
    }
    
    return images;
}

// Save the pages as image files
public void savePdfPagesAsImages(byte[] pdfData, String outputDir, String format) 
        throws IOException {
    List<BufferedImage> images = convertPdfToImages(pdfData, 300);
    
    // Make sure the output directory exists
    File dir = new File(outputDir);
    if (!dir.exists() && !dir.mkdirs()) {
        throw new IOException("Cannot create output directory: " + outputDir);
    }
    
    // Write each page to its own image file
    for (int i = 0; i < images.size(); i++) {
        File outputFile = new File(dir, "page_" + (i + 1) + "." + format);
        if (!ImageIO.write(images.get(i), format, outputFile)) {
            throw new IOException("No ImageWriter available for format: " + format);
        }
    }
}
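
Holding every rendered page in one list gets expensive quickly at 300 DPI. A rough estimate makes the cost visible. The sketch below assumes 4 bytes per pixel (an int-backed RGB `BufferedImage`) and page sizes in PDF points; `RenderBudget` and its methods are hypothetical names used only for this calculation.

```java
/** Hypothetical helper: estimates the heap needed for a page rendered to a BufferedImage. */
final class RenderBudget {

    /** Converts a length in PDF points (72 per inch) to pixels at the given DPI. */
    static int pixels(float points, int dpi) {
        return Math.round(points / 72f * dpi);
    }

    /** Approximate bytes for one rendered page, assuming 4 bytes per pixel. */
    static long bytesForPage(float widthPt, float heightPt, int dpi) {
        return (long) pixels(widthPt, dpi) * pixels(heightPt, dpi) * 4L;
    }
}
```

An A4 page (595.28 x 841.89 pt) at 300 DPI comes out at 2480 x 3508 pixels, or about 35 MB, so a 100-page document kept in a list would need roughly 3.5 GB of heap. For long documents it is safer to write each image to disk right after rendering it and let it be collected, or to render at a lower DPI.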

9.8 Spring Boot Integration Problems

Problem: how to handle PDF generation failures gracefully in Spring Boot

Solution

@RestController
@RequestMapping("/api/pdf")
public class PdfController {

    private final PdfService pdfService;
    private final Logger logger = LoggerFactory.getLogger(PdfController.class);
    
    @Autowired
    public PdfController(PdfService pdfService) {
        this.pdfService = pdfService;
    }
    
    @GetMapping("/generate/{id}")
    public ResponseEntity<?> generatePdf(@PathVariable Long id) {
        try {
            // Try to generate the PDF
            byte[] pdfData = pdfService.generatePdf(id);
            
            // Set the response headers
            HttpHeaders headers = new HttpHeaders();
            headers.setContentType(MediaType.APPLICATION_PDF);
            String filename = "document-" + id + ".pdf";
            // Content-Disposition: attachment
            // (setContentDispositionFormData is meant for multipart form-data parts)
            headers.setContentDisposition(
                    ContentDisposition.builder("attachment").filename(filename).build());
            
            return new ResponseEntity<>(pdfData, headers, HttpStatus.OK);
            
        } catch (ResourceNotFoundException e) {
            // The resource does not exist
            logger.warn("PDF requested for a resource that does not exist: {}", id);
            return ResponseEntity.status(HttpStatus.NOT_FOUND)
                    .body(new ErrorResponse("Resource not found", e.getMessage()));
                    
        } catch (PdfGenerationException e) {
            // PDF generation error
            logger.error("PDF generation failed: {}", e.getMessage(), e);
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                    .body(new ErrorResponse("PDF generation error", e.getMessage()));
                    
        } catch (Exception e) {
            // Unexpected error
            logger.error("Unexpected error while handling the PDF request", e);
            return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
                    .body(new ErrorResponse("System error", "An error occurred while processing the request"));
        }
    }
    
    // Custom exception
    public static class PdfGenerationException extends RuntimeException {
        public PdfGenerationException(String message) {
            super(message);
        }
        
        public PdfGenerationException(String message, Throwable cause) {
            super(message, cause);
        }
    }
    
    // Error response DTO
    @Data
    @AllArgsConstructor
    public static class ErrorResponse {
        private String error;
        private String message;
    }
}

// Global exception handler
// (only reached for a PdfGenerationException that a controller does not catch itself)
@ControllerAdvice
public class GlobalExceptionHandler {

    private final Logger logger = LoggerFactory.getLogger(GlobalExceptionHandler.class);
    
    @ExceptionHandler(PdfController.PdfGenerationException.class)
    public ResponseEntity<PdfController.ErrorResponse> handlePdfGenerationException(
            PdfController.PdfGenerationException e) {
        
        logger.error("PDF generation exception caught by the global handler", e);
        
        return ResponseEntity
                .status(HttpStatus.INTERNAL_SERVER_ERROR)
                .body(new PdfController.ErrorResponse("PDF generation error", e.getMessage()));
    }
}
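
The solutions above should cover most of the common problems you will meet when handling PDFs in a Spring Boot application. More complex cases may call for combining several techniques, or for a dedicated PDF processing service.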

