Spring AI with OpenAI-compatible VL models (qwen-vl / GPT): text recognition in Java

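Scenario: recognizing the text in PDF documents.
Approach: convert the PDF to images with PDFBox, run each page image through the LLM once, and return the combined content of all pages.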
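The per-page loop described above can be sketched with the JDK alone. This is a minimal sketch, not the author's implementation: it assumes the pages have already been rendered to BufferedImage objects (for example with PDFBox's PDFRenderer, not shown here), and the class name PageOcrPipeline and the recognizer function are hypothetical stand-ins for the actual model call.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import javax.imageio.ImageIO;

// Hypothetical helper: one model call per rendered page, results joined in page order.
final class PageOcrPipeline {

    /** Encodes one rendered page as PNG bytes, ready to be sent to a vision model. */
    static byte[] toPng(BufferedImage page) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            if (!ImageIO.write(page, "png", out)) {
                throw new IOException("no PNG writer available");
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Calls the recognizer once per page and joins the results, marking each page. */
    static String recognizeAll(List<BufferedImage> pages, Function<byte[], String> recognizer) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            String text = recognizer.apply(toPng(pages.get(i)));
            parts.add("<!-- page " + (i + 1) + " -->\n" + text);
        }
        return String.join("\n", parts);
    }
}
```

The recognizer is passed in as a function so that the same loop works with either of the call styles shown later (OpenAI-compatible or DashScope).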
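Solution:
1. Choose an LLM that can read text from images. I tested three: gpt-4.1-mini, qwen2.5-vl-72b-instruct (the model Bailian officially recommends for parsing document content from images), and qwen-vl-max-latest (which performed poorly on images containing a large amount of text).
The following is the officially recommended usage for qwen2.5-vl-72b-instruct: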

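Only the Qwen2.5-VL models can parse image-based documents (such as scans or image-only PDFs) into the QwenVL HTML format, which not only recognizes text accurately but also captures the positions of elements such as images and tables.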
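To make use of the position information, the returned HTML has to be parsed. The sketch below is only an illustration under a loudly stated assumption: that each element carries its box as an attribute of the form data-bbox="x1 y1 x2 y2". Check this against the actual model output before relying on it. The class QwenVlHtmlBlocks is hypothetical, and the regex does not handle an element nested inside another element of the same tag name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for QwenVL HTML output.
final class QwenVlHtmlBlocks {

    /** One recognized element: tag name, bounding box (x1, y1, x2, y2) and inner text. */
    static final class Block {
        final String tag;
        final int[] bbox;
        final String text;

        Block(String tag, int[] bbox, String text) {
            this.tag = tag;
            this.bbox = bbox;
            this.text = text;
        }
    }

    // ASSUMPTION: elements look like <p data-bbox="x1 y1 x2 y2">text</p>.
    private static final Pattern ELEMENT = Pattern.compile(
        "<(\\w+)[^>]*\\bdata-bbox=\"(\\d+) (\\d+) (\\d+) (\\d+)\"[^>]*>(.*?)</\\1>",
        Pattern.DOTALL);

    static List<Block> parse(String html) {
        List<Block> blocks = new ArrayList<>();
        Matcher m = ELEMENT.matcher(html);
        while (m.find()) {
            int[] bbox = new int[4];
            for (int i = 0; i < 4; i++) {
                bbox[i] = Integer.parseInt(m.group(i + 2));
            }
            // Drop any nested tags and trim, keeping only the visible text.
            String text = m.group(6).replaceAll("<[^>]+>", "").trim();
            blocks.add(new Block(m.group(1), bbox, text));
        }
        return blocks;
    }
}
```

For anything beyond a quick check, a real HTML parser is a safer choice than a regex.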

Prompt tip: you need to steer the model toward QwenVL HTML output in the prompt; otherwise the result is parsed as HTML-formatted text without position information:

Recommended system prompt: "You are an AI specialized in recognizing and extracting text
from images. Your mission is to analyze the image document and
generate the result in QwenVL Document Parser HTML format using
specified tags while maintaining user privacy and data integrity."

Recommended user prompt: "QwenVL HTML"
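To show where the two recommended prompts end up, here is a minimal sketch of the request body in the OpenAI chat-completions message format, built with plain string concatenation. It assumes the endpoint accepts that format; the class name QwenVlRequestBody and the example image URL are illustrative only, and in real code a JSON library would normally do the escaping.

```java
// Hypothetical builder for an OpenAI-style chat-completions request body.
final class QwenVlRequestBody {

    static final String SYSTEM_PROMPT =
        "You are an AI specialized in recognizing and extracting text from images. "
        + "Your mission is to analyze the image document and generate the result in "
        + "QwenVL Document Parser HTML format using specified tags while maintaining "
        + "user privacy and data integrity.";

    /** Minimal JSON string escaping for characters that can occur in prompts and URLs. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** System prompt first, then one user message holding the image and the text "QwenVL HTML". */
    static String build(String model, String imageUrl) {
        return "{\"model\":" + quote(model)
            + ",\"messages\":["
            + "{\"role\":\"system\",\"content\":" + quote(SYSTEM_PROMPT) + "},"
            + "{\"role\":\"user\",\"content\":["
            + "{\"type\":\"image_url\",\"image_url\":{\"url\":" + quote(imageUrl) + "}},"
            + "{\"type\":\"text\",\"text\":" + quote("QwenVL HTML") + "}"
            + "]}]}";
    }
}
```

Spring AI builds an equivalent structure from SystemMessage and UserMessage objects, so this is only useful for debugging or for calling the endpoint without a framework.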

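2. Decide how to call the model: through the OpenAI-compatible API, through DashScope, or some other way.
3. One thing to work out is how to pass multimedia files in. The Bailian platform accepts a URL, bytes, or a resource, and most providers support the same options.
If the file's URL is public, passing the URL directly is the simplest option.
If it is not public, you have to handle the file yourself: store it as a local file, then pass in its bytes or a resource. Because a temporary file is used at that point, you are free to do whatever you like with it. Here the file is first fetched from S3 and a temporary file is then created. Since a temporary file is created and written, it must be deleted in finally, and write access to the temporary location must be allowed (on Linux, for example, the temp directory needs read and write permission, which is normally granted by default). If you have no requirement on the temporary file's name and the input does not have to keep the file's original format, you can stop at ossClient.fileDownload(remoteFile.getUrl()); because that step already gives you the file, just with a .tmp suffix. If you need the original name or file type, you also have to create tempDir.resolve(sanitizedName);. Choose according to your use case.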

    // Note: a temporary file that you create must be deleted afterwards
    Path downloadedPath = null;
    try {
        downloadedPath = fileUtilUtil.getTempFullFilePath(chatModelDto.getInputOssId());
        // For a Path argument, Hutool's isBlankIfStr is effectively a null check
        if (!StrUtil.isBlankIfStr(downloadedPath)) {
            localFilePath = downloadedPath.toAbsolutePath().toString();
        } else {
            return "Failed to fetch the file, please contact the administrator";
        }
        // ... use localFilePath here, before finally removes the file ...
    } catch (Exception e) {
        log.error("Failed to process file with ossId {}: {}", chatModelDto.getInputOssId(), e.getMessage(), e);
    } finally {
        Path tempDir = null;
        if (null != downloadedPath) {
            tempDir = downloadedPath.getParent().toAbsolutePath();
        }

        if (null != tempDir) {
            // Recursively delete the directory, including all subdirectories and files
            Files.walkFileTree(tempDir, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                    // Delete the file
                    Files.delete(file);
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                    // Delete the now-empty directory
                    Files.delete(dir);
                    return FileVisitResult.CONTINUE;
                }
            });
            log.info("Temporary directory deleted!");
        }
    }
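Because the same cleanup appears again in the controller later on, it may be worth pulling the "create, use, always delete" pattern into one helper. The following is a sketch using only java.nio.file; TempWorkspace, withTempFile and deleteRecursively are hypothetical names, and the input stream stands in for whatever the storage client returns.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: a temp directory that is always removed after use.
final class TempWorkspace {

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path root) throws IOException {
        if (root == null || !Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts every child path before its parent directory.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /** Copies the stream into a fresh temp directory under fileName, runs the action, then cleans up. */
    static <T> T withTempFile(InputStream content, String fileName, Function<Path, T> action)
            throws IOException {
        Path dir = Files.createTempDirectory("oss_temp_");
        try {
            Path file = dir.resolve(fileName);
            Files.copy(content, file);
            return action.apply(file);
        } finally {
            deleteRecursively(dir);
        }
    }
}
```

With a helper like this the model call goes inside the action, so the file is guaranteed to exist while it is being read and to be gone afterwards. The fileName passed in should already be sanitized.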

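Alternatively, bypass the signature check and read the URL directly (not recommended in a development environment). To bypass it, you can ignore SSL certificates. Note that SslUtils.ignoreSsl() changes the JVM-wide HttpsURLConnection defaults, so it affects every HTTPS connection made that way. Usage:

    // Ignore SSL validation first, then create the UrlResource
    SslUtils.ignoreSsl();
    // signedPrivateDocumentUrl: the URL of the signed, non-public document
    UrlResource urlResource = new UrlResource(signedPrivateDocumentUrl);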
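For the byte route mentioned in step 3, OpenAI-style vision endpoints generally accept an image inline as a base64 data URL in place of a public URL. A minimal sketch, assuming the endpoint you call supports data URLs; ImageDataUrl is a hypothetical helper name.

```java
import java.util.Base64;

// Hypothetical helper: turns raw image bytes into an inline data URL.
final class ImageDataUrl {

    /** Builds "data:<mime>;base64,<payload>" from raw image bytes. */
    static String of(String mimeType, byte[] imageBytes) {
        return "data:" + mimeType + ";base64," + Base64.getEncoder().encodeToString(imageBytes);
    }
}
```

For example, ImageDataUrl.of("image/png", Files.readAllBytes(path)) produces a string that can be sent wherever an image URL is expected, which avoids both the public-URL requirement and the SSL workaround. Base64 increases the payload size by roughly a third, so provider size limits still apply.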

OpenAI

The cusChatClientBuilder used below is an already-built ChatClient; using either a ChatModel or a ChatClient works.
OpenAI + bypassing the signature + URL


    @PostMapping(value = "/pdfVlTest")
    public String pdfVlTest(@RequestBody AAA chatModelDto) throws Exception {
        // chatModel is assumed to be an injected ChatModel field (not shown in the original)
        OpenAiChatOptions chatOptions = OpenAiChatOptions.builder().model(chatModelDto.getModelName()).build();
        ChatClient chatClient = ChatClient.builder(chatModel).defaultOptions(chatOptions).defaultAdvisors(new SimpleLoggerAdvisor()).build();

        String mdUrl = chatModelDto.getUrl();
        SslUtils.ignoreSsl();
        UrlResource urlResource = new UrlResource(mdUrl);
        // If the Media type is not always IMAGE_JPEG, derive it from the file's attributes:
        // from the temporary file, from the file's own metadata, or via FileUtils.getMimeType(suffix),
        // where suffix is the file extension
        List<Media> media = List.of(new Media(MimeTypeUtils.IMAGE_JPEG, urlResource));
        UserMessage userMessage = UserMessage.builder().media(media).text(chatModelDto.getMessage()).build();

        SystemMessage systemMessage = SystemMessage.builder().text("You are an AI specialized in recognizing and extracting text from images. Your mission is to analyze the image document and generate the result in QwenVL Document Parser HTML format using specified tags while maintaining user privacy and data integrity.").build();

        // System message first, then the user message
        Prompt prompt = new Prompt(List.of(systemMessage, userMessage));
        return chatClient.prompt(prompt).call().content();
    }

OpenAI + local file handling


    /**
     * VL PDF test: local file + OpenAI
     */
    @PostMapping(value = "/pdfVlTest")
    public String pdfVlTest(@RequestBody AAA chatModelDto) throws IOException {

        String localFilePath = "";
        if (null != chatModelDto.getInputOssId()) {
            Path downloadedPath = null;
            try {
                downloadedPath = fileUtilUtil.getTempFullFilePath(chatModelDto.getInputOssId());
                if (!StrUtil.isBlankIfStr(downloadedPath)) {
                    localFilePath = downloadedPath.toAbsolutePath().toString();
                } else {
                    return "Failed to fetch the file, please contact the administrator";
                }

                ChatClient.Builder builder = cusChatClientBuilder.mutate()
                    .defaultOptions(OpenAiChatOptions.builder().streamUsage(true)
                        .model(chatModelDto.getModelName())
                        .build());
                ChatClient chatClient = builder.build();

                // Wrap the local file as a Resource
                File file = new File(localFilePath);
                Resource fileResource = new FileSystemResource(file);
                MediaType mediaType = MediaType.parseMediaType(getMimeType(localFilePath));
                // List.of() is immutable, so create the list with its element instead of calling add()
                List<Media> media = List.of(new Media(mediaType, fileResource));
                UserMessage userMessage = UserMessage.builder().media(media).text(chatModelDto.getMessage()).build();

                SystemMessage systemMessage = SystemMessage.builder().text("You are an AI specialized in recognizing and extracting text from images. Your mission is to analyze the image document and generate the result in QwenVL Document Parser HTML format using specified tags while maintaining user privacy and data integrity.").build();

                // System message first, then the user message
                Prompt prompt = new Prompt(List.of(systemMessage, userMessage));
                return chatClient.prompt(prompt).call().content();

            } catch (Exception e) {
                log.error("Failed to process file with ossId {}: {}", chatModelDto.getInputOssId(), e.getMessage(), e);

            } finally {
                Path tempDir = null;
                if (null != downloadedPath) {
                    tempDir = downloadedPath.getParent().toAbsolutePath();
                }

                if (null != tempDir) {
                    // Recursively delete the directory, including all subdirectories and files
                    Files.walkFileTree(tempDir, new SimpleFileVisitor<Path>() {
                        @Override
                        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                            // Delete the file
                            Files.delete(file);
                            return FileVisitResult.CONTINUE;
                        }

                        @Override
                        public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                            // Delete the now-empty directory
                            Files.delete(dir);
                            return FileVisitResult.CONTINUE;
                        }
                    });
                    log.info("Temporary directory deleted!");
                }
            }
        }

        // Reached when no ossId was given or when processing failed
        return "";
    }
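The controller above calls getMimeType(localFilePath) without showing it. One possible stand-in, assuming a lookup by file extension is good enough, is sketched below; the class name MimeTypes and the set of supported extensions are my own choices, not the author's.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the getMimeType helper that the original does not show.
final class MimeTypes {

    private static final Map<String, String> BY_SUFFIX = Map.of(
        "jpg", "image/jpeg",
        "jpeg", "image/jpeg",
        "png", "image/png",
        "gif", "image/gif",
        "webp", "image/webp",
        "bmp", "image/bmp",
        "pdf", "application/pdf");

    /** Maps a file name or path to a MIME type by its extension, case-insensitively. */
    static String getMimeType(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String suffix = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_SUFFIX.getOrDefault(suffix, "application/octet-stream");
    }
}
```

An explicit table is used here because Files.probeContentType depends on the platform and can return null. Whether the model accepts a given type (application/pdf, for instance) is a separate question that depends on the provider.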

DashScope

Configuration:

  ai:
    retry:
      max-attempts: 2
    dashscope:
      api-key: sk
    /**
     * dashVlTest: DashScope
     */
    @PostMapping(value = "/dashVlTest")
    public String dashVlTest() throws NoApiKeyException, UploadFileException {

        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage systemMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
            .content(Arrays.asList(
                Collections.singletonMap("text", "You are a helpful assistant."))).build();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
            .content(Arrays.asList(
                Collections.singletonMap("image", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"),
                Collections.singletonMap("text", "Please parse the text in the image and output it verbatim"))).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
            .apiKey("sk-")
            .model("qwen-vl-max-latest")
            .messages(Arrays.asList(systemMessage, userMessage))
            .build();
        MultiModalConversationResult result = conv.call(param);
        // Take the text of the first content part of the first choice
        String text = result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text").toString();
        System.out.println(text);
        return text;
    }
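The method above reads only the first content part and would fail with a NullPointerException if that part had no "text" entry. A slightly more defensive variant is sketched below; it assumes, as the call chain above suggests, that the message content is a list of maps keyed by "text", "image" and so on. DashScopeText is a hypothetical helper name.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical helper: collects all text parts of a multimodal message.
final class DashScopeText {

    /** Joins every "text" entry in the content parts, skipping parts without one. */
    static String joinText(List<Map<String, Object>> content) {
        if (content == null) {
            return "";
        }
        return content.stream()
            .map(part -> part.get("text"))
            .filter(Objects::nonNull)
            .map(Object::toString)
            .collect(Collectors.joining("\n"));
    }
}
```

It would be called as DashScopeText.joinText(result.getOutput().getChoices().get(0).getMessage().getContent()).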

The parsing results are quite good.
Ignoring SSL certificates

import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

/**
 * Disables HTTPS certificate and hostname checks for every HttpsURLConnection in the JVM.
 */
public class SslUtils {

    public static void trustAllHttpsCertificates() throws Exception {
        TrustManager[] trustAllCerts = new TrustManager[] { new miTM() };
        SSLContext sc = SSLContext.getInstance("TLS");
        sc.init(null, trustAllCerts, null);
        HttpsURLConnection.setDefaultSSLSocketFactory(sc.getSocketFactory());
    }

    /** Trust manager that accepts every certificate chain without checking it. */
    static class miTM implements X509TrustManager {
        @Override
        public X509Certificate[] getAcceptedIssuers() {
            // The interface contract asks for a non-null (possibly empty) array
            return new X509Certificate[0];
        }

        @Override
        public void checkServerTrusted(X509Certificate[] certs, String authType)
            throws CertificateException {
            // Accept everything
        }

        @Override
        public void checkClientTrusted(X509Certificate[] certs, String authType)
            throws CertificateException {
            // Accept everything
        }
    }

    /**
     * Ignores SSL certificates for HTTPS requests; must be called before openConnection.
     *
     * @throws Exception if the SSL context cannot be created or initialized
     */
    public static void ignoreSsl() throws Exception {
        // Accept any host name
        HostnameVerifier hv = (urlHostName, session) -> true;
        trustAllHttpsCertificates();
        HttpsURLConnection.setDefaultHostnameVerifier(hv);
    }
}
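SslUtils.ignoreSsl() replaces the JVM-wide defaults, so every later HttpsURLConnection in the process skips verification. If the bypass is only needed for one download, a narrower variant is to configure a single connection. This is a sketch of that alternative, not the author's approach; ScopedInsecureHttps is a hypothetical name, and it is just as unsafe for the one connection it touches.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical alternative: disable verification for one connection only.
final class ScopedInsecureHttps {

    private static final TrustManager[] TRUST_ALL = { new X509TrustManager() {
        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }

        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // Accept everything
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // Accept everything
        }
    } };

    /** Opens (but does not yet connect) an HTTPS connection with verification disabled. */
    static HttpsURLConnection open(String httpsUrl) throws IOException, GeneralSecurityException {
        SSLContext context = SSLContext.getInstance("TLS");
        context.init(null, TRUST_ALL, null);
        URLConnection raw = URI.create(httpsUrl).toURL().openConnection();
        if (!(raw instanceof HttpsURLConnection)) {
            throw new IllegalArgumentException("not an https URL: " + httpsUrl);
        }
        HttpsURLConnection conn = (HttpsURLConnection) raw;
        // Per-connection settings; the JVM-wide defaults stay untouched.
        conn.setSSLSocketFactory(context.getSocketFactory());
        conn.setHostnameVerifier((host, session) -> true);
        return conn;
    }
}
```

The bytes read from conn.getInputStream() can then be handed to the model as a byte-based resource instead of a UrlResource.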

File handling: obtaining the temporary file

    public Path getTempFullFilePath(Long ossId) {
        RemoteFile remoteFile = remoteFileService.getById(ossId);
        Path downloadedPath = null;
        Path targetPath = null;
        try {

            Path tempDir;
            try {
                tempDir = Files.createTempDirectory("oss_temp_");
            } catch (IOException e) {
                throw new RuntimeException("Unable to create the temporary directory", e);
            }
            // Step 1: download the original file; the result is its temporary local path
            downloadedPath = ossClient.fileDownload(remoteFile.getUrl());
            if (null == downloadedPath) {
                throw new RuntimeException("Download from Amazon S3 to the temporary directory returned a null downloadedPath");
            }
            // Step 2: make the original name safe to use as a local file name
            String originalName = remoteFile.getOriginalName();
            String sanitizedName = originalName.replaceAll("[\\\\/:*?\"<>|]", "_")  // characters illegal in file names
                .replaceAll("\\s+", "_")          // whitespace
                .replaceAll("\\.{2,}", ".");      // runs of dots

            targetPath = tempDir.resolve(sanitizedName);

            // Step 3: on a name clash, append a counter before the extension (name_1.pdf)
            int counter = 1;
            while (Files.exists(targetPath)) {
                String newName = String.format("%s_%d%s", getFileNameWithoutExtension(sanitizedName), counter++, getFileExtension(sanitizedName));
                targetPath = tempDir.resolve(newName);
            }

            // Step 4: copy the download to the target path under its original name
            Files.copy(downloadedPath, targetPath, StandardCopyOption.REPLACE_EXISTING);

        } catch (Exception e) {
            log.error("Failed to process file with ossId {}: {}", ossId, e.getMessage(), e);

        } finally {
            // The raw .tmp download is no longer needed once it has been copied
            if (null != downloadedPath) {
                FileUtils.del(downloadedPath);
            }
        }
        return targetPath;
    }

    // Returns the file name without its extension
    private static String getFileNameWithoutExtension(String fileName) {
        int dotIndex = fileName.lastIndexOf('.');
        return (dotIndex == -1) ? fileName : fileName.substring(0, dotIndex);
    }

    // Returns the file extension, including the leading dot
    private static String getFileExtension(String fileName) {
        int dotIndex = fileName.lastIndexOf('.');
        return (dotIndex == -1) ? "" : fileName.substring(dotIndex);
    }
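To see what the sanitizing and renaming steps do to a concrete name, the same three regular expressions and the extension split can be exercised in isolation. SafeFileNames is a hypothetical wrapper written for this illustration.

```java
// Hypothetical wrapper around the sanitizing rules used in getTempFullFilePath.
final class SafeFileNames {

    /** Applies the three replacements: illegal characters, whitespace, runs of dots. */
    static String sanitize(String originalName) {
        return originalName.replaceAll("[\\\\/:*?\"<>|]", "_")
            .replaceAll("\\s+", "_")
            .replaceAll("\\.{2,}", ".");
    }

    /** Inserts "_<counter>" before the extension, e.g. scan.pdf becomes scan_1.pdf. */
    static String numbered(String sanitizedName, int counter) {
        int dot = sanitizedName.lastIndexOf('.');
        String base = dot == -1 ? sanitizedName : sanitizedName.substring(0, dot);
        String ext = dot == -1 ? "" : sanitizedName.substring(dot);
        return base + "_" + counter + ext;
    }

    public static void main(String[] args) {
        System.out.println(sanitize("my report: v2..pdf"));   // my_report__v2.pdf
        System.out.println(sanitize("../../etc/passwd"));     // ._._etc_passwd
        System.out.println(numbered("my_report__v2.pdf", 1)); // my_report__v2_1.pdf
    }
}
```

The second example shows why the slash replacement matters: a name containing path separators can no longer escape the temporary directory when passed to tempDir.resolve.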
    /**
     * Downloads a file from Amazon S3 to the temporary directory.
     *
     * @param path the object key of the file in Amazon S3
     * @return the temporary local path of the downloaded file
     * @throws OssException if the download fails
     */
    public Path fileDownload(String path) {
        // Create the temporary file
        Path tempFilePath = FileUtils.createTempFile().toPath();
        // Download the file with S3TransferManager
        FileDownload downloadFile = transferManager.downloadFile(
            x -> x.getObjectRequest(
                    y -> y.bucket(properties.getBucketName())
                        .key(removeBaseUrl(path))
                        .build())
                .addTransferListener(LoggingTransferListener.create())
                .destination(tempFilePath)
                .build());
        // Wait for the download to complete
        downloadFile.completionFuture().join();
        return tempFilePath;
    }
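fileDownload relies on a removeBaseUrl helper that the original does not show. Purely as a guess at its intent, the sketch below turns a full object URL into an object key by dropping the scheme and host. It assumes virtual-hosted-style URLs, where the bucket is part of the host name; with path-style URLs the bucket name would also have to be stripped from the path. S3Keys is a hypothetical class name.

```java
import java.net.URI;

// Hypothetical stand-in for removeBaseUrl; the original implementation is not shown.
final class S3Keys {

    /** Returns the URL path without its leading slash, or the input if it is already a key. */
    static String removeBaseUrl(String urlOrKey) {
        if (!urlOrKey.contains("://")) {
            return urlOrKey; // already a bare object key
        }
        String path = URI.create(urlOrKey).getPath();
        return path.startsWith("/") ? path.substring(1) : path;
    }
}
```

URI.create rejects URLs that contain unencoded spaces or other illegal characters, so stored URLs are assumed to be properly encoded.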
