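Chinese Speech Recognition in Practice - EW帮帮网

Speech recognition, also known as automatic speech recognition (ASR), aims to convert the lexical content of human speech into text or into corresponding actions.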

Contents

PaddleSpeech
- Environment setup
- Code example
DeepSpeech
- Environment setup
- Code example
Voice_Translation
- Environment setup
- Code example

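Overall, Voice_Translation gave the best results and was the only one of the three to output punctuation (it also accepts uploaded audio in several formats). PaddleSpeech came second and produces no punctuation. DeepSpeech performed poorly on Chinese, possibly because it is aimed mainly at English.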
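The comparison above is qualitative. One common way to put a number on it is the character error rate (CER): the edit distance between a reference transcript and the recognized text, divided by the reference length. The sketch below is a minimal, standard-library illustration; the function names are hypothetical, and punctuation is stripped first so that a system that outputs punctuation is not penalized against one that does not.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    """Drop punctuation and whitespace so only spoken characters are compared."""
    return "".join(
        ch for ch in text
        if not unicodedata.category(ch).startswith("P") and not ch.isspace()
    )


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / reference length, after removing punctuation."""
    ref = strip_punctuation(reference)
    hyp = strip_punctuation(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)
```

Running each engine on the same recordings and averaging the CER per engine would give a comparable figure; a lower value is better.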

PaddleSpeech

PaddleSpeech project introduction: Speech Recognition and Speech Synthesis: Baidu PaddleSpeech

Environment setup

pip install paddlepaddle
pip install pytest-runner
pip install paddlespeech

Code example

PaddleSpeech cannot read AMR files directly, so the audio is converted to WAV first.

import os
import subprocess

from paddlespeech.cli.asr.infer import ASRExecutor

asr = ASRExecutor()

def convert_amr_to_wav(input_amr, output_wav, sample_rate=16000):
    """Convert an AMR file to WAV with ffmpeg; return True on success."""
    try:
        # -y overwrites an existing output file instead of prompting
        subprocess.run(['ffmpeg', '-y', '-i', input_amr, '-ar', str(sample_rate), output_wav], check=True)
        print(f"Converted {input_amr} to {output_wav}")
        return True
    except subprocess.CalledProcessError as e:
        print(f"Error during conversion: {e}")
    except FileNotFoundError:
        print("ffmpeg is not installed or not found in system PATH")
    return False

def download_and_convert_amr(url, output_wav):
    """Download an AMR file and convert it to WAV; return the WAV path or None."""
    audio_file_amr = url.split("/")[-1]
    try:
        # Download the AMR file (argument list, so the URL never passes through a shell)
        subprocess.run(['wget', url, '-O', audio_file_amr], check=True)

        # Convert AMR to WAV and stop if the conversion failed
        if not convert_amr_to_wav(audio_file_amr, output_wav):
            return None
        return output_wav
    except Exception as e:
        print(f"Error downloading or converting file: {e}")
        return None
    finally:
        # Delete the temporary AMR file
        if os.path.exists(audio_file_amr):
            os.remove(audio_file_amr)


# URL of the input audio file (AMR format)
audio_file = "https://test/07f3826c0ac77b7d74124c505fa1c35a.amr"
# Output path
output_file = "/root/Desktop/PaddleSpeech/audio.wav"
# Download the AMR file and convert it
wav_file = download_and_convert_amr(audio_file, output_file)

# Run inference on the converted WAV file
if wav_file:
    result = asr(audio_file=wav_file)
    print(f"Recognition result for audio: {result}")
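Before passing the converted file to a recognizer, it can help to confirm that ffmpeg really produced the format requested above (16 kHz) and that the file is mono 16-bit PCM, which is what the DeepSpeech example later in this article reads. The following is a small sketch using only the standard library; the helper names are hypothetical and not part of PaddleSpeech.

```python
import wave


def describe_wav(path):
    """Return the basic PCM parameters of a WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def is_16k_mono_pcm16(path):
    """True if the file is mono, 16 kHz, 16-bit PCM."""
    info = describe_wav(path)
    return (info["channels"] == 1
            and info["sample_rate"] == 16000
            and info["sample_width_bytes"] == 2)
```

For example, `is_16k_mono_pcm16("/root/Desktop/PaddleSpeech/audio.wav")` could be checked right after the conversion step.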

DeepSpeech

DeepSpeech project introduction: DeepSpeech: Theory and Practice

Environment setup

Download the required weights: deepspeech-0.9.3-models-zh-CN.pbmm and deepspeech-0.9.3-models-zh-CN.scorer

Code example

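import sys
import wave

import deepspeech
import numpy as np

# Load the acoustic model and the external scorer
model = deepspeech.Model('/root/Desktop/PaddleSpeech/deepspeech-0.9.3-models-zh-CN.pbmm')
model.enableExternalScorer('/root/Desktop/PaddleSpeech/deepspeech-0.9.3-models-zh-CN.scorer')

# Read only the PCM samples; the WAV header must not be passed to the model
with wave.open("/root/Desktop/PaddleSpeech/audio.wav", "rb") as wav:
    if wav.getframerate() != model.sampleRate():
        print(f"Warning: audio is {wav.getframerate()} Hz, model expects {model.sampleRate()} Hz")
    audio = np.frombuffer(wav.readframes(wav.getnframes()), np.int16)

# Run recognition
text = model.stt(audio)
try:
    print(text)
except UnicodeEncodeError:
    # The console encoding cannot represent the text; write UTF-8 bytes directly
    sys.stdout.buffer.write(text.encode('utf-8', errors='replace') + b'\n')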
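A WAV file starts with a RIFF header (at least 44 bytes, more when extra metadata chunks are present), so the bytes of the file are not the same thing as its audio samples. The sketch below, standard library only and with hypothetical helper names, writes one second of silence and compares the number of 16-bit values obtained from the whole file with the number of real samples in the data chunk.

```python
import wave


def write_silence(path, seconds=1, sample_rate=16000):
    """Write mono 16-bit PCM silence to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * (seconds * sample_rate))


def count_samples(path):
    """Return (16-bit values in the whole file, 16-bit samples in the audio data)."""
    with open(path, "rb") as f:
        raw_samples = len(f.read()) // 2
    with wave.open(path, "rb") as wav:
        frame_samples = wav.getnframes() * wav.getnchannels()
    return raw_samples, frame_samples
```

The difference between the two counts is the header interpreted as audio, which a recognizer would receive as a short burst of noise at the start of the signal.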

Voice_Translation

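Voice_Translation project introduction: A Deep-Learning Chinese Punctuation Prediction Model: Chinese Punctuation Restoration (Transformer Model) [Open Source]

Environment setup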

pip install funasr

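Set up the required weights: Voice_translation_model.pt, Endpoint_detection_model.pt, and Ct_punc_model.pt.
Note: remove the prefix from each file name (so that each file is named model.pt) and place it in the matching directory.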
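The renaming step can be scripted. The sketch below assumes the three downloaded files sit together in one folder and that "removing the prefix" means turning, for example, Voice_translation_model.pt into Voice_translation/model.pt, which matches the directory structure given in this section. The function name and both directory arguments are hypothetical.

```python
import shutil
from pathlib import Path

SUFFIX = "_model.pt"


def place_weights(download_dir, target_dir):
    """Move each <Prefix>_model.pt into <target_dir>/<Prefix>/model.pt."""
    moved = []
    for src in sorted(Path(download_dir).glob(f"*{SUFFIX}")):
        # "Ct_punc_model.pt" -> directory name "Ct_punc"
        model_dir = Path(target_dir) / src.name[: -len(SUFFIX)]
        model_dir.mkdir(parents=True, exist_ok=True)
        dest = model_dir / "model.pt"
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

The other files in each directory (config.yaml, tokens.json, and so on) still have to be put in place separately.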

Directory structure:

│
├─Ct_punc
│      config.yaml
│      configuration.json
│      model.pt
│      tokens.json
│
├─Endpoint_detection
│      am.mvn
│      config.yaml
│      configuration.json
│      model.pt
│
├─Voice_translation
│      am.mvn
│      config.yaml
│      configuration.json
│      model.pt
│      seg_dict
│      tokens.json
└─voice_translation_test.py
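Since a missing file in any of the three model directories typically shows up only as a load-time error, a quick layout check before starting the service can save time. The sketch below lists whatever is missing relative to the tree above; the function name and the `root` argument are hypothetical.

```python
from pathlib import Path

# File names copied from the directory structure above
EXPECTED_FILES = {
    "Ct_punc": ["config.yaml", "configuration.json", "model.pt", "tokens.json"],
    "Endpoint_detection": ["am.mvn", "config.yaml", "configuration.json", "model.pt"],
    "Voice_translation": ["am.mvn", "config.yaml", "configuration.json",
                          "model.pt", "seg_dict", "tokens.json"],
}


def missing_model_files(root):
    """Return the relative paths of expected files that do not exist under root."""
    root = Path(root)
    return [
        f"{directory}/{name}"
        for directory, names in EXPECTED_FILES.items()
        for name in names
        if not (root / directory / name).is_file()
    ]
```

An empty list means the layout matches; otherwise the returned paths name the files still to be added.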

Code example

voice_translation_test.py

import os
import warnings

from flask import Flask, request, jsonify
from flask_cors import CORS
from funasr import AutoModel

warnings.simplefilter(action='ignore')

app = Flask(__name__)
CORS(app)  # allow cross-origin requests on all routes

current_dir = "/root/Desktop/PaddleSpeech/voice_translation"

# Load the recognition, endpoint-detection (VAD) and punctuation models once at
# startup instead of on every request
model = AutoModel(model=os.path.join(current_dir, "Voice_translation"), model_revision="v2.0.4",
                  vad_model=os.path.join(current_dir, "Endpoint_detection"), vad_model_revision="v2.0.4",
                  punc_model=os.path.join(current_dir, "Ct_punc"), punc_model_revision="v2.0.4",
                  disable_update=True)

def voice_translation(audio):
    """Transcribe the given audio input and return the punctuated text."""
    res = model.generate(input=audio,
                         batch_size_s=300,
                         hotword='test')
    return res[0]['text']


@app.route('/speech_recognition', methods=['GET', 'POST'])
def submit():
    try:
        # Extract the parameter from the JSON request body
        data = request.get_json(silent=True) or {}
        audiourl = data.get('audioUrl')
        if not audiourl:
            return jsonify({"code": 400, "message": "Missing 'audioUrl'.", "data": None}), 200
        result_text = voice_translation(audiourl)
        return jsonify({"code": 200, "message": "Successfully.", "data": result_text}), 200
    except Exception as e:
        # Report any failure in the response body
        return jsonify({"code": 500, "message": f"An error: {e}.", "data": None}), 200

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=20000, debug=False)  # reachable from the local network
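To call the service, a client sends a JSON body with an `audioUrl` field to `/speech_recognition` and reads the `code`, `message` and `data` fields of the reply. The sketch below uses only the standard library; the function names, the base URL and the timeout value are illustrative assumptions, not part of the original project.

```python
import json
import urllib.request


def build_request(base_url, audio_url):
    """Build the POST request expected by the /speech_recognition route."""
    body = json.dumps({"audioUrl": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/speech_recognition",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def recognize(base_url, audio_url, timeout=120):
    """Send the request and return the recognized text, or raise on an error code."""
    with urllib.request.urlopen(build_request(base_url, audio_url), timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    # The service reports errors in the "code" field of the body, not in the HTTP status
    if payload.get("code") != 200:
        raise RuntimeError(payload.get("message"))
    return payload["data"]
```

With the server running on the same machine, a call would look like `recognize("http://127.0.0.1:20000", "<audio URL>")`.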
