FunASR Speech Recognition: Environment Setup and Inference (Part 1)

Published: 2025-02-10

Table of Contents

I. Environment Setup

1. Create a virtual environment

2. Install the environment and PyTorch

Official site: PyTorch download page

3. Before installing FunASR, make sure the dependencies below are installed

Calling from Python code (recommended)

4. Model download

5. Start the FunASR service

II. Server Deployment and Client Connection

2.1 HTML connection

III. Inference with Recognition Models

1. Real-time (streaming) speech recognition

2. Non-real-time (offline) speech recognition


I. Environment Setup

Source code: FunASR

FunASR/README_zh.md at main · alibaba-damo-academy/FunASR · GitHub

1. Create a virtual environment
conda create -n funasr python==3.9 -y

conda activate funasr
2. Install the environment and PyTorch
Official site: PyTorch download page

Install FunASR from source in editable mode (run inside the cloned FunASR repository):

pip3 install -e ./ -i https://pypi.tuna.tsinghua.edu.cn/simple

Install PyTorch either via conda or via pip (pick one):

conda install pytorch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 pytorch-cuda=12.4 -c pytorch -c nvidia -y

pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu124
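After installation, a quick check such as the following (a minimal sketch; the exact version string depends on the build you chose) confirms that the CUDA-enabled PyTorch was picked up:

import torch

print(torch.__version__)          # expect something like 2.5.0+cu124
print(torch.cuda.is_available())  # True if the CUDA 12.4 build can see a GPU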
3. Before installing FunASR, make sure the dependency environment below is installed. FunASR itself can be installed with:
pip3 install -U funasr -i https://pypi.tuna.tsinghua.edu.cn/simple

Alternatively, create a requirements.txt with the following contents and install the dependencies from it:

touch requirements.txt

# Ultralytics requirements
# Usage: pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
 
# Base ----------------------------------------
matplotlib>=3.2.2
numpy>=1.22.2 # pinned by Snyk to avoid a vulnerability
opencv-python>=4.6.0
pillow>=7.1.2
pyyaml>=5.3.1
requests>=2.23.0
scipy>=1.4.1
torch>=1.7.0
torchvision>=0.8.1
tqdm>=4.64.0
 
# Logging -------------------------------------
# tensorboard>=2.13.0
# dvclive>=2.12.0
# clearml
# comet
 
# Plotting ------------------------------------
pandas>=1.1.4
seaborn>=0.11.0
 
# Export --------------------------------------
# coremltools>=7.0.b1  # CoreML export
# onnx>=1.12.0  # ONNX export
# onnxsim>=0.4.1  # ONNX simplifier
# nvidia-pyindex  # TensorRT export
# nvidia-tensorrt  # TensorRT export
# scikit-learn==0.19.2  # CoreML quantization
# tensorflow>=2.4.1  # TF exports (-cpu, -aarch64, -macos)
# tflite-support
# tensorflowjs>=3.9.0  # TF.js export
# openvino-dev>=2023.0  # OpenVINO export
 
# Extras --------------------------------------
psutil  # system utilization
py-cpuinfo  # display CPU info
# thop>=0.1.1  # FLOPs computation
# ipython  # interactive notebook
# albumentations>=1.0.3  # training augmentations
# pycocotools>=2.0.6  # COCO mAP
# roboflow
torchaudio

pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

pip install torchaudio -i https://pypi.tuna.tsinghua.edu.cn/simple
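To confirm the install, importing the package is enough (a minimal sketch; the version attribute is an assumption and falls back gracefully if absent):

import funasr

print(getattr(funasr, "__version__", "unknown"))  # prints the installed FunASR version if exposed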

I recorded a short audio clip myself to test it, and the result was quite good:

funasr ++model=paraformer-zh ++vad_model="fsmn-vad" ++punc_model="ct-punc" ++input=/home/sxj/FunASR/outputs/c.wav

Models are saved to:

/home/sxj/.cache/modelscope/hub/iic

Calling from Python code (recommended)
from funasr import AutoModel

model = AutoModel(model="paraformer-zh")

res = model.generate(input="https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/test_audio/vad_example.wav")
print(res)
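The CLI command above (paraformer-zh plus fsmn-vad and ct-punc) can also be expressed through AutoModel. This is a sketch using the same model names as earlier, with the local wav path taken from the earlier test:

from funasr import AutoModel

# ASR model plus VAD segmentation and punctuation restoration,
# mirroring the "funasr ++model=... ++vad_model=... ++punc_model=..." command
model = AutoModel(model="paraformer-zh", vad_model="fsmn-vad", punc_model="ct-punc")

res = model.generate(input="/home/sxj/FunASR/outputs/c.wav")
print(res)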
4. Model download

Real-time speech recognition models: FunASR speech recognition model download

Test audio (Chinese, English)

5. Start the FunASR service
cd runtime

bash run_server_2pass.sh

Once started successfully, the service listens on port 10095.
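A quick way to confirm the service came up (a minimal sketch, assuming the server runs on the same machine) is to open a TCP connection to port 10095:

import socket

# raises an exception if nothing is listening on 10095
with socket.create_connection(("127.0.0.1", 10095), timeout=3):
    print("funasr server is listening on 10095")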

II. Server Deployment and Client Connection

1. Go to the WebSocket demo directory:

cd /home/sxj/FunASR/runtime/python/websocket

2. Server side

Install the dependencies first:

pip install -r requirements_client.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

python funasr_wss_server.py

3. Client side (a minimal Python WebSocket sketch follows after this list)

python funasr_wss_client.py

4. Run the HTML5 page: /home/sxj/FunASR/runtime/html5/static

The funasr_samples folder contains several kinds of client connections; HTML and Python are used as examples here.
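Besides funasr_wss_client.py, a stripped-down offline-mode client can be written with the websockets package. The sketch below assumes the handshake used by FunASR's WebSocket protocol (a JSON config frame, then raw 16 kHz PCM bytes, then an is_speaking=false frame); the field names and flow are assumptions based on that protocol document, so treat it as illustrative only:

import asyncio
import json

import soundfile
import websockets


async def recognize(wav_path, uri="ws://127.0.0.1:10095"):
    # read the audio as 16-bit PCM samples
    speech, sample_rate = soundfile.read(wav_path, dtype="int16")
    async with websockets.connect(uri) as ws:
        # opening frame: mode and audio metadata (field names per the protocol doc, assumed)
        await ws.send(json.dumps({
            "mode": "offline",
            "wav_name": "demo",
            "wav_format": "pcm",
            "audio_fs": sample_rate,
            "chunk_size": [5, 10, 5],
            "is_speaking": True,
        }))
        await ws.send(speech.tobytes())                    # raw PCM payload
        await ws.send(json.dumps({"is_speaking": False}))  # signal end of audio
        print(await ws.recv())                             # recognition result as JSON text


asyncio.run(recognize("/home/sxj/FunASR/outputs/c.wav"))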

2.1 HTML connection

Open file:///home/sxj/FunASR/web-pages/public/static/online/index.html in a browser to run index.html.

Change the ASR server address to ws://127.0.0.1:10095 and click Connect to test; if the connection fails, change the port to 10096.

cd /home/sxj/FunASR/runtime/html5


 

python3 h5Server.py

Deployment and development documentation

The deployed models come from ModelScope or from user fine-tuning; user-customized services are supported. For detailed documentation, see the reference (click here).

III. Inference with Recognition Models

Quick start

funasr ++model=paraformer-zh ++vad_model="fsmn-vad" ++punc_model="ct-punc" ++input=asr_example_zh.wav

1. Real-time (streaming) speech recognition
from funasr import AutoModel

chunk_size = [0, 10, 5] #[0, 10, 5] 600ms, [0, 8, 4] 480ms
encoder_chunk_look_back = 4 #number of chunks to lookback for encoder self-attention
decoder_chunk_look_back = 1 #number of encoder chunks to lookback for decoder cross-attention

model = AutoModel(model="paraformer-zh-streaming")

import soundfile
import os

wav_file = os.path.join(model.model_path, "example/asr_example.wav")
speech, sample_rate = soundfile.read(wav_file)
chunk_stride = chunk_size[1] * 960 # 600ms

cache = {}
total_chunk_num = int((len(speech) - 1) / chunk_stride + 1)
for i in range(total_chunk_num):
    speech_chunk = speech[i*chunk_stride:(i+1)*chunk_stride]
    is_final = i == total_chunk_num - 1
    res = model.generate(input=speech_chunk, cache=cache, is_final=is_final, chunk_size=chunk_size, encoder_chunk_look_back=encoder_chunk_look_back, decoder_chunk_look_back=decoder_chunk_look_back)
    print(res)

Note: chunk_size is the streaming latency configuration. [0, 10, 5] means text is emitted to the screen at a granularity of 10 * 60 = 600 ms, with 5 * 60 = 300 ms of look-ahead (future) information. Each inference step takes 600 ms of input (16000 * 0.6 = 9600 samples) and outputs the corresponding text; for the last audio chunk, is_final=True must be set to force the final words to be output.

2. Non-real-time (offline) speech recognition
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model_dir = "iic/SenseVoiceSmall"

model = AutoModel(
    model=model_dir,
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)

# en
res = model.generate(
    input=f"{model.model_path}/example/en.mp3",
    cache={},
    language="auto",  # "zn", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,
    batch_size_s=60,
    merge_vad=True,  # merge the short segments produced by VAD
    merge_length_s=15,
)
text = rich_transcription_postprocess(res[0]["text"])
print(text)

Parameter descriptions:

  • model_dir: model name, or path to a model on the local disk.
  • vad_model: enables VAD. VAD splits long audio into short segments; the reported inference time then covers both VAD and SenseVoice, i.e. the whole pipeline. To measure the SenseVoice model on its own, disable the VAD model (see the sketch after this list).
  • vad_kwargs: VAD model configuration; max_single_segment_time is the maximum audio length the vad_model will cut into one segment, in milliseconds (ms).
  • use_itn: whether the output includes punctuation and inverse text normalization.
  • batch_size_s: dynamic batching; the total audio duration in one batch, in seconds (s).
  • merge_vad: whether to merge the short audio fragments cut by the VAD model; the merged length is merge_length_s, in seconds (s).
  • ban_emo_unk: disables the emo_unk tag; when it is disabled, every sentence is assigned an emotion tag.
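As mentioned in the vad_model item, to time the SenseVoice model on its own you can construct the model without the VAD arguments. A minimal sketch under that assumption (suitable for short inputs that fit in a single segment), reusing the same API calls as the example above:

from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

# no vad_model / vad_kwargs: the input is fed to SenseVoice directly,
# so the measured time covers only the SenseVoice model itself
model = AutoModel(model="iic/SenseVoiceSmall", device="cuda:0")

res = model.generate(
    input=f"{model.model_path}/example/en.mp3",
    cache={},
    language="auto",
    use_itn=True,
)
print(rich_transcription_postprocess(res[0]["text"]))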

To be continued...

Reference: FunASR/README_zh.md at main · modelscope/FunASR · GitHub

