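The frontend uses uni-app and the backend uses Spring Boot.
Speech recognition uses iFlytek's short speech recognition service; individual developers can get a free trial period.
A pitfall I ran into: the app is written in uni-app and compiled to a WeChat Mini Program, and the audio files it recorded simply would not pass iFlytek's recognition. After searching online I found that ffmpeg can convert the audio into a format that is recognized. I do this conversion on the backend; the frontend only uploads the recording. The FFmpeg command is: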
ffmpeg -y -i <source-file> -acodec pcm_s16le -f s16le -ac 1 -ar 16000 <output-file>
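To see what this command produces, it helps to work out the output size. The flags `-acodec pcm_s16le -ac 1 -ar 16000` ask for signed 16-bit little-endian samples, one channel, and 16,000 samples per second, and `-f s16le` writes them as raw data with no container header. The sketch below is only illustrative arithmetic: the class and method names are made up for this example, and the 60-second figure is taken from the countdown used in the frontend code.

```java
// Illustrative arithmetic only; PcmSizeEstimate is a hypothetical helper, not part of any SDK.
class PcmSizeEstimate {
    static final int SAMPLE_RATE_HZ = 16_000; // -ar 16000
    static final int CHANNELS = 1;            // -ac 1
    static final int BYTES_PER_SAMPLE = 2;    // pcm_s16le: signed 16-bit samples

    // Bytes of raw PCM produced for one second of audio.
    static int bytesPerSecond() {
        return SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE;
    }

    // Size in bytes of a raw PCM file holding the given number of seconds.
    static long rawPcmBytes(int seconds) {
        return (long) bytesPerSecond() * seconds;
    }

    public static void main(String[] args) {
        System.out.println(bytesPerSecond()); // 32000
        System.out.println(rawPcmBytes(60));  // 1920000
    }
}
```

A converted file whose size is far from 32,000 bytes per recorded second is a hint that the conversion did not do what was intended.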
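Frontend (the key part is the options object. Several formats are available; I first used pcm, which worked when running locally but failed on a real device, so I switched back to mp3 and let ffmpeg on the backend do the conversion):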
// Press-and-hold handler: start recording
longpressBtn() {
    this.longPress = '2';
    this.countdown(60); // start the 60-second countdown
    clearInterval(init) // clear the previous timer
    recorderManager.onStop((res) => {
        this.tempFilePath = res.tempFilePath;
        this.recordingTimer(this.time);
    })
    const options = {
        sampleRate: 16000, // sample rate; valid values include 8000/16000/44100
        numberOfChannels: 1, // number of recording channels; valid values are 1/2
        encodeBitRate: 96000, // encoding bit rate
        format: 'mp3',
    }
    this.recordingTimer();
    recorderManager.start(options);
    // Listen for the recording-start event
    recorderManager.onStart((res) => {})
},
// Release handler: stop recording and upload the result
touchendBtn() {
    this.longPress = '1';
    // onStop only registers a callback and returns nothing, so there is nothing to await
    recorderManager.onStop((res) => {
        this.tempFilePath = res.tempFilePath
        console.log(this.tempFilePath)
        this.uploadVoice(this.tempFilePath)
    })
    this.recordingTimer(this.time)
    recorderManager.stop()
},
// Upload the recording
uploadVoice(tempFilePath) {
    const token = getToken()
    if (tempFilePath != '') {
        uni.uploadFile({
            url: this.endSideUrl1 + "/chat/voice", // your server's upload endpoint
            header: {
                "token": token,
                'Content-Type': 'multipart/form-data; charset=UTF-8'
            },
            filePath: tempFilePath,
            name: 'audioFile', // must match the MultipartFile parameter name on the backend
            success: (res) => {
                // App-specific business logic follows
                const question = JSON.parse(res.data).data
                console.log(question)
                if (question == "") {
                    uni.showToast({
                        title: 'Could not hear you. Please say it again',
                        icon: 'none'
                    })
                } else {
                    this.qaList.push({
                        content: question,
                        className: "chatpdfRow chatpdfAsk"
                    })
                    this.scrollToBottom();
                    // Streamed answer
                    this.send(`{user:${token},message:${question}}`);
                    let item = ({
                        content: "",
                        className: "chatpdfRow"
                    })
                    this.qaList.push(item);
                    this.scrollToBottom();
                    this.tempItem = item;
                }
            },
            fail: (e) => {
                console.error("Voice upload failed", e)
            }
        });
    }
},
Backend code:
A utility class wraps the recognizer:
package com.farm.util;

import com.iflytek.cloud.speech.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class IatTool {
    private static final Logger LOGGER = LoggerFactory.getLogger(IatTool.class);
    private StringBuilder curRet;
    private SpeechRecognizer recognizer;
    private CountDownLatch latch;

    public IatTool(String appId) {
        LOGGER.info("------Speech Utility init iat------");
        SpeechUtility.createUtility(SpeechConstant.APPID + "=" + appId);
    }

    public String RecognizePcmfileByte(byte[] buffer) {
        curRet = new StringBuilder();
        latch = new CountDownLatch(1); // initialize the latch with a count of 1
        try {
            if (recognizer == null) {
                recognizer = SpeechRecognizer.createRecognizer();
                recognizer.setParameter(SpeechConstant.AUDIO_SOURCE, "-1");
                recognizer.setParameter(SpeechConstant.RESULT_TYPE, "plain");
            }
            recognizer.startListening(recListener);
            if (buffer == null || buffer.length == 0) {
                LOGGER.error("no audio available!");
                recognizer.cancel();
            } else {
                recognizer.writeAudio(buffer, 0, buffer.length);
                recognizer.stopListening();
                if (!latch.await(5000, TimeUnit.MILLISECONDS)) { // stop waiting after 5000 ms
                    LOGGER.warn("Recognition timed out.");
                }
                return curRet.toString();
            }
        } catch (Exception e) {
            LOGGER.error("Recognition failed", e);
        }
        return null;
    }

    private final RecognizerListener recListener = new RecognizerListener() {
        @Override
        public void onBeginOfSpeech() {
            LOGGER.info("onBeginOfSpeech enter");
        }

        @Override
        public void onEndOfSpeech() {
            LOGGER.info("onEndOfSpeech enter");
        }

        @Override
        public void onResult(RecognizerResult results, boolean isLast) {
            LOGGER.info("onResult enter");
            String text = results.getResultString();
            curRet.append(text);
            if (isLast) {
                latch.countDown(); // wake the waiting thread
            }
        }

        @Override
        public void onVolumeChanged(int volume) {
            LOGGER.info("onVolumeChanged volume=" + volume);
        }

        @Override
        public void onError(SpeechError error) {
            LOGGER.error("onError enter");
            if (null != error) {
                LOGGER.error("onError Code:" + error.getErrorCode() + "," + error.getErrorDescription(true));
            }
            latch.countDown(); // release the waiting thread so the caller does not sit out the full timeout
        }

        @Override
        public void onEvent(int eventType, int arg1, int agr2, String msg) {
            LOGGER.info("onEvent enter");
        }
    };
}
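The core idea of the utility class is to turn a callback-style API into a blocking call with a `CountDownLatch`. The sketch below isolates that pattern without any iFlytek classes: `FakeListener`, `recognizeBlocking`, and the worker thread that plays the role of the SDK are all hypothetical stand-ins. It also keeps the latch and the result buffer as local variables. In the class above they are instance fields, so if one `IatTool` is shared between requests, two overlapping calls could overwrite each other's state; treat that as something to check rather than a confirmed problem.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Minimal sketch of "block until the last callback arrives".
// FakeListener and recognizeBlocking are hypothetical names; no iFlytek classes are used.
class LatchDemo {
    interface FakeListener {
        void onResult(String text, boolean isLast);
        void onError(String message);
    }

    // Returns the concatenated text, or null on timeout or interruption.
    static String recognizeBlocking(String[] parts, long timeoutMs) {
        StringBuilder result = new StringBuilder();   // local, so each call has its own buffer
        CountDownLatch latch = new CountDownLatch(1); // local, so each call has its own latch
        FakeListener listener = new FakeListener() {
            @Override
            public void onResult(String text, boolean isLast) {
                result.append(text);
                if (isLast) {
                    latch.countDown();
                }
            }

            @Override
            public void onError(String message) {
                latch.countDown(); // release the waiter on failure as well
            }
        };
        // The worker thread stands in for the SDK delivering results asynchronously.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            worker.submit(() -> {
                for (int i = 0; i < parts.length; i++) {
                    listener.onResult(parts[i], i == parts.length - 1);
                }
            });
            boolean finished = latch.await(timeoutMs, TimeUnit.MILLISECONDS);
            return finished ? result.toString() : null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
            return null;
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(recognizeBlocking(new String[] {"hello ", "world"}, 5000)); // hello world
    }
}
```

Because `countDown()` happens-before a successful `await()`, the caller can safely read the buffer once the wait returns true.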
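Next, write the upload endpoint that the frontend calls. The main job here is to save the uploaded file: the conversion is done by an external tool, so as far as I can tell it cannot be done on a stream, and I write the file to disk first. The Java code then runs FFmpeg as a child process to convert it and writes the result into the same directory. The converted file could be given a .pcm suffix; I kept the original suffix and recognition still works, because -f s16le makes FFmpeg write raw PCM whatever the file name says. Once recognition is done, both files are deleted to free the disk space.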
IatTool iatTool = new IatTool("your-iFlytek-APPID");

@PostMapping(value = "/voice", produces = "application/json; charset=UTF-8")
public R<String> RecognizePcmfileByte(MultipartFile audioFile) {
    String question = "no idea";
    String originalFilename = audioFile.getOriginalFilename();
    // Extract the file extension; fall back to an empty suffix if the name is missing or has no dot
    int dot = originalFilename == null ? -1 : originalFilename.lastIndexOf(".");
    String suffix = dot < 0 ? "" : originalFilename.substring(dot);
    // Generate a new file name from a UUID
    String filename = UUID.randomUUID().toString() + suffix;
    // Create the storage directory (including parents) if it does not exist yet
    File dir = new File(pcmPath);
    if (!dir.exists()) {
        dir.mkdirs();
    }
    // Save the upload to a local file
    try {
        File oldFile = new File(pcmPath + filename);
        audioFile.transferTo(oldFile);
        String newFileName = UUID.randomUUID().toString() + suffix;
        String[] command = {
            "ffmpeg", "-loglevel", "quiet", "-y", "-i", pcmPath + filename, "-acodec", "pcm_s16le", "-f", "s16le", "-ac", "1", "-ar", "16000", pcmPath + newFileName
        };
        // Run the command and wait for FFmpeg to finish
        ProcessBuilder processBuilder = new ProcessBuilder(command);
        processBuilder.start().waitFor();
        File newFile = new File(pcmPath + newFileName);
        // Read the whole converted file; a single InputStream.read() call is not guaranteed to fill the buffer
        byte[] bytes = java.nio.file.Files.readAllBytes(newFile.toPath());
        question = iatTool.RecognizePcmfileByte(bytes);
        // Delete both files
        boolean delete = oldFile.delete();
        if (delete)
            log.info("First file deleted");
        boolean delete1 = newFile.delete();
        if (delete1)
            log.info("Second file deleted");
        return R.success(question);
    } catch (IOException e) {
        log.error("Failed to save or convert the uploaded audio", e);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag
        throw new RuntimeException(e);
    }
    return R.error("Unknown network error");
}
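The endpoint above does not look at FFmpeg's exit status, waits for it without a time limit, and leaves the temporary files behind if an exception is thrown partway through. One possible hardening is sketched below, as a suggestion rather than a drop-in replacement. `AudioConverter`, `buildCommand`, and `toPcm` are hypothetical names, the FFmpeg arguments are copied from the endpoint, and the sketch assumes `ffmpeg` is on the PATH and Java 8 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Sketch only: AudioConverter and its methods are hypothetical names.
class AudioConverter {
    // Builds the same FFmpeg argument list as the endpoint above.
    static List<String> buildCommand(String input, String output) {
        return Arrays.asList("ffmpeg", "-loglevel", "quiet", "-y", "-i", input,
                "-acodec", "pcm_s16le", "-f", "s16le", "-ac", "1", "-ar", "16000", output);
    }

    // Converts the input file to raw PCM bytes.
    // The temporary output file is removed even when the conversion fails.
    static byte[] toPcm(Path input, long timeoutSeconds) throws IOException, InterruptedException {
        Path output = Files.createTempFile("converted-", ".pcm");
        try {
            Process process = new ProcessBuilder(buildCommand(input.toString(), output.toString()))
                    .redirectErrorStream(true)                       // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.INHERIT) // avoid blocking on a full pipe
                    .start();
            if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                process.destroyForcibly();
                throw new IOException("ffmpeg did not finish within " + timeoutSeconds + " s");
            }
            if (process.exitValue() != 0) {
                throw new IOException("ffmpeg exited with code " + process.exitValue());
            }
            return Files.readAllBytes(output);
        } finally {
            Files.deleteIfExists(output);
        }
    }
}
```

With a helper like this, the controller would only need to save the upload, call `toPcm`, and delete the uploaded file in its own `finally` block.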