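Computing Accuracy
Accuracy is computed on the transcribed text, by comparing the model's transcription with the reference text.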
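The source does not say which text-level metric is used. One common choice for Chinese transcripts is character accuracy, defined as 1 minus the character error rate (CER). The sketch below assumes that definition; the function names `edit_distance` and `char_accuracy` are hypothetical, and dropping spaces before comparison is an illustrative choice, not something the original specifies.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def char_accuracy(reference, hypothesis):
    """Return 1 - CER, clamped at 0, where CER = edit distance / reference length."""
    # Assumption: spaces carry no meaning for the comparison, so they are dropped.
    ref = list(reference.replace(" ", ""))
    hyp = list(hypothesis.replace(" ", ""))
    if not ref:
        return 1.0 if not hyp else 0.0
    cer = edit_distance(ref, hyp) / len(ref)
    return max(0.0, 1.0 - cer)
```

Calling `char_accuracy(reference_text, transcription)` on each utterance and averaging the results gives a corpus-level figure. For languages written with spaces, the same function can be applied to word lists to obtain a word-level score instead.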
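Training Loss
For training, the text is converted into tokens and the model is trained as a classifier over them: each target token is one class in the vocabulary.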
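Treating each target token as a class usually means a cross-entropy loss over the vocabulary at every position. The sketch below is written in plain Python to keep it self-contained. The function name `token_cross_entropy` is hypothetical, and the use of -100 to mark padded positions follows the PyTorch default `ignore_index`; neither is stated in the original.

```python
import math

IGNORE_INDEX = -100  # label value skipped by the loss (PyTorch's default ignore_index)


def token_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over positions whose label is not ignore_index.

    logits: list of per-position score lists, one score per vocabulary entry.
    labels: list of target token ids, same length as logits.
    """
    total, count = 0.0, 0
    for scores, target in zip(logits, labels):
        if target == ignore_index:
            continue  # padded position: contributes nothing to the loss
        # Numerically stable log-sum-exp: subtract the max before exponentiating.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]  # equals -log softmax(scores)[target]
        count += 1
    return total / count if count else 0.0
```

In a real training loop the same quantity would typically come from `torch.nn.functional.cross_entropy` applied to the decoder logits and the tokenized transcript.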
WhisperProcessor
WhisperProcessor is a utility class in HuggingFace Transformers for handling the inputs and outputs of the OpenAI Whisper model. It wraps WhisperFeatureExtractor and WhisperTokenizer, simplifying audio preprocessing, token encoding, and decoding.
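As a rough illustration of that wrapping, the sketch below loads a processor, reads back its two components, and runs a text round trip through the tokenizer. The function name `describe_processor` is hypothetical. The import is deferred into the function because calling it requires the `transformers` package and a model download.

```python
def describe_processor(model_name="openai/whisper-small"):
    """Load a WhisperProcessor and return its component class names plus a token round trip."""
    # Deferred import: only needed when the function is actually called.
    from transformers import WhisperProcessor

    processor = WhisperProcessor.from_pretrained(model_name)

    # The two wrapped components are exposed as attributes.
    feature_extractor = processor.feature_extractor  # audio -> log-mel input features
    tokenizer = processor.tokenizer                  # text <-> token ids

    # Text -> token ids -> text, dropping special tokens on the way back.
    ids = tokenizer("hello world").input_ids
    text = processor.decode(ids, skip_special_tokens=True)

    return type(feature_extractor).__name__, type(tokenizer).__name__, ids, text
```

Calling the processor on audio delegates to the feature extractor, while `decode` and `batch_decode` delegate to the tokenizer, which is why one object covers both ends of the pipeline.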
Speech Recognition Example
import torch
import torchaudio
from transformers import WhisperProcessor, WhisperForConditionalGeneration
# Choose a model (other options: "openai/whisper-base", "openai/whisper-medium", "openai/whisper-large")
model_name = "openai/whisper-small"
# Load the model and the processor
print("Loading model and processor...")
processor = WhisperProcessor.from_pretrained(model_name)
model = WhisperForConditionalGeneration.from_pretrained(model_name)
# Use the GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
# Load the audio file
audio_path = "example.wav"  # replace with your own file path
speech_array, sr = torchaudio.load(audio_path)  # shape: (channels, samples)
# Resample to 16 kHz (required by Whisper)
if sr != 16000:
    resampler = torchaudio.transforms.Resample(orig_freq=sr, new_freq=16000)
    speech_array = resampler(speech_array)
# Convert to a 1-D tensor (Whisper expects mono); averaging the channels also handles stereo input
speech = speech_array.mean(dim=0)
# Preprocess the audio
print("Extracting audio features...")
inputs = processor(speech.numpy(), sampling_rate=16000, return_tensors="pt")
input_features = inputs.input_features.to(device)
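# Add the language prompt (Chinese here; change language to "en"/"fr"/"ja" to recognize other languages)
forced_decoder_ids = processor.get_decoder_prompt_ids(language="zh", task="transcribe")
# Inference (the model outputs token ids)
# Note: in recent Transformers releases, generate() also accepts language="zh", task="transcribe" directly
print("Running inference...")
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)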
# Decode to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print("Transcription:", transcription)