Building and Running the End-to-End Speech Recognition Toolkit WeNet

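I. Overview

WeNet, an end-to-end speech recognition toolkit, is reported to perform well, but my attempt to test it with Docker on a test machine did not succeed. Building it from source also ran into a few problems, so I am recording them here for future reference.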

II. Install libtorch

brew install libtorch

III. Download the WeNet source code

# Current directory: /your/folder
git clone https://github.com/wenet-e2e/wenet wenet-e2e/wenet

IV. Build

# Current directory: /your/folder
cd ./wenet-e2e/wenet/runtime/server/x86
mkdir build && cd build && cmake .. && cmake --build .

V. Download the WenetSpeech pretrained model

How to download it is explained in this article, so it is not repeated here.

VI. Testing

Prepare a Mandarin audio clip with a 16 kHz sample rate and 16 bits per sample (s16le).
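To confirm that a file matches this format before passing it to the decoder, the header fields can be read directly. The following is a minimal sketch and not part of WeNet: `wav_info` is a hypothetical helper name, and it assumes a canonical PCM WAV header (the `fmt ` chunk immediately after the RIFF header) and a little-endian host such as x86_64 or Apple Silicon.

```shell
# wav_info <file.wav>
# Prints "<sample_rate> <bits_per_sample>" read from the WAV header.
# Assumption: canonical PCM layout, where the sample rate is a 4-byte
# integer at offset 24 and bits-per-sample is a 2-byte integer at offset 34.
# Assumption: little-endian host, since od decodes in host byte order.
wav_info() (
  rate=$(od -An -t u4 -j 24 -N 4 "$1" | tr -d '[:space:]')
  bits=$(od -An -t u2 -j 34 -N 2 "$1" | tr -d '[:space:]')
  echo "$rate $bits"
)
```

For a file in the expected format, `wav_info /your/folder/TestASR-01.wav` should print `16000 16`.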

1. Test `decoder_main`

# Current directory: /your/folder/wenet-e2e/wenet/runtime/server/x86/build
export GLOG_logtostderr=1
export GLOG_v=2
time ./decoder_main \
--chunk_size -1 \
--model_path \
/your/folder/SpeechColab/Leaderboard/models/wenet_wenetspeech/assets/final.zip \
--dict_path \
/your/folder/SpeechColab/Leaderboard/models/wenet_wenetspeech/assets/words.txt \
--wav_path \
/your/folder/TestASR-01.wav

The audio is about 2 minutes long. Run time on the test machine:

15.71s user 4.52s system 97% cpu 20.799 total
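A common way to summarize a timing like this is the real-time factor (RTF), the processing time divided by the audio duration. The sketch below is only illustrative: `rtf` is a hypothetical helper, and the 120-second duration is an assumption standing in for "about 2 minutes".

```shell
# rtf <processing_seconds> <audio_seconds>
# Prints the real-time factor with three decimal places.
rtf() {
  LC_ALL=C awk -v t="$1" -v d="$2" 'BEGIN { printf "%.3f\n", t / d }'
}

# 20.799 s wall-clock time, assumed 120 s of audio
rtf 20.799 120   # prints 0.173
```

An RTF below 1 means decoding runs faster than real time. With these numbers it is roughly 0.17.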

2. Test `websocket_server_main`

First, start the WebSocket service:

# Current directory: /your/folder/wenet-e2e/wenet/runtime/server/x86/build
export GLOG_logtostderr=1
export GLOG_v=2
./websocket_server_main \
--chunk_size 16 \
--model_path \
/your/folder/SpeechColab/Leaderboard/models/wenet_wenetspeech/assets/final.zip \
--dict_path \
/your/folder/SpeechColab/Leaderboard/models/wenet_wenetspeech/assets/words.txt

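Then open /your/folder/wenet-e2e/wenet/runtime/server/x86/web/templates/index.html in a browser and enter ws://127.0.0.1:10086 in the WebSocket URL input box.

(Figure 1)

Click the 开始录音 (Start Recording) button to start recording, then click 停止录音 (Stop Recording) to get the recognized text.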

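VII. Problems

Test environment: Apple M1, macOS 12.0.1, Xcode 13.1. I also tested under Docker, where it runs but is not very stable, possibly because the image is built for x86_64. Windows and Linux have not been tested yet.

1. Failure to download third-party libraries

If a third-party library fails to download during cmake .., you can download it some other way and place it in the corresponding directory.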

-- Downloading...
   dst='/your/folder/wenet-e2e/wenet/runtime/server/x86/fc_base/gflags-subbuild/gflags-populate-prefix/src/v2.2.1.zip'
   timeout='none'
   inactivity timeout='none'
-- Using src='https://github.com/gflags/gflags/archive/v2.2.1.zip'
CMake Error at gflags-subbuild/gflags-populate-prefix/src/gflags-populate-stamp/download-gflags-populate.cmake:170 (message):
  Each download failed!

    error: downloading 'https://github.com/gflags/gflags/archive/v2.2.1.zip' failed
          status_code: 35

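For example, if gflags fails to download, fetch https://github.com/gflags/gflags/archive/v2.2.1.zip manually and put it in the /your/folder/wenet-e2e/wenet/runtime/server/x86/fc_base/gflags-subbuild/gflags-populate-prefix/src directory.

The libraries involved are gflags, googletest, boost, and libtorch.

Reminder: the version and the saved file name must match the ones shown in the download log.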
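The manual step can be scripted. The helper below is a sketch under assumptions: `place_archive` is a hypothetical name, and the `<name>-subbuild/<name>-populate-prefix/src` layout is taken from the gflags log above and assumed, not verified, to apply to the other libraries as well.

```shell
# place_archive <archive> <x86_dir> <name>
# Copies an already-downloaded archive into the directory where the
# failed download was supposed to land, keeping the file name unchanged.
place_archive() (
  archive=$1
  x86_dir=$2   # e.g. /your/folder/wenet-e2e/wenet/runtime/server/x86
  name=$3      # e.g. gflags
  dest="$x86_dir/fc_base/${name}-subbuild/${name}-populate-prefix/src"
  mkdir -p "$dest" && cp "$archive" "$dest/"
)

# Example (paths are placeholders):
# place_archive ~/Downloads/v2.2.1.zip /your/folder/wenet-e2e/wenet/runtime/server/x86 gflags
```

Because the file is copied under its original name, the archive has to be saved with the name from the `dst=` line of the log (here `v2.2.1.zip`) before calling the helper.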

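2. Link errors related to libtorch

The automatically downloaded libtorch is built for x86_64. When compiling on an M1, install libtorch with brew instead. It takes precedence at link time, so no further configuration is needed.

brew install libtorch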
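Whether this workaround applies can be checked from the host architecture. This is a small sketch, with `needs_native_libtorch` as a hypothetical helper name. It relies only on the statement above that the auto-downloaded libtorch is x86_64.

```shell
# needs_native_libtorch [arch]
# Succeeds when the given architecture (default: this host, via uname -m)
# is not x86_64, i.e. when the auto-downloaded x86_64 libtorch will not link.
needs_native_libtorch() (
  arch=${1:-$(uname -m)}
  [ "$arch" != "x86_64" ]
)

if needs_native_libtorch; then
  echo "Host is $(uname -m): install a native libtorch, e.g. via brew."
fi
```

On an M1, `uname -m` reports `arm64` in a native shell, so the message is printed. Under Rosetta it reports `x86_64`.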

3. C++14-related errors

If cmake --build . reports a compile error like the following:

/your/folder/wenet/runtime/server/x86/fc_base/libtorch-src/include/ATen/ATen.h:4:2: error: C++14 or later compatible compiler is required to use ATen.
#error C++14 or later compatible compiler is required to use ATen.

modify the CMakeLists.txt file:

# File: /your/folder/wenet-e2e/wenet/runtime/server/x86/CMakeLists.txt
cmake_minimum_required(VERSION 3.14 FATAL_ERROR)

project(wenet VERSION 0.1)

# Add the following two lines
set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
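The same edit can be applied from the command line, which helps when the source tree is re-cloned often. The function below is a sketch, not an official WeNet script: `add_cxx14` is a hypothetical name, and it assumes the file contains a line starting with `project(` as in the excerpt above.

```shell
# add_cxx14 <path/to/CMakeLists.txt>
# Inserts the two C++14 settings right after the first project(...) line.
# Does nothing if CMAKE_CXX_STANDARD is already set in the file.
add_cxx14() (
  file=$1
  if ! grep -q 'CMAKE_CXX_STANDARD ' "$file"; then
    awk '
      { print }
      /^project\(/ && !done {
        print ""
        print "set(CMAKE_CXX_STANDARD 14)"
        print "set(CMAKE_CXX_STANDARD_REQUIRED ON)"
        done = 1
      }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
  fi
)

# Example (path is a placeholder):
# add_cxx14 /your/folder/wenet-e2e/wenet/runtime/server/x86/CMakeLists.txt
```

Running it a second time leaves the file unchanged, so it is safe to call before every build.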

Alby's blog
