TuringOS Documentation Center

AI-WIFI Speech Recognition

Speech Recognition Integration

Interface Information

Endpoint URL

http://smartdevice.ai.turingapi.com/speech/v2/asr

Request Method

Method: POST
Content-Type: multipart/form-data

Request Parameters

Parameter	Required	Type	Description
parameters	true	json	Speech recognition request parameters; see "parameters Field Description"
speech	true	opus (recommended)/amr/speex	Audio file

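parameters Field Description

Parameter	Required	Type	Description
ak	true	String	apiKey, used for authorization
uid	true	String	Encrypted device ID string (see the AI-WIFI integration guide)
token	true	String	Request token; may be empty on the first request (see the AI-WIFI integration guide)
asr	true	int	Selects the format of the uploaded audio. asr=2 (not recommended): amr_8K_16bit; asr=3: amr_16K_16bit; asr=4 (recommended): opus; asr=5 (not recommended): speex (requires a specific encoding tool). For other formats, contact the sales team.
realTime	false	int	Streaming recognition control. realTime=0: non-streaming recognition (default), uploaded audio must not exceed 360KB; realTime=1: streaming recognition, each audio packet must not exceed 20KB.
index	false	int	Required when realTime=1 (streaming recognition); identifies the audio segment. index counts from 1, and the index of the final segment must be negative, e.g. index=1, 2, 3, -4. index must not exceed 30.
identify	false	int	Only used when realTime=1 (streaming recognition). Identifies one streaming recognition session, so the value must be unique per session. (Must be a 32-character random value made up of digits and lowercase letters; uppercase letters and special characters are not supported.)
encode	false	int	Encoding of the uploaded audio, mainly for custom-encoding support; ignore it unless a custom encoding is used. encode=0: standard encoding, i.e. one of the formats supported by the asr field (default); encode=1: custom encoding, in which case a transcoding tool must be provided and the transcoding target must be a format supported by the asr field.
asr_lan	false	int	ASR language: 0 for Chinese (default), 1 for English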
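
The streaming fields above (realTime, index, identify) can be sketched as follows. This is a minimal illustration, not an official client. It makes several assumptions:

- `plan_stream_packets` is a hypothetical helper name.
- "20KB" is read as 20 * 1024 bytes.
- `identify` is sent as a string, since a 32-character value containing letters cannot be an integer.
- A single-packet session uses index=-1.
- The token is copied unchanged into every packet; a real session would presumably refresh it from each response.
- Audio is split at arbitrary byte offsets only to show the indexing; a real encoder would probably need to split on frame boundaries.

```python
import uuid

MAX_PACKET_BYTES = 20 * 1024   # assumption: "20KB" read as 20 * 1024 bytes
MAX_PACKETS = 30               # index must not exceed 30


def plan_stream_packets(audio: bytes, base_params: dict) -> list:
    """Return (parameters, packet_bytes) pairs for one streaming session."""
    if not audio:
        raise ValueError("audio is empty")
    packets = [audio[i:i + MAX_PACKET_BYTES]
               for i in range(0, len(audio), MAX_PACKET_BYTES)]
    if len(packets) > MAX_PACKETS:
        raise ValueError("audio needs more than 30 packets")
    # uuid4().hex is 32 characters of digits and lowercase a-f
    identify = uuid.uuid4().hex
    plan = []
    for position, packet in enumerate(packets, start=1):
        is_last = position == len(packets)
        # The final segment carries a negative index: 1, 2, 3, -4
        index = -position if is_last else position
        params = dict(base_params, realTime=1, identify=identify, index=index)
        plan.append((params, packet))
    return plan
```

Each pair in the returned list corresponds to one POST request, sent in order, with the same identify value throughout the session.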

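Response Parameters

Parameter	Required	Type	Description
code	true	int	Return code. code=220: recognition succeeded. code=40000: an intermediate step of streaming recognition succeeded. Any other code=4xxxx (that is, except code=40000): recognition failed.
token	true	String	Request token; the developer must store it and send it with the next request
err	true	String	Description of the return code
text	true	String	Recognition result for the audio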
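
The three outcomes described for the code field can be told apart with a small helper. This is a sketch only: `classify_code` and its return labels are hypothetical names, and "4xxxx" is read as the range 40000 to 49999.

```python
def classify_code(code: int) -> str:
    """Map a response code to one of the outcomes described above."""
    if code == 220:
        return "success"        # recognition succeeded
    if code == 40000:
        return "in_progress"    # intermediate streaming segment accepted
    if 40000 < code <= 49999:
        return "failed"         # any other 4xxxx code
    return "unknown"            # not covered by the documentation
```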

Example

Request

POST /speech/v2/asr  HTTP/1.1
Host: smartdevice.ai.turingapi.com
Connection: keep-alive
Content-Length: 26899
Cache-Control: no-cache
Content-Type: multipart/form-data; boundary=----WebKitFormBoundarybtXd96AztrPD9eZT
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN

------WebKitFormBoundarybtXd96AztrPD9eZT
Content-Disposition: form-data; name="parameters"

{
"ak": "8268ff3bce3e45c7b10a6d49a0fddcd5",
"uid": "9C5DE14C6503BCF56821D7A41DA23B4D",
"asr": 1,
"token": "2af62825b9cd49568cc55a6256a86239"
}
------WebKitFormBoundarybtXd96AztrPD9eZT
Content-Disposition: form-data; name="speech"; filename="pcm_8K_16bit_test.pcm"
Content-Type: application/octet-stream

(binary file content)
------WebKitFormBoundarybtXd96AztrPD9eZT--
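
A request body with the same shape as the one above can be assembled with the Python standard library alone. This is a minimal sketch under stated assumptions: `build_asr_request` and `send` are hypothetical helper names, the boundary string is arbitrary, and the response is assumed to be UTF-8 encoded JSON.

```python
import json
import urllib.request
import uuid

ASR_URL = "http://smartdevice.ai.turingapi.com/speech/v2/asr"


def build_asr_request(parameters: dict, speech: bytes, filename: str):
    """Build (headers, body) for a multipart/form-data POST with two parts."""
    boundary = "----PythonFormBoundary" + uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    disposition = 'Content-Disposition: form-data; name="speech"; filename="%s"' % filename
    parts = [
        delimiter,
        b'Content-Disposition: form-data; name="parameters"',
        b"",
        json.dumps(parameters).encode("utf-8"),
        delimiter,
        disposition.encode("utf-8"),
        b"Content-Type: application/octet-stream",
        b"",
        speech,
        delimiter + b"--",   # closing boundary
        b"",
    ]
    body = b"\r\n".join(parts)
    headers = {
        "Content-Type": "multipart/form-data; boundary=" + boundary,
        "Content-Length": str(len(body)),
    }
    return headers, body


def send(parameters: dict, speech: bytes, filename: str) -> dict:
    """POST the audio and decode the JSON reply (performs a network call)."""
    headers, body = build_asr_request(parameters, speech, filename)
    request = urllib.request.Request(ASR_URL, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping body construction separate from the network call lets the multipart layout be inspected or tested without contacting the service.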

Response

{
    "code": 220,
    "text": "你好，明天去哪儿",
    "token": "e9cb1458a9a7414fa27b3bf3391396b1",
    "err": "成功"
}
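
Because each response carries a token that must be sent with the next request, a client needs to carry it forward. The sketch below shows one way to do that; `next_request_parameters` is a hypothetical helper name, and keeping the previous token when the response has none is an assumption.

```python
import json


def next_request_parameters(previous: dict, response_text: str) -> dict:
    """Return a copy of the request parameters with the token from the response."""
    response = json.loads(response_text)
    updated = dict(previous)
    # Assumption: keep the old token if the response does not include one
    if response.get("token"):
        updated["token"] = response["token"]
    return updated
```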

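Appendix

Code Reference Table

Code	Returned msg	Description
220	成功	Speech recognition succeeded
40000	in progress	Streaming recognition in progress; the intermediate segment request succeeded
40001	value error	Field error
40002	illegal value	Illegal field
40003	value is null or missing	Field is empty or invalid
40004	asr failure	Speech parsing failed
40007	token invalid value	Invalid token
40008	is expired	Expired
40012	request is forbidden	Request refused
40013	out of device count limit	Request exceeds the limit
43000	asr io read error	Failed to read the uploaded ASR audio stream
43010	asr service outime	ASR server-side timeout
43020	asr client outime	ASR client-side timeout
43030	asr exception	ASR recognition threw an exception
43035	asr file limit error	The audio in a single request exceeds the permitted size
43036	file exceed the limit no realtime	Non-streaming recognition: uploaded audio exceeds the permitted size (360KB)
43037	file package more index > 30	A single streaming recognition session may not upload more than 30 packets
43038	index %s has fail	Indicates the segment at which streaming recognition failed
43040	asr jt error	The ASR service returned an error
43041	asr wait for data 5s	ASR waited 5s without receiving new data
43042	streaming a single packet of voice over 4s	A single streaming packet contains more than 4s of speech
43043	not streaming a single packet of voice over 20s	Non-streaming speech exceeds 20s
47310	robot count day_limit_exceeded	Daily request quota exceeded
47320	robot count hour_limit_exceeded	Hourly request quota exceeded
47330	robot count minute_limit_exceeded	Per-minute request quota exceeded
44010	asr ali idle too long time	The gap between streamed audio uploads exceeded 10s
49999	unknown error	Unknown error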
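
Some codes in the table fall into natural groups that a client may want to handle together. The sketch below groups the quota codes and the size or duration limit codes. The grouping, the `describe_failure` name, and the category labels are illustrative assumptions, not part of the service contract.

```python
# Quota codes from the table, keyed to the window they refer to
RATE_LIMIT_WINDOWS = {47310: "day", 47320: "hour", 47330: "minute"}

# Codes from the table that report an audio size, count, or duration limit
LIMIT_CODES = {43035, 43036, 43037, 43042, 43043}


def describe_failure(code: int) -> str:
    """Return a coarse category for a failure code."""
    if code in RATE_LIMIT_WINDOWS:
        return "quota exceeded for the current " + RATE_LIMIT_WINDOWS[code]
    if code in LIMIT_CODES:
        return "audio exceeds a size, count, or duration limit"
    if code in (40007, 40008):
        return "token invalid or expired"
    return "other failure"
```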