espeak Command Reference

Installing espeak:

If bash or zsh reports "espeak: command not found", install the package for your platform:

# Windows (WSL2)
sudo apt-get update && sudo apt-get install espeak

# Debian
apt-get install espeak

# Ubuntu
apt-get install espeak

# Alpine
apk add espeak

# Arch Linux
pacman -S espeak

# Kali Linux
apt-get install espeak

# CentOS
yum install espeak

# Fedora
dnf install espeak

# OS X
brew install espeak

# Raspbian
apt-get install espeak

# Dockerfile
dockerfile.run/espeak

# Docker
docker run cmd.cat/espeak espeak

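About espeak:


eSpeak is a compact open-source software speech synthesizer for Linux and Windows, supporting English and other languages. http://espeak.sourceforge.net

eSpeak uses a formant synthesis method. This allows many languages to be provided in a small footprint. The speech is clear and can be used at high speeds, but it is not as natural or smooth as larger synthesizers that are based on recordings of human speech.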

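eSpeak is available as:

  • A command-line program (Linux and Windows) that speaks text from a file or from standard input.
  • A shared library version for use by other programs. (On Windows this is a DLL.)
  • A SAPI5 version for Windows, so it can be used with screen readers and other programs that support the Windows SAPI5 interface.
  • Ports to other platforms, including Android, Mac OSX, and Solaris.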

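Languages supported by eSpeak: Afrikaans, Albanian, Aragonese, Armenian, Bulgarian, Cantonese, Catalan, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Persian, Finnish, French, Georgian, German, Greek, Hindi, Hungarian, Icelandic, Indonesian, Irish, Italian, Kannada, Kurdish, Latvian, Lithuanian, Lojban, Macedonian, Malay, Malayalam, Mandarin, Nepali, Norwegian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Slovak, Spanish, Swahili, Swedish, Tamil, Turkish, Vietnamese, Welsh.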

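eSpeak features:

  • Includes different voices, whose characteristics can be altered.
  • Can produce speech output as a WAV file.
  • Supports SSML (Speech Synthesis Markup Language), though not completely, and also HTML.
  • Compact size. The program and its data, including many languages, total about 2 MB.
  • Can be used as a front end to MBROLA diphone voices. eSpeak converts text to phonemes with pitch and length information.
  • Can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
  • Has potential for other languages. Several are included at varying stages of progress. Help from native speakers of these or other languages is welcome.
  • Provides development tools for producing and tuning phoneme data.
  • Written in C.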

espeak syntax:


espeak [options] ["text words"]

espeak options:


-f <text file>
   Speaks a text file.

--stdin
   Takes the text input from stdin.

If neither -f nor --stdin is given, then the text input is taken from "text words" (a text string within double quotes).
If that is not present then text is taken from stdin, but each line is treated as a separate sentence.
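
The three input paths described above can be tried side by side. The following is only a sketch: sample.txt is a hypothetical file created for the demonstration, and -q with -x (both described further down in this list) are added so that each command prints phoneme mnemonics rather than needing a sound device.

```shell
#!/bin/sh
# sample.txt is a hypothetical file created just for this demonstration.
printf 'Hello from a file.\n' > sample.txt

if command -v espeak >/dev/null 2>&1; then
    # -q suppresses audio and -x prints phoneme mnemonics; drop both to hear speech.
    espeak -q -x "Hello from the command line."            # quoted "text words"
    espeak -q -x -f sample.txt                              # -f <text file>
    printf 'Hello from a pipe.\n' | espeak -q -x --stdin    # --stdin
else
    echo "espeak is not installed" >&2
fi
```

The guard around the espeak calls only keeps the script from failing on machines where the package is missing.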

-a <integer>
   Sets amplitude (volume) in a range of 0 to 200. The default is 100.

-p <integer>
   Adjusts the pitch in a range of 0 to 99. The default is 50.

-s <integer>
   Sets the speed in words-per-minute (approximate values for the default English voice, others may differ slightly). The default value is 175. I generally use a faster speed of 260. The lower limit is 80. There is no upper limit, but about 500 is probably a practical maximum.
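
As an illustration of how -a, -p, and -s combine, the sketch below checks each value against the ranges quoted above before calling espeak. The helper in_range and the file name tuned.wav are hypothetical names introduced here, not part of eSpeak.

```shell
#!/bin/sh
# in_range VALUE MIN MAX: succeed when MIN <= VALUE <= MAX (hypothetical helper).
in_range() {
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

amplitude=150   # -a: 0 to 200, default 100
pitch=70        # -p: 0 to 99, default 50
speed=140       # -s: words per minute, lower limit 80, default 175

if in_range "$amplitude" 0 200 && in_range "$pitch" 0 99 && [ "$speed" -ge 80 ]; then
    if command -v espeak >/dev/null 2>&1; then
        # -w writes a WAV file instead of playing audio, so no sound device is needed.
        espeak -a "$amplitude" -p "$pitch" -s "$speed" -w tuned.wav "Louder, higher, and slower."
    fi
else
    echo "option value out of range" >&2
fi
```

Removing -w tuned.wav plays the result directly instead of saving it.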

-b <integer>
   Input text character format.
   1   UTF-8. This is the default.
   2   The 8-bit character set which corresponds to the language (eg. Latin-2 for Polish).
   4   16 bit Unicode.

   Without this option, eSpeak assumes text is UTF-8, but will automatically switch to the 8-bit character set if it finds an illegal UTF-8 sequence.

-g <integer>
   Word gap. This option inserts a pause between words. The value is the length of the pause, in units of 10 ms (at the default speed of 175 wpm).

-h or --help
   Prints a summary of the options. The first line of output gives the eSpeak version number.
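
Because the first line of the help output carries the version number, it can double as a quick installation check. A minimal sketch, where the variable name status is only illustrative:

```shell
#!/bin/sh
if command -v espeak >/dev/null 2>&1; then
    # The first line of the help output carries the version number.
    espeak --help | head -n 1
    status="installed"
else
    status="missing"
    echo "espeak not found; install it with your package manager" >&2
fi
echo "espeak is $status"
```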

-k <integer>
   Indicate words which begin with capital letters.
   1   eSpeak uses a click sound to indicate when a word starts with a capital letter, or double click if word is all capitals.
   2   eSpeak speaks the word "capital" before a word which begins with a capital letter.
   Other values:   eSpeak increases the pitch for words which begin with a capital letter. The greater the value, the greater the increase in pitch. Try -k20.


-l <integer>
   Line-break length, default value 0. If set, then lines which are shorter than this are treated as separate clauses and spoken separately with a break between them. This can be useful for some text files, but bad for others.

-m
   Indicates that the text contains SSML (Speech Synthesis Markup Language) tags or other XML tags. Those SSML tags which are supported are interpreted. Other tags, including HTML, are ignored, except that some HTML tags such as <hr> <h2> and <li> ensure a break in the speech.
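
A small example of marked-up input follows. It assumes that <break> and <prosody> are among the SSML tags supported by the installed eSpeak build, which the text above does not confirm; ssml_demo.wav is an arbitrary file name.

```shell
#!/bin/sh
# Assumption: <break> and <prosody> are among the SSML tags this eSpeak build supports.
ssml='<speak>Please wait.<break time="700ms"/><prosody rate="slow">Now continue slowly.</prosody></speak>'

if command -v espeak >/dev/null 2>&1; then
    # -m turns on markup interpretation; -w writes a WAV file instead of playing audio.
    espeak -m -w ssml_demo.wav "$ssml"
fi
```

Without -m, the tags would be treated as ordinary text rather than as markup.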

-q
   Quiet. No sound is generated. This may be useful with options such as -x and --pho.

-v <voice filename>[+<variant>]
   Sets a Voice for the speech, usually to select a language. eg:
      espeak -vaf
   To use the Afrikaans voice. A modifier after the voice name can be used to vary the tone of the voice, eg:
      espeak -vaf+3
   The variants are +m1 +m2 +m3 +m4 +m5 +m6 +m7 for male voices and +f1 +f2 +f3 +f4 which simulate female voices by using higher pitches. Other variants include +croak and +whisper.
   <voice filename> is a file within the espeak-data/voices directory.
   <variant> is a file within the espeak-data/voices/!v directory.

   Voice files can specify a language, alternative pronunciations or phoneme sets, different pitches, tonal qualities, and prosody for the voice. See the voices.html file.
   Voice names which start with mb- are for use with Mbrola diphone voices, see mbrola.html
   Some languages may need additional dictionary data, see languages.html
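
The voice and variant are joined with a plus sign, as a short sketch shows. It assumes a voice file named en exists in espeak-data/voices on the system; the output file names are arbitrary.

```shell
#!/bin/sh
voice="en"      # assumed to exist in espeak-data/voices on this system
variant="f3"    # one of the female variants listed above

voice_spec="${voice}+${variant}"

if command -v espeak >/dev/null 2>&1; then
    # -w writes WAV files so the two variants can be compared afterwards.
    espeak -v "$voice_spec" -w variant_f3.wav "This uses a female variant."
    espeak -v "${voice}+whisper" -w variant_whisper.wav "This uses the whisper variant."
fi
echo "$voice_spec"
```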

-w <wave file>
   Writes the speech output to a file in WAV format, rather than speaking it.

-x
   The phoneme mnemonics, into which the input text is translated, are written to stdout. If a phoneme name contains more than one letter (eg. [tS]), the --sep or --tie option can be used to distinguish this from separate phonemes.
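
To inspect a pronunciation without producing sound, -x is typically paired with -q. In this sketch the slash passed to --sep is an arbitrary choice of separator, picked only to make phoneme boundaries visible.

```shell
#!/bin/sh
text="church bells"

if command -v espeak >/dev/null 2>&1; then
    espeak -q -x "$text"            # phoneme mnemonics only, no audio
    espeak -q -x --sep=/ "$text"    # "/" between phonemes, so a multi-letter phoneme such as [tS] stays one unit
fi
```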

-X
   As -x, but in addition, details are shown of the pronunciation rule and dictionary list lookup. This can be useful to see why a certain pronunciation is being produced. Each matching pronunciation rule is listed, together with its score, the highest scoring rule being used in the translation. "Found:" indicates the word was found in the dictionary lookup list, and "Flags:" means the word was found with only properties and not a pronunciation. You can see when a word has been retranslated after removing a prefix or suffix.

-z
   The option removes the end-of-sentence pause which normally occurs at the end of the text.

--stdout
   Writes the speech output to stdout as it is produced, rather than speaking it. The data starts with a WAV file header which indicates the sample rate and format of the data. The length field is set to zero because the length of the data is unknown when the header is produced.
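
A minimal sketch of capturing the stream: the output is redirected to a file (stream.wav, an arbitrary name) and the first four bytes are read back, since a WAV header begins with the ASCII tag RIFF.

```shell
#!/bin/sh
if command -v espeak >/dev/null 2>&1; then
    # Redirect the audio stream to a file instead of a sound device.
    espeak --stdout "Streaming to standard output." > stream.wav
    # Read back the first four bytes of the WAV header.
    magic=$(head -c 4 stream.wav)
    echo "header tag: $magic"
else
    magic=""
    echo "espeak is not installed" >&2
fi
```

The same stream could instead be piped into an audio player or encoder that accepts WAV on standard input; which tool that is depends on the system and is not specified here.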

--compile [=<voice name>]
   Compile the pronunciation rule and dictionary lookup data from their source files in the current directory. The Voice determines which language's files are compiled. For example, if it's an English voice, then en_rules, en_list, and en_extra (if present) are compiled to replace en_dict in the espeak-data directory. If no Voice is specified then the default Voice is used.

--compile-debug [=<voice name>]
   The same as --compile, but source line numbers from the *_rules file are included. These are included in the rules trace when the -X option is used.

--ipa
   Writes phonemes to stdout, using the International Phonetic Alphabet (IPA).
   If a phoneme name contains more than one letter (eg. [tS]), the --sep or --tie option can be used to distinguish this from separate phonemes.

--path [="<directory path>"]
   Specifies the directory which contains the espeak-data directory.

--pho
   When used with an mbrola voice (eg. -v mb-en1), it writes mbrola phoneme data (.pho file format) to stdout. This includes the mbrola phoneme names with duration and pitch information, in a form which is suitable as input to this mbrola voice. The --phonout option can be used to write this data to a file.

--phonout [="<filename>"]
   If specified, the output from -x, -X, --ipa, and --pho options is written to this file, rather than to stdout.
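
For example, IPA output can be sent to a file rather than the terminal. This is a sketch only; phonemes.txt is an arbitrary file name.

```shell
#!/bin/sh
out="phonemes.txt"   # arbitrary output file name

if command -v espeak >/dev/null 2>&1; then
    # -q: no audio; --ipa: IPA phonemes; --phonout: write them to a file instead of stdout.
    espeak -q --ipa --phonout="$out" "hello world"
    # Show the result only if the file was actually written.
    [ -f "$out" ] && cat "$out"
fi
```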

--punct [="<characters>"]
   Speaks the names of punctuation characters when they are encountered in the text. If <characters> are given, then only those listed punctuation characters are spoken, eg. --punct=".,;?"

--sep [=<character>]
   The character is used to separate individual phonemes in the output which is produced by the -x or --ipa options. The default is a space character. The character z means use a ZWNJ character (U+200c).

--split [=<minutes>]
   Used with -w, it starts a new WAV file every <minutes> minutes, at the next sentence boundary.

--tie [=<character>]
   The character is used within multi-letter phonemes in the output which is produced by the -x or --ipa options. The default is the tie character  ͡  U+361. The character z means use a ZWJ character (U+200d).

--voices [=<language code>]
   Lists the available voices.
   If =<language code> is present then only those voices which are suitable for that language are listed.
   --voices=mbrola lists the voices which use mbrola diphone voices. These are not included in the default --voices list
   --voices=variant lists the available voice variants (voice modifiers).
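
The three forms of --voices can be wrapped in a small function. The name list_voices is hypothetical and exists only for this sketch.

```shell
#!/bin/sh
# list_voices [FILTER]: hypothetical wrapper around --voices.
list_voices() {
    if [ -n "${1:-}" ]; then
        espeak --voices="$1"
    else
        espeak --voices
    fi
}

if command -v espeak >/dev/null 2>&1; then
    list_voices            # the default voice list
    list_voices en         # voices suitable for English
    list_voices variant    # available voice variants
fi
```

The names printed by these listings are the values that -v accepts.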

espeak examples:


Speak a sentence aloud:

espeak "I like to ride my bike."

Read the contents of a text file aloud:

espeak -f path/to/file

Save the output to a WAV audio file instead of speaking it:

espeak -w command-not-found.wav "Hi, Yang yongyu. It's CommandNotFound"

Use a different voice (replace voice with a name listed by espeak --voices):

espeak -v voice
