# 🐢 Tortoise
Tortoise is a very expressive TTS system with impressive voice cloning capabilities. It is based on a GPT-like autoregressive acoustic model that converts input
text to discretized acoustic tokens, a diffusion model that converts these tokens to mel-spectrogram frames, and a UnivNet vocoder that converts the spectrograms to
the final audio signal. The main downside is that Tortoise is very slow compared to parallel TTS models like VITS.
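
The data flow through the three stages can be pictured with a toy sketch. Nothing here is part of 🐸TTS: `acoustic_model`, `diffusion_decoder`, `vocoder` and `toy_pipeline` are hypothetical stand-ins with made-up shapes, and the real stages are neural networks. The sketch only shows what each stage consumes and produces.

```python
from typing import List

# Toy stand-ins for the three stages; the real stages are neural networks.
# All names and shapes below are hypothetical and only illustrate the data flow.


def acoustic_model(text: str) -> List[int]:
    """Stand-in for the autoregressive model: text -> discrete acoustic tokens."""
    return [ord(ch) % 256 for ch in text]


def diffusion_decoder(tokens: List[int], n_mels: int = 4) -> List[List[float]]:
    """Stand-in for the diffusion model: tokens -> mel-spectrogram frames."""
    return [[tok / 255.0] * n_mels for tok in tokens]


def vocoder(mel: List[List[float]], hop_length: int = 2) -> List[float]:
    """Stand-in for the vocoder: mel frames -> waveform samples."""
    return [frame[0] for frame in mel for _ in range(hop_length)]


def toy_pipeline(text: str) -> List[float]:
    tokens = acoustic_model(text)    # one token per character in this toy
    mel = diffusion_decoder(tokens)  # one mel frame per token
    return vocoder(mel)              # hop_length samples per frame


waveform = toy_pipeline("hi")
print(len(waveform))  # 2 characters * 2 samples per frame = 4
```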
Big thanks to 👑[@manmay-nakhashi](https://github.com/manmay-nakhashi), who helped us implement Tortoise in 🐸TTS.
Example use:
```python
from TTS.tts.configs.tortoise_config import TortoiseConfig
from TTS.tts.models.tortoise import Tortoise

config = TortoiseConfig()
model = Tortoise.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="paths/to/models_dir/", eval=True)

text = "This is an example."  # text to synthesize
kwargs = {}  # placeholder for optional inference settings

# with random speaker
output_dict = model.synthesize(text, config, speaker_id="random", extra_voice_dirs=None, **kwargs)

# cloning a speaker
output_dict = model.synthesize(text, config, speaker_id="speaker_n", extra_voice_dirs="path/to/speaker_n/", **kwargs)
```
Using 🐸TTS API:
```python
from TTS.api import TTS
tts = TTS("tts_models/en/multi-dataset/tortoise-v2")
# cloning `lj` voice from `TTS/tts/utils/assets/tortoise/voices/lj`
# with custom inference settings overriding defaults.
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav",
                voice_dir="path/to/tortoise/voices/dir/",
                speaker="lj",
                num_autoregressive_samples=1,
                diffusion_iterations=10)
# Using presets with the same voice
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav",
                voice_dir="path/to/tortoise/voices/dir/",
                speaker="lj",
                preset="ultra_fast")
# Random voice generation
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav")
```
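The `voice_dir` and `speaker` arguments above suggest that each voice lives in its own subfolder (for example `lj`) under the voices directory. A small helper along the following lines can check which speaker names are available before calling `tts_to_file`. It is only a sketch: `list_voices` and `AUDIO_SUFFIXES` are hypothetical names, not part of 🐸TTS, and it assumes the reference clips are `.wav` files.

```python
from pathlib import Path
from typing import Dict, List

AUDIO_SUFFIXES = {".wav"}  # assumption: reference clips are stored as WAV files


def list_voices(voice_dir: str) -> Dict[str, List[str]]:
    """Map each speaker subfolder of `voice_dir` to its sorted reference clips."""
    voices: Dict[str, List[str]] = {}
    for speaker_dir in sorted(Path(voice_dir).iterdir()):
        if not speaker_dir.is_dir():
            continue  # skip stray files at the top level
        clips = sorted(
            p.name for p in speaker_dir.iterdir() if p.suffix.lower() in AUDIO_SUFFIXES
        )
        if clips:  # ignore folders without any reference audio
            voices[speaker_dir.name] = clips
    return voices
```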
Using 🐸TTS Command line:
```console
# cloning the `lj` voice
tts --model_name tts_models/en/multi-dataset/tortoise-v2 \
    --text "This is an example." \
    --out_path "output.wav" \
    --voice_dir path/to/tortoise/voices/dir/ \
    --speaker_idx "lj" \
    --progress_bar True
# Random voice generation
tts --model_name tts_models/en/multi-dataset/tortoise-v2 \
    --text "This is an example." \
    --out_path "output.wav" \
    --progress_bar True
```
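When the command line is driven from a script, it can help to assemble the argument list in Python instead of a shell string. The sketch below uses only the flags shown above. `build_tts_command` and `MODEL_NAME` are hypothetical names introduced for illustration, and the helper only builds the command without running it.

```python
import shlex
from typing import List, Optional

MODEL_NAME = "tts_models/en/multi-dataset/tortoise-v2"


def build_tts_command(
    text: str,
    out_path: str,
    voice_dir: Optional[str] = None,
    speaker: Optional[str] = None,
) -> List[str]:
    """Build the argv list for the `tts` CLI calls shown above."""
    cmd = ["tts", "--model_name", MODEL_NAME, "--text", text, "--out_path", out_path]
    # Voice cloning needs both the voices directory and the speaker name;
    # without them the command falls back to random voice generation.
    if voice_dir is not None and speaker is not None:
        cmd += ["--voice_dir", voice_dir, "--speaker_idx", speaker]
    cmd += ["--progress_bar", "True"]
    return cmd


# Print a shell-quoted version for inspection.
print(shlex.join(build_tts_command("This is an example.", "output.wav")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems when the text contains quotes or spaces.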
## Important resources & papers
- Original Repo: https://github.com/neonbjb/tortoise-tts
- Faster implementation: https://github.com/152334H/tortoise-tts-fast
- UnivNet: https://arxiv.org/abs/2106.07889
- Latent Diffusion: https://arxiv.org/abs/2112.10752
- DALL-E: https://arxiv.org/abs/2102.12092
## TortoiseConfig
```{eval-rst}
.. autoclass:: TTS.tts.configs.tortoise_config.TortoiseConfig
    :members:
```
## TortoiseArgs
```{eval-rst}
.. autoclass:: TTS.tts.models.tortoise.TortoiseArgs
    :members:
```
## Tortoise Model
```{eval-rst}
.. autoclass:: TTS.tts.models.tortoise.Tortoise
    :members:
```