Commit Graph

4134 Commits

Author SHA1 Message Date
Eren Gölge
dba2c3570a Update readme (#1978) 2022-09-16 12:01:46 +02:00
Julian Weber
896e46d0e5 Fix vc (#1971) 2022-09-16 12:01:26 +02:00
Eren Gölge
b95cf3363c Prevent installing mecab-ko (#1967) 2022-09-14 10:28:07 +02:00
Eren Gölge
9e5a469c64 d-vector handling (#1945)
* Update BaseDatasetConfig

- Add dataset_name
- Change name to formatter_name

* Update compute_embedding

- Allow entering dataset by args
- Use released model by default
- Use the new key format

* Update loading

* Update recipes

* Update other dep code

* Update tests

* Fixup

* Load multiple embedding files

* Fix argument names in dep code

* Update docs

* Fix argument name

* Fix linter
2022-09-13 14:10:33 +02:00
Edresson Casanova
371772c355 Replace pyworld by pyin (#1946)
* Replace pyworld by pyin

* Fix unit tests
2022-09-09 10:43:14 +02:00
happylittlecat
4546b4cbd8 Add espeak support for Chinese (#1905)
* fix description

* add espeak support for chinese

* add espeak support for chinese
2022-09-08 12:32:41 +02:00
harmlessman
5abbe56642 Korean Phonemizer (#1822)
* Update requirements.txt

install jamo for korean

* Update formatters.py

add KSS formatter

KSS is a Korean single-speaker speech dataset (12 hours)

* Add files via upload

add phonemizer for korean

* Add files via upload

add korean phonemizer

* Update requirements.txt

* change code style with `black` and `pylint`

* reflecting pylint's Evaluation

* reflecting pylint's Evaluation

* reflecting pylint's Evaluation-2

* isort

* edit about separator
write test case and add 'nltk' for requirements.txt

* add korean g2p (g2pkk)

* isort

* TTS/tts/utils/text/phonemizers/ko_kr_phonemizer.py:43:24: W0621: Redefining name 'text' from outer scope (line 58) (redefined-outer-name)

TTS/tts/utils/text/korean/korean.py:28:8: R1705: Unnecessary "else" after "return" (no-else-return)

* black
2022-09-08 12:06:07 +02:00
Edresson Casanova
98aa6261d1 Add YourTTS and SC-GlowTTS on available models (#1933) 2022-09-08 11:10:39 +02:00
Edresson Casanova
159eeeef64 Fix find unique phonemes script (#1928)
* Fix find unique phonemes script

* Fix unit tests
2022-09-08 10:17:35 +02:00
KyuubiYoru
3b7dff568a Fixes a race condition with multiple simultaneous get requests. (#1807)
* Fixes a race condition with multiple simultaneous get requests.

* Removed unused import

* Removed unused threading import

* Changed lock style to notation

* make style

Co-authored-by: WeberJulian <julian.weber@hotmail.fr>
2022-09-08 10:16:16 +02:00
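The race-condition fix above mentions a lock and a change of "lock style". A minimal sketch of the general idea, assuming a shared model object that is not safe to call from several request threads at once, might look like the following. `FakeSynthesizer`, `handle_request`, and `run_concurrent` are hypothetical names for illustration and are not taken from the commit.

```python
import threading


class FakeSynthesizer:
    """Hypothetical stand-in for a model object that is not thread-safe."""

    def __init__(self):
        self.busy = False
        self.overlaps = 0  # counts calls that started while another was running

    def tts(self, text):
        if self.busy:
            self.overlaps += 1
        self.busy = True
        result = text.upper()  # placeholder for real synthesis work
        self.busy = False
        return result


synthesizer = FakeSynthesizer()
lock = threading.Lock()


def handle_request(text):
    """Serve one request; the context manager releases the lock even on errors."""
    with lock:
        return synthesizer.tts(text)


def run_concurrent(texts):
    """Fire one thread per text and collect the results in input order."""
    results = [None] * len(texts)

    def worker(index, text):
        results[index] = handle_request(text)

    threads = [
        threading.Thread(target=worker, args=(i, t)) for i, t in enumerate(texts)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Using `with lock:` rather than explicit `acquire()`/`release()` calls keeps the critical section short and guarantees release on exceptions, which is one plausible reading of the "lock style" item.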
Julian Weber
bb59718c03 Add capacitron v2 model (#1768)
* Add capacitron v2 in .models.json

* Put right commit hash
2022-09-08 09:43:56 +02:00
Edresson Casanova
096b35f639 Add VCTK speaker encoder recipe (#1912) 2022-08-26 16:19:03 +02:00
Eren Gölge
e5430a6519 Add new DE Thorsten models (#1898)
- Tacotron2-DDC
- HifiGAN vocoder
2022-08-22 11:27:39 +02:00
Eren Gölge
8845f06fd9 Bump up to v0.8.0 2022-08-22 11:26:47 +02:00
Stanislav Kachnov
2c9f00a808 Fix tune wavegrad (#1844)
* fix imports in tune_wavegrad

* load_config returns Coqpit object instead of None

* set action (store true) for flag "--use_cuda"; start to tune if module is running as the main program

* fix var order in the result of batch collating

* make style

* make style with black and isort
v0.8.0_models
2022-08-22 09:55:32 +02:00
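Two items in the tune_wavegrad fix above are general Python patterns: a boolean flag declared with `action="store_true"`, and a main guard so that work only starts when the module is run as a script. A small sketch under those assumptions follows; the parser description, help text, and the `build_parser`/`main` helpers are illustrative and not copied from the actual script.

```python
import argparse


def build_parser():
    """Build a parser with a boolean --use_cuda flag (hypothetical script layout)."""
    parser = argparse.ArgumentParser(description="Illustrative tuning script.")
    # store_true makes the flag False when absent and True when present,
    # so no value such as "--use_cuda False" has to be parsed from a string.
    parser.add_argument("--use_cuda", action="store_true", help="Run on GPU.")
    return parser


def main(argv=None):
    """Parse arguments and return the device the tuning would run on."""
    args = build_parser().parse_args(argv)
    return "cuda" if args.use_cuda else "cpu"


if __name__ == "__main__":
    # Only start when executed as a script, not when imported (e.g. by tests).
    print(main())
```

Passing `argv` explicitly, as in `main(["--use_cuda"])`, makes the parsing testable without touching `sys.argv`.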
Eren Gölge
fcb0bb58ae Handle when no batch sampler (#1882) 2022-08-18 11:26:04 +02:00
Eren Gölge
7442bcefa5 Remove deprecated files (#1873)
- samplers.py is moved
- distribute.py is replaced by the 👟Trainer
2022-08-15 12:16:37 +02:00
Eren Gölge
4333492341 Fix BCE loss issue (#1872)
* Fix BCE loss issue

* Remove import
2022-08-15 11:27:21 +02:00
jchai.me
c30b6485ea updates to dataset analysis notebooks for compatibility with latest version of TTS (#1853) 2022-08-15 11:11:07 +02:00
manmay nakhashi
e4db7c51b5 Update capacitron_layers.py (#1664)
Crashing because of a dimension mismatch at line no. 57:
[batch, 256] vs [batch, 1, 512]
enc_out = torch.cat([enc_out, speaker_embedding], dim=-1)
2022-08-15 11:08:50 +02:00
Eren Gölge
bfc63829ac Implement bucketed weighted sampling for VITS (#1871) 2022-08-15 11:08:11 +02:00
Eren Gölge
d46fbc240c Introduce numpy and torch transforms (#1705)
* Refactor audio processing functions

* Add tests for numpy transforms

* Fix imports

* Fix imports2
2022-08-08 11:57:50 +02:00
manmay nakhashi
7fd9b89ebf fix get_random_embeddings --> get_random_embedding (#1726)
* fix get_random_embeddings --> get_random_embedding

function typo leads to training crash, no such function

* fix typo

get_random_embedding
2022-08-07 14:06:03 +02:00
rbaraglia
75ac9e3f0c Fix language flags generated by espeak-ng phonemizer (#1801)
* fix language flags generated by espeak-ng phonemizer

* Style

* Updated language flag regex to consider all language codes alike
2022-08-07 13:57:40 +02:00
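The espeak-ng fix above refers to a regex for language flags. The commit log does not show the pattern, so the following is only a sketch that assumes the flags look like `(en)` or `(pt-br)` embedded in the phonemized output. `LANGUAGE_FLAG` and `strip_language_flags` are hypothetical names.

```python
import re

# Assumed flag shape: a 2-3 letter language code, optionally followed by
# hyphenated variants, wrapped in parentheses, e.g. "(en)" or "(pt-br)".
LANGUAGE_FLAG = re.compile(r"\([a-z]{2,3}(?:-[a-z0-9]+)*\)")


def strip_language_flags(phonemes):
    """Remove every language-switch flag, treating all language codes alike."""
    return LANGUAGE_FLAG.sub("", phonemes)
```

Matching on the shape of the code rather than on a fixed list of languages is what lets every language code be handled the same way; longer parenthesized words such as `(hello)` are left untouched by this pattern.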
Lars Kiesow
8c645080ac Adjust default to be able to process longer sentences (#1835)
Running `tts --text "$text" --out_path …` with somewhat longer
sentences in the text will lead to warnings like “Decoder stopped with
max_decoder_steps 500” and the sentences just being cut off in the
resulting WAV file.

This happens quite frequently when feeding longer texts (e.g. a blog
post) to `tts`. It's particularly frustrating since the error is not
always obvious in the output. You have to notice that there are missing
parts. This is something other users seem to have run into as well [1].

This patch simply increases the maximum number of steps allowed for the
tacotron decoder to fix this issue, resulting in a smoother default
behavior.

[1] https://github.com/mozilla/TTS/issues/734
2022-08-07 13:51:29 +02:00
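To illustrate why a step cap cuts sentences off, here is a toy decoding loop, not the Tacotron decoder itself. It assumes one output frame per step and a stop condition that fires once enough frames exist. The `decode` helper and the larger cap used in the usage note are hypothetical; only the value 500 comes from the warning quoted above.

```python
def decode(num_frames_needed, max_decoder_steps=500):
    """Toy autoregressive loop: emit one frame per step until done or capped.

    Returns (frames, truncated), where truncated is True if the step cap
    was hit before the stop condition fired.
    """
    frames = []
    for step in range(max_decoder_steps):
        frames.append(step)  # placeholder for one generated spectrogram frame
        if len(frames) >= num_frames_needed:  # stand-in for the stop token
            return frames, False
    # Cap reached first: the remaining audio is silently missing.
    return frames, True
```

With the cap at 500, `decode(800)` returns only 500 frames and reports truncation, while `decode(800, max_decoder_steps=10000)` completes. The value 10000 is an arbitrary illustration, since the log does not state the new default.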
p0p4k
903a77c197 Update wavenet.py (#1796)
* Update wavenet.py

The current version does not use the "in_channels" argument.
In glowTTS, we use normalizing flows, so "input dim" == "output dim" (channels and length). The existing code therefore just uses a hidden_channel-sized tensor as input to the first layer and also outputs a hidden_channel-sized tensor.
However, since it is a generic implementation, I believe it is better to update it for more general use.

* "in_channels -> hidden_channels"
2022-08-01 12:20:37 +02:00
p0p4k
4fe50801b5 Update README.md; download progress bar in CLI. (#1797)
* Update README.md

- minor PR
- added model_info usage guide based on #1623 in README.md .

* "added tqdm bar for model download"

* Update manage.py

* fixed style

* fixed style

* sort imports
2022-08-01 12:17:47 +02:00
p0p4k
d9bad91a66 Update requirements.txt; inflect==5.6 (#1809)
The new inflect version (6.0) depends on pydantic, which has some issues unrelated to 🐸 TTS (#1808).
Force installing inflect==5.6 (pydantic-free) to solve the dependency issue.
2022-08-01 11:48:02 +02:00
Eren Gölge
7d8b1665c8 Fix rand_segment edge case (input_len == seg_len - 1) 2022-08-01 11:37:45 +02:00
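The log gives only the condition for the rand_segment edge case (`input_len == seg_len - 1`), not the code. As a rough sketch of how a random-segment helper can handle inputs shorter than the segment, assuming plain Python lists instead of tensors and padding as the chosen behavior, consider the following. This `rand_segment` and its `pad_value` parameter are illustrative and are not the library's implementation.

```python
import random


def rand_segment(seq, seg_len, pad_value=0, rng=random):
    """Return (segment, start) with len(segment) == seg_len.

    Inputs shorter than seg_len, including input_len == seg_len - 1,
    are right-padded first so the start index range is never negative.
    """
    seq = list(seq)
    input_len = len(seq)
    if input_len < seg_len:
        seq = seq + [pad_value] * (seg_len - input_len)
        input_len = seg_len
    max_start = input_len - seg_len  # guaranteed >= 0 after padding
    start = rng.randint(0, max_start)  # randint is inclusive on both ends
    return seq[start:start + seg_len], start
```

For a three-element input and `seg_len=4`, the only valid start is 0 and the segment is the input plus one padding value.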
vanIvan
5094499eba Fix & update WaveRNN vocoder model (#1749)
* Fixes KeyError bug. Adding logging to dashboard.

* Make pep8 compliant

* Make style compliant

* Still fixing style
2022-07-26 15:05:11 +02:00
Yuri Pourre
1a065fa6ed Update README.md (#1776)
Fix typo in different and code sample
2022-07-26 13:28:21 +02:00
p0p4k
669966d963 Update requirements.txt (#1791)
Support for #1775
2022-07-26 13:06:40 +02:00
p0p4k
10195c4eba Update decoder.py (#1792)
Minor comment correction.
2022-07-26 13:06:06 +02:00
Tsai Meng-Ting
9d32cbc3db Fix typo in download_vctk.sh (#1739)
typo in comment
2022-07-20 12:27:42 +02:00
ivan provalov
903d9c791a Fix for FloorDiv Function Warning (#1760)
* Fix for Floor Function Warning

Fix for Floor Function Warning

* Adding double quotes to fix formatting

Adding double quotes to fix formatting

* Update glow_tts.py

* Update glow_tts.py
2022-07-20 11:31:22 +02:00
WeberJulian
4f31402227 Fix aux tests (#1753)
* Set n_jobs to 1 for resample script

* Delete resample test

* Set n_jobs 1 in vad test

* delete vad test

* Revert "Delete resample test"

This reverts commit bb7c8466af.

* Remove tests with resample
2022-07-19 10:06:31 +02:00
Eren Gölge
f7587fc134 Fix SSIM loss correction 2022-07-13 10:47:12 +02:00
Eren Gölge
bc1f93c299 Fix device allocation 2022-07-12 19:05:25 +02:00
Eren Gölge
49bac724c0 Implement VitsAudioConfig (#1556)
* Implement VitsAudioConfig

* Update VITS LJSpeech recipe

* Update VITS VCTK recipe

* Make style

* Add missing decorator

* Add missing param

* Make style

* Update recipes

* Fix test

* Bug fix

* Exclude tests folder

* Make linter

* Make style
2022-07-12 18:49:58 +02:00
a-froghyar
34b80e0280 feat: updated recipes and lr fix (#1718)
- updated the recipes activating more losses for more stable training
- re-enabling guided attention loss
- fixed a bug where the wrong lr was fetched for logging
2022-07-12 15:00:53 +02:00
Eren Gölge
48a4f3647f Make lint 2022-07-12 14:58:26 +02:00
WeberJulian
c614f21982 Add durations as aux input for VITS (#1694)
* Add durations as aux input for VITS

* Make style

* Fix tts_tests

* Fix test_get_aux_input
2022-07-12 14:25:21 +02:00
Eren Gölge
2cf89b88c9 Make style 2022-07-12 14:12:57 +02:00
Eren Gölge
a6f73a18cb Fix BCELoss addressing #1192 2022-07-12 14:11:34 +02:00
Eren Gölge
eefd482f51 Separate loss tests 2022-07-12 12:35:46 +02:00
Eren Gölge
c17ff17a18 Fix SSIM loss 2022-07-12 12:35:24 +02:00
Eren Gölge
f1e35596e8 Remove redundant config field 2022-07-11 13:39:41 +02:00
WeberJulian
5cef6facb0 Fix tokenizer for punc only (#1717) 2022-07-06 22:59:41 +02:00
WeberJulian
9e00e31e37 Fix Publish CI (#1597)
* Try out manylinux

* temporary removal of useless pipeline

* remove check and use only manylinux

* Try --plat-name

* Add install requirements

* Add back other actions

* Add PR trigger

* Remove conditions

* Fix syntax

* Roll back some changes

* Add other python versions

* Add test pypi upload

* Add username

* Add back __token__ as username

* Modify name of entry to testpypi

* Set it to release only

* Fix version checking
2022-07-05 11:07:33 +02:00
camillem
5c821d9fa1 Fix the --model_name and --vocoder_name arguments need a <model_type> element (#1469)
Co-authored-by: Eren Gölge <erogol@hotmail.com>
2022-06-27 10:32:43 +02:00