AudioGPT/NeuralSeq/modules/diff/__pycache__/candidate_decoder.cpython-37.pyc

36 lines
3.1 KiB

2023-03-20 15:43:44 +08:00
[Binary content: compiled CPython 3.7 bytecode (.pyc), not displayable as text. Only the strings below are recoverable from the dump.]

Source path recorded in the bytecode:
    ./usr/diff/candidate_decoder.py

Modules and names referenced at import time:
    modules.fastspeech.tts_modules (FastspeechDecoder)
    torch
    torch.nn (functional)
    math
    utils.hparams (hparams)
    diffusion (Mish)

Module-level definitions:
    class SinusoidalPosEmb
        __init__(self, dim)
        forward(self, x)
    function Conv1d(args, kwargs)
    class FFT
        __init__(self, hidden_size, num_layers, kernel_size, num_heads)
        forward(self, spec, diffusion_step, cond, padding_mask, attn_mask, return_hiddens)

hparams keys read in FFT.__init__:
    residual_channels
    audio_num_mel_bins
    hidden_size

Docstring of FFT.forward:
    :param spec: [B, 1, 80, T]
    :param diffusion_step: [B, 1]
    :param cond: [B, M, T]
    :return: