AudioGPT/NeuralSeq/modules/diff/__pycache__/diffusion.cpython-38.pyc

92 lines
13 KiB
Plaintext

2023-03-20 15:43:44 +08:00
U
2023-03-24 17:19:37 +08:00
<00><>di/<00>@s<>ddlZddlZddlmZddlmZddlmZddlZ ddl
Z
ddl m m Zddl
m Z ddlmZddlmZddlmZdd lmZdd
lmZd d <0C>Zd d<0E>Zdd<10>Zdd<12>ZGdd<14>de j<1C>ZGdd<16>de j<1C>ZGdd<18>de j<1C>ZGdd<1A>de j<1C>Z Gdd<1C>de j<1C>Z!Gdd<1E>de j<1C>Z"Gdd <20>d e j<1C>Z#Gd!d"<22>d"e j<1C>Z$Gd#d$<24>d$e j<1C>Z%d%d&<26>Z&d/d(d)<29>Z'd0d+d,<2C>Z(Gd-d.<2E>d.e j<1C>Z)dS)1<>N)<01>partial)<01>
isfunction)<01>Path)<01>nn)<01>tqdm)<01> rearrange)<01> FastSpeech2)<01>FastSpeech2MIDI)<01>hparamscCs|dk S<00>N<>)<01>xr r <00>R/mnt/sdc/hongzhiqing/github/AudioGPT/text_to_sing/DiffSinger/usr/diff/diffusion.py<70>existssrcCst|<00>r |St|<01>r|<01>S|Sr )rr)<02>val<61>dr r r<00>defaultsrccs|D]
}|VqqdSr r )<02>dl<64>datar r r<00>cyclesrcCs0||}||}|g|}|dkr,|<04>|<03>|S)Nr)<01>append)<05>num<75>divisor<6F>groups<70> remainder<65>arrr r r<00> num_to_groups#s 

rcs$eZdZ<02>fdd<02>Zdd<04>Z<04>ZS)<05>Residualcst<00><00><01>||_dSr )<03>super<65>__init__<5F>fn<66><02>selfr <00><01> __class__r rr-s
zResidual.__init__cOs|j|f|<02>|<03>|Sr )r )r"r <00>args<67>kwargsr r r<00>forward1szResidual.forward<72><06>__name__<5F>
__module__<EFBFBD> __qualname__rr'<00> __classcell__r r r#rr,s rcs$eZdZ<02>fdd<02>Zdd<04>Z<04>ZS)<05>SinusoidalPosEmbcst<00><00><01>||_dSr )rr<00>dim<69>r"r.r#r rr6s
zSinusoidalPosEmb.__init__cCsz|j}|jd}t<02>d<02>|d}t<04>tj||d<04>| <00>}|dd<00>df|ddd<00>f}tj|<04><08>|<04> <09>fdd<06>}|S)N<>i'<00><00><01>device<63><65><EFBFBD><EFBFBD><EFBFBD><EFBFBD>r.)
r3r.<00>math<74>log<6F>torch<63>exp<78>arange<67>cat<61>sin<69>cos)r"r r3<00>half_dim<69>embr r rr':s
 zSinusoidalPosEmb.forwardr(r r r#rr-5s r-c@seZdZdd<02>ZdS)<04>MishcCs|t<00>t<02>|<01><01>Sr )r8<00>tanh<6E>F<>softplus<75>r"r r r rr'Esz Mish.forwardN)r)r*r+r'r r r rr@Dsr@cs$eZdZ<02>fdd<02>Zdd<04>Z<04>ZS)<05>Upsamplecs"t<00><00><01>t<02>||ddd<03>|_dS)N<>r0r1)rrr<00>ConvTranspose2d<32>convr/r#r rrJs
zUpsample.__init__cCs
|<00>|<01>Sr <00>rHrDr r rr'NszUpsample.forwardr(r r r#rrEIs rEcs$eZdZ<02>fdd<02>Zdd<04>Z<04>ZS)<05>
Downsamplecs"t<00><00><01>t<02>||ddd<03>|_dS)N<>r0r1)rrr<00>Conv2drHr/r#r rrSs
zDownsample.__init__cCs
|<00>|<01>Sr rIrDr r rr'WszDownsample.forwardr(r r r#rrJRs rJcs$eZdZ<02>fdd<02>Zdd<04>Z<04>ZS)<05>Rezerocs&t<00><00><01>||_t<03>t<05>d<01><01>|_dS<00>Nr1)rrr r<00> Parameterr8<00>zeros<6F>gr!r#r rr\s
zRezero.__init__cCs|<00>|<01>|jSr )r rQrDr r rr'aszRezero.forwardr(r r r#rrM[s rMcs&eZdZd<06>fdd<03> Zdd<05>Z<04>ZS)<07>Block<63>cs6t<00><00><01>t<02>tj||ddd<03>t<02>||<02>t<06><00>|_dS)NrKr1)<01>padding)rrr<00>
SequentialrL<00> GroupNormr@<00>block)r"r.<00>dim_outrr#r rrhs 

<04>zBlock.__init__cCs
|<00>|<01>Sr )rWrDr r rr'psz Block.forward)rSr(r r r#rrRgsrRcs*eZdZdd<02><01>fdd<04>
Zdd<06>Z<04>ZS)<07> ResnetBlockrS)rcs^t<00><00><01>t<02>t<04>t<02>||<02><02>|_t||<02>|_t||<02>|_ ||krPt<02>
||d<01>nt<02> <0B>|_ dSrN) rrrrUr@<00>Linear<61>mlprR<00>block1<6B>block2rL<00>Identity<74>res_conv)r"r.rXZ time_emb_dimrr#r rrus

<EFBFBD>  zResnetBlock.__init__cCsD|<00>|<01>}||<00>|<02>dd<00>dd<00>ddf7}|<00>|<03>}||<00>|<01>Sr )r\r[r]r_)r"r <00>time_emb<6D>hr r rr'<00>s
"
zResnetBlock.forwardr(r r r#rrYts rYcs&eZdZd<07>fdd<04> Zdd<06>Z<04>ZS)<08>LinearAttentionrF<00> csDt<00><00><01>||_||}tj||dddd<04>|_t<03>||d<02>|_dS)NrKr1F)<01>bias)rr<00>headsrrL<00>to_qkv<6B>to_out)r"r.re<00>dim_head<61>
hidden_dimr#r rr<00>s

zLinearAttention.__init__c Csv|j\}}}}|<00>|<01>}t|d|jdd<03>\}}} |jdd<05>}t<05>d|| <09>}
t<05>d|
|<07>} t| d|j||d <09>} |<00>| <0B>S)
Nz*b (qkv heads c) h w -> qkv b heads c (h w)rK)re<00>qkvr4r5zbhdn,bhen->bhdezbhde,bhdn->bhenz"b heads c (h w) -> b (heads c) h w)rera<00>w)<08>shaperfrre<00>softmaxr8<00>einsumrg) r"r <00>b<>crarkrj<00>q<>k<>v<>context<78>outr r rr'<00>s
 zLinearAttention.forward)rFrcr(r r r#rrb<00>srbcCs2|j^}}|<00>d|<01>}|j|fdt|<02>d<00><02>S)Nr4<00>r1r1)rl<00>gather<65>reshape<70>len)<06>a<>t<>x_shapero<00>_rur r r<00>extract<63>s
 r~Fcs,<00><00>fdd<02>}<03><00>fdd<02>}|r&|<03>S|<04>S)Ncs6tjd<05>dd<00><00><02>d<02>j<02>dfdt<03><01>d<00><02>S)Nr1r2rrv)r1)r8<00>randn<64>repeatryr <00>r3rlr r<00><lambda><3E><00>znoise_like.<locals>.<lambda>cstj<01><01>d<01>S)Nr2)r8rr r<>r rr<><00>r<>r )rlr3r<><00> repeat_noise<73>noiser r<>r<00>
noise_like<EFBFBD>sr<><00><><EFBFBD><EFBFBD><EFBFBD>Mb<4D>?cCsv|d}t<00>d||<02>}t<00>|||d|tjd<00>d}||d}d|dd<05>|dd<06>}tj|ddd<08>S) zW
cosine schedule
as proposed in https://openreview.net/forum?id=-NEXDKk8gZ
r1r<00><00>?r0Nr4g+<2B><16><><EFBFBD><EFBFBD>?)<02>a_min<69>a_max)<05>np<6E>linspacer=<00>pi<70>clip)<06> timesteps<70>s<>stepsr <00>alphas_cumprod<6F>betasr r r<00>cosine_beta_schedule<6C>s ( r<>cs<>eZdZd!<21>fdd<05> Zdd<07>Zdd <09>Zd
d <0B>Zed <0C>d d<0E>Ze <09>
<EFBFBD>d"dd<12><01>Z d#dd<14>Z d$dd<16>Z d%dd<18>Zdd<1A>Zdd<1C>Zdd<1E>Zdd <20>Z<12>ZS)&<26>GaussianDiffusion<6F><6E><00>l1Nc 
sZt<00><00><01>||_t<03>d<01>dk r4tdr4t||<02>|_n t||<02>|_d|j_||_ t
|<06>rxt |t j <0A>rr|<06><0E><00><0F><00><10>n|}nt|<04>}d|} tj| dd<04>}
t<12>d|
dd<05><00>} |j\}t|<04>|_||_tt jt jd<06>} |<00>d| |<06><01>|<00>d| |
<EFBFBD><01>|<00>d | | <0B><01>|<00>d
| t<12>|
<EFBFBD><01><01>|<00>d | t<12>d|
<00><01><01>|<00>d | t<12>d|
<00><01><01>|<00>d | t<12>d|
<00><01><01>|<00>d| t<12>d|
d<00><01><01>|d| d|
} |<00>d| | <0A><01>|<00>d| t<12>t<12>| d<12><02><01><01>|<00>d| |t<12>| <0B>d|
<00><01>|<00>d| d| t<12>| <09>d|
<00><01>|<00>dt <0C> |<07>dddtd<00>f<00>|<00>dt <0C> |<08>dddtd<00>f<00>dS)N<>use_midi<64><00>?r)<01>axisr4)<01>dtyper<65>r<><00>alphas_cumprod_prev<65>sqrt_alphas_cumprod<6F>sqrt_one_minus_alphas_cumprod<6F>log_one_minus_alphas_cumprod<6F>sqrt_recip_alphas_cumprod<6F>sqrt_recipm1_alphas_cumprodr1<00>posterior_variance<63>posterior_log_variance_clippedg#B<> <0C><><EFBFBD>;<3B>posterior_mean_coef1<66>posterior_mean_coef2<66>spec_min<69> keep_bins<6E>spec_max)!rr<00>
denoise_fnr
<00>getr <00>fs2r<00>decoder<65>mel_binsr<00>
isinstancer8<00>Tensor<6F>detach<63>cpu<70>numpyr<79>r<><00>cumprodrrl<00>int<6E> num_timesteps<70> loss_typer<00>tensor<6F>float32<33>register_buffer<65>sqrtr7<00>maximum<75> FloatTensor)r"<00> phone_encoder<65>out_dimsr<73>r<>r<>r<>r<>r<><00>alphasr<73>r<><00>to_torchr<68>r#r rr<00>sH
 "
<14><18>$zGaussianDiffusion.__init__cCsBt|j||j<02>|}td|j||j<02>}t|j||j<02>}|||fS)Nr<4E>)r~r<>rlr<>r<>)r"<00>x_startr{<00>mean<61>variance<63> log_variancer r r<00>q_mean_variance<63>sz!GaussianDiffusion.q_mean_variancecCs(t|j||j<02>|t|j||j<02>|Sr )r~r<>rlr<>)r"<00>x_tr{r<>r r r<00>predict_start_from_noise<73>s<12><02>z*GaussianDiffusion.predict_start_from_noisecCsRt|j||j<02>|t|j||j<02>|}t|j||j<02>}t|j||j<02>}|||fSr )r~r<>rlr<>r<>r<>)r"r<>r<>r{<00>posterior_meanr<6E>r<>r r r<00> q_posterior<6F>s<12><02>zGaussianDiffusion.q_posterior)<01> clip_denoisedc
CsP|j|||d<01>}|j|||d<02>}|r0|<06>dd<04>|j|||d<05>\}}} ||| fS)N)<01>cond)r{r<>g<00><>r<EFBFBD>)r<>r<>r{)r<>r<><00>clamp_r<5F>)
r"r r{r<>r<><00>
noise_pred<EFBFBD>x_recon<6F>
model_meanr<EFBFBD><00>posterior_log_variancer r r<00>p_mean_variance<63>s  z!GaussianDiffusion.p_mean_varianceTFc Cs~|j|jf<01><02>^}}}|j||||d<01>\} }}
t|j||<05>} d|dk<02><04>j|fdt|j<00>d<00><02>} | | d|
<00><07>| S)N)r r{r<>r<>r1rrvr<>)rlr3r<>r<><00>floatrxryr9) r"r r{r<>r<>r<>ror}r3r<><00>model_log_variancer<65><00> nonzero_maskr r r<00>p_samples
*zGaussianDiffusion.p_samplecs:t|<03>fdd<02><08>}t|j|<02>j<03><03>t|j|<02>j<03>|S)Ncs
t<00><01><00>Sr <00>r8<00>
randn_liker <00>r<>r rr<>r<>z,GaussianDiffusion.q_sample.<locals>.<lambda>)rr~r<>rlr<>)r"r<>r{r<>r r<>r<00>q_samples
<12><02>zGaussianDiffusion.q_samplec s<>t|<04>fdd<02><08>}|j<01>||d<03>}|<00>|||<03>}|jdkrp|dk r^||<00><04>|<05>d<05><00><06>}q<>||<00><04><00><06>}n|jdkr<>t<07>||<07>}nt <09><00>|S)Ncs
t<00><01><00>Sr r<>r r<>r rr<>r<>z,GaussianDiffusion.p_losses.<locals>.<lambda>)r<>r{r<>r<>r1<00>l2)
rr<>r<>r<><00>abs<62> unsqueezer<65>rB<00>mse_loss<73>NotImplementedError) r"r<>r{r<>r<><00>
nonpadding<EFBFBD>x_noisyr<79><00>lossr r<>r<00>p_lossess

zGaussianDiffusion.p_lossesc  CsL|j|jf<01><02>^} }
} |j|||||||d|d<02> } | d<00>dd<05>} |s<>tjd|j| f| d<07><04><07>}|}|<00>|<0F>}|<0F>dd<05>dd<00>ddd<00>dd<00>f}|dk<03> <09>}|j
||| |d<08>| d <n<>|j}| jdd|j | jdf}tj || d<07>}t ttd|<0E><02>d
|d <0B>D]$}|<00>|tj| f|| tjd <0C>| <0A>}q<>|dd<00>df<00>dd<05>}|<00>|<0F>| d <| S)NT)<02> skip_decoder<65>infer<65> decoder_inpr1r0rr2)r<><00> diff_losszsample time step)<02>desc<73>total)r3r<><00>mel_out)rlr3r<><00> transposer8<00>randintr<74><00>long<6E> norm_specr<63>r<>r<>rr<00>reversed<65>ranger<65><00>full<6C> denorm_spec)r"<00>
txt_tokens<EFBFBD>mel2ph<70> spk_embed<65>ref_mels<6C>f0<66>uv<75>energyr<79>ror}r3<00>retr<74>r{r r<>rl<00>ir r rr',s*<02>
$ "zGaussianDiffusion.forwardcCs||j|j|jddS)Nr0r1)r<>r<>rDr r rr<>DszGaussianDiffusion.norm_speccCs|dd|j|j|jS)Nr1r0)r<>r<>rDr r rr<>GszGaussianDiffusion.denorm_speccCs|j<00>||||<04>Sr )r<><00> cwt2f0_norm)r"<00>cwt_specr<63><00>stdr<64>r r rr<>JszGaussianDiffusion.cwt2f0_normcCs|Sr r rDr r r<00>out2melMszGaussianDiffusion.out2mel)r<>r<>NNN)TF)N)NN)NNNNNNF)r)r*r+rr<>r<>r<><00>boolr<6C>r8<00>no_gradr<64>r<>r<>r'r<>r<>r<>r<>r,r r r#rr<><00>s2<00>3 
 

<00>
r<>)F)r<>)*r6<00>random<6F> functoolsr<00>inspectr<00>pathlibrr<>r<>r8<00>torch.nn.functionalr<00>
functionalrBr<00>einopsr<00>modules.fastspeech.fs2r<00>modules.diffsinger_midi.fs2r <00> utils.hparamsr
rrrr<00>Modulerr-r@rErJrMrRrYrbr~r<>r<>r<>r r r r<00><module>s<               