AudioGPT/NeuralSeq/modules/diff/__pycache__/diffusion.cpython-37.pyc

2023-03-20 15:43:44 +08:00
2023-03-24 17:19:37 +08:00

# Python source equivalent to this CPython 3.7 bytecode.
# Compiled from: /mnt/sdc/hongzhiqing/code/audio_chatgpt/text_to_sing/DiffSinger/usr/diff/diffusion.py
import math
import random
from functools import partial
from inspect import isfunction
from pathlib import Path

import numpy as np
import torch
import torch.nn.functional as F
from torch import nn
from tqdm import tqdm
from einops import rearrange

from modules.fastspeech.fs2 import FastSpeech2
from modules.diffsinger_midi.fs2 import FastSpeech2MIDI
from utils.hparams import hparams


def exists(x):
    return x is not None


def default(val, d):
    if exists(val):
        return val
    return d() if isfunction(d) else d


def cycle(dl):
    while True:
        for data in dl:
            yield data


def num_to_groups(num, divisor):
    groups = num // divisor
    remainder = num % divisor
    arr = [divisor] * groups
    if remainder > 0:
        arr.append(remainder)
    return arr


class Residual(nn.Module):
    def __init__(self, fn):
        super().__init__()
        self.fn = fn

    def forward(self, x, *args, **kwargs):
        return self.fn(x, *args, **kwargs) + x


class SinusoidalPosEmb(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, x):
        device = x.device
        half_dim = self.dim // 2
        emb = math.log(10000) / (half_dim - 1)
        emb = torch.exp(torch.arange(half_dim, device=device) * -emb)
        emb = x[:, None] * emb[None, :]
        emb = torch.cat((emb.sin(), emb.cos()), dim=-1)
        return emb


class Mish(nn.Module):
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))


class Upsample(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.conv = nn.ConvTranspose2d(dim, dim, 4, 2, 1)

    def forward(self, x):
        return self.conv(x)


class Downsample(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, 3, 2, 1)

    def forward(self, x):
        return self.conv(x)


class Rezero(nn.Module):
    def __init__(self, fn):
        super().__init__()
        self.fn = fn
        self.g = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        return self.fn(x) * self.g


class Block(nn.Module):
    def __init__(self, dim, dim_out, groups=8):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(dim, dim_out, 3, padding=1),
            nn.GroupNorm(groups, dim_out),
            Mish(),
        )

    def forward(self, x):
        return self.block(x)


class ResnetBlock(nn.Module):
    def __init__(self, dim, dim_out, *, time_emb_dim, groups=8):
        super().__init__()
        self.mlp = nn.Sequential(Mish(), nn.Linear(time_emb_dim, dim_out))
        self.block1 = Block(dim, dim_out)
        self.block2 = Block(dim_out, dim_out)
        self.res_conv = nn.Conv2d(dim, dim_out, 1) if dim != dim_out else nn.Identity()

    def forward(self, x, time_emb):
        h = self.block1(x)
        # Broadcast the projected time embedding over both spatial axes.
        h += self.mlp(time_emb)[:, :, None, None]
        h = self.block2(h)
        return h + self.res_conv(x)


class LinearAttention(nn.Module):
    def __init__(self, dim, heads=4, dim_head=32):
        super().__init__()
        self.heads = heads
        hidden_dim = dim_head * heads
        self.to_qkv = nn.Conv2d(dim, hidden_dim * 3, 1, bias=False)
        self.to_out = nn.Conv2d(hidden_dim, dim, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        qkv = self.to_qkv(x)
        q, k, v = rearrange(qkv, 'b (qkv heads c) h w -> qkv b heads c (h w)', heads=self.heads, qkv=3)
        k = k.softmax(dim=-1)
        context = torch.einsum('bhdn,bhen->bhde', k, v)
        out = torch.einsum('bhde,bhdn->bhen', context, q)
        out = rearrange(out, 'b heads c (h w) -> b (heads c) h w', heads=self.heads, h=h, w=w)
        return self.to_out(out)


def extract(a, t, x_shape):
    # Gather a[t] for each batch item and reshape to (b, 1, 1, ...) for broadcasting.
    b, *_ = t.shape
    out = a.gather(-1, t)
    return out.reshape(b, *((1,) * (len(x_shape) - 1)))


def noise_like(shape, device, repeat=False):
    repeat_noise = lambda: torch.randn((1, *shape[1:]), device=device).repeat(shape[0], *((1,) * (len(shape) - 1)))
    noise = lambda: torch.randn(shape, device=device)
    return repeat_noise() if repeat else noise()


def cosine_beta_schedule(timesteps, s=0.008):
    """
    cosine schedule
    as proposed in https://openreview.net/forum?id=-NEXDKk8gZ
    """
    steps = timesteps + 1
    x = np.linspace(0, steps, steps)
    alphas_cumprod = np.cos(((x / steps) + s) / (1 + s) * np.pi * 0.5) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    return np.clip(betas, a_min=0, a_max=0.999)


class GaussianDiffusion(nn.Module):
    def __init__(self, phone_encoder, out_dims, denoise_fn,
                 timesteps=1000, loss_type='l1', betas=None, spec_min=None, spec_max=None):
        super().__init__()
        self.denoise_fn = denoise_fn
        if hparams.get('use_midi') is not None and hparams['use_midi']:
            self.fs2 = FastSpeech2MIDI(phone_encoder, out_dims)
        else:
            self.fs2 = FastSpeech2(phone_encoder, out_dims)
        # The FastSpeech2 decoder is dropped; `decoder_inp` is used as the diffusion condition.
        self.fs2.decoder = None
        self.mel_bins = out_dims

        if exists(betas):
            betas = betas.detach().cpu().numpy() if isinstance(betas, torch.Tensor) else betas
        else:
            betas = cosine_beta_schedule(timesteps)

        alphas = 1. - betas
        alphas_cumprod = np.cumprod(alphas, axis=0)
        alphas_cumprod_prev = np.append(1., alphas_cumprod[:-1])

        timesteps, = betas.shape
        self.num_timesteps = int(timesteps)
        self.loss_type = loss_type

        to_torch = partial(torch.tensor, dtype=torch.float32)

        self.register_buffer('betas', to_torch(betas))
        self.register_buffer('alphas_cumprod', to_torch(alphas_cumprod))
        self.register_buffer('alphas_cumprod_prev', to_torch(alphas_cumprod_prev))

        # Forward process q(x_t | x_0)
        self.register_buffer('sqrt_alphas_cumprod', to_torch(np.sqrt(alphas_cumprod)))
        self.register_buffer('sqrt_one_minus_alphas_cumprod', to_torch(np.sqrt(1. - alphas_cumprod)))
        self.register_buffer('log_one_minus_alphas_cumprod', to_torch(np.log(1. - alphas_cumprod)))
        self.register_buffer('sqrt_recip_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod)))
        self.register_buffer('sqrt_recipm1_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod - 1)))

        # Posterior q(x_{t-1} | x_t, x_0)
        posterior_variance = betas * (1. - alphas_cumprod_prev) / (1. - alphas_cumprod)
        self.register_buffer('posterior_variance', to_torch(posterior_variance))
        # The posterior variance is 0 at t = 0, so it is clipped before taking the log.
        self.register_buffer('posterior_log_variance_clipped', to_torch(np.log(np.maximum(posterior_variance, 1e-20))))
        self.register_buffer('posterior_mean_coef1', to_torch(
            betas * np.sqrt(alphas_cumprod_prev) / (1. - alphas_cumprod)))
        self.register_buffer('posterior_mean_coef2', to_torch(
            (1. - alphas_cumprod_prev) * np.sqrt(alphas) / (1. - alphas_cumprod)))

        self.register_buffer('spec_min', torch.FloatTensor(spec_min)[None, None, :hparams['keep_bins']])
        self.register_buffer('spec_max', torch.FloatTensor(spec_max)[None, None, :hparams['keep_bins']])

    def q_mean_variance(self, x_start, t):
        mean = extract(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start
        variance = extract(1. - self.alphas_cumprod, t, x_start.shape)
        log_variance = extract(self.log_one_minus_alphas_cumprod, t, x_start.shape)
        return mean, variance, log_variance

    def predict_start_from_noise(self, x_t, t, noise):
        return (
            extract(self.sqrt_recip_alphas_cumprod, t, x_t.shape) * x_t -
            extract(self.sqrt_recipm1_alphas_cumprod, t, x_t.shape) * noise
        )

    def q_posterior(self, x_start, x_t, t):
        posterior_mean = (
            extract(self.posterior_mean_coef1, t, x_t.shape) * x_start +
            extract(self.posterior_mean_coef2, t, x_t.shape) * x_t
        )
        posterior_variance = extract(self.posterior_variance, t, x_t.shape)
        posterior_log_variance_clipped = extract(self.posterior_log_variance_clipped, t, x_t.shape)
        return posterior_mean, posterior_variance, posterior_log_variance_clipped

    def p_mean_variance(self, x, t, cond, clip_denoised: bool):
        noise_pred = self.denoise_fn(x, t, cond=cond)
        x_recon = self.predict_start_from_noise(x, t=t, noise=noise_pred)

        if clip_denoised:
            x_recon.clamp_(-1., 1.)

        model_mean, posterior_variance, posterior_log_variance = self.q_posterior(x_start=x_recon, x_t=x, t=t)
        return model_mean, posterior_variance, posterior_log_variance

    @torch.no_grad()
    def p_sample(self, x, t, cond, clip_denoised=True, repeat_noise=False):
        b, *_, device = *x.shape, x.device
        model_mean, _, model_log_variance = self.p_mean_variance(x=x, t=t, cond=cond, clip_denoised=clip_denoised)
        noise = noise_like(x.shape, device, repeat_noise)
        # No noise is added at the last reverse step (t == 0).
        nonzero_mask = (1 - (t == 0).float()).reshape(b, *((1,) * (len(x.shape) - 1)))
        return model_mean + nonzero_mask * (0.5 * model_log_variance).exp() * noise

    def q_sample(self, x_start, t, noise=None):
        noise = default(noise, lambda: torch.randn_like(x_start))
        return (
            extract(self.sqrt_alphas_cumprod, t, x_start.shape) * x_start +
            extract(self.sqrt_one_minus_alphas_cumprod, t, x_start.shape) * noise
        )

    def p_losses(self, x_start, t, cond, noise=None, nonpadding=None):
        noise = default(noise, lambda: torch.randn_like(x_start))

        x_noisy = self.q_sample(x_start=x_start, t=t, noise=noise)
        x_recon = self.denoise_fn(x_noisy, t, cond)

        if self.loss_type == 'l1':
            if nonpadding is not None:
                loss = ((noise - x_recon).abs() * nonpadding.unsqueeze(1)).mean()
            else:
                loss = (noise - x_recon).abs().mean()
        elif self.loss_type == 'l2':
            loss = F.mse_loss(noise, x_recon)
        else:
            raise NotImplementedError()

        return loss

    def forward(self, txt_tokens, mel2ph=None, spk_embed=None,
                ref_mels=None, f0=None, uv=None, energy=None, infer=False):
        b, *_, device = *txt_tokens.shape, txt_tokens.device
        ret = self.fs2(txt_tokens, mel2ph, spk_embed, ref_mels, f0, uv, energy,
                       skip_decoder=True, infer=infer)
        cond = ret['decoder_inp'].transpose(1, 2)
        if not infer:
            # Training: noise the normalised target mel at a random step and score the noise prediction.
            t = torch.randint(0, self.num_timesteps, (b,), device=device).long()
            x = ref_mels
            x = self.norm_spec(x)
            x = x.transpose(1, 2)[:, None, :, :]  # [B, 1, M, T]
            nonpadding = (mel2ph != 0).float()
            ret['diff_loss'] = self.p_losses(x, t, cond, nonpadding=nonpadding)
        else:
            # Inference: run the full reverse chain starting from Gaussian noise.
            t = self.num_timesteps
            shape = (cond.shape[0], 1, self.mel_bins, cond.shape[2])
            x = torch.randn(shape, device=device)
            for i in tqdm(reversed(range(0, t)), desc='sample time step', total=t):
                x = self.p_sample(x, torch.full((b,), i, device=device, dtype=torch.long), cond)
            x = x[:, 0].transpose(1, 2)
            ret['mel_out'] = self.denorm_spec(x)
        return ret

    def norm_spec(self, x):
        return (x - self.spec_min) / (self.spec_max - self.spec_min) * 2 - 1

    def denorm_spec(self, x):
        return (x + 1) / 2 * (self.spec_max - self.spec_min) + self.spec_min

    def cwt2f0_norm(self, cwt_spec, mean, std, mel2ph):
        return self.fs2.cwt2f0_norm(cwt_spec, mean, std, mel2ph)

    def out2mel(self, x):
        return x
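The noise schedule and forward process in this module can be checked without PyTorch. The sketch below is a dependency-free port of `cosine_beta_schedule` (plain Python lists instead of NumPy arrays, an assumption made only so it runs anywhere), plus the closed form that `GaussianDiffusion.q_sample` evaluates, x_t = sqrt(alphas_cumprod[t]) * x_0 + sqrt(1 - alphas_cumprod[t]) * noise. The helper name `cumprod_alphas` is hypothetical.

```python
import math


def cosine_beta_schedule(timesteps, s=0.008):
    """Pure-Python port of the cosine schedule (same arithmetic as the NumPy version)."""
    steps = timesteps + 1
    # Equivalent of np.linspace(0, steps, steps): `steps` points from 0 to `steps` inclusive.
    xs = [steps * i / (steps - 1) for i in range(steps)]
    f = [math.cos(((x / steps) + s) / (1 + s) * math.pi * 0.5) ** 2 for x in xs]
    alphas_cumprod = [v / f[0] for v in f]
    betas = [1 - alphas_cumprod[i + 1] / alphas_cumprod[i] for i in range(timesteps)]
    # Same bounds as np.clip(betas, a_min=0, a_max=0.999).
    return [min(max(b, 0.0), 0.999) for b in betas]


def cumprod_alphas(betas):
    """alphas_cumprod[t] = prod_{i <= t} (1 - betas[i]), like np.cumprod(1 - betas)."""
    out, acc = [], 1.0
    for b in betas:
        acc *= 1.0 - b
        out.append(acc)
    return out


def q_sample(x_start, t, noise, alphas_cumprod):
    """x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, element-wise."""
    a = alphas_cumprod[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x_start, noise)]


betas = cosine_beta_schedule(1000)
alphas_cumprod = cumprod_alphas(betas)
print(betas[0], betas[-1])  # a tiny first step; the last step is clipped
print(q_sample([1.0], 999, [0.0], alphas_cumprod))  # almost no signal left at t = 999
```

Because `np.linspace(0, steps, steps)` spaces its points by steps / (steps - 1) rather than by 1, the port keeps that spacing. The final beta always hits the 0.999 clip, since cos(pi / 2) squared is essentially zero.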
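The posterior buffers registered in `GaussianDiffusion.__init__` (`posterior_variance`, `posterior_mean_coef1`, `posterior_mean_coef2`) and the inversion done by `predict_start_from_noise` follow from the same schedule. The sketch recomputes them in plain Python for a hypothetical linear schedule (`GaussianDiffusion` accepts any `betas` array; these particular values are illustrative only), then checks that the true noise recovers x_0 and that the variance matches the equivalent form 1 / (1 / (1 - alphas_cumprod_prev) + alpha_t / beta_t). The names `diffusion_tables` and `q_posterior_mean` are hypothetical.

```python
import math


def diffusion_tables(betas):
    """Per-step values mirroring the buffers registered in GaussianDiffusion.__init__."""
    alphas = [1.0 - b for b in betas]
    alphas_cumprod, acc = [], 1.0
    for a in alphas:
        acc *= a
        alphas_cumprod.append(acc)
    alphas_cumprod_prev = [1.0] + alphas_cumprod[:-1]
    rows = list(zip(betas, alphas, alphas_cumprod, alphas_cumprod_prev))
    return {
        "alphas": alphas,
        "alphas_cumprod": alphas_cumprod,
        "alphas_cumprod_prev": alphas_cumprod_prev,
        "posterior_variance": [b * (1 - ap) / (1 - ac) for b, a, ac, ap in rows],
        "posterior_mean_coef1": [b * math.sqrt(ap) / (1 - ac) for b, a, ac, ap in rows],
        "posterior_mean_coef2": [(1 - ap) * math.sqrt(a) / (1 - ac) for b, a, ac, ap in rows],
    }


def predict_start_from_noise(x_t, t, noise, tables):
    """x_0 = sqrt(1 / abar_t) * x_t - sqrt(1 / abar_t - 1) * eps."""
    ac = tables["alphas_cumprod"][t]
    return math.sqrt(1.0 / ac) * x_t - math.sqrt(1.0 / ac - 1.0) * noise


def q_posterior_mean(x_start, x_t, t, tables):
    """Mean of q(x_{t-1} | x_t, x_0) = coef1 * x_0 + coef2 * x_t."""
    return (tables["posterior_mean_coef1"][t] * x_start
            + tables["posterior_mean_coef2"][t] * x_t)


# Hypothetical linear schedule, for illustration only.
T = 100
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
tables = diffusion_tables(betas)

# Noise x_0 to step t, then invert with the same noise.
x0, eps, t = 0.7, -1.3, 50
ac = tables["alphas_cumprod"][t]
x_t = math.sqrt(ac) * x0 + math.sqrt(1 - ac) * eps
x0_hat = predict_start_from_noise(x_t, t, eps, tables)
```

The variance is exactly zero at t = 0 because `alphas_cumprod_prev[0]` is 1, which is why the module clips it to 1e-20 before taking the log for `posterior_log_variance_clipped`.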
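`SinusoidalPosEmb` maps a scalar diffusion step to a vector of sines and cosines at geometrically spaced frequencies, from 1 down to 1/10000. A scalar, pure-Python sketch of the same formula follows. The function name `sinusoidal_pos_emb` is illustrative, and like the module it assumes an even `dim` (an odd `dim` would yield dim - 1 features).

```python
import math


def sinusoidal_pos_emb(t, dim):
    """Embed step t as [sin(t * w_i)] + [cos(t * w_i)] for half_dim frequencies w_i."""
    half_dim = dim // 2
    scale = math.log(10000) / (half_dim - 1)
    # Mirrors torch.exp(torch.arange(half_dim) * -emb): w_0 = 1, w_{half_dim-1} = 1/10000.
    freqs = [math.exp(-scale * i) for i in range(half_dim)]
    return [math.sin(t * w) for w in freqs] + [math.cos(t * w) for w in freqs]


emb = sinusoidal_pos_emb(10.0, 8)
print(len(emb))
```

Low-index features oscillate quickly with t, and high-index features change slowly. This gives the denoiser a smooth code for nearby steps that is still distinct across the whole 0..num_timesteps range.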
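`LinearAttention` never builds an n x n attention matrix. It softmaxes the keys over positions, contracts keys with values into a small d x e `context` (`'bhdn,bhen->bhde'`), and then applies that context to the queries (`'bhde,bhdn->bhen'`). The sketch below reproduces one head of those two `einsum` calls with nested lists. It drops the batch and head axes, and the helper names are hypothetical. It also checks the result against an explicit quadratic computation, which agrees because both evaluate the same triple sum in a different order.

```python
import math


def softmax_rows(m):
    """Softmax over the last axis of a list of rows (like k.softmax(dim=-1))."""
    out = []
    for row in m:
        mx = max(row)
        e = [math.exp(v - mx) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def linear_attention_head(q, k, v):
    """One head: q and k are d x n (feature x position), v is e x n; returns e x n."""
    k = softmax_rows(k)
    d, n, e = len(q), len(q[0]), len(v)
    # context = einsum('dn,en->de', k, v): a d x e summary, O(d * e * n) work
    context = [[sum(k[i][p] * v[j][p] for p in range(n)) for j in range(e)] for i in range(d)]
    # out = einsum('de,dn->en', context, q)
    return [[sum(context[i][j] * q[i][p] for i in range(d)) for p in range(n)] for j in range(e)]


def quadratic_reference(q, k, v):
    """Same result via an explicit n x n matrix A[m][p] = sum_i k[i][m] * q[i][p]."""
    k = softmax_rows(k)
    d, n, e = len(q), len(q[0]), len(v)
    a = [[sum(k[i][m] * q[i][p] for i in range(d)) for p in range(n)] for m in range(n)]
    return [[sum(v[j][m] * a[m][p] for m in range(n)) for p in range(n)] for j in range(e)]


q = [[0.1, -0.2, 0.3], [0.5, 0.0, -0.4]]
k = [[1.0, 2.0, 0.5], [-1.0, 0.0, 1.0]]
v = [[0.2, 0.4, 0.6], [1.0, -1.0, 0.0], [0.3, 0.3, 0.3]]
out = linear_attention_head(q, k, v)
```

Like the module, the sketch applies no softmax to the queries and no 1/sqrt(d) scaling. That is what lets the key-value product be computed once per head, independent of how many query positions there are.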