notesnook

Mirrors/notesnook

mirror of https://github.com/streetwriters/notesnook.git synced 2025-12-22 14:39:34 +01:00

Commit Graph

Author	SHA1	Message	Date

Abdullah Atta

205373dca3

core: use htmlparser2 for html rewriting

This replaces DOMParser with htmlparser2, which is much, much faster.
How much faster? 80%. This new implementation can parse at 50 MB/s,
which is insane! The old one could only do 5-10 MB/s.

We still haven't gotten rid of the DOMParser though since HTML-to-MD
conversion still needs it. This will be done soon though by using `dr-sax`.

This uses a custom implementation of htmlparser2, which is 50% faster
than the default one.
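
One way to reconcile the "80%" figure with the quoted throughputs is to read it as a reduction in parse time for a fixed input. The sketch below works through that arithmetic; `timeSaved` is a hypothetical helper written for this illustration, and the interpretation is an assumption, not something the commit states.

```typescript
// Hypothetical helper: fraction of parse time saved when throughput rises
// from oldMBps to newMBps. Assumes time for a fixed input is inversely
// proportional to throughput.
function timeSaved(oldMBps: number, newMBps: number): number {
  return 1 - oldMBps / newMBps;
}

// Throughput figures taken from the commit message (5-10 MB/s vs 50 MB/s).
const savedVsUpperBound = timeSaved(10, 50); // old parser at its best
const savedVsLowerBound = timeSaved(5, 50); // old parser at its worst

console.log(`${Math.round(savedVsUpperBound * 100)}% less time vs 10 MB/s`);
console.log(`${Math.round(savedVsLowerBound * 100)}% less time vs 5 MB/s`);
```

Under this reading, the 80% figure matches the upper end of the old parser's range.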

2022-11-10 15:16:13 +05:00

Abdullah Atta

e1fc116994

core: improve content conflict detection using proper HTML diffing (#1183)

Since HTML is a tree-like language it is futile to compare it character
for character. `html1 === html2` is almost always false. This commit
introduces a simple diffing algorithm that only checks the text inside
the html + a few other attributes to decide whether the 2 HTMLs are
actually different or not. This is obviously not foolproof and it will
ignore everything aesthetic (b, em, strong tags etc.). This is actually
desirable because in our case only the text difference should
warrant a conflict. Everything else can easily be brought back.
Similarly, this also ignores whitespace differences surrounding the
tags.

All in all it'll provide a more reliable alternative to MD5 hashing the
2 HTMLs.
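
The idea of comparing only the text can be illustrated with a minimal sketch. This is not the code from the commit: `extractText` and `hasTextConflict` are hypothetical names, the tag stripping uses regular expressions rather than a real HTML parser, and the "few other attributes" mentioned above as well as entity decoding are left out.

```typescript
// Hypothetical helper: reduce an HTML string to its visible text.
// Assumption: a regex is good enough for a sketch; real code should use a parser.
function extractText(html: string): string {
  return html
    // Drop purely aesthetic inline tags without adding a gap,
    // so "Hel<b>lo</b>" and "Hello" yield the same text.
    .replace(/<\/?(?:b|i|u|em|strong)\b[^>]*>/gi, "")
    // Replace every other tag with a space so adjacent blocks stay separated.
    .replace(/<[^>]*>/g, " ")
    // Collapse whitespace so spacing around tags does not matter.
    .replace(/\s+/g, " ")
    .trim();
}

// Hypothetical helper: two versions conflict only if their text differs.
function hasTextConflict(html1: string, html2: string): boolean {
  return extractText(html1) !== extractText(html2);
}

const local = "<p>Hello <b>world</b></p>";
const remote = "<p>Hello world</p>";

// Character-for-character comparison reports a difference...
console.log(local === remote);
// ...while the text-only comparison treats the two versions as equivalent.
console.log(hasTextConflict(local, remote));
```

Comparing extracted text rather than a hash of the raw markup means formatting-only edits do not register as conflicts, which is the trade-off the commit message describes.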

2022-10-13 19:22:32 +05:00

2 Commits