This is mostly because I am not 100% certain about how stable our own
optimized version is. While it does perform a lot better, there is
significant risk that things might break in production. To avoid
that, I am replacing it with the upstream version, which is much more
heavily tested.
This replaces DOMParser with htmlparser2, which is much, much faster.
How much faster? 80%. This new implementation can parse at 50 MB/s,
which is insane! The old one could only do 5-10 MB/s.
We still haven't gotten rid of DOMParser, though, since HTML-to-MD
conversion still needs it. That will be done soon by using `dr-sax`.
This uses a custom implementation of htmlparser2, which is 50% faster
than the default one.
Since HTML is a tree-like language, it is futile to compare it character
for character: `html1 === html2` is almost always false. This commit
introduces a simple diffing algorithm that only checks the text inside
the HTML plus a few other attributes to decide whether the two HTML
documents are actually different. This is obviously not foolproof, and it
will ignore everything aesthetic (b, em, strong tags etc.). This is
actually desirable because, in our case, only a text difference should
warrant a conflict. Everything else can easily be brought back.
Similarly, this also ignores whitespace differences surrounding the
tags.
All in all, it'll provide a more reliable alternative to MD5 hashing the
two HTML documents.
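
For illustration only, here is a minimal, dependency-free sketch of the idea
described above. It is not the code from this commit: it uses regular
expressions as a stand-in for a real parser, and the names
`COMPARED_ATTRIBUTES`, `extractComparable` and `isHtmlDifferent`, as well as
the choice of `href` and `src` as the compared attributes, are assumptions.

```typescript
// Hypothetical sketch: a regex stand-in for a real HTML parser.
// The attribute list is an assumption, not the list used by the commit.
const COMPARED_ATTRIBUTES = ["href", "src"];

function extractComparable(html: string): string {
  // Collect the values of the compared attributes in document order.
  const attributePattern = new RegExp(
    `\\b(${COMPARED_ATTRIBUTES.join("|")})\\s*=\\s*"([^"]*)"`,
    "gi"
  );
  const attributes: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = attributePattern.exec(html)) !== null) {
    attributes.push(`${match[1].toLowerCase()}=${match[2]}`);
  }

  // Drop every tag together with the whitespace around it, then
  // collapse whatever whitespace is left inside the text.
  const text = html
    .replace(/\s*<[^>]*>\s*/g, "")
    .replace(/\s+/g, " ")
    .trim();

  return JSON.stringify({ text, attributes });
}

// Two documents differ only if their text or compared attributes differ.
function isHtmlDifferent(html1: string, html2: string): boolean {
  return extractComparable(html1) !== extractComparable(html2);
}
```

With this sketch, `<p>Hello <b>world</b></p>` and
`<p>Hello <strong>world</strong></p>` count as the same, while changing the
text or an `href` counts as a difference. Because tags are simply dropped,
`<p>ab</p>` and `<p>a</p><p>b</p>` also compare equal, which is one of the
ways an approach like this is not foolproof.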
By default, running `npm run test:core` will only run unit tests.
E2E tests require setting up credentials in the .env file.
Until we figure out a way to streamline this whole process,
this is how the tests will be run.