[Data liberation] wp_rewrite_urls() #1893
Open
Motivation for the change, related issues
A part of #1894.
Prototypes a wp_rewrite_urls() URL rewriter for block markup to migrate the content from, say, <a href="https://adamadam.blog"> to <a href="https://adamziel.com/blog">.

Status:
WP_HTML_Tag_Processor. Let's update it and find a way of not keeping a copy in this repo.

Details
This PR consists of code ported from https://github.com/adamziel/site-transfer-protocol. It uses a cascade of parsers to pierce through the structured data in a WordPress post and replace the URLs matching the requested domain.
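The final "replace the URLs matching the requested domain" step can be sketched as follows. This is a minimal illustration, assuming a URL matches when its host equals the old site's host (case-insensitively) and its path sits under the old base path. sketch_rewrite_url() is a hypothetical helper, not a function from this PR or from site-transfer-protocol, and it needs PHP 8.0+ for str_starts_with().

```php
<?php
/**
 * Hypothetical helper (not part of this PR): if $url lives under $from_base,
 * return it re-rooted under $to_base; otherwise return null.
 * Compares hosts and path prefixes only; scheme and port are ignored.
 */
function sketch_rewrite_url(string $url, string $from_base, string $to_base): ?string
{
    $url_parts  = parse_url($url);
    $from_parts = parse_url($from_base);
    if (!is_array($url_parts) || !is_array($from_parts) || !isset($url_parts['host'], $from_parts['host'])) {
        return null; // Not an absolute URL we can compare.
    }

    // Hostnames are case-insensitive.
    if (strcasecmp($url_parts['host'], $from_parts['host']) !== 0) {
        return null;
    }

    // Only match whole path segments: /blog matches /blog and /blog/x, not /blogroll.
    $from_path = rtrim($from_parts['path'] ?? '', '/');
    $url_path  = $url_parts['path'] ?? '';
    if ($from_path !== '' && $url_path !== $from_path && !str_starts_with($url_path, $from_path . '/')) {
        return null;
    }

    // Re-root the remaining path, then carry over the query string and fragment.
    $new = rtrim($to_base, '/') . substr($url_path, strlen($from_path));
    if (isset($url_parts['query'])) {
        $new .= '?' . $url_parts['query'];
    }
    if (isset($url_parts['fragment'])) {
        $new .= '#' . $url_parts['fragment'];
    }
    return $new;
}
```

For example, sketch_rewrite_url('https://adamadam.blog/2024/hello/?p=1#top', 'https://adamadam.blog', 'https://adamziel.com/blog') yields https://adamziel.com/blog/2024/hello/?p=1#top. Matching whole path segments keeps a base of https://example.com/blog from also capturing https://example.com/blogroll; a real migrator may also want to compare scheme and port.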
The data flow is as follows:
Parse HTML -> Parse block comments -> Parse attributes JSON -> Parse URLs
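The "Parse block comments -> Parse attributes JSON -> Parse URLs" part of this cascade can be sketched as below. This is an illustration under simplifying assumptions, not the PR's WP_Block_Markup_Url_Processor: sketch_rewrite_block_attributes() is a hypothetical function, a regex stands in for a real block-comment parser, and every string attribute value is handed to a caller-supplied $rewrite callable that returns a new URL or null. It requires PHP 8.0+.

```php
<?php
/**
 * Hypothetical sketch (not this PR's API) of the block-comment -> JSON -> URL
 * stages: find block delimiters that carry attributes, decode the JSON, pass
 * every string value through $rewrite, and json_encode() the result back.
 *
 * @param callable(string): ?string $rewrite Returns the new URL, or null to keep the value.
 */
function sketch_rewrite_block_attributes(string $markup, callable $rewrite): string
{
    // Simplified stand-in for a block parser: matches opening or void delimiters
    // with an attributes object, e.g. <!-- wp:image {"url":"..."} /-->.
    $pattern = '/<!--\s+wp:([a-z][a-z0-9_\/-]*)\s+(\{.*?\})\s+(\/?)-->/s';

    return preg_replace_callback($pattern, function (array $m) use ($rewrite): string {
        // Decode into objects (not arrays) so an empty "{}" is re-encoded as "{}".
        $attrs = json_decode($m[2]);
        if ($attrs === null) {
            return $m[0]; // Invalid JSON: leave the delimiter untouched.
        }

        // Recursively visit nested arrays/objects and rewrite string leaves.
        $walk = function ($value) use (&$walk, $rewrite) {
            if (is_string($value)) {
                return $rewrite($value) ?? $value;
            }
            if (is_array($value)) {
                foreach ($value as $key => $child) {
                    $value[$key] = $walk($child);
                }
            } elseif (is_object($value)) {
                foreach (get_object_vars($value) as $key => $child) {
                    $value->$key = $walk($child);
                }
            }
            return $value;
        };

        $json = json_encode($walk($attrs), JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);
        return '<!-- wp:' . $m[1] . ' ' . $json . ' ' . $m[3] . '-->';
    }, $markup) ?? $markup;
}

// Example: a naive prefix rewrite from https://adamadam.blog to https://adamziel.com/blog.
$from    = 'https://adamadam.blog';
$to      = 'https://adamziel.com/blog';
$rewrite = fn (string $s): ?string => str_starts_with($s, $from) ? $to . substr($s, strlen($from)) : null;

echo sketch_rewrite_block_attributes('<!-- wp:image {"url":"https://adamadam.blog/a.png","id":7} /-->', $rewrite), PHP_EOL;
// prints <!-- wp:image {"url":"https://adamziel.com/blog/a.png","id":7} /-->
```

Re-encoding with json_encode() mirrors the "block attribute updates are json_encode()-d" approach, but WordPress core's own block serializer applies extra escaping (for example, of -- sequences) that this sketch omits. The HTML stage (text nodes and attributes such as href) is not covered here.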
On a high level, this parsing cascade is handled by the WP_Block_Markup_Url_Processor class:

Getting more into details, the WP_Block_Markup_Url_Processor extends the WP_HTML_Tag_Processor class and walks the block markup token by token. It then drills down into:
<a href="">, looking for ones that contain valid URLs

The next_url() method moves through the stream of tokens, looking for the next match in one of the above contexts, and set_raw_url() knows how to update each node type, e.g. block attribute updates are json_encode()-d.

Processing tricky inputs
When this code is fed into the migrator:
This actual output is produced:
Remaining work
composer install)

Follow-up work
WP_HTML_Tag_Processor in WordPress core, see HTML API: Add set_modifiable_text() for replacing text nodes. wordpress-develop#7007 (comment)
WP_HTML_Tag_Processor as a "WordPress polyfill" for standalone usage.

Testing Instructions (or ideally a Blueprint)
CI runs the PHP unit tests. To run this on your local machine, do this: