In 2024 I gave a talk about improving the performance of a website with 500,000 attachments at WordCamps in Karlsruhe (Video) and in the Netherlands (Video). For Vienna in 2025, I submitted this talk as a workshop. Since the workshop needed a similarly huge database, I had to come up with a way to provide one.
FakerPress, wp-cli-fixtures and other dead ends
My first idea was to use FakerPress to create a randomized, anonymous set of data. Knowing the sheer size I needed, I immediately checked for a WP-CLI command. FakerPress didn’t come with one, but while searching I found wp-cli-fixtures, which looked promising: it seemed I could provide a small .yaml file that would generate a dummy database for my participants. Well, it didn’t work.
- It didn’t work with PHP 8.4.
- After switching to PHP 7.4 and to a fork, it worked.
- When trying to generate a few dummy attachments, I saw that it actually tried to download them. With a goal of 500k attachments, it would have downloaded several TB during the workshop.
- Using an 8×8 px JPG stored locally would create roughly 150 MB of data for each user, which looked good.
- Creating 50k posts via fixtures took about 30 minutes.
After seeing this, I had to change my approach.
Using the project’s database
Using the real data was the next logical solution. To be able to share it, I needed to remove or replace any potentially personal or copyright-protected content.
Step 1: Deleting unnecessary tables
Since I want to show how to improve performance, mostly related to posts and attachments, we can remove custom tables created by plugins that will not be part of the workshop.
So I deleted every table belonging to SEO plugins, contact form plugins, cookie consent plugins, and custom plugins.
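To decide which tables are worth dropping, it helps to see which ones are the largest. The following is a minimal sketch, assuming MySQL/MariaDB and the default wp_ prefix; the two table names in the DROP statement are hypothetical placeholders, not tables from the project.

```sql
-- List all tables of the current database, largest first.
-- Note: table_rows is only an estimate for InnoDB tables.
SELECT table_name,
       table_rows,
       ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
FROM information_schema.tables
WHERE table_schema = DATABASE()
ORDER BY (data_length + index_length) DESC;

-- Drop plugin tables that are not needed for the workshop.
-- Both names are hypothetical examples; replace them with your own.
DROP TABLE IF EXISTS wp_example_seo_links, wp_example_form_entries;
```

Plugins that lose their tables should be deactivated as well, otherwise they may recreate them or throw errors.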
Step 2: Anonymizing users
With the help of AI and a generator for random names, I created a simple WP-CLI command to anonymize users, which you can find on my Gist.
It replaces all names with random combinations of first names and cat and dog names, creates matching email addresses, and clears all passwords.
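If you don’t need nice-looking names, a plain SQL variant can do a similar job. This is not the command from the Gist but a simplified alternative that uses numbered placeholders instead of random names; it assumes the default wp_ prefix and the standard columns of wp_users and wp_usermeta.

```sql
-- Replace all identifying columns with values derived from the user ID.
-- An empty user_pass can never match a password hash, so nobody can log in
-- until a new password is set.
UPDATE wp_users
SET user_login          = CONCAT('user_', ID),
    user_nicename       = CONCAT('user-', ID),
    display_name        = CONCAT('User ', ID),
    user_email          = CONCAT('user_', ID, '@example.com'),
    user_url            = '',
    user_pass           = '',
    user_activation_key = '';

-- Core also stores names in wp_usermeta.
UPDATE wp_usermeta
SET meta_value = CONCAT('User ', user_id)
WHERE meta_key = 'nickname';

UPDATE wp_usermeta
SET meta_value = ''
WHERE meta_key IN ('first_name', 'last_name', 'description');

-- Remove stored login sessions.
DELETE FROM wp_usermeta
WHERE meta_key = 'session_tokens';
```

Plugins may store further personal data under their own meta keys, so wp_usermeta is still worth a manual look afterwards.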
Step 3: Dropping the comments
The comments won’t be relevant to the workshop, so we’ll truncate the tables. Originally I intended to anonymize them with a CLI command similar to the one for users, which would have been a simple adjustment to the user command.
TRUNCATE wp_comments;
TRUNCATE wp_commentmeta;
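WordPress also keeps a cached number of comments per post in wp_posts.comment_count, which the TRUNCATE statements above do not touch. Whether this matters for the workshop is an assumption on my side, but resetting it keeps the data consistent:

```sql
-- Reset the cached comment counter so it matches the now empty comment tables.
UPDATE wp_posts
SET comment_count = 0
WHERE comment_count <> 0;
```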
Step 4: Cleaning wp_posts
We’ll drop all post types we don’t need, all revisions, and all orphaned post meta.
DELETE FROM wp_posts WHERE post_type IN ( 'acf-field', 'acf-field-group', 'audiopodcast', 'ep-pointer', 'ep-synonym', 'oembed_cache', 'revision', 'wp_block', 'wpcf7_contact_form');
DELETE pm
FROM wp_postmeta pm
LEFT JOIN wp_posts wp ON wp.ID = pm.post_id
WHERE wp.ID IS NULL;
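Before running the deletes above, it is worth checking which post types exist and how many rows they hold, since the list of post types differs for every project. A minimal sketch, assuming the default wp_ prefix:

```sql
-- Overview of all post types, largest first.
SELECT post_type, COUNT(*) AS posts
FROM wp_posts
GROUP BY post_type
ORDER BY posts DESC;

-- Number of post meta rows whose post no longer exists.
-- Should return 0 after the orphan cleanup.
SELECT COUNT(*) AS orphaned_meta
FROM wp_postmeta pm
LEFT JOIN wp_posts wp ON wp.ID = pm.post_id
WHERE wp.ID IS NULL;
```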
Step 5: Clearing taxonomies
Removing legacy taxonomies was part of the original talk, but replicating this for the workshop would come at the cost of a huge database. Due to time concerns, we will skip this and remove all taxonomies except category.
-- 1. Delete from wp_term_relationships
DELETE tr
FROM wp_term_relationships tr
INNER JOIN wp_term_taxonomy tt ON tr.term_taxonomy_id = tt.term_taxonomy_id
WHERE tt.taxonomy IN (
'wpmf-category', 'post_tag', 'brand', 'relevance', 'main_category'
)
OR tt.taxonomy LIKE 'kf\_%';
-- 2. Delete from wp_termmeta (if you want to remove associated meta).
--    This has to run before step 3, because the subquery reads wp_term_taxonomy.
DELETE tm
FROM wp_termmeta tm
INNER JOIN wp_terms t ON tm.term_id = t.term_id
WHERE t.term_id IN (
SELECT term_id FROM wp_term_taxonomy
WHERE taxonomy IN (
'wpmf-category', 'post_tag', 'brand', 'relevance', 'main_category'
)
OR taxonomy LIKE 'kf\_%'
);
-- 3. Delete from wp_term_taxonomy
DELETE FROM wp_term_taxonomy
WHERE taxonomy IN (
'wpmf-category', 'post_tag', 'brand', 'relevance', 'main_category'
)
OR taxonomy LIKE 'kf\_%';
-- 4. Delete from wp_terms
DELETE t
FROM wp_terms t
LEFT JOIN wp_term_taxonomy tt ON t.term_id = tt.term_id
WHERE tt.term_id IS NULL;
This cleared 1.8 million rows in total, significantly reducing the file size and, most importantly, the expected import duration.
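To confirm that only the wanted taxonomies are left, a few read-only checks can be run afterwards. This is a sketch assuming the default wp_ prefix; it only counts rows and deletes nothing.

```sql
-- Remaining taxonomies and their number of terms.
SELECT taxonomy, COUNT(*) AS terms
FROM wp_term_taxonomy
GROUP BY taxonomy;

-- Relationships pointing to a taxonomy row that no longer exists.
SELECT COUNT(*) AS orphaned_relationships
FROM wp_term_relationships tr
LEFT JOIN wp_term_taxonomy tt ON tt.term_taxonomy_id = tr.term_taxonomy_id
WHERE tt.term_taxonomy_id IS NULL;

-- Term meta pointing to a term that no longer exists.
SELECT COUNT(*) AS orphaned_termmeta
FROM wp_termmeta tm
LEFT JOIN wp_terms t ON t.term_id = tm.term_id
WHERE t.term_id IS NULL;
```

Both orphan counts should be 0 if the four deletes ran in the intended order.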
Step 6: Masking the content
I basically needed a single post content that would contain all the problems I wanted to show while still feeling reasonably natural. I decided to mask the titles by replacing every word with a random word from lorem ipsum, published that to my Gist, and cleaned up some further data. For the_content, I deleted every excerpt and every post content in order to reduce file size and import time. These were later replaced with a generic dummy post by a plugin inside the project repository.
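Clearing the content and excerpts can be done with a single statement. The following is a minimal sketch, assuming the default wp_ prefix; the title masking is not covered here, and the size query is only there to show how much data the content column holds beforehand.

```sql
-- How much data is stored in post_content and post_excerpt (in MB)?
SELECT ROUND(SUM(LENGTH(post_content)) / 1024 / 1024, 1) AS content_mb,
       ROUND(SUM(LENGTH(post_excerpt)) / 1024 / 1024, 1) AS excerpt_mb
FROM wp_posts;

-- Empty every post content and excerpt.
-- Both columns are NOT NULL in the default schema, so use '' instead of NULL.
UPDATE wp_posts
SET post_content = '',
    post_excerpt = '';
```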
Step 7: A lot of manual clean up
There were too many of these steps to list every single one, but among others:
- clear every email address found in the database
- remove licenses, API keys and similar from wp_options
- remove transients and other caches to reduce file size
- and much more
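Two of the points above can be sketched in SQL, again assuming the default wp_ prefix. The search for "@" is only a rough heuristic to find candidates for manual review; it will also match values that are not email addresses.

```sql
-- Remove all transients including their timeout entries.
-- The underscore is escaped because it is a single-character wildcard in LIKE.
DELETE FROM wp_options
WHERE option_name LIKE '\_transient\_%'
   OR option_name LIKE '\_site\_transient\_%';

-- Options that may contain an email address.
SELECT option_id, option_name
FROM wp_options
WHERE option_value LIKE '%@%';

-- Post meta keys that may contain email addresses, most frequent first.
SELECT meta_key, COUNT(*) AS hits
FROM wp_postmeta
WHERE meta_value LIKE '%@%'
GROUP BY meta_key
ORDER BY hits DESC;
```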
In a follow-up to this, I will talk about the preparations of the workshop itself.