Over the past several days, there has been an interesting discussion on the wp-testers mailing list (though, it really belonged on the wp-hackers list, but that’s beside the point) about permalink structures in WordPress. The original question came from matthijs and questioned why WordPress was storing rewrite rules for every page on his site in a database option. Further discussion revealed that this was a side-effect of his particular permalink structure, and some really good information about good and pad permalink patterns. This information could be important for sites that use non-standard URL structures, and I thought it deserved a summary.
First, let’s look at the original question and the situation that brought it about:
Recently I discovered that the current way wordpress handles permalinks is not scalable. All rewrite_rules are at the moment held in a single database field in the wp_options table. If you have a few dozens pages and posts, you have maybe a few hundred rewrite_rules in it and all is well. But as soon as you start to have a few hundred pages and attachments, the amount of rewrite_rules explodes as well as the field size. This also depends on the permalinks settings. On one of my sites I can’t even open the database field to take a look because my browser and text editor crash because of its size.
Before anyone starts to panic, let me that this is not a general problem in WordPress. This person had a particular permalink structure which forced WordPress to store extra rules for every page. This is a situation which can be avoided by choosing a permalink pattern which allows WordPress to find your posts in an efficient way.
WordPress gives site builders a lot of flexibility in how their post URLs are created. There are several attributes which can be used, and ordered how the person likes. The default “pretty permalink” structure looks like this:
Which results in perlink URLs that look like:
There are several structure tags which can be used to form permalinks: %year%, %monthnum%, %day%, %hour%, %minute%, %second%, %postname%, %post_id%, %category%, %tag%, and %author%. As mentioned earlier, this gives a lot of flexibility in how your URLs can appear. However, Ryan Boren pointed out:
Verbose rules are used for structures beginning with %category%, %tag%, %postname%, and %author%. Avoiding such structures is best.
This important note was subsequently added to the Codex page about Using Permalinks:
For performance reasons, it is not a good idea to start your permalink structure with the category, tag, author, or postname fields. The reason is that these are text fields, and using them at the beginning of your permalink structure it takes more time for WordPress to distinguish your Post URLs from Page URLs (which always use the text “page slug” as the URL), and to compensate, WordPress stores a lot of extra information in its database (so much that sites with lots of Pages have experienced difficulties). So, it is best to start your permalink structure with a numeric field, such as the year or post ID.
This would be a problem for any dynamic CMS, not just WordPress. If there isn’t some way to narrow down the information in the URL and map it to a specific page or post, the system must perform a lot of database searches to find the correct entry. Otto provides a really good hypothetical example:
Actually, I think this deserves a bit more discussion… Let’s consider a permalink like %category%/%postname%.
So you’re handed a URL like /mycat/mypost. You start by parsing it into mycat and mypost. You don’t know what these are. They’re just strings to you. So, first, you have to consider what “mycat” is.
First, you query to see if “mycat” is a pagename. This is a select from wp_posts where post_slug = mycat and post_type = page. No joy there.
Next, you query to see if “mycat” is a category. This is a select from wp_terms join wp_term_taxonomy on (term_id = term_id) where term = mycat and taxonomy = category. Hey, we found a mycat, so that’s good. Unfortunately, this just tells us that it’s a category, which is rather useless in retrieving the actual post we’re looking for. So we ignore the category.
Now, we move on to the “mypost”. Again, we start querying:
1. Is it a page? select from wp_posts where post_slug = mypost and post_type = page. Nope.
2. Is it a category? select from wp_terms join wp_term_taxonomy on (term_id = term_id) where term = mypost and taxonomy = category. Nope.
3. Is it a post? select from wp_posts where post_slug = mypost and post_type = post. Bingo.
The whole goal is to determine the specific post being asked for. The category is not helpful in this respect, and we have to do a couple queries just to figure out that we need to ignore it. Five queries to determine what the post is with this structure. Five queries, two of them expensive (joins ain’t cheap). And these have to happen on every load of a post on your site.
Otto then goes on to explain that this isn’t what WordPress actually does. Instead, when WordPress detects that you have an inefficient permalink structure, it stores extra rewrite rules in an option in the database, which it then refers to when presenting a page.
To finish up, let’s look at a couple of quick examples.
In conclusion, when building a site’s permalink structure, choosing carefully can help WordPress locate your articles in the most efficient way possible.