Nice URLs are good interface design, but it can be a struggle to get nice-looking URLs with Apache.
I just spent a frustrating hour fighting with Apache’s mod_rewrite. There doesn’t seem to be a way to add commentary to the documentation, so I thought I would document a few tips here.
Check that .htaccess is being read and mod_rewrite is available
Per-directory Apache configuration directives go in files named .htaccess. To check that Apache has been configured to read these files, add the following to the .htaccess in your webserver’s document root directory.
    RewriteEngine On
    RewriteRule ^foo$ rewriting-is-working
Go to the URL http://site.com/foo and you should get a 404 error message like, “The requested URL /rewriting-is-working was not found on this server.” That’s good: mod_rewrite is installed and activated. It just needs to be configured.
If you only see a “Not Found” error message with no mention of “rewriting-is-working”, then it might be that your
.htaccess file is not being read at all.
An easy way to check if the
.htaccess files are being parsed is to put an invalid entry in the file at the root level. If you get a server error, then we know the file is in use.
    # The next line is not valid. We should get a 500 Server Error.
    Foo
If you don’t get an error, then you need to allow the webserver to be configured with .htaccess files. This is done in the main server config (usually /etc/apache/httpd.conf, though the location varies by system). If you are on a shared webhost, you will have to ask the administrator to set this up. It will look something like…
    # file: /etc/httpd.conf
    #
    <Directory "/public_html">
        AllowOverride FileInfo
        # or if you want to do anything in .htaccess:
        # AllowOverride All
    </Directory>
(And of course, after any httpd.conf change, restart the webserver: sudo apachectl restart)
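If the test rule produces a 500 error rather than the 404, one possible cause is that mod_rewrite itself is not loaded, in which case Apache rejects RewriteEngine as an unknown directive. A minimal sketch of what to look for in the main server config, assuming a typical build where modules are shared objects under a modules/ directory (the exact path is an assumption and differs between systems):

```apache
# Main server config, not .htaccess.
# The path modules/mod_rewrite.so is an assumption; check where your
# installation keeps its module files.
LoadModule rewrite_module modules/mod_rewrite.so
```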
This was a huge stumbling block for me. The filesystem takes precedence over the URL rewriting. Consider this example:
    FILE:        public_html/about/us.html
    URL:         http://site.com/about/us.html
    DESIRED URL: http://site.com/about
    HTACCESS:    public_html/.htaccess
This did not work for me. Apache resolves the request to the existing “about” directory, and the rewrite rule at the root level never seems to take effect.
The best solution I found is to rename “about” to something like “_about” and then add the following rewrite rules:
    RewriteEngine On
    RewriteRule ^about$ /_about/us.html
    # - and if you need the old URL to work...
    # RewriteRule ^about/us\.html$ /_about/us.html
    # - or if there are other files under /about ...
    # RewriteRule ^about/(.+)$ /_about/$1
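A related idiom, not part of the setup above, is to make the interaction with the filesystem explicit: RewriteCond can test whether the request maps to a real file or directory, so that a rule only fires when nothing exists on disk. A sketch, assuming a hypothetical catch-all script named index.php:

```apache
RewriteEngine On
# Only rewrite when the request is not an existing file...
RewriteCond %{REQUEST_FILENAME} !-f
# ...and not an existing directory.
RewriteCond %{REQUEST_FILENAME} !-d
# index.php is a hypothetical handler used for illustration.
RewriteRule ^(.+)$ index.php?path=$1 [QSA,L]
```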
When .htaccess is in a subdirectory, the rewrite rule cannot see the entire URL. Consider this example:
DESIRED URL: http://site.com/content/latest
The rewrite rule in the .htaccess file inside the “content” directory doesn’t “see” the “content/” part of the URL. To get the desired URL, the pattern is
    RewriteEngine On
    RewriteRule ^latest$ news.php
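For comparison, here is a sketch of how the same rule might look if it lived in the document root’s .htaccess instead. It assumes news.php sits inside the content directory, which the example implies but does not state:

```apache
# In public_html/.htaccess (document root): the pattern is matched
# against the path relative to this directory, so "content/" is visible.
RewriteEngine On
RewriteRule ^content/latest$ content/news.php
```

Either way, the pattern never starts with a slash in .htaccess context.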
When nothing is working, it’s tempting to just start sticking various options onto RewriteRule to see what happens.
The options that you can pass to RewriteRule are a bit cryptic. They are:
[L] = Last rule. If there is a match then stop processing further Rewrite rules.
[QSA] = Query String Append. Useful for scripts that receive parameters via a GET (foo.php?id=123&name=joe) and you simply want to pass them on.
[PT] = Pass through. After a match, pass through to the other handlers.
[R] = Redirect. Send the rewritten URL back to the browser, which the browser automatically loads. The new URL is visible to the end user.
The [L]ast rule is a nice optimization if you have several rules, since the server won’t continue to match rules that you don’t need or want. Having rules that call other rules seems like a recipe for spaghetti code. For example, this is possible:
    RewriteEngine On
    RewriteRule ^xxx$ yyy
    RewriteRule ^yyy$ zzz
    RewriteRule ^zzz$ news.php
“xxx”, “yyy” or “zzz” would all take you to “news.php”. Useful? Maybe. It’s certainly convoluted.
Here’s how you add an option to your rule.
    RewriteEngine On
    RewriteRule ^latest$ news.php [L]
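[QSA] is only useful if you have a script that receives multiple input variables via a GET request. (By default Apache passes the original query string through unchanged; [QSA] matters when the rewritten target contains a query string of its own.) Assuming that news.php takes several parameters…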
    RewriteEngine On
    RewriteRule ^latest$ news.php [QSA,L]
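To see what [QSA] changes, consider a target that already carries a query string. In this sketch the section parameter is hypothetical, added only for illustration:

```apache
RewriteEngine On
# Request:     /latest?page=2
# With QSA:    news.php?section=latest&page=2
# Without QSA: news.php?section=latest   (the original query string is replaced)
RewriteRule ^latest$ news.php?section=latest [QSA,L]
```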
[PT] is only useful if you want mod_alias to process the URL afterwards. (But why not just write the URL correctly the first time?)
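A minimal sketch of the [R] flag, for contrast with the internal rewrites above. The paths old-page and new-page are hypothetical:

```apache
RewriteEngine On
# Send the browser to the new address with a permanent (301) redirect.
# Plain [R] defaults to a temporary 302 redirect.
RewriteRule ^old-page$ /new-page [R=301,L]
```

Unlike an internal rewrite, the visitor sees /new-page in the address bar.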
The regexp operator \d, which normally matches any digit, didn’t work for me! I figured this was fairly basic to any regular expression engine, but it must be a Perl extension. (Apache 2.x uses PCRE, which does support \d; older versions use POSIX regular expressions, which do not.) Anyway, keep it simple. If your regexp doesn’t work, simplify it to the basics.
    URL:         http://site.com/_item.php?id=123
    DESIRED URL: http://site.com/item/123
    # Bad: Does NOT work!
    RewriteRule ^item/(\d+) _item.php?id=$1
    # Good
    RewriteRule ^item/([0-9]+) _item.php?id=$1
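If you prefer a named class over [0-9], the POSIX bracket expression [[:digit:]] is another portable spelling. The sketch below also anchors the end of the pattern and adds [L]; both go beyond the rule shown above and are optional:

```apache
RewriteEngine On
# [[:digit:]] is the POSIX character class for 0-9.
# The trailing $ keeps "item/123abc" from matching.
RewriteRule ^item/([[:digit:]]+)$ _item.php?id=$1 [L]
```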