301 Redirects, Rails, Capistrano, and mongrel_cluster

In the long arc of web design, sites change. Over the years, they are redesigned, reimplemented, ripped apart, and reassembled. In the process, content can easily be lost to the ether. We all know that missing content is simply “not a good thing” and should be avoided.

Inbound links to a site generally come from two main places:

  1. Google/Yahoo/MSN/Search Engines
  2. Links on other pages (e.g. blogs, articles, directories).

The search engines crawl regularly and keep themselves up to date. If they can find your content, they will. If you tell them it’s moved permanently, they’ll update their records. Articles and other sites, however, usually won’t. I haven’t checked old outbound links on this site, and I doubt many other people do either. To keep the end user happy, whether they’re searching through Google or clicking a link from an external site, we need to employ some URL redirection so that the old links keep pointing to the content they’re associated with.

We want to use 301 redirects.

What is a 301 redirect?

A 301 redirect refers to the HTTP status code delivered by the web server. A 301 is similar to the most famous status code of all: 404. A 404 means “page not found”. A 301 says “this resource has been permanently moved, and here is the new address.” Simple.
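Concretely, when a browser or crawler requests the old URL, the server’s reply looks roughly like this (a sketch; the Location value is whatever new address you map the old one to):

<pre>
HTTP/1.1 301 Moved Permanently
Location: http://www.example.com/new-address
</pre>

The client then follows the Location header, and a search engine will update its records to point at the new URL.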

Redirects are achieved using the mod_rewrite Apache module. The semantics of writing rewrite directives can be quite mind-boggling to the novice user, as is evident from the documentation for mod_rewrite.

Why 301 redirects?

When you redeploy a project on a new framework, a lot of old links that have built up google-fu disappear. People clicking on said links will get a 404 error and end up confused, believing the content to have been destroyed. Sad web surfer. So, as a good web designer, you want to ensure that all of your old URLs point to their new counterparts.

“But what about meta header redirects?” you say.

Sure, that would work on a small scale (a meta refresh tag is shown below for contrast), but they have been so abused by spammers in recent years that they will decimate your search engine ranking. Also, when you have over 1900 redirects (as I did when porting this site to Mephisto), you want to make sure that you can set something up that is

  1. Easy to manage
  2. Centralized
  3. Seamless and immediate to the end user.

And that’s what we’re going to set up here.
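For contrast, the meta header redirect dismissed above is just an HTML tag dropped into the old page’s head, telling the browser to fetch a different URL after a delay (a sketch with a placeholder URL):

<pre>
<meta http-equiv="refresh" content="0; url=http://www.example.com/new-address" />
</pre>

It works, but it means keeping a stub page around for every old URL and it never sends a 301 status, which is exactly why it manages and ranks so poorly at scale.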

Why .htaccess doesn’t work properly

In setting up Capistrano based on Coda Hale’s instructions, you are actually telling Apache to send everything to the mongrel_cluster before requests ever reach .htaccess. For some reason, certain requests still got handled (e.g. ones related to /feed worked, but /feed has a route associated with it), while others were ignored.

<pre>
# Redirect all non-static requests to cluster
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
RewriteRule ^/(.*)$ balancer://mongrel_cluster%{REQUEST_URI} [P,QSA,L]
</pre>

That basically says “when the request doesn’t match an actual file on disk, send the string to the mongrel_cluster load balancer and let it deal with it.”

The Setup

First off, we need to turn on mod_rewrite for this site. It should be on in your .conf file since Rails uses rewrite to achieve basic routing, so look for the line that says

<pre> RewriteEngine On </pre>

If it is not present, your site probably doesn’t work at all and you have far larger problems than I can address here.
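One more sanity check: RewriteEngine On only does something if the module itself is loaded in httpd.conf. A minimal sketch; the module path here is an assumption and varies by platform and install.

<pre>
# Make sure mod_rewrite is compiled in or loaded as a DSO
# (path is a guess; adjust for your Apache install)
LoadModule rewrite_module libexec/apache22/mod_rewrite.so
</pre>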

Second, we know we have a lot of these redirects, so we don’t want to muddy up our very nice Apache config file by putting them inline. So let’s use the Include directive to tell Apache to pull the file in wholesale and parse it at that point. Place this line immediately after the RewriteEngine On directive.

<pre> Include etc/apache22/Includes/boboroshi_rewrite.conf </pre>

I placed mine in the same directory as my main domain conf file. On Apache versions before 2, you will not see an Includes directory in /usr/local/etc/apache*, so you would need to do this differently.
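To give a rough picture of how the pieces hang together, here is a sketch of the relevant chunk of the virtual host file. The server name and paths are placeholders, not my actual config, and the balancer member definitions from Coda Hale’s setup are omitted:

<pre>
<VirtualHost *:80>
  ServerName www.example.com                    # placeholder
  DocumentRoot /var/www/app/current/public      # placeholder

  RewriteEngine On

  # The big list of 301s gets pulled in first...
  Include etc/apache22/Includes/boboroshi_rewrite.conf

  # ...so old URLs are redirected before the catch-all proxy rule
  # hands everything else to the mongrel cluster.
  RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
  RewriteRule ^/(.*)$ balancer://mongrel_cluster%{REQUEST_URI} [P,QSA,L]
</VirtualHost>
</pre>

The included boboroshi_rewrite.conf is where all the actual 301 rules go.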

In this file, I placed a series of redirects that I typed up in an external text editor and uploaded to the server. Did you think I would type 1900 lines in nano or vi? Crazy! I digress…

The file looks something like this:

<pre>
RewriteRule ^/log/index\.xml$ http://feeds.feedburner.com/boboroshi [R=301,L]
RewriteRule ^/index\.xml$ http://feeds.feedburner.com/boboroshi [R=301,L]
RewriteRule ^/index\.rdf$ http://feeds.feedburner.com/boboroshi [R=301,L]
RewriteRule ^/atom\.xml$ http://feeds.feedburner.com/boboroshi [R=301,L]
RewriteRule ^/rsd\.xml$ http://feeds.feedburner.com/boboroshi [R=301,L]

RewriteRule ^/bloggertrue\.mov$ http://versionsix.boboroshi.com/media/video/bloggertrue.mov [R=301,L]

RewriteRule ^/travel/vh1awards/$ http://versionsix.boboroshi.com/viewpoint/travelogue/vh1awards/ [R=301,L]
RewriteRule ^/travel/calinevada62001/$ http://versionsix.boboroshi.com/viewpoint/travelogue/calinevada62001/ [R=301,L]

RewriteRule ^/backlog/2006/11/just_like_starting_over\.php$ http://www.boboroshi.com/2006/11/28/just-like-starting-over [R=301,L]
RewriteRule ^/backlog/2006/10/myspace_data_modeling\.php$ http://www.boboroshi.com/2006/10/31/myspace-data-modeling [R=301,L]
RewriteRule ^/backlog/2006/10/the_killers_sams_town_springsteen_queen_and_deadwood\.php$ http://www.boboroshi.com/2006/10/8/the-killers-sams-town-springsteen-queen-and-deadwood [R=301,L]
RewriteRule ^/backlog/2006/10/photos_soft_complex_monopoli_cedars_at_the_black_cat\.php$ http://www.boboroshi.com/2006/10/7/photos-soft-complex-monopoli-cedars-at-the-black-cat [R=301,L]
RewriteRule ^/backlog/2006/10/the_periodic_spiral\.php$ http://www.boboroshi.com/2006/10/4/the-periodic-spiral [R=301,L]

[……]
</pre>

What does that mean? Well, looking at the mod_rewrite documentation, we’re dealing with some basic regular expressions. The ^ anchors the start of the string and the $ anchors the end. Since a bare period matches any character, you want to escape it by placing a backslash character (\) before any period that should appear literally in the URL.

In [R=301,L], the R says “force a redirect” and apply a 301 status code. The L says “stop running through the rules now and serve the page.” This is good when we have 1900 entries: once Apache finds a matching entry, it stops and gets the content to the client.
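Putting the pieces together, here is one of the rules from above, annotated (the annotations are mine, not part of the file):

<pre>
#           pattern (old path)                             target (new absolute URL)                                flags
RewriteRule ^/backlog/2006/10/the_periodic_spiral\.php$   http://www.boboroshi.com/2006/10/4/the-periodic-spiral   [R=301,L]

# ^ and $   anchor the match to the whole request path
# \.        matches a literal period instead of "any character"
# R=301     forces an external redirect with a 301 status
# L         stops processing the remaining rules
</pre>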

So, once you’ve got it written up, toss that file in the directory, restart Apache, run cap restart from your deployment workstation, and you should be in business. And your old users will be very, very happy that they don’t have to look at a cached page in Google.
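If you want to confirm a rule is really answering with a 301, curl can show you just the response headers (output trimmed to the interesting lines):

<pre>
$ curl -I http://www.boboroshi.com/backlog/2006/10/the_periodic_spiral.php
HTTP/1.1 301 Moved Permanently
Location: http://www.boboroshi.com/2006/10/4/the-periodic-spiral
</pre>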

  1. josh says:

    If you're using Apache already, mod_alias is probably the way to go for redirects.

    Redirect (path) (url)
    Redirect /backlog/2006/11/just_like_starting_over.php http://www.boboroshi.com/2006/11/28/just-like-starting-over

  2. John Athayde says:

    Josh -

    I initially had those in my .htaccess but they were also choking. I'll give it a whirl updating the main file. Thanks for the tip!

  3. Ben says:

    And you'll probably want to use the 'RedirectPermanent' alias instead of just 'Redirect' (which does a 302 by default).
