Saturday, April 19, 2008 5:58 PM bart

Blog Traffic Load Balancing

I love my blog readers. Of course, purely in a professional sense, but still. Before I continue, let me point out to my much beloved audience on the other side of the RSS channel that this post isn't particularly interesting. So, if you're looking for hardcore technical stuff, I'd strongly recommend skipping this post for once.

So what brings me to write down this declaration of love all of a sudden? A flashback: in late 2003 I started to blog on blogs.bartdesmet.net. I already had my domain name but no particularly interesting homepage (and the color scheme was a highly controversial topic of discussion amongst some of my friends). In fact, I've never been a big fan of homepages with photos of summer barbecues, tales about pets and other personal trivia. So I wanted to take a different approach and started to maintain a technical blog. I had been involved in some local projects that required a web server, which I was kind enough to maintain (first Windows 2000 Server, later Windows Server 2003), so I had the luxury of taking some of the machine's web space and pointing my domain to it. At that time, we weren't really bandwidth constrained, since the server sat on a fiber network in a semi-professional housing environment.

But luxury is a volatile phenomenon. Time passed, the project moved to other housing environments and I was no longer involved in the web server's maintenance (which was honestly a bit of a relief too). However, my blog was about to become homeless, which wasn't a pretty thought given the number of links to it and, of course, all the time I had invested in writing my sometimes interesting posts. So I went looking for an affordable hosting company - which will remain unnamed - providing the latest and greatest platform (Windows Server 2003 and SQL Server 2005 at the time) with enough space to hold my content. At the time I had about 250 MB of content plus a 100 MB SQL Server database, so 1 GB of web space and 250 MB of SQL Server space seemed more than enough. Today the occupied size has almost doubled, as illustrated below:

[Screenshots: web space usage | SQL Server space usage]

The thing I paid the least attention to, though, was the bandwidth provided by the hosting company: at that time, I believe, 60 GB per month. Who needs 60 GB, right? I was a little surprised initially to find out that - if I remember correctly - about 30 GB was transferred in the first month at my new hoster. The history only goes back to November 2007, by which time it had climbed to approximately 40 GB a month:

[Chart: monthly bandwidth usage since November 2007]

I was still fine, and the bandwidth limit was even increased to a stunning 80 GB. However, ever since I started using Windows Live Writer I've been adding more and more screenshots to my posts (it's so easy to upload, dude), most of them at a large size to avoid the annoying "click to enlarge" catch-phrase every two lines. (My policy is to fall back to thumbnails only when an image isn't part of the regular flow of the post but acts purely as an illustration that is "nice to see" but not strictly needed.) So, after a few popular blogging series lately (the C# 3.0, VB 9.0 and PowerShell 2.0 feature focuses, and my recent ramblings on functional pattern matching in C# - to be continued), I was still a bit surprised to see my stats for this month so far:

[Chart: bandwidth usage for the current month so far]

This is why I love my readers: they make electrons travel the globe to deliver my writings to their brains. I never thought I'd become bandwidth constrained, but 63 GB in a little over half of the month is a bit too much given the limit of 80 GB a month. This called for immediate action, and luckily I have some additional space at bartdesmet.info, which I bought recently to play around with IIS 7.0 (as I said, my hoster is on the bleeding edge of technology!) and to prepare for eventually moving my blog over to IIS 7.0.
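A rough back-of-the-envelope projection makes the problem concrete. Assuming the 63 GB covers roughly the first 19 days of April (the post is dated April 19) and that traffic keeps the same daily rate:

```latex
\begin{aligned}
\text{daily rate} &\approx \frac{63\ \text{GB}}{19\ \text{days}} \approx 3.3\ \text{GB/day} \\
\text{projected month} &\approx \frac{63}{19} \times 30 \approx 99.5\ \text{GB} > 80\ \text{GB} \\
\text{days until cap} &\approx \frac{80 - 63}{63/19} \approx 5.1\ \text{days}
\end{aligned}
```

In other words, at that pace the remaining 17 GB would be gone in about five days, well before the end of the month.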

The plan is simple: move the images over to the second host and leave the application stuff on the first host. One of the first "nice" experiences was the reunion with the amazingly fast (not!) FTP protocol when dealing with lots of small files: copying about 100 MB of images from one place to the other took several hours. What remained was to update all pointers to the images, for which I considered different approaches:

  • Handle all .jpg and .png files using an HTTP handler and redirect to the new location. This would work in IIS 7.0, but my old host is still on IIS 6.0, which serves these static files itself unless they are mapped to ASP.NET in the metabase; that change requires the ISP's goodwill.
  • Tweak the SQL Server database to "replace" old links with new ones (although the T-SQL REPLACE function wouldn't work here, since Community Server seems to store post bodies in ntext columns).
  • Write a Community Server module.

I went for the last approach, at least temporarily, since it doesn't involve touching the database and should take effect immediately. It goes roughly like this:

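using System;
using System.Text;
using System.Xml;
using CommunityServer.Blogs.Components;
using CommunityServer.Components;

namespace BartDeSmet.Net
{
    // Community Server module that rewrites image links in blog posts just
    // before they are rendered, so they point to the second host.
    public class CSRewriteImageLinks : ICSModule
    {
        // Old image roots on the original host (assumed paths, for illustration).
        private const string OldImagesWlw = "http://blogs.bartdesmet.net/images_wlw";
        private const string OldImages = "http://blogs.bartdesmet.net/images";

        // New image roots on the second host.
        private const string NewImagesWlw = "http://bartdesmet.info/images_wlw";
        private const string NewImages = "http://bartdesmet.info/images";

        public void Init(CSApplication csa, XmlNode node)
        {
            // Hook into post rendering; no settings are read from the config node.
            csa.PreRenderPost += new CSPostEventHandler(csa_PreRenderPost);
        }

        void csa_PreRenderPost(IContent content, CSPostEventArgs e)
        {
            // Only rewrite weblog posts rendered for the web target.
            if (e.ApplicationType == ApplicationType.Weblog && e.Target == PostTarget.Web)
            {
                WeblogPost post = content as WeblogPost;

                // Only top-level posts (PostLevel 1) are rewritten.
                if (post != null && post.PostLevel == 1)
                {
                    StringBuilder sb = new StringBuilder(post.FormattedBody);

                    // Replace the longer prefix first so it is matched as a whole.
                    sb.Replace(OldImagesWlw, NewImagesWlw);
                    sb.Replace(OldImages, NewImages);

                    post.FormattedBody = sb.ToString();
                }
            }
        }
    }
}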
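To try the rewriting logic outside Community Server, the string replacement at the heart of the module can be isolated into a small console program. This is a hypothetical sketch: the class name ImageLinkRewriter and the old root http://blogs.bartdesmet.net/images are assumptions for illustration, not the actual paths used on the blog.

```csharp
using System;
using System.Text;

// Hypothetical standalone version of the rewriting step, with no Community
// Server dependencies, so it can be tried from a console app.
public static class ImageLinkRewriter
{
    // Assumed old image root on the original host (illustrative only).
    public const string OldRoot = "http://blogs.bartdesmet.net/images";

    // New image root on the second host.
    public const string NewRoot = "http://bartdesmet.info/images";

    // Replaces every occurrence of OldRoot with NewRoot. StringBuilder.Replace
    // uses an ordinal, case-sensitive comparison. Because "images_wlw" starts
    // with "images", links under .../images_wlw are moved as well.
    public static string Rewrite(string html)
    {
        if (html == null)
        {
            return null;
        }

        StringBuilder sb = new StringBuilder(html);
        sb.Replace(OldRoot, NewRoot);
        return sb.ToString();
    }

    public static void Main()
    {
        string body = "<img src=\"http://blogs.bartdesmet.net/images_wlw/shot.png\" />";
        // Prints: <img src="http://bartdesmet.info/images_wlw/shot.png" />
        Console.WriteLine(Rewrite(body));
    }
}
```

A single prefix replacement is enough here only because, under the assumed layout, the images_wlw folder shares the images prefix on both hosts; with differently named folders you would need one Replace call per mapping, as in the module.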

Quite brute force - though still somewhat gentle thanks to the StringBuilder (and, I'd assume, the web infrastructure's caching) - but functional. Fingers crossed... Oh, and obviously I've changed my Live Writer settings to upload files to my second host (click to enlarge <g>):

[Screenshots: Windows Live Writer settings for uploading images to the second host]

Alright, with this post I violated my own policy of not nagging about personal trivia, but one sin should be acceptable, no? At the very least I did publish some lines of C# code...



Comments

# Web Hosting » Blog Traffic Load Balancing

Sunday, April 20, 2008 1:13 AM by Web Hosting » Blog Traffic Load Balancing

Pingback from Web Hosting » Blog Traffic Load Balancing

# re: Blog Traffic Load Balancing

Sunday, April 20, 2008 5:47 AM by DM

Hi, The images for your statistics are not showing up... they seem to point to your "hoster's" https site (oops!). Interesting "non-technical" stuff :-)

# re: Blog Traffic Load Balancing

Sunday, April 20, 2008 7:29 AM by Eamon Nerbonne

Have you taken a peek at the web server logs, assuming they're available? I found (admittedly on a photo site) that a surprising amount of traffic was bots (mostly Google) indexing hi-res images. I'm sure your more content-heavy site doesn't quite follow that pattern, but if bot image downloads do account for a significant portion, a simple robots.txt update will reduce your traffic at the mere cost of reduced findability via image search sites - which probably isn't exactly crucial to you. And in any case, looking through logs is enlightening, since it can tell you quite a bit about how your site is viewed and a little bit about by whom.

# re: Blog Traffic Load Balancing

Sunday, April 20, 2008 4:36 PM by bart

Thanks for the feedback on missing pictures folks - it's solved now.

# re: Blog Traffic Load Balancing

Sunday, April 20, 2008 4:38 PM by bart

Hi Eamon,

Thanks for the feedback - I recall having had a robots.txt file in the past, but apparently it was dropped in some Community Server upgrade when I wiped the web folder; I've put it back to exclude the images folder. I follow the logs quite regularly: about 30% of the total volume comes from images and 25% from RSS feeds (which overlap, of course), so it remains to be seen what the best approach will be to balance traffic without breaking existing pointers to the site.

Thanks,

-Bart

# sql server 2005 load balancing

Friday, June 20, 2008 5:10 PM by sql server 2005 load balancing

Pingback from sql server 2005 load balancing