Tag Archives: tech

Host with an expert

Whenever I talk with somebody at a company that has a need for dedicated servers, I jump on the opportunity to sell them on Rackspace.  No, I don’t get any commission or anything from them.  When it is clear to me that Rackspace is exactly what a company needs, I feel compelled to share so that they don’t go needlessly down the wrong path.

On Friday, I was having lunch with two guys from one such company in Blacksburg, and they asked me “What is the biggest thing you’ve learned about hosting a system as large as Webmail.us’ at Rackspace?”  Man, where do I begin?  The biggest thing.  Hmmm…

I told them the story of how, when we first moved our email hosting system to Rackspace, we were running it on just 5 servers.  These were powerful dual-Xeon boxes: lots of RAM, fast expensive SCSI drives, the works.  Not cheap boxes.  This was 2003, and our business was starting to boom.  Soon 5 servers turned into 7 servers.  Then 9.  Our application was becoming more complex too… adding DNS caching, multiple replicated databases, load-balanced spam filtering servers, etc.  We had each of our servers running several of these applications so that we could get the most bang for the buck out of the machines.  This got complex fast, and was about to become a nightmare to manage.

With multiple applications per server, it became increasingly difficult to troubleshoot problems.  For example, when a disk starts running slow or a server starts going wacky (technical jargon), how do you determine which of the 4 applications running on that server is the culprit?  Lots of stopping and starting services, and watching /proc/* values.  But with just 9 servers, you don’t have an excessive amount of redundancy, so you don’t want to have to do this all that often.  Or worse, when an application crashes a box, it takes down all of the apps that were running on that box.  If there was a better way to scale, we needed to find it.
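To give a flavor of that /proc spelunking, here is a minimal sketch of the kind of thing we did by hand: sample each process’s CPU time twice and see which app is chewing up the box.  This is illustrative only (Linux-specific, and not a tool we actually shipped):

```python
#!/usr/bin/env python
# Sketch: sample per-process CPU jiffies (utime + stime) from /proc,
# wait a few seconds, sample again, and print the biggest consumers.
# Linux-only; assumes process names contain no spaces. Illustrative,
# not an actual Webmail.us tool.
import os
import time

def cpu_jiffies():
    usage = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open("/proc/%s/stat" % pid) as f:
                fields = f.read().split()
            comm = fields[1].strip("()")  # process name
            usage[pid] = (comm, int(fields[13]) + int(fields[14]))  # utime + stime
        except IOError:
            pass  # process exited between listdir() and open()
    return usage

before = cpu_jiffies()
time.sleep(5)
after = cpu_jiffies()

deltas = [(after[pid][1] - jiffies, comm)
          for pid, (comm, jiffies) in before.items() if pid in after]
for delta, comm in sorted(deltas, reverse=True)[:5]:
    print("%6d jiffies  %s" % (delta, comm))
```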

We started Webmail.us while still in college, and while we had interned at some pretty neat companies, we didn’t have a whole lot of experience to lean on in figuring things like this out.  In computer engineering / computer science they teach you how to code, but they don’t teach you how to manage clusters of servers.  We were learning how to run a technology company by making decisions through gut instinct and trial-and-error – not by doing what had been done in the past at other companies.  And even after we had hired a decent number of very smart employees, some with lots of experience, there were still many areas in which our team lacked expertise.  So what did our gut tell us to do in order to learn how to scale things the right way?…  Get help from an expert.

Having a company like Rackspace on our side has been a huge asset.  With a collection of talented engineers the size of theirs, they seem to always have at least one person who is an expert on just about anything that we have needed help with.

In 2005, by working with people at Rackspace like Paul, Eric, Alex, Antony and others, we decided to re-architect our system to give each of our internal applications and databases its own independent server cluster.  The idea was to use smaller servers, and more of them, with smart software to manage it all so that hardware failures could be tolerated (hmm… have you ever heard of a system like this before?).  With this approach, each application is completely isolated from the next.  When a server starts acting wacky, we can just take it down to replace hardware, re-install the system image, or whatever… and the load balancers and data replication software know how to deal with that.
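The “smart software” part boils down to a simple loop: health-check every box in a cluster and only route traffic to the ones that respond.  Here is a toy sketch of that idea (the hostnames and port are hypothetical; our real load balancers were purpose-built Linux boxes, not a script like this):

```python
# Toy sketch of cluster health checking: a box is "up" if it accepts a
# TCP connection within the timeout, and only up boxes stay in the
# active pool. Hostnames and port are hypothetical examples.
import socket

POOL = ["smtp1.example.net", "smtp2.example.net", "smtp3.example.net"]
PORT = 25

def healthy(host, port, timeout=2.0):
    try:
        sock = socket.create_connection((host, port), timeout)
        sock.close()
        return True
    except socket.error:
        return False

active = [h for h in POOL if healthy(h, PORT)]
print("routing traffic to: %s" % ", ".join(active))
```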

We ended up completely ditching the beastly dual-Xeon servers in favor of 54 shiny new single-CPU AMD Athlon boxes, each with 1 GB of RAM and SATA hard drives.  Basically equivalent to the hardware you could get at the time in a $1000 Dell desktop.  We’ve grown this system to over 3x its original size since we first launched it with 54 servers.  We still mostly use Athlon CPUs, but now have some Opteron and even some dual-Opteron boxes in clusters that need a lot of CPU, such as spam filtering.

Today it is just as easy to manage 180 servers as it was with 54 servers, because we’ve built things the right way.

Rackspace’s expertise was invaluable in creating this new system.  However, we are not the type of company that likes to be completely dependent on another company, even if that other company is Rackspace.  So, we didn’t just let them build this new system for us.  We had them show us how to build it.  They may have built the first pair of load balancer servers out of basic Linux boxes; but then we ripped them up, and built them again from scratch.  Then we did it again.  We did this until we understood how each component worked and we didn’t need Eric or Alex’s help anymore.  We did this with everything that we built in 2005, and we continue to do this whenever we lean on Rackspace for help.

So my advice for these two guys, who had been selling their software for almost 10 years and were about to move it to a hosted web service model, was this…  As much as you think you know about hosting your software, you are going to run into things that nobody at your company has done before.  Things that you guys are not experts at.  If you stick your servers in a colo cabinet somewhere, you are going to have to figure those things out on your own.  That will be slow and will probably not produce the best solution every time.  I highly recommend that you consider hosting your app at a company like Rackspace who can help you when you need it.  You are going to pay more per server going this route if you simply look at the raw hosting cost.  However, you will be able to get things online faster, work through problems more effectively, and learn how to host your system from the best.

My other posts about Rackspace:
Outsource your data center
Amazon vs Rackspace

The Evolution of an Inbox

1994: No steady email account. Burned through at least eight AOL screen names with those "First 10 Hours Free" trial accounts before they finally told me to stop. Also tried CompuServe and Prodigy. At this point just trying to figure out what this Internet thing is.

1995: First real email account – @vt.edu. Eudora 1.5.2 (I think). POP3, no SSL. Everything goes to my inbox. I delete mail after I’ve read it. Wanted to run OS/2 Warp, but the VT department of Engineering required Windows 3.1.

1996: Switched to Netscape Mail. Still POP3 and no SSL. Upgraded to Windows 95.

1997: MS Exchange account at Lockheed Martin – @lmco.com.  Used company directory and shared calendar. Also had an AIX Unix mailbox at Lockheed, but didn’t know how to use it.

1997: Signed up for first free webmail account so that I could talk to friends while at work – @hotmail.com. 2MB mailbox, so I must delete mail after I read it.  Microsoft would buy Hotmail within a year.

1998: Assigned to the Defense Message System (DMS) at Lockheed, where I got to play with the email encryption and archiving technologies used by the US Department of Defense. My project was to integrate an Oracle + EMC archiving solution with MS Outlook clients. The archiving burden was placed on the email user via an "Archive This" button, which was pretty stupid.

1998: At some point back at VT I upgraded to Windows 98 and Outlook 98. Still POP3, no SSL for my @vt.edu account. Started using mail filtering rules to organize my inbox. Stopped deleting mail.

2000: Laptop crashes. A year of Outlook data lost. Reinstalled with Windows 2000 and Outlook 2000. Much more stable.

2000: Hosting my first mail server. Learning a lot. Still using POP3 without SSL. Religiously using mail filtering rules so that nothing goes to my inbox. Everything is organized.

2001: Finally switched to SSL. Had to run stunnel on the server to wrap the plain-text ports.
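For the curious, the setup looks roughly like this in stunnel’s config-file format (newer stunnel versions; early ones took the same settings as command-line flags, and the cert path and service names here are illustrative):

```
; Illustrative stunnel config: accept SSL on the secure ports and hand
; the decrypted connection to the local plain-text POP3/IMAP daemons.
; Cert path and service names are examples, not my actual config.
cert = /etc/stunnel/stunnel.pem

[pop3s]
accept = 995
connect = 110

[imaps]
accept = 993
connect = 143
```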

2002 – 2004: Still a big fan of POP3 and Outlook 2000. 100+ folders. 100+ filtering rules. Archiving anything older than 90 days to a separate PST file.


2004: Turned off Outlook’s auto-check feature.  I hate getting interrupted by new email all day long.  Now I must press Send/Receive to get new mail.  Now I control my email rather than my email controlling me.

May 2005: Reached Outlook’s 2GB storage limit in my primary PST file, and Outlook is now dead and my data is corrupt. What a stupid MS bug. Tried at least 5 recovery programs before finding one that retrieved about 75% of my data. That’s it, I hate you Microsoft… switching to Thunderbird and IMAP.

June 2005: Loving IMAP! Loving Thunderbird! Set up all of my filtering rules via webmail so that it gets filtered during delivery. My mail folders look the same when I check webmail while traveling, which is incredibly convenient.

July 2005: Shit… Just realized that there is no decent IMAP client for the Treo that can check 100+ IMAP folders efficiently. Many don’t even support nested folders, and the ones that do are slow. For now, when I travel I send a copy of all of my mail to a POP3 account and use VersaMail. Really need a 2-inch-wide, minimal HTML version of webmail.

Later 2005: Want to switch to webmail rather than Thunderbird, but webmail’s folder handling is still too clunky, and folders with thousands of emails are too slow for me to use. Sticking with Thunderbird IMAP for now.

June 2006: A ton of webmail performance improvements have been released this year. Large folders are now fast; switching between folders is now fast. Most everything is AJAXed. Here it goes… switching to webmail.

July 2006: Loving webmail, especially search! I miss multi-colored flags though (Thunderbird keywords). Let’s see if I can push this feature with the webmail team.

Nov 2006: Sweet… multi-colored flags added to webmail beta during Hackathon 3. Thanks Steve!

Nov 2006: Finally… I can access all of my IMAP folders from my Treo, via Webmail Mobile.

Dec 2006: Started using webmail’s calendar now that beta has shared calendaring. Everybody at Webmail can see where I am every second of the day, and schedule meetings with me. Also switched all of my task lists to webmail.

Dec 2006: Oops. Used $75 in data transfer on my cell last month checking Webmail Mobile. Switching to Verizon’s unlimited data plan. Checking webmail from my Treo is going to be awesome over the holidays.

Dec 2006: 618 MB out of 10 GB used. I still have a habit of deleting emails that have large attachments. I should stop that.

Still need…

– notes inside of webmail (I use my Drafts folder for notes right now)
– file storage in webmail (again, I use my Drafts folder to store files right now)
– better contacts management in webmail
– calendar/contacts/tasks syncing between webmail and Treo

…Dell is human

I received a random phone call today from a nice woman named Katherine, from Dell’s Executive Support department.  Katherine said she came across my recent blog post discussing my problematic Dell order, and wanted to see if there was anything she could do to help.  I told her that I had in fact received Beth’s new computer yesterday, and that we had set it up last night.  We are very pleased with it.

I did mention, though, that I never received my shipping notice or delivery notice emails.  She’s looking into that.

I never expected to hear from anybody at Dell.  Being the size of company that they are, I figured that even if a Dell employee did happen to find my post, it would never get to the right person and nothing of significance would come of it.  Apparently I was wrong.

My conversation with Katherine quickly turned into me asking questions about how she found my blog post, and what Dell’s strategy is in contacting people who write negatively about them.  She said Dell has launched a huge company-wide customer loyalty initiative, and they are taking it very seriously.  She said that they have 150 people whose job it is to scour blogs, message boards and websites to find posts like mine.  And then they do something about every one of them.

This is awesome, and I told her that.  Props to Dell.  That is how you run a business – even when you have 50,000 employees.

Just got off the phone with Dell…

I placed an order for Beth’s new computer last Tuesday (Dec 5), and I had my first bad experience with Dell.  Beth wrote about this last night too.

If you’ve ever ordered from Dell, you know that they are great about keeping you informed about your order.  They email you an order acknowledgment, then a confirmation that the order has been placed, then another email when it ships, and finally another when it is successfully delivered.

Well, I received my order acknowledgment via email right away, but never received anything else from them.  I wasn’t concerned at first, since Dell always seems to have their act together, so I let a few days go by.  By Sunday I still had not received any other emails from them, so I logged into my Dell account to check on my order, and the order was not listed anywhere.  Now concerned, I opened a ticket with Dell and proceeded to wait…

On Tuesday I received my first response.  A very generic email stating that they are unable to locate my order, and that I need to call their sales department.  Great.

A few hours later, I received another email from Dell.  This time from a real person, stating that they had indeed received my order but it is pending and not yet processed.  They need to talk to me prior to releasing the order into production.  So I call.

It turns out that the speakers I ordered must be ordered with one of their flat-panel monitors.  Beth already has one of the required Dell flat-panel monitors, so I didn’t order one.  I just wanted the speakers so that I could attach them myself.  They said I’d have to remove the speakers from this order and place a separate order for them.  Weird, but whatever.  I removed the speakers from the order.

A 10 minute call to Dell fixed my problem, but…

Dell,

Why does something so simple confuse your processes?

Why did I have to contact you to fix the problem, instead of you contacting me?

Why couldn’t I see my acknowledged/unconfirmed order in my Dell account?

Why did you let me place this invalid order in the first place?

Amazon vs Rackspace

I’ve been asked several times recently what it means for Rackspace now that Webmail.us is using Amazon S3 (and EC2 & SQS) for data backups.  In case you missed it, last month we replaced our tape backup system managed by Rackspace with a homegrown backup system built on top of Amazon’s web services.

Just yesterday in fact, in a great post on grid computing and Amazon, Joyent asked “So is Webmail.us’s use of Amazon’s web services a success for Amazon or a failure of Rackspace? Or both?”

Well let me answer publicly with what I have been telling everyone who has personally asked me this question…

Yes, our use of Amazon S3 displaced our use of Rackspace’s managed backups.  However, we desperately needed to replace that system anyway.  Traditional backup systems do a horrible job of backing up maildir-formatted email data.  This is because with maildir, file names change frequently in order to track metadata such as Flagged, Read and Replied.  Each time the file name changes, the backup system sees a new file and backs the email up again.  This results in several copies of the same email being backed up and wastes backup resources – directly wasting our money.  This would be the case with any general-purpose backup system, regardless of whether it was a Rackspace-hosted solution or not.
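The fix, in essence, is to key backups on the part of the maildir filename that never changes.  Maildir appends the flags after a ":2," suffix, so stripping that suffix identifies the same message across renames.  A minimal sketch of the idea (the function and message names are illustrative, not our production code):

```python
# Sketch: maildir stores flags (Seen, Replied, Flagged, ...) in a
# ":2,<flags>" suffix on the filename, so a generic backup tool sees
# every flag change as a brand-new file. Keying on the part before the
# colon recognizes the same message across renames. Names are
# illustrative, not our production code.
def backup_key(maildir_filename):
    return maildir_filename.split(":", 1)[0]

already_backed_up = {backup_key("1165443107.V805I42M3.mail1,S=4218")}

for name in ["1165443107.V805I42M3.mail1,S=4218:2,S",    # later marked Seen
             "1165443107.V805I42M3.mail1,S=4218:2,RS"]:  # then Replied too
    if backup_key(name) in already_backed_up:
        print("skip (already stored): %s" % name)
    else:
        print("upload: %s" % name)
```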

What we needed was a smarter backup system.  We needed to build something new; something custom; something designed specifically for the type of data we store.

We are a software and services company, not a hardware company, which is why we outsource our data center to Rackspace.  Rackspace owns the hardware, keeps it running, and replaces hardware components that break.  They do a great job at this.  We write and manage the software that runs on that hardware, and we do a great job at that.  A core software development philosophy at Webmail is to maintain a short development cycle, i.e. to release new features early and often.  One of the many ways we accomplish this is by building on top of re-usable components, whether that’s software that we write, open source software, or services hosted by other companies.  In this case we built on top of services hosted by another company.

Amazon’s web services allowed us to build something new.  By building on top of their S3 “storage cloud”, we were able to just develop the maildir backup logic and some data cleanup logic.  We were able to skip developing the backup storage system altogether.  We coded the storage client, not the storage server.
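To give a feel for how little “just the client” can be, here is a sketch of the upload half.  It uses the modern boto3 library purely for illustration (our code predates it, and the bucket name and key layout are made up):

```python
# Sketch of the client-only division of labor: Amazon runs the storage
# server; the backup client just PUTs each message blob under a stable
# key. boto3 is used for illustration only (our 2006 code predates it);
# bucket name and key layout are made up.
import boto3

s3 = boto3.client("s3")

def store_message(mailbox, key, raw_message_bytes):
    s3.put_object(
        Bucket="example-mail-backups",   # hypothetical bucket
        Key="%s/%s" % (mailbox, key),    # e.g. user@domain/<maildir key>
        Body=raw_message_bytes,
    )

store_message("user@example.com",
              "1165443107.V805I42M3.mail1,S=4218",
              b"From: ...\r\nSubject: hello\r\n\r\nmessage body")
```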

Initially we had planned on building both.  But when S3 came out, our thoughts quickly shifted to “Screw that, let’s just build the client and get this thing released”.

I strongly feel that moving our backups to S3 is a success for Amazon, and not at all a failure of Rackspace.

We’ve announced that this new backup system is saving us 75% monthly.  In the end, our backup data hosting costs would have been about equal had we built the backup storage system ourselves and hosted it on servers at Rackspace instead of using S3.  Our 75% cost savings came from building the logic that eliminated backing up the same email multiple times, which we were going to do in either case.

S3 allowed us to build this faster and start saving money earlier.

Will we host other applications on Amazon’s web services in the future?

Yes, if it makes sense to do so.

We have a limited number of programmers.  And as I have said before, when making “build vs buy” decisions it almost always comes down to two things: (1) Where do we feel our internal resources can be best spent?  and (2) Can we find a partner that we can trust with the rest of the stuff at an affordable cost?

Will our use of Amazon web services now or in the future replace our server growth at Rackspace?

Probably not.  We are growing fast and will always need a lot of servers to support our business.  I see these web services as a way to get more done, not as a replacement for stuff that we are already doing.

We are always looking for ways to build new stuff faster.  In some cases this will mean building on top of services hosted by other companies, such as Amazon.  In other cases it will mean building on top of open source software and hosting it on servers at Rackspace.  And still, in other cases, it will mean hiring more smart people to build it from scratch and host it on servers at Rackspace.  (Speaking of which, if you’re a smart programmer, shoot me an email.)

Webmail is using Amazon Web Services

In my previous post regarding Amazon Web Services, I asked “where shall we start?”  That is, where would it make sense for Webmail.us to start using this sort of new infrastructure service?

That was a rhetorical question.  We have been building a replacement for our tape backup system using Amazon S3, SQS and EC2 for several months now, and today we announced its release.  It is fully deployed, and working wonderfully.

Below are some links to the details, but hit me up with questions if you want more:

Technical Details from our website

Amazon S3 Success Story

Amazon Web Services Blog Post

Webmail.us Blog Post