
My server’s hard drive crashed I have a new server

This weekend I spent quite a bit of time (that should have been dedicated to doing something fun, or at least to productive work) dealing with a significant hardware scare. The OS on my web server started freezing up randomly, not allowing me to make any changes to the system, or even to shut it down cleanly. A little bit of investigation showed that the root filesystem was setting itself to read-only, which in turn led me to “unspecified errors” in the SMART diagnostics. My excellent tech support contacts at Core Networks were quickly able to determine that the drive was indeed failing, and after a bunch of prep work and backups, we got the data moved to a new drive. Since mid-2009 I’ve had a colocated server with Core, and I honestly cannot say enough good things about them; they run a very cost-effective colo service, and their tech support is absolutely top-notch. If you need a physical colo server for any reason, I highly recommend them. However, physical servers do have one flaw: they’re physical, and they run on real hardware. Of course, virtual machines also run on real hardware, but the abstraction between the two is extensive enough that failing hardware can be easily migrated away from without the virtual system being aware of the change.

Wasting a day on diagnostics, tech support conversations, backups, and restorations made me question whether a physical server was really what I needed. The conclusion I came to was that I did not, and that a virtual server was the best choice. Although Core recently began offering virtual servers, and I was reluctant to take my business elsewhere, the fact is that while they’ve been doing colo for years, VMs are a very new market for them, and I’ve always had incredible success with Slicehost’s virtual servers. So, I signed up for a “512 slice” at Slicehost (which is a little cramped; I may upgrade in the near future) and have migrated all of my sites and data off of the physical server. While I’ll be sorry to say goodbye to Core, the fact is that I didn’t need the extras that having a colo server provided (hard drive space, in particular, tends to be much cheaper in physical servers) and the extra cost in terms of management burden simply wasn’t worthwhile.

Categories: Random.


A Simple Django WebAuth Decorator

Like many universities, my employer uses Stanford’s WebAuth Single Sign-On package as one major piece in its computing account system. WebAuth is an MIT-licensed infrastructure that allows decentralized web applications to securely authenticate users without themselves ever handling user credentials. Websites begin by sending unauthenticated users to a trusted WebAuth server which validates the user and provides them with a ticket which is passed back to the application. The application web server then communicates directly with the WebAuth server to validate the ticket it was given. If the ticket is valid, the application is provided with information on the user, and it can proceed without further interaction with the server. As long as the user’s WebAuth session remains active, any other application the user visits can authenticate the user behind the scenes.

The benefits to users of a consistent security interface are significant, so I’ve recently been pushing myself to make use of WebAuth where possible. Unfortunately, there’s not a lot of preexisting code for integrating WebAuth into mainstream web frameworks, so I’ve had to write my own, which — fortunately for me — hasn’t proven to be all that difficult. Today I’m going to share a snippet of code that I wrote to integrate WebAuth into a Django app:

webauth.py — Pretty Print HTML

webauth.py — Raw Code

@webauth_required is a Python decorator that functions similarly to the @login_required decorator that is provided with Django. Importing this decorator and prepending a view function with @webauth_required is all that’s needed to force that view to authenticate the user. Once they’re authenticated, their username is stored in a Django session for authorization purposes (as ‘netid’ in this code, since that’s the parlance familiar to users and developers on my campus). Obviously the WebAuth endpoint (AUTH_URL) is also specific to my situation, and most WebAuth providers will require registration of client applications to enhance security.
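To give a feel for the shape of such a decorator, here is a minimal sketch. It is not the contents of webauth.py. It is written for Python 3 and deliberately avoids importing Django so that it runs on its own: Redirect stands in for Django's HttpResponseRedirect, validate_ticket is a hypothetical placeholder for the server-to-server ticket check, and the AUTH_URL value plus the "service" and "ticket" parameter names are assumptions, not something WebAuth mandates.

```python
import functools
from urllib.parse import urlencode

AUTH_URL = "https://webauth.example.edu/login"  # hypothetical endpoint


class Redirect:
    """Stand-in for a framework redirect (e.g. Django's HttpResponseRedirect)."""

    def __init__(self, url):
        self.url = url


def validate_ticket(ticket):
    """Hypothetical check: return the username for a valid ticket, else None.

    A real version would make an HTTPS request to the WebAuth server here.
    """
    return {"good-ticket": "jdoe"}.get(ticket)


def webauth_required(view):
    """Only call `view` once a username is stored in the session as 'netid'."""

    @functools.wraps(view)
    def wrapper(request, *args, **kwargs):
        # Already authenticated in this session: go straight to the view.
        if "netid" in request.session:
            return view(request, *args, **kwargs)
        # Returning from the auth server: validate the ticket it handed back.
        ticket = request.GET.get("ticket")
        netid = validate_ticket(ticket) if ticket else None
        if netid is None:
            # No ticket, or a bad one: send the user off to authenticate.
            query = urlencode({"service": request.build_absolute_uri()})
            return Redirect(AUTH_URL + "?" + query)
        request.session["netid"] = netid
        return view(request, *args, **kwargs)

    return wrapper
```

The decorator relies only on request.session, request.GET, and request.build_absolute_uri(), all of which exist on Django's HttpRequest once the session middleware is enabled.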

The only thing that’s left is to provide a logout mechanism. My logout view (one of the few views that doesn’t need the @webauth_required decorator in my app!) simply destroys the session data and provides the user with a link to log out of WebAuth entirely; it’s possible for the user to log out of the app but remain logged in to WebAuth, which effectively leaves them logged into the app (since they’ll be re-authenticated behind the scenes if they return), but that’s how the powers that be have asked client apps to behave, so that’s how it is.
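A logout view along these lines could be sketched as follows. This is an illustration, not my production view: WEBAUTH_LOGOUT_URL is a hypothetical address, and the function returns a plain HTML string where a real Django view would wrap it in an HttpResponse.

```python
WEBAUTH_LOGOUT_URL = "https://webauth.example.edu/logout"  # hypothetical address


def logout(request):
    """Drop the app's own session, then offer a link to WebAuth's logout page."""
    # flush() is Django's session method that deletes the session data
    # and regenerates the session key.
    request.session.flush()
    # A real Django view would return HttpResponse(body) instead of the string.
    body = (
        "<p>You are logged out of this application.</p>"
        '<p><a href="%s">Log out of WebAuth as well</a></p>' % WEBAUTH_LOGOUT_URL
    )
    return body
```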

Categories: Random.


Contemplating cost-efficient, highly reliable storage

I have long been a fan of NetApp’s storage products; my office purchased our first NetApp filer shelf back in 2005, and over the years we have added five more shelves as our capacity needs have grown. Management of these devices has almost always been painless, and NetApp’s support has generally been top notch. Having a new drive show up in my mailroom along with a note describing which drive slot it should go in (because of a predicted potential failure) without any involvement on my part? As someone who’s charged with protecting valuable research data, this level of monitoring helps me sleep at night. And automated snapshots give my users the ability to restore most accidental deletions without ever having to admit to me or our support staff that they’ve done something careless, which is a win for everyone involved.

In fact, there’s really only one downside to NetApp’s product line: price. The most recent quote for a full NetApp system I’ve seen put the total first year cost for storage (with SATA drives) somewhere close to $10/GB, with significantly reduced annual maintenance fees thereafter. Lately, as NetApp’s dispute with Sun over WAFL and ZFS continues to crumble, I’ve been looking into ZFS and its feasibility for our premium storage. Sun’s lowest end storage servers (with high performance, low capacity SAS drives) have a list price point of about $4/GB, while commodity hardware prices for SATA drive based systems are $0.30/GB; as our total storage needs continue to skyrocket, it makes sense to see if I can replicate most of the critical abilities of NetApp storage at a more reasonable cost.

We will soon be deploying a storage server with somewhere between 32 and 48 TB of raw disk space, shared via NFS and CIFS. My plan is to install OpenSolaris on this hardware and set it up with ZFS RAIDZ2, ZFS’s equivalent to the double-parity RAID-6. In preparation for this, I’ve been playing with OpenSolaris VMs, trying to get a feel for how management of ZFS is handled, and so far, I’ve liked what I’ve seen.
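To put the per-GB prices quoted above next to the planned capacity, here is a back-of-the-envelope sketch. It assumes decimal terabytes (1 TB = 1000 GB) and applies each rate to raw capacity, which is a simplification: the three figures are not like-for-like (a first-year system quote, a list price, and bare commodity hardware), so treat the results as orders of magnitude only.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes, as drive vendors count them

# Per-GB figures quoted in the text above.
PRICE_PER_GB = {
    "NetApp (first year, SATA)": 10.00,
    "Sun low-end (SAS, list)": 4.00,
    "Commodity SATA": 0.30,
}


def cost(raw_tb, price_per_gb):
    """Dollar cost of `raw_tb` terabytes at a flat per-GB price."""
    return raw_tb * GB_PER_TB * price_per_gb


for name, price in PRICE_PER_GB.items():
    low = format(round(cost(32, price)), ",")
    high = format(round(cost(48, price)), ",")
    print("%-26s $%s to $%s" % (name, low, high))
```

At 32 TB this works out to roughly $320,000, $128,000, and $9,600 respectively, which is the gap that makes the experiment worth my time.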

Over time, I’d like to document the critical steps I’ve had to go through to get a working production system, along with my conclusions along the way about the viability of this platform as a significant piece of our overall storage platform. For starters, here are a few of the things I’ve come across so far on my test system, which has two 20GB system disks and five 8GB data disks:

At least when using the OpenSolaris LiveCD installer, root mirrors have to be set up after the fact, and mirrored drives must be set up with Solaris slices

The OpenSolaris installer was a pleasure to use, but its simplicity makes for a lot of additional tuning after the fact to get a usable system, and while ZFS is the default filesystem, you can only install to a single drive. My root partition was c3t0d0s0, so after booting into the system, I set about adding a mirror. The only hitch I ran into was an error about rpool (the Solaris root pool) needing to live only on Solaris slices: I’d tried to attach an unlabeled c3t1d0 drive to the pool. After labeling the disk and adding a slice 0, I was able to create a mirror with a single command:

pfexec zpool attach -f rpool c3t0d0s0 c3t1d0s0

To watch the progress of data “resilvering” over to the new drive, I used:

zpool status rpool

After a few minutes, this process was done and I moved on with adding a data pool.

RAIDZ pools can’t be internally adjusted

Initially I created a RAIDZ2 pool with three disks, the minimum number of devices possible, planning to add the other two later:

pfexec zpool create data raidz2 c3t2d0 c3t3d0 c3t4d0

While I had no problems with this command (creating an 8GB volume from three 8GB disks), adding drives later did not work as I’d hoped:

pfexec zpool add data c3t5d0 c3t6d0
invalid vdev specification
use '-f' to override the following errors:
mismatched replication level: pool uses raidz and new vdev is disk

Sadly, increasing the “width” of a RAIDZ pool is not currently a feature of ZFS, although it would be an important one to have, and is supposedly in the works. This doesn’t mean that a RAIDZ pool is forever locked to its original size; rather, it means that to increase the pool’s size, you have to add additional RAIDZ “subpools” (this is almost certainly not the correct terminology) which then operate side-by-side with the original RAIDZ logical device, instead of integrating all of the disks into a single logical device. This means that if you plan to build an expandable storage server, you’ll need to think carefully about how you want to allocate parity disks, hot spares, and filesystems.

Anyhow, once I discovered this I destroyed the pool and recreated it with all five disks:

zpool create data raidz2 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0

This gave me a 24GB volume made up of five 8GB drives, which is what I’d intended for this test system. Next I set the mountpoint (which defaults to /<pool name>):

zfs set mountpoint=/export/data data
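As a quick check on the capacities quoted above (8GB from three disks, 24GB from five), the arithmetic can be sketched in a few lines of Python. This is an approximation that ignores metadata and padding overhead, and the function names are made up for illustration.

```python
def raidz_usable_gb(disks, disk_gb, parity=2):
    """Approximate usable space of one RAIDZ group: data disks times disk size."""
    if disks <= parity:
        raise ValueError("a RAIDZ group needs more disks than parity devices")
    return (disks - parity) * disk_gb


def pool_usable_gb(groups):
    """Groups added to a pool sit side by side, so their capacities add up."""
    return sum(raidz_usable_gb(*group) for group in groups)


print(raidz_usable_gb(3, 8))             # three-disk RAIDZ2 -> 8
print(raidz_usable_gb(5, 8))             # five-disk RAIDZ2 -> 24
print(pool_usable_gb([(3, 8), (3, 8)]))  # two three-disk RAIDZ2 groups -> 16
```

The last line shows why growing a pool by adding a second group costs more parity: six 8GB disks in two RAIDZ2 groups yield about 16GB, where a single six-disk group would yield about 32GB.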

ZFS Deduplication may not be appropriate for many data sets

Next I added a couple of additional configuration options to enable compression and deduplication:

zfs set dedup=on data
zfs set compression=gzip data

Deduplication is very much a new feature in ZFS, and is currently only available in the very latest Release Candidate builds of OpenSolaris. It’s not a feature that I would enable in production without careful testing, not only for this reason, but also because I’m unsure of how it scales to very large filesystems (unlike NetApp, which has a 1-TB volume size limit on their A-SIS dedup technology, ZFS has no such hard limits). Deduplication requires that data block hashes be searchable (and potentially searched often), and at some point a large enough amount of data will require that index to spill out of RAM and onto disk, which could potentially have a very negative effect. Jeff Bonwick of Sun has written an excellent blog post that goes into more detail about dedup in general and ZFS’s implementation.
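To get a rough sense of when that index might outgrow RAM, here is a small estimate. Both inputs are assumptions and not measurements from my system: 128 KiB is the default ZFS record size (real data sets with many small files will have far more blocks), and 320 bytes per table entry is a commonly quoted rule of thumb that I have not verified.

```python
TIB = 2 ** 40
GIB = 2 ** 30


def dedup_table_bytes(data_bytes, block_bytes=128 * 1024, entry_bytes=320):
    """Rough in-memory size of a dedup table with one entry per unique block.

    block_bytes and entry_bytes are assumptions; see the caveats above.
    """
    unique_blocks = data_bytes // block_bytes
    return unique_blocks * entry_bytes


for tib in (1, 8, 32):
    size_gib = dedup_table_bytes(tib * TIB) / GIB
    print("%2d TiB of unique data -> about %.1f GiB of table" % (tib, size_gib))
```

Under those assumptions the table grows by about 2.5 GiB per TiB of unique data, so a pool in the tens of terabytes would need a table far larger than the RAM in a typical server.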

While dedup is yet another feature of NetApp’s storage that I find very valuable, ZFS dedup is new enough that there aren’t enough field reports on it to have a good idea of how well it performs in extreme cases. Once I have real hardware to test this on (my test system lives on deduped NetApp volumes, so performance testing would be all but meaningless, not to mention cost-prohibitive given the space I would need), I hope to be able to shed some light on the matter of scalability; only time is likely to make me comfortable with the maturity factor.

Categories: Random.


Quick-n-Dirty Nagios Plugins

In the spirit of the holidays (and of spending more time at home, away from the server room), I’d like to present a sample script from the many that I use to monitor (from home, the beach, or my mythical cabin in the woods) my computers’ collective health. Obviously, Nagios itself comes with a rich collection of plugins, and plenty of people have written additional check scripts for most common services, but at some point, a system administrator is likely to be responsible for services which for whatever reason aren’t covered by existing scripts, so having a template to whip up your own script can be invaluable. Before I go any further, a couple of disclaimers: nothing here should be taken as a style guide to writing shippable, enterprise-class Nagios plugins; note the title: “Quick-n-Dirty”. The example script I’ve chosen, which checks the status of a Windows DHCP Server address pool, is production-worthy, but fairly case-specific. There is no --usage flag, the threshold values are hard coded into the script, and the arguments must be given positionally. Each of these could easily be fixed, but even the small cost of doing so isn’t justifiable in my environment.

check_dhcp.txt

Basic requirements:

  • Python
  • SSH keyed access to the Windows server — I use Bitvise WinSSHD, which costs money in a commercial environment, but there are other options, such as FreeSSHd.
  • Nagios, obviously, is needed to get much value out of this script

Usage:

./check_dhcp <hostname> <subnet>

First, a little bit of Python boilerplate code:

#!/usr/bin/env python
import os, sys, re

Next, a few variables to keep track of what we report to Nagios:

STATUS = ["OK", "WARNING", "CRITICAL", "UNKNOWN"]
status = "UNKNOWN"
service = "DHCP"

The STATUS array maps the allowable status values to the exit codes that they correspond to, and the initial status is set to UNKNOWN, which is what the script should report if anything goes wrong with the script itself during execution. Finally, I set the service name here, though it’s not strictly needed.

warn_level = .8
crit_level = .9

used = free = total = 0
used_percent = 1

The first two variables above are the hard coded threshold values I mentioned earlier, and the rest are the variables I’ll use to keep track of the status of the DHCP pool. Note below, where we set the command that we’ll actually run to get the status data, that the location of ssh is hard coded, and might need to be adjusted for your environment:

check_cmd = r'/usr/bin/ssh %(server)s "netsh dhcp server show mibinfo"' \
  % {"server": sys.argv[1]}
subnet = sys.argv[2]

The next section parses the output of the check command line by line, looking first for the specified subnet, then for the usage data for that subnet:

ss = None
for ll in os.popen3(check_cmd)[1].readlines():
  # A "Subnet = ..." line starts a new section; remember it only if it's ours
  rr = re.match(r"\s*Subnet = (?P<ss>[\d\.]+)\.", ll)
  if (rr):
    ss = rr.group("ss")
    if (ss != subnet):
      ss = None
  # While inside our subnet's section, pick out the two counters we need
  rr = re.match(r"\s*(?P<kk>[^=]+) = (?P<vv>[\d]+)", ll)
  if (ss and rr):
    if (rr.group("kk") == "No. of Addresses in use"):
      used = int(rr.group("vv"))
    if (rr.group("kk") == "No. of free Addresses"):
      free = int(rr.group("vv"))
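The parsing step is easier to test when it is pulled out into a function that takes lines of text in place of a live ssh pipe. The sketch below does that in Python 3 (the script itself is Python 2). The sample text is made up to fit the two regular expressions and may not match real netsh output exactly; parse_mibinfo is a name I've invented for illustration.

```python
import re

SUBNET_RE = re.compile(r"\s*Subnet = (?P<ss>[\d\.]+)\.")
VALUE_RE = re.compile(r"\s*(?P<kk>[^=]+) = (?P<vv>\d+)")


def parse_mibinfo(lines, subnet):
    """Return (used, free) address counts for `subnet`, or (0, 0) if absent."""
    used = free = 0
    in_subnet = False
    for line in lines:
        m = SUBNET_RE.match(line)
        if m:
            # Every "Subnet = ..." line starts a new section.
            in_subnet = (m.group("ss") == subnet)
            continue
        m = VALUE_RE.match(line)
        if in_subnet and m:
            key = m.group("kk")
            if key == "No. of Addresses in use":
                used = int(m.group("vv"))
            elif key == "No. of free Addresses":
                free = int(m.group("vv"))
    return used, free


# Made-up sample shaped to fit the regular expressions above.
SAMPLE = """\
Subnet = 10.0.1.0.
    No. of Addresses in use = 12.
    No. of free Addresses = 88.
Subnet = 10.0.2.0.
    No. of Addresses in use = 229.
    No. of free Addresses = 276.
"""

print(parse_mibinfo(SAMPLE.splitlines(), "10.0.2.0"))  # (229, 276)
```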

Now we calculate the value we’re looking for — the percentage of address space that’s currently being leased:

total = used + free
if (total > 0):
  used_percent = float(used) / total
detail = \
  "%d of %d addresses in use; %d%% utilization" \
  % (used, total, used_percent * 100)

Finally, we compare the current status data with the thresholds we’ve set, and set the output status variable accordingly, build a status message, and look up the return code in the STATUS array:

if (used_percent >= crit_level):
  status = "CRITICAL"
elif (used_percent >= warn_level):
  status = "WARNING"
elif (used_percent == 0):
  status = "UNKNOWN"
else:
  status = "OK"

print "%s %s: (%s) %s" % (service, status, subnet, detail)
sys.exit(STATUS.index(status))

And that’s all there is to it. This script returns status messages like the following:

DHCP OK: (150.135.220.0) 229 of 505 addresses in use; 45% utilization
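The threshold comparison and exit-code lookup can also be written as one function, which makes the edge cases easy to check. This is a Python 3 sketch that mirrors the logic of the script above; classify is a name made up for illustration.

```python
STATUS = ["OK", "WARNING", "CRITICAL", "UNKNOWN"]  # list index == Nagios exit code


def classify(used, free, warn_level=0.8, crit_level=0.9):
    """Return (status, exit_code, detail) for the given address counts."""
    total = used + free
    # As in the script above, an empty result is treated as 100% used,
    # so a broken check shows up as CRITICAL instead of passing silently.
    used_percent = used / total if total > 0 else 1.0
    detail = "%d of %d addresses in use; %d%% utilization" % (
        used, total, used_percent * 100)
    if used_percent >= crit_level:
        status = "CRITICAL"
    elif used_percent >= warn_level:
        status = "WARNING"
    elif used_percent == 0:
        status = "UNKNOWN"
    else:
        status = "OK"
    return status, STATUS.index(status), detail


print(classify(229, 276))
# ('OK', 0, '229 of 505 addresses in use; 45% utilization')
```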

As you’ve probably noticed, about half of this script is fairly standard structural stuff which translates pretty well to most other Nagios check scripts; the DHCP-specific code all happens in the middle of the script. Below is a script for checking that a Solaris LDAP server is working that uses pretty much the same template, but is stripped down even further:

#!/usr/bin/env python
import os, sys, re

STATUS = ["OK", "WARNING", "CRITICAL", "UNKNOWN"]
status = "OK"
service = "LDAP"

check_cmd = r'/usr/bin/ldaplist passwd nagios'
detail = os.popen3(check_cmd)[1].readline().rstrip()

if (detail.find("uid=nagios") >= 0):
  status = "OK"
else:
  status = "CRITICAL"

print "%s %s: %s" % (service, status, detail)
sys.exit(STATUS.index(status))

Lastly, here’s a script (which I’m pretty sure must have been written by one of my coworkers) that uses Python’s subprocess module to pipe commands together, which allows for some more complex processing of status information:

#!/usr/bin/env python
import subprocess, sys

STATUS = ["OK", "WARNING", "CRITICAL", "UNKNOWN"]
status = "OK"
service = "Mailman"

p1 = subprocess.Popen(["/usr/bin/ps", "-eaf"], stdout=subprocess.PIPE)
p2 = subprocess.Popen(["grep", "[m]ailmanctl"], \
stdin=p1.stdout, stdout=subprocess.PIPE)
return_code = p2.wait()

if (return_code == 0):
  status = "OK"
  detail = "Master daemon found."
else:
  status = "CRITICAL"
  detail = "Master daemon not found."

print "%s %s: %s" % (service, status, detail)
sys.exit(STATUS.index(status))
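One caution about the piping pattern: calling wait() on a process whose stdout is a pipe can hang if that process writes more than the pipe buffer holds. That's unlikely with a grep for a single daemon, but a slightly more defensive variant might look like the following Python 3 sketch, where pipeline_returncode is a name made up for illustration.

```python
import subprocess


def pipeline_returncode(producer, consumer):
    """Run the equivalent of `producer | consumer`; return consumer's exit status.

    Both arguments are argv lists, e.g. ["/usr/bin/ps", "-eaf"]
    and ["grep", "[m]ailmanctl"].
    """
    p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(consumer, stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()  # lets p1 receive SIGPIPE if p2 exits early
    p2.communicate()   # drains p2's output so it cannot block on a full pipe
    p1.wait()          # reap the first process as well
    return p2.returncode
```

Dropping this in would change only the middle of the script: return_code = pipeline_returncode(["/usr/bin/ps", "-eaf"], ["grep", "[m]ailmanctl"]).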

Hopefully, this sampling of scripts gives you a good starting point for whipping up some more scripts that are helpful in your own environment; if you come up with anything particularly innovative or interesting, I’d love to hear about it!

Categories: Random.
