Musings of a Data professional

Stuart Moore

Month: May 2013

What should a SQL Server DBA be doing in a disaster?

You’ve had a disaster at work: the data centre is offline, management are panicking, and money is being lost. What should you be doing?

Short answer:
Nothing that isn’t in the Disaster Recovery plan (you do have one of those, don’t you?).

Longer answer:
Some time ago I got into work at my usual hour to find a distinct lack of server admins at their desks, and other early starters unable to log in to Active Directory. Neither of these filled me with confidence for a quiet day writing some training material.

A quick walk down to the data centre found some worried-looking electricians, some annoyed-looking server admins and a distinct lack of server noise. Obviously something major had failed and taken everything out, so it was time to dig out the disaster recovery plans.

In previous roles I’ve had responsibility for hardware and infrastructure, so the urge to jump in and help out was strong. But I could see that the right people were on the case, and that with the influx of panicking management another body was only going to slow things down and get in the way.

So I went back to my desk, got out a copy of the SQL Server Disaster Recovery plan and started refreshing my memory (you do have an offline copy you can read without power, network or server infrastructure?). This was the best place for me to be, and the best use of my time while things were being fixed. It meant that:

  • I could be found easily by anyone needing me
  • I would have information to hand as soon as it was needed (which order to bring the hardware back online, SAN dependencies, etc.)
  • I could make a condensed, time-based checklist of what to do when things came back up (e.g. killing off an import that would fail due to a missing dependency)
  • I kept out of the way of the people who were working hard to enact their part of the disaster plan; they really don’t need another person who doesn’t know what’s going on bugging them
  • I could work with management to agree the order for bringing services back
  • I had all the contact details I needed for the service owners I’d have to liaise with for final release (you just know that a disaster will happen while X is on leave, and you won’t have the number for Y, who’s covering their role)

All of this meant that when the call came that the physical layer was back up and running, I could start bringing up the SQL Server instances in an orderly fashion and quickly run through my checks before handing them over to the business users for clearance.

Disaster recovery plans take a long time to write and to get right, but if you’ve got it right then on the day things go much more smoothly if you stick to it. When the business is suffering, it’s not the time to be improvising, working on something you’re not completely comfortable with, or trying out some new technique you’ve read on a random DBA’s blog.

Studying the plan is even more important if it’s one you didn’t write. If you’re ever brought in to enact a disaster recovery plan you’ve never seen before and don’t know the provenance of, read it thoroughly before you start. You don’t want to get halfway through and find out that you’re missing a vital piece of infrastructure or software.

(Burnt server image courtesy of John/Roadhunter – http://www.flickr.com/photos/roadhunter/68017745/)

Juggling Study and real life

cat interrupts studying

I’m in the middle of recertifying as a SQL Server MCSE, studying for a Mathematics degree with the Open University and trying to train for various long distance cycling events. It can feel like an uphill struggle to fit it all in and still have time for friends and family (and the cats). And that’s without the usual 9-5 we all have.

So how to juggle it all?

I try to think first in long term chunks, usually 6 months and 12 months. For an OU module I have to think across a whole 12 month period as that’s the teaching period. For my MCSE I can split it into 6 month chunks (doing MCSA in the first 6 months, and then the remaining exams in another 6 month period).

Once I’ve made sure I’m not overloading myself in any one period, I make sure I have all the dates on the calendar.

First off I block out any immovable dates, for example our wedding anniversary, OU exam periods, important birthdays, etc. These are sacrosanct and I either can’t or won’t move them. Everything else has to fit in around these, unless it’s a once-in-a-lifetime opportunity.

Next to go in are the dates by which I want to achieve something. This could be an MCSE examination or a particular Audax event; these could be fixed or movable. This year, for example:

  • I have a fixed goal date of the end of July for being fit enough to ride the London Edinburgh London audax. This can’t be moved.
  • I have an exam date set in August for completing the MCSA phase of my MCSE qualification. This could be moved if needed.
  • OU assignment hand-ins. These have fixed deadlines, but I can submit earlier if I need to free up time. (Next week I’ll be doing just that so I’m not working through a holiday.)

Now I’ve got the goals, I can start working backwards to work out what needs to happen when. How I do this differs for each goal, depending on how I need to study or train for it. For my cycling goals I know I’ll need more time and effort as the goal nears, as I’ll be taking longer training rides. Studying for my courses and exams should remain fairly constant, with just a slight increase in time as I revise before exams.

Now I’ve a good idea of what I need to do and when, I can start to plan my weeks to make sure I’ve got time set aside to get everything done. So I’ll block out 3 hours on a Wednesday straight after work for a long training ride on the bike, with shorter 1 hour rides on a Monday and Friday. Then I might schedule 2 hours on a Thursday after dinner to work on my OU course, or even 2 evenings for some of the tougher modules.

By having a set routine it gets easier to remember what you’re meant to be doing when, and all the other bits and pieces fall into place. And by sharing the routine with my wife we both know when I’m going to be busy, or days when I’d rather not have something else happen.

But having done all that planning it’s important to stay flexible, stuff happens.

  • A tier 1 database goes down at work 10 minutes before clocking-off time on a Wednesday? You can be sure my bike ride’s gone out the window.
  • A friend is visiting town for the first time in years on a Thursday evening? Not a problem: I know when my free slots are, so I can move my study time around to suit.

It’s all easily coped with if you’ve left yourself some slack. And remember, it’s always good to have some downtime. A night in front of the TV isn’t wasted if you’re relaxing after hard work on the other nights, especially if you’re relaxing with loved ones.

Using R to average perfmon statistics and plot them

In the last post (Simple plot of perfmon data in R) I covered how to do a simple plot of perfmon counters against time. This post will cover a couple of slightly more advanced ways of plotting the data.

First up: averaging your data to smooth out some of the high points. This can be useful if you’re sampling at 15 second intervals with perfmon but don’t need that level of detail.

The initial setup and load of data is the same as before (if you need the demo csv, you can download it here):

# Install (first time only) and load the plotting and reshaping libraries
install.packages(c("ggplot2","reshape2"))
library("ggplot2")
library("reshape2")

# Load the perfmon CSV and give the columns friendlier names
data<-read.table("C:\\R-perfmon\\R-perfmon.csv",sep=",",header=TRUE)
cname<-c("Time","Avg Disk Queue Length","Avg Disk Read Queue Length","Avg Disk Write Queue Length","Total Processor Time%","System Processes","System Process Queue Length")
colnames(data)<-cname
data$Time<-as.POSIXct(data$Time, format='%m/%d/%Y %H:%M:%S')

# Aggregate into 15 minute segments, taking the mean of each counter
avgdata<-aggregate(data,list(segment=cut(data$Time,"15 min")),mean)

# Drop the averaged Time column, then convert the segment labels to datetimes
avgdata$Time<-NULL
avgdata$segment<-as.POSIXct(avgdata$segment, format='%Y-%m-%d %H:%M:%S')

# Melt to long format so ggplot can colour each counter separately
mavgdata<-melt(avgdata,id.vars="segment")

ggplot(data=mavgdata,aes(x=segment,y=value,colour=variable)) +
  geom_point(size=.2) +
  stat_smooth() +
  theme_bw()

The opening lines of R code should look familiar, as they’re the same ones used last time to load the perfmon data and rename the columns. Once that’s done, we:

  • Create a new dataframe from our base data using the aggregate function. We tell it to work on the data dataframe, to segment it into 15 minute intervals, and to take the mean of each counter across each 15 minute section.
  • Drop the Time column from our new dataframe, as it’s no longer of any use to us.
  • Convert the segment column to a datetime format (note that we use a different format string here from the previous calls; this is down to the way the segment labels are written out).
  • Melt the dataframe to make plotting easier.
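If you want to sanity-check the aggregation before plotting (a quick optional check, not essential to the workflow), a look at the first few rows should show one row per 15 minute segment with the averaged counter values:

# One row per 15 minute segment, counters averaged over that segment
head(avgdata)
# Confirm the segment column is now POSIXct rather than a factor
str(avgdata$segment)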

And then we use the same plotting options as we did before, which gives us:

R plot of perfmon data at 15 minute average

If you compare it to this chart we plotted before with all the data points, you can see that it is much cleaner, but we’ve lost some information as it’s averaged out some of the peaks and troughs throughout the day:

Perfmon data plotted on graph using R
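As a side note, aggregate() will accept any summary function, so if the peaks are what you care about, one option (a sketch of a variation, rather than something covered in the earlier post) is to keep the maximum in each segment instead of the mean:

# Keep the highest value seen in each 15 minute segment so spikes survive
maxdata<-aggregate(data,list(segment=cut(data$Time,"15 min")),max)
maxdata$Time<-NULL
maxdata$segment<-as.POSIXct(maxdata$segment, format='%Y-%m-%d %H:%M:%S')
# Melt, then plot mmaxdata with the same ggplot call as before
mmaxdata<-melt(maxdata,id.vars="segment")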

But we can also quickly try a different segment size to help out, in this case a 1 minute average. We can just run:

# Same approach, but with 1 minute segments this time
minavgdata<-aggregate(data,list(segment=cut(data$Time,"1 min")),mean)
minavgdata$Time<-NULL
minavgdata$segment<-as.POSIXct(minavgdata$segment, format='%Y-%m-%d %H:%M:%S')
mminavgdata<-melt(minavgdata,id.vars="segment")

ggplot(data=mminavgdata,aes(x=segment,y=value,colour=variable)) +
  geom_point(size=.2) +
  stat_smooth() +
  theme_bw()

Which provides us with a clearer plot than our original, but keeps much more of the information than the 15 minute average:

R plot of perfmon data at 1 minute average
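If you want to keep a copy of the chart to share, ggplot2’s ggsave() will write out whichever plot was drawn last; the path and dimensions below are just examples:

# Save the most recently drawn plot; the file type comes from the extension
ggsave("C:\\R-perfmon\\R-perfmon-1min.png", width=10, height=6)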
