Friday, 14 August 2015

DBA Training Week 8 - What Pages Are In Memory?






What Objects Are In Cache?

We like to think SQL Server is using all of our memory to cache data, but that's just part of it. SQL Server uses memory for lots of things:
  • Caching database objects
  • Sorting data for your query results
  • Caching execution plans
  • Performing system tasks
Often, we're surprised by how little data is being cached for each database.

Last week, we looked at the 8KB pages in our database. Those pages are the same whether they're on disk or in memory - they include the database ID and the object ID, so if we looked at all of the pages in memory, we could figure out which tables are being cached in memory right now. The below query gives us the magic answers, but be aware that the more memory you have, the longer this will take. It won't block other users, but it could take a minute or two if you've got >64GB memory, several minutes if you've got a terabyte or more:
 
SELECT CAST(COUNT(*) * 8 / 1024.0 AS NUMERIC(10, 2)) AS CachedDataMB , 
CASE database_id WHEN 32767 THEN 'ResourceDb' ELSE DB_NAME(database_id) END AS DatabaseName 
FROM sys.dm_os_buffer_descriptors 
GROUP BY DB_NAME(database_id) , database_id 
ORDER BY 1 DESC

Compare the size of each database versus how much is being cached. Often in the field, I'll see 100GB databases that just have 8-12GB of data cached in memory. That might be completely okay - if you only regularly query just that amount of data - but what if we constantly need all 100GB, and we're constantly pulling it from disk?

This Leads to Cool QuestionsThis DMV query leads to so many cool performance tuning questions! I get so excited by these concepts.

How fast are the cached pages changing? From the moment we read an 8KB page off disk, how long does it stay in memory before we have to flush it out of the cache to make room for something else we're reading off disk? This concept is Page Life Expectancy, a Perfmon counter that measures in seconds how long things stay in RAM. The longer, the better, as I explain in my Perfmon tutorial.

Do the results change based on time of day? This is a one-time snapshot of what's in memory at the moment, but it can change in a heartbeat. If you have automated processes that run a bunch of reports in a single database at 2AM, then the memory picture will look completely different then.

Are we caching low-value data? If you mix vendor apps and in-house-written apps on the server, you'll often find that the worst-written application will use the most memory. Thing is, that might not be the most important application. Unfortunately, we don't have a way of capping how much memory gets used by each database. This is why most shops prefer to run vendor applications on separate virtual machines or servers - this way, they don't hog all the memory on a SQL Server that needs to serve other applications.

Do we have enough memory? If you're running SQL Server 2008/R2/12 Standard Edition, you're limited to just 64GB of physical RAM. If you're running SQL Server on bare metal (not a VM) and you've got any less than 64GB, go buy enough to get to 64GB. It's the safest, easiest performance tuning change you can make. If you're in a VM or running Enterprise Edition, the memory question gets a lot tougher. To learn more, read A Sysadmin's Guide to SQL Server Memory.

Are we using memory for anything other than SQL Server? If we've got Integration Services, Analysis Services, Reporting Services, or any other applications installed on our server, these are robbing us of precious memory that we might need to cache data. Don't remote desktop into your SQL Server and run SSMS, either - it's a memory pig. Put your management tools on a virtual machine in the data center, and remote desktop into that instead.

Can we reduce memory needs with indexes? If we've got a really wide table (lots of fields) or a really wide index, and we're not actively querying most of those fields, then we're caching a whole bunch of data we don't need. Remember, SQL Server is caching at the page level, not at the field level. A nonclustered index is a narrower copy of the table with just the fields/columns we want. The less fields, the more data we can pack in per page. The more we can pack in, the more data we're caching, and the less we need to hit disk.

When I tune indexes on a server I've never seen before, sys.dm_os_buffer_descriptors is one of the first places I look. The database with the most stuff cached here is likely to be the one that needs the most index help, and we'll talk indexes next.

Brent says: when I was a junior DBA, I focused a lot on the storage. I kept complaining to my SAN administrators because my storage didn't respond fast enough - my drives were taking 50ms, 100ms, or even 200ms in order to deliver data for my queries.

The SAN admin kept saying, "It's okay. The SAN has a cache." Thing is, the size of the SAN's cache is typically 32GB-128GB - which at first sounds like a lot - but divide it between all of the servers connected to the SAN. Often, we find that an individual SQL Server might get only a couple of gigabytes of SAN cache. That's way less than what the SQL Server has in memory. What are the odds that, when we need data for a query, it's not going to be in SQL Server's 64GB of memory, but it IS going to be in the SAN's miserly 2GB of cache? Not gonna happen.

SAN caching is still great for writes, especially for the transaction log, but don't count on it helping for performance for SELECT queries.

To learn more, check out my training course on Storage, Virtualization, and Hardware for DBAs. As a thank-you for reading these and investing in your own skills, you can use coupon code HardwareTraining to knock off $100 this week.



DBA Training Week 7 - Managing Space Inside Databases




Pages: What SQL Puts in Files

Inside each of your database data files (MDFs), SQL Server stores your data in 8KB pages. That's kilobytes - not megabytes, not gigabytes, but just mere kilobytes.

Say that we create a table:

CREATE TABLE dbo.Employees
(EmployeeID INT IDENTITY(1,1),
 EmployeeName VARCHAR(200))

First off, yes, I understand, I shouldn't do EmployeeName in one field - I should have FirstName, LastName, MiddleName, Suffix, Prefix, yadda yadda yadda, but I'm trying to keep this email short. Now see what you did? It's long again. Doggone it, it's hard teaching this stuff in an email.

Anyhoo, in this table, each record takes up just a little bit of space. EmployeeID is an INTeger, which takes 4 bytes. It's the same 4 bytes whether that number is 1 or 1,000,000. EmployeeName is a VARCHAR(200), which means we can store up to 200 characters in here, and each character takes a byte. If we insert 'BRENT OZAR', that's 10 characters (and boy, am I a character), so we need 10 bytes to store it.

If all of our employees average about 10 characters in their name, that means we could fit about 500-600 records per 8KB database page. (In reality, there's some overhead because SQL Server also needs to use some parts of the page to store metadata, and we'll talk about that later in the training.)

Brent Ozar Unlimited is a small company, so we can keep all of our employees on a single 8KB page. As we insert, update, and delete employees, SQL Server fetches that 8KB page off disk, brings it into memory, makes the necessary changes, and then writes that data page back to disk. The 8KB page itself is the smallest unit that SQL Server will cache - it doesn't cache individual rows/records - and each page belongs exclusively to just one object.

A Word About Objects

You'll notice that I avoid using the word "table". Tables are cool, but as we start to dig into what SQL Server's doing under the hood, we want to start thinking about these three object types:

Heap - a table with no clustered index. In my dbo.Employees table, I didn't specify in what order SQL Server should store my data, so it's just going to slap the data down on my 8KB page in any old order.

Clustered Index - what we normally think of as a table. If I'd have created my table like this:

CREATE TABLE dbo.Employees
(EmployeeID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
 EmployeeName VARCHAR(200))

Then SQL Server would store my data in order of EmployeeID. That way, when I search for EmployeeID #42, it can go directly to that number without scanning through all of my employees. The clustered index is sorted in the order of the EmployeeID field, but it's actually the full copy of our table, including all of our fields - in this case, just EmployeeName.

Nonclustered index - If I tell SQL Server to:

CREATE NONCLUSTERED INDEX IX_Name
ON dbo.Employees(EmployeeName)

Then SQL Server will create a second copy of my table sorted by EmployeeName. This copy of my table will only include the fields specified in my index (EmployeeName), plus whatever fields it needs to get back to the clustered index (in this case, my clustering key, EmployeeID).

All three of these objects - heaps, clustered indexes, and nonclustered indexes - will be stored on separate sets of pages. We won't have the clustered index and nonclustered index for the same table on the same page - they're split. That's why when we're doing space analysis, we have to think in terms of indexes, not tables. To learn more index basics, read Jes Borland's SQL Server Index Terms.

Pages & Rows on Disk

The sys.dm_db_index_physical stats Dynamic Management Function (DMF) returns the number of rows and pages stored in each database object. It takes parameters for database ID and object ID, or you can pass in NULLs to get information across all of your database objects. Scroll down to the examples link in that Books Online page, and you'll get queries to examine a single table - I'd strongly recommend starting with a small table, because some of the parameters for this DMF will cause SQL Server to actually look at every page in the object. That means if all of the pages for that object aren't cached in memory, SQL Server will go pull those pages off disk, and that can slow down your running SQL Server.

This DMF also includes average record size and max record size. This makes for fun spelunking: how big is each record, really? Just because we make everything a VARCHAR(8000) doesn't mean we're actually storing 8,000 characters in each field. Now, don't go changing your database structure just yet - you can easily break applications when datatypes change. Let's leave that for later.

You can get similar metadata much faster by using sp_spaceused, but it doesn't get the cool info about average record size, and I wanted to encourage you to go spelunking here.

Learning More About PagesIn my 90-minute session How to Think Like the Engine, I explain pages, indexes, joins, SARGability, and more. I use real database pages from the StackOverflow.com database for demos, and you actually get PDFs to print out and follow along as we go.

Your Homework

Let's start thinking about what objects are taking up space in our databases. Check out our free sp_BlitzIndex™ stored procedure that analyzes the indexes in your database from a psychologist's point of view. Is your database a hoarder, clutching on to a bunch of nonclustered indexes that aren't getting used, and are just taking up space? The details columns in the results will show how big each index is, and whether it's getting used or not.

This is important because the more indexes you have:
  • The longer your backups take
  • The longer index rebuilds take
  • The more memory you need to cache everything
  • And most importantly, the slower your inserts/updates/deletes go, because SQL Server has to maintain more copies of your table
As we start to move up the Hierarchy of Needs from capacity into performance, you'll start to see how these foundational items are so important. See you next week! 

Brent says: Wanna see what a page looks like? Check out the DBCC PAGE command. You pass in a database name, file number, and page number, and SQL Server will return the nearly-raw contents of that page along with diagnostic information. It's a fun way to get a peek under the hood.

This week's topic is a good example of how knowing the basics of database internals can come in handy when you step back and think about database server performance. You don't have to use DBCC PAGE as part of your job as a DBA, but just knowing what an 8KB page is helps you understand the output of various Dynamic Management Views (DMVs) when they report back units in pages.

For more, watch How to Think Like The Engine. Use coupon code Engine19 and it's just $19 this week.

Free related videos:

How Does SQL Server Store Data?
Back to Index Basics
Heaps in SQL Server

How to Tune Indexes
5 Hours, $299

Want to become an index tuning expert? Kendra Little shows you how. Use coupon code InsideIndexes and get $100 off.


DBA Training Week 6 - How to Monitor Drive Space

How to Monitor Drive Space

Don't.

It's not your job.

No, seriously, hang on a second. I'm not saying that you should ever say things like "It's not my job," but I do want you to understand when you should avoid reinventing the wheel. Your Windows administrators should be monitoring every Windows server you have, making sure it doesn't run out of drive space. This isn't some wild, unique need that only SQL Server has - it's everywhere in your data center.

(You might actually even BE a Windows admin, just here because you need to learn about working with SQL Server. It might actually BE your job to monitor this stuff. That's cool - that's how I got started too.)

In our journey from the base of my Hierarchy of Database Needs to the top, we do indeed need to talk about capacity - but I don't want to monitor drive space from inside SQL Server, and I don't want to buy SQL-specific tools in order to pull it off. Yes, you can technically use commands like xp_fixeddrives to list all of the SQL Server's drive letters and how much free space they have, but that doesn't work everywhere. For example, you might have mount points or databases on UNC paths, neither of which will show up in xp_fixeddrives. So leave the drive space monitoring to the pros.

Why You Shouldn't Build a Monitoring ToolIf you want to be a professional developer, you should build things. You should learn what exceptions could happen, how to trap them, and how to fail elegantly. It's hard work, but if you get good at it - really good - you can build amazing things and make a killer living for yourself.

But if you want to be a professional DBA, you should leave the developing to the developers.

I'm not saying you shouldn't learn the dynamic management views (DMVs), how to dig through them to find the evidence you need, and how to turn that data into actions. Hopefully, I'm going to teach you how to do a lot of that over the course of the next six months. Take another look at the Hierarchy of Database Needs again, and think for a second about all the things we're going to be learning over the next six months. Just in the last five weeks, I've had you build an inventory of your servers, automate your database restores, start locking down security, and enumerate your database risks. The next few months are chock full of things that will help you become a hero in the eyes of your users and your developers.

Building a crappy monitoring tool in your spare time will not give you that same level of respect. (And yes, if you've only got your spare time at work, your monitoring tool is going to be crappy. If you've got so much time that you can build a great monitoring tool, you need to focus on providing results to the business fast, because you're about to be laid off as pure overhead.)

How to Pick a Monitoring ToolThere's basically two kinds of monitoring tools out there for SQL Server.

Up/down Alerting - these tools make sure the SQL Server service is running, and that it has all the basic needs taken care of. If the server runs out of drive space, if the service stops, if the event log starts filling up with errors, or zombies attack, these tools will let you know. The most popular software in this category is Dell Spotlight, Idera SQL Diagnostic Manager, and Red Gate SQL Monitor.

Performance Diagnostics - these tools try to derive insight from SQL Server's dynamic management views (DMVs) to help performance tuners wring the most throughput out of their hardware. The most popular software here is SQL Sentry Performance Advisor, Dell Foglight Performance Analysis, and to some extent, the tools in the above category too.

If I was you, I'd start by asking the Windows team if they've got any software that handles the up/down alerting for services, drive capacity monitoring, etc. If so, get them to start monitoring your servers. I'm being selfish here - my real reason is that I want to save my monitoring budget for tools in the Performance Diagnostics category. Surprisingly, all of these tools are around the same price - around $1,000-$2,000 USD per monitored instance.

Then, no matter which category of tool you're buying, make a list of the last 4-5 things that have caused your phone to ring after hours. Here's some of my personal favorites:
  • Deadlocks
  • Out-of-control query running for hours
  • Long-running job, like a backup going too slow
  • Queries that desperately need an index
Figure out how to recreate those same symptoms in your development environment, and then get a free trial of a couple of the tools I mentioned. (All of them provide free 10-14 day trials.) Reproduce the problem, and watch how the monitoring software reacts. Does it lead you to the root cause quickly, or does it just have a bunch of flashing red lights on the console? The best ones will save you time by getting you right to the solution.

After you've done your evaluation and picked a favorite, get a quote from them - and get a quote from the other vendors as well. Share the quotes with the competing salespeople. You'd be surprised how willing they are to negotiate, and you might be able to use the cheapest quote to get the best tool at a price that works for you.

Your Homework This WeekI don't expect you to magically get budget approval for a tool this week, but I want you to start making a list of day to day problems as they strike. If you waste four hours troubleshooting a deadlock problem, make a note of the date, the time required, and a few words about the root cause. In a month or two, you'll be able to present this list to your boss as proof of why the right monitoring tool could save you money.

I'm not selling you vendor tools, by the way. I often link to vendor tools, but I don't receive any compensation. I'm just a huge believer in using the right tool to look like a hero fast - instead of building your own hammer every time you change jobs.

If you're still convinced you should build your own tool, go check out the open source project Opserver. It's a free performance monitoring tool from the guys at Stack Exchange, home of Stack Overflow. They use SQL Server (and a lot of other cool tools), and theirs is the best open source performance dashboard I've ever seen. (And that's not just because I helped with some of the DMV queries.)

See you next week!
Brent 

Brent Says: People often ask me what's my favorite monitoring tool. Because I jump around from one client to the next, and they all use different tools, I don't get the luxury of knowing one tool well.

When I do see folks that are happy with their monitoring tool, these are the tools they're usually using - listed in alphabetical order:

Dell Spotlight on SQL Server Enterprise

Idera SQL Diagnostic Manager

Red Gate SQL Monitor

SQL Sentry Performance Advisor



DBA Training Week 5 - Risk, Governance, and Compliance

Risk isn't just a fun board game.

Download the SQL Server Compliance Guide. It's an old 92-page whitepaper circa 2008, but it might just be the best (and most timeless) technical document that Microsoft has ever produced.

I only need you to read pages 7-13. In those six pages, you'll understand the differences between risk management, governance, and compliance. Risk management means knowing what risks the company is taking with the data, governance means the actions taken to address the risks, and compliance means someone is double-checking that we're actually doing the governance stuff we say we're doing.

Your job as a DBA involves all three, but when you're just getting started with compliance, focus on risk management. We need to get a quick idea of the different ways we could lose data, or that supposedly secure data could get into the wrong hands. (Yes, your developers are probably the wrong hands, but that's a different story.)

Homework: Look for Risky BusinessOdds are, nobody in the company has an inventory of the data we're storing in ridiculously unsafe ways. Here's a quick way to check your own database looking for dangerous fields:

SELECT * FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME LIKE '%password%'
OR COLUMN_NAME LIKE '%social%'
OR COLUMN_NAME LIKE '%credit%'

Feel free to change out those keywords for other terms that are relevant to your business - fields that are sensitive, and that would hurt if they got out into the wild. Then look at the contents of those tables - is the data being stored unencrypted? Who has access to it?

If we're storing unencrypted passwords in the database, for example, then every database backup we've ever done is dangerous. If anybody gets access to any backup file, like our offsite tape backups, then we could be on the front page of the news tomorrow.

When you find it, email the developers and the business contact for that database. Explain that the data is not encrypted, and use examples from the SQL Server Compliance Guide to show how this is a risk for the business.

Compliance: Knowing Who's Accessing WhatIt's unusual for me to assign homework first in the email, and then go on to talk about other things, but I want you to be aware of other things that companies usually do around their secure data.

Often management will say, "We need to audit everyone who changes or accesses secure data." Technically, SQL Server has features that can accomplish this goal - things like SQL Server Auditing. In practice, though, this is a problem for hard-core security teams because the very same DBA who manages the servers also has permissions to change the SQL Server audits. A greedy DBA could easily disable auditing, get the necessary data, and then enable it again.

With seriously ambitious work, you can lock auditing down so that can't happen, but ... it's really, really hard.

As a result, most seriously secure companies end up with a completely different auditing solution that lives outside of the DBA's realm. The security team buys third party hardware appliances like IBM Guardium or Imperva that act like a network firewall between everyone and the SQL Server. The appliance logs everything that happens in the SQL Server, and then the security team can review those logs without you (the DBA) knowing about it.

These solutions are typically six figures and up. They're expensive because they have to be absolutely bulletproof - if they fail, you can't have the access to the database suddenly stop. Plus, you'll need them in place for not just your production environment, but also your disaster recovery environment.

If you just want to check a box and make the auditors think you're secure, that's easy and cheap, but seriously good security is seriously expensive.

Let me know what you think of the SQL Server Compliance Guide. Enjoy, and see you next week! 


DBA Training Week 4 - Knowing Who Has Access

Missed last week's email? Read it here.

Who Can Get You Fired?Run our free sp_Blitz™ stored procedure on one of your production servers and pay particular attention to the Security section of the results. It lists the logins who have been granted the sysadmin or security admin roles.

Don't think of them as logins.

Think of them as people who can get you fired.

These people can drop databases, drop tables, change stored procedures, edit data, or even change SQL Server configuration settings like max server memory or maxdop. You're probably not getting alerted when any of these things change - we just can't afford to monitor every single thing in SQL Server and send alerts on it. At some point, we have to be able to trust certain logins, and that's where the sysadmin and security admin roles come in.

Except when we first get started learning database administration, it's usually because we're the only DBA in the shop, and the server is a mess. The front door is unlocked, the key is under the floor mat, and everybody knows we've got a big screen TV in the living room.

How to Get Started Locking Things DownBefore you start removing people's SA rights, be aware that there can be political backlash. In one shop, the head developer's SA rights were removed, and he stormed into the DBA's office the next morning screaming. Turns out one of his apps automatically created a processing database every night, did a bunch of work in it, and then dropped the database. Nobody knew because it was only around for 30-45 minutes. The problem could have been avoided by communicating the security changes ahead of time, and that's where we need to start.

Take the sp_Blitz™ output to your manager - just the security portions - and say something like this:

"Here's the list of people who can do anything on the production server - delete data, drop databases, or change performance settings. If they do, here's the list of applications that will be affected, including accounting and payroll. I don't want to take away all of their permissions - I just want to start by giving them full permissions over their database, but not in any other databases, and not at the server level. Can I talk to them about doing that?"

Note that we're only going to TALK to them, not actually do it, because we need to communicate with them first. Then, go to the end users or developers involved and say:

"We're starting to lock down the production server, but I want to make sure you have all the permissions you need. I'm going to make you a complete database owner inside your database so you can do anything you want in there, but I'm going to take away your rights to the other databases (like accounting and payroll), and I'm going to remove your permissions to change server-level settings like how much memory the server can use. I'm planning on doing it next weekend, and I want you to have my email and phone number so that if anything breaks on that date, you can call me and I can audit what's happening on the server to see if it's related to the permissions change."

When You Get ResistanceIf you get any pushback from developers or users, go to the project managers or business people who have a large stake in the database. For example, if the accounting database is on the server, go to the CFO and say:

"Here's the list of people who can take down the accounting system. They have the permissions to drop the database at any given time, and there's nothing I can do to stop it. I'd like to get that changed - can I schedule a short meeting with you and the development manager to get everyone on the same page?"

You want to turn it into a business problem, not a technology problem, and the CFO will very much be on your side. She can't afford to have her entire department go down just because some developer didn't put a WHERE clause on a T-SQL statement.

I Know, This Week Isn't FunDatabase administration isn't all bacon and roses. Sometimes it's boring politics and paperwork, and this is one of those weeks.

In Week 1: Where My Servers At?, we built a spreadsheet inventory of our servers, and now it's time to fill in a little more details. Since we're analyzing security, we need to know which applications live on each server, and who's in charge of each of those applications. You don't have to fill in the specifics of who has read or write permissions in each database, but we want to at least know the business purpose and the business contact.

The business contact is the one who really helps us get the database locked down because their job is on the line if this data is lost or unavailable. (Come to think of it, you're someone who can get THEM fired!) In the coming weeks, you'll be working more with them on reliability and performance, too, so now is a good time to start fleshing out that part of the spreadsheet. 

Brent says: Database administration is politics.

Sure, we like to think we're the police, here to protect and to serve, but most of what we do involves sitting in meetings, convincing people to do what we want, the way we want it.

It's made more challenging because we often don't have any real authority. Sometimes the DBAs report to completely different managers than the developers - and sometimes, it's even different companies! We might be administering a database server that houses a third-party application, and the vendor's support team demands to have SA access.

Consulting Lines: After doing consulting for a while, I wrote a series of blog posts explaining lines you can use as a DBA:
"Sounds like you've got it all under control."
"What happens if that doesn't work?"
"Would you mind driving?"
"SQL Server needs a dog."

Missed a week? Here's what we covered so far:


DBA Training Week 3 - Automating Restores

Doing the Fastest Database Restores
If you're doing transaction log backups, forget using the GUI. Even if you only have one transaction log backup per hour, it'll take you way too long to click through all the different files.

Think about what your backup folder might look like if we named our backup files by database, date, time, and a different extension per type of backup (BAK for fulls, DIF for differentials, and TRN for transaction logs):
  • MyDatabase_20130718_0000.bak
  • MyDatabase_20130718_0100.trn
  • MyDatabase_20130718_0200.trn
  • MyDatabase_20130718_0300.trn
  • MyDatabase_20130718_0400.trn
  • MyDatabase_20130718_0500.trn
  • MyDatabase_20130718_0600.dif
  • MyDatabase_20130718_0700.trn
  • MyDatabase_20130718_0800.trn
In that scenario, I took my full backup at midnight, then hourly transaction logs, with differential backups every 6 hours. (This is never a scenario I'd use in the real world, but it's the simplest way to get the point across in a tiny email. Hey, you try teaching tough concepts via email, buddy.)

If disaster strikes at 8:15AM and I lose the server, I need to restore the full, the most recent differential, and the transaction logs that follow the differential, like this:
  • MyDatabase_20130718_0000.bak
  • MyDatabase_20130718_0600.dif
  • MyDatabase_20130718_0700.trn
  • MyDatabase_20130718_0800.trn
That's a really easy script to write - and thankfully, MSSQLTips has already done it. Just change the variables for your database name and the path where your restore files live, and presto, the script will restore all of the files for you automatically.

This Script Has Implications For YouYou need to back up your databases intelligently. You want to put each database in its own folder, and you want the file names to have the database name and the time in them. I'm a big fan of that anyway - it makes life easier when I want to quickly scan and see what backups have been done.

It assumes the same database path when you restore. If you're the kind of DBA who likes to change drive letters and folders all the time, or you've got files scattered all over the place, and your production and development database servers have different drive letters, then life is going to be harder for you. There are other restore scripts that can adjust data and log file names at restore time, but they're not quite as elegant when it comes to restoring fulls, diffs, and t-logs.

You can do transaction log backups as often as you want. If the restore process is fully automated, why not do transaction log backups every minute? It doesn't cost you any extra. Rather than incurring extra overhead, it can actually incur LESS overhead. If you're only backing up logs once an hour, your users probably feel that hourly load big time when it kicks off. If you do smaller backups more frequently, they'll be less likely to notice the impact.

You can automate your own fire drill testing. Since this is done with plain old T-SQL, you can set up a SQL Agent job to restore last night's backups onto your staging server. I like setting up a DBA utility server chock full of really cheap (but really large) SATA drives, like 2-4TB drives in a RAID 5. I have a series of Agent jobs to restore a different server's backups every day, and then run DBCC jobs on them. This way I know I'm getting really good backups.

Your Homework This WeekYou don't have to go build this whole infrastructure out - but start laying the groundwork by making sure your backup files are named in a restore-friendly way.

Then, try out the MSSQLTips restore script to make sure you can quickly restore a database from scratch into a development or staging environment. If you get errors, leave comments on that blog post describing the error, and ideally, contribute your script improvements back in. Now, when disaster strikes, you won't be clicking around blindly in a GUI - you'll just open a proven script and hit Execute. Bam! 

Brent says: You can use this technique to build smaller copies of your production databases. For example, at StackOverflow, I built an ultra-compressed backup with even more tricks. On my nightly restore server, after restoring the production backups, I dropped the non-clustered indexes, rebuilt all objects with 100% fill factor, and shrank the data and log files down.

The end result was a database that was over 50% smaller!

I then backed it up with compression, and left those backups in a folder that the developers could access. That made it easier for developers to quickly grab a copy of production data as of last night, copy it to their workstation as quickly as possible, and restore it faster with less drive space required.

Log Shipping Training Video: Jes explains how log shipping is really just doing your restores ahead of time.

Backup Planning Video: Jeremiah explains 3 company scenarios and asks you to pick the right backup strategy for each.

Missed a week? Here's what we covered so far:


DBA Training Week 2 - Backups

The Only Reason We Back UpThe only reason we do backups is so we can do restores.
Brent here. When I first started out as a SQL Server DBA, I thought things were going well as long as the backup jobs ran successfully. I'd go into SQL Server Agent every now and then, make sure they were still running, and ... that was the end of it. I figured if disaster ever struck, I'd just do a restore. How hard could it be?

In theory, we test our backup strategy ahead of time with Kendra's 5 Simple Questions About Your Backups, and we've memorized the 9 Letters that Get DBAs Fired.
In practice, small disasters strike all the time. The most common reasons to do restores aren't to revive an entire server - it's just to get back a few small tables or an individual database. Somebody ran the wrong DELETE statement or dropped a database in production instead of development, and next thing you know, we're all scrambling. Let's think through a few things ahead of time to make the crisis easier.
Where to Do Restores
When you're restoring code (stored procedures, views, triggers, etc) or individual tables, don't restore onto the production server. I don't like touching production servers more than I have to, and let's face it - you're already having a bad enough day as it is. That's why you're doing a restore, remember? So let's do our work on a different server (like dev or QA) and leave production as it is. I've also written about restores in my ideal dev, test, and production environments.
After we've safely restored the right data onto another server, it's easy to copy that data across to other servers. For simplicity and security, you can set up a linked server on the production box with read-only access over to the restore server. Then, from production, you can run INSERT statements using a SELECT sourced from the linked server tables.
However, if you're restoring tables over 10GB, you'll probably want to do the restores directly on the production server to make the data copies faster. Just make sure you're extremely careful with the scripting and the database names - we don't want to restore over the top of your working production database.
This may require adding extra space to the production server. In one emergency, I freed up the necessary space by shrinking all of TempDB's data and log files down to just 1MB. TempDB was on fast drives, perfect for a one-time emergency restore, and that particular server didn't have any other activity happening due to the outage. We're not always so lucky, but it helps to think out of the box like that.
A word of warning: if referential integrity is involved, like if you're trying to restore tables that have relationships to other tables that you're NOT restoring, then you can be in for a world of hurt here. We're not going to cover that scenario - it really is different in each case.
Doing the Restore
Restore the most recent full backup using the WITH NORECOVERY option - this is really important. This leaves the database in a restoring state so that you can continue to apply additional backups to it. If you forget those two key words, your restore has to start over again from scratch, so please, for the love of all that's holy, double-check that option before you start the restore.
When I'm restoring code or config tables that haven't changed since the last full backups, I don't bother restoring any subsequent differential backups or transaction log backups. The goal is to finish the restore quickly.
Next, if differential backups are involved, restore the most recent differential WITH NORECOVERY. Differential backups are cumulative - you only have to restore the most recent one.
Next, restore all of the transaction log backups after the differential (or, if you don't have diffs, all of them after the full backup) - again, using WITH NORECOVERY.
Doing all of this with the GUI sucks. The more backups you have, the longer this takes, and the more likely you are to run into errors. Instead, what you need is a script that looks at all of the backups in a folder, plucks out the most recent relevant files, and restores them for you automatically, in order. We'll talk about automating restores in the next training module.

To learn more about backups and restores, our favorite getting-started articles are:
And if you want even more detail, check out our video classes:
This week, use coupon code BackItUp for $10 off either course.

Enjoy, and see you next week!

Brent says: I did my first emergency restore when I was working for a photo studio. I'd dropped out of college, and I took a job running their databases. Every morning, I got in bright and early to print out the list of labels for the high school graduates, weddings, babies, and so on that were going to be photographed that day. The photographers would pick up their stacks of labels for their film (FILM! remember that?) and head out into the field.

One morning, as part of my data cleanup process, I ran the DELETE statement. One minor problem - I'd forgotten to put in the WHERE clause, so I deleted all of the photo shoots, ever.

Thankfully, I was also the guy in charge of backups, so after I freaked out, I put last night's tape backups in and started the restores. When the boss came in to pick up his stack of labels, I was able to calmly explain what had happened and what I was doing to fix it.

Orville's temper was legendary, but I dodged a bullet and kept my job. Since then, whenever I handle a database, the very first question I ask is, "Where are the backups, and when was the last time they were tested?" Either the backups are tested, or you're about to be.

Missed a week? Here's what we covered so far:

1: Where My Servers At?