Friday 14 August 2015

DBA Training Week 7 - Managing Space Inside Databases




Pages: What SQL Puts in Files

Inside each of your database data files (MDFs), SQL Server stores your data in 8KB pages. That's kilobytes - not megabytes, not gigabytes, but just mere kilobytes.

Say that we create a table:

CREATE TABLE dbo.Employees
(EmployeeID INT IDENTITY(1,1),
 EmployeeName VARCHAR(200))

First off, yes, I understand, I shouldn't do EmployeeName in one field - I should have FirstName, LastName, MiddleName, Suffix, Prefix, yadda yadda yadda, but I'm trying to keep this email short. Now see what you did? It's long again. Doggone it, it's hard teaching this stuff in an email.

Anyhoo, in this table, each record takes up just a little bit of space. EmployeeID is an INTeger, which takes 4 bytes. It's the same 4 bytes whether that number is 1 or 1,000,000. EmployeeName is a VARCHAR(200), which means we can store up to 200 characters in here, and each character takes a byte. If we insert 'BRENT OZAR', that's 10 characters (and boy, am I a character), so we need 10 bytes to store it.

If all of our employees average about 10 characters in their name, that means we could fit about 500-600 records per 8KB database page. (In reality, there's some overhead because SQL Server also needs to use some parts of the page to store metadata, and we'll talk about that later in the training.)

Brent Ozar Unlimited is a small company, so we can keep all of our employees on a single 8KB page. As we insert, update, and delete employees, SQL Server fetches that 8KB page off disk, brings it into memory, makes the necessary changes, and then writes that data page back to disk. The 8KB page itself is the smallest unit that SQL Server will cache - it doesn't cache individual rows/records - and each page belongs exclusively to just one object.

A Word About Objects

You'll notice that I avoid using the word "table". Tables are cool, but as we start to dig into what SQL Server's doing under the hood, we want to start thinking about these three object types:

Heap - a table with no clustered index. In my dbo.Employees table, I didn't specify in what order SQL Server should store my data, so it's just going to slap the data down on my 8KB page in any old order.

Clustered Index - what we normally think of as a table. If I'd have created my table like this:

CREATE TABLE dbo.Employees
(EmployeeID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
 EmployeeName VARCHAR(200))

Then SQL Server would store my data in order of EmployeeID. That way, when I search for EmployeeID #42, it can go directly to that number without scanning through all of my employees. The clustered index is sorted in the order of the EmployeeID field, but it's actually the full copy of our table, including all of our fields - in this case, just EmployeeName.

Nonclustered index - If I tell SQL Server to:

CREATE NONCLUSTERED INDEX IX_Name
ON dbo.Employees(EmployeeName)

Then SQL Server will create a second copy of my table sorted by EmployeeName. This copy of my table will only include the fields specified in my index (EmployeeName), plus whatever fields it needs to get back to the clustered index (in this case, my clustering key, EmployeeID).

All three of these objects - heaps, clustered indexes, and nonclustered indexes - will be stored on separate sets of pages. We won't have the clustered index and nonclustered index for the same table on the same page - they're split. That's why when we're doing space analysis, we have to think in terms of indexes, not tables. To learn more index basics, read Jes Borland's SQL Server Index Terms.

Pages & Rows on Disk

The sys.dm_db_index_physical stats Dynamic Management Function (DMF) returns the number of rows and pages stored in each database object. It takes parameters for database ID and object ID, or you can pass in NULLs to get information across all of your database objects. Scroll down to the examples link in that Books Online page, and you'll get queries to examine a single table - I'd strongly recommend starting with a small table, because some of the parameters for this DMF will cause SQL Server to actually look at every page in the object. That means if all of the pages for that object aren't cached in memory, SQL Server will go pull those pages off disk, and that can slow down your running SQL Server.

This DMF also includes average record size and max record size. This makes for fun spelunking: how big is each record, really? Just because we make everything a VARCHAR(8000) doesn't mean we're actually storing 8,000 characters in each field. Now, don't go changing your database structure just yet - you can easily break applications when datatypes change. Let's leave that for later.

You can get similar metadata much faster by using sp_spaceused, but it doesn't get the cool info about average record size, and I wanted to encourage you to go spelunking here.

Learning More About PagesIn my 90-minute session How to Think Like the Engine, I explain pages, indexes, joins, SARGability, and more. I use real database pages from the StackOverflow.com database for demos, and you actually get PDFs to print out and follow along as we go.

Your Homework

Let's start thinking about what objects are taking up space in our databases. Check out our free sp_BlitzIndex™ stored procedure that analyzes the indexes in your database from a psychologist's point of view. Is your database a hoarder, clutching on to a bunch of nonclustered indexes that aren't getting used, and are just taking up space? The details columns in the results will show how big each index is, and whether it's getting used or not.

This is important because the more indexes you have:
  • The longer your backups take
  • The longer index rebuilds take
  • The more memory you need to cache everything
  • And most importantly, the slower your inserts/updates/deletes go, because SQL Server has to maintain more copies of your table
As we start to move up the Hierarchy of Needs from capacity into performance, you'll start to see how these foundational items are so important. See you next week! 

Brent says: Wanna see what a page looks like? Check out the DBCC PAGE command. You pass in a database name, file number, and page number, and SQL Server will return the nearly-raw contents of that page along with diagnostic information. It's a fun way to get a peek under the hood.

This week's topic is a good example of how knowing the basics of database internals can come in handy when you step back and think about database server performance. You don't have to use DBCC PAGE as part of your job as a DBA, but just knowing what an 8KB page is helps you understand the output of various Dynamic Management Views (DMVs) when they report back units in pages.

For more, watch How to Think Like The Engine. Use coupon code Engine19 and it's just $19 this week.

Free related videos:

How Does SQL Server Store Data?
Back to Index Basics
Heaps in SQL Server

How to Tune Indexes
5 Hours, $299

Want to become an index tuning expert? Kendra Little shows you how. Use coupon code InsideIndexes and get $100 off.


No comments:

Post a Comment