Pages: What SQL Puts in Files
Inside each of your database data files (MDFs), SQL Server stores your data in 8KB pages. That's kilobytes - not megabytes, not gigabytes, but just mere kilobytes.
Say that we create a table:
CREATE TABLE dbo.Employees
(EmployeeID INT IDENTITY(1,1),
EmployeeName VARCHAR(200))
First off, yes, I understand, I shouldn't do EmployeeName in one field - I should have FirstName, LastName, MiddleName, Suffix, Prefix, yadda yadda yadda, but I'm trying to keep this email short. Now see what you did? It's long again. Doggone it, it's hard teaching this stuff in an email.
Anyhoo, in this table, each record takes up just a little bit of space. EmployeeID is an INTeger, which takes 4 bytes. It's the same 4 bytes whether that number is 1 or 1,000,000. EmployeeName is a VARCHAR(200), which means we can store up to 200 characters in here, and each character takes a byte. If we insert 'BRENT OZAR', that's 10 characters (and boy, am I a character), so we need 10 bytes to store it.
If all of our employees average about 10 characters in their name, that means we could fit about 500-600 records per 8KB database page. (In reality, there's some overhead because SQL Server also needs to use some parts of the page to store metadata, and we'll talk about that later in the training.)
Brent Ozar Unlimited is a small company, so we can keep all of our employees on a single 8KB page. As we insert, update, and delete employees, SQL Server fetches that 8KB page off disk, brings it into memory, makes the necessary changes, and then writes that data page back to disk. The 8KB page itself is the smallest unit that SQL Server will cache - it doesn't cache individual rows/records - and each page belongs exclusively to just one object.
A Word About Objects
You'll notice that I avoid using the word "table". Tables are cool, but as we start to dig into what SQL Server's doing under the hood, we want to start thinking about these three object types:
Heap - a table with no clustered index. In my dbo.Employees table, I didn't specify in what order SQL Server should store my data, so it's just going to slap the data down on my 8KB page in any old order.
Clustered Index - what we normally think of as a table. If I'd have created my table like this:
CREATE TABLE dbo.Employees
(EmployeeID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
EmployeeName VARCHAR(200))
Then SQL Server would store my data in order of EmployeeID. That way, when I search for EmployeeID #42, it can go directly to that number without scanning through all of my employees. The clustered index is sorted in the order of the EmployeeID field, but it's actually the full copy of our table, including all of our fields - in this case, just EmployeeName.
Nonclustered index - If I tell SQL Server to:
CREATE NONCLUSTERED INDEX IX_Name
ON dbo.Employees(EmployeeName)
Then SQL Server will create a second copy of my table sorted by EmployeeName. This copy of my table will only include the fields specified in my index (EmployeeName), plus whatever fields it needs to get back to the clustered index (in this case, my clustering key, EmployeeID).
All three of these objects - heaps, clustered indexes, and nonclustered indexes - will be stored on separate sets of pages. We won't have the clustered index and nonclustered index for the same table on the same page - they're split. That's why when we're doing space analysis, we have to think in terms of indexes, not tables. To learn more index basics, read
Jes Borland's SQL Server Index Terms.
Pages & Rows on Disk
The sys.dm_db_index_physical stats Dynamic Management Function (DMF) returns the number of rows and pages stored in each database object. It takes parameters for database ID and object ID, or you can pass in NULLs to get information across all of your database objects. Scroll down to the examples link in that Books Online page, and you'll get queries to examine a single table - I'd strongly recommend starting with a small table, because some of the parameters for this DMF will cause SQL Server to actually look at every page in the object. That means if all of the pages for that object aren't cached in memory, SQL Server will go pull those pages off disk, and that can slow down your running SQL Server.
This DMF also includes average record size and max record size. This makes for fun spelunking: how big is each record, really? Just because we make everything a VARCHAR(8000) doesn't mean we're actually storing 8,000 characters in each field. Now, don't go changing your database structure just yet - you can easily break applications when datatypes change. Let's leave that for later.
You can get similar metadata much faster by using sp_spaceused, but it doesn't get the cool info about average record size, and I wanted to encourage you to go spelunking here.
Learning More About PagesIn my 90-minute session
How to Think Like the Engine, I explain pages, indexes, joins, SARGability, and more. I use real database pages from the StackOverflow.com database for demos, and you actually get PDFs to print out and follow along as we go.
Your Homework
Let's start thinking about what objects are taking up space in our databases. Check out our free
sp_BlitzIndex™ stored procedure that analyzes the indexes in your database from a psychologist's point of view. Is your database a hoarder, clutching on to a bunch of nonclustered indexes that aren't getting used, and are just taking up space? The details columns in the results will show how big each index is, and whether it's getting used or not.
This is important because the more indexes you have:
- The longer your backups take
- The longer index rebuilds take
- The more memory you need to cache everything
- And most importantly, the slower your inserts/updates/deletes go, because SQL Server has to maintain more copies of your table
As we start to move up the Hierarchy of Needs from capacity into performance, you'll start to see how these foundational items are so important. See you next week!
No comments:
Post a Comment