I’m on my way to the PASS Business Analytics conference today. If you are attending, and you are reading this blog post, come find me and Rob Collie (blog | @powerpivotpro) during the welcome reception on Wednesday night; we’d love to talk data.
Speaking of data…I was finishing up a session at the Colorado Springs SQL Saturday recently when I was struck with an idea. The session is “Database Design: Size DOES Matter” and it’s been well received over the past year (probably because there are so many awful database designs out there). The talk focuses on datatypes, how to recognize when they are a problem, and what to do to mitigate the performance issues.
My thought was brought about by the question I ask during the session: “How many rows do you want to fit on a page?” The answer is “all of them”, but I also accept “as many as possible”. Since SQL Server (and most all database platforms) will read data pages from disk into a memory buffer, the more rows you can store on one page results in better performance. If you have a query that returns 100 rows, but you can only store one row on a page, then you will need to pull 100 pages from disk in order to return the 100 rows. If you can fit the same 100 rows onto one page then your query will perform faster.
So, wouldn’t it be cool if you could quickly look into a database to see which objects were close to storing one row per page?
Turns out we can! I have a script here that works for SQL2012 and later. This script will return the name of the table, the name of the index, the number of pages, the number of rows, and the ratio of rows to pages. It then orders the results by the objects storing the fewest number of rows on a page, giving you the opportunity to focus your tuning efforts on those objects first.
Here’s the quick script for you. And yes, you *could* consider this my own “Rowsie Project“.
Script disclaimer, for people who need to be told this sort of thing:
DISCLAIMER: Do not run code you find on the internet in your production environment without testing it first. Do not use this code if your vision becomes blurred. Seek medical attention if this code runs longer than four hours. On rare occasions this code has been known to cause one or more of the following: nausea, headaches, high blood pressure, popcorn cravings, and the impulse to reformat tabs into spaces. If this code causes your servers to smoke, seek shelter. Do not taunt this code.
/*=============================================
File: SQL_Server_check_rows_and_pages.sql
Author: Thomas LaRock, https://thomaslarock.com/contact-me/
Size Matters: Table Rows and Database Data Pages
Summary: This script will check the number of rows stored on a database data page. The more rows on a page, the better. If you are storing few rows per page then you will want to check the design, specifically the column datatypes chosen. Date: May 5th, 2014 SQL Server Versions: SQL2012, SQL2014 You may alter this code for your own purposes. You may republish altered code as long as you give due credit. THIS CODE AND INFORMATION IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE. =============================================*/ SELECT st.name, si.name, si.dpages, sp.rows , 1.0*sp.rows/si.dpages AS [Ratio] FROM sys.sysindexes si INNER JOIN sys.partitions sp ON si.id = sp.object_id INNER JOIN sys.tables st on si.id = st.object_id WHERE si.dpages > 1 --objects that have used more than one page AND st.type = 'U' --user tables only AND si.indid = sp.index_id AND si.rows > 1000 --objects with more than 1000 rows ORDER BY [Ratio] ASC
I’m thinking the next logical step for something like this is to then find which of these objects are currently in memory. While it is good to know which parts of the schema need adjusting, it is also useful to know which ones are being accessed frequently. But I’ll leave that exercise for another blog post.
See you in San Jose!
My. Hero.
Um, it’s just a script, but if that gets me “hero” status, I’ll take it! Thanks!
What if I only need to return one row?
Well, if you only need one row, then you likely only need one page. But my script is meant to raise awareness for objects that potentially have questionable design choices.
You mention 2012/14 – script appears to work on 2005 and 2008R2 as well (or am I missing somehing?)
A t?
Oddball,
I didn’t test against those instances, only 2012 and 2014, so that’s why I listed those. But I’m glad to hear that they are working against prior versions, too!
Thanks!
I have a vendor db with a table that has 4 rows per page, what do?
Andrew,
Well, I offer a few solution options in the presentation I have been giving, you can see the slides here: http://thomaslarock.com/2013/03/slides-from-reno-sql-server-user-group-available/
But for a vendor DB, there may not be anything you can do, you should contact them first before making any changes.
Thanks for the comment!
This doesn’t seem to handle partitions well. The idea is rather interesting.
Bill,
Hmmm…I guess I can see that for partitioned tables this may not be ideal. I can play around and come up with an updated version.
But yeah…I found the idea of this to be an interesting way to help someone focus on specific objects in their design that might need some improvement.
Thanks for the comment!
How about something like this? It won’t run as quick as yours and is even more of a “please don’t run in production” type of thing. I think it pulls most of the same data together. if you remove the comment out, it adds in a few useful fields. Do you see any consistency issues with this?
select sch.name as SchemaName, tbl.name as TableName, idx.name as IndexName,
ips.page_count as Pages, 1.0*record_count/page_count as Ratio , ips.record_count as RecordCount,
ips.avg_record_size_in_bytes, ips.max_record_size_in_bytes
–,ips.index_type_desc, ips.alloc_unit_type_desc, idx.fill_factor, fg.name as FileGroup, ips.partition_number, compressed_page_count
from sys.dm_db_index_physical_stats (db_id(),null,null,null,’detailed’) ips
inner join sys.indexes as idx on idx.object_id = ips.object_id
and idx.index_id = ips.index_id
inner join sys.tables as tbl on tbl.object_id = ips.object_id
inner join sys.schemas as sch on sch.schema_id = tbl.schema_id
inner join sys.filegroups as fg on fg.data_space_id = idx.data_space_id
where page_count > 0
and record_count > 1000
and index_level = 0
Thanks for the script Bill, I’ll see about testing it here locally when I have a chance.
Glad to run across this script. One of my table has 2 rows per page. But then one of the column stored is XML. So I guess, I cannot do anything from this information?
Well, you could inquire about how/why XML is being used. But unless you suspect performance problems as a result of the use of XML here, you aren’t likely to be able to change anything.