Years ago I left school after getting my graduate degree in Mathematics and got myself a real job, you know the one that had things like a pension and health insurance. Since it was only the cost of a co-pay I decided to visit a doctor to have a physical done because, well, isn’t that a good idea? When the doctor asked why I was there I told him that I thought to have a routine physical was just the right thing to do.
He gave me the physical and we talked over the results. Then he asked a very simple question:
“Tom, do you use a seatbelt?”
I thought the question odd. In our state (Massachusetts) the use of seatbelts was not made mandatory until after I had already been driving for many years. Wearing a seatbelt was not a habit for me. I simply said “no”.
He smiled a bit. Not a laughing kind of smile. More like a smile when you can’t understand why someone would be so stupid. It’s the same smile I give to folks who call me and say they need to recover all their data but don’t have any backups.
He then said, “You are young and in great health. At your age, the number one cause of death is a car accident, most likely from failure to wear a seatbelt. You should consider making that a habit and looking forward to a long life ahead.”
I started wearing my seatbelt that day.
Now, let’s pretend I was your “database doctor” and had a look at your database servers. I bet I can ask you a simple question: “Are you using filegroups for your databases?”
I bet the answer is “no”. That’s because filegroups are like seatbelts. They are right there, easy to use, and most people ignore them.
Filegroups help you in so many ways and are easy to use but hardly anyone is using them. I rarely come across a customer or client that has deployed filegroups. When I ask about them the response is usually “yeah, we know about them, but we don’t see why they are useful.”
Just like a seatbelt, you don’t know why it is useful until you need it, and then you’ll use it all the time.
Buckle Up: Using Filegroups For Safety
SQL Server allows for what is called “piecemeal restores”. The name is fairly descriptive; a piecemeal restore means you are able to restore pieces of your database. It is not necessary to have your entire database online and available. You have the ability to have only a portion of your database available for user connections. There is just one catch, however: you need to be using more than one filegroup.
Using multiple filegroups allows for a lot of benefits:
1. Separating out internal system data from user data
2. Placing larger tables/indexes on their own filegroup and/or dedicated set of disks for performance
3. Placing archive (or even read-only) data onto their own filegroups and dedicated set of disks to reduce maintenance overhead
4. Ability to bring parts of the database online quickly, no need to wait for a full restore
I recently had a client discuss their desire to reduce their database backup times. They had a 10TB database that was taking about 12 hours to do a full backup. The first thing I asked about was their use of filegroups along with their current archive strategy. About 80% of their data was data that would not be changing (or should not be changing).
By moving that data onto a dedicated filegroup they would reduce their backup time to less than two hours because they could stop taking weekly backups of data that was never changing. They could also mark the entire filegroup as read-only. This all would result in less time for backups as well as general maintenance.
Creating New Databases With Multiple Filegroups Defined
Besides my doctor reminding me about the use of seatbelts I had another reason to start using them: it’s the law. That got me thinking about filegroups and how they aren’t mandatory.
For example, you can create a new database in SQL Server Management Studio (SSMS) using this syntax:
CREATE DATABASE foo
GO
You end up with a database created that has one database file, tied to one filegroup. That filegroup serves a dual purpose as both the default filegroup (so, all new objects get created as part of that filegroup) and the primary filegroup (where all the system objects are stored). Just as you can drive your car while not wearing your seatbelt you can create a database while not utilizing multiple filegroups.
I can understand that Microsoft would not require the use of additional filegroups. However, I would prefer that when creating a new database they prompted you with a syntax similar to the following:
CREATE DATABASE [foo]
ON PRIMARY
( NAME = N'foo' , FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.JAMBON\MSSQL\DATA\foo.mdf'),
FILEGROUP [USER]
( NAME = N'foo_user' , FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.JAMBON\MSSQL\DATA\foo_user.ndf')
LOG ON
( NAME = N'foo_log' , FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.JAMBON\MSSQL\DATA\foo_log.ldf')
GO
USE [foo]
GO
IF NOT EXISTS
( SELECT name
FROM sys.filegroups
WHERE is_default = 1 AND name = N'USER' )
ALTER DATABASE [foo] MODIFY FILEGROUP [USER] DEFAULT
GO
So, any new table or object created will get assigned to the USER filegroup, while the system objects will remain tied to the PRIMARY filegroup. using this syntax by default would be a great step forward towards helping administrators to understand and leverage the use of multiple filegroups.
Additional filegroups does result in some additional administrative overhead. For example, if you are using filegroups you will likely want to utilize file backups as part of your regular database backup strategy. You can read more about backing up database files at this MSDN entry, which also has a nice graphic to help you visualize this concept of database file backups.
Summary
The short of it is this: filegroups will save your ass, but only if you know enough to use them in advance.
I’d say that filegroups are one aspect of SQL Server that has nothing but upside. They are easy enough to implement, they allow for you to have greater flexibility when it comes to recovering data, and have a minimal amount of administrative overhead. They may not save your life, but there is a good chance they could at least save your job.
And they won’t wrinkle your clothes or chafe your neckline.
So why aren’t you using them?
Great post Tom. I really do think that Microsoft should make a much bigger thing about this and have two filegroups ro start with. They should also physically stop you from creating anything in the Primary filegroup.
Some people’s arguement for not using multiple filegroups is that there are no tangible performance benefits. As you pointed out it depends on the IO structure. Personally I do it for the flexibiliy alone.
Oh, that’s a GREAT idea actually, to not allow any user info on the PRIMARY filegroup unless you went out of your way to enable access or something. I wonder if we could get that into the product somehow.
I’d love to see it in there, however MS have a “policy of least surprise” so I doubt they would be willing to make a change to such a fundamental part of the database creation process.
I guess one way to move things forward would be to organise a hangout about the suggestion. If we were to raise awareness through all the usual channels then I’m sure we could put together a well argued case for making the change. It would also provide a good user base to vote for it on Connect.
What do you think?
I don’t like that idea…i LOVE that idea!
I wonder if we could tie it into Connect somehow? Perhaps talk about things we would like to see added/removed, mention some Connect items, and just babble for 30 minutes or so each week?
I’m loving the idea of a “Connect Community” 🙂
It would be great if someone could pull some strings and manage to have a dev team member or two on the call to bounce possible ideas off to see how feasible they are.
I can ask around!
Do you gentlemen think it would be a better solution to put the DB boot page, page 9, etc. on some newly named protected SYSTEM filegroup by default and keep ‘primary’ the same name?
This way code doesn’t have to change and the impact to most end users will be seamless. This is a great idea though, much better than checking it in Policy Manager across thousands of DBs.
Nice easy write up I learned that I should continue using my seat belt. Oh wait, yeah and use filegroups. Actually, we already use these at my job because we have partitioned data by year. However, we continue to use FULL backups because they don’t take that long (1-2 max for all of our databases). Would you still recommend doing backups according to filegroups?
It depends on if your needs are. If you believe that you may want/need to have partial database availability then you should consider doing FG backups. If you are able to recover your entire database with a FULL backup within a defined SLA, then don’t worry about doing FG backups.
There is one use for FG backups that I just thought of. If we would like to move only part of our database to a staging environment, we could use the partial restore so that we don’t have to bring the entire thing over. That way we could just bring the FG with newer data to test production data out in staging if needed.
Our SLAs for our databases with partitioned tables are quite loose and it would take a couple of hours to restore all the DBs if they crashed. Probably would not need to do it regularly. Never-the-less this is some good information, thank you!
I tried the whole pull a subset of Production data to Staging and after some tooth pulling it worked. You’ve inspired me to blog about it. Coming soon! Thanks again.
Glad to hear it! I look forward to reading that post!
Great post Tom!!! I have a dumb question. How would you recommend using file groups to split up large tables on an already existing database? Or does this work best with new DB’s?
Thanks,
Greg,
I can’t think of a specific formula to apply for all existing databases. I’d advocate that you look at the largest tables first and see which ones might be the most static, and think about breaking those out onto a dedicated FG first.
I use Red Gate SQLPrompt and that allows you to set up so that the intellisense translates the create table command into a template as per your example. I think (need to check to make sure) that you can override the templates within SSMS anyway.
I didn’t think about overriding a template, but that would only help myself. I’d really like a solution that would help all end users, and that solution would need to come from the makers of SSMS.
Hi! Thanks for this article. I have a little question : is there a reason for a DBA (for a non-system DB) to create one (or more) secondary filegroups and let user only fill those filegroups, letting the PRIMARY always for system data ?
Jefferson,
Sure! There are a lot of reasons as to why you might want to do this. It can help with recovery as well as performance. I think a better question to answer may be “why wouldn’t I want this”. It could be the case that the performance and recovery gains aren’t worth the administrative overhead. It really depends on your shop and your needs.