Few years ago when I started my career as a SQL Server DBA, I remember a 200-500 GB sized database was considered as a very large database. During that time maintaining and managing those databases were not a tedious task. But over the years the definition of a very large database has changed. Today we’re looking at database size in terabytes and petabytes.
With cloud computing, hardware requirements needed to sustain the growth of databases are now just one click away. Auto-scaling is now a blessing to companies, as they previously had to run through their budgets whenever there was a need to add resources to the database servers.
Because database sizes have grown over time, managing and maintaining them have become a pain. And when I say managing and maintaining a database, it means taking regular backups, performing index maintenance, integrity checks, etc. etc.
Most of the time we try to archive the old/cold data so that we can keep the database size in check. But sometimes there are cases where the scope to archive the database is very limited. This is especially in medical and financial sectors where old data is still used for various purposes.
Given the rate of database daily growth, I’m going to take you through some database management tasks to give you a better understanding of how you can keep up.
Consider an OLTP database that is active 24*6 and is around 10TB in size.
Taking daily full backup of databases whose size is 10TB can be a very demanding task, especially for an OLTP database. Even with better hardware, the backup time would be around five to seven hours. Therefore, having a proper backup strategy in place is as important as maintaining its availability.
It would be wise to consider having a weekly full backup with daily differential and hours transaction backup when looking at the time and cost of resources.
Using third party tools to backup the database is a better option. These tools not only help you in reducing the time taken to backup the database, but also reduces the size of the compressed backup.
Performing normal index maintenance tasks on a very large database is not the same as performaning on a regular size database. REBUILDING index on big tables having big indexes is very time consuming. This is something that also causes blocking on the servers which hampers the performance of other applications.
The only way to maintain the indexes on such a huge database is to REORGANIZE them. REBUILD index option can only be chosen during index corruption or when there is an absolute need to REBUILD a particular large index.
One important option that we need to consider for creating indexes on such a large database is to use is to specify WITH SORT_IN_TEMPDB in the statement. SORT_IN_TEMPDB forces the index build to occur in tempdb, and when it is complete, it is then written to its destination filegroup. This is an often-overlooked option that can provide huge reductions in index build times.
Another advantage of reorganizing the index is that we can stop the operation in the middle if the execution time overlaps business hours – and this will not cause the index to go in ROLLBACK state.
Database Consistency Check (DBCC)
DBCC command is used to check the consistency and integrity of the database. This command helps us to make sure that our databases are in a healthy state and if in case any issue/corruption occurs then this command helps to identify and fix the issue.
Executing the DBCC command is very resource-intensive. It causes constrain on both memory and disk. Running DBCC command on such a large database can be very risky because if the command does not execute in the allotted time frame and if we try to KILL the execution process it will go in ROLLBACK state. This is not only time consuming but also jeopardizes the consistency of the database. Hence running this command on a very large OLTP database is not a feasible option.
The speed of completion of this command completely depends on the Memory provided and the type of RAID used for hosting the tempdb database.
Other options can be used such as DBCC CHECKTABLE against individual tables or groups of tables on a rotating basis, DBCC CheckDB WITH PHYSICAL_ONLY. This option limits the processing to check the integrity of the physical structure of the page and record headers in addition to the consistency between the pages for the allocation structures (data and indexes).
The most recommended and best option is to have a STANDBY server, restore the backup of the production database on that server, and then run the DBCC command. If the consistency checks run ok on the standby database, the production database should be ok as it is the source of the standby. If the standby database reports corruption, then DBCCs or other tests for corruption can be run against the production database.
I hope this gives you a better understanding of the complexity and management options for very large databases in SQL Server.