INTRODUCTION
Below is an overview of database backups. OS or user workstation file backups are not considered. The below information may be useful for non-database backups, but databases (which are data focused), are a special case of backups. This blog is meant to assist in creating a process. Don’t expect a suggestion of exactly how to implement your specific database backup process.
WHY BACKUP
I’ve heard Enterprise IT people that should know better say that database backups are not important because we have a cluster with the primary database in one data center, and a hot standby in another data center. Without backups, there would be no recovery if a disaster took out booth data centers, no recovery if a release to production failed and lost data, no recovery if a malicious hacker wiped the data, no recovery if a bug happened to delete the wrong data… That run-on sentence could run-on for as far as the imagination can see.
My latest catch phrases regarding database backups are “HA is not a backup”… “DR is not a backup”… “Remember ransomware (which just happens to be recently making the national news. Insert any malware, the paradigm still works)”.
With all my 35+ years of IT experience, I will tell you that if it’s a computer, and your data is important to you, you need to manage a backup process that involves multiple backups and off-site copies of that backup, regardless of HA (High Availability), DR (Disaster Recovery), period.
ENGINE VERSIONS REFERENCED
Database engine versions referenced below are versions published by the vendor within the last 10 years or so (ie, explicitly excluding Microsoft Server 2000 and prior, excluding MySQL 4x and prior, excluding Oracle 9x and prior, excluding Informix 7x and prior, excluding DB2 8x and prior). Amazon RDS (MySQL clone via SaaS (Software as a Service)) automated backups and DB Snapshots are also excluded from the discussion below as this style of backup is very rigid.
DATABASE BACKUP STRATEGY
Please note, I’m not all-knowing. I’ll not be able to design a backup system that will meet your needs. There will be some remaining questions that only someone that is working day-to-day with your business can answer. Things like, “how long do we need to keep this data”, or “how many minutes of data loss are we willing to risk given the effort involved in backing up”. I do provide some examples that have worked for me. See below.
GENERAL BACKUP INFORMATION
There is much ambiguity in the IT industry regarding the definition of ‘Incremental’ & ‘Differential’ backups. The below seems to be the most common definition, and the one I use below. Backup Types…
- FULL Backup is a backup of all information (data, stored procedures, objects in general) in the database
- DIFFERENTIAL backup is a backup of the information that has changed since the last Full backup
- INCREMENTAL backup is a backup of the data that has changed since the last Full, Differential, or Incremental backup
- TRANSACTION LOG (TLOG) backup is a backup of the transactions in the transaction log since the last Transaction Log backup, Incremental, Differential, or Full backup.
- Informix refers to a FULL backup as a level 0 backup, DIFFERENTIAL as level 1 backups, etc.
Backup types…
- Logical – records SQL & DDL, and the order that the SQL is to be applied
- Physical – records database blocks
BACKUP INFORMATION SPECIFIC TO ENGINE
SQL Server is somewhat unique with these items…
- All backups in SQL Server are Physical and “On-line” (the database can be used while users are active).
- To back up Agent jobs and similar server (vs database) objects, backup the ‘msdb’ database
- To back up users, backup the ‘master’ database. (note, restoring ‘master’ must be done from single user mode)
- The Recovery Model is an option for each database. It indicates how the database handles transaction logs. The default for a new database is the value from the ‘model’ database. Recovery Models…
- SIMPLE Recovery Model indicates that Point In Time restores cannot be accomplished.
- FULL Recovery Model enables the Point In Time recovery.
- BULK LOGGED Recovery Model means that Point In Time recovery may be available if a load operation (select into, BCP, etc) has not been performed since the last Full/Differential backup. Frequently databases will move between this Recovery Type and Full Recovery as needed. This recovery type will be ignored below. If this Recovery Model is used, the assumption is that the database will be changed back to FULL Recovery and a FULL backup then will be taken.
LOCKS
SQL Server does not lock DML nor DDL while backup is running. DROP DATABASE does lock though, applicable when restoring. Some engines will have concerns with DML and/or DDL locking as described above (specifically MySQL & MariaDB). Be sure you understand any locking issues with your specific engine. Test & monitor to ‘tune’ backups at your site.
PERFORMANCE
BACKUPS, in general, are designed to be as quick, efficient, and as unobtrusive as possible. Unless your server is way underpowered, backups will have no (or a very minimal) impact (some estimate between 5% – 20%) on the system, and only for the duration of the backup (which may be less than a minute). The IO subsystem may be affected slightly more than the CPU or RAM, even with encryption. With compression, you will see a slight CPU increase, but even so, backups in general will cause very minimal CPU & IO impact. Note, some compression algorithms are SAN de-duplication friendly (will compress during backups, and will dedup on the SAN as if the data has not been compressed), but in general, don’t turn off backup compression because someone tells you that your SAN dedups. SAN administrators like to see their dedup rate as high as possible, without taking into consideration that backup compression has the advantage of significantly reducing the size of the data transferred to the SAN (frequently a major IO bottleneck).
RESTORES, on the other hand, may take longer than the backups as there are many components that may need to be brought together and checked once restored. An example of what will need to be brought together is the relationship of TLOG backups to FULLS / DIFFERENTIALS / INCREMENTALS, DIFFERENTIALS to FULLS, INCREMENTALS to FULLS / INCREMENTALS, and having all the related backups available and specified to the restore. This means that you’ll need to restore a Full backup, then a specific Incremental, and a specific set of Tlogs. Most of the engines maintain that relationship, but the files need to be made available when the engine needs them. Third party software can make that task easier. See below.
To reduce the impact on the IO subsystem, it’s best to put the backup storage on its own device. In the era of SAN, NAS, RAID, this is not always possible. If it’s possible to dedicate a device to the backups, moving the backup’s storage would reduce the IO on the database IO subsystem by up to 50%. If the database containers are on SAN, consider placing the backups on NAS. This is a minor point most of the time, but is something to consider… especially if becomes IO an issue.
When storage is network based (NAS), separating the IO is best accomplished by adding a NIC (Network Interface Card) to the involved systems and creating a separate VLAN (Virtual Local Area Network) just for backups.
BACKUP RETENTION
This is another one of the items that is site specific. it addresses the question: how long should the backups be available? This needs to be balanced with 1) how much storage is available, 2) how long will the current version of the database engine be able to read the backups, 3) how long will a problem go unnoticed, 4) users always want their backups kept forever, 5) cost of adding backup storage, 6) if on tape, how long will the tape drives be available to read the tapes, etc.
If backups are to be retained beyond approximately a few years, be sure to keep copies of the database engine installer, and the operating system installer for the operating system on which the db engine would run. Some engines will not let you restore from a version of the engine older than one major release. Some are more liberal at three major releases. For example, SQL Server 2000 backups cannot be restored to SQL Server 2012 or SQL Server 2014, but can be restored to SQL Server 2005, SQL Server 2008, or SQL Server 2008R2.
MONITOR & AUTOMATED RERUN
See ‘SOFTWARE’ below. This is noted here to highlight the need.
COMPRESS & ENCRYPT
This is a site specific consideration. Unless the server is severely underpowered with respect to CPU, and does not affect the IO subsystem, always compress database backups (see above). Seriously consider encrypting all database backups, especially if any of your data are subject to the following standards: HIPAA, PCI DSS, FISMA, GLBA, ISO 2701, FACTA, PIPEDA, ECHR, ECPA, etc.
CATEGORIES
Categorize your backups into a few site specific categories. For example: DATA WAREHOUSE, TRANSACTIONAL PROD, and NON-PROD. When categorizing, consider the following…
- Size of data as it affects backup times of FULL BACKUPS
- How often the data changes as it affects DIFFERENTIAL/INCREMENTAL/TLOG backup times
- How important is the data (main credit card processing system vs internal development version of the report writing system). This may affect retention, Recovery Model, etc.
OFF SITE
Consider always having a copy of your backups moved off-site after a more current backup has completed successfully.
Note, consider encrypting your backups prior to moving off site. See above.
BACKUPS
By definition, FULL backups back up the entire database. They serve as the basis for any RESTORE. After the FULL is taken, DIFFERENTIALS may be taken. Each DIFFERENTIAL backup is associated with one and only one FULL backup.
To use that DIFFERENTIAL backup to RESTORE, the FULL backup must first be restored, then the DIFFERENTIAL is optionally ‘applied’. Transaction Log (Tlog) backups that occur after the most recently restored FULL or DIFFERENTIAL backup may be used to roll the database forward to a specific Point In Time. A database schedule considers what’s involved in a RESTORE when determining a BACKUP schedule. See below.
Incremental vs Differential
The above assumes that just FULLS & DIFFERENTIAL were taken. If INCREMENTALS are also taken, the backup/restore process would look like this… After the FULL is taken, DIFFERENTIALS may be taken, then INCREMENTALS may be taken. Each INCREMENTAL backup is associated with either: one and only one FULL backup or the prior INCREMENTAL ‘chain’. To use the INCREMENTALS to restore, the FULL BACKUP must first be restored, the DiFFERENTIAL restored, and then each of the INCREMENTALS in the backup ‘chain’ would need to be restored in order.
BACKUP SCHEDULE
This is another site specific consideration. The team that is responsible for the data should, for each of the CATEGORIES, formulate a BALANCE of the following that should accommodate near 100% of the needs…
- Recovery Model
- Backup type (FULL, DIFFERENTIAL, INCREMENTAL, TLOG)
- Retention period
- Scheduled time when the backup should be run. It should be a time when write activity to the database is low, or just after a significant change.
- Effort involved in restoring vs resources available for the backup (ie, a FULL backup usually uses more resources than a DIFFERENTIAL).
There may be other considerations as to when to run backups. For example, one may wish to back up a data warehouse immediately after the monthly load. These ‘abnormally scheduled’ (one-off) database backups should be broken out into separate jobs that can be submitted or triggered by a successful load.
The schedule should accommodate near 100% of the needs. There may be a few database that have additional backup needs beyond those broad needs defined here. Those should be a separately scheduled job to backup the specific database or server. That job most likely will simply preform an additional Differential or a Tlog backup.
With a backup schedule, a sample may be worth a thousand words, or at least provide a suggestion. See below for a SQL Server example.
SAMPLE SCHEDULE
Here is a sample schedule for database backups.
Category=Production Transactional Important
Full=Weekly
Differential=Daily
Tlogs=10 mins*
Retain=weekly for three months, daily for two weeks, and Tlogs for five days
Category=Production Transactional Not-so-important
Full=Weekly
Differential=Daily
Tlogs=1 hour*
Retain=weekly for three months, daily for two weeks
Category=Non-Production Transactional
Full=2x/Month
Differential=None
Tlogs=none
Retain=2x/Month for two months
Category=Data Warehouses / Big Data
Full=Monthly
Differential=Weekly per ETL
Tlogs=none
Retain=Monthly for three months, weekly for five weeks
- Generally Monthly backups will be submitted the first Friday of the month at 7:00 PM and will have the weekend to run.
- Generally weekly backups will be submitted after 7:00 PM Friday and will have the weekend to run (exclude the 1st Fri if conflict).
- Generally Daily backups will be submitted Mon – Thu after 7:00 PM.
- Generally Tlog backups will be submitted on TBD schedule, as required to fulfill SLA, not to exceed 2 hours.
* with Tlog (Transaction Log) backups, the database will use Full Recovery Model. All other databases will use Simple Recovery Model
TEST
Backups should be tested by restoring a database from each category to another server. The test should be implemented any time the database engine is upgraded, the backup/restore program is upgraded, or other major changes are made to the backup software. At a minimum, test annually.
SOFTWARE
In the above, there was no mention of software to assist the backup process. That evaluation is left as an excessive for the reader. Backup software will implement and manage many of the decisions made using the above as a guide. Some things to contemplate…
Backup software should not only handle the backup of databases (the easy part), but be able to automatically…
- schedule backups
- verify backups to insure an exact copy was obtained or that an exact copy was restored
- compress and/or encrypt backups prior to hitting the wire, and the converse when restoring
- resubmit a failed backup automatically
- parallelize backups (ie, mysqldumper)
- report backup failures (via SNMP, email, etc)
- manage backup dependencies (tlog related to incremental, which is related to the full)
- provide GUI/Wizards for managing backups & restores, interactive help & other prompts/information
- central administration of backups & restores
- ability to kill connections to database prior to backups or restores (especially restores)
- work with different medias, tape, cloud, SAN, NAS, local disk, etc
- be HA aware
and most importantly, be able to effectively assist in restoring all or part of a database!
A prior employer had software that, at the time, was not so easy to use to restore a database. It frequently could not find backup sets required for restores. There was a running joke… “they don’t call it NetRestore for a reason.”
REFERENCE
SAN Deduplication, http://www.brentozar.com/archive/2009/11/why-dedupe-is-a-bad-idea-for-sql-server-backups/
Troy Frericks.
blog 14-Oct-2015
=
Copyright 2015 by Troy Frericks, http://dba.frericks.us/.
#