Protecting Your Unstructured Data: NAS Backup Best Practices
Unstructured data and file repositories are common in most enterprise environments. As organizations try to reduce licensing costs and maintenance overhead, Network Attached Storage (NAS) remains commonplace in many modern enterprises. The ability to store and host data directly on NAS without having to manage OS patches and other lifecycle operations is attractive to many IT departments. Even though it isn’t suited for all workloads, storing unstructured data like Microsoft Office documents, JPGs and MP3s is one of the most common use cases for NAS backup.
Centralized repositories can make end-users’ lives easier by having a single location where they can store and share their data. However, for backup administrators, protecting that data brings its own unique set of challenges. As data generation rates steadily increase year over year around the world, so does the volume of data that needs to be reliably protected. Designing a backup solution that can quickly and reliably safeguard that data is just one part of the challenge. Long-term retention, fast and efficient restores and avoiding vendor lock-ins are all tasks that require your careful attention.
Selecting the right backup software is just one piece of the puzzle. In this post, we’ll dive into some of the standard best practices that enterprises can apply to provide a sound level of data protection and the confidence you need in a disaster recovery (DR) event.
Understand the Foundational Options That Are Available to You
When it comes to backing up enterprise-grade NAS solutions, there are typically three high-level options available to help protect your data:
- Network Data Management Protocol (NDMP)
- Solutions provided by the storage vendor
- Manual or automated file copy
Each of these options has its pros and cons, and you should understand how they work and when each might be a good fit.
Network Data Management Protocol (NDMP)
NDMP was first introduced in the mid-1990s to provide a streamlined way for administrators to back up their NAS data. Before its introduction, backing up NAS data was often a manual and laborious process that was prone to errors. Since then, the NDMP protocol has become an open standard, which allows for commonality across vendors and solutions. In addition to its broad support, many vendors offer optimizations for their implementations.
NDMP works in a client-server model and is typically deployed in one of these three fashions:
- One-way (i.e., direct): The backup data is sent directly from the volume on NAS to a backup device like a tape drive, which is directly connected to NAS.
- Two-way: The backup data is sent directly from NAS to an external backup device (like a tape drive) without having to go through the backup server. This model requires two-way communication between NAS and the backup device. The advantage of this model is that it reduces the load on your backup server and can improve the speed of the whole backup process. One downside is that you must have a physical backup device attached to each NAS in question, which does not scale well.
- Three-way model: In this model, backup data flows from NAS to the backup server, which then sends that data to the backup device. This model requires three-way communication between NAS, the backup server and the backup device. Some advantages of this model are that it allows for better control over the backup process and greater scalability. However, this option may also introduce additional load on your network.
A major challenge with this technology today is efficiency. NDMP was originally designed to use only a single stream when backing up NAS. This resulted in significant amounts of time needed to back up data, particularly as datasets grew in size. On top of this, metadata was kept in an index that contained information like file names, sizes and creation or modification dates. Although it’s required to simplify backup and restore operations, the index itself also consumes storage and network resources. Moreover, implementations that strictly adhere to the protocol specification require a full backup to be performed after nine incremental backups, which can lead to frequently extended backup windows. All of the above can quickly lead to significant overhead from backup storage and network utilization perspectives.
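To see how the forced-full cadence adds up, here is a minimal sketch that models a schedule forcing a full backup after nine incrementals. The numbers are illustrative and not tied to any vendor’s implementation:

```python
# Sketch (not vendor-specific): model an NDMP-style schedule where a
# full backup is forced after nine incrementals, and estimate total
# data moved over a period for a given daily change rate.

def total_transferred_gb(dataset_gb, daily_change_rate, days, incrementals_per_full=9):
    """Sum the data an NDMP-style schedule would move over `days`."""
    total = 0.0
    since_full = incrementals_per_full  # force a full on day 1
    for _ in range(days):
        if since_full >= incrementals_per_full:
            total += dataset_gb          # full backup: whole dataset
            since_full = 0
        else:
            total += dataset_gb * daily_change_rate  # incremental
            since_full += 1
    return total

# Example: 50 TB share, 2% daily change, 30 days.
# The three fulls (days 1, 11 and 21) dominate the total.
print(round(total_transferred_gb(50_000, 0.02, 30)))  # → 177000
```

Even at a modest 2% daily change rate, the periodic fulls account for the vast majority of the data moved, which is where the storage and network overhead comes from.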
Another challenge that pertains to NDMP is platform lock-in. Commonly, NDMP backups are locked into the platform in which they were created. This means that, if you are using NDMP on NAS from Vendor X, you likely will not be able to easily restore those backups to NAS from Vendor Y, if at all. This can become a significant limitation, particularly when you’re dealing with hardware failures or other DR-related scenarios.
Hardware Platform-based Solutions
Storage vendors typically offer their own data protection options as well. These can range from customized NDMP implementations to snapshot replication or highly available systems. At the heart of everything, data recoverability is the core reason for backing up data. When implementing storage-based solutions, it’s imperative that you ask the following questions:
- Will the solution meet your required recovery point objectives (RPOs) or recovery time objectives (RTOs)?
- Does the solution offer the granularity that you need? Can you restore individual files and permissions, or do entire volumes need to be restored?
- Where are your backups being stored? Does it require a like-for-like storage appliance that can double your budget, or can it be stored in a separate repository?
These solutions can offer low RPOs and RTOs by leveraging volume-based snapshots, and they can often copy these snapshots onto different arrays for added protection. Although this is a step in the right direction from a risk management perspective, it’s still far from a bulletproof strategy. If a file or volume becomes corrupt on the primary NAS appliance, that corruption may be copied to a secondary location as well. This likely would not be caught without proper verification until the data is actually needed, which may be too late.
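One way to add the verification mentioned above is to compare independent checksums of the primary and replicated copies. A minimal sketch, assuming both copies are reachable as file paths:

```python
# Sketch: detect silent corruption that snapshot replication would
# faithfully copy, by comparing independent SHA-256 digests of the
# primary and secondary copies. Paths are illustrative.
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large files need not fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_copy(primary: Path, secondary: Path) -> bool:
    """True only when both copies hash identically."""
    return sha256_of(primary) == sha256_of(secondary)
```

Note that a mismatch alone does not tell you which copy is bad; recording known-good checksums at backup time gives you a reference to compare both copies against.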
Additionally, snapshot replication often meets resistance from IT departments because of its infrastructure requirements. If a similar or identical model of NAS is needed for you to use snapshot replication, then the storage budget is immediately doubled. This doesn’t even account for the required licensing fees. Furthermore, this situation assumes that your retention requirements can be met with two NAS appliances that have the same configuration, which won’t be the case for a long-term solution.
Lastly, if your backups can only exist on vendor-specific platforms, you are now in a vendor lock-in situation. If your primary NAS fails, you might find yourself waiting for new equipment to arrive before you can actually perform a restore.
Manual or Automated File Copy
The last commonly available option is some variation of a file backup. This could be as simple as a backup administrator copying files to a landing zone and moving the data to tape. Unfortunately, this approach comes with numerous downsides, including error-prone manual processes, little to no versioning, no data reduction and more. Where this option shines is when some intelligence can be built into the process.
Multiple software solutions are available today, and they incorporate features like changed block tracking (CBT), flexible backup targets and long-term retention options. Vendor-agnostic storage can also bring considerable benefits since it allows customers to design solutions to fit their architecture and budgets. Moreover, most of these options also offer optimizations that can enable backup jobs to run quicker when compared to traditional NDMP jobs.
Some of the more efficient solutions available leverage storage snapshots. In this model, a snapshot is taken on the NAS and the data is backed up from that snapshot. This approach has multiple benefits, including avoiding file locks: if a file is in use while being backed up directly from the live share, it may not be reliably backed up. In the same vein, data consistency can be a problem for large datasets. If it takes hours or days to perform a backup from a production file share, the backed-up data has a large window in which it can be modified. By leveraging a storage snapshot, you can ensure that your data is consistently protected, at least from a point-in-time perspective.
File-level backups can provide special benefits that may not be available in volume-based backups. This includes the ability to specify the directories or files you want protected and the ability to provide file versioning without needing to restore entire volumes. This granularity can also provide benefits when it comes to long-term storage, since entire volumes don’t need to be retained when only a data subset is required.
Designing an Efficient Network Infrastructure
With good reason, backup administrators are likely quite familiar with the 3-2-1 Rule for backup. This rule provides a straightforward framework that offers resiliency when it comes to data protection. For those unfamiliar with this rule, it is defined as having three copies of your data on two different mediums (e.g., disk and tape), one of which is immutable or offline. When protecting NAS data, the “two different mediums” portion of this rule can only really be accomplished by performing a copy over the network.
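The rule as defined above can be expressed as a simple check. A minimal sketch, with illustrative copy descriptions:

```python
# Sketch: a minimal 3-2-1 compliance check for a set of backup copies,
# using the rule as defined above: three copies, two mediums, one
# immutable or offline. The copy descriptions are illustrative.

def satisfies_3_2_1(copies):
    """Each copy is a dict: {"medium": str, "immutable_or_offline": bool}."""
    three_copies = len(copies) >= 3
    two_mediums = len({c["medium"] for c in copies}) >= 2
    one_safe = any(c["immutable_or_offline"] for c in copies)
    return three_copies and two_mediums and one_safe

copies = [
    {"medium": "disk", "immutable_or_offline": False},  # production data
    {"medium": "disk", "immutable_or_offline": False},  # backup repository
    {"medium": "tape", "immutable_or_offline": True},   # offline tape copy
]
print(satisfies_3_2_1(copies))  # → True
```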
Depending on the amount of data that’s being copied and the efficiency of your method and transport, this process can take anywhere from minutes to hours or days. Traditional NDMP backups, for example, do not handle large backups promptly. In fact, in some cases, large volumes can take weeks to back up! Most enterprises cannot afford to have an RPO measured in weeks.
With this in mind, file-level backups typically leverage the NAS’s network connectivity to copy data elsewhere. This is where fast networks become essential. To meet desired RPOs, 10 GbE networking is generally considered the bare minimum. However, most modern CPUs can saturate a 10 GbE network interface, making the network the bottleneck. In these situations, 40 GbE or faster interfaces are another option that you can consider.
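As a rough illustration, you can estimate the transfer window for a given link speed. This sketch assumes the network is the only bottleneck and that about 80% of line rate is achievable, both simplifying assumptions:

```python
# Back-of-the-envelope sketch: hours needed to move a dataset over a
# given link, assuming the network is the bottleneck and ~80% of line
# rate is achievable in practice (an assumption, not a guarantee).

def transfer_hours(dataset_tb, link_gbps, efficiency=0.8):
    bits = dataset_tb * 8 * 1000**4          # TB -> bits (decimal units)
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600

# Compare 10 GbE and 40 GbE for a 100 TB dataset.
for link in (10, 40):
    print(f"{link} GbE: {transfer_hours(100, link):.1f} h for 100 TB")
```

At these assumptions, 100 TB needs roughly a day over 10 GbE but well under a shift over 40 GbE, which is the difference between fitting in a backup window and not.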
Additionally, verifying MTU settings across your entire network path is essential. Some organizations fully embrace Jumbo Frames, and others do not. Although performance increases are typically seen when using Jumbo Frames, mismatched MTU settings in the data flow path will hurt throughput.
Standard interfaces you’ll want to check include:
- Network interfaces on the NAS itself
- Top-of-rack switches for backup source and destinations
- Core routers and switches
- Any firewalls or other security devices that may be in the path
- The receiving device, whether it be NAS, file servers, tape libraries or something else
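As a rough illustration of why frame size matters across that path, here is a sketch of per-frame payload efficiency. It assumes IPv4 and TCP headers of 20 bytes each (no options) and 38 bytes of per-frame Ethernet overhead; note that a jumbo frame hitting a 1500-byte MTU hop gets fragmented or dropped outright, which is where mismatches hurt most:

```python
# Sketch: rough per-frame payload efficiency for standard (1500) vs
# jumbo (9000) MTUs, assuming IPv4 + TCP with 20-byte headers each and
# 38 bytes of Ethernet overhead (header, FCS, preamble, inter-frame gap).

ETH_OVERHEAD = 38    # Ethernet header + FCS + preamble + inter-frame gap
IP_TCP_HEADERS = 40  # 20-byte IPv4 header + 20-byte TCP header, no options

def payload_efficiency(mtu):
    payload = mtu - IP_TCP_HEADERS
    return payload / (mtu + ETH_OVERHEAD)

print(f"MTU 1500: {payload_efficiency(1500):.1%}")  # ~94.9%
print(f"MTU 9000: {payload_efficiency(9000):.1%}")  # ~99.1%
```

The per-frame gain looks small, but jumbo frames also mean far fewer frames (and interrupts) per gigabyte, which is where most of the real-world improvement comes from.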
Whenever possible, a dedicated network for backups is recommended, since it brings many benefits, including:
- Segregation from production workloads, which minimizes the impact of high network traffic
- A smaller scope of devices and interfaces may be financially favorable when considering high-end networking devices (e.g., 10 GbE vs. 40 GbE)
- More granular access controls and security
Meeting Retention Goals
At the core, backups are used to restore data. Planning for various recovery scenarios is part of the challenge when designing a comprehensive backup strategy. With this comes the question of how to determine the number of restore points you need and how long they should be kept. Compromises must be made, since you probably don’t have an unlimited budget for a backup project. Design choices, like how often a full backup should be performed as opposed to an incremental backup, must be assessed.
Short-Term Retention
Enterprises commonly perform restores from their more recent backups (i.e., ones that were performed within the last 30 days). This can typically be attributed to the fact that employees tend to largely work with the same data day after day. This means that errors are typically found within a relatively short amount of time, making restores from recent backups the most common.
With this in mind, writing these restore points to a relatively fast medium like solid-state drives or high-performance spinning disks is a common practice. This approach allows for quicker backups, since writes complete faster on these media than on slower, high-capacity disk tiers. Additionally, when you perform a restore, the read operations can be significantly quicker, thus helping you achieve your RTO.
Long-Term Retention
Keeping all your restore points on a high-performance storage array isn’t always feasible for medium- or long-term retention. Challenges surrounding hardware budgets, storage density, heating/cooling and physical space constraints also make it unrealistic. Many organizations opt for secondary storage to house their medium-to-long-term retention points. Although this isn’t standard, it’s not uncommon to see enterprises keep 12 months’ worth of restore points on a slower storage array. These systems are typically cheaper than high-performance storage while offering higher overall capacity.
Archiving data is typically achieved in one of two ways. The first method is based on the file’s age. If a file has not been altered after a pre-defined period of time, the backups for that file can be moved to a secondary storage tier to free up capacity on the primary storage. This process works under the assumption that a file that has not been modified in a while is less likely to be restored. However, if data is automatically moved based on age alone, you may wind up in a situation where your only backed-up copy is sitting on secondary storage, which can potentially result in longer restore times.
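A minimal sketch of the age-based approach, assuming backup data lives in plain directory trees. The paths and the threshold are illustrative, not from any particular product:

```python
# Sketch of age-based archiving: move files untouched for more than
# `max_age_days` from a primary backup tier to a secondary tier.
# Directory layout and threshold are illustrative.
import shutil
import time
from pathlib import Path

def archive_cold_files(primary: Path, secondary: Path, max_age_days: int):
    cutoff = time.time() - max_age_days * 86400
    moved = []
    for path in primary.rglob("*"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            target = secondary / path.relative_to(primary)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))  # preserve relative layout
            moved.append(path.relative_to(primary))
    return moved
```

A real tiering engine would also track where each version now lives, so a restore request can transparently pull from the secondary tier.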
As an alternative, some file-based backup solutions can offload files from primary backup storage after changes are detected, like in a versioning system. This data will still typically remain in an online and accessible system, which allows for recovery, albeit with a potentially longer turnaround time. This approach provides flexibility so that only the most recent versions of files are kept on primary storage. It is typically faster, but more expensive, than moving data based purely on age.
Tape as a medium for long-term retention is one of the most cost-effective methods available. The cost per GB of data is noticeably lower than that of a traditional hard drive and particularly a solid-state drive. This medium also tends to be reliable when properly stored and maintained. Lastly, the physical dimensions allow for high density storage capacity in a physical space. In part due to these benefits, it is commonly used as part of a “Grandfather-father-son” retention scheme, where certain copies (e.g., weekly, monthly, yearly, etc.) are kept for specified periods.
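A quick sketch of the media math behind such a scheme. The retention counts are illustrative policy inputs, not recommendations:

```python
# Sketch: how many restore points a Grandfather-Father-Son scheme holds
# at steady state, given how many daily, weekly, monthly and yearly
# copies you retain. Defaults are illustrative policy inputs.

def gfs_media_count(daily=6, weekly=4, monthly=12, yearly=7):
    """Sons (daily) + fathers (weekly) + grandfathers (monthly, yearly)."""
    return daily + weekly + monthly + yearly

print(gfs_media_count())  # → 29 retained restore points with the defaults
```

Multiplying this count by your full-backup size gives a first-order estimate of long-term media capacity, which is exactly where tape’s low cost per GB pays off.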
Tape as a storage medium is not without its downsides, however. First and foremost, restores can be slow. Tape is read sequentially, which means it must be wound to the specific portion of magnetic media where the file(s) in question were written.
Storing and maintaining tape also comes with specific requirements. Factors like controlling natural light, humidity and temperature must be well thought out. Because of these environmental requirements, many enterprises outsource tape storage to dedicated providers with facilities that are specifically built to store tapes at scale. When selecting a partner, it is crucial that you understand their SLAs for delivering a tape back on-site. In the event of a disaster, you don’t want to be in the position of spending days waiting for a backup tape to be delivered.
Tape capacity has been growing steadily for decades, largely through improvements to tape drives and media formats. A plan should be developed to ensure that data written with today’s tape hardware can be read by whatever tape library you may have in the future. It is not uncommon for organizations to keep an unused, redundant tape library off-site to mitigate this risk. If a tape library requires a specific host or operating system to function, it’s essential that you consider these dependencies as part of your plan.
An additional and relatively newer approach to long-term retention is to use object storage. Object storage, whether on-premises or through a cloud provider, can provide you with significant durability and security while being cost-effective. When paired with immutability, this allows you to have a copy of your backups online for faster recoveries while ensuring that your data cannot be altered by threat actors.
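As a sketch, immutability on S3-compatible object storage is typically requested per object via Object Lock parameters. The helper below is hypothetical, but the parameter names (`ObjectLockMode`, `ObjectLockRetainUntilDate`) match the S3 PutObject API and would be passed to an SDK call such as boto3’s `put_object`:

```python
# Sketch: building S3 Object Lock parameters for an immutable backup
# copy. The helper is hypothetical; the keys match the S3 PutObject
# API and would be merged into an SDK upload call.
from datetime import datetime, timedelta, timezone

def object_lock_params(retention_days: int, mode: str = "COMPLIANCE"):
    """COMPLIANCE mode cannot be shortened or removed, even by an admin."""
    retain_until = datetime.now(timezone.utc) + timedelta(days=retention_days)
    return {
        "ObjectLockMode": mode,  # or "GOVERNANCE" for a relaxable lock
        "ObjectLockRetainUntilDate": retain_until,
    }

params = object_lock_params(365)
print(params["ObjectLockMode"])  # → COMPLIANCE
```

The key design point is that the lock is enforced by the storage platform itself, so even compromised backup credentials cannot delete or alter the copy before the retention date passes.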
Using the Public Cloud for Offsite Backup
Using cloud backup services as secondary storage is a growing trend in the industry, and numerous providers are available with various solutions. Some offer basic file storage while others act as a target for NDMP or as backup software-specific repositories. Although the concept of offloading storage management might seem ideal, it is crucial that you consider the pros and cons of all these solutions.
Some benefits of leveraging a cloud provider include shifting Day 2 management duties to an external organization. Constraints like heating/cooling, physical datacenter space and break/fix or patching operations are no longer the IT department’s duties. Additionally, many of these offerings include “unlimited scaling,” which removes the hassle of having to procure new hardware. One last benefit of the cloud approach for some organizations is that it can shift the cost from a capital expense to an operating expense.
There are downsides when it comes to outsourcing your storage needs, however. First and foremost are the legalities. Depending on your industry vertical, there may be tight restrictions on where your data can sit. These restrictions might include the following:
- The ability to control and audit access controls for the entire data path
- Data locality laws may require that storage devices physically remain in a specific country or territory
- Specific technologies, like defined encryption algorithms, may be mandated and not available to all providers
Using the cloud for storage is just one aspect to consider; bandwidth requirements also need to be evaluated. Assessments need to be made on how fast you can send or receive data, since this might have a direct impact on meeting your RPOs and RTOs.
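A back-of-the-envelope sketch for that assessment, assuming the WAN link is the bottleneck and that roughly 70% of line rate is achievable over the internet (both assumptions, and real restores may also be throttled by the provider):

```python
# Sketch: sanity-checking an RTO against WAN bandwidth before
# committing to cloud-only backup copies. Sizes and link speeds
# are illustrative examples.

def restore_hours(dataset_tb, wan_mbps, efficiency=0.7):
    """Hours to pull a dataset back over a WAN link (decimal units)."""
    bits = dataset_tb * 8 * 1000**4
    return bits / (wan_mbps * 1e6 * efficiency) / 3600

# A 20 TB restore over a 1 Gbps internet link at ~70% utilization:
hours = restore_hours(20, 1000)
print(f"{hours:.0f} h (~{hours / 24:.1f} days)")
```

If the resulting figure exceeds your RTO, you need either a faster link, a local copy for large restores or a provider option such as shipping physical media.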
Design for Restore
Backups are only valuable if you can restore data as needed. Nobody wants to discover that their application data or document backups are corrupt and unrecoverable. Testing your backups can help negate this risk, but this is only one aspect of the puzzle.
When it comes to restoring data, especially large volumes, several points should be considered, including:
- Is your backup server online and available? If not, how does that affect your restoration operations?
- If you leverage cloud backups as part of your 3-2-1 strategy, how does that affect your recovery times?
- Can you perform a granular restore, like a single file or object?
In addition to the questions above, it’s important to understand the impact that your backup job types can have on your restores. If a complete data loss (e.g., a NAS hardware failure) were to happen, could you instantly mount your data from your backup storage and allow your users to quickly get back to work? Or do you need to perform a full restore followed by numerous incremental restores?
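The difference matters because an incremental-based restore replays a chain of backups. A minimal sketch of how that chain is assembled from a backup history (the history format is illustrative):

```python
# Sketch: given an ordered backup history, find the restore chain for a
# point in time: the latest full at or before it, plus every later
# incremental up to the target. History format is illustrative.

def restore_chain(history, target_day):
    """history: ordered list of ("full" | "incremental", day) tuples."""
    chain = []
    for kind, day in history:
        if day > target_day:
            break
        if kind == "full":
            chain = [(kind, day)]      # a new full resets the chain
        else:
            chain.append((kind, day))
    return chain

history = [("full", 1), ("incremental", 2), ("incremental", 3),
           ("full", 8), ("incremental", 9)]
print(restore_chain(history, 3))
# → [('full', 1), ('incremental', 2), ('incremental', 3)]
```

The longer the incremental chain, the more restore points must be read and applied in order, which directly stretches your RTO; a lost or corrupt link in the chain also invalidates everything after it.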
Audits that determine precisely what is required for a full restore should be performed regularly. Identifying essential components, like software, required permissions or resource allocation, is often overlooked in recovery planning.
Although NAS storage can ease day-to-day demands for IT departments, protecting that data comes with its own set of challenges, especially when compared to virtual machine (VM) or application backups. Although multiple solutions are available to protect your environment, careful planning and considerations are required to find a strategy that best achieves your data protection and restoration goals.