Convergence Without Compromise

Hyperconverged Infrastructure (HCI) gets a lot of attention these days, and rightly so. With HCI we’ve seen a move toward an easy-to-use, pay-as-you-grow approach to the datacenter that was previously missing. I started my career with complex storage arrays that required you to purchase all of your capacity up front. While these arrays could be expanded, we were often buying all the storage we’d need for three to five years even though we wouldn’t be consuming most of it for years.

While HCI certainly made things easier, it was far from perfect. Mixing storage and compute into a single server meant maintenance operations needed to account for both available compute resources and available storage capacity to accommodate the storage going offline. At times we would actually sacrifice our data protection scheme in order to take nodes offline and hope there were no additional failures within the cluster at the same time. Not ideal when we’re talking about production storage.

Get Down with the DVX

Datrium and the DVX platform aim to address these problems in an interesting way. Datrium separates storage and compute nodes much like a traditional two-tier system, but uses SSDs inside each host as a read cache. By moving the cache into the host, performance increases with every host we add. Decoupling the cache from the storage layer also means we’re not queuing up reads at a storage array that is trying to satisfy the requests of every connected host over the same switches. While this sounds very similar to technologies we’ve seen before (Infinio and PernixData come to mind), the differentiator is the storage awareness.

The Datrium DVX solution uses its own storage nodes for the persistent storage layer. With the caching and storage tiers fully aware of each other, Datrium is able to offer end-to-end encryption from the hypervisor down to persistent storage while still taking advantage of deduplication and compression. Oftentimes encrypting data at the storage array means giving up these data efficiencies, but not in the case of Datrium. We get an additional level of data security without having to make any compromises.

No Knobs, No Problems

HCI vendors have really pushed the configuration options within their systems. Customers can choose which data is deduplicated and compressed, whether or not it should be encrypted, how many copies of their data should be kept, and whether erasure coding is a better choice than traditional RAID, just to name a few. This is where Datrium separates itself from its HCI competitors. By disaggregating compute nodes from the persistent storage layer, Datrium’s DVX system manages to deliver performance and features without penalty. Once again, no compromises.

Erasure coding, dedupe and compression, double-device failure protection, data encryption: every one of these features is always on and requires no separate licensing or configuration. The advantage here isn’t just reduced administrative overhead, but also performance. Datrium’s published performance numbers are measured with every one of these features enabled. No tricks. No gimmicks. What you see is what you get, unlike many competitors that hide behind unrealistic configurations with many of these features disabled.

3 Tiers, 1 Solution

Datrium aims to bring together a Tier 1, HCI-like solution with scale-out backup storage and cloud-based DR all in the same system. With integrated snapshots that utilize VMware snapshots as well as VSS integration, they are able to take crash-consistent and application-consistent snapshots of virtual machines right on the box. This, of course, is table stakes for modern storage arrays. The differentiator is that Datrium does this at the VM level despite presenting NFS to the virtual hosts. Now we’re not just backing up all the VMs that live in a LUN or volume; we can get as granular as the virtual disk itself. No VVols required.
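To make the crash-consistent versus application-consistent distinction concrete, here’s a minimal sketch using plain PowerCLI. This isn’t Datrium’s interface, just the underlying VMware snapshot behavior it builds on, and the vCenter and VM names are placeholders:

```powershell
# Connect to vCenter (placeholder server name)
Connect-VIServer -Server vcenter.lab.local

# Crash-consistent: a point-in-time snapshot with no guest quiescing
New-Snapshot -VM "DR-Test01" -Name "crash-consistent"

# Application-consistent: -Quiesce asks VMware Tools (VSS on Windows) to
# flush in-flight writes before the snapshot is taken
New-Snapshot -VM "DR-Test01" -Name "app-consistent" -Quiesce
```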

Adding another level of visibility into the mix, Datrium reports latency at the individual virtual machine level instead of at the storage array. Traditional storage array vendors talk about their ultra-low latency, but that reported latency is what the array sees; it doesn’t account for the latency imposed by the virtual hosts and switching infrastructure. With each component in the virtual infrastructure having its own queues, utilization, and available bandwidth, the latency a virtual machine experiences is much greater than what the array reports. Datrium offers this full visibility at the individual virtual machine level so you know how your environment is actually performing. Dr. Traylor from The Math Citadel has an excellent overview of queuing theory, Little’s Law, and the math behind it.
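For the curious, the underlying relationship is Little’s Law, and a rough sketch of how it applies here (treating each hop as its own queue) looks like this:

```latex
% Little's Law: average requests in a system = arrival rate x average time spent in it
L = \lambda W

% The latency a VM experiences is roughly the sum of the waits at every
% queue along the path, not just the array's own service time:
W_{\mathrm{VM}} \approx W_{\mathrm{host}} + W_{\mathrm{switch}} + W_{\mathrm{array}}
```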

The cloud-based integration also allows for an additional level of data availability. Instead of requiring additional backup software, Datrium allows you to replicate your data to a DVX running in the cloud. Now you have an offsite copy of your data ready to be restored in the event of VM corruption or deletion. Replication is also dedupe-aware, meaning data isn’t sent to the cloud if it is already present there, which helps minimize bandwidth requirements and speeds up the replication process.
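As a rough, hypothetical illustration (the numbers here are mine, not Datrium’s): if a replication cycle produces 100 GB of changed data, 60% of those blocks already exist at the target, and the remainder reduces 2:1 with compression, only about 20 GB actually crosses the WAN:

```latex
100\,\mathrm{GB} \times (1 - 0.60) \times \tfrac{1}{2} = 20\,\mathrm{GB}
```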

Cloudy Skies Ahead

While I am very reluctant to trust one solution with both my primary and backup data, in certain situations I can see the advantages. Integration with AWS allows virtual machines to be restored from the cloud-based DVX, which means your DR site can now live in AWS. With the features they’ve included in the DVX platform, Datrium has lowered the barrier to the cloud for a lot of customers.

Datrium continues to make a good product even better. The additional features available in version 4.0 of DVX make this not only a great fit for SMB customers, but for enterprises as well. A feature-rich, no-knobs approach to enterprise storage with backup and DR capabilities all rolled into one. Datrium is definitely worth a look.

________________________________________

Disclaimer: During Storage Field Day 15, my expenses (flight, hotel, transportation) were paid for by Gestalt IT. I am under no obligation by Gestalt IT or Datrium to write about any of the presented content nor am I compensated for such writing.


Tegile Array Replication and Restore

These days most of my replication is handled at the VM level by software designed for virtualization. While that covers most of my environment, I still have a few non-virtualized workloads running on shared storage that need to be replicated in the event of a disaster at my primary location. This process was never too complex back in my NetApp days, and as I continue exploring the Tegile I’m happy to say it’s just as easy through the GUI.

Documenting this process for my non-virtual workloads would be a little difficult, so I’ve decided to document it using an NFS datastore containing a few virtual machines. The first half of this guide covers setting up the replication relationship and replicating the data. The second half covers the process of actually restoring that data and making it usable at your DR site.

 

1. Log in to the web interface of the Tegile that is the replication source
2. Click on “Settings” then “App-Aware”
tegiledr111214-step2
3. Click on “Zebi Replication” on the left column
tegiledr111214-step3
4. Under the tab “Replication Target” click the “Add” button (This is adding the DR Tegile as the target array)
tegiledr111214-step4
5. Enter the name or IP of the array (the shared Management IP address) and the username/password (Optionally you can specify a port range for replication which we won’t be doing for this documentation) and click “Add”
tegiledr111214-step5
6. Once it has been successfully added it will appear in the “Replication Target” list
tegiledr111214-step6
7. Log in to the web interface of the DR target Tegile, click on “Settings” then “App-Aware”, choose “Zebi Replication” on the left column and then click on the “Replication Source” tab. You should see your other array listed here (The IP addresses will be the “management” IPs of each controller, not the shared management IP for both arrays)
tegiledr111214-step7
8. Back on the Primary Tegile (Replication source) click on “Data”
tegiledr111214-step8
9. Click on the disk pool, then the project that will be replicated
tegiledr111214-step9
10. For this documentation I’ve created a Project named “NFS_Replication” with a volume named DR_Windows with 4 VMs inside. Click on the project that will be replicated and click on the “Edit” button
tegiledr111214-step10
11. Click on “Replication” on the left column
tegiledr111214-step11
12. Click the “Add Replication” button
tegiledr111214-step12

a. Select the Target System and click “Next”
tegiledr111214-step12a
b. Select the “Target Pool” and enter a name for the “Replication Project”. Click “Next”
tegiledr111214-step12b
c. Choose which options are required and which volumes will be replicated (This test only has one volume, DR_Windows, but you can include or exclude any volumes that exist in this project. We’ll choose quiesce, which performs a VMware snapshot to put the OS in a consistent state). Click “Next”
tegiledr111214-step12c
d. Choose your schedule (manual or automatic), frequency, and how many additional snapshots (restore points) will be saved on the target array. For this example we’ll do daily replication that happens at 10:49 am and we’ll keep 14 snapshots. Click “Finish”
tegiledr111214-step12d
e. Once it’s all setup, you’ll see your target array, the target pool, and the target project
tegiledr111214-step12e

13. I have 4 VMs in that datastore (DR-Test01-04). Once the time hits, we can see that snapshots are taken, then removed, for each of the VMs in that datastore.
tegiledr111214-step13
14. On the DR target array, we can see we now have snapshots available for this project. (The reason there are 2 is because I initiated a manual replication sync for testing first)
tegiledr111214-step14

a. To manually kick off a replica snapshot, on the source array, find the project, click on “Replication” and then click the “Play” button that says “Replicate”
tegiledr111214-step14a

 

That is how simple it is to set up replication. Now let’s imagine we need to spin up the replicated VMs in this volume. Here is how we do that.

 

1. On the DR target array, click on Data, select the pool, then click on “Replica (1)” to view the replica project
tegilerest111214-step1
2. Click the “Edit” button for the NFS volume
tegilerest111214-step2
3. Click on “Snapshots” and find the snapshot you want to bring live (We’ll choose the latest version). Click the “Clone” button
tegilerest111214-step3

a. Cloning the snapshot allows us to create a new project and NFS volume from this snapshot and spin up these VMs in DR. Because it’s a clone, replication can keep running, which is useful if you’re testing the process rather than dealing with an actual DR event.

4. Enter a name for the new Project (DR_NFS_Replication for this writing) and a name for the mount point (/export/DR_NFS_Replication for this writing) and click “Clone”
tegilerest111214-step4
5. If successful, you’ll receive this message about the new project being created. Click “OK”
tegilerest111214-step5
6. Close the window for “Share Configuration” and click on “Local (1)” under “Projects”
tegilerest111214-step6
7. Click on the “DR_NFS_Replication” project then view the Mountpoint of the Share (/export/DR_NFS_Replication/DR_Windows). Note the “c” before the share name, which denotes it was cloned from another project
tegilerest111214-step7
8. Click the “Edit” button for the project and then click on “Sharing”
tegilerest111214-step8
9. This is where you will add the IP addresses or range of IPs that need read/write and root access to the shares in this project. The IP addresses/ranges will carry over from the source array. Our IP range is the same in DR as our lab so we’ll leave this alone.
tegilerest111214-step9
10. Connect to your DR vCenter server or ESXi hosts. Click on the host, then “Configuration”, then “Storage”
tegilerest111214-step10
11. Click “Add Storage” towards the top right
tegilerest111214-step11
12. Choose “Network File System” and click “Next”
tegilerest111214-step12
13. Enter the NFS IP address of the DR Tegile, enter the folder path (/export/DR_NFS_Replication/DR_Windows) and then enter the name of the Datastore (DR_Windows). Click “Next”
tegilerest111214-step13
14. Review the summary info and click “Finish”
tegilerest111214-step14
15. Repeat for each host that needs access to this datastore. Afterwards, right click the datastore and click “Browse Datastore”
tegilerest111214-step15
16. Inside you’ll see the 4 VMs that were located here before. Open each folder, right-click the VM’s .vmx file and choose “Add to inventory”
tegilerest111214-step16
17. Enter the name and location for the VM and click “Next”
tegilerest111214-step17
18. Choose the Cluster or host and click “Next”
tegilerest111214-step18
19. Review the settings and click “Finish.” Repeat for each VM that needs to be added.
tegilerest111214-step19
20. Power on all the VMs and now you can run any validation tests or bring these VMs live in a DR event
tegilerest111214-step20

 

Obviously, the process of mounting the datastore in your DR vCenter Server and re-adding the VMs one by one would be time-consuming and tedious. When developing your DR plans, scripting this process ahead of time (easy enough in something like PowerCLI) on the vCenter side of things would ease that burden. From the standpoint of the Tegile, the process is fairly intuitive and simple to set up. One of the things I love is that, by default, the data you bring live at the DR site is a clone, so replication continues running without being affected.
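As a starting point, a rough PowerCLI sketch of that vCenter-side runbook might look like the following. Treat it as an outline rather than a finished script; the vCenter and cluster names, the NFS IP, and the paths are either the placeholders used in this walkthrough or assumptions you’d swap for your own environment.

```powershell
# Rough DR runbook sketch - names and IPs below are placeholders/assumptions
Connect-VIServer -Server drvcenter.lab.local

$nfsHost = "192.168.1.50"                               # NFS IP of the DR Tegile (assumption)
$nfsPath = "/export/DR_NFS_Replication/DR_Windows"      # mount point from the clone above
$dsName  = "DR_Windows"
$vmHosts = Get-Cluster -Name "DR-Cluster" | Get-VMHost  # placeholder cluster name

# Mount the cloned NFS export on every host in the DR cluster
foreach ($vmHost in $vmHosts) {
    New-Datastore -VMHost $vmHost -Nfs -NfsHost $nfsHost -Path $nfsPath -Name $dsName
}

# Register each replicated VM from its .vmx file and power it on
$vmNames = "DR-Test01", "DR-Test02", "DR-Test03", "DR-Test04"
foreach ($name in $vmNames) {
    $vm = New-VM -VMFilePath "[$dsName] $name/$name.vmx" -VMHost ($vmHosts | Get-Random)
    Start-VM -VM $vm -Confirm:$false
}
```

You’d still want error handling, portgroup remapping, and the export permissions on the Tegile itself, but even this much removes the tedious clicking from an actual DR event.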
