Migrating to Cloud and Running Hybrid: Part 3 - Guest OS & Replication
If the environment being moved to a new platform is not VMware-based, or if vVols are not an option for some reason, then we move to the next layer down and look at performing data migrations from within the operating system. This is done by enabling & configuring iSCSI within Windows or Linux, creating a host object on the FlashArray with the IQN of the iSCSI initiator, and then mapping a volume to this new host object on the FlashArray. Once the device is visible within the operating system, the raw device should be formatted with the appropriate file system for the intended usage, and this newly formatted device can then be used for data migration. At this point, we need to discuss a few options and considerations which differ between Windows and Linux, along with the standing warning that you should always have proper planning and backups in place prior to any data conversions or migrations.
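For the Linux side of that setup, a rough sketch of the in-guest iSCSI flow with open-iscsi looks like the following. The portal IP, mount point, and multipath device name are placeholders – yours will depend on your array addresses and multipath configuration:

```shell
# Show our initiator IQN (use this to create the host object on the FlashArray)
cat /etc/iscsi/initiatorname.iscsi

# Discover iSCSI targets on the array's portal address (placeholder IP)
sudo iscsiadm -m discovery -t sendtargets -p 192.0.2.10:3260

# Log in to the discovered target(s)
sudo iscsiadm -m node --login

# After mapping a volume on the array: rescan, format, and mount
sudo iscsiadm -m session --rescan
sudo mkfs.xfs /dev/mapper/mpatha   # device name depends on multipath aliases
sudo mount /dev/mapper/mpatha /mnt/newvol
```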
Within Windows, copying data is most likely going to be performed by robocopy, which can copy large filesystems with options for recursion, permissions, and selection of specific properties to copy from source to destination. If you have ever had to clone or migrate any significant amount of data, you are most likely no stranger to robocopy. If you are new to Windows migrations, however, learning all of the robocopy options for bulk copy operations in the console is a lot to take in and takes some practice.
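As a starting point, a typical robocopy invocation for this kind of migration might look like the one below. The drive letters and log path are examples only – adjust the flags to your needs and test against a small directory first:

```shell
:: Mirror D:\Data to the new volume, including security info, owner, and
:: auditing metadata (/COPY:DATSOU); /MT runs multithreaded copies, and the
:: retry/wait values are tuned down from the very patient defaults
robocopy D:\Data E:\Data /MIR /COPY:DATSOU /MT:16 /R:1 /W:1 /LOG:C:\Temp\copy.log
```

Note that /MIR deletes files at the destination that no longer exist at the source, so be certain of your source and destination order before running it.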
Sometimes people are looking for a simpler GUI option, even if it may take a bit longer to process the copying of data itself, as a tradeoff for ease of use. There were some older tools which gained popularity in the past that you can still find links for (Robocopy GUI or RichCopy), there are a decent number of open source tools that have come along (EasyCopy), and even some tools with free/paid versions which offer some great, easy functionality (TeraCopy). I use TeraCopy personally, as I bought a pro license for a few bucks forever ago, and it’s still valid with the upgrades nearly 10 years later.
Beyond the actual data copying itself, you should also understand that this method of cloning data on Windows works for both entire disks and individual directories. This is especially useful if you want to move data off external storage devices without changing your current file-structure layout, or without having to shut down applications just to move data from your OS disk to a new secondary drive letter. This is all possible because Windows can mount a drive as a folder: you create an empty NTFS folder and mount a drive to that folder path from Disk Management or from the command line.
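From the command line, that mount-as-folder trick can be done with the built-in mountvol utility. The folder path and volume GUID below are placeholders – running mountvol with no arguments lists the actual volume names on your system:

```shell
:: List volume GUIDs and current mount points
mountvol

:: Create an empty NTFS folder and mount the new volume onto it
mkdir D:\AppData
mountvol D:\AppData \\?\Volume{xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}\
```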
Now… the reality of moving data within Windows to a new device, whether it is on an external drive currently, or especially when moving data from your current OS disk to a drive mounted to a folder, is that you will probably face some downtime. Depending on your applications and how the system is serving data, this can be stopping & restarting of applications or services, or a full system reboot – it’s too hard to give general advice or recommendations for anything more specific than to say that you need to plan for some level of outage to perform a final cutover.
When we look at Linux, things might be easier depending on how the existing system is configured. I will start this section by admitting that I am far from a Linux guru, but I can Google and follow documentation with the best of ’em, then Google again to figure out what isn’t working. With that caveat, we’ll cover this at a higher level. If logical volumes are in use on the system, the Logical Volume Manager (LVM) has functionality and commands to handle creating volume groups, creating mirrored logical volumes, and adding/removing disks to/from a mirror. Before you begin any of this process, it is highly recommended that you understand and confirm your multipathing configuration and any aliases in use on your system and its devices, so you don’t confuse yourself or work with the wrong device at the wrong time. If you have your devices online and identified, the rough steps for this process are these:
- Use ‘pvcreate’ to create a new physical volume (PV) from the new block device
- Use ‘vgextend’ to add the new PV to the existing volume group (VG)
- Use ‘lvconvert -m 1’ to add the new disk as a mirror to the existing logical volume (LV)
- Wait for the operation of the converting (mirroring) of the data to be completed
- CONFIRM that the disks are working as mirrors (essentially RAID1 logic)
- Use ‘lvconvert -m 0’ to remove the original disk from the mirror within the LV
- Use ‘vgreduce’ to remove the original PV from the VG
- Use ‘pvremove’ to remove the original block device as a PV
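The steps above can be sketched as a command sequence. All of the names here are hypothetical – a VG called datavg, an LV called datalv, the original device at /dev/sdb1, and the new FlashArray device at /dev/sdc1 – so substitute your own before touching anything:

```shell
pvcreate /dev/sdc1                       # initialize the new device as a PV
vgextend datavg /dev/sdc1                # add the new PV to the existing VG
lvconvert -m 1 datavg/datalv /dev/sdc1   # mirror the LV onto the new PV
lvs -a -o +devices,copy_percent datavg   # watch until the sync hits 100%
lvconvert -m 0 datavg/datalv /dev/sdb1   # drop the original leg from the mirror
vgreduce datavg /dev/sdb1                # remove the old PV from the VG
pvremove /dev/sdb1                       # wipe the PV label from the old device
```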
Seems easy enough, right? Honestly, once you understand the constructs for each level of this process, it’s just understanding the syntax for your specific distro and version, and following the man pages or reading the online documentation.
Now, if you want an alternative for performing these migrations in an easier fashion across a whole fleet of systems that need data moved from one platform to another, it may be worth considering a tool that makes this process more manageable. When these engagements go to our professional services team, or when customers ask for advice around how to do this at a much larger scale, we typically suggest CirrusData. (NOTE: This isn’t a plug or advertisement for the product, I just know that it works and does this well.)
Data migration tools exist all throughout the market to help perform these lower-level data migrations at a volume level, typically via local agents managed at a fleet level, including some that are managed via cloud portals. If you want to look at more tools capable of doing these functions, I would recommend doing a quick search online or talking to your partner/vendor of choice to see what they recommend based on your requirements.
In the end, we are just trying to get your data disentangled from your operating system so that it can be handled in an easier fashion – which brings us to replication. After the wall of text above on how to move your data to separate volumes, we get to the section where we discuss how to copy all of that data around in bulk at the array level. Since we are discussing migration to the cloud, potentially back on-prem, or a hybrid model of running in both, we are focusing on asynchronous replication.
The good and the bad of discussing replication in regard to Pure Storage is that it is ridiculously easy as far as the core functionality and initial setup. We can demonstrate setting up replication between two arrays within a few minutes – like we’ve literally done live demos of this in lab environments in less than 5-10 minutes depending on who is talking (or rambling, in my case) while we perform the setup. For any two FlashArrays, or a FlashArray and a Cloud Block Store instance, we need network connectivity between the arrays; then we simply copy our connection key and the management & replication addresses of the source array, and enter these details into the destination array (we’ve already said async replication is our type for this overview).
Now, this only gets our arrays connected, but from here it is not difficult to finish the configuration. We will still want to consider things like bandwidth throttling, but all we need otherwise is a protection group (pgroup) created on our source array, with our second array added as a replication target. We need to specify the snapshot and replication schedules that meet our needs for replicating data between our arrays, though local snapshots are not required if we are configuring this for replication only. From here, we simply need to add members to our pgroup, which could be hosts or host groups, but as we are talking about migrating data to/from the cloud, we are really talking about the volumes we extricated our data to with the methods highlighted earlier in this now very long post. Once we add our volumes as members, replication will begin between our arrays based on the schedules, or a snapshot can be taken manually with the ‘Replicate Now’ option and our data replication will be underway.
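On the CLI side, the same flow looks roughly like this. The array addresses, pgroup name, and volume names are all placeholders, and I’m writing these flags from memory – syntax can vary by Purity version, so treat this as a sketch and check ‘purearray connect’ and ‘purepgroup’ help output on your own arrays first:

```shell
# On the destination array: connect to the source array (you'll supply the
# connection key copied from the source array)
purearray connect --management-address source-array.example.com --type async-replication

# On the source array: create a pgroup replicating to the connected array
purepgroup create --targetlist target-array demo-pgroup

# Add the volumes we migrated our data onto as pgroup members
purepgroup setattr --vollist appdata-vol1,appdata-vol2 demo-pgroup

# Kick off an immediate replication rather than waiting on the schedule
purepgroup snap --replicate-now demo-pgroup
```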
OK, that’s a lot of discussion for a high-level overview of how we can separate our data and replicate it to and from the cloud in a reasonable fashion. In the next few blog posts, we will take a look at how we migrate the virtual machines themselves to AWS or Azure. We’ll see you soon for that.