    Overview

    About this document

    This document provides information you can use to configure and manage replication on your Unity storage system. Along with relevant concepts and instructions to configure replication using the Unisphere GUI, this document also includes information on the CLI commands associated with configuring replication.

    Note:  For more information on other Unisphere features or CLI commands, refer to the Unisphere online help and CLI User Guide.

    Additional resources

    As part of an improvement effort, revisions of the software and hardware are periodically released. Therefore, some functions described in this document might not be supported by all versions of the software or hardware currently in use. The product release notes provide the most up-to-date information on product features. Contact your technical support professional if a product does not function properly or does not function as described in this document.
    Where to get help

    Support, product, and licensing information can be obtained as follows:

    For product and feature documentation or release notes, go to Unity Technical Documentation at: www.emc.com/en-us/documentation/unity-family.htm.

    For information about products, software updates, licensing, and service, go to Online Support (registration required) at: https://Support.EMC.com. After logging in, locate the appropriate Support by Product page.

    For technical support and service requests, go to Online Support at: https://Support.EMC.com. After logging in, locate Create a service request. To open a service request, you must have a valid support agreement. Contact your Sales Representative for details about obtaining a valid support agreement or to answer any questions about your account.

    Special notice conventions used in this document
    DANGER  Indicates a hazardous situation which, if not avoided, will result in death or serious injury.
    WARNING  Indicates a hazardous situation which, if not avoided, could result in death or serious injury.
    CAUTION  Indicates a hazardous situation which, if not avoided, could result in minor or moderate injury.
    NOTICE  Addresses practices not related to personal injury.
    Note:  Presents information that is important, but not hazard-related.

    About replication

    Data replication is one of the many data protection methodologies that enable your data center to avoid disruptions in business operations. It is a process in which storage data is duplicated to a remote or local system. It provides an enhanced level of redundancy in case the main storage backup system fails. It minimizes the downtime-associated costs of a system failure and simplifies the recovery process from a natural disaster or human error.

    The system supports asynchronous and synchronous replication of all storage resources, including file systems, NAS servers, LUNs, LUN groups (also known as consistency groups (CGs)), VMware VMFS datastores, VMware NFS datastores, and thin clones. The asynchronous replication feature leverages the Unified Snapshots technology to produce a read-only, point-in-time copy of source storage data and periodically updates the copy to keep it consistent with the source data. It leverages crash consistent replicas to provide remote data protection of storage resources. The synchronous replication feature leverages the MirrorView/Synchronous technology to mirror data in real time between local and remote storage resources.

    Note:  In general, Unity OE versions 4.0 or later support replication interoperability. The exception is when the source system is configured with features that are not compatible with an earlier Unity OE version running on the destination side of the replication session. For example:
    • Inline compression is only supported for block objects with OE versions 4.1 or later, while file objects are supported with OE versions 4.2 or later.
    • Block objects support asynchronous and synchronous replication with all Unity OE versions, while file supports only asynchronous replication with OE versions 4.0 or later, and synchronous replication with OE versions 4.4 or later.
    • Block and file objects support asynchronous replication of user snapshots with OE versions 4.2 or later, while only file supports synchronous replication of snapshots with OE versions 4.4 or later.
    • Advanced replication topologies, that is, fan-out or 1-N replication and cascading replication for asynchronous replication, are only supported with OE versions 5.x and only for file objects.
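    The version rules in the note above can be expressed as a small lookup table. The following Python sketch is illustrative only. The feature keys and function names are hypothetical and are not part of Unisphere, UEMCLI, or the REST API. It also assumes that "all Unity OE versions" means 4.0 or later and that "5.x" means 5.0 or later.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, int]

# Minimum OE version per feature and object type, transcribed from the note.
# None means the note lists no support for that object type.
# Assumption: "all Unity OE versions" is modeled as (4, 0) and "5.x" as (5, 0).
MIN_OE: Dict[str, Dict[str, Optional[Version]]] = {
    "inline_compression": {"block": (4, 1), "file": (4, 2)},
    "async_replication": {"block": (4, 0), "file": (4, 0)},
    "sync_replication": {"block": (4, 0), "file": (4, 4)},
    "async_snapshot_replication": {"block": (4, 2), "file": (4, 2)},
    "sync_snapshot_replication": {"block": None, "file": (4, 4)},
    "advanced_topologies": {"block": None, "file": (5, 0)},
}


def parse_oe(version: str) -> Version:
    """Turn a string such as '4.2.1' or '5' into a (major, minor) tuple."""
    parts = (version.split(".") + ["0"])[:2]
    return int(parts[0]), int(parts[1])


def is_supported(feature: str, object_type: str,
                 source_oe: str, destination_oe: str) -> bool:
    """Return True only if BOTH systems meet the minimum OE version."""
    minimum = MIN_OE[feature][object_type]
    if minimum is None:
        return False
    # The older of the two systems decides whether the feature can be used.
    return min(parse_oe(source_oe), parse_oe(destination_oe)) >= minimum
```

    For example, is_supported("sync_replication", "file", "4.4", "4.2") evaluates to False, because the destination runs an OE version earlier than 4.4.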
    NOTICE  Do not disable the automatic failback policy setting. Otherwise, the following issues may occur:
    • The operations to pause or delete a synchronous replication session may not complete and may appear to make no progress.
    • If an SP reboots, synchronous replication sessions may not be recovered to a synchronization state of Consistent or In Sync. Instead, they may remain in a synchronization state of Out of Sync.
    In Unisphere under Settings, ensure the checkbox Management > Automatic failback policy is selected or the Automatic failback system general attribute in CLI is set to on.
    Replication modes

    Replication for both block and file storage can operate in the following modes:

    • Asynchronous—Use this mode when you want the data between the source and destination storage resources synchronized automatically at a specific interval, based on the Recovery Point Objective (RPO).
    • Synchronous—Use this mode when you want the data between the source and destination storage resources to always remain in sync.
    • Manual—Use this mode when you want to manually synchronize changes in the source storage resource to the destination storage resource. When you choose this mode, ensure that you periodically synchronize the session to avoid excessive pool space consumption.
    Recovery Point Objective

    Recovery Point Objective (RPO) is an industry-accepted term that indicates the acceptable amount of data, measured in units of time, that may be lost in a failure. When you set up an asynchronous replication session, you can configure automatic synchronization based on the RPO. You can specify an RPO from a minimum of 5 minutes up to a maximum of 1440 minutes (24 hours). The default RPO is 60 minutes (1 hour). For synchronous replication, the RPO is set to 0. You can use the Unisphere CLI or Unisphere Management REST API to specify a more granular RPO.

    Note:  Although a smaller time interval provides more protection and less space consumption, it also has a higher performance impact and results in more network traffic. A higher RPO value may result in more space consumption, which may affect the snapshot schedules and space thresholds.
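    The RPO limits described above can be checked before a session is configured. The following Python sketch is a minimal illustration under stated assumptions: the function name and mode strings are hypothetical, and only the numeric limits (5 to 1440 minutes, default 60, and 0 for synchronous replication) come from this document.

```python
from typing import Optional

MIN_RPO_MINUTES = 5        # minimum RPO for asynchronous replication
MAX_RPO_MINUTES = 1440     # maximum RPO (24 hours)
DEFAULT_RPO_MINUTES = 60   # default RPO (1 hour)


def resolve_rpo(mode: str, rpo_minutes: Optional[int] = None) -> int:
    """Return the effective RPO in minutes for a replication mode.

    Hypothetical helper: 'asynchronous' and 'synchronous' are illustrative
    mode names, not CLI or REST values.
    """
    if mode == "synchronous":
        return 0  # synchronous replication always has an RPO of 0
    if mode != "asynchronous":
        raise ValueError(f"no automatic RPO applies to mode {mode!r}")
    if rpo_minutes is None:
        return DEFAULT_RPO_MINUTES
    if not MIN_RPO_MINUTES <= rpo_minutes <= MAX_RPO_MINUTES:
        raise ValueError(
            f"RPO must be between {MIN_RPO_MINUTES} and "
            f"{MAX_RPO_MINUTES} minutes, got {rpo_minutes}"
        )
    return rpo_minutes
```

    Calling resolve_rpo("asynchronous") without a value returns the default of 60 minutes, while a value outside the supported range raises a ValueError.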
    Source and destination storage resources

    In Unisphere, for all replicated storage resources except for thin clones, once replication is configured, the destination storage resource is automatically created. In CLI, you must manually create the destination storage resource and then create the replication session between the source and destination storage resources.

    NOTICE  For file synchronous replication, the model-to-model source and destination replication pair allowed is as follows:
    • Unity 300(F) to a Unity 300(F)
    • Unity 350F, 380(F), 400(F), or 500(F) to a Unity 350F, 380(F), 400(F), or 500(F)
    • Unity 450F, 480(F), 550F, 600(F), 650F, 680(F), or 880(F) to a Unity 450F, 480(F), 550F, 600(F), 650F, 680(F), or 880(F)
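    The pairing rule above amounts to a group membership check: the source and destination models must belong to the same group. The following Python sketch illustrates this under the assumption that a model written as 300(F) stands for both 300 and 300F. The function and constant names are hypothetical.

```python
# Model groups for file synchronous replication, transcribed from the list
# above. Assumption: "300(F)" denotes both the 300 and the 300F models.
MODEL_GROUPS = [
    {"300", "300F"},
    {"350F", "380", "380F", "400", "400F", "500", "500F"},
    {"450F", "480", "480F", "550F", "600", "600F",
     "650F", "680", "680F", "880", "880F"},
]


def normalize(model: str) -> str:
    """Strip an optional 'Unity' prefix and normalize case."""
    name = model.strip().upper()
    if name.startswith("UNITY"):
        name = name[len("UNITY"):].strip()
    return name


def can_pair_for_file_sync(source_model: str, destination_model: str) -> bool:
    """Return True if both models fall in the same allowed group."""
    src, dst = normalize(source_model), normalize(destination_model)
    return any(src in group and dst in group for group in MODEL_GROUPS)
```

    With these groups, a Unity 480F can pair with a Unity 880, but a Unity 300 cannot pair with a Unity 400.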

    You can convert a thin LUN to a non-thin (thick) LUN, or a thick LUN to a thin LUN, with a LUN move operation. Enabling data reduction on a thin LUN requires an All-Flash pool on the destination system. For thick file systems, the replication process matches the destination storage resource to the source. In this case, thin and data reduction cannot be selected for file systems. For thin file systems, the following rules apply for replication:

    • If the source file system is thin, then the destination file system is also thin.
    • If the source file system is thick, then the destination file system is also thick.
    • If both the source and destination systems support data reduction, then the source or destination can have data reduction enabled or not.
    • However, if you change the data reduction attribute of the source file system, the change is not replicated to the destination system, which retains its original setting, regardless of whether asynchronous or synchronous replication is used.

    The following file asynchronous replication rules also apply to synchronous replication:

    • The type of NAS server is set by the user during server creation.
    • Users can change the type of a NAS server when there are no replication sessions on any of the file systems using that NAS server.
    • In destination mode, only the local configuration is enabled and all attributes in the local configuration are active.
      Note:  A storage resource in destination mode can be or is used as a replication destination and data access is restricted. For asynchronous replication, the storage resource is read-only. For synchronous replication, the storage resource is not accessible. When replication is active, the storage resource in the destination site is in destination mode. When replication is failed over, the destination storage resource becomes the source while the original source storage resource is set to destination mode.
    • Override configuration specifies a set of attributes which are enabled when the NAS server is changed from destination mode to source. The override configuration of a NAS server is not replicated as part of NAS server replication.
    • Local configuration (also known as backup) specifies a set of attributes that are related to enabling backup or local test through NFS or NDMP protocols. The local configuration of a NAS server is not replicated as part of NAS server replication.

    Table 1 classifies the Global, Override, and Local attributes for NAS server. Yes means the attribute exists in the configuration, and No means the attribute does not exist in the configuration.

    Table 1. Classification of configuration attributes for NAS server
    Attribute                                                       Global Configuration  Override Configuration  Local Configuration
    Production IP interface                                         Yes                   Yes                     No
    Backup IP interface                                             No                    No                      Yes
    DNS                                                             Yes                   Yes                     Yes
    NIS                                                             Yes                   Yes                     Yes
    CIFS server (name, domain name, NetBIOS name, LDAP org string)  Yes                   No                      No
    NFS export                                                      Yes                   No                      No
    NFS export for Local snapshots                                  No                    No                      Yes
    CAVA                                                            Yes                   No                      No
    NDMP user/password                                              No                    No                      Yes
    ASA user/password                                               Yes                   No                      No

    Table 2 lists file system level mount options that are also saved into the NAS server configuration, and all of these are Global configuration:

    Table 2. File system level mount options saved into NAS server configuration
    UEMCLI Option        RESTful Option               Mount Option
    -cifsNotifyOnWrite   isCIFSNotifyOnWriteEnabled   FS_PROPERTY_NOTIFYONWRITE_SYNC
    -cifsNotifyDirDepth  cifsNotifyOnChangeDirDepth   FS_PROPERTY_TRIGGERLEVEL
    -cifsNotifyOnAccess  isCIFSNotifyOnAccessEnabled  FS_PROPERTY_NOTIFYONACCESS_SYNC
    -cifsOpLocks         isCIFSOpLocksEnabled         FS_PROPERTY_NOOPLOCK
    -cifsSyncWrites      isCIFSSyncWritesEnabled      FS_PROPERTY_NP_CIFSSYNCWRITE
    -accessPolicy        accessPolicy                 FS_PROPERTY_NP_ACCESSPOLICY
    -folderRenamePolicy  folderRenamePolicy           FS_PROPERTY_NP_RENAMEPOLICY
    -lockingPolicy       lockingPolicy                FS_PROPERTY_NP_LOCKINGPOLICY
    -eventProtocols      fileEventSettings            FS_PROPERTY_NP_CEPPPOLICY

    A user can set these properties through Unisphere and the CLI. If a synchronous or asynchronous replication session is created on the file system, these properties will be replicated through the NAS server's replication session to the destination and can be seen on the destination NAS server after the configuration view is refreshed.

    Table 3 lists other file system properties not related to mount options.

    Table 3. File system properties not related to mount options
    UEMCLI Option    RESTful Option        Description
    -id              id                    Not replicated. The source and destination can be different.
    -name            name                  Destination must be the same as the source. Unisphere creates the destination file system with the same value as the source.
    -desc            description           Not replicated. The source and destination can be different.
    -size            size                  Replicated.
    -thin            isThinEnabled         Destination must be the same as the source. Unisphere creates the destination file system with the same value as the source.
    -compression     isCompressionEnabled  Not replicated. In Unisphere, the system shows the source value as the default for the destination, but it can be changed.
    -type            supportedProtocols    Destination must be the same as the source. Unisphere creates the destination file system with the same value as the source.
    -fastvpPolicy    fastVPParameters      Not replicated. In Unisphere, the system shows the source value as the default for the destination, but it can be changed.
    -poolFullPolicy  poolFullPolicy        Not replicated. The source and destination can be different.
    NOTICE  If both the source and the destination file systems are legacy file systems that have been upgraded from OE version 4.2.x and a synchronous replication session has been created on them, the file system setting of minSizeAllocated will not be replicated between them.

    When a failover occurs, for each property that is not replicated, ensure that you modify the attribute of the associated destination storage resource to match the attribute of the source storage resource.

    When a thin clone is replicated, the destination resource is automatically created with the same attributes as the source thin clone, except that the destination resource is a full copy, rather than a thin clone.

    Snapshots

    Block and file objects support asynchronous replication of user snapshots at OE version 4.2 and later, while only file objects support synchronous replication of snapshots at OE version 4.4 and later. Also, asynchronous replication supports the replication of existing user snapshots during the initial replication session configuration, while synchronous replication does not. For synchronous replication, source snapshots will be replicated to the destination only after the replication sessions are established.

    Note:  Snapshot replication from a source system running OE version 4.0, 4.1, 4.2, 4.3, or 4.4 to a destination system running OE version 5.x requires upgrading the source system to OE version 4.5 first. For LUN or file system replication from OE version 4.0, 4.1, 4.2, 4.3, or 4.4 to OE version 5.x without any snapshot replication, upgrading to OE version 4.5 is not required but is recommended.

    Asynchronous replication supports the replication of read-only user snapshots to either a local or a remote site along with the resource data. Both scheduled snapshots and user created snapshots can be replicated. Snapshots are supported for all resources that support asynchronous replication (that is, file system, LUN, LUN group (also known as a consistency group (CG)), VMware VMFS, and VMware VMNFS).

    Note:  User snapshots do not apply to the NAS server resource type.

    Asynchronous replication of scheduled snapshots can be enabled during session creation, or enabled or disabled at any time in the lifetime of the replication session. User snapshots can be replicated with a remote retention policy that is different than that of the source.

    To support asynchronous replication of snapshots, both the source and destination systems must be running Unity OE version 4.2 or later. Snapshot replication can be enabled on an existing Unity OE version 4.0.x- or OE version 4.1.x-based session after both the production and the remote systems have been upgraded to Unity OE version 4.2 or later. Only read-only snapshots are eligible for replication, and they can only be replicated to the disaster recovery site where the replication destination storage resource is located. Any snapshots that are writable, such as attached block snapshots or file snapshots with shares or exports, are not replicated.

    Snapshots that exist prior to asynchronous replication session creation can be selected for replication during replication session creation. Snapshots that are older than the last sync (RPO) time can be manually selected for replication and included in the next RPO sync.

    A user snapshot can have one of the following asynchronous (async) replication state attributes:

    • Not marked for replication (No): the snapshot is not marked for replication
    • Pending sync (Pending): the snapshot is marked for replication but is awaiting transfer
    • Replicated (Yes): the snapshot has successfully transferred to the disaster recovery resource
    • Failed to replicate (Failed): the snapshot failed to replicate
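    When processing snapshot listings in a script, the four states above map naturally onto an enumeration. The following Python sketch is illustrative only. The class, member, and function names are hypothetical, and only the short display values (No, Pending, Yes, Failed) come from the list above.

```python
from enum import Enum


class AsyncSnapshotReplicationState(Enum):
    """Asynchronous replication state of a user snapshot.

    The values mirror the short forms shown in the list above; the
    member names are hypothetical.
    """
    NOT_MARKED = "No"      # snapshot is not marked for replication
    PENDING = "Pending"    # marked for replication, awaiting transfer
    REPLICATED = "Yes"     # transferred to the disaster recovery resource
    FAILED = "Failed"      # replication attempt failed


def is_on_destination(state: AsyncSnapshotReplicationState) -> bool:
    """Only a Replicated snapshot is known to exist on the destination."""
    return state is AsyncSnapshotReplicationState.REPLICATED
```

    A display value can be converted with AsyncSnapshotReplicationState("Pending"), which raises a ValueError for any value outside the four listed states.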

    When the operational status of a synchronous replication session is Active, checkpoint snapshots that are created on the source array (either manually through Unisphere, CLI, RESTful command, or by a snapshot schedule) are synchronously created on the destination system. The data is consistent between the source and destination snapshots with the destination snapshot having the same content, name, description and retention policies as the source snapshot. If a checkpoint snapshot create operation fails on either the source or destination system due to any reason (such as out of space) when a synchronous replication session is Active, the snapshot is deleted on the other system.

    You can associate a snapshot schedule to any file system; however, if the file system is under synchronous replication, an association with a snapshot schedule is allowed only on the source side. The association will be propagated to the destination side according to the following rules:

    • Association of the source file system with cluster schedule (synchronously replicated Schedule00) makes the destination file system associated with the same cluster schedule.
    • De-association of the source file system from the cluster schedule (change to local schedule or no-schedule) de-associates the destination file system from any schedule.
    • Change from one local schedule or no-schedule to another local schedule on the source file system does not affect the association on the destination file system.
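    The three propagation rules above can be summarized as a single decision function. The following Python sketch is a model of the rules, not of any Unity interface. The Schedule type, the UNCHANGED marker, and the function name are hypothetical. A change from a local schedule to no schedule is not covered by the rules above, so the sketch assumes it leaves the destination unchanged.

```python
from typing import NamedTuple, Optional


class Schedule(NamedTuple):
    name: str
    is_cluster: bool  # True for a synchronously replicated (cluster) schedule


# Sentinel meaning "the destination association is left as it was".
UNCHANGED = object()


def destination_association(old: Optional[Schedule],
                            new: Optional[Schedule]):
    """Return the effect on the destination file system when the source
    association changes from `old` to `new` (None means no schedule)."""
    if new is not None and new.is_cluster:
        # Rule 1: the destination follows the same cluster schedule.
        return new
    if old is not None and old.is_cluster:
        # Rule 2: leaving a cluster schedule removes any destination schedule.
        return None
    # Rule 3 (and the assumed local-to-none case): destination is unaffected.
    return UNCHANGED
```

    For example, moving the source from a cluster schedule to a local schedule returns None, which means that the destination file system ends up with no schedule.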
    Note:  The destination mode file system is not always on the destination side. For example, when you perform an unplanned failover from the destination site, the destination file system is no longer in destination mode. The snapshot schedule will execute on the destination site. When the source site is up, the source file system will be changed to destination mode automatically. If the network connection between the source and destination is OK, the role will not be switched until the session is failed back or resumed from the destination side.

    If a synchronous replication session is not Active (a condition in which I/O is not synchronously mirrored to the destination system due to a syncing in progress, a connectivity issue, or other system problem), checkpoint snapshots created on the source are marked as not-replicated. It is possible to delete the checkpoint snapshots on the source system but leave them on the destination system. It is also possible to change the retention policy settings or snapshot description on the source system without replicating them to the destination system.

    NOTICE  Restoring a snapshot on a file system under a synchronous replication session is not allowed.

    When the operational status of a synchronous replication session is Active, checkpoint snapshots that are deleted from the source system are synchronously deleted from the destination system. Also, changes that are made to the retention policy or snapshot description on the source system are replicated to the destination system. It is possible to create snapshots on the destination system, but such snapshots are not replicated to the source system.

    Deleting a replicated snapshot, or modifying its retention settings or description, generates a warning on the destination system. Operations on the destination system affect only the local objects and are not reflected on the source system.

    A user snapshot can have one of the following synchronous (sync) replication state attributes:

    Sync Replicated snapshot schedule replication

    Sync Replicated snapshot schedule is synchronized to the peer site while the two sites are connected (source site connected to a synchronous replication destination site). If the two sites are disconnected, you cannot create a new Sync Replicated schedule. If a synchronously replicated schedule is updated, the schedule on the peer is also updated.

    A Sync Replicated schedule can be configured from either site using Unisphere, the CLI, or REST commands. However, the schedule can be associated with a storage resource (file system or VMware NFS datastore) only from the source for each replication session. A change to a Sync Replicated schedule on either site is synchronized to the other and updates the Sync Replicated schedule with the matching name when the two sites are connected through the management interface. If the two sites are disconnected, you cannot modify the Sync Replicated schedule. However, you can associate the production file system with a local schedule.

    Sync Replicated schedule deletion must be synchronized to the peer site. You can delete a Sync Replicated schedule when it is not associated to any resources on any site. If the peer sites are management-fractured, you cannot delete a Sync Replicated schedule. If a system is not participating in a cluster, you can delete the Sync Replicated schedule which has no associations with the file system.

    File-based replication session actions

    On Unity systems running OE version 4.2, the following asynchronous replication actions affect both the NAS server and its associated file systems when run at the NAS server level:

    • Failover
    • Failover-with-sync
    • Failback
    • Pause
    • Resume

    On Unity systems running OE version 4.4, the following synchronous replication actions affect both the NAS server and its associated file systems when run at the NAS server level:

    • Failover
    • Failback
    • Pause
    • Resume
    NOTICE  NDMP backup/restore operations on a synchronous replicated source file system are not preserved after failover.

    If an unplanned failover occurs, check whether the file systems on the original destination system have the expected sizes. For example, in the following scenario, a resize action (either expand or shrink) triggered on the source system could not be applied to the destination system, so the sizes are different:

    1. The source file system is manually resized while communication is disconnected (only the source could be resized but the destination could not be resized in this situation).
    2. An unplanned failover is performed on the destination system before communication is fully recovered. The destination file system size could have an unexpected value after the unplanned failover.

    Check and resize the destination file system to the expected value after the unplanned failover and before executing a resume or failback of the synchronous replication session. This action will help to avoid an unexpected size being updated to the source after a failback or resume operation, which may cause a potential issue.

    Note:  Choose the failover action according to the replication type and the situation:
    • Graceful/planned asynchronous replication failover: use Failover-with-sync from the source system.
    • Emergency/unplanned synchronous replication failover when there is no network connectivity to the source system: use Failover from the destination system without switches, then Resume to restart the sessions on the destination system. This action also reverses the direction from destination to source. Alternatively, instead of Resume, you can use Failback to go back to the source. When there is network connectivity to the source system, the CLI requires you to use the failover -sync no switch, issued from the destination system, before it allows an unplanned failover.
    • Graceful/planned synchronous replication failover: use Failover from the source system. The sessions remain running and are reversed, replicating from the destination back to the original source system. To move back to the original configuration, run another Failover, this time from the destination system, to gracefully fail back to the original source system. A Resume is not required in this scenario because the sessions remain running.

    Each of these actions triggers a group operation towards the NAS server replication session and its associated file system replication sessions. A NAS server replication as a group is available for local and remote asynchronous replication.

    NOTICE  Do not perform a group operation at both sides of a replication session at the same time. The storage system does not prohibit this action. However, a group operation performed at the same time at both sides of a replication session can cause the group replication session to enter an unhealthy state. Also, failover-with-sync for asynchronous replication is not a transparent operation. During the failover-with-sync process, host read and write requests may be rejected.

    A group replication session operation on a NAS server supports up to 500 file system replication sessions in such a way that those sessions look like one replicated unit. If group operations are conducted on a group session that contains more than 500 file system replication sessions, the group replication session may enter an unhealthy state, along with some file system replication sessions.

    Note:  Although a group replication session looks like one operation, each file system is replicated individually. If any of the individual file system replication sessions fail, you can resolve the issue and then select the individual file system to replicate.

    Those same replication actions towards a file system remain at the file system level. Those actions are still individual operations toward file system replication sessions.

    The following replication actions affect only the NAS server when run at the NAS server level or are still individual operations toward file system replication sessions:

    • Create
    • Sync
    • Delete
    • Modify

    A destination file system changes from read only (RO) mounted to half-mounted when a synchronous replication session is created on it. It changes back to RO mounted when the synchronous replication session is deleted. Any file system functionality that relies on a RO mounted state will not work on the destination file system under a synchronous replication session (for example, disaster recovery access on the destination file system through Proxy NAS Server). In this case for such access, a snapshot should be created on the file system instead.

    Updating the view of the destination NAS server configuration

    While file system data and NAS server configuration are synchronously maintained between the source and destination systems, by default, the view of the NAS server configuration at the destination system from the management interface is only updated automatically every 15 minutes. However, if you need or want to see whether any changes to the NAS server configuration have occurred before the default update runs, you can manually issue an on demand update of the view of the NAS server configuration at the destination system from either Unisphere or the CLI.

    Note:  Since the synchronous replicated configuration file system is unreadable during initial synchronization or synchronizing after a fracture, it is not possible to update the view of the NAS server configuration at the destination system at those times.

    Advanced replication topologies for asynchronous file replication

    Unity supports advanced replication topologies, that is, fan-out (1 to many) replication and cascading (multi-hop) replication for asynchronous file replication only. Fan-out supports a maximum of four asynchronous replication sessions on the same file storage object between two remote systems, including the local system, whether or not the storage object is in destination mode. Cascaded replication replicates to another tier or level from an already replicated resource. Each cascade level can use fan-out replication for up to three additional sites. Each replication session can have an independent Recovery Point Objective (RPO).

    The existing replication operations are supported with some restrictions:

    • This feature only supports file storage objects and does not support block storage objects.
    • All systems joining the multiple sessions, either in fan-out (star) or cascaded mode, must be running OE version 5.x.
    • User snapshot replication is supported for only one asynchronous session among all the sessions associated with the same storage object.
    • Only one loopback asynchronous session is supported per storage object.

    The following figure shows an example of a possible configuration with both fan-out and cascaded replication. Each lettered box represents a system running OE version 5.x. A represents a production site. The source object is not in destination mode and all of the replication sessions on this resource act as the source. B, D, and E represent cascaded remote sites. Sessions on these resources act as the source for one session and act as the destination for another session. C, F, G, H, I, and J represent end remote sites. All of the sessions on these resources act as the destination and are in destination mode.

    Figure 1. Fan-out and cascade replication topologies

    Snapshot replication in advanced replication topologies

    For snapshot replication support, only one asynchronous replication session among all the sessions associated with the same storage resource can be enabled to transfer the snapshot from the source to the destination. The following figure and table show an example of how basic snapshot replication is supported along with independent RPO settings on each session. In the figure, A represents a production site with a NAS server and associated file system and user snapshot, and B, C, D, E, F, G, H, and I represent remote sites. Solid lines represent replication sessions that have a replicated snapshot while dashed lines represent replication sessions that do not have a replicated snapshot. The table lists the systems for each asynchronous replication session, whether a snapshot is replicated, and the RPO for each session.

    Figure 2. Replication sessions with and without snapshots to remote sites
    Table 4. Snapshot replication and RPO for replication sessions
    Asynchronous Replication Session  User Snapshot Replication  RPO (in minutes)
    A to B                            Yes                        60
    A to C                            No                         180
    A to D                            No                         240
    A to E                            No                         480
    B to F                            No                         360
    C to G                            Yes                        720
    D to H                            Yes                        960
    E to I                            Yes                        1440
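    The session list in Table 4 can be checked against the topology restrictions described in this section. The following Python sketch is one possible interpretation, not the system's own validation logic. It counts every session that touches a site, whether as source or destination, toward that site's limits of four sessions and one snapshot-replicating session. That counting rule, the function name, and the data layout are assumptions. Loopback sessions are not modeled.

```python
from collections import Counter
from typing import List, Tuple

# (source site, destination site, user snapshot replication, RPO in minutes)
Session = Tuple[str, str, bool, int]

MAX_SESSIONS_PER_OBJECT = 4           # fan-out limit stated in this section
MAX_SNAPSHOT_SESSIONS_PER_OBJECT = 1  # one snapshot-replicating session

# Sessions transcribed from Table 4.
TABLE_4: List[Session] = [
    ("A", "B", True, 60), ("A", "C", False, 180),
    ("A", "D", False, 240), ("A", "E", False, 480),
    ("B", "F", False, 360), ("C", "G", True, 720),
    ("D", "H", True, 960), ("E", "I", True, 1440),
]


def check_topology(sessions: List[Session]) -> List[str]:
    """Return a list of human-readable violations (empty if none)."""
    total: Counter = Counter()
    with_snapshot: Counter = Counter()
    problems: List[str] = []
    for source, destination, snapshot, rpo in sessions:
        if not 5 <= rpo <= 1440:
            problems.append(f"{source} to {destination}: RPO {rpo} out of range")
        # Assumption: a session counts toward both of its endpoints.
        for site in (source, destination):
            total[site] += 1
            if snapshot:
                with_snapshot[site] += 1
    for site in sorted(total):
        if total[site] > MAX_SESSIONS_PER_OBJECT:
            problems.append(f"{site}: {total[site]} sessions exceed the limit")
        if with_snapshot[site] > MAX_SNAPSHOT_SESSIONS_PER_OBJECT:
            problems.append(f"{site}: more than one snapshot session")
    return problems
```

    Under these assumptions, check_topology(TABLE_4) returns an empty list. Adding a fifth session from A, or enabling snapshot replication on a second session that involves B, produces a violation.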

    Dell EMC MetroSync for Unity

    Dell EMC MetroSync for Unity is a collection of features that provide a file resource disaster recovery solution. To use all of the features for this solution, the systems must be running OE versions 4.4 or later. The underlying features of MetroSync for Unity are:

    • NAS server and file system synchronous replication
    • File synchronous replication support for snapshots
    • Sync replicated snapshot schedule replication
    • Cabinet-level unplanned failover of file-based replication sessions
    • Asynchronous replication of synchronous file replication to a third site

    Using replication for disaster recovery

    In a disaster recovery scenario, the primary (source) system is unavailable due to a natural or human-caused disaster. Data access is still available because a replication session was configured between the primary and destination systems, and the destination system contains a full copy, or replica, of the production data. The replica is up to date as of the last time the destination synchronized with the source, as specified by the automatic synchronization recovery point objective (RPO) setting. By issuing a session failover on the destination system, you make the destination system the new production system, using the replica of the primary system’s data that resides on the destination system. Using replicas for disaster recovery minimizes potential data loss. The amount of potential data loss is affected by the RPO that is configured when setting up the replication session. In a synchronous replication configuration, where the RPO is set to 0, the amount of potential data loss will be minimal.

    The asynchronous failover operation always restores the destination resource to the replication common base snapshot. If failing over to the common base is not sufficient and replicated user snapshots exist, the destination resource should be manually restored to any of the replicated user snapshots.

    Once the session is failed over to the destination system, the destination storage resource becomes read-write. At this point, ensure that the storage resource has the correct access permissions to the host and share. When originally establishing a replication session between the primary and destination systems, create the proper host access on the destination system ahead of time to reduce downtime in the event of a disaster.

    To resume the operations on the destination and switch the roles, resume the replication session. To resume the operations on the source, fail back the replication session.

    File-based replication consideration

    Switch over the NAS server replication session using the Failover option. This action triggers a group operation towards the NAS server replication session and its associated file system replication sessions.

    The NAS server replication session should be in one of the following states in order for it to be failed over to the destination system:

    • Idle
    • Auto Sync Configured
    • Lost Communication
    • Lost Sync Communication
    • Non Recoverable Error

    If the NAS server replication session is in one of the following states, it cannot be failed over to the destination system:

    • Paused
      Note:  The Paused state only affects NAS server replication session failover in systems with OE versions 4.x. It does not affect synchronous or asynchronous NAS server replication session failover in OE versions 5.x.
    • Error states other than Lost Communication, Lost Sync Communication, or Non Recoverable Error
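
    The state rules above can be summarized as a small predicate. The following Python sketch is illustrative only. The function name and the integer OE major version parameter are assumptions made for this example, and the sketch assumes that the Paused state does not block failover on OE 5.x and that any state not listed as eligible blocks failover.

```python
# States in which a NAS server replication session can be failed over.
FAILOVER_ALLOWED_STATES = {
    "Idle",
    "Auto Sync Configured",
    "Lost Communication",
    "Lost Sync Communication",
    "Non Recoverable Error",
}


def can_fail_over(state: str, oe_major_version: int) -> bool:
    """Return True if a session in `state` may be failed over (hypothetical helper)."""
    if state in FAILOVER_ALLOWED_STATES:
        return True
    if state == "Paused":
        # Paused only blocks failover on OE 4.x, per the note above.
        return oe_major_version >= 5
    # Assumption: every other state is treated as a blocking error state.
    return False
```
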
    NOTICE  (This notice does not apply to asynchronous file replication in systems running OE versions 5.x.) When a source site has a power outage and file replication sessions are failed over to the destination site, a duplicate IP issue (the production IP addresses of the source and destination NAS servers are the same and both are in service) can be avoided after power is restored, provided that the source site is restarted with the destination site well connected. The duplicate IP issue may not be avoided in other cases, which include but are not limited to:
    • Source site is alive when failover is executed on the destination site.
    • Remote system IP connection is broken during the source site restarting.
    • SP failover and file synchronous replication failover are executed at the same time.

    To resume the operations on the destination and switch the roles, resume the NAS server replication session. To resume the operations on the source, fail back the NAS server replication session.

    NOTICE  If a synchronous replication session is created on an import target NAS server with file systems on it, do not execute a failover before the import is committed. In this case, the system will reject the failover from the source (planned failover), but the system will not reject failover from the destination (unplanned failover) if the destination is disconnected from the source (import target). Ensure the source site is permanently down before failing over.
    Cabinet level unplanned failover of file-based replication sessions

    If the source system is not available for any reason, you can execute a failover of all NAS server synchronous replication sessions from the remote system. This operation also automatically fails over the replication sessions of file systems created on the affected NAS servers. The cabinet level operation must be executed using the /remote/sys failover CLI command and must be run from the destination system. You must run the /remote/sys show CLI command from the destination system to obtain the remote system ID of the source system in order to perform the cabinet level failover. When the destination system detects that the source system is actually still online, you can run the command with the -force option. Once the source system has been recovered, there is no option to perform a cabinet level failback. Instead, resume and fail back each NAS server session to the original source using the graceful failover operation.

    If a NAS server session failover operation fails as part of a cabinet level failover, the system still continues to fail over the other NAS server sessions. For any NAS server session failover that fails as part of a cabinet level failover operation, you can switch over the individual NAS server after the cabinet level failover completes using either the Failover action option in Unisphere or the /prot/rep/session failover CLI command.

    Unplanned failover in advanced file asynchronous replication topology

    With the advanced file asynchronous replication topologies introduced in OE version 5.x, multiple unplanned failover operations on the same resource could be executed serially or concurrently. For the end remote sites, the unplanned failover behavior does not change from earlier OE releases. However, the behavior of a cascade mode site depends on when the processing on the upstream and downstream sessions occurs with regard to the unplanned failover command.

    Note:  Unplanned failover may cause duplicate IP addresses for the NAS server. If the NAS server is running with CIFS support, it may cause a duplicate CIFS or SMB server as well. IP addresses need to be well planned and assigned to avoid this issue.
    • If the processing of the upstream session unplanned failover command on the cascade mode occurs before the processing of the downstream session unplanned failover command, then the cascade site will be in destination mode.
    • If the processing of the upstream session unplanned failover command on the cascade mode occurs after the processing of the downstream session unplanned failover command, then the cascade site will not be in destination mode.
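
    The ordering rule in the two bullets above can be expressed as a single comparison. The following Python sketch is illustrative only. The function name and the use of numeric timestamps for when each command is processed are assumptions made for this example. The case where both commands are processed at exactly the same instant is not described in this document, so the sketch rejects it.

```python
def cascade_site_in_destination_mode(
    upstream_processed_at: float,
    downstream_processed_at: float,
) -> bool:
    """Return True if the cascade site ends up in destination mode.

    Each argument is the (hypothetical) time at which the cascade site
    processed the unplanned failover command for that session.
    """
    if upstream_processed_at == downstream_processed_at:
        raise ValueError("simultaneous processing is not covered by the rule")
    # Upstream processed first: the cascade site is in destination mode.
    return upstream_processed_at < downstream_processed_at
```
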

    With the advanced file asynchronous replication topologies, the destination of a resume operation could be functioning as a read/write (RW) production source resource for another replication session. The resume operation checks for this condition and, if discovered, causes the resume operation to fail. In this RW to RW (source to source) case, local data changes could exist on both sides. The resume operation includes an option to overwrite data on the destination resource to complete the operation successfully:

    • In Unisphere, selecting Resync the remote and overwrite any data written to the remote discards the data changes in the destination resource. The data in the local source resource is retained and synchronized with the data in the destination resource. The local source resource is changed to destination mode and replication resumes in the original direction.
    • In the CLI, -forceSyncData forces data transfer from the local source resource to the destination resource, even if the destination resource has data that is not replicated from the local source resource. The data in the local source resource is retained and synchronized with the data in the destination resource. The local source resource is changed to destination mode and replication resumes in the original direction.

    In another case, the destination of a resume operation could be functioning as an active destination resource for another replication session. The resume operation checks for this condition and, if discovered, causes the resume operation to fail. A Resume operation from a local resource to keep local data changes is not allowed to a remote resource that is an active destination.
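
    The two resume checks described above can be summarized as a decision function. The following Python sketch is illustrative only. The RemoteRole enumeration and the function name are assumptions made for this example. The force_sync_data parameter stands for choosing the overwrite option (-forceSyncData in the CLI, or the Resync option in Unisphere).

```python
from enum import Enum


class RemoteRole(Enum):
    """What the remote resource is doing for other sessions (hypothetical)."""
    NOT_IN_USE_ELSEWHERE = 1
    RW_SOURCE = 2           # read/write source for another session
    ACTIVE_DESTINATION = 3  # active destination for another session


def resume_allowed(remote_role: RemoteRole, force_sync_data: bool = False) -> bool:
    """Return True if the resume operation is expected to succeed."""
    if remote_role is RemoteRole.ACTIVE_DESTINATION:
        # Keeping local changes is not allowed toward an active destination.
        return False
    if remote_role is RemoteRole.RW_SOURCE:
        # Succeeds only when remote data is explicitly overwritten.
        return force_sync_data
    return True
```
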

    Using replication for planned downtime

    Unlike a disaster, in which the primary (source) system is lost due to an unforeseen event, planned downtime is a situation in which you deliberately take the source system offline for maintenance, or for testing purposes on the destination system. Prior to the planned downtime, both the source and destination are running with an active replication session. When you want to take the source offline in this scenario, the destination system is used as the production system for the duration of the maintenance period. Once maintenance or testing completes, return production to the original source system. Planned downtime does not involve data loss.

    To initiate a planned downtime, use the Failover with sync option (for synchronous replication, use the Failover option in Unisphere) on the source system. When you fail over a replication session from the source system, the destination system is fully synchronized with the source to ensure that there is no data loss. The session remains active with roles switched for synchronous replication, and is paused for asynchronous replication, while the source becomes Read-Only and the destination becomes Read-Write. The destination storage resource can then be used to provide access to the host.

    Performing a failover with sync operation on an asynchronous replication session results in replication copying all the data, including any snapshots that have been created or marked for copy since the last sync occurred, to the destination site. Once the copy is finished, the destination is an exact replica of the source site and the roles are switched similar to the failover operation.

    To restore operations on the source, fail back the replication session.

    For synchronous replication, both roles and operations switch sides. To resume the operations on the original source, perform a failover again. For asynchronous replication, to resume the operations on the destination and switch the roles, resume the replication session. To resume the operations on the source, fail back the replication session.

    File-based replication consideration

    The NAS server replication session should be in one of the following states in order to do a planned failover to the destination system:

    • Idle
    • Auto Sync Configured
    • Active

    If the NAS server replication session is in one of the following states, you cannot do a planned failover to the destination system:

    • Paused
    • Error states

    For asynchronous replication, to minimize disruption during a planned downtime window, ensure that the NAS server and associated file system replication sessions are manually synchronized first and then failed over. Follow these steps:

    1. Synchronize the NAS server replication session using the Sync option.
    2. Synchronize the replication sessions for each of the file systems associated with the NAS server using the Sync option. This ensures that the destination file systems have the latest data and minimal data will need to be transferred when the replication sessions switch over.
    3. Inform file system users and quiesce I/O operations from hosts and applications using the file systems in the NAS server.
    4. Switch over the NAS server replication session using the Failover with sync option. This action triggers a group operation towards the NAS server replication session and its associated file system replication sessions.
    5. Once all replication sessions have successfully failed over, resume I/O operations with the relevant applications and hosts.
      Note:  Any I/O attempted when the failover is occurring may result in read/write errors or stale file handle exceptions.
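
    The ordering of the steps above matters: synchronize first, quiesce I/O, then fail over. The following Python sketch is illustrative only. It builds the ordered list of actions for one NAS server and does not call any Unity interface. The function name and the resource names are hypothetical.

```python
from typing import List


def planned_failover_plan(nas_server: str, file_systems: List[str]) -> List[str]:
    """Return the planned-downtime actions in the order they should run."""
    plan = [f"Sync NAS server session: {nas_server}"]
    # Sync each file system session so little data remains to transfer later.
    plan += [f"Sync file system session: {fs}" for fs in file_systems]
    plan.append("Quiesce host and application I/O")
    # Failing over the NAS server session also fails over its file systems.
    plan.append(f"Failover with sync NAS server session: {nas_server}")
    plan.append("Resume host and application I/O after all sessions fail over")
    return plan
```

    For example, planned_failover_plan("nas_1", ["fs_1", "fs_2"]) yields six actions, with the two file system synchronizations placed before I/O is quiesced.
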
    Planned failover in advanced file asynchronous replication topology

    With the advanced file asynchronous replication topologies introduced in OE version 5.x, multiple planned failover operations on the same resource could be executed serially or concurrently. For the end remote sites, the planned failover behavior does not change from earlier OE releases. However, the behavior of a cascade mode site depends on when the remount task on the upstream and downstream sessions occurs with regard to the planned failover command.

    Note:  Planned failover may cause duplicate IP addresses for the NAS server. If the NAS server is running with CIFS support, it may cause a duplicate CIFS or SMB server as well. IP addresses need to be well planned and assigned to avoid this issue.
    • If the remount task of the upstream session planned failover command on the cascade mode occurs before the remount task of the downstream session planned failover command, then the cascade site will be in destination mode.
    • If the remount task of the upstream session planned failover command on the cascade mode occurs after the remount task of the downstream session planned failover command, then the cascade site will not be in destination mode.

    With the advanced file asynchronous replication topologies, the destination of a resume operation could be functioning as a read/write (RW) production source resource for another replication session. The resume operation checks for this condition and, if discovered, causes the resume operation to fail. In this RW to RW (source to source) case, local data changes could exist on both sides. The resume operation includes an option to overwrite data on the destination resource to complete the operation successfully:

    • In Unisphere, selecting Resync the remote and overwrite any data written to the remote discards the data changes in the destination resource. The data in the local source resource is retained and synchronized with the data in the destination resource. The local source resource is changed to destination mode and replication resumes in the original direction.
    • In the CLI, -forceSyncData forces data transfer from the local source resource to the destination resource, even if the destination resource has data that is not replicated from the local source resource. The data in the local source resource is retained and synchronized with the data in the destination resource. The local source resource is changed to destination mode and replication resumes in the original direction.

    In another case, the destination of a resume operation could be functioning as an active destination resource for another replication session. The resume operation checks for this condition and, if discovered, causes the resume operation to fail. A Resume operation from a local resource to keep local data changes is not allowed to a remote resource that is an active destination.

    Failback a replication session

    To resume operations on a source system, the associated replication session needs to be failed back. To fail back a replication session, use the Failback option on the original destination system. For asynchronous replication, failback synchronizes the original source with the changes made on the original destination after failover, including any snapshots that have been created since the failover operation occurred. It then restores the source as the production system and restarts the replication session in the original direction.

    For synchronous replication, failback is only used after an unplanned failover (use the failover operation again if a planned failover had occurred). Failback synchronizes the original source with the changes made on the original destination after an unplanned failover, not including any snapshot. A full copy is needed (to resume from the destination) due to a restriction of the MirrorView/Synchronous technology. If there are snapshots on the source storage resource, its size will be increased (about 100%). If those old snapshots are deleted later, the size will shrink back. Failback then restores the source as the production system and restarts the replication session in the original direction.

    File-based replication consideration

    To resume operations on a source system, the associated NAS server replication session needs to be failed back. To fail back a NAS server replication session, use the Failback option on the original destination system. This action triggers a group operation towards the NAS server replication session and its associated file system replication sessions.

    Failback in advanced file asynchronous replication topology

    With the advanced file asynchronous replication topologies, fan-out and cascade, introduced in OE version 5.x, the destination of a failback operation could be functioning as a read/write (RW) production source resource for another replication session. The failback operation checks for this condition and, if discovered, causes the failback operation to fail. In this RW to RW (source to source) case, local data changes could exist on both sides. The failback operation includes an option to overwrite data on either the source or destination resource to complete the operation successfully:

    • Selecting Keep local data changes by updating the remote resource in Unisphere, or specifying -syncData force in the CLI, discards the data changes in the destination resource. The data in the local source resource is retained and synchronized with the data in the destination resource. The local source resource is changed to destination mode and replication resumes in the original direction.
    • Selecting Keep remote data by discarding all local data changes in Unisphere, or specifying -syncData ignore in the CLI, discards the data changes in the local source resource. The data in the destination resource is retained and synchronized with the data in the local source resource. The local source resource is changed to destination mode and replication resumes in the original direction.

    In another case, the destination of a failback operation could be functioning as an active destination resource for another replication session. The failback operation checks for this condition and, if discovered, causes the failback operation to fail. Local data changes cannot be synchronized back to the remote session when that session is already an active destination. In this case, use the failback option that keeps the remote data (Keep remote data by discarding all local data changes in Unisphere, or -syncData ignore in the CLI) to overwrite data on the source resource and complete the operation successfully.
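
    The choice between the two failback options can be sketched as a lookup. The following Python sketch is illustrative only. The function name and the role strings are assumptions made for this example. The returned values are the -syncData values named above, and None means that no option satisfies the request.

```python
from typing import Optional


def sync_data_value(remote_role: str, keep_local_changes: bool) -> Optional[str]:
    """Pick the -syncData value for a failback that hit a conflict.

    remote_role is "rw_source" or "active_destination" (hypothetical labels).
    """
    if remote_role == "rw_source":
        # Either side's changes can be kept; the other side is overwritten.
        return "force" if keep_local_changes else "ignore"
    if remote_role == "active_destination":
        # Local changes cannot be pushed to an active destination.
        return None if keep_local_changes else "ignore"
    raise ValueError(f"unknown remote role: {remote_role}")
```
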

    Asynchronous replication of a synchronous file replication to a third site

    MetroSync for Unity supports configuring synchronous replication sessions of a NAS server and its file systems to one Unity system destination site and asynchronous replication sessions of the same NAS server and its file systems to a different Unity system destination site. These systems must have OE version 4.4 or later.

    When creating a new NAS server or when a NAS server does not have an associated replication session, you can configure one synchronous replication session and one asynchronous replication session for that NAS server. In the case of an existing NAS server, if one asynchronous replication session is already associated with it, only a synchronous replication session can be created for it. If one synchronous replication session is already associated with the NAS server, only an asynchronous replication session can be created for it.

    Note:  The asynchronous replication destination NAS server is selected as Used as backup only in Unisphere by default when synchronous replication is already enabled. When using CLI, it must be specified, otherwise, the asynchronous replication session creation will fail.

    When creating a new file system or when a file system does not have an associated replication session, you can configure one synchronous replication session and one asynchronous replication session for that file system. In the case of an existing file system, if one asynchronous replication session is already associated with it, only a synchronous replication session can be created for it. If one synchronous replication session is already associated with the file system, only an asynchronous replication session can be created for it.

    Note:  By default, the new file system replication session is created with the same attributes as the associated NAS server replication session.
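
    The session rules above for a NAS server or file system in a MetroSync configuration reduce to "at most one synchronous and one asynchronous session". The following Python sketch is illustrative only. The function name and the "sync" and "async" labels are assumptions made for this example, and it models only the third-site configuration described in this section.

```python
from typing import Set

# One synchronous and one asynchronous session may coexist per resource.
SESSION_TYPES = frozenset({"sync", "async"})


def creatable_session_types(existing: Set[str]) -> Set[str]:
    """Return the session types that can still be created for a resource."""
    unknown = set(existing) - SESSION_TYPES
    if unknown:
        raise ValueError(f"unknown session types: {sorted(unknown)}")
    return set(SESSION_TYPES - set(existing))
```
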

    With asynchronous replication, internal checkpoint snapshots are routinely taken using the manual or automatic Recovery Point Objective (RPO) policy, and then replicated to the destination system. In addition, the internal checkpoint snapshots are synchronously replicated to the partner MetroSync system. This action ensures that a common-base snapshot will be available on the source and asynchronous and synchronous destination sites in this topology. After a MetroSync failover of synchronously replicated NAS servers and associated file systems, run a preserve operation on the new source synchronous NAS server replication session. This operation restores the asynchronous NAS server and file system replication sessions by using the replicated internal checkpoint snapshots on the new source system, as a common-base snapshot, without requiring a full synchronization.

    Note:   Internal asynchronous replication snapshots are refreshed on the source and asynchronous and synchronous destination sites on every asynchronous session sync operation. If the snapshot create or refresh operation failed on the synchronous replication destination site, the internal snapshot create or refresh operation on the synchronous replication destination site will be retried as part of the next asynchronous session sync operation. If snapshots cannot be created or refreshed on the synchronous replication destination site because the synchronous session is not Active or the connection is broken, the internal snapshot create or refresh operation on the synchronous replication destination site will be retried as part of the next asynchronous session sync operation after the synchronous replication session is restored.
    Concurrent operations compatibility

    A majority of the synchronous and asynchronous replication operations for coexisting synchronous and asynchronous replication sessions can be run concurrently with the following exceptions:

    • Create and delete - You cannot run a create operation for an asynchronous replication and a delete operation for a synchronous replication or the reverse concurrently. The operation will be rejected.
    • Failover or failback - Both of these operations are not supported on an asynchronous replication destination NAS server that is selected as Used as backup only.
    Preserve asynchronous replication sessions

    When the synchronous replication sessions of a NAS server and its file systems are failed over (planned or unplanned) or failed back, the associated asynchronous replication sessions can be switched manually from the production site. This switching operation on the synchronous replication sessions preserves the asynchronous replication sessions with the active production site.

    For example, source production site A has a synchronous replication of a NAS server and its file systems to destination site B and an asynchronous replication of the same NAS server and its file systems on the source production site A to a third site C. When the synchronous replication is failed over from site A to site B, you can manually preserve the asynchronous replication sessions by running a Preserve asynchronous replication operation on the synchronous NAS server replication session on the new source production site B. This operation switches the asynchronous replication sessions from the old source production site A to the new source production site B.

    Note:  Only users with Administrator or Storage Administrator roles are allowed to perform this preserve operation.

    While performing the initial synchronization operation between the new source site and the asynchronous replication destination site, the new source storage system searches for internal snapshots on itself and the asynchronous replication destination site, determines whether there is a common base snapshot, and if so, replicates only a differential copy of the production data and not a full copy. If a common base snapshot is not found, a full data copy will be performed.
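
    The common base check described above amounts to asking whether the two sites share any internal snapshot. The following Python sketch is illustrative only. The function name and the snapshot identifiers are hypothetical, and the sketch does not model how the system selects among several common snapshots.

```python
from typing import Iterable


def choose_copy_mode(
    new_source_snapshots: Iterable[str],
    async_destination_snapshots: Iterable[str],
) -> str:
    """Return "differential" if a common base snapshot exists, else "full"."""
    common = set(new_source_snapshots) & set(async_destination_snapshots)
    return "differential" if common else "full"
```

    For example, if both sites hold an internal snapshot with the same identifier, only a differential copy is needed. If they share none, a full copy is performed.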

    When a NAS server synchronous replication session's production site is switched between the source site and the destination synchronous replication site (for example, through a failover or a failback operation), the source NAS server of the asynchronous replication sessions changes to replication destination mode and the data transfer of the source NAS server asynchronous replication sessions is stopped. The operational status for the related sessions in this case is hibernated. After a preserve operation, the asynchronous replication sessions between the destination synchronous replication site (new source) and the asynchronous replication site are established. Later, the original source site may become the production site of the synchronous replication sessions again. In this case, the NAS server asynchronous replication sessions between the old destination synchronous replication site and the destination asynchronous replication site will become hibernated.

    Restrictions and limitations

    The following restrictions and limitations relate to file asynchronous replication to another site for backup:

    • The asynchronous replication sessions can be preserved only to a resource on the synchronous replication production site.
    • The remote system connection towards the backup (asynchronous replication destination) site needs to be created in advance on the synchronous replication destination site for successful preserve operations.
    • The asynchronous replication session can be used for backup only. Failover is not allowed on the asynchronous replication session unless the backup only property is removed manually and failover is executed from the asynchronous replication destination site. Such an operation will break the synchronous replication. Therefore, before removing the backup only property and failing over, you must ensure that either the synchronous replication session is deleted or both sites of the synchronous replication session are down and will not be recovered.
    • If the limit of an asynchronous replication session on the preservation site is reached, the restore will fail with an error message and no more sessions can be restored.
    • The preserve asynchronous replication sessions operation establishes the asynchronous replication sessions from the new synchronous production site to the backup site. After the preserve operation successfully completes, the new synchronous production site becomes the asynchronous replication source of the backup site. If the NAS server, file systems, and snapshots of the new synchronous production site are not the same as those of the old synchronous production site (for example, a snapshot is created or marked for asynchronous replication when a synchronous replication session is not in the Active state, or is asynchronously replicated when a synchronous replication session is fractured), replication from the new synchronous production site to the asynchronous replication backup site continues based on the snapshots and information on the new synchronous production site. Some snapshots may not be replicated to the backup site. In this case, a warning will be generated when marking the snapshot for asynchronous replication when the synchronous replication session is not Active. Some snapshots may consume duplicate storage space.
    • Supported on Unity systems with OE versions 4.4 or later.
    • The preserve file system asynchronous replication sessions process can avoid the full copy process only when a system snapshot is available for a common baseline. If there are any snapshots on the destination file system when a full copy is initiated, the storage space will be increased (by 100% of the production file system size). If those old snapshots are deleted, the size will shrink back.
    • Asynchronous replication session state is always preserved as either Auto Sync Configured or Idle.
    • If a preserve asynchronous replication session operation is ongoing, it must be cancelled before a planned failover or failback of the synchronous session is executed.
    Remove an asynchronous replication session

    You can remove the asynchronous replication session between the source site and the asynchronous replication destination site. While performing this operation, the internal snapshots of the source and asynchronous destination sites that are used for the asynchronous replication session are removed. The internal snapshots of the synchronous destination site will be removed if the deletion happens when the synchronous replication session is Active.

    Note:  The internal snapshots used for asynchronous replication between the source site and the asynchronous replication site are retained after asynchronous replication session removal in case a preserve operation needs to be performed.
    Remove a synchronous replication session

    You can remove the synchronous replication session between the source site and the synchronous replication destination site. While performing this operation, the internal snapshots of the synchronous destination site that are used for the asynchronous replication session are removed. The internal snapshots of the source and asynchronous destination sites that are used for the asynchronous replication session are retained.

    Remove an internal file system snapshot manually

    You can remove a file system internal snapshot manually when no asynchronous replication session exists which uses that snapshot. If an asynchronous replication session does exist which uses that snapshot, the remove operation will fail.

    Note:  Internal snapshots on the source site and its asynchronous replication destination site are removed as part of the asynchronous replication session delete operation. If the asynchronous replication session still exists, the internal snapshots cannot be removed. Internal snapshots on the synchronous replication destination site can be removed either manually using the -force delete operation attribute or during a synchronous replication session delete operation.

    Cascade mode for third site asynchronous replication

    When one synchronous replication session is created for MetroSync for Unity, a session should not be created from the destination side and only one remote asynchronous session should be created from the source. When a storage object is created as the destination mode and one asynchronous session is created on it as the destination, that storage object can be used to create another asynchronous session when it acts as the source.

    The following figure and table show an example of cascade mode in a MetroSync configuration. Each lettered box represents a system running OE version 5.x. The thick solid lines between boxes represent asynchronous replication sessions and the thin solid line between boxes represents a synchronous replication session. A represents a production site. All of the replication sessions on this resource act as the source and are not in destination mode. B represents a synchronous replication destination remote site. C represents an asynchronous replication destination remote site in cascade mode. The resource on C acts as the source for a session on D and acts as the destination for another session from A. D represents an end remote site. The session on this resource acts as the destination and is in destination mode.

    Figure 3. Cascade mode for third site asynchronous replication
    Table 5. RPO for replication sessions in third site asynchronous replication topology

    Replication Session      User Snapshot Replication    RPO (in minutes)
    A to B (synchronous)     Yes                          0
    A to C (asynchronous)    Yes                          60
    C to D (asynchronous)    No                           1440