Speed Matters. Using the Azure SharePoint Online Migration Pipeline

By Jay Dave | June 03, 2015

What is the SharePoint Online Migration API? – A Closer Technical Look

We’ve already covered some of the details on the new API that Microsoft announced at Ignite, which enables much faster migrations to SharePoint Online (SPO) within Office 365.

As Metalogix’ resident Architect for our Content Matrix SharePoint migration technologies, I’d like to provide a deeper dive on the technical nuances of how the new API works with Microsoft Azure (we’ll call it the Azure SPO Migration Pipeline) and how our own products use it to improve performance for SharePoint Online migrations.

The new API is designed for use by ISVs and consultants, as it requires technical knowledge to develop the required migration package. However, the following description of the technical details will be of interest both to existing Content Matrix customers who want to learn more about this new capability and to SharePoint Administrators looking to gain a better understanding of the new API.

Brief History

Metalogix has a long history of working closely with Microsoft on solving the various challenges of migrating to SharePoint, both on-premises and in the cloud. That relationship initiated our move from eCommerce migrations to working with the very first version of SharePoint, through to being the first (and only) company pre-approved to install and run on the Office 365-dedicated platform, solidifying our position as the leading vendor for enabling customers to migrate to Office 365.

However, even with all our years of SharePoint migration experience we quickly realised that moving large volumes of content at a speed that met customer expectations was going to be a challenge due to how multi-tenant SharePoint environments were operated by Microsoft.

Initially we were able to improve migration speeds by utilizing our unique SharePoint database adapter which allowed us to efficiently read from SharePoint content databases. We then saw further improvements by mounting these databases within Microsoft Azure Virtual Machines that could be physically located close to the SharePoint Online datacenters. This reduced network latency and allowed us to leverage performance and scalability gains of Azure. We also optimized our Client Side Object Model (CSOM) adapter for cloud migrations, which is used for writing to SharePoint Online. The combination of these factors resulted in a much improved migration performance, details of which can be found here, but we still believed that the speed limits could be pushed further.

Thanks to our special relationship, we continued to work closely with Microsoft to look for ways to increase migration speeds. We proposed several options for improving performance while still respecting the various safeguards that Microsoft has in place to protect SharePoint Online, both from a security perspective and to ensure that the multi-tenant environments perform as their end users expect. In the end, Microsoft decided that the best option was to modify an existing content migration API that had been available for a number of years for SharePoint migrations – namely the Prime API – and to combine it with Azure, an approach already proven to improve migration speeds.

From mid-2014 through to early 2015 we continued to work with Microsoft to develop and test the new API. Microsoft’s own America’s Cloud Services (ACS), the specialist organisation dedicated to complex and large-scale SharePoint Online migrations, which had recently standardised on using Content Matrix for all their migrations, joined the combined effort with some early-adopter customers to put the new approach fully through its paces. ACS were also able to take performance to another level thanks not only to the new API and Azure SPO Migration Pipeline but also to exclusive PowerShell scripting for Content Matrix that enables multiple migration operations to run in parallel. Coincidentally, the flexibility and extensibility offered by Content Matrix through its PowerShell capabilities was one of the primary reasons ACS selected it as their standard for SharePoint Online migrations.

Benefits of the New API

  • Speed is obviously one major benefit of the new API and the Azure SPO Migration Pipeline. The pipeline process has yielded an order-of-magnitude increase in speed when uploading documents versus the previous approaches. When this is combined with some of the unique elements of Metalogix Content Matrix, such as our database connection adapter, incredible migration speeds can be achieved. As is always the case with any SharePoint migration, on-premises or to the cloud, the exact speed achieved can vary based on a variety of factors: the type of content being migrated (more metadata and versions can increase the time taken, as more operations are performed to carry out the additional steps), the cloud servers being used, the time taken to retrieve content from the source system, the maximum upload bandwidth, and the size and number of migration jobs queued in the Azure SPO Migration Pipeline for a particular content database.
  • No throttling – throttling for CSOM-based migrations was introduced by Microsoft to ensure that the experience for SPO users was not adversely impacted by excessive CSOM actions and calls. Recognising this issue, the Content Matrix team implemented the ability to detect the Server Health Score (the trigger for throttling to be engaged) reported by SharePoint Online. This allowed us to pause and restart migrations when a server’s health dipped below a certain threshold. The stop-start nature of this approach impacted migration performance, although it was still preferable to the performance hit of being throttled. Because the new pipeline performs the import on the back end rather than through CSOM calls, this workaround is no longer needed.
  • Scalable Azure resources – thanks to leveraging the scalability and power of Microsoft Azure, the infrastructure supporting the migration pipeline is better equipped to cope with demand and maintain migration performance (storage for uploading content can support large-scale, multi-TB migrations). Azure has been designed to deliver greater throughput than SharePoint Online, and the pipeline utilises the internal Microsoft back-end network to help achieve this.
  • Transparent to other multi-tenant users – as the majority of “heavy lifting” and processing occurs on the back end of the cloud infrastructure, a typical user of SharePoint Online will see no impact from a migration taking place on the shared infrastructure.

Migration Pipeline and Metalogix Content Matrix
[Diagram: Migration Pipeline and Metalogix Content Matrix]

So How Does It Work?

In practice, Metalogix takes care of all aspects of managing the new API – one of the benefits of using Content Matrix to carry out your migration is that the pipeline is simply a new option within our user interface. All you need to provide is the information about the Azure storage you intend to use. Everything else is automatic; however, let’s take a peek at what is going on underneath.

Azure Storage plays a key part and is required as temporary storage by the Azure SPO Migration Pipeline. As it’s only required for the migration, once a migration has successfully completed all storage containers can be removed without affecting your target SPO. Shared Access Signature (SAS) keys are used to grant the pipeline limited access (such as Read or List) without having to expose your primary or secondary Azure Storage account key. You can also limit the duration a SAS key will be valid for (1 day or 1 week, for example). Shared access signatures provide a safe alternative, allowing the pipeline to access your content and only that. The SAS keys are generated by the client, as are the containers and queues. The lifetime of the containers and queues is the responsibility of the client; the pipeline itself will not remove them after migration.
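
Here is a minimal PowerShell sketch of that client-side setup, using the WindowsAzure.Storage 4.3.0 assembly linked at the end of this post. The assembly path, container and queue names, and file paths are illustrative placeholders, and the snapshot step anticipates the API requirement described below.

Add-Type -Path "C:\Packages\WindowsAzure.Storage.4.3.0\lib\net40\Microsoft.WindowsAzure.Storage.dll"

# Connect with the account key (kept client-side; only SAS keys are handed to the pipeline)
$account    = [Microsoft.WindowsAzure.Storage.CloudStorageAccount]::Parse($env:AZURE_STORAGE_CONNECTION_STRING)
$blobClient = $account.CreateCloudBlobClient()

# Create the temporary containers and reporting queue (removable once the migration completes)
$content  = $blobClient.GetContainerReference("migration-content")
$manifest = $blobClient.GetContainerReference("migration-manifest")
$null = $content.CreateIfNotExists()
$null = $manifest.CreateIfNotExists()
$reportQueue = $account.CreateCloudQueueClient().GetQueueReference("migration-reporting")
$null = $reportQueue.CreateIfNotExists()

# Upload a file and snapshot it; the pipeline always reads the latest snapshot
$blob = $content.GetBlockBlobReference("Documents/Report.docx")
$blob.UploadFromFile("C:\Staging\Documents\Report.docx", [System.IO.FileMode]::Open)
$null = $blob.CreateSnapshot()

# Generate a Read+List SAS valid for one day for the content container
$policy = New-Object Microsoft.WindowsAzure.Storage.Blob.SharedAccessBlobPolicy
$policy.Permissions = [Microsoft.WindowsAzure.Storage.Blob.SharedAccessBlobPermissions]"Read, List"
$policy.SharedAccessExpiryTime = [DateTimeOffset]::UtcNow.AddDays(1)
$contentUri = $content.Uri.AbsoluteUri + $content.GetSharedAccessSignature($policy)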

The back-end engine of the migration pipeline utilises a subset of the Microsoft.SharePoint.Deployment API (also previously known as the Prime API). As it uses only a subset, it doesn’t support everything (see API Features below). It’s used in this case to import content into an existing, empty, fully provisioned (content types, columns) list or library.

(FYI, if you’ve ever used Central Admin, stsadm or the 2013 Export-SPWeb PowerShell command to backup or export a list, library or site then the bulk of the work is done by the Microsoft.SharePoint.Deployment API.)

The Migration Process Flow

  • Provision List or Library using CSOM
  • Create the package (the bulk of the work for a consumer of the API; see below for details)
  • Upload to Azure
  • Call the new API method CreateMigrationJob (with SAS keys)
  • Job progress will be written as messages to the queue. When the job completes it will place the import log (and any other related error or warning log) in the Manifest container. 

When the API method CreateMigrationJob is called, a migration job will be placed on a queue. The pipeline will check this queue every 60 seconds and a bot (migration job processor) will start processing the job. Status for the migration will be written to the reporting queue and an import log (together with separate error and warning logs if those have occurred) will also be placed in the Manifest container upon completion.
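
As an illustration, here is a hedged PowerShell sketch of that call sequence using the 16.1 CSOM assemblies linked later in this post. The assembly paths, site URL and credentials are placeholders; $contentUri, $manifestUri and $reportQueueUri are SAS-decorated URIs built as in the storage sketch earlier (the manifest container SAS must also grant Write so the import logs can be placed there).

Add-Type -Path "C:\Packages\Microsoft.SharePointOnline.CSOM\lib\net40\Microsoft.SharePoint.Client.dll"
Add-Type -Path "C:\Packages\Microsoft.SharePointOnline.CSOM\lib\net40\Microsoft.SharePoint.Client.Runtime.dll"

$securePassword  = Read-Host "SPO password" -AsSecureString
$ctx             = New-Object Microsoft.SharePoint.Client.ClientContext("https://contoso.sharepoint.com/sites/TeamSC")
$ctx.Credentials = New-Object Microsoft.SharePoint.Client.SharePointOnlineCredentials("admin@contoso.onmicrosoft.com", $securePassword)
$ctx.Load($ctx.Web)
$ctx.ExecuteQuery()

# Queue the job against the target web; the pipeline polls its job queue roughly every 60 seconds
$jobId = $ctx.Site.CreateMigrationJob($ctx.Web.Id, $contentUri, $manifestUri, $reportQueueUri)
$ctx.ExecuteQuery()
Write-Host "Migration job queued:" $jobId.Value

# Poll until the job is no longer queued or processing
do {
    Start-Sleep -Seconds 60
    $state = $ctx.Site.GetMigrationJobStatus($jobId.Value)
    $ctx.ExecuteQuery()
    Write-Host "Job state:" $state.Value    # Queued / Processing / None (no longer in the queue)
} while ($state.Value -ne [Microsoft.SharePoint.Client.SPMigrationJobState]::None)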

API Features

  • Supports uncompressed manifest packages only – this reduces the overhead of compressing/decompressing and makes the package easier to create and modify.
  • Overwrite API – if you submit a file and then resubmit the same file with changes, the import process will delete and replace the original (and all versions included for that file, if specified in the current manifest package).
  • Does not support Active-Active scenarios – the expectation is that the target site is inactive for all users until the migration is complete.
  • To ensure the immutability of source data, the API will only accept a SAS key with Read and List access for the Content container.
  • All files uploaded into the containers in Azure must have a snapshot created (required to prevent unintended modification of the source files; the pipeline will always use the latest snapshot – the upload sketch earlier includes this step).
  • Uses the Azure Blob storage security model in its entirety. There is no special treatment for containers used by the pipeline versus any other Azure containers. Any encrypted content will be treated as opaque files in SPO (currently the pipeline will not accept encryption keys for content).
  • The import pipeline does not fire events as items are imported, so any existing event handlers will not run.
  • Supported
    • Authorship
    • Permissions, users and groups
    • Multiple file versions
      • Manifests can have references to multiple versions of a file, major and minor up to the limits imposed within SPO.
    • Preserves identifiers (item Ids)
    • Document Libraries
    • Custom Lists with and without attachments
    • Item metadata
  • Unsupported
    • .aspx files (no support for web parts and such)
    • Workflows
    • Events
    • Cases such as Document Sets and InfoPath need testing – they may be supported as-is (however, XSN templates would require publishing in advance for a document library/list)
  • Coming soon
    • Managed Metadata item metadata
      • Currently the pipeline does not support setting Managed Metadata values
    • Version numbering where version numbers skip versions
      • E.g., versions 0.1, 0.6, 1.0 and 1.5 will appear exactly as that.

Migration Logs

The pipeline will place the import log (and error and warning logs, if applicable) in the Manifest container (this is why Write permission is required for that container only). After a migration job has completed and been removed from the processing queue, the logs can also be found in SPO by accessing the “_catalogs/Maintenance Logs” location of the target site collection. The logs are in text format.
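
Under the same assumptions as the storage sketch earlier ($account and $blobClient are reused from it; queue and container names remain placeholders), reading the progress messages and locating the logs might look like this:

$reportQueue = $account.CreateCloudQueueClient().GetQueueReference("migration-reporting")

# Drain the progress messages the pipeline writes to the reporting queue
while (($msg = $reportQueue.GetMessage()) -ne $null) {
    Write-Host $msg.AsString
    $reportQueue.DeleteMessage($msg)
}

# List the import/error/warning logs placed in the Manifest container on completion
$manifest = $blobClient.GetContainerReference("migration-manifest")
foreach ($item in $manifest.ListBlobs($null, $true)) {
    Write-Host $item.Uri
}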

Creating the ‘package’

The ‘package’ consists of content (binaries such as documents, images and attachments) and manifest XML files. The manifest is the largest piece of the puzzle in all of this. It’s important to get it right; otherwise, submitted migrations will fail or have strange side effects, and the cause will very likely be how things have been specified in the manifest (which is used by the new API to import content).

For example, supplying an incorrect GUID or not specifying a server relative path for an attribute value, or not referencing a required object in related xml in the manifest can cause havoc and lots of head scratching. The best way to overcome this is to understand the manifest and related xml files. Explore and play around with the uncompressed export of an existing list or library using stsadm or the Export-SPWeb cmdlet. See how the structure, related elements and attributes interplay.

Sample stsadm command:

stsadm -o export -url "<url to the list/library>" -filename "<location to save locally>" -versions 4 -nofilecompression

Example:
stsadm -o export -url "http://jaydave-2010/SampleSite/DocLibWithVersions" -filename "C:\Backup\DocLibWithVersions" -versions 4 -nofilecompression

Sample Export-SPWeb (2013 PowerShell) command:

Export-SPWeb -Identity "<url to site>" -ItemUrl "<server relative path to list/library>" -Path "<location to save locally>" -NoFileCompression -IncludeVersions All -IncludeUserSecurity

Export-SPWeb -Identity "http://jaydave-2013/sites/TeamSC" -ItemUrl "/sites/TeamSC/VersionDocLib" -Path "C:\Backup\VersionDocLib" -NoFileCompression -IncludeVersions All -IncludeUserSecurity

Required Manifest files

  • ExportSettings.xml
  • LookupListMap.xml
  • Manifest.xml
  • Requirements.xml
  • RootObjectMap.xml
  • SystemData.xml
  • UserGroup.xml
  • ViewFormsList.xml 

To get more information on each of these files, see the Content migration schemas reference under Additional related reading below.

Keep things simple

If you’ve never used any of the export commands or are not familiar with the export structure, then I’d recommend you start with a simple scenario first – create a new document library with one document and export that. Then add a folder and export that; add a sub-folder and a few documents in each, export, and so on. Later on you can explore permissions, items with different users, and how the related XML files interplay.

SP 2013 – you can also use the Import-SPWeb PowerShell command on your on-premises SP 2013 farm to test the package you have created against a provisioned, empty target document library. As the pipeline uses a subset of the same API, this should give you an indication of the validity of your package before submitting it to the pipeline.
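
Sample Import-SPWeb test command (the site URL and path are illustrative):

Import-SPWeb -Identity "http://jaydave-2013/sites/TestSC" -Path "C:\Backup\VersionDocLib" -NoFileCompression -IncludeUserSecurity
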
However, you’ll soon realize that a number of more complex operations are required when moving more than a few simple files. Retaining migration fidelity and preserving all the important aspects of your data will require much effort, as you’ll need to create the correct XML for the Manifest and related files.

Higher fidelity migration considerations

If your use cases are around migrating content from SharePoint or another content management system to SPO, you’ll likely encounter common aspects such as:

  • Item metadata
    • Setting the correct values for the appropriate SPObject element in the Manifest.
    • Attributes that should be set in a particular format. You may need to transform a value, as SPO may require a different format than what you read directly from your source content management system.
  • Ensuring referenced values (such as GUIDS for Lookup Lists) are specified correctly in the Manifest
    • If you’re migrating folders into an existing SPO document library then you’ll need to consider what target parent folders to include in the Manifest (even though you aren’t migrating to those parent folders).
    • References to Lookup Lists in the same scope, ensuring the related attributes are set in the Manifest and related xml files.
    • You’ll only know if there’s an error once you submit the migration package and wait for it to process; this can be extremely time consuming when generating entries in the Manifest for different types of lists and not just document libraries, such as custom lists and attachments.
  • Versions
    • Obtaining all the versions for a document and then ensuring the Manifest contains the appropriate SPObject and type, referring to the appropriate version for that document and in the correct order, including all the metadata for each version.
  • Access control – Users, Permissions and Roles
    • Ensure the Manifest and UserGroup.xml contain all required users and are correctly mapped to O365 Active Directory users for the respective SPO target.
      • Ensure item metadata references the user Id value rather than the actual user@domain format.
    • Ensuring the Manifest contains the correctly specified SPObject DeploymentRoles and DeploymentRoleAssignments elements.
    • Ensuring List or Libraries and Items have the correct entries in the Manifest.
  • Handling embedded Properties in Office documents
    • Ensuring these properties do not override the item metadata during the import into SPO
  • Pre- and post-migration updates using CSOM for list or library operations not supported by the pipeline (see the sketch after this list).
    • Provisioning the fields and content types in a list or library in advance
    • Post-migration updates to ensure workflow associations are migrated and applied
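
For the pre-provisioning step, a minimal CSOM sketch (reusing the $ctx context from the earlier sketch; the list title and field XML are hypothetical examples):

$lci = New-Object Microsoft.SharePoint.Client.ListCreationInformation
$lci.Title = "VersionDocLib"
$lci.TemplateType = [int][Microsoft.SharePoint.Client.ListTemplateType]::DocumentLibrary
$list = $ctx.Web.Lists.Add($lci)

# Provision the columns the manifest will reference before submitting the job
$null = $list.Fields.AddFieldAsXml('<Field Type="Text" DisplayName="Department" Name="Department" />', $true, [Microsoft.SharePoint.Client.AddFieldOptions]::DefaultValue)
$ctx.ExecuteQuery()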

Manual manifest testing

If you submit a migration job and it fails, and analysis of the logs shows that something unspecified raised an exception during the import process, it may at times be quicker to tweak the manifest XML manually to fix minor issues. You can then use Azure Storage Explorer to upload or replace the Manifest files in your respective container and submit the job to the pipeline again.

New Microsoft CSOM API Migration Methods

The following methods are available in the latest SharePoint Online Client Side Object Model. Here’s a link to the current version as of the time of this post:

https://www.nuget.org/packages/Microsoft.SharePointOnline.CSOM/16.1.3912.1204

Create Migration Job
https://msdn.microsoft.com/EN-US/library/office/microsoft.sharepoint.client.site.createmigrationjob.aspx

Delete Migration Job
https://msdn.microsoft.com/EN-US/library/office/microsoft.sharepoint.client.site.deletemigrationjob.aspx

Get Migration Job Status
https://msdn.microsoft.com/EN-US/library/office/microsoft.sharepoint.client.site.getmigrationjobstatus.aspx

There’s also a set of new PowerShell commands that help import file shares into SPO:
https://technet.microsoft.com/en-us/library/mt143608.aspx

Azure Storage

Use the latest Azure Storage API to create containers and upload files. Here’s a link to the current version as of the time of this post:

https://www.nuget.org/packages/WindowsAzure.Storage/4.3.0

Tips for Large Migrations

  • Target as many bots (migration job processors) as possible – this will happen when jobs go to different content databases.
  • Make sure the sites you are migrating are not all in the same content database – if they are, it is highly recommended to contact Microsoft to ensure site collections are created within different content databases. If you’re migrating TBs of data, ensure that your site collections are split across several content databases before you provision your target structure in SPO.
  • Utilize the services of Microsoft America’s Cloud Services (ACS) or Microsoft Consulting Services (MCS), and of course use Metalogix Content Matrix.

Additional related reading
Migration to SharePoint Online Best Practices and New API Investments [video]
http://channel9.msdn.com/Events/Ignite/2015/BRK3153

Content migration schemas
https://msdn.microsoft.com/en-us/library/bb249989(v=office.15).aspx

Deployment Package Format (MS-PRIMEPF)
https://msdn.microsoft.com/en-us/library/cc313163(v=office.12).aspx

Microsoft SharePoint Deployment (this is ultimately what the back end will use)
https://msdn.microsoft.com/EN-US/library/office/microsoft.sharepoint.deployment(v=office.14).aspx

stsadm (2007, 2010)
https://technet.microsoft.com/en-us/library/cc261956%28v=office.12%29.aspx

2013 SharePoint Export-SPWeb PowerShell command (to export lists and libraries – to help understand the manifest)
https://technet.microsoft.com/en-us/library/ff607895.aspx

SAS Keys
http://azure.microsoft.com/en-in/documentation/articles/storage-dotnet-shared-access-signature-part-1/

Azure Storage API
https://www.nuget.org/packages/WindowsAzure.Storage/4.3.0


Written By: Jay Dave
