Revisiting SharePoint Remote BLOB Storage

By Trevor Hellebuyck | October 04, 2011

Since the release of SharePoint 2010 there has been a lot of debate over the value and application of Remote BLOB Storage (RBS).  That debate has been reinvigorated by the updated Plan for Software Boundaries guidance published with Service Pack 1 for SharePoint 2010.  Frankly, I am disturbed by the volume of blog posts that contain inaccurate information about RBS and the updated Software Boundaries and Limits guidance.  It seems that some “experts’” influence far outweighs their competence on the subject of RBS.  I have personally seen the resulting confusion in the ecosystem manifest itself in conversations I have with customers on a daily and weekly basis.  For this reason, I think it is pertinent to revisit the value of BLOB externalization, correct common misconceptions, and discuss how the updated guidance from Microsoft may affect your decision to implement SharePoint with a remote BLOB storage solution.

A Brief History of BLOB Externalization

First, a brief history lesson before we tackle RBS.  BLOB externalization is not a new concept.  In fact, most legacy ECM systems store unstructured data (files) separately from metadata held in a relational database.  Microsoft originally built SharePoint this way, using the same Web Storage System that Exchange Server uses.  With the release of WSS 2.0/SharePoint 2003, Microsoft moved to storing all data (structured and unstructured) within SQL Server databases.  Many vendors attempted to address database growth with archive tools that pulled BLOBs from the database in a post-processing batch job.  While this solved database bloat, it created compatibility issues with out-of-the-box and third-party SharePoint solutions, which had to be “aware” of the archive product in use and understand how to interpret the stub left behind.  Definitely not the most elegant solution, and it certainly didn’t address the core issue.  It wasn’t until Service Pack 1 for SharePoint 2007/WSS 3.0 that Microsoft introduced support for BLOB externalization via the EBS interface.  Microsoft subsequently introduced support for Remote BLOB Storage (RBS), along with continued support for EBS, in SharePoint 2010.

BLOB externalization isn't about being able to leverage commodity disk, but rather about leveraging the optimal disk for the content being managed and stored.  The goal is to make sure that patient records, invoices, purchase orders, lunch menus, and vacation pictures each land on the most appropriate storage device.  For obvious reasons, not all content is created equal, nor should it be treated as such.  Consequently, there are scenarios that SharePoint simply cannot support out of the box or with the RBS FileStream provider.  For example, take SEC Rule 17a-4 (Electronic Storage of Broker-Dealer Records) requirements for client/customer records.  Once declared a record, any client-related document (an IRA account opening document, for example) has specific storage requirements: it must be immutable and unalterable.  Third-party RBS products such as Metalogix StoragePoint facilitate this scenario through support for WORM (Write Once, Read Many) and CAS (Content Addressable Storage) devices.

In the process of optimizing the storage environment for SharePoint, BLOB externalization accomplishes some critical goals.  It is no secret that relational databases (including SQL Server) are not the ideal place to store large pieces of unstructured data.  No, this isn’t a dig at SQL Server but rather stating the obvious fact that a system optimized for the storage of highly relational, fine-grained transactional data is not an ideal place to store large pieces of unstructured data.

The problem with SQL Server is that the performance cost of storing BLOBs in the database is high.  Consider the RPC call between the web server and the database that carries a large payload (metadata plus the BLOB), along with the I/O, processor, and memory required to store the BLOB, and you have a very expensive process.  Yes, Microsoft has optimized BLOB performance in subsequent releases of SQL Server, but it is still better to store BLOBs outside the database when you consider a typical SharePoint farm under load, or a bulk operation such as a full crawl.  The updated guidance from Microsoft certainly supports this assertion.

Microsoft itself has documented this fact in many of its own publications and even alluded to the initial value of BLOB externalization as being a way to improve the performance of your SharePoint environment.  Additionally SQL Server is very rigid in terms of the type of storage it can leverage and methods in which you back up the environment.  This brings me to my next point.  What was the intent of providing BLOB externalization interfaces within the SharePoint product in the first place?

The original sizing guideline/limit for content databases with WSS 3.0/SharePoint 2007 was 100 GB (for collaboration sites).  With SharePoint 2010, Microsoft increased the limit to 200 GB, and changed it yet again with Service Pack 1 for SharePoint 2010 (more on this later).  These limits proved problematic for many organizations looking to implement SharePoint pervasively.  Not only is database growth a problem; there are challenges with segmenting content to work around database size restrictions, SQL Server is a less than optimal place to store BLOBs, and backup/restore becomes challenging as SharePoint environments continue to grow in both size and criticality.

Microsoft originally positioned BLOB externalization as a way to reduce the size of your SharePoint content database.  While there is some debate on this topic, it is generally agreed upon in the SharePoint community that the content database size limitations did NOT include externalized BLOBs (this changes with Service Pack 1 for SharePoint 2010).  When the StoragePoint team released StoragePoint 2.0, we spent quite a bit of time creating and shaping the messaging for the product.  That messaging included the following benefits, which still hold true as the basis for BLOB externalization:

  • Reduce the size of your SharePoint content databases by externalizing BLOBs.  Roughly 90-95% of a content database consists of BLOBs (this varies with auditing enabled).  By externalizing BLOBs, we can reduce the size and number of databases required to support your environment.
  • Optimize performance by freeing SQL Server from the burden of managing unstructured pieces of data.
  • Support a variety of storage platforms based on business requirements (storage costs, compliance, performance, etc.), including SAN, NAS, and cloud storage.
  • Create new opportunities to replicate and back up SharePoint content in a more efficient manner.

If you consider that roughly 90-95% of a content database is comprised of BLOBs, then you stand to see a significant reduction in database size and an increase in the amount of content you can manage per content database.  One of the metrics we often referred to with StoragePoint was the management of 2 TB of content.  If you reduce a 2 TB content database by 95%, you end up with a 102.4 GB content database and 1945.6 GB (1.9 TB) of externalized BLOBs.  This would be well within the database size limits for SharePoint 2010 and at the high end of the limit for SharePoint 2007.  Sounds familiar, doesn’t it?  I think I have seen something like this in the SP1 limits for SharePoint 2010 … let’s take a look.
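The reduction arithmetic above can be sketched in a few lines.  This is a back-of-the-envelope illustration assuming the 95% BLOB share cited in this post; real ratios vary from farm to farm.

```python
def externalize(total_gb: float, blob_share: float = 0.95):
    """Split a content database into its residual DB size and externalized BLOBs."""
    blobs_gb = total_gb * blob_share   # BLOBs moved to external storage
    db_gb = total_gb - blobs_gb        # metadata and structure left in SQL Server
    return db_gb, blobs_gb

# A 2 TB (2048 GB) content database reduced by 95%:
db_gb, blobs_gb = externalize(2048)
print(f"Content DB: {db_gb:.1f} GB, externalized BLOBs: {blobs_gb:.1f} GB")
```

With a 2 TB input this yields the 102.4 GB database and 1945.6 GB of BLOBs described above.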

Service Pack 1 Consideration

Prior to the release of Service Pack 1 for SharePoint 2010, the content database size limit did not include externalized BLOBs (yes, this is debatable, but I can tell you it is generally accepted, given the lack of clarity in the original Plan for Software Boundaries documentation).  Microsoft revised this guidance along with the database size limits.  For SharePoint 2010, the 200 GB size limit is still in effect, with a new option to expand a “collaboration” site to 4 TB.  Now for the fun part … in order to expand a content database beyond the 200 GB limit, you need an optimized SQL Server disk subsystem.  Specifically, Microsoft recommends 2 IOPS (input/output operations per second) per GB of storage.  Note that I am generalizing a bit on the SP1 guidelines and limits, so you can read them for yourself here.

While the database “size” limitation includes externalized BLOBs in the calculation, externalized BLOBs are not included in the IOPS requirement.  In order to manage a 4 TB database, you must have a disk subsystem that supports 2 IOPS per GB.  If you are not familiar with this concept, I can tell you that this is an expensive disk configuration (more on this below).  With StoragePoint in place you can have a 4 TB “content database” that consists of approximately 200 GB in the SQL database and 3.8 TB of externalized BLOBs, all without an expensive disk requirement.  Sound familiar?  This is the same messaging that was advertised with the original release of StoragePoint 2.0 for SharePoint 2007/WSS 3.0 SP1.  If you believe the IOPS requirement includes the externalized BLOBs, then you have to discount Microsoft's support for NAS storage (via iSCSI) with the RBS FileStream provider; most NAS devices were not intended to support such a high level of IOPS.  The new guidance simply reaffirms what the StoragePoint team has asserted all along.  Using a 95% reduction in the database (a typical database is comprised of 95% BLOBs), you would end up with a 200 GB content database (within Microsoft's original guidelines).  If you decide to keep the BLOBs in the database, then you need lots of expensive disk to maintain the performance of your environment.

Let’s take a practical example following Microsoft’s database size limits and guidelines for disk performance and determine what a reasonable disk subsystem might look like.  Remember that Microsoft requires 0.25 IOPS per GB for content databases over 200 GB (2 IOPS per GB is highly recommended for optimal performance).  Note that in order to keep things brief and to the point, I am using some rough estimates to calculate IOPS.  Disk performance is impacted by hard disk specs, RAID level, and controllers.

IOPS per disk ≈ 1000 / (average latency in ms + average seek time in ms)
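As a quick sanity check of the formula (with latency and seek time in milliseconds, the numerator is 1000, i.e., milliseconds per second), here is a small sketch.  The latency and seek figures are illustrative assumptions, not vendor specifications.

```python
def disk_iops(avg_latency_ms: float, avg_seek_ms: float) -> float:
    # 1000 ms per second divided by the total service time per I/O in ms
    return 1000.0 / (avg_latency_ms + avg_seek_ms)

# e.g. a 7200 RPM SATA drive: ~4.2 ms rotational latency, ~7 ms average seek
print(round(disk_iops(4.2, 7.0)))  # roughly 89 IOPS
```

That result lands close to the per-disk figures commonly quoted for 7200 RPM SATA drives.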

The following tables illustrate the number of disks required to achieve both 0.25 IOPS per GB (the minimum requirement) and 2 IOPS per GB (recommended).  For this example we assume the IOPS requirement applies only to data beyond 200 GB, leaving 3.8 TB of data that requires an optimal disk configuration (minimum IOPS = 972; recommended IOPS = 7792).  Note the following assumptions used when calculating IOPS in the tables below.

  1. Estimated IOPS figures were used for each disk type.  Actual IOPS will vary by disk model and manufacturer.
  2. RAID 5 and RAID 10 configurations were used, as these tend to be the most common configurations for database servers (RAID 10 being the preferred configuration).
  3. The calculations assume that 0.25 IOPS/GB and 2 IOPS/GB are required only for data above 200 GB; the initial 200 GB is not included in the minimum and recommended IOPS calculations.  Including it would require an additional 50 and 400 IOPS respectively, and therefore additional disks.
  4. There is an IOPS penalty that varies with the RAID configuration.  For RAID 10 the penalty factor is calculated at 0.8, and for RAID 5 at 0.57.
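These assumptions translate into a rough spindle-count estimate.  The sketch below is a back-of-the-envelope calculation, not vendor sizing guidance; the table figures also reflect rounding choices (such as even disk counts for mirrored RAID 10 sets), so small differences from a plain ceiling calculation are expected.

```python
import math

# RAID IOPS penalty factors from assumption 4 above
RAID_PENALTY = {"RAID 10": 0.8, "RAID 5": 0.57}

def disks_needed(required_iops: float, iops_per_disk: float, raid: str) -> int:
    """Smallest number of disks whose penalized IOPS meet the requirement."""
    effective_iops = iops_per_disk * RAID_PENALTY[raid]
    return math.ceil(required_iops / effective_iops)

# Minimum requirement: 3.8 TB beyond the first 200 GB, at 0.25 IOPS per GB
data_gb = 3.8 * 1024
min_iops = data_gb * 0.25  # roughly 973 IOPS
print(disks_needed(min_iops, 90, "RAID 10"))  # 14 x 7200 RPM SATA disks
```

At the recommended 2 IOPS per GB the same calculation produces the much larger spindle counts shown in the second table.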

Disk Configuration Sample for Minimum IOPS

| Drive Type | IOPS per Disk | RAID Level | Disk Capacity (GB) | # Disks | Usable Capacity (GB) | Max IOPS |
|---|---|---|---|---|---|---|
| 7200 RPM SATA | 90 | RAID 10 | 1024 | 14 | 7168 | 1008 |
| 10000 RPM SATA | 130 | RAID 10 | 1024 | 10 | 5120 | 1040 |
| 10000 RPM SAS | 140 | RAID 10 | 1024 | 10 | 5120 | 1120 |
| 15000 RPM SAS | 180 | RAID 10 | 1024 | 8 | 4096 | 1152 |
| 7200 RPM SATA | 90 | RAID 5 | 512 | 20 | 9216 | 1026 |
| 10000 RPM SATA | 130 | RAID 5 | 512 | 14 | 6144 | 1037.4 |
| 10000 RPM SAS | 140 | RAID 5 | 512 | 14 | 6144 | 1117.2 |
| 15000 RPM SAS | 180 | RAID 5 | 512 | 10 | 4096 | 1026 |

Disk Configuration Sample for Recommended IOPS

| Drive Type | IOPS per Disk | RAID Level | Disk Capacity (GB) | # Disks | Usable Capacity (GB) | Min IOPS | Max IOPS |
|---|---|---|---|---|---|---|---|
| 7200 RPM SATA | 90 | RAID 10 | 1024 | 110 | 56320 | 1008 | 7920 |
| 10000 RPM SATA | 130 | RAID 10 | 1024 | 76 | 38912 | 1040 | 7904 |
| 10000 RPM SAS | 140 | RAID 10 | 1024 | 70 | 35840 | 1120 | 7840 |
| 15000 RPM SAS | 180 | RAID 10 | 1024 | 56 | 28672 | 1152 | 8064 |
| 7200 RPM SATA | 90 | RAID 5 | 512 | 110 | 28160 | 1026 | 5643 |
| 10000 RPM SATA | 130 | RAID 5 | 512 | 76 | 19456 | 1037.4 | 5631.6 |
| 10000 RPM SAS | 140 | RAID 5 | 512 | 70 | 17920 | 1117.2 | 5586 |
| 15000 RPM SAS | 180 | RAID 5 | 512 | 56 | 14336 | 1026 | 5745.6 |

As you can see, the IOPS requirements can demand a significant number of disks to support both the minimum and recommended levels for a single 4 TB content database.  The result is an expensive disk subsystem that is often overprovisioned just to meet IOPS requirements when BLOBs are kept in SQL Server databases.  When you consider replication of environments for disaster recovery and nonproduction scenarios (i.e., moving production data into a nonproduction environment for testing), organizations experience a 2-5X multiplier on the disk subsystem required to support SQL Server.  Obviously this is not the ideal scenario for most organizations deploying SharePoint at any reasonable scale.  RBS and products like Metalogix StoragePoint allow organizations to store content on the appropriate storage without having to meet an expensive IOPS requirement.

Why Not Just Use the RBS FileStream Provider?

Somehow the RBS FileStream provider has evolved into a solution that some would actually consider for a medium- or large-scale SharePoint environment.  I think folks forget why this provider was created in the first place.  WSS 3.0 with the WID (Windows Internal Database) option does not have a database size limit.  In theory, and in practice, organizations can and have stuffed large volumes of content into this “at no additional charge” product.  With the release of SharePoint Foundation 2010, which uses SQL Server 2008 Express Edition, Microsoft introduced database size limits.  SQL Server 2008 R2 Express Edition has a 10 GB limit per database … wait for it … now you see the problem.  How can a customer upgrade without buying SQL Server licenses?  Enter the RBS FileStream provider.

The problem with the RBS FileStream provider is that it lacks the basic features required to call it an enterprise solution: a user interface, support for truly “remote” storage, and a multithreaded garbage collection process (this last issue plagues many StoragePoint competitors, which opt to use the out-of-the-box garbage collector that ships with RBS).  More importantly, it fails to address a fundamental challenge: RBS FileStream does not bypass SQL Server for the processing of BLOBs.  It pulls the BLOB out of the initial RPC call and then redirects it right back to SQL Server using the FILESTREAM column type.  For obvious reasons this is not an efficient process.  I am not saying the RBS FileStream provider is not a viable solution, but organizations considering this option should proceed with caution; backing out of it once you have amassed large volumes of content can prove cumbersome and time-consuming.

Backup and Restore Considerations

Backup/restore and disaster recovery can be a complex topic and, for this reason, I am not going to explore this in great detail in this post. Any RBS solution for SharePoint, including StoragePoint, will change the process for backing up and restoring SharePoint environments. What’s lost on most people is that this is not necessarily a negative aspect of RBS. Often the change is very positive and provides new ways for backing up SharePoint environments that weren’t previously possible.

Before we explore backup/restore processes, it is important to understand the anatomy of a BLOB when it is stored outside of SharePoint content databases. Externalized BLOBs are immutable, meaning they never change once written to external storage. There is a one-to-one relationship between a BLOB and a given version of a file/document in SharePoint, so SharePoint only ever creates and deletes BLOBs (StoragePoint actually deletes them as part of a garbage collection process). It may not be immediately apparent, but this is a good thing. Traditionally you would back up SharePoint content databases using a simple or full recovery model, taking full backups on a regular basis that contain objects that will never, ever change. This is less than efficient. By separating BLOBs from the database, you can back up (or replicate) a BLOB one time rather than capturing it in multiple backups. This approach reduces backup storage costs and makes DR scenarios (warm/hot failover) possible.
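The backup benefit of immutability can be illustrated with a small sketch.  This is a hypothetical content-addressed store, not StoragePoint's actual implementation: because each BLOB is written once and never changes, an incremental backup only ever needs to copy keys it hasn't seen before.

```python
import hashlib

class ImmutableBlobStore:
    """Toy write-once BLOB store keyed by content hash (illustrative only)."""

    def __init__(self):
        self._blobs = {}         # key -> bytes; stands in for a file share or NAS
        self._backed_up = set()  # keys already copied to backup storage

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)  # never overwritten once written
        return key

    def incremental_backup(self) -> list:
        """Copy only BLOBs not captured by a previous backup run."""
        new_keys = [k for k in self._blobs if k not in self._backed_up]
        self._backed_up.update(new_keys)
        return new_keys

store = ImmutableBlobStore()
store.put(b"invoice v1")
print(len(store.incremental_backup()))  # first run copies 1 BLOB
store.put(b"invoice v1")                # same content: nothing new to store
print(len(store.incremental_backup()))  # second run copies 0
```

Contrast this with full database backups, which recapture every unchanged BLOB on every run.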

In general, the backup process involves backing up the content database followed by the external BLOB store(s). A farm-level restore would involve restoring your BLOB store followed by your content database(s). In many cases it isn’t necessary to back up the external BLOB store, as there are ways to replicate it to multiple locations. Item-level restores tend to be the area of biggest concern when using an RBS solution like StoragePoint. Fortunately, StoragePoint has built-in features that make item-level restore feasible: a feature called “Orphaned BLOB Retention Policies” allows for the retention of BLOBs whose corresponding database items have been deleted. These retention policies are used in conjunction with item-level restore tools to guarantee that item-level restore is available for a definable period of time.

Conclusion 

RBS is clearly a viable option for organizations using SharePoint where the environment will grow consistently or even exponentially over a period of time.  Microsoft’s updated guidelines and database size limits are a confirmation of sorts for the opportunity that RBS presents for SharePoint deployments.  If you are deploying SharePoint in any capacity, you should consider RBS as an option for optimizing the storage, for both active and archive content for your SharePoint environment.


Trevor is a recognized SharePoint innovator and the principal architect of StoragePoint, the ground-breaking storage optimization software for SharePoint. Trevor joined Metalogix in 2010 with the acquisition of BlueThread Technologies. He was instrumental in the launch and growth of StoragePoint while serving as Chief Operating Officer (COO) of BlueThread Technologies, a company focused on developing applications for Microsoft products and technologies. StoragePoint has become the #1 Remote BLOB Storage (RBS) solution in the market. Prior to BlueThread, Trevor led technology teams in Enterprise Content Management (ECM) and Enterprise Application Integration at NuSoft Solutions, acquired by RCM Technologies in 2008. 

