Stretched farms and SharePoint 2013: what I don’t understand

One of my customers (a huge company) is running MOSS 2007 in a stretched farm configuration (two data centers) and recently decided to move to SharePoint 2013. Given the current SLA, the stretched farm configuration provides a good high availability solution, and the current disaster recovery option is OK.

Since the SLA is the same, the first option to consider was to stick with the stretched farm configuration, which is quite a convenient architecture, at least from a high availability point of view: the servers in both data centers are used all the time and not only in case of disaster recovery, which is less expensive (in terms of servers and SharePoint licenses) than having two farms (one in each DC).

Let’s put it like this: our customers love the stretched farm.

The problem is that the requirements for a supported stretched farm are very strict.

Initially, when SharePoint 2013 shipped (in January 2013), the stretched farm architecture was no longer supported.

Then in May 2013 Microsoft changed its mind and started supporting stretched farms again, but with very strict requirements.

The TechNet documentation says:

“For a stretched farm architecture to work as a supported high-availability solution, the following prerequisites must be met:

  • There is a highly consistent intra-farm latency of <1ms one way, 99.9% of the time over a period of ten minutes. (Intra-farm latency is commonly defined as the latency between the front-end web servers and the database servers.)

  • The bandwidth speed must be at least 1 gigabit per second.”

If you want to check it on your farm, use the great script developed by Eric Strachan (Microsoft).
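
To make the quoted prerequisite concrete, here is a minimal Python sketch of what the check amounts to. It is not Eric's script, and it rests on my own assumptions: the samples are round-trip times in milliseconds that you collected yourself over a ten-minute window (for example one ping per second between a web server and the SQL server), and one-way latency is approximated as half the round trip. The function name and constants are hypothetical.

```python
from typing import Sequence

# Thresholds taken from the quoted prerequisite.
ONE_WAY_LIMIT_MS = 1.0        # "<1ms one way"
REQUIRED_PER_THOUSAND = 999   # "99.9% of the time"


def meets_latency_requirement(round_trip_ms: Sequence[float]) -> bool:
    """Return True if at least 99.9% of the samples are under 1 ms one way.

    Assumption: one-way latency is estimated as round trip / 2, which is
    only an approximation on asymmetric network paths.
    """
    if not round_trip_ms:
        raise ValueError("no latency samples provided")

    # Count the samples whose estimated one-way latency is strictly under 1 ms.
    within = sum(1 for rtt in round_trip_ms if rtt / 2.0 < ONE_WAY_LIMIT_MS)

    # Integer comparison avoids floating-point rounding on the 99.9% boundary.
    return within * 1000 >= REQUIRED_PER_THOUSAND * len(round_trip_ms)
```

With one sample per second for ten minutes (600 samples), a single sample above the limit already drops you to about 99.83% and fails the check, which gives an idea of how tight the requirement is.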

Problem number 1: not realistic

This is almost impossible to achieve, especially in virtualized environments. Believe me, I’ve checked it with several customers. In labs we can get there, but rarely in the real world, and by the REAL WORLD I mean PRODUCTION environments and CUSTOMERS.

I’m not even sure that SharePoint farms in Office 365 meet these requirements…

Problem number 2: and within the same DC?

We don’t really understand this: why does the requirement matter specifically when we have two data centers? What matters is the latency between ANY web server and the SQL servers, right? So why is it required in the stretched farm scenario (between two DCs) and not within a single DC? Latency is latency…

Does that mean that single data center SharePoint 2013 farms that don’t meet the latency requirements are not supported?

Does this mean that if we don’t meet the latency requirement, moving to a two-farm architecture (one in each DC) won’t solve our problem, because the requirement is the same within a single DC?

Problem number 3: OWA

There is something else: the TechNet documentation clearly specifies that Office Web Apps (OWA) is not supported in stretched mode:

Stick to one data center. Servers in an Office Web Apps Server farm must be in the same data center. Don’t distribute them geographically. Generally you need only one farm, unless you have security needs that require an isolated network that has its own Office Web Apps Server farm

Sooo? As far as I know, if you don’t meet this requirement, you’ll need several farms in several data centers (this is more expensive), and you will have to rely on a synchronization mechanism like log shipping, mirroring or our favorite technology: SQL Server AlwaysOn Availability Groups, which will make your database administrator’s life more comfortable (we have many databases to synchronize in SharePoint 2013, so move them to the same availability group and sync the group as a whole).

Here is the lab that my partner Isabelle Van Campenhoudt and I successfully completed last year. We still have to test it in Azure: last year Azure didn’t support availability group listeners, but now it does.

And to be honest, I like the SharePoint 2013 documentation, which specifies (for AlwaysOn Availability Groups): “Replicas can be on different subnets as long as latency does not cause performance issues.”

 

So my (customer’s) message to Microsoft is: please clarify the situation and update the TechNet documentation. SharePoint 2013 is a great product, we love it, and some of our customers are not yet ready for Office 365. (By the way, my customers want a better SLA than what Office 365 provides, which is pretty much OK for most customers, believe me.)

Message to my customers:

  1. NO, SharePoint is NOT dead, but its future is in the cloud: Office 365.
  2. Microsoft will provide new versions of SharePoint on-premises.
  3. Office 365 is a nice product that you will use eventually (and yes, your policy will change).

5 responses to “Stretched farms and SharePoint 2013: what I don’t understand”

  1. Hi Serge,
    Thanks for this post… And I can guess who this customer might be :).
    However, I have to disagree with you and the customer on 2 things:
    1. Yes, it IS possible to meet the requirements for a stretched farm. It is a question of means and of the priority you wish to give SharePoint. Note: I have implemented other MS distributed apps with the same requirements in the past and, helped by skilled network guys, it was not a problem at all. The problem with larger SharePoint on-prem customers is that their organization is too divided to take responsibility for the final result.
    2. I’ve always considered the stretched farm as a means to achieve HA but NOT DRP, because it will NOT protect you against some kinds of disaster.

    I’ll be happy to discuss this with you at next SPS in Belgium 🙂

    Marc L.

  2. I’m currently working on the same problem. I simply don’t buy the 1msec requirement. It seemed to suddenly appear in 2012 when stretched farms leveraging SQL mirroring for HA were gaining popularity. A TAM I knew stated that the SharePoint group pulled it out of thin air to discourage geographically separated stretched farms.
    I’ve been researching it for six months, and all I ever see is the broken record “1ms latency between servers.” The problem is that, as you state, I don’t think most folks really have that. When I was first building SP2010 farms I know for a fact we tried to keep latency between servers under 100msec.
    So I believe this is a case that some engineer pulled the number out of thin air and it became law. After that, nobody ever considered actually doing testing on it. Whenever I try to challenge the 1msec, the reaction I get is almost religious in nature – it’s heresy to challenge the Microsoft Doctrine.

    • Actually, having latency greater than 1ms increases the risk of corruption in the configuration database. Alternatives: be ready to rebuild your farm, or choose 2 farms synced via AlwaysOn AG in async mode, log shipping, or mirroring. Carefully read the search and user profile DR options recently published on TechNet 😉

      • “having latency greater than 1ms increases the risk of corruption in the configuration database”

        I’ve been working with SharePoint for 11 years and this is the first time I’ve ever heard this. Also, “increases the risk” is not a valid answer, since it suggests there is a risk in the first place, and it increases as latency grows. Are you truly suggesting that at 1msec latency the possibility of corruption just skyrockets? Or does it “increase” 1%?

        As a side note, if what you are saying is true (and I’d like to see a cite), then it introduces a concern I’ve never had about SharePoint – if the config database is truly that sensitive, I may have to reevaluate my recommendation of SharePoint at all.

        A final note: Azure doesn’t provide an SLA for latency between virtual machines. So it sounds like putting SharePoint on Azure isn’t officially supported.

  3. @philo: regarding the 1msec latency and DB corruption risks: yes, this is what MS product support AND members of the PG say. Basically the problem is that SP does not follow 3-tier architecture principles but is a 2-tier application, and there is no centralized service layer providing transactions between the web front ends and the DB; at some point garbage can be written to the DB, which means logical corruption.
    What I don’t know is the probability.
