Monday, August 17, 2015

Project-level Best Practices for delivering a performant Add-in based application

Admittedly, all of the points below state the obvious!
Yet in practice they are often overlooked, or deliberately bypassed based on wrong reasoning (e.g. to save time now in the project), in development projects that deliver a SharePoint Add-in based application:
  1. Performance and Capacity Management must be treated as an integral subject throughout the application development project
  2. In the requirements, agree with the application owner on the performance aspects: the page download times considered acceptable, and the expected parallel usage of the application
  3. Include performance best practices in the development guidelines. Make sure that all involved developers know the guidelines, and that they apply them in the Add-ins they individually develop
  4. Do not only develop and test SharePoint Add-ins ('Apps') in isolation; also conduct integration tests with multiple Add-ins on a page, as the user is likely to use them, and monitor the page payload
  5. Thoroughly intake any external Add-in before purchase; assess it on architecture, functionality, capacity management and maintenance aspects
  6. Structurally monitor and validate the application performance during the project, to detect at an early stage when something is introduced that severely degrades performance. Load testing is a good means to implement this performance quality assurance in the project

Some architectural and technical tips

  • Cache where appropriate, but be aware that caching itself costs resources (see the caching sketch after this list)
  • Reduce the number of network roundtrips from client to application server - batch requests
  • Retrieve resources that are used in multiple Add-ins from a shared location - e.g. root site of the HostWeb, or a CDN (external or internal)
  • Reduce the impact of (NTLM) authentication by retrieving non-authorized resources from an anonymously accessible location
  • Utilize the sprites concept for images: combine multiple small images into a single downloaded file
  • Do the same for custom JavaScript; for maintenance it is good to separate responsibilities into different libraries, but for performance it is better to combine them into a single resource file
  • Minify the resource files: JavaScript and CSS
  • Apply lazy loading where appropriate (e.g. avoid the processing and retrieval cost of Add-in functionality that is initially not visible and/or only rarely used, in favor of delayed execution if and only if the user actually intends to use the Add-in)
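
To make the caching tip concrete for the server side of a provider-hosted Add-in: a minimal sketch using the .NET MemoryCache (System.Runtime.Caching). The class and method names, the cache key and the 5-minute expiration are assumptions for the example, not prescriptions.

    using System;
    using System.Runtime.Caching;

    // Minimal caching sketch for the remote web of a provider-hosted Add-in.
    // LoadNewsItemsFromSharePoint stands for a hypothetical, expensive
    // CSOM/REST call whose result can be reused across requests.
    public static class NewsCache
    {
        private static readonly MemoryCache Cache = MemoryCache.Default;

        public static string GetNewsItems()
        {
            var cached = Cache.Get("NewsItems") as string;
            if (cached != null)
            {
                return cached; // served from cache: no roundtrip to SharePoint
            }

            string fresh = LoadNewsItemsFromSharePoint();
            Cache.Set("NewsItems", fresh, new CacheItemPolicy
            {
                // the expiration bounds the resource cost of the cache itself
                AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(5)
            });
            return fresh;
        }

        private static string LoadNewsItemsFromSharePoint()
        {
            return "..."; // placeholder for the actual remote call
        }
    }

The expiration is the knob that trades the saved SharePoint roundtrips against the memory held by the cache itself.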

Thursday, August 6, 2015

Load testing SharePoint Add-in (former App) Model

Validate healthy application performance behaviour

It is essential for any Enterprise Application that it can handle the varying usage load of its users in a performant and scalable manner. Nothing is as embarrassing as a new application that breaks soon after Go-Live under the enthusiastic usage of its users. To prevent this, you must build trust in the scalability of the application, and establish before Go-Live that the application – application software + system infra – can handle the expected load. In short: introduce load testing.
This also holds for a modern SharePoint application that is composed with the Add-in model, the former SharePoint App model. But load testing the Add-in model does bring some extra peculiarities. I have enumerated below the ones I encountered.
And note: our load testing proved both valuable and successful. Initially the load test revealed some performance and scalability problems. We then made some essential changes in the application code (in particular in the applied custom Add-ins / Apps), until we achieved our usage load target. And at the crucial moment of Go-Live, the application did not flinch, and handled the usage load of more than 14,000 users perfectly.

Application Performance health factors

We monitored two health factors:
  1. Responsiveness of the application for the user, measured as Page Download Time
  2. Scalability of the SharePoint infra, measured as CPU, Memory and I/O utilization on the servers

Application Performance validation approach

  1. Identify target goals for application utilization
  2. "Green zone"
  3. Prove the health factors at the target-utilization goals via load testing, simulating the real usage
  4. Identify the ‘breaking’ point via increased load/stress testing
  5. "Red zone" - performance issues monitored
  6. Determine the root cause of the issue; this can be non-optimal code, or insufficient infra capacity (CPU, memory, network throughput, database IOPS)
  7. Fix the issue(s)
  8. Repeat the validation, from step 2

Load test execution

Load test preparation

  1. Identify the usage/application scenarios you will use to build trust. Select the scenarios that you expect to be representative of typical usage. A heavy transaction that in normal operation is only rarely executed will have a negligible effect on the application load.
  2. Establish the target load. This is the application load under average usage. For web applications, this is typically stated in 'Page Visits per Second'. Note that this is different from Requests per Second (RPS). In modern apps, a single page visit encompasses multiple http requests: for the page itself, for dependent resources such as javascript and css, and for javascript calls that execute service calls for data retrieval and application functions.
    Determining the concrete target value is a challenge in itself. One is easily tempted to overstate the target value - we have 'X' users, so the parallel application usage will be 'X * Y'... However, in reality those 'X' users do not all hit the application continuously: they log on at different times, stay on pages, use other applications, go to the coffee machine, ... In our setup we determined the target value in two ways (see the illustrative calculation after this list):
    1. Fact: as we were introducing a renewed intranet, we could reuse the application usage statistics of the current intranet;
    2. Prediction: determine the target value via the Microsoft (Bill Baer) Capacity Management formula, an unofficial best-practice recommendation
    And in our situation, the two values determined via these different paths were about the same, which confirmed to us that we had determined a realistic value.
  3. Establish the heavy load: this is abnormal but still foreseeable application usage, under special circumstances.
  4. Determine how to build trust: manual load testing, custom test software, or utilizing a load test tool – e.g. HP LoadRunner, Visual Studio LoadTest.
  5. Get sufficient test accounts to simulate different users. This is also required to prevent cache effects during load test execution, e.g. continuously retrieving the user profile values of the same user.
  6. Prepare the test context for the test accounts. E.g. if the application makes use of the SharePoint user profile, then the user profile must be provisioned for the test accounts to ensure realistic load behavior.
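
To illustrate how such a target value can be derived (the percentages and frequencies are assumed example figures, not our actual project numbers and not the exact formula referred to above): with 14,000 total users, of which at peak roughly 10% actively use the application, and each active user opening on average 36 pages per hour, the target load is 14,000 × 10% × 36 / 3600 ≈ 14 page visits per second.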

Peculiarities when setting up test scripts for Add-ins / Apps

  1. The load test scenario must participate in the App authentication flow. In essence, this means that the SPAppToken value must be set as a FORM POST parameter in the submit request to appredirect.aspx. The value is determined at runtime in the App launcher, and returned in the initial AppRedirect.aspx response.
    In the Visual Studio webtest recording, a reference is made to this hidden field in the response.
    We encountered that the SPAppToken value was not always successfully retrieved at runtime. In some circumstances this can be corrected by monitoring the traffic via Fiddler, and setting SPAppToken to a fixed value taken from the Fiddler trace.
  2. The FormDigest value is returned in the JSON response of the contextinfo call, instead of as a hidden FORM parameter in the response body.
    The resolution is to augment the Visual Studio load test: add an extraction rule that extracts the value from the /_api/contextinfo JSON response (see the extraction rule sketch after this list).
  3. By default, Visual Studio LoadTest execution does not mimic the browser cache, with the result that each dependent resource is requested over and over. You can fix this by configuring 'ParseDependentRequests = false' on the requests in the load test script (the plugin sketch after this list shows one way to apply this).
  4. Visual Studio LoadTest does not execute the javascript of the pages in a browser. If required, the activity of the javascript code must be simulated in the test scripts.
  5. With multiple provider-hosted Apps in the load test scenario, Visual Studio can make an error in the runtime construction of the load test recording and assign a wrong {app_?} context value. In such a case, you must manually add a '<ContextParameter Name="AppId_1" Value="<APP domain value>" />', and correct the relevant requests in the script so that they are sent to the correct app domain.
  6. The Visual Studio LoadTest recording misses setting the 'Origin' header, which hinders the CORS protocol handling (the plugin sketch after this list also covers this).
  7. You can easily overshoot the intended usage load via the 'concurrent users' configuration value. This configuration parameter is misleading: it does not really simulate actual users. It merely sets the number of threads in the load test execution, each of which continuously executes the webtest(s) in the load test scenario. Per thread, after finishing a webtest, the execution pauses for the think time value, and then repeats. If you set the think time to zero – which is what Microsoft advises on TechNet, "Don't use think times…" – the effect is that requests are fired against your application continuously. The load on the application is then much higher than the value configured in 'concurrent users' suggests.
  8. The Visual Studio load agent itself can become the limiting factor. Simulating a larger concurrent usage results in an equally large set of threads in the Visual Studio execution, all of them busy executing and monitoring a webtest instance. The CPU on the load agent grows to 100%, and the load no longer increases linearly with the number of 'concurrent users' aka threads.
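
For item 2 above, a custom extraction rule can pull the FormDigestValue out of the JSON body. A minimal sketch, assuming Visual Studio 2012 or later (where the ExtractionRule base class provides the ContextParameterName property); the plain string parsing keeps the example short, a real implementation would preferably use a JSON parser:

    using Microsoft.VisualStudio.TestTools.WebTesting;

    // Custom extraction rule: pulls the FormDigestValue out of the
    // /_api/contextinfo JSON response and stores it in the webtest context,
    // so that subsequent requests can send it in the X-RequestDigest header.
    public class FormDigestExtractionRule : ExtractionRule
    {
        public override void Extract(object sender, ExtractionEventArgs e)
        {
            const string marker = "\"FormDigestValue\":\"";
            string body = e.Response.BodyString;

            int start = body.IndexOf(marker);
            if (start < 0)
            {
                e.Success = false;
                e.Message = "FormDigestValue not found in contextinfo response";
                return;
            }
            start += marker.Length;
            int end = body.IndexOf('"', start);

            // ContextParameterName is configured on the rule instance in the
            // webtest, e.g. 'FormDigest'; requests then bind {{FormDigest}}
            e.WebTest.Context[this.ContextParameterName] = body.Substring(start, end - start);
            e.Success = true;
        }
    }

Attach the rule to the /_api/contextinfo request; the requests that need the digest then reference the extracted context parameter in their X-RequestDigest header.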
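
Items 3 and 6 can both be addressed centrally from a single webtest plugin, instead of editing every recorded request. Again a sketch; 'AppDomainUrl' is a hypothetical context parameter holding the provider-hosted app domain:

    using Microsoft.VisualStudio.TestTools.WebTesting;

    // Webtest plugin that (a) stops Visual Studio from downloading every
    // dependent resource over and over, mimicking a primed browser cache,
    // and (b) adds the 'Origin' header that the recorder misses, so that
    // the CORS protocol handling works.
    public class AddInWebTestPlugin : WebTestPlugin
    {
        public override void PreRequest(object sender, PreRequestEventArgs e)
        {
            // (a) do not parse and request dependent resources (js/css/images)
            e.Request.ParseDependentRequests = false;

            // (b) set the Origin header; 'AppDomainUrl' is an assumed context
            // parameter that holds the provider-hosted app domain
            e.Request.Headers.Add(new WebTestRequestHeader(
                "Origin", e.WebTest.Context["AppDomainUrl"].ToString()));
        }
    }

Attach it to the webtest via 'Add Web Test Plug-in'. Limiting the Origin header to only the cross-domain requests is a refinement left out of this sketch.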

Load test monitoring

  • CPU, memory and disk IO per server: WFE, SharePoint backend, AppHost
  • State of the IIS queue on WFE and AppHost
  • Page download times
  • Slowest pages

Interpretation of load test output

  1. The (average) Page Response Time is the download time of the main request itself, plus the download times of all dependent requests beneath that main request.
  2. The RPS / Requests per Second output is not fit to determine whether the application + infrastructure can handle the foreseen application usage. The application usage translates into Page Visits per Second, in which each page visit typically encompasses multiple (http) requests: the .aspx request, plus requests for javascript and css resources. In the App execution model, each App launch on the page is in effect a page visit of its own. As a result, the RPS factor is of little use; you must measure the 'Page Visits per Second' factor. A pragmatic way to monitor this is to set the think time of the webtest to 1 minute, so that each webtest instance is executed once per minute. The 'Page Visits per Second' factor then equals the Visual Studio reported 'Tests per Second' (see the example below).
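
As an illustrative calculation (the numbers are assumptions, and each webtest is assumed to model a single page visit that completes well within the minute): with 500 configured 'concurrent users' and a think time of 1 minute, each thread completes roughly one webtest per minute, so Visual Studio reports about 500 / 60 ≈ 8.3 Tests per Second – which under these assumptions equals 8.3 Page Visits per Second.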