Total Pageviews

21 Jul 2012

Experimenting with Large SharePoint Lists, Indexed Columns and View Threshold

Hey there, folks! I'd like to share my own findings concerning large SharePoint 2010 lists. Of course there are lots of posts and articles that tell us something about this topic, but I hope to contribute something anyway. Hold on to the handles! Let's begin!

We are given
Custom SharePoint List with 5000 items in it.
List View Threshold: 2000
List View Threshold for Auditors and Administrators: 2000
Object Model Override: true
View page size: 30
List Item Count: 5000
Important: all tests run under a non-administrative login 

Working With Standard SharePoint Web Part via GUI

Let's try to sort our list by an indexed Title column by clicking on the header. Remember, we have 5000 items an Threshold is just 2000.



The result was a bit shocking. This is how the standard ListViewWebPart can sort indexed columns in a large list:

It simply can't! It does not use index while getting items sorted by an indexed column. How do we know that? Check out this article. If we sort by an indexed column, there is no trace of "NameValuePair" database table being used. We can easily check it by using standard Developer Dashboard. This is how a query that uses index should look like:
However, when sorting using standard SharePoint Views NameValuePair table is not used - hence I doubt that indexed columns speed up sorting in any way.

This is how the query looks when we try to sort by an indexed column:



For the test's sake let's try to create a view with sorting by an indexed column. Maybe it will work?



No, it won't. The result is the same. If you have more items in the list than a threshold - you will get the same problem. Standard WebPart does not even consider your index:


Now I want to check how filtering works. Again, this is standard ListViewWebPart. I'm trying to filter by an indexed column:

In this case I can't use filter because there are more filtering options than the threshold allows us to process.

Now I'm going to modify the list view to see if it allows me to use filtering there. After all I have only 30 items on my page. It might work.Notice (Not Indexed) hint next to the "contains" options:



The result is:

Well, it warned us. (Not Indexed), remember? So it was expected already. But what about other options? This time we choose "begins with" option:



This time we expect it to work, but here we go again:


The problem was that there were still 5000 rows to be processed after applying the filter. Let's try to narrow it down:


Success. When number of total filtered items is fewer than a threshold it's going to work just fine:


Same rule applies to any other filtering option except of "contains" one.

ID Column Works Almost Like an Indexed Field, But It's Slightly Better

You can sort by ID column even when there are more items in the list than threshold allows. Unlike other indexed columns ID column will never give you "exceeds the list view threshold" message on the standard ListView webpart unless your page size is bigger than this threshold.

Enforcing Unique Values upon Columns Does Not Make Them Sortable in Large Lists

Maybe we can sort by ID in large lists because the ID column contains unique values? Let's check if uniqueness helps any column to act as an ID column. We've created a Number column that enforced unique values:


Now let's try to sort by this unique column:


The result is...


So, making a column unique does not make it act as an ID column. This is why "ID column is a bit better than others".

Creating and Deleting Index Happens Almost Instantly

Yes, as simply as that. You don't have to wait for a timer job or something like that to run before you get your index created or removed. In fact, index rebuilds every time you add, delete or edit any item in the list.

Daily Time Window for Large Queries 

When setting "Daily Time Window for Large Queries" or  a "happy hour" in Central Administration users will receive additional info when they hit the threshold. This info simply tells users at what time it is allowed to exceed the view threshold.

Threshold, Indexed Columns and Object Model

SPQuery.RowLimit Does Not Help?

Right, now to the code. We create a guinea-pig-visual-web-part for the tests. Let's say you have your beautiful list with more than 5000 items in it with Title being an indexed column. We write an SPQuery to get 100 items. So we write something like this.


    SPList list = web.Lists["UniqueList"];
    SPQuery query = new SPQuery();
    query.Query = "<Where><Contains><FieldRef Name=\"Title\"/><Value Type=\"Text\">Title</Value></Contains></Where>";
    query.RowLimit = 100;
    SPListItemCollection items =  list.GetItems(query);

In this case, even though we have SPQuery.RowLimit set to 100 - there are 5000 items that fulfil condition of the "Where" clause. In other words, even though we use NameValuePair talbe - there are still 5000 items to be processed in it. This is an interesting thing to remember. So this is what we get in this case:



But let's try to narrow the filter down:

    SPList list = web.Lists["UniqueList"];
    SPQuery query = new SPQuery();
    query.Query = "<Where><Contains><FieldRef Name=\"Title\"/><Value Type=\"Text\">Title_18</Value></Contains></Where>";
    query.RowLimit = 100;
    SPListItemCollection items =  list.GetItems(query);

Now it's just worked fine. We didn't hit the threshold because we didn't have to process all 5000 items, but instead processed about 110 items using index table. Developer Dashboard proves this fact:

Let's try to delete our index and see what happens.

Good! Because we have no index to use - we were forced to process all 5000 items again. And of course, this operation was blocked.

But I Am Using RowLimit! Why Do I have to Process 5000 Items?

You don't have to. I've modified the code snippet so that it looks like that:

  SPList list = web.Lists["UniqueList"];
  SPQuery query = new SPQuery{RowLimit=100};
  query.Query = "<Where><Contains><FieldRef Name=\"Title\"/><Value Type=\"Text\">UniqueTitle</Value></Contains></Where>"
  + ContentIterator.ItemEnumerationOrderByNVPField;



ItemEnumerationOrderByNVPField is a property that returns "<OrderBy UseIndexForOrderBy='TRUE' Override='TRUE' />" string. This OrderBy overrides any other possible OrderBy's in your SPQuery and makes sure one of the indexed columns is used. I'm not really sure what exactly indexed columns will be sorted when I have several of them.


This technique of using ItemEnumerationOrderByNVPField only helps when:

  • Using indexed columns in the Where clause
  • RowLimit (or number of items returned) is less than the threshold
  • You don't use sorting in Where clause. If you do - it will be overridden anyway (very inconvenient limitation) 
  • You don't mind your items being sorted by indexed columns. 

There is also ContentIterator.ItemEnumerationOrderByID property that returns the string "<OrderBy Override='TRUE'><FieldRef Name='ID' /></OrderBy>". Let's try using this one. Maybe it helps if we don't want the items to be sorted when returned:

  SPList list = web.Lists["UniqueList"];
  SPQuery query = new SPQuery(RowLimit=100};
  query.Query = "<Where><Contains><FieldRef Name=\"Title\"/><Value Type=\"Text\">UniqueTitle</Value></Contains></Where>"
  + ContentIterator.ItemEnumerationOrderByID;

The result is:

What if I want some control over columns that are being sorted? 

Warning: Don't use this snippet in your projects, it won't work as you expect it to:

  SPList list = web.Lists["UniqueList"];
  SPQuery query = new SPQuery{RowLimit=100};
  query.Query = "<Where><Contains><FieldRef Name=\"Title\"/><Value Type=\"Text\">UniqueTitle</Value></Contains></Where>"
  + "<OrderBy UseIndexForOrderBy='TRUE' Override='TRUE'><FieldRef Name='Title' Ascending='FALSE' /></OrderBy>";
  SPListItemCollection items = list.GetItems(query);

I was just curious what the result would be in this case. Unfortunately it said that I've hit the threshold yet again. Oh well...

ContentIterator.ProcessListItems Method (Microsoft SharePoint Server 2010 only)

If you can't narrow down the number of items to be processed by using filter - one of the right ways to avoid hitting threshold is by using ContentIterator.ProcessListItems


    SPList list = web.Lists["UniqueList"];
    SPQuery query = new SPQuery{RowLimit=5000};
    query.Query = "<Where><Contains><FieldRef Name=\"Title\"/><Value Type=\"Text\">UniqueTitle</Value></Contains></Where>"
     + ContentIterator.ItemEnumerationOrderByNVPField; // this is a Must

   ContentIterator contentIterator = new ContentIterator();
   List<SPListItem> items = new List<SPListItem>();
   contentIterator.ProcessListItems(list, query,
       delegate(SPListItem item)
       {
           items.Add(item); // not sure if it's the most efficient way!
       },                   
       delegate(SPListItem item, Exception eItemFailed)
       {
           return true;
       }
   );

Object Model Override, SPList.EnableThrottling and SPQueryThrottleOption.Override methods

This is simply a list of ways to bypass threshold.
I don't want to repeat that was already written in a good article "Working with Large Lists in SharePoint 2010 - List Throttling" by Steve Peschka, so, please, have a look.

Conclusion



  • Combine Filtering and Sorting to avoid hitting threshold. Narrow down the number of rows to be processed.
  • If you have access to SharePoint Server 2010 Edition consider using ContentIterator methods.
  • Read additional info from the sites listed below



Additional articles:
Writing Efficient Code in SharePoint Server
2010 Large Lists Documentation: white papers and blog posts
Large List Performance (very nice article)