One of the major third-tier developer components introduced in SharePoint 2010 is LINQ to SharePoint. In a recent debate on Twitter, I tried to convey some of the problematic aspects of using LINQ to SharePoint (or even LINQ in general), but because Twitter tends to be rather ADHD about its attention span, I thought I’d sum up my thoughts for posterity.
Now, if you have no idea what LINQ is, I’ll give you a brief intro. Regardless of which tier you do your SharePoint development in, I’ve tried to make this as non-technical as possible, so you should be able to follow along. I’ve even included the mandatory car example.
LINQ is What?
Developers, and especially third-tier developers (meaning Visual Studio/.NET development following Marc D. Anderson’s Middle-Tier Manifesto), often manipulate data in some form or another. For a platform like SharePoint, which is essentially a glorified database, data manipulation is a rather important part of any development project.
Note: I’m simplifying the topic here. I realize that some of the terms used are not 100% accurate, but I’m trying to explain this to those with little or no prior development experience. If this bothers you, feel free to point out the inaccuracies in the comments below.
By data manipulation, I mean tasks such as finding, sorting, modifying, presenting, and filtering data, which in SharePoint is likely stored in lists or libraries somewhere.
To accomplish such tasks, developers have a wide range of tools and languages available, depending on the underlying data source. For example, if the underlying data source is a traditional database (often called a relational database), developers can use a data language called SQL. In non-traditional databases, such as the wide variety of what are now called NoSQL databases, graph databases, and big data implementations, languages and tools often vary by the individual database. And, for highly proprietary solutions like SharePoint, there are platform-specific options such as CAML queries.
So, as you can see, working with data requires that the developer know a lot about the underlying data source, because the method used to access and query data varies greatly. Obviously, this puts a huge burden on developers, who need to know not just the details of the language they normally work in, but also the intricacies of multiple underlying data manipulation methods.
In fact, this is why database developer is a specialized job. These developers don’t necessarily know the programming aspects of development, but rather specialize in building efficient and accurate queries for data access.
Here’s where LINQ comes in to sort of save the day. LINQ provides, or at least attempts to provide, a unified language that allows developers who are not database specialists to use one language to access data, regardless of what the underlying data source wants or needs. Rather than having to know how a certain implementation of a database works, a developer can focus on learning LINQ and then let what’s known as a LINQ provider handle the rest.
The task of the LINQ provider is to translate what the developer writes into what the underlying data source needs. For example, the LINQ to SharePoint provider will translate LINQ into CAML so that developers won’t even need to learn CAML to query SharePoint data. Other providers perform similar translations, such as the now-defunct LINQ to SQL, the newer LINQ to Entities, or LINQ to Objects.
In fact, you can write your very own LINQ to MyStuff provider if you want. It’s not even that difficult, but the consequences are profound, and therein lies the first problem with LINQ, which I’ll get to in a moment.
Because LINQ relies on providers, you abstract yourself from the underlying data. Normally, this is a great idea. In fact, abstractions are a key component of any modern development, saving you both time and the need to learn all the underlying technology.
Consider hardware drivers for a moment. It would be a right mess if every developer needed to learn how to generate the electrical signal needed to send a bit down the pipe of every vendor’s Ethernet network adapter. Instead, you can write against a hardware driver (or, more likely, never touch one at all), which exposes a set of capabilities such as ‘send some stuff down the pipe’ while keeping the hardware details out of sight of the developer.
LINQ works in much the same way. Developers don’t need to learn how the data source queries its data. Instead, you get a set of exposed capabilities such as ‘get some stuff for me’ while the data access details stay out of sight of the developer.
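To make the provider idea concrete, here is a deliberately tiny sketch in Python rather than C#, purely to illustrate the concept; it is not how LINQ to SharePoint is actually implemented. One provider-neutral condition (the hypothetical `Eq` class) is handed to two hypothetical providers: one evaluates it directly against in-memory data, roughly in the spirit of LINQ to Objects, and the other translates it into a CAML `<Where>` fragment. The `Type="Text"` value type is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Any, Iterable
from xml.sax.saxutils import escape, quoteattr


@dataclass(frozen=True)
class Eq:
    """Hypothetical provider-neutral condition: field equals value."""
    field: str
    value: Any


class InMemoryProvider:
    """Hypothetical provider that evaluates the condition against Python dicts."""

    def __init__(self, items: Iterable[dict]):
        self.items = list(items)

    def run(self, condition: Eq) -> list:
        return [item for item in self.items if item.get(condition.field) == condition.value]


class CamlProvider:
    """Hypothetical provider that translates the condition into CAML instead of running it."""

    def run(self, condition: Eq) -> str:
        return (
            "<Where><Eq>"
            f"<FieldRef Name={quoteattr(condition.field)} />"
            f'<Value Type="Text">{escape(str(condition.value))}</Value>'  # value type is assumed
            "</Eq></Where>"
        )


# The developer writes the query once...
query = Eq("EngineType", "Diesel")
cars = [{"Title": "Car A", "EngineType": "Diesel"}, {"Title": "Car B", "EngineType": "Petrol"}]

# ...and each provider decides what that query means for its own data source.
in_memory_result = InMemoryProvider(cars).run(query)
caml_result = CamlProvider().run(query)
```

The caller only ever sees `run(query)`; what actually happens behind that call is entirely up to the provider.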
Good idea, right? Well, there are problems in the garden of Eden, as always…
You see, because LINQ relies on providers (or drivers) to access its data, as a developer you get many of the same problems that developers working against hardware get.
Even though all LINQ providers can expose the same capabilities, not all of them do. In fact, even providers that give you the same capabilities do not necessarily give them to you in the same manner.
Let’s say that you write a LINQ query to retrieve the Title of a certain data item. The idea of LINQ is then that by simply returning the Title property, you don’t need to worry about the underlying mechanics for finding the Title.
This is great, except that you have no idea what the underlying provider does to retrieve the Title, and this can be a serious performance problem. In fact, behind the scenes, because the provider is responsible for everything, the Title property can be tied to functionality that requires additional and expensive database or even web service calls.
Unless you’re building applications that rely on high performance, however, this may not be a huge deal. It is, or can be, a huge deal if your solution needs to support many concurrent users, but the essence is that when you write the query, you don’t know whether it is going to download the entire web to your server. LINQ effectively hides that, and there’s no way, just by looking at the query, that you can know what’s actually going on.
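A hypothetical Python sketch of that hidden cost: `LazyItem.title` reads like a plain attribute, but every access quietly calls a stand-in back end (`RemoteStore`, also hypothetical) that counts its round trips. Real providers differ in when and how they fetch data; this only illustrates why the query text alone can’t tell you.

```python
class RemoteStore:
    """Hypothetical stand-in for a slow back end; counts how often it is hit."""

    def __init__(self, titles: dict):
        self.titles = titles
        self.calls = 0

    def fetch_title(self, item_id: int) -> str:
        self.calls += 1  # in a real system, this could be a database or web service round trip
        return self.titles[item_id]


class LazyItem:
    """An item whose title looks like a plain attribute but is fetched on every access."""

    def __init__(self, item_id: int, store: RemoteStore):
        self.item_id = item_id
        self._store = store

    @property
    def title(self) -> str:
        return self._store.fetch_title(self.item_id)


store = RemoteStore({1: "Car A", 2: "Car B", 3: "Car C"})
items = [LazyItem(item_id, store) for item_id in (1, 2, 3)]

# Reads like a cheap in-memory filter, but each title access is a hidden fetch.
matches = [item.title for item in items if item.title.endswith("B")]
```

Three items cost four round trips here, because the filter reads `title` for every item and the projection reads it again for the one match. Nothing in the comprehension hints at that.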
To make matters worse, there’s no way for you to control how the provider works either, unless you understand intimately how that specific provider handles queries. There’s really no way to optimize a LINQ query without knowing both how the provider works and how the underlying data source works.
In other words, where writing efficient code previously required knowing the details of a data model, you now need to know the details of the data model and the details of the provider. We had a problem with writing code before? Now we have the same problem plus another one.
Again, unless you’re even slightly worried about performance, this may never impact you.
Sadly, most SharePoint developers I meet are more concerned with getting the code to do what they want than with writing maintainable code that works efficiently and scales well, but that’s another rant.
This is, however, also a readability issue. Someone glancing over a LINQ query may get a general idea of what goes on, but won’t know anything about how it’s done, or even whether they can extend or modify the query, because the LINQ query itself doesn’t reveal what the provider supports, how it supports it, or even which provider is in use.
Here’s one example, of many I might add, involving what are known as enumerations. In programming languages, an enumeration is a type whose variables or properties can only hold one of several predetermined values. For example, using the never-boring car example, a car may have one of three engine types (petrol, diesel, electric), and no other engine types are valid for a car. In this case, you can use an enumeration to force the value to be valid, rather than relying on a line of free text, prone to misspelling, to determine the engine type.
Note: In SharePoint, by the way, this would be akin to a choice field.
When you then want to find all diesel cars, you want to use that enumeration to make sure you get an accurate query, which is elegant and maintainable. This, however, is where the problems start.
LINQ providers are nowhere near consistent in how they support enumerations. One such example is LINQ to Entities, which did not support enumerations at all before version 5.0. To work around the issue, you end up with ‘hacks’ such as converting the enumeration to a number and querying that number instead.
Enumerations aren’t the only issue. Suppose you not only want to know which engine a car has, but also, based on the engine type, the possible engine sizes. To accomplish this, you might use what is known as a join, which combines the results of a second query with the results of the first. Not all providers support joins, and to make matters worse, some providers for a given data source will support joins, while other providers for the exact same data source will not.
So, to a developer who needs to maintain a piece of code containing LINQ, the developer needs to not only know LINQ but also know which provider LINQ uses in order to know whether a modification will work. Maintenance becomes a nightmare, forcing developers to sift through tons of included DLLs to determine which provider and which provider extensions are in effect.
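Here is a hypothetical Python sketch of the car example: an `EngineType` enumeration, a query that filters on it directly, and the ‘hack’ of storing and comparing the raw number instead, as you might with a provider that lacks enum support. The names and numeric codes are invented for illustration; this is not Entity Framework code.

```python
from enum import Enum


class EngineType(Enum):
    PETROL = 0
    DIESEL = 1
    ELECTRIC = 2


cars = [
    {"Title": "Car A", "Engine": EngineType.DIESEL},
    {"Title": "Car B", "Engine": EngineType.ELECTRIC},
    {"Title": "Car C", "Engine": EngineType.PETROL},
]

# Preferred: filter on the enumeration itself; a misspelled member name fails loudly.
diesel_cars = [car["Title"] for car in cars if car["Engine"] is EngineType.DIESEL]

# Workaround for a provider without enum support: persist and compare the raw number.
stored_rows = [{"Title": car["Title"], "EngineCode": car["Engine"].value} for car in cars]
diesel_by_code = [row["Title"] for row in stored_rows if row["EngineCode"] == EngineType.DIESEL.value]
```

Both queries find the same car, but the stored rows only hold `1`, so anyone reading the data or writing a query by hand has to know that `1` means diesel.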
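To show what a join does, here is a hypothetical Python sketch: an inner join of cars with engine sizes on the engine type, plus a stand-in provider (`NoJoinProvider`, also hypothetical) that simply refuses the operation, roughly what you run into when a provider lacks join support. The data and names are invented for the example.

```python
from typing import Iterable

cars = [
    {"Title": "Car A", "Engine": "Diesel"},
    {"Title": "Car B", "Engine": "Electric"},
]
engine_sizes = [
    {"Engine": "Diesel", "Size": "1.6L"},
    {"Engine": "Diesel", "Size": "2.0L"},
    {"Engine": "Petrol", "Size": "1.2L"},
]


def inner_join(left: Iterable[dict], right: Iterable[dict], key: str) -> list:
    """Pair each left row with every right row that shares the same key value."""
    index: dict = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]


class NoJoinProvider:
    """Hypothetical provider that supports filtering but not joins."""

    def join(self, left, right, key):
        raise NotImplementedError("this provider does not support joins")


joined = inner_join(cars, engine_sizes, "Engine")
```

Car B drops out because no sizes are listed for electric engines. Asking `NoJoinProvider` for the same join fails only at run time, and nothing in the query itself warns you in advance.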
LINQ to SharePoint does support enumerations, though, and to some extent also joins, so problem solved, right? Well, not exactly…
The problem is that enumerations in LINQ to SharePoint are rather quirky, especially when you take into consideration the tendency of SharePoint data models to change, depending on the needs of the user. A SharePoint site owner can and often will change the definition of a choice field, by either removing or adding additional values. They may add or remove columns, or even entire lists, because that’s just the way SharePoint is supposed to work.
We want this to happen because one of the primary and most powerful features of SharePoint is its ability to adapt to changes in needs. If someone invents nuclear powered cars or cars that run on M&Ms, we want our SharePoint solution to be able to store that, right?
Well, LINQ to SharePoint won’t pick up on such changes. In fact, LINQ to SharePoint is highly static and works only with the data model as it was defined when development started. This happens because, to get LINQ to SharePoint to understand your data, you need to run a tool called SPMetal, which essentially takes a snapshot of your data model and generates classes based on that snapshot.
If you change the model later, for example by adding new columns or changing the definition of a choice field, your .NET code breaks or at least won’t be able to work with the modified data.
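The snapshot problem can be mimicked in a hypothetical Python sketch: the choice values that existed when the code was generated are frozen into an `EngineType` enumeration, and a value a site owner adds later no longer maps onto it. This is not what SPMetal generates; how real generated code reacts to an unknown value depends on the tool, and the sketch simply rejects it.

```python
from enum import Enum

# Choice values as they existed when the (hypothetical) code-generation step ran.
choices_at_generation_time = ["Petrol", "Diesel", "Electric"]

# Freeze them into a type, much like generated classes freeze the data model.
EngineType = Enum(
    "EngineType",
    {name.upper(): name for name in choices_at_generation_time},
)


def parse_engine(raw: str) -> "EngineType":
    """Map a stored choice value onto the generated type; unknown values raise ValueError."""
    return EngineType(raw)


# Later, a site owner adds a new choice to the live list.
live_values = ["Diesel", "Nuclear"]

parsed, rejected = [], []
for raw in live_values:
    try:
        parsed.append(parse_engine(raw))
    except ValueError:
        rejected.append(raw)  # the snapshot has no member for this value
```

Until the generation step is rerun and the code rebuilt, the new value simply has no place in the model the code was compiled against.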
There are ways to handle changes in data models or objects in SharePoint. LINQ to SharePoint, however, isn’t a good approach to this. Although technically you can extend your LINQ to SharePoint classes after a data model change, the method is, again, specific to this particular provider. Besides, it’s far from simple, as explained in this 2,500-word post from the SharePoint developer team blog.
Note that the problem of changes in the underlying data isn’t unique to LINQ to SharePoint. It is a problem with LINQ in general because the queries are strongly typed at the time of building and can’t easily be as dynamic as we need SharePoint to be.
Compared to CAML, for example, which is essentially a long string of text that you can build dynamically based on detected changes, LINQ becomes a cumbersome additional piece of technology you need to learn without gaining any real benefits.
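For comparison, here is a hypothetical Python sketch of building a CAML `<Where>` clause as a plain string from whatever fields happen to be detected at run time. The helper names (`eq`, `where_all`) and the `Text` value type are assumptions for illustration. The nesting reflects that CAML’s `<And>` element takes exactly two child conditions, so more than two conditions have to be nested.

```python
from xml.sax.saxutils import escape, quoteattr


def eq(field: str, value: str, value_type: str = "Text") -> str:
    """Build one CAML <Eq> condition, escaping the field name and value."""
    return (
        f"<Eq><FieldRef Name={quoteattr(field)} />"
        f"<Value Type={quoteattr(value_type)}>{escape(value)}</Value></Eq>"
    )


def where_all(conditions: list) -> str:
    """Combine conditions with <And>; each <And> holds exactly two children, so nest them."""
    if not conditions:
        raise ValueError("at least one condition is required")
    combined = conditions[0]
    for condition in conditions[1:]:
        combined = f"<And>{combined}{condition}</And>"
    return f"<Where>{combined}</Where>"


# Fields discovered at run time, e.g. after a site owner added a column.
detected = {"EngineType": "Diesel", "Color": "Red"}
caml = where_all([eq(name, value) for name, value in detected.items()])
```

Because the query is just text, a newly added column only means one more entry in `detected`; no regeneration or recompilation step is involved.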
Thus, the benefit of having a unified language for queries is more or less lost. No, you’re not getting one language that works the same across any data source. No, you don’t get the benefit of not needing to know the underlying data source. Yes, you need to know more because you also need to know how the provider works.
I rarely write conclusions and frankly hate them, but I thought I’d make an exception for those who are too lazy or stupid to read the entire text.
First, LINQ is a great idea. I use it a lot. I also spend days, perhaps weeks, learning the intricate details of individual LINQ providers before I start using them. With that investment, LINQ can be a great time saver, if you know what you’re doing very well.
Second, LINQ is not easier than other options. It is in fact more difficult. Even though you can write a LINQ query fairly quickly, that doesn’t mean you understand, control, or know what you’re doing. It’s a monkey-see, monkey-do problem; it’s really easy to learn a few tricks that will impress a novice.
Third, LINQ isn’t nearly as universal as it may seem. The syntax may be well defined, but that’s as useful as saying that since English grammar is fairly well defined, everyone who understands English grammar also understands what every English sentence means.
In even shorter terms: use LINQ only when you fully understand both the provider and the underlying data source.
Until then, you’re just another monkey who can push a button to get a treat.