Stored Procedure is not always necessary
ORMs usually make the job of interacting with databases much easier, since they abstract us from relational models by allowing us to work directly with an object-oriented model.
They also allow us to make changes in entity models with less effort. But this abstraction is not without cost. Sometimes, by forgetting the existence of a database behind, mistakes can be made that will negatively affect the application's performance.
You might have once designed an impeccable, pure object model using an ORM to avoid contamination with profane languages such as SQL. In sum, a masterpiece. It goes into production and some time later, there are complaints about slowness and an overload of the database. At that moment, the DBA appears and says: ''Are you using ORM? No...that has to go to Stored Procedure to work fast''.
Although in some cases it might become necessary, most of the times the mere fact of changing the way in which queries are written in the ORM is sufficient to significantly improve performance with no need to make high-impact changes in the application and architecture of the solution.
What follows is a series of tips that will improve the writing of queries in ORMs:
Select new… and lazy loading
Putting together the query to generate an object only with the properties of each entity we will be using. It makes no sense to get 20 columns from a table if we only need 3 of them for the piece of code that is being executed.
With this technique, apart from reducing the amount of data we read on the database, we avoid generating a large amount of unchained elements due to lazy loading (known as the select n+1 problem).
Let's assume that we obtain a list of base products represented by the ''Product'' entity. In a relational model, it would have two columns: Product and Category.
An option to obtain data and transform it into a viewmodel to show in a visualization could be the following:
But what are the disadvantages of doing this?
- Properties are obtained from the Product entity that will not be used later
- To Access the Description property of the product category, the lazy loading mechanisms will trigger a query for each product obtained from the base. Working with a large data volume may cause serious performance problems.
An alternative is to do it this way:
In this case, the ORM has the intelligence to detect that we only need 4 fields. Furthermore, since we are obtaining a field that is in another table, it will do the join accordingly. In sum, it will generate a single select that will join the Product and the Product Category, getting only the 4 properties we need.
Count vs. any
When you want to know the existence of an entity or record by applying certain criteria, it is always recommended to use ‘’Any’’ instead of ‘’Count’’. ‘’Any’’ will be translated to an ‘’Exists’’ or to a ‘’Select TOP’’, depending on the ORM that we are using, whereas the Count will be translated to the aggregation function of the SQL count with a higher execution cost.
We must always be careful when using ToList. A query made through LINQ is not executed against the database until the numbers are not listed. This enumeration can occur when doing a foreach over the results or by using ToList. If all the records of the database are not lifted before applying the filters, then the filters must be applied on the table before doing the ToList.
For example, something like this:
The first sentence will force the ORM to generate a select without ''where'' in the database and a list of objects with all the records present in the table. The second sentence will filter the products in the memory collection.
With a subtle change, we obtain the desired behavior:
By not using ToList in the first sentence, the query is not executed. The where in the second sentence is added to the query expression to be built and only then, in the ToList at the end of the second line, the query is executed but with the corresponding where to obtain only the desired records from the database.
Mapping of bi-directional relations
Many times, when related objects are mapped, the relation is put together in a bidirectional way. For example, if we have the Venta (sales) object with its items, it could be put together as follows:
In principle, this relation shouldn’t cause problems given that the Venta items should have a reasonable amount. Now, if we add the bidirectional relation to the Product entity with ItemdeVenta, as soon as the database starts growing in terms of amount of data, we will have a performance problem (and a very hard-to-find one).
In the example above, the ItemsdeVenta that make reference to a certain product will tend to grow overtime as the business transactions are processed. If we invoke Producto.ItemsDeVenta, we will obtain all the Itemsdeventa where the Product is included. Likewise, if one assigns a Product to an ItemdeVenta, the same ORM can invoke it internally and if we do not take the necessary precautions we will not have control over the performance degradation as the application starts being used massively.
A sentence like the following may trigger the ORM to lift millions of records from ItemdeVenta:
Additionally, without even accessing the property you might run into a problem. For example, by creating an ItemdeVenta:
The ORM can internally access the ItemdeVenta property in the Product class to maintain consistency in the bidirectional relation; therefore, we would potentially be obtaining from the database an enormous amount of records without even accessing a property that is a collection. Here is when the DBA tells you that it sees that you are making a query in its profiler that gets a million records, and of course you swear you are not making any queries.
Value Types Nulleables
It is important to be careful when using Value Types. If the type is nulleable on the table, then it needs to be maintained also at an object level. When this inconsistency between table model and objects occurs, the ORMs usually detect that the entity has had a change and try to update the field with the Value Type default value.
The mentioned tips are only a few cases that I’ve encountered (and suffered through). Beyond these specific examples, what I recommend is to verify what SQL query is being generated by the ORM.
Each ORM has its own particularity when it comes to translating code to query. Therefore, we must never stop controlling that generated queries are performing well, and investigate whether there is a better way to write LINQ code to generate a better SQL.
In most cases, we can achieve efficient and well-performing queries with no need to write the query directly in SQL or program a Stored Procedure.
It is essential to always bear the database in mind, and get into the habit of verifying how the queries we write with LINQ end up being translated in SQL, particularly when they are complex. It is also essential to know the details and limitations of the ORM we are using and the features of the database engine over which we are persisting our data.
Baufest Technical Expert