The article is a part of JPA & Spring pitfalls series. Below you can find the actual list of all articles of the series:

Spring

JPA

Preface

N+1 SELECT problem is probably the most known and common problem for all Object-Relational Mapping tools which support lazy loading. It’s on developer’s side to be aware of it and to know how to prevent it in the right way, because hidden N+1 SELECT vulnerability may cause noticeable performance issues.

What is the N+1 SELECT problem?

Let’s assume we have the following two entity classes:

As we know, @OneToManyby default uses lazy loading, therefore when we retrieve a Room from the database, its items won’t be loaded until we call the getter method – then the proxy will execute a DB call to fetch the items.

Now, let’s see how nicely the following method looks:

But apart from looking nicely, it’s also very inefficient…

Assuming that in our database we have 1000 Rooms, each with 2 Items, when we run this method and analyze Hibernate metrics we can see the following:

Hibernate needs to execute 1001 JDBC statements, so we could just get all names of room items. What actually happens in this code?

 

First, we retrieve the list of all N (in our example N = 1000) rooms (with 1 query), then for each room we access its items. The items list lazy loaded, therefore with each room.getItems() there will be a separate DB query executed to fetch them (since there are N rooms, that means N queries). So we end up with executing 1+N DB queries, whereas we could easily do it with just 2 queries. Namely, we could first fetch the rooms (1 query), then fetch the list of items that belong to all of the rooms (with something like SELECT * FROM Item WHERE roomId IN <room ids>) and match them (Rooms and Items) in memory. In fact, if this is not enough, we can optimize it to do it with just a single query.

N+1 DB queries versus only 1 or 2 is a huge performance difference! Especially when N is big enough.

Solutions

Actually, in JPA we have plenty of ways to solve it and I won’t describe them all, because some of them are for very specific cases or are very similar in what they do. Here, I’m going to describe those most commonly appearing on the web and I’ll explain why in the great majority of cases the last of them is usually the best option.

FetchType.EAGER instead of lazy loading

Although, this approach is very easy to find on StackOverflow and to apply it to the code, but rarely it may be called “a solution”. In case that you literally always need attributes of both sides of the relationship, then perhaps it may be right way. But in the vast majority of situations there are spots in the codebase where you need both (Rooms and Items) and others when you need only attributes of one (Rooms). In that case, setting globally fetch type to EAGER is like replacing one problem with another. It may work for N+1 SELECT problem or for LazyInitializationException, but worsen performance in other parts of the application due to loading more data than you actually need.

@BatchSize

@BatchSize(size = M) is not a JPA annotation, but typically a Hibernate one. It tells Hibernate that instead of initializing lazy associations one by one when they are actually accessed, it should initialize them in batches of M among proxies retrieved in the current session.

This way instead of N + 1 queries, will have N/M + 1, which may be a decent optimization, however, I’d still recommend using the below approach.

Join Fetch, EntityGraph, JPA Criteria API, etc.

The idea behind Join Fetch, EntityGraph and similar approaches is to avoid setting eager fetching globally on the entity class, but doing it explicitly per case. So even if generally associations would be loaded lazily, we can explicitly ask that this time some of them should be loaded eagerly.

With Join Fetch we would need to just slightly modify the query and that’s it.

Simple, isn’t it? And yet it’s a hundred times better that globally changing entity’s fetch type.

A small disadvantage of Join Fetch may be the fact that fetching strategy is directly in the query and is kept outside of the entity class. To address that JPA 2.1 introduced EntityGraph functionality which allows you to define and name common combinations of association fetching, and then use them by names when doing queries.

To define it in the Room class we add:

And then we may use it as follows:

Just compare session metrics after executing the same code, but using Join Fetch (same for EntityGraph):

Just 1 JDBC statement instead of the initial 1001, that’s way better!

Conclusion

Simply put: think twice (or more) when you intend to solve some problems by only changing lazy loading to eager loading. Usually this is not the right solution and using Join Fetches or EntityGraphs is a good way to go.

Thanks for reading and stay tuned to our blog!

Java Developer

Michał is an eager fan of the fresh approach to Java programming. Enthusiast of Spring tech stack and refactoring techniques. Enjoys solving Java quirks, algorithmic puzzles and... Rubik's cube. Privately, amateur drummer.