Software Development

Backend Development

JPA: N+1 SELECT Problem [Spring & JPA Pitfalls Series]

10/05/2019 |

6 min read

Michał Marciniec

The article is a part of the JPA & Spring pitfalls series, which you can check out here.

JPA: N+1 SELECT Problem - Preface

N+1 SELECT problem is probably the most known and common problem for all Object-Relational Mapping tools which support lazy loading. It’s on the developer’s side to be aware of it and to know how to prevent it in the right way, because hidden N+1 SELECT vulnerability may cause noticeable performance issues.

What is the N+1 SELECT problem?

Let’s assume we have the following two entity classes:

@Entity
public class Room {


   @Id
   @GeneratedValue
   private long id;


   @OneToMany
   private List<Item> items;


   public List<Item> getItems() {
       return items;
   }
}


@Entity
public class Item {


   @Id
   @GeneratedValue
   private long id;
   private String name;


   public String getName() {
       return name;
   }
}

As we know, @OneToMany by default uses lazy loading, therefore when we retrieve a Room from the database, its items won’t be loaded until we call the getter method - then the proxy will execute a DB call to fetch the items.

Now, let’s see how nicely the following method looks:

public List<String> getRoomItemsNames() {
   List<Room> rooms = entityManager.createQuery("SELECT r FROM Room r", Room.class).getResultList();
   return rooms.stream()
           .map(Room::getItems)
           .flatMap(Collection::stream)
           .map(Item::getName)
           .collect(Collectors.toList());
}

But apart from looking nice, it’s also very inefficient...

Assuming that in our database we have 1000 Rooms, each with 2 Items, when we run this method and analyze Hibernate metrics we can see the following:

    18600 nanoseconds spent acquiring 1 JDBC connections;
    10807500 nanoseconds spent preparing 1001 JDBC statements;
    75006500 nanoseconds spent executing 1001 JDBC statements;
    14505600 nanoseconds spent executing 1 flushes (flushing a total of 3000 entities and 1000 collections);

Hibernate needs to execute 1001 JDBC statements, so we could just get all names of room items.

What happens in this code?

First, we retrieve the list of all N (in our example N = 1000) rooms (with 1 query), then for each room, we access its items. The items list lazy-loaded, therefore with each room.getItems() there will be a separate DB query executed to fetch them (since there are N rooms, that means N queries). So we end up executing 1+N DB queries, whereas we could easily do it with just 2 queries.

Namely, we could first fetch the rooms (1 query), then fetch the list of items that belong to all of the rooms (with something like SELECT * FROM Item WHERE roomId IN <room ids>) and match them (Rooms and Items) in memory. In fact, if this is not enough, we can optimize it to do it with just a single query.

N+1 DB queries versus only 1 or 2 is a huge performance difference! Especially when N is big enough.

Read: JPA & Spring Pitfalls

N+1 SELECT problem solutions

Actually, in JPA we have plenty of ways to solve it and I won’t describe them all, because some of them are for very specific cases or are very similar in what they do. Here, I’m going to describe those most commonly appearing on the web and I’ll explain why in the great majority of cases the last of them is usually the best option.

FetchType.EAGER instead of lazy loading

Although this approach is very easy to find on StackOverflow and to apply it to the code, rarely it may be called “a solution”.

In case that you always need attributes of both sides of the relationship, then perhaps it may be the right way. But in the vast majority of situations, there are spots in the codebase where you need both (Rooms and Items) and others when you need only attributes of one (Rooms). In that case, setting globally fetch type to EAGER is like replacing one problem with another. It may work for N+1 SELECT problem or LazyInitializationException, but worsen performance in other parts of the application due to loading more data than you actually need.

@BatchSize

@BatchSize(size = M) is not a JPA annotation, but typically a Hibernate one. It tells Hibernate that instead of initializing lazy associations one by one when they are accessed, it should initialize them in batches of M among proxies retrieved in the current session.

This way instead of N + 1 queries, will have N/M + 1, which may be a decent optimization, however, I’d still recommend using the below approach.

Join Fetch, EntityGraph, JPA Criteria API, etc.

The idea behind Join Fetch, EntityGraph, and similar approaches is to avoid setting eager fetching globally on the entity class, but doing it explicitly per case. So even if generally associations would be loaded lazily, we can explicitly ask that this time some of them should be loaded eagerly.

With Join Fetch we would need to just slightly modify the query and that’s it.

entityManager.createQuery("SELECT room FROM Room room JOIN FETCH room.items", Room.class)

Simple, isn’t it? And yet it’s a hundred times better than a globally changing entity’s fetch type.

A small disadvantage of Join Fetch may be the fact that the fetching strategy is directly in the query and is kept outside of the entity class. To address that JPA 2.1 introduced EntityGraph functionality which allows you to define and name common combinations of association fetching, and then use them by names when doing queries.

To define it in the Room class we add:

@NamedEntityGraphs({
       @NamedEntityGraph(name = "graph.Room.items",
               attributeNodes = {@NamedAttributeNode("items")})
})

And then we may use it as follows:

List<Room> rooms = entityManager
       .createQuery("SELECT r FROM Room r", Room.class)
       .setHint("javax.persistence.fetchgraph", entityManager.getEntityGraph("graph.Room.items"))
       .getResultList();

Just compare session metrics after executing the same code, but using Join Fetch (same for EntityGraph):

    21900 nanoseconds spent acquiring 1 JDBC connections;
    6187100 nanoseconds spent preparing 1 JDBC statements;
    40172700 nanoseconds spent executing 1 JDBC statements;
    10565000 nanoseconds spent executing 1 flushes (flushing a total of 3000 entities and 1000 collections);

Just 1 JDBC statement instead of the initial 1001, that’s way better!

Read: JPA: 8 Common Pitfalls [Spring & JPA Pitfalls Series]

N+1 SELECT problem - Conclusion

Simply put: think twice (or more) when you intend to solve some problems by only changing lazy loading to eager loading. Usually, this is not the right solution, and using Join Fetches or EntityGraphs is a good way to go. Thanks for reading and stay tuned to our blog!

Rated: 5.0 / 5 opinions