Hibernate Cartesian Product Problem
This article shows entity setups in which Hibernate executes a Cartesian Product SQL query, which may cause serious performance problems for large data sets. We’ll also study a few typical solutions.
Compare this to the previous article on N+1 selects problem to gain a broader picture.
Code example
Cartesian product problem in Hibernate is best illustrated with a code example.
Consider this Doctor Entity. Each doctor has a bi-directional relation to potentially many appointments and patients.
@Entity
public class Doctor {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private String name;
@OneToMany(mappedBy = "doctor", fetch = FetchType.EAGER)
private Collection<Appointment> appointments;
@OneToMany(mappedBy = "leadingDoctor", fetch = FetchType.EAGER)
private Set<Patient> patients;
Both Appointment and patient entities are very basic.
@Entity
public class Appointment {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private LocalDateTime appointmentTime;
@JoinColumn(nullable = false)
@ManyToOne
private Doctor doctor;
@Entity
public class Patient {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private String name;
@JoinColumn(nullable = false)
@ManyToOne
private Doctor leadingDoctor;
Let us assume that for each doctor we are sure we will access both appointments and patients in both persistent and detached state. Due to that, Doctor entity overrides the default fetch plan for both @OneToMany associations and fetches them eagerly.
Now take a look at the following test method. It creates a single doctor with 3 appointments and 4 patients. In another transaction, it just fetches our doctor.
@Test
public void whenEagerLoadingCollections_thenCartesianProductQuery() {
createDoctorsWithAppointmentsAndPatients(1, 3, 4);
doctorDao.executeInTransaction(entityManager -> {
Doctor doctor = entityManager.find(Doctor.class, 1L);
});
}
This results in the following SQL query being executed.
select
doctor0_.id as DOCTOR_ID,
doctor0_.name as DOCTOR_NAME,
appointmen1_.doctor_id as APPOINTMENT_DOCTOR_ID,
appointmen1_.id as APPOINTMENT_ID,
appointmen1_.appointmentTime as APPOINTMENT_TIME,
appointmen1_.doctor_id as APPOINTMENT_DOCTOR_ID,
patients2_.leadingDoctor_id as PATIENT_LEADING_DOCTOR_ID,
patients2_.id as PATIENT_ID,
patients2_.leadingDoctor_id as PATIENT_LEADING_DOCTOR_ID,
patients2_.name as PATIENT_NAME
from Doctor doctor0_
left outer join Appointment appointmen1_
on doctor0_.id=appointmen1_.doctor_id
left outer join Patient patients2_
on doctor0_.id=patients2_.leadingDoctor_id
where doctor0_.id=1
Hibernate Cartesian Product Problem
The default fetch plan for @OneToMany and @ManyToMany entity associations is LAZY. Overriding both collections association to EAGER, makes Hibernate fetch all the Appointments and Patients in a single, left join, SQL query. On the contrary to N+1 selects problem, the number of queries is not an issue here, but take a look at the returned result set.
The query which is generated by eager fetching both associated collections results in a big redundancy in returned data set. For a doctor with 3 appointments and 4 patients, we’re receiving 3 x 4 = 12 rows. For this particular example it’s not a big issue, but imagine a doctor with 200 patients and 50 appointments returning 200 x 50 = 10 000 rows. This data needs to be wired from the database to application and later de-duplicated by Hibernate, which may lead to performance and memory issues.
A Solution to Cartesian Product Problem
Fetching Associated Collections Independently
By default, Hibernate uses a LAZY fetch plan for both @OneToMany and @ManyToMany entity associations, and the above example clearly shows a reason for this. In order to avoid Cartesian Product Problem, one approach is to return to defaults for two above types of entity associations and load associated collections with separate queries.
Doctor doctor = entityManager.find(Doctor.class, 1L);
List<Appointment> doctorsAppointments = entityManager.createQuery("select a from Appointment a where a.doctor.id = :doctorId")
.setParameter("doctorId", doctor.getId())
.getResultList();
List<Patient> doctorsPatients = entityManager.createQuery("select p from Patient p where p.leadingDoctor.id = :doctorId")
.setParameter("doctorId", doctor.getId())
.getResultList();
We’re effectively sending three separate SQL queries to our DBMS carrying 1, 3 and 4 rows respectively.
The drawback is that associated entity collections won’t be available from within the doctor object. Calling doctor.getAppointments() will result in an extra select statement when called within same transaction or a LazyInitializationException if outside. We are losing the benefit of bi-directional @ManyToOne mapping.
Eager Fetching With Independent Selects
Please note that this functionality is only available in Hibernate. Hibernate allows configuring a global setting for eagerly fetched collections. This solution to Cartesian Product Problem allows for each associated collection to be fetched with a separate SQL query. In nature, it’s similar to what we manually did above, but it also has a benefit of loading all related entities directly into Doctor entity.
This approach requires an additional @Fetch annotation for eagerly loaded entity annotations within Doctor entity.
@Entity
public class Doctor {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private String name;
@OneToMany(mappedBy = "doctor", fetch = FetchType.EAGER)
@org.hibernate.annotations.Fetch(FetchMode.SELECT)
private Set appointments;
@OneToMany(mappedBy = "leadingDoctor", fetch = FetchType.EAGER)
@org.hibernate.annotations.Fetch(FetchMode.SELECT)
private Set patients;
Now calling our original test method and selecting Doctor with EntityManager.find() results in the following three SQL queries being executed one after another.
select
doctor0_.id as id1_1_,
doctor0_.name as name2_1_
from Doctor doctor0_
where doctor0_.id=1
select
patients0_.leadingDoctor_id as leadingD3_2_0_,
patients0_.id as id1_2_0_,
patients0_.id as id1_2_1_,
patients0_.leadingDoctor_id as leadingD3_2_1_,
patients0_.name as name2_2_1_
from Patient patients0_
where patients0_.leadingDoctor_id=?
select
appointmen0_.doctor_id as doctor_i3_0_0_,
appointmen0_.id as id1_0_0_,
appointmen0_.id as id1_0_1_,
appointmen0_.appointmentTime as appointm2_0_1_,
appointmen0_.doctor_id as doctor_i3_0_1_
from Appointment appointmen0_
where appointmen0_.doctor_id=?
Please be mindful that this approach is likely to introduce side effects, as we are setting a global configuration for each case when Doctor entity is retrieved from the database.
Fetching Entity With a Query
Alternatively, if eager fetching of entity associations is a must, retrieving the Doctor instance with a Query, rather than EntityManager.find, creates a similar result to the one above.
Doctor doctor = entityManager.createQuery("select d from Doctor d where d.id=1", Doctor.class).getSingleResult();
Observing SQL statements logged we see three queries being generated matching exactly the ones in the above solution approach. I wasn’t able to find any documentation related to fact that communicating through Query class results in subselecting eager associations, so please use this approach with care.
Lazy Load Collections and Fetch on Demand
The approach advised by Hibernate itself is to Lazy Load all the @OneToMany and @ManyToMany collection associations and use dynamic fetching strategies for eager loading. Please remember that @OneToOne and @ManyToOne associations will be eagerly loaded by default.
For more on dynamic fetching, take a look at N+1 Selects problem article.
- Here’s a guide to logging SQL statements with log4j
- Explore how to configure a One To One entity association.
- Here’s a guide to Many to one JPA entity mapping
- And Here’s on Bi-directional Many To One JPA entity mapping