cuQuantum Backend in QuTiP’s Solvers

The qutip-cuquantum library provides a backend that leverages NVIDIA’s cuQuantum SDK to accelerate the simulation of quantum systems. This backend is specifically designed for simulating large, composite quantum systems by performing tensor network contractions on the GPU.

This document outlines how to activate and use this backend with QuTiP’s solvers.

Enabling the cuQuantum Backend

To use the cuQuantum-based solvers, you must first enable the backend. This is done by calling the set_as_default function and providing it with a WorkStream object from cuquantum.densitymat.

import qutip
import qutip_cuquantum
from cuquantum.densitymat import WorkStream

# Create a workstream for the backend.
# This can be configured for multi-GPU and multi-node setups.
stream = WorkStream()

# Set cuQuantum as the default backend.
qutip_cuquantum.set_as_default(stream)

The set_as_default function changes several QuTiP defaults to route computations through the cuQuantum library. This includes setting the default data format for quantum objects (Qobj) to CuOperator and configuring the solvers to use GPU-compatible integrators.

This operation can be reversed with:

qutip_cuquantum.set_as_default(reverse=True)

The backend can also be enabled with a context:

with CuQuantumBackend(ctx):
    ...

However be careful when mixing core Qutip object and Qutip-cuQuantum’s one. Qutip’s Qobj do not keep all the internal structure needed for cuQuantum’s optimizations. Qutip-cuQuantum’s states can be distributed in multiple processes and unusable for many qutip’s core features.

Usage with Solvers

Once the backend is enabled, you can use compatible QuTiP solvers as you normally would.

Supported Solvers

The backend has been tested and is known to work with the following solvers:

qutip.sesolve
qutip.mesolve
qutip.mcsolve

Note

Other solvers have not been tested and may not function correctly with this backend.

Integration Methods

The backend introduces its own set of ordinary differential equation (ODE) integration methods, which are wrappers around existing QuTiP methods, optimized for the CuOperator data type. You must use one of the following methods in your solver options:

"CuTsit5"
"CuVern7" (New default)
"CuVern9"

Using any other integration method will result in undefined behaviour.

Example

Here is a simple example of running mesolve on the GPU:

# This example assumes the backend has already been set as shown above.

N = 100
H = qutip.tensor([qutip.rand_herm(N), qutip.qeye(2)])
c_ops = [qutip.tensor(qutip.destroy(N), qutip.sigmax())]
rho0 = qutip.tensor(qutip.basis(N, 0), qutip.basis(2, 1)).proj()
times = [0.0, 1.0, 2.0]
e_ops = [qutip.tensor(qutip.num(N), qutip.qeye(2))]

# No options are needed if you want to use the default "CuVern7" method.
result = qutip.mesolve(H, rho0, times, c_ops=c_ops, e_ops=e_ops)

Performance and Advanced Usage

When to Use This Backend

The cuQuantum backend excels at simulating large compound systems. Its performance advantage comes from using tensor network algorithms on the GPU, which is most effective when the Hilbert space is composed of many smaller subsystems. For small, single-component systems, the overhead of GPU data transfers may make it slower than the default CPU-based solvers.

Multi-GPU and Multi-Node Execution

The WorkStream object can be configured with MPI for distributed execution across multiple GPUs on a single node or across multiple nodes in a cluster. For details on creating a multi-GPU or multi-node WorkStream, please refer to the NVIDIA cuQuantum documentation.

However, users should be aware of the performance implications:

Single-Node, Multi-GPU: Scaling from one to multiple GPUs on a single machine introduces significant overhead. A performance benefit is typically only seen for very large system sizes and requires a properly configured MPI and CUDA environment.
Multi-Node: Extending to multiple nodes adds another layer of communication overhead. This is only beneficial for extremely large systems that cannot fit on a single node. Performance is highly dependent on the cluster’s internode GPU communication protocols (e.g., NVLink, InfiniBand), which can be complex to debug and optimize.

Limitations and Best Practices

Please be aware of the following limitations when using the cuQuantum backend.

Rectangular Operators

Rectangular operators (where the number of rows and columns are not equal) are not supported by the CuOperator data format. If your calculation requires rectangular operators as an intermediate step, you must explicitly create them as dense matrices and only convert the final, square operator to the CuOperator format.

Time-Dependent Systems

Time-dependent systems can be defined in two ways in QuTiP: as a list of [Operator, Coefficient] pairs or as a function that returns a Qobj at a given time t.

Coefficient pairs: This format is supported and works well with the backend.
Function-based `f(t) -> Qobj`: This format works but can be inefficient if not constructed carefully. For best performance, the function should return a time-dependent operator defined on the smallest possible Hilbert space. The full operator can then be constructed using qutip.tensor and qutip.QobjEvo.

For example, to create a time-dependent operator acting only on the first subsystem:

# Recommended approach:
# Define the time-dependent part on the smallest Hilbert space.
def g(t):
    # This returns a 2x2 Qobj.
    return (qutip.sigmax() * t).expm()

# Build the full operator using QobjEvo and tensor products.
oper = qutip.tensor(qutip.QobjEvo(g), qutip.qeye(M))

# Avoid this approach:
# This function is inefficient as it creates a (2*M x 2*M) dense matrix
# and lose the tensor structure used for efficient computations.
def f(t):
    return qutip.tensor((qutip.sigmax() * t).expm(), qutip.qeye(M))

oper = qutip.QobjEvo(f)