cuQuantum Backend’s Data Layers Guide
What the Backend Provides
The backend introduces two new data layers — the internal representation
a Qobj uses to store its numerical data:
- CuOperator: for (square) operators. Stores the operator symbolically as a sum of tensor product terms; the full matrix is never formed. The data is used to form cuQuantum’s Operator object lazily when needed.
- CuState: for state vectors and density matrices. Stores data via cuQuantum’s DensePureState / DenseMixedState.
A Qobj backed by CuOperator or CuState largely follows the same
QuTiP API as any other Qobj. In practice, however, not all operations are
supported for every combination of data layers, and combining Qobjs of
different layers may trigger automatic conversions whose cost and correctness
depend on what layers are involved.
This guide covers those nuances.
Setup: WorkStream and Activation
The WorkStream context
All CuState construction and GPU computation require a cuQuantum
WorkStream context:
from cuquantum.densitymat import WorkStream
ctx = WorkStream()
CuOperator is CPU-side and symbolic, so constructing one does not require
the context. CuState, however, reads settings.cuDensity["ctx"]
immediately in its constructor to allocate GPU memory. GPU computation
(applying operators to states, running solvers) also requires the ctx.
This ctx is registered when the backend is activated.
Activating the backend
Both set_as_default(ctx) and the CuQuantumBackend(ctx) context manager
register the ctx and change the default data type used when constructing
new Qobjs without an explicit dtype. Both also switch the default
solver integrator to CuVern7.
import qutip_cuquantum
# Method 1: permanent change
qutip_cuquantum.set_as_default(ctx)
# Method 2: scoped to a block; reverts on exit
with qutip_cuquantum.CuQuantumBackend(ctx):
    ...
What activation does NOT do: it does not convert any existing Qobj.
Qobjs do not change their data layer upon entering or leaving the context
— only Qobjs constructed after activation, without an explicit
dtype, are affected.
Getting Operators and States into the Right Data Layer
Explicit construction
The dtype argument sets the data layer at construction, regardless of
whether the backend is active:
import qutip

sz = qutip.sigmaz(dtype='CuOperator')
n = qutip.num(20, dtype='CuOperator')
psi = qutip.basis(2, 0, dtype='CuState')  # requires ctx to be registered
When no dtype is given, each factory function resolves the type from the
global default using its own sparsity hint, which describes the
structural character of the operator it produces:
- "sparse" hint (e.g., sigmax, sigmay, sigmaz) → CSR normally, CuOperator with plugin active
- "diagonal" hint (e.g., num, create, destroy, qeye) → Dia normally, CuOperator with plugin active
- "dense" hint → Dense always, regardless of whether the plugin is active
So activating the plugin changes the default for sparse and diagonal
operators, but dense operators always stay Dense.
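The resolution rule can be written down as a small lookup. This is a sketch of the behaviour described in this section, not QuTiP's actual code: `resolve_default` is a hypothetical helper and the hint strings are taken from the list above.

```python
def resolve_default(hint, plugin_active):
    """Map a sparsity hint to the default data layer described in this guide."""
    if hint == "dense":
        # Dense operators stay Dense whether or not the plugin is active.
        return "Dense"
    if hint not in ("sparse", "diagonal"):
        raise ValueError(f"unknown hint: {hint!r}")
    if plugin_active:
        return "CuOperator"
    return "CSR" if hint == "sparse" else "Dia"


for hint in ("sparse", "diagonal", "dense"):
    print(hint, resolve_default(hint, False), resolve_default(hint, True))
```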
Explicit conversion
Any Qobj can be converted to a different data layer at any time with
.to():
sz_cu = qutip.sigmaz().to('CuOperator')
n_cu = qutip.num(20).to('CuOperator')
psi_cu = qutip.basis(2, 0).to('CuState') # requires ctx to be registered
Implicit conversion during operations
When two Qobjs with different data layers are combined (+, @,
*, qutip.tensor, etc.), QuTiP automatically converts one operand
toward the other using the lowest-cost path:
- Dense / CSR / Dia → CuOperator: cost 1
- CuOperator → Dense: cost 10
This only applies when at least one operand is already CuOperator.
Implicit conversion cannot produce CuOperator from scratch — if neither
operand is CuOperator, the result will not be CuOperator either.
sx = qutip.sigmax() # CSR
n_cu = qutip.num(20).to('CuOperator') # explicit CuOperator
H = sx & n_cu # sx is implicitly converted to CuOperator; result is CuOperator
Pitfall: CuOperator only supports square matrices. Implicit conversion
of a rectangular object (e.g., a ket) fails with
ValueError("Rectangular CuOperator are not supported").
Building composite systems
Use qutip.tensor() or the & shorthand (they are equivalent) to build
operators on composite Hilbert spaces. For the result to be CuOperator,
at least one operand must already be CuOperator.
With plugin active: standard quantum operators (Pauli matrices,
number/creation/annihilation operators, identity) use sparse or diagonal
sparsity hints and therefore become CuOperator by default, so composite
Hamiltonians are naturally CuOperator without any explicit conversion:
qutip_cuquantum.set_as_default(ctx)
H_xy = qutip.sigmax() & qutip.sigmay() # all CuOperator
Without plugin active — ensure at least one operand is CuOperator;
the others are pulled in by implicit conversion:
# Convert the subsystem operator to CuOperator explicitly
sx_cu = qutip.sigmax().to('CuOperator')
# sigmay() is CSR here; it gets implicitly converted to CuOperator
# because sx_cu is CuOperator
H_xy = sx_cu & qutip.sigmay()
Combining terms
Addition (+) appends terms to the CuOperator term list:
H = qutip.sigmaz() & qutip.qeye(2) # 1 term
H += qutip.qeye(2) & qutip.sigmaz() # 2 terms
Multiplication (@ or * between two Qobjs — equivalent for
operators) takes the Cartesian product of the two term lists, naturally
producing multi-mode coupling terms:
a1 = qutip.destroy(20) & qutip.qeye(20) # a on mode 0
a2 = qutip.qeye(20) & qutip.destroy(20) # a on mode 1
coupling = a1.dag() @ a2 # one term with operators on both modes
coupling += a1 @ a2.dag()
@/* of an M-term operator by an N-term operator produces up to M×N
terms; + produces M+N terms.
Scalar multiplication scales every term’s coefficient without changing the term list:
H = w1 * (a1.dag() @ a1) + w2 * (a2.dag() @ a2) + g * coupling
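The term-count rules can be checked with a toy container. The class below is a minimal sketch, assuming a term is just a (coefficient, label) pair; `TermSum` is a hypothetical name and is not how CuOperator is implemented. It always produces exactly M×N terms on multiplication, which is the upper bound quoted above.

```python
from itertools import product


class TermSum:
    """Toy sum of tensor-product terms: a list of (coefficient, label) pairs."""

    def __init__(self, terms):
        self.terms = list(terms)

    def __add__(self, other):
        # Addition appends the two term lists: M + N terms.
        return TermSum(self.terms + other.terms)

    def __matmul__(self, other):
        # Multiplication pairs every left term with every right term: M * N terms.
        return TermSum(
            (ca * cb, f"({la})({lb})")
            for (ca, la), (cb, lb) in product(self.terms, other.terms)
        )

    def __rmul__(self, scalar):
        # Scalar multiplication rescales coefficients; the term count is unchanged.
        return TermSum((scalar * c, label) for c, label in self.terms)

    def __len__(self):
        return len(self.terms)


A = TermSum([(1.0, "sz x I"), (1.0, "I x sz")])                # M = 2
B = TermSum([(1.0, "a x I"), (1.0, "I x a"), (0.5, "a x a")])  # N = 3

print(len(A + B), len(A @ B), len(2.0 * A))
```

Because products multiply term counts, expanding a long product of multi-term operators can grow the term list quickly; summing first and multiplying by scalars afterwards keeps it short.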
Diagonal Operators: the Dia Format
Why it matters
When a CuOperator is applied to a CuState, each per-mode operator
stored inside its terms is converted to a cuQuantum type at GPU execution
time:
- Stored _data.Dia → cuQuantum MultidiagonalOperator (efficient for diagonal/band-diagonal structure)
- Stored _data.Dense or other → cuQuantum DenseOperator
This conversion is lazy — it happens at GPU execution time, not when
constructing or converting the CuOperator. So the dtype of the object
stored inside the term determines GPU efficiency.
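A rough count shows why the stored dtype matters. The sketch below is illustrative arithmetic only: `gpu_operator_type` and `stored_elements` are hypothetical helpers, and the count assumes each stored diagonal is kept at full length `dim`, which is a simplification.

```python
def gpu_operator_type(stored_dtype):
    """Pick the cuQuantum operator type from the stored layer, per the rule above."""
    return "MultidiagonalOperator" if stored_dtype == "Dia" else "DenseOperator"


def stored_elements(dim, stored_dtype, n_diagonals=1):
    """Count numbers held for one dim x dim per-mode operator."""
    if stored_dtype == "Dia":
        # Assumption: every diagonal is stored at full length dim.
        return n_diagonals * dim
    return dim * dim


for dtype in ("Dia", "Dense"):
    print(dtype, gpu_operator_type(dtype), stored_elements(20, dtype))
```

For a 20-level mode with a single diagonal, the count is 20 numbers in the diagonal form against 400 in the dense form.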
Default dtypes of common operators
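| Operator | Default dtype | Stored in CuOperator as | GPU type at execution |
|---|---|---|---|
| num() | Dia | _data.Dia | MultidiagonalOperator |
| create() | Dia | _data.Dia | MultidiagonalOperator |
| destroy() | Dia | _data.Dia | MultidiagonalOperator |
| qeye() | Dia | _data.Dia | MultidiagonalOperator |
| sigmax() | CSR | _data.Dense | DenseOperator |
| sigmay() | CSR | _data.Dense | DenseOperator |
| sigmaz() | CSR | _data.Dense | DenseOperator |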
num(), create(), destroy(), qeye() default to Dia and
produce MultidiagonalOperator without any extra steps. sigmaz() is
structurally diagonal but defaults to CSR, so it needs explicit handling.
sigmax() and sigmay() are tiny off-diagonal matrices; DenseOperator
is appropriate for them.
Handling sigmaz and custom diagonal operators
Without plugin active — sigmaz() is CSR; convert via Dia
before converting to CuOperator:
# stores _data.Dia -> MultidiagonalOperator
sz_cu = qutip.sigmaz().to('Dia').to('CuOperator')
With plugin active — sigmaz() already returns CuOperator (with
Dense stored inside). Calling .to('Dia') on it would first materialize
the full matrix (CuOperator → Dense). Use the dtype
argument to bypass the default instead:
# stores _data.Dia -> MultidiagonalOperator
sz_cu = qutip.sigmaz(dtype='Dia').to('CuOperator')
Custom diagonal operators constructed from arrays default to Dense;
convert via Dia:
H_diag = qutip.Qobj(np.diag(energies)).to('Dia').to('CuOperator')
or explicitly construct it as Dia via dtype argument:
H_diag = qutip.Qobj(np.diag(energies), dtype='Dia').to('CuOperator')
or construct with the qdiags operation:
H_diag = qutip.qdiags(energies, dtype='CuOperator')
CuOperator: Tensor Structure and Performance
The symbolic representation
CuOperator stores operators as a sum of tensor product terms:
op = sum_i factor_i * (A_i0 x A_i1 x ... x A_iN)
Each term records which per-mode operator acts on which subsystem. The full d^N × d^N matrix is never formed. cuQuantum exploits this factored structure to apply operators to states efficiently on the GPU — this is a fundamental source of the performance benefit.
The CuOperator container structure (term list, mode indices, transforms)
is CPU-side. The per-mode operators stored inside each term can be either
CPU-side QuTiP Data objects (Dense, Dia) — the common case when
operators are built from QuTiP functions — or GPU-resident cuQuantum objects
(DenseOperator, MultidiagonalOperator backed by CuPy arrays) if
explicitly constructed that way. In the typical workflow, the per-mode
operators are CPU-side, and data is transferred to the GPU when the
CuOperator is applied to a CuState.
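To see why the factored form avoids the full matrix, consider a single two-mode term A x B acting on a state vector. The sketch below uses plain Python lists and hypothetical helper names (`matvec`, `kron`, `apply_factored`); it illustrates the idea only and is not how cuQuantum performs the contraction.

```python
def matvec(M, v):
    """Ordinary matrix-vector product on nested lists."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def kron(A, B):
    """Explicit Kronecker product: the large matrix the factored form avoids."""
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]


def apply_factored(A, B, psi):
    """Apply (A x B) to psi using only the small per-mode factors."""
    da, db = len(A[0]), len(B[0])
    # View psi as a da x db grid: psi[j * db + l] -> grid[j][l].
    grid = [psi[j * db:(j + 1) * db] for j in range(da)]
    # Apply B along the second index of every row.
    grid = [matvec(B, row) for row in grid]
    # Apply A along the first index and flatten back to a vector.
    return [
        sum(A[i][j] * grid[j][k] for j in range(da))
        for i in range(len(A))
        for k in range(len(B))
    ]


sx = [[0, 1], [1, 0]]
sz = [[1, 0], [0, -1]]
psi = [1, 2, 3, 4]

full = matvec(kron(sx, sz), psi)         # builds the 4 x 4 matrix
factored = apply_factored(sx, sz, psi)   # never builds it
print(full, factored)
```

Both routes give the same vector, but the factored route only ever touches the 2 x 2 factors.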
Preserving tensor structure
The symbolic structure is preserved by:
- CuOperator + CuOperator: appends terms
- CuOperator @ CuOperator / CuOperator * CuOperator: Cartesian product of terms
- kron(CuOperator, CuOperator) / A & B: extends mode indices
- adjoint / transpose / conj: applies transform flags per mode
- scalar * CuOperator: scales coefficients
The tensor structure is destroyed by converting to Dense — whether
via .to('Dense'), .full(), or .to_array(). The reverse conversion
(materialised Dense back to CuOperator) wraps the full d^N × d^N
matrix as a single dense term acting on the entire Hilbert space, with no
knowledge of the tensor product decomposition. Using such a dense term will
generally result in a considerable performance penalty.
# AVOID: destroys tensor structure
H_bad = H.to('Dense').to('CuOperator')
# for large composite systems, this creates an exponentially large matrix
H_arr = H.full()
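The size gap between the two forms is easy to estimate. The following is back-of-the-envelope arithmetic under stated assumptions: every mode has the same dimension `d`, and each term stores one dense d x d operator per mode (an upper bound, since diagonal factors need less). The function names are hypothetical.

```python
def factored_elements(d, n_modes, n_terms):
    """Numbers stored when each term keeps one d x d operator per mode."""
    return n_terms * n_modes * d * d


def dense_elements(d, n_modes):
    """Numbers in the materialized d^N x d^N matrix."""
    return (d ** n_modes) ** 2


for n in (2, 4, 6):
    print(n, factored_elements(20, n, n_terms=3), dense_elements(20, n))
```

With d = 20 and three terms, two modes already give 2400 stored numbers against 160000 for the dense matrix, and the dense count grows exponentially with the number of modes while the factored count grows linearly.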
Conversion Reference
Registered conversions
CuOperator
| From | To | Weight | Notes |
|---|---|---|---|
| Dense | CuOperator | 1 | Square matrices only |
| Dia | CuOperator | 1 | Square matrices only |
| CuOperator | Dense | 10 | Expensive: materializes full matrix on CPU |
|  |  | 1 | If qutip-cupy installed |
CuState
| From | To | Weight | Notes |
|---|---|---|---|
| Dense | CuState | 1 | Host2Device transfer; requires ctx to be registered |
| CuState | Dense | 1 | Device2Host transfer |
|  |  | 1 | If qutip-cupy installed |
|  |  | 1 | If qutip-cupy installed |
No CSR conversion exists for either type. Paths through CSR go via Dense, chained automatically.
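Chained conversion amounts to a shortest-path search over the registered edges. The sketch below is a minimal model, not QuTiP's dispatcher: the CuOperator weights are the ones stated in this guide, while the CSR/Dia ↔ Dense weights are placeholders assumed for illustration, so only the routes (not the total costs) should be read from it. `cheapest_path` is a hypothetical helper.

```python
import heapq

EDGES = {
    ("Dense", "CuOperator"): 1,
    ("Dia", "CuOperator"): 1,
    ("CuOperator", "Dense"): 10,
    ("CSR", "Dense"): 1,   # assumed weight
    ("Dense", "CSR"): 1,   # assumed weight
    ("Dense", "Dia"): 1,   # assumed weight
    ("Dia", "Dense"): 1,   # assumed weight
}


def cheapest_path(source, target):
    """Dijkstra over the conversion graph; returns (cost, path) or None."""
    queue = [(0, [source])]
    seen = set()
    while queue:
        cost, path = heapq.heappop(queue)
        node = path[-1]
        if node == target:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for (src, dst), weight in EDGES.items():
            if src == node and dst not in seen:
                heapq.heappush(queue, (cost + weight, path + [dst]))
    return None


print(cheapest_path("CSR", "CuOperator"))  # routed through Dense
print(cheapest_path("CuOperator", "Dia"))  # materializes Dense first
```

The second route is the reason calling .to('Dia') on a CuOperator is costly: the only way out of CuOperator is the weight-10 edge to Dense.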
Registered CuOperator operations
| Operation | Result | Notes |
|---|---|---|
| scalar * CuOperator | CuOperator | Symbolic: scales all term coefficients |
| -CuOperator | CuOperator | Symbolic: scales all coefficients by −1 |
| CuOperator + CuOperator | CuOperator | Symbolic: appends term lists |
| CuOperator - CuOperator | CuOperator | Symbolic: appends terms with negated factor |
| adjoint | CuOperator | Symbolic: sets transform flags per mode |
| transpose | CuOperator | Symbolic: sets transform flags per mode |
| conj | CuOperator | Symbolic: sets transform flags per mode |
| kron(CuOperator, CuOperator) | CuOperator | Symbolic: extends mode index list; underlying operation for qutip.tensor() and & |
| CuOperator @ CuOperator | CuOperator | Symbolic: Cartesian product of term lists; no GPU computation |
|  |  | GPU execution |
|  |  | GPU execution |
|  |  | Materializes both operators to dense to compare; expensive |
|  |  | Materializes operator to dense to check; expensive |