.. _usage-guide:

**************************************
cuQuantum Backend's Data Layers Guide
**************************************

What the Backend Provides
=========================

The backend introduces two new **data layers** — the internal representation
a ``Qobj`` uses to store its numerical data:

- ``CuOperator``: for (square) operators. Stores the operator symbolically
  as a sum of tensor product terms; the full matrix is never formed.
  The data is used to form cuQuantum's ``Operator`` object lazily when needed.
- ``CuState``: for state vectors and density matrices. Stores data
  via cuQuantum's ``DensePureState`` / ``DenseMixedState``.

A ``Qobj`` backed by ``CuOperator`` or ``CuState`` largely follows the same
QuTiP API as any other ``Qobj``. In practice, however, not all operations are
supported for every combination of data layers, and combining ``Qobj``\s of
different layers may trigger automatic conversions whose cost and correctness
depend on what layers are involved.
*This guide covers those nuances.*


Setup: WorkStream and Activation
=================================

The WorkStream context
----------------------

All ``CuState``\s construction and GPU computation require a cuQuantum
``WorkStream`` context:

.. code-block:: python

    from cuquantum.densitymat import WorkStream
    ctx = WorkStream()

``CuOperator`` is CPU-side and symbolic, so constructing one does not require
the context. ``CuState``, however, reads ``settings.cuDensity["ctx"]``
immediately in its constructor to allocate GPU memory. GPU computation
(applying operators to states, running solvers) also requires the ctx.
This ctx is registered when the backend is activated.

Activating the backend
----------------------

Both ``set_as_default(ctx)`` and the ``CuQuantumBackend(ctx)`` context manager
register the ctx and change the **default data type** used when constructing
new ``Qobj``\s without an explicit ``dtype``. Both also switch the default
solver integrator to ``CuVern7``.

.. code-block:: python

    import qutip_cuquantum

    # Method 1: permanent change
    qutip_cuquantum.set_as_default(ctx)

    # Method 2: scoped to a block; reverts on exit
    with qutip_cuquantum.CuQuantumBackend(ctx):
        ...

**What activation does NOT do:** it does not convert any existing ``Qobj``.
``Qobj``\s do not change their data layer upon entering or leaving the context
— only ``Qobj``\s constructed *after* activation, without an explicit
``dtype``, are affected.

Getting Operators and States into the Right Data Layer
======================================================

Explicit construction
---------------------

The ``dtype`` argument sets the data layer at construction, regardless of
whether the backend is active:

.. code-block:: python

    sz  = qutip.sigmaz(dtype='CuOperator')
    n   = qutip.num(20, dtype='CuOperator')
    psi = qutip.basis(2, 0, dtype='CuState')   # requires ctx to be registered

When no ``dtype`` is given, each factory function resolves the type from the
global default using its own **sparcity hint** — a hint that describes the
structural character of the operator it produces:

- ``"sparse"`` hint (e.g., ``sigmax``, ``sigmay``, ``sigmaz``) →
  ``CSR`` normally, ``CuOperator`` with plugin active
- ``"diagonal"`` hint (e.g., ``num``, ``create``, ``destroy``, ``qeye``) →
  ``Dia`` normally, ``CuOperator`` with plugin active
- ``"dense"`` hint → ``Dense`` always, regardless of whether the plugin is
  active

So activating the plugin changes the default for sparse and diagonal
operators, but **dense operators always stay** ``Dense``.

Explicit conversion
-------------------

Any ``Qobj`` can be converted to a different data layer at any time with
``.to()``:

.. code-block:: python

    sz_cu  = qutip.sigmaz().to('CuOperator')
    n_cu   = qutip.num(20).to('CuOperator')
    psi_cu = qutip.basis(2, 0).to('CuState')   # requires ctx to be registered

Implicit conversion during operations
--------------------------------------

When two ``Qobj``\s with different data layers are combined (``+``, ``@``,
``*``, ``qutip.tensor``, etc.), QuTiP automatically converts one operand
toward the other using the lowest-cost path:

- ``Dense``/``CSR``/``Dia`` → ``CuOperator``: cost 1
- ``CuOperator`` → ``Dense``: cost 10

**This only applies when at least one operand is already** ``CuOperator``.
Implicit conversion cannot produce ``CuOperator`` from scratch — if neither
operand is ``CuOperator``, the result will not be ``CuOperator`` either.

.. code-block:: python

    sx   = qutip.sigmax()                    # CSR
    n_cu = qutip.num(20).to('CuOperator')   # explicit CuOperator

    H = sx & n_cu   # sx is implicitly converted to CuOperator; result is CuOperator

**Pitfall:** ``CuOperator`` only supports square matrices. Implicit conversion
of a rectangular object (e.g., a ket) fails with
``ValueError("Rectangular CuOperator are not supported")``.

Building composite systems
--------------------------

Use ``qutip.tensor()`` or the ``&`` shorthand (they are equivalent) to build
operators on composite Hilbert spaces. For the result to be ``CuOperator``,
at least one operand must already be ``CuOperator``.

**With plugin active** — standard quantum operators (Pauli matrices,
number/creation/annihilation operators, identity) use sparse or diagonal
sparcity hints and therefore become ``CuOperator`` by default, so composite
Hamiltonians are naturally ``CuOperator`` without any explicit conversion:

.. code-block:: python

    qutip_cuquantum.set_as_default(ctx)

    H_xy = qutip.sigmax() & qutip.sigmay()   # all CuOperator

**Without plugin active** — ensure at least one operand is ``CuOperator``;
the others are pulled in by implicit conversion:

.. code-block:: python

    # Convert the subsystem operator to CuOperator explicitly
    sx_cu = qutip.sigmax().to('CuOperator')

    # sigmay() is Dense here; it gets implicitly converted to CuOperator
    # because sx_cu is CuOperator
    H_xy = sx_cu & qutip.sigmay()

Combining terms
---------------

**Addition** (``+``) appends terms to the CuOperator term list:

.. code-block:: python

    H  = qutip.sigmaz() & qutip.qeye(2)   # 1 term
    H += qutip.qeye(2) & qutip.sigmaz()   # 2 terms

**Multiplication** (``@`` or ``*`` between two Qobjs — equivalent for
operators) takes the Cartesian product of the two term lists, naturally
producing multi-mode coupling terms:

.. code-block:: python

    a1 = qutip.destroy(20) & qutip.qeye(20)   # a on mode 0
    a2 = qutip.qeye(20) & qutip.destroy(20)   # a on mode 1

    coupling  = a1.dag() @ a2    # one term with operators on both modes
    coupling += a1 @ a2.dag()

``@``/``*`` of an M-term operator by an N-term operator produces up to M×N
terms; ``+`` produces M+N terms.

**Scalar multiplication** scales every term's coefficient without changing the
term list:

.. code-block:: python

    H = w1 * (a1.dag() @ a1) + w2 * (a2.dag() @ a2) + g * coupling

Diagonal Operators: the Dia Format
====================================

Why it matters
--------------

When a ``CuOperator`` is applied to a ``CuState``, each per-mode operator
stored inside its terms is converted to a cuQuantum type at GPU execution
time:

- Stored ``_data.Dia`` → cuQuantum ``MultidiagonalOperator`` (efficient for
  diagonal/band-diagonal structure)
- Stored ``_data.Dense`` or other → cuQuantum ``DenseOperator``

This conversion is lazy — it happens at GPU execution time, not when
constructing or converting the ``CuOperator``. So the dtype of the object
stored inside the term determines GPU efficiency.

Default dtypes of common operators
-----------------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 15 25 30

   * - Operator
     - Default dtype
     - Stored in CuOperator as
     - GPU type at execution
   * - ``num(N)``
     - **Dia**
     - ``_data.Dia``
     - ``MultidiagonalOperator`` (efficient)
   * - ``create(N)``
     - **Dia**
     - ``_data.Dia``
     - ``MultidiagonalOperator`` (efficient)
   * - ``destroy(N)``
     - **Dia**
     - ``_data.Dia``
     - ``MultidiagonalOperator`` (efficient)
   * - ``qeye(N)``
     - **Dia**
     - ``_data.Dia``
     - ``MultidiagonalOperator`` (efficient)
   * - ``sigmaz()``
     - **CSR**
     - ``_data.Dense``
     - ``DenseOperator`` (see note below)
   * - ``sigmax()``
     - **CSR**
     - ``_data.Dense``
     - ``DenseOperator`` (tiny off-diagonal, appropriate)
   * - ``sigmay()``
     - **CSR**
     - ``_data.Dense``
     - ``DenseOperator`` (tiny off-diagonal, appropriate)

``num()``, ``create()``, ``destroy()``, ``qeye()`` default to ``Dia`` and
produce ``MultidiagonalOperator`` without any extra steps. ``sigmaz()`` is
structurally diagonal but defaults to ``CSR``, so it needs explicit handling.
``sigmax()`` and ``sigmay()`` are tiny off-diagonal matrices; ``DenseOperator``
is appropriate for them.

Handling sigmaz and custom diagonal operators
---------------------------------------------

**Without plugin active** — ``sigmaz()`` is ``CSR``; convert via ``Dia``
before converting to ``CuOperator``:

.. code-block:: python

    # stores _data.Dia -> MultidiagonalOperator
    sz_cu = qutip.sigmaz().to('Dia').to('CuOperator')

**With plugin active** — ``sigmaz()`` already returns ``CuOperator`` (with
``Dense`` stored inside). Calling ``.to('Dia')`` on it would first materialize
the full matrix (``CuOperator`` → ``Dense``). Use the ``dtype``
argument to bypass the default instead:

.. code-block:: python

    # stores _data.Dia -> MultidiagonalOperator
    sz_cu = qutip.sigmaz(dtype='Dia').to('CuOperator')

**Custom diagonal operators** constructed from arrays default to ``Dense``;
convert via ``Dia``:

.. code-block:: python

    H_diag = qutip.Qobj(np.diag(energies)).to('Dia').to('CuOperator')

or explicitly construct it as ``Dia`` via ``dtype`` argument:

.. code-block:: python

    H_diag = qutip.Qobj(np.diag(energies), dtype='Dia').to('CuOperator')

or construct with the `qdiags` operation:

.. code-block:: python

    H_diag = qutip.qdiags(energies, dtype='CuOperator')

CuOperator: Tensor Structure and Performance
=============================================

The symbolic representation
----------------------------

``CuOperator`` stores operators as a **sum of tensor product terms**::

    op = sum_i  factor_i * (A_i0 x A_i1 x ... x A_iN)

Each term records which per-mode operator acts on which subsystem. The full
d^N × d^N matrix is never formed. cuQuantum exploits this factored structure
to apply operators to states efficiently on the GPU — this is a fundamental
source of the performance benefit.

The ``CuOperator`` container structure (term list, mode indices, transforms)
is **CPU-side**. The per-mode operators stored inside each term can be either
CPU-side QuTiP ``Data`` objects (``Dense``, ``Dia``) — the common case when
operators are built from QuTiP functions — or GPU-resident cuQuantum objects
(``DenseOperator``, ``MultidiagonalOperator`` backed by CuPy arrays) if
explicitly constructed that way. In the typical workflow, the per-mode
operators are CPU-side, and data is transferred to the GPU when the
``CuOperator`` is applied to a ``CuState``.

Preserving tensor structure
----------------------------

The symbolic structure is preserved by:

- ``CuOperator + CuOperator`` — appends terms
- ``CuOperator @ CuOperator`` / ``CuOperator * CuOperator`` — Cartesian
  product of terms
- ``kron(CuOperator, CuOperator)`` / ``A & B`` — extends mode indices
- ``adjoint``/``transpose``/``conj`` — applies transform flags per mode
- ``scalar * CuOperator`` — scales coefficients

The tensor structure is **destroyed** by converting to ``Dense`` — whether
via ``.to('Dense')``, ``.full()``, or ``.to_array()``. The reverse conversion
(materialised ``Dense`` back to ``CuOperator``) wraps the full d^N × d^N
matrix as a single dense term acting on the entire Hilbert space, with no
knowledge of the tensor product decomposition. Using such a dense term will
generally result in a considerable performance penalty.

.. code-block:: python

    # AVOID: destroys tensor structure
    H_bad = H.to('Dense').to('CuOperator')
    # for large composite systems, this creates an exponentially large matrix
    H_arr = H.full()

Conversion Reference
====================

Registered conversions
-----------------------

**CuOperator**

.. list-table::
   :header-rows: 1
   :widths: 15 15 10 50

   * - From
     - To
     - Weight
     - Notes
   * - ``Dense``
     - ``CuOperator``
     - 1
     - Square matrices only
   * - ``Dia``
     - ``CuOperator``
     - 1
     - Square matrices only
   * - ``CuOperator``
     - ``Dense``
     - 10
     - Expensive: materializes full matrix on CPU
   * - ``CuPyDense``
     - ``CuOperator``
     - 1
     - If qutip-cupy installed

**CuState**

.. list-table::
   :header-rows: 1
   :widths: 15 15 10 50

   * - From
     - To
     - Weight
     - Notes
   * - ``Dense``
     - ``CuState``
     - 1
     - Host2Device transfer; requires ctx to be registered
   * - ``CuState``
     - ``Dense``
     - 1
     - Device2Host transfer
   * - ``CuPyDense``
     - ``CuState``
     - 1
     - If qutip-cupy installed
   * - ``CuState``
     - ``CuPyDense``
     - 1
     - If qutip-cupy installed

No CSR conversion exists for either type. Paths through CSR go via Dense,
chained automatically.

Registered CuOperator operations
----------------------------------

.. list-table::
   :header-rows: 1
   :widths: 35 20 40

   * - Operation
     - Result
     - Notes
   * - ``mul(CuOperator, scalar)``
     - ``CuOperator``
     - Symbolic — scales all term coefficients
   * - ``neg(CuOperator)``
     - ``CuOperator``
     - Symbolic — scales all coefficients by −1
   * - ``add(CuOperator, CuOperator)``
     - ``CuOperator``
     - Symbolic — appends term lists
   * - ``sub(CuOperator, CuOperator)``
     - ``CuOperator``
     - Symbolic — appends terms with negated factor
   * - ``adjoint(CuOperator)``
     - ``CuOperator``
     - Symbolic — sets transform flags per mode
   * - ``transpose(CuOperator)``
     - ``CuOperator``
     - Symbolic — sets transform flags per mode
   * - ``conj(CuOperator)``
     - ``CuOperator``
     - Symbolic — sets transform flags per mode
   * - ``kron(CuOperator, CuOperator)``
     - ``CuOperator``
     - Symbolic — extends mode index list; underlying operation for
       ``qutip.tensor`` / ``&``
   * - ``matmul(CuOperator, CuOperator)``
     - ``CuOperator``
     - Symbolic — Cartesian product of term lists; no GPU computation
   * - ``matmul(CuOperator, CuState)``
     - ``CuState``
     - GPU execution
   * - ``matmul(CuState, CuOperator)``
     - ``CuState``
     - GPU execution
   * - ``isequal(CuOperator, CuOperator)``
     - ``bool``
     - Materializes both operators to dense to compare; expensive
   * - ``isherm(CuOperator)``
     - ``bool``
     - Materializes operator to dense to check; expensive