QSE - a portable quantum circuit simulator
Future Software Technologies
Semester programme: Master Applied IT
Research group: Future Software
Project group members: Egor Knyazev
Nico Kuijpers
Project description
QuantumSimEvo is a portable, open C++ quantum state-vector simulator built for education and for everyday hardware. It simulates quantum circuits exactly — holding the full 2ⁿ-amplitude state and applying each gate as an exact unitary — so students and researchers can develop, debug, and verify quantum algorithms without access to real quantum hardware.
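To make "applying each gate as an exact unitary" concrete, here is a minimal sketch of a single-qubit gate applied to a full state vector. It is an illustration only, not QuantumSimEvo's actual code: the names `StateVector`, `Gate2x2`, `zero_state`, `apply_single_qubit_gate` and `hadamard` are hypothetical, and the sketch assumes double-precision complex amplitudes with qubit 0 as the least significant bit of the index.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Amplitude = std::complex<double>;
using StateVector = std::vector<Amplitude>;  // holds all 2^n amplitudes
using Gate2x2 = std::array<Amplitude, 4>;    // row-major 2x2 unitary (hypothetical layout)

// Returns the n-qubit basis state |0...0>.
StateVector zero_state(std::size_t n_qubits) {
    StateVector psi(std::size_t{1} << n_qubits, Amplitude{0.0, 0.0});
    psi[0] = Amplitude{1.0, 0.0};
    return psi;
}

// Applies the 2x2 unitary `u` to qubit `target` (bit 0 = least significant).
// Every amplitude pair that differs only in the target bit is mixed by `u`.
void apply_single_qubit_gate(StateVector& psi, const Gate2x2& u, std::size_t target) {
    const std::size_t stride = std::size_t{1} << target;
    assert(stride < psi.size());  // target must be a valid qubit index
    for (std::size_t base = 0; base < psi.size(); base += 2 * stride) {
        for (std::size_t offset = 0; offset < stride; ++offset) {
            const std::size_t i0 = base + offset;  // target bit = 0
            const std::size_t i1 = i0 + stride;    // target bit = 1
            const Amplitude a0 = psi[i0];
            const Amplitude a1 = psi[i1];
            psi[i0] = u[0] * a0 + u[1] * a1;
            psi[i1] = u[2] * a0 + u[3] * a1;
        }
    }
}

// The Hadamard gate as a row-major 2x2 matrix.
Gate2x2 hadamard() {
    const double s = 1.0 / std::sqrt(2.0);
    return {Amplitude{s, 0.0}, Amplitude{s, 0.0},
            Amplitude{s, 0.0}, Amplitude{-s, 0.0}};
}
```

Because the whole vector is kept in memory, the state can be read out after any gate, which is the inspectability the description refers to.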
Its core idea is a single abstract operation interface that separates what a circuit means from how its amplitudes are computed, letting three execution backends — a portable OpenMP baseline, an AVX2 SIMD path, and a CUDA GPU path — plug in interchangeably.
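The separation between what a circuit means and how its amplitudes are computed can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's real interface: `Backend`, `StridedBackend`, `XorIndexBackend` and `Circuit` are hypothetical names, only a Pauli-X gate is modelled, and the two toy backends merely stand in for the OpenMP, AVX2 and CUDA paths.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Amplitude = std::complex<double>;
using StateVector = std::vector<Amplitude>;

// Hypothetical operation interface: it states *what* is applied, not *how*.
class Backend {
public:
    virtual ~Backend() = default;
    virtual const char* name() const = 0;
    // Pauli-X on qubit `target` (bit 0 = least significant).
    virtual void apply_x(StateVector& psi, std::size_t target) const = 0;
};

// Stand-in backend 1: walks the state in blocks of 2 * stride.
class StridedBackend final : public Backend {
public:
    const char* name() const override { return "strided"; }
    void apply_x(StateVector& psi, std::size_t target) const override {
        const std::size_t stride = std::size_t{1} << target;
        for (std::size_t base = 0; base < psi.size(); base += 2 * stride)
            for (std::size_t k = 0; k < stride; ++k)
                std::swap(psi[base + k], psi[base + k + stride]);
    }
};

// Stand-in backend 2: pairs each index with its bit-flipped partner.
class XorIndexBackend final : public Backend {
public:
    const char* name() const override { return "xor-index"; }
    void apply_x(StateVector& psi, std::size_t target) const override {
        const std::size_t mask = std::size_t{1} << target;
        for (std::size_t i = 0; i < psi.size(); ++i)
            if ((i & mask) == 0) std::swap(psi[i], psi[i | mask]);
    }
};

// The circuit only records which qubits receive an X gate.
class Circuit {
public:
    explicit Circuit(std::size_t n_qubits) : n_qubits_(n_qubits) {}

    Circuit& x(std::size_t target) {
        assert(target < n_qubits_);
        targets_.push_back(target);
        return *this;
    }

    // Runs from |0...0> on whichever backend is supplied.
    StateVector run(const Backend& backend) const {
        StateVector psi(std::size_t{1} << n_qubits_, Amplitude{0.0, 0.0});
        psi[0] = Amplitude{1.0, 0.0};
        for (std::size_t t : targets_) backend.apply_x(psi, t);
        return psi;
    }

private:
    std::size_t n_qubits_;
    std::vector<std::size_t> targets_;
};
```

In this sketch, `Circuit(3).x(0).x(2)` yields the same final state from either backend, and adding a third backend would mean adding one more subclass without touching `Circuit`.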
The simulator inspects the host at runtime and selects the fastest available backend, degrading gracefully from GPU to vectorised CPU to plain multithreading. Everything ships as one dependency-free binary with Python bindings, so a working, accelerated simulator runs out of the box on an ordinary laptop.
Context
Quantum computing holds promise for problems such as optimisation, but today's hardware is scarce, noisy, and hard to access. As a result, most of the everyday work of designing, debugging, and teaching quantum algorithms happens on classical simulators, where the quantum state is exact and can be inspected at any point, unlike real devices, whose state collapses on measurement.
The established simulators are powerful but come with friction that gets between a newcomer and the concepts they are trying to learn. Using them often means installing a large dependency stack, fighting a build, choosing among many configuration properties (simulation method, device, precision, threading, circuit transpilation), or owning a particular accelerator. For a physics or computer-science student meeting quantum computing for the first time on a laptop, that setup cost is a real barrier.
QuantumSimEvo was developed as my Master of Applied IT research at Fontys University of Applied Sciences to address exactly this gap. It deliberately trades peak performance for portability and zero setup: it must run correctly on any commodity machine, accelerate wherever the host allows, and let a new execution backend be added without disturbing code that already works. The research question was whether such a dependency-light, education-first simulator could remain competitive with a production framework. The work targets educators, students, and researchers who need an exact, trustworthy reference simulator they can run immediately, anywhere.
Results
QuantumSimEvo was implemented and validated end-to-end, and benchmarked head-to-head against Qiskit Aer, a mature production simulator, on a consumer laptop (Intel Core i9-12900H, NVIDIA RTX 3070 Ti 8 GB, 32 GB RAM).
Architecture.
The backend-pluggable design worked as intended. The CUDA GPU backend was added as a single new subclass behind the abstract interface, with no changes to the circuit API, the algorithm implementations, or the Python bindings — confirming the central claim that the execution engine can grow by addition rather than rewrite. Backend selection happens once at startup, at microsecond cost, with graceful degradation from GPU to AVX2 to OpenMP.
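The one-time selection with graceful degradation can be sketched as a pure function over detected host capabilities. The sketch below is an assumption about how such a policy might look, not the project's implementation: `BackendKind`, `HostCapabilities`, `select_backend`, `probe_host` and `check_fallback_order` are hypothetical names. The AVX2 probe relies on the GCC/Clang builtin `__builtin_cpu_supports` and is compiled only on x86 with those compilers; GPU probing is deliberately left out (a real build might call the CUDA runtime's `cudaGetDeviceCount`), so `probe_host` always reports no CUDA device here.

```cpp
#include <cassert>

enum class BackendKind { Cuda, Avx2, OpenMp };

// Hypothetical summary of what the host offers.
struct HostCapabilities {
    bool has_cuda_device;
    bool has_avx2;
};

// Fallback order from the text: GPU, then vectorised CPU, then plain multithreading.
constexpr BackendKind select_backend(HostCapabilities host) {
    return host.has_cuda_device ? BackendKind::Cuda
         : host.has_avx2        ? BackendKind::Avx2
                                : BackendKind::OpenMp;
}

// Probes the CPU once. GPU detection is omitted in this sketch (assumption:
// a CUDA-enabled build would query the CUDA runtime here instead).
HostCapabilities probe_host() {
    HostCapabilities host{false, false};
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    host.has_avx2 = __builtin_cpu_supports("avx2") != 0;
#endif
    return host;
}

const char* to_string(BackendKind kind) {
    switch (kind) {
        case BackendKind::Cuda:   return "cuda";
        case BackendKind::Avx2:   return "avx2";
        case BackendKind::OpenMp: return "openmp";
    }
    return "unknown";
}

// Usage example: the policy degrades one step at a time.
void check_fallback_order() {
    assert(select_backend({true, true}) == BackendKind::Cuda);
    assert(select_backend({false, true}) == BackendKind::Avx2);
    assert(select_backend({false, false}) == BackendKind::OpenMp);
}
```

Keeping the policy separate from the probing makes the fallback order testable on any machine, regardless of which accelerators it actually has.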
Expressiveness.
A validation suite was built entirely through the public gate API: Bell and GHZ entanglement primitives plus five algorithms — the Quantum Fourier Transform, Grover's search, Shor's factoring in a space-efficient 4n+2 formulation, QAOA for MaxCut, and a constraint-preserving QAOA (XY-mixer) for the Travelling Salesman Problem. Every algorithm ran unchanged across all three backends, including the deepest circuit (TSP), demonstrating that the abstraction survives nontrivial workloads.
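As an illustration of what a GHZ validation check might look like, the sketch below prepares an n-qubit GHZ state from a Hadamard followed by a chain of CNOTs and compares the result with the expected amplitudes. It does not use QuantumSimEvo's gate API: the free functions `hadamard`, `cnot`, `ghz_state` and `is_ghz` are hypothetical, and the code assumes qubit 0 is the least significant bit of the state index.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Amplitude = std::complex<double>;
using StateVector = std::vector<Amplitude>;

// Hadamard on qubit `target`.
void hadamard(StateVector& psi, std::size_t target) {
    const double s = 1.0 / std::sqrt(2.0);
    const std::size_t mask = std::size_t{1} << target;
    for (std::size_t i = 0; i < psi.size(); ++i) {
        if (i & mask) continue;  // visit each amplitude pair once
        const Amplitude a0 = psi[i];
        const Amplitude a1 = psi[i | mask];
        psi[i] = s * (a0 + a1);
        psi[i | mask] = s * (a0 - a1);
    }
}

// CNOT: flips `target` wherever `control` is 1.
void cnot(StateVector& psi, std::size_t control, std::size_t target) {
    assert(control != target);
    const std::size_t cmask = std::size_t{1} << control;
    const std::size_t tmask = std::size_t{1} << target;
    for (std::size_t i = 0; i < psi.size(); ++i)
        if ((i & cmask) && !(i & tmask)) std::swap(psi[i], psi[i | tmask]);
}

// Builds (|0...0> + |1...1>) / sqrt(2) with H on qubit 0 and a CNOT chain.
StateVector ghz_state(std::size_t n_qubits) {
    assert(n_qubits >= 1);
    StateVector psi(std::size_t{1} << n_qubits, Amplitude{0.0, 0.0});
    psi[0] = Amplitude{1.0, 0.0};
    hadamard(psi, 0);
    for (std::size_t q = 0; q + 1 < n_qubits; ++q) cnot(psi, q, q + 1);
    return psi;
}

// True if only the first and last amplitudes are non-zero, each 1/sqrt(2).
bool is_ghz(const StateVector& psi, double tol = 1e-12) {
    const double s = 1.0 / std::sqrt(2.0);
    for (std::size_t i = 0; i < psi.size(); ++i) {
        const double expected = (i == 0 || i + 1 == psi.size()) ? s : 0.0;
        if (std::abs(psi[i] - Amplitude{expected, 0.0}) > tol) return false;
    }
    return true;
}
```

A check of this kind only works on an exact simulator, since it compares every amplitude rather than sampled measurement counts.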
Performance.
On the CPU the abstraction proved effectively free: across GHZ, QFT, and IQFT at 25–30 qubits and 1000 shots, QuantumSimEvo stayed within roughly 12% of Qiskit Aer, and was marginally faster on the QFT — notable for a self-contained tool with a much smaller codebase and no external dependencies. On the GPU, Aer remained faster, but only by a bounded factor of 1.2–3.7×, and the gap narrowed as the state vector grew, because the difference is dominated by fixed per-execution overhead (kernel launches, transpilation, vendor cuStateVec kernels) rather than the amplitude sweep itself. For a hand-written CUDA backend that depends on no vendor quantum library, remaining within this range — and closing it precisely where GPU simulation matters most — is a strong result.
Comparing QuantumSimEvo's own backends, the GPU overtook the AVX2 CPU backend beyond about 21 qubits, where the state vector outgrows the CPU cache and memory bandwidth dominates. The 25-qubit TSP circuit ran 5.3× faster on the GPU than on the CPU, which supports the automatic runtime selection policy. The most demanding workload, Shor's factorisation of 115 on 30 qubits (~39,000 gates), completed in under three hours on the laptop. It also puts the exponential memory wall (about 16 GiB at 30 qubits) directly in front of the user, which is itself a pedagogical outcome.
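The memory figures quoted above follow from simple arithmetic, which the sketch below reproduces. It assumes 16 bytes per amplitude (a double-precision complex number), which is consistent with the "about 16 GiB at 30 qubits" figure but is not stated explicitly in the text. The helper names `state_vector_bytes`, `to_gib` and `max_qubits_for` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for a full n-qubit state vector.
// Assumption: 16 bytes per amplitude (two 8-byte doubles).
std::uint64_t state_vector_bytes(unsigned n_qubits,
                                 std::uint64_t bytes_per_amplitude = 16) {
    assert(n_qubits < 59);  // keeps 2^n * 16 within 64 bits
    return (std::uint64_t{1} << n_qubits) * bytes_per_amplitude;
}

// Converts a byte count to GiB (2^30 bytes).
double to_gib(std::uint64_t bytes) {
    return static_cast<double>(bytes) /
           static_cast<double>(std::uint64_t{1} << 30);
}

// Largest qubit count whose state vector fits in `budget_bytes`.
// Assumes the budget holds at least one amplitude.
unsigned max_qubits_for(std::uint64_t budget_bytes,
                        std::uint64_t bytes_per_amplitude = 16) {
    unsigned n = 0;
    while (n + 1 < 59 &&
           state_vector_bytes(n + 1, bytes_per_amplitude) <= budget_bytes) {
        ++n;
    }
    return n;
}
```

Under this assumption, `state_vector_bytes(30)` is exactly 16 GiB and `state_vector_bytes(21)` is 32 MiB, and each additional qubit doubles the requirement. A budget of exactly 32 GiB would therefore cap the state vector at 31 qubits, before accounting for memory used by anything else.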
Significance.
The results show that an exact, noiseless, dependency-free state-vector simulator can run out of the box on an ordinary laptop and still deliver production-comparable CPU performance, with a working, competitive GPU path where the hardware allows. Lowering the barrier between a learner and the concepts — rather than topping any single performance chart — is the project's main contribution. The work also produced reproducible benchmarks and an extensible platform that future distributed-memory or custom-gate backends can build on as pure additions.