Changelog

Here you will find the release notes for each version of the library. Each section includes information about changes, improvements, and bug fixes.

Version 0.12 - 2026-06-17

Many bugs in this release were found by running NumDot against the Python array-api-tests conformance suite from the Consortium for Python Data API Standards.

Added

nd.where(condition, x, y) selects from x where condition is true and from y otherwise, with broadcasting across all three operands.
New elementwise math functions: nd.log2, nd.log10, nd.log1p, nd.expm1, nd.logaddexp, nd.hypot, nd.copysign, nd.signbit, and nd.floor_divide (Python-style floor toward -infinity, including for integer inputs).
nd.cumsum and nd.cumprod compute cumulative sums and products along an axis (or over the flattened input when axis is null).
nd.diff(a, n, axis) computes the n-th discrete difference along the given axis. Output shrinks by n along that axis (empty if n exceeds the axis length).
nd.meshgrid(arrays, indexing) builds coordinate grids from a list of 1-D arrays. indexing is a StringName accepting &"xy" (default, swaps the first two output axes) or &"ij".
nd.expand_dims(v, axis) inserts a length-1 dimension at the given axis (counts from the end if negative). Returns a view, no copy.
nd.broadcast_to(v, shape) returns a view of v stretched to shape. Front-padded axes and any input axes of length 1 are broadcast (zero-stride); the rest must match the target dimension or the call errors.
nd.squeeze(v, axes) accepts an optional axes argument (int or list) selecting which length-1 axes to drop. The requested axes must all be size 1 or the call errors. Without axes the previous behavior is unchanged: drop every length-1 axis.
nd.moveaxis accepts lists for src and dst (in addition to single ints), moving multiple axes in one call. nd.moveaxis(arr, [0, 1], [-1, -2]) moves the first two axes to the end, in reverse order.
nd.roll(v, shift, axis) cyclically shifts elements; axis may be null (flatten), an int, or a list paired with shift. Negative and over-sized shifts are normalized.
nd.repeat(v, repeats, axis) repeats each element along axis. repeats is an int (every element) or an array (one count per element along the axis); a length-1 array broadcasts as a scalar. axis = null flattens first.
nd.argmax(a, axis) and nd.argmin(a, axis) return the int64 index of the max / min element along axis (flattened when axis is null).
nd.nonzero(a) returns one int64 1-D array per dimension with the indices of non-zero elements.
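
The zero-stride rule described for nd.broadcast_to above can be sketched in plain Python. This is only an illustration of the rule as stated in these notes, not NumDot's implementation; broadcast_strides and view_to_list are hypothetical helper names, and strides are counted in elements rather than bytes.

```python
def broadcast_strides(shape, strides, target):
    """Return element strides for viewing an array of `shape` as `target`."""
    if len(target) < len(shape):
        raise ValueError("target has fewer dimensions than the input")
    pad = len(target) - len(shape)
    out = [0] * pad  # front-padded axes are broadcast: stride 0
    for dim, stride, want in zip(shape, strides, target[pad:]):
        if dim == want:
            out.append(stride)  # axis already matches: keep its stride
        elif dim == 1:
            out.append(0)       # length-1 axis is stretched: stride 0
        else:
            raise ValueError(f"cannot broadcast axis of length {dim} to {want}")
    return out


def view_to_list(buffer, shape, strides, offset=0):
    """Materialize a strided view as nested lists (for inspection only)."""
    if not shape:
        return buffer[offset]
    return [
        view_to_list(buffer, shape[1:], strides[1:], offset + i * strides[0])
        for i in range(shape[0])
    ]


buffer = [10, 20, 30]                            # 1-D array of shape (3,)
strides = broadcast_strides([3], [1], [2, 3])    # [0, 1]
stretched = view_to_list(buffer, [2, 3], strides)
print(stretched)  # [[10, 20, 30], [10, 20, 30]]
```

Because the broadcast axes have stride 0, every row of the result reads the same three buffer elements, which is why no copy is needed.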

Changed

nd.split (and the nd.hsplit / nd.vsplit shorthands) now treats an integer indices_or_sections as the number of equal sub-arrays to produce, matching numpy. Previously the integer was interpreted as the size of each chunk, so nd.split(arr, 3, 0) on a length-3 axis returned one full-sized array instead of three length-1 arrays. The parameter has been renamed from indices_or_section_size to indices_or_sections to reflect this. Callers passing the chunk size will silently get different results — pass axis_length / old_value to recover the previous behavior.
Binary ops (nd.add, nd.subtract, comparisons, bitwise, ...) no longer widen the result dtype when one operand is a GDScript int/float/bool scalar. nd.add(uint8_arr, 5) now stays uint8 instead of returning int64, matching the Array API spec and numpy 2.x's NEP-50 promotion. To keep the old widening behavior, wrap the scalar as a typed array (e.g. nd.array(5, nd.Int64)).
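
To make the nd.split migration concrete, here is a small pure-Python sketch of the two interpretations on a 1-D list. It is not NumDot code: split_into_sections and split_by_chunk_size are hypothetical names, and the error raised for an axis length that does not divide evenly is an assumption borrowed from numpy's behavior.

```python
def split_into_sections(values, sections):
    """New behavior: the integer is the number of equal sub-arrays."""
    if sections <= 0 or len(values) % sections != 0:
        raise ValueError("axis length must divide evenly into sections")
    size = len(values) // sections
    return [values[k * size:(k + 1) * size] for k in range(sections)]


def split_by_chunk_size(values, chunk_size):
    """Previous behavior: the integer was the size of each chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk size must be positive")
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


print(split_by_chunk_size([1, 2, 3], 3))   # [[1, 2, 3]]      (old meaning)
print(split_into_sections([1, 2, 3], 3))   # [[1], [2], [3]]  (new meaning)

# Migration: pass axis_length / old_value to get the old chunks back.
arr = list(range(12))
old_value = 4
old_result = split_by_chunk_size(arr, old_value)
migrated = split_into_sections(arr, len(arr) // old_value)
print(old_result == migrated)  # True
```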

Fixed

nd.round and nd.rint now pass integer/bool arrays through unchanged (used to return null). nd.round also now works on complex arrays.
nd.clip accepts null for min and/or max to leave that side unbounded (used to error).
nd.clip follows the same scalar-promotion rule as binary ops, so nd.clip(uint8_arr, 0, 255) stays uint8.
Reductions (nd.sum, nd.mean, nd.min, nd.max, nd.std, nd.var, ...) accept axis lists in any order and with negative indices. nd.mean(arr, [1, 0]) used to return null.
nd.prod and nd.sum on narrow integer dtypes (int8..int32, uint8..uint32) accumulate at the wider output dtype (int64 / uint64) instead of overflowing. nd.prod([1291, 1291, 1291]) now returns 2151685171 instead of -2143282125.
nd.array(x, dtype) supports casting from complex sources: complex128 → complex64, complex → real (drops the imaginary part with a warning), and complex → bool. These previously errored.
Boolean conversion of complex arrays was inverted: ndb.all([1+0j]) came back false. nd.array(complex_arr, nd.Bool), ndb.all and ndb.any now return true for nonzero values.
nd.concatenate accepts null for axis, flattening inputs before concatenating.
nd.dumpb produces correct bytes for non-contiguous arrays (e.g. results of nd.flip / nd.transpose / strided slices). Saving and reloading these used to silently corrupt the data.
nd.transpose accepts negative axes in the permutation; nd.transpose(arr, [-1]) no longer returns null.
nd.flip accepts negative axes; nd.flip(arr, -1) used to crash.
Result dtype for nd.concatenate, nd.linspace, nd.arange, nd.matmul / nd.dot, and array-from-nested-array conversion follows numpy's result_type rules: uint8 + uint16 → uint16 (was int32), int32 + uint32 → int64, int64 + uint64 → float64.
nd.linspace(a, b, num) lands out[-1] exactly on b when endpoint is true (used to drift a few ULPs, e.g. 29.000000000000004 instead of 29.0). nd.linspace(a, b, 1) returns [a] instead of [nan].
nd.bitwise_right_shift on negative signed integers now sign-extends when the shift count meets or exceeds the dtype's bit width (e.g. int32(-1) >> 32 now returns -1, was 0), matching the array-api spec.
nd.log no longer returns nan on complex inputs whose magnitude is near the dtype's maximum (e.g. complex64 with |z| ≈ 1.8e19).
nd.sign on complex inputs now returns z / |z| (e.g. nd.sign(0.5+1j) is 0.4472+0.8944j, not 1+0j), matching the array-api spec.
nd.abs on multi-element complex arrays no longer returns nan for elements whose magnitude is near the dtype's maximum (e.g. complex64 with |z| ≈ 1.8e19); 0-D scalars were already correct.
nd.eye with a very large k returns all zeros instead of placing a diagonal of ones (e.g. nd.eye(1, k=2**32) now returns [[0.0]], used to return [[1.0]]).
nd.divide on complex inputs whose magnitudes are near the dtype's maximum (e.g. complex64 with |denom| ≈ 1.8e19) used to return 0 or nan for the result; it now returns the correct quotient.
nd.arange with very large integer start / stop and a float step returned the wrong number of elements (e.g. nd.arange(-2305843009213692800, -2305843009213694530, -91.0) returned 17 instead of 20), and could return a non-empty array where it should have been empty.
nd.atan and nd.atanh on complex inputs at extreme magnitudes (e.g. atan(1+2.77e7j)) used to return ±π/2 with the wrong sign, inf, or nan; they now return the correct value.
nd.reshape from a 1-D array to a multi-dimensional shape used to silently produce column-major output (e.g. nd.reshape(nd.array([1, 2, 3, 4, 5, 6]), [2, 3]) returned [[1, 3, 5], [2, 4, 6]]); it now returns row-major [[1, 2, 3], [4, 5, 6]] to match numpy / NumDot's general convention.
nd.arange returns an empty array when step has the wrong sign for stop - start (used to return garbage data).
nd.arange with step = 0 is rejected with a clean error.
nd.arange with integer arguments above 2**53 could return the wrong number of elements; integer dtypes now use exact integer arithmetic.
Reductions over a non-last axis of a multi-dimensional array (e.g. nd.mean(matrix, 0), nd.sum, nd.std, nd.var, nd.max) returned an empty or wrong-shaped result on Linux and Windows builds; macOS was unaffected.
NDArray.to_godot_array returned wrong data (and could crash Godot) on arrays with more than two elements.
nd.concatenate / nd.hstack / nd.vstack no longer fail with a shape-mismatch error on a single NDArray.
Slice errors in array conversions (to_godot_array, to_packed_*, to_vector*, copy, as_type, iteration) raise a Godot error instead of crashing.
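
Two of the integer fixes above (the nd.prod overflow and the sign-extending right shift) can be reproduced with ordinary Python integers. The sketch below emulates fixed-width two's-complement arithmetic; wrap_signed, prod_accumulate and arithmetic_right_shift are hypothetical helpers for illustration, and clamping the shift count to bits - 1 is just one way to get the sign-extending result, not necessarily how NumDot does it.

```python
def wrap_signed(value, bits):
    """Reduce a Python int to the two's-complement range of the given width."""
    value &= (1 << bits) - 1
    if value >= (1 << (bits - 1)):
        value -= 1 << bits
    return value


def prod_accumulate(values, bits):
    """Multiply values, wrapping the accumulator to `bits` after each step."""
    result = 1
    for v in values:
        result = wrap_signed(result * v, bits)
    return result


def arithmetic_right_shift(value, shift, bits):
    """Sign-extending shift; counts >= bits are clamped to bits - 1."""
    return wrap_signed(value, bits) >> min(shift, bits - 1)


narrow = prod_accumulate([1291, 1291, 1291], 32)  # int32 accumulator wraps
wide = prod_accumulate([1291, 1291, 1291], 64)    # int64 accumulator does not
print(narrow)  # -2143282125
print(wide)    # 2151685171

print(arithmetic_right_shift(-1, 32, 32))  # -1 (sign bit fills every position)
print(arithmetic_right_shift(1, 32, 32))   # 0
```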

Version 0.11 - 2025-09-14

Fixed

nd.arange no longer produces garbage data.
nd.reshape did not provide a view (instead gave a copy) when given 0-D and 1-D arrays.
nd.reshape did not fill in the result array when given single-element 1-D arrays.
Multi-indexing used to crash when using more than 4 indices.

Version 0.10 - 2025-07-08

Careful, we have some breaking changes in this update! Make sure you read the section below closely before updating. I apologize for the inconvenience caused to running projects!

Added

nd.outer and nd.inner functions for dedicated vector multiplication.
nd.squeeze function.
Mathematical constants (pi, e, euler_gamma, inf, nan). These are currently added as functions, due to limitations of Godot's APIs.

Changed

array.get(null) will no longer select all elements. Instead, it will add a new dimension, analogous to np_array[None]. Use array.get(&":") to select all elements instead.
nd.reshape no longer re-interprets the previous shape (if layout is not row-major). Instead, it iterates the previous array in the correct order, filling elements one by one.
nd.flatten no longer makes a copy if it doesn't need to.
nd.reduce_dot is now called nd.sum_product.
Properties are now accessed without parentheses, e.g. array.shape instead of array.shape(). This holds for dtype, shape, size, buffer_dtype, buffer_size, buffer_size_in_bytes, ndim, strides, strides_layout, and strides_offset.
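
The nd.reshape change above is easiest to see on a transposed view. The following pure-Python sketch contrasts re-interpreting the underlying buffer with iterating the view in logical (row-major index) order. It only illustrates the idea; iter_logical and nest are hypothetical helpers, and strides are counted in elements.

```python
from itertools import product


def iter_logical(buffer, shape, strides, offset=0):
    """Yield the elements of a strided view in row-major index order."""
    for index in product(*(range(n) for n in shape)):
        yield buffer[offset + sum(i * s for i, s in zip(index, strides))]


def nest(flat, shape):
    """Pack a flat list into nested lists of the given (non-empty) shape."""
    if len(shape) == 1:
        return list(flat)
    step = len(flat) // shape[0]
    return [nest(flat[i * step:(i + 1) * step], shape[1:]) for i in range(shape[0])]


# Buffer of the row-major 2x3 array [[1, 2, 3], [4, 5, 6]].
buffer = [1, 2, 3, 4, 5, 6]
# Its transpose is a 3x2 view with element strides (1, 3): [[1, 4], [2, 5], [3, 6]].
shape, strides = [3, 2], [1, 3]

reinterpreted = nest(buffer, [2, 3])                                  # ignores the view's layout
logical = nest(list(iter_logical(buffer, shape, strides)), [2, 3])    # follows the view's order

print(reinterpreted)  # [[1, 2, 3], [4, 5, 6]]
print(logical)        # [[1, 4, 2], [5, 3, 6]]
```

The second result is what numpy returns for np.arange(1, 7).reshape(2, 3).T.reshape(2, 3).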

Fixed

array.get(0) and array.get(&"newaxis") no longer fail or crash the program.
Restored compatibility with older Linux OS by downgrading to GLIBC 2.35.

Version 0.9 - 2025-04-29

Changed

Update xtensor to 0.26.0, xsimd to 13.2.0, xtl to 0.8.0. This may come with both improvements and regressions.

Fixed

nd.any and nd.all always incorrectly evaluated to true and false respectively.
Fix nd.linspace for non-floating types.

Version 0.8 - 2025-03-05

The backend of NumDot has been completely rewritten! The changes cut the binary size in half, and make some functions a lot faster. Some bugs may have been silently fixed or introduced in the process. Let us know if you run into any trouble! In addition, there are now unit tests that check the validity of functions against NumPy implementations as the ground truth. This should help make NumDot functions more reliable.

Added

Added nd.load and nd.saveb functions, to read and write .npy files.
Added array_equiv function.
Added conversion functions for Plane, Quaternion, Projection and Basis.
Added NDArray functions for getting slices as variant types (if shape is compatible).
Added complex64 and complex128 array creation shorthand functions.
NumDot now appears in the plugins section of the editor preferences.
NumDot now builds for Android (x86_32, x86_64, arm32).

Changed

Optimizations in the build scheme and code architecture reduced the binary size by 50%.
array_equal now also checks for shape equivalence, and doesn't fail if the shapes are not broadcastable.
1-D array assignment is now about 3.5 times faster.
Single-slice indexing now has a lower latency.
Contiguous array conversions (from NDArray to godot arrays) have been optimized, and can be several times faster.
nd.add, nd.abs, nd.remainder and nd.pow no longer promote values to higher bit counts.
Complex numbers can now be booleanized in some situations.

Fixed

Various functions resulting in bools were broken. This is now fixed.
NDArray found in godot arrays will now properly type hint the resulting array, avoiding accidental promotion.
arange was producing garbage data.
bitwise_left_shift and bitwise_right_shift were incorrectly promoting, and producing undefined behavior when the shift was larger than the bit count (now it defaults to 0).

Version 0.7 - 2024-11-12

Added

Added complex numbers data types (complex64 and complex128).
Added real, imag, conjugate and angle functions for complex numbers.
Added complex_as_vector and vector_as_complex functions for convenient complex number creation and manipulation, similar to real and imag.
Added any layout type, which may bring tiny speed improvements.
Added fft and fft_freq functions.
Added pad function.
Added cross function.
Added ndarray.buffer_size and ndarray.buffer_dtype functions for investigation of underlying buffer types.
Added bitwise functions (bitwise_and, bitwise_or, bitwise_xor, bitwise_not, bitwise_left_shift, bitwise_right_shift).
Added matrix diagonal, diag and trace functions.
Added transpose and flatten to NDArray methods.
Added is_close, array_equal and all_close functions.
Added is_nan, is_inf and is_finite functions.

Changed

In-place adaptations of native godot types speed up conversions (to and from NumDot). In particular, in-place adaptations of packed arrays do not need to copy data on read, and will produce instantaneous copy-on-write copies on to_packed_xxx_array calls for the same type.
ndarray.array_size_in_bytes is now called ndarray.buffer_size_in_bytes.
Custom builds can now disable each function / feature individually. This allows for very fine control of what to include in a custom build, which can reduce NumDot builds down to almost 0mb.
Removed NUMDOT_DISABLE_GODOT_CONVERSION_FUNCTIONS to improve usability. A similar option may be re-added to de-optimize conversions to save space.
Functions no longer declare unnecessary default values.
transpose() can now be called without passing a parameter, which reverses the axes.

Fixed

arange produced 0-size arrays when at least two arguments were passed.

Version 0.6 - 2024-10-28

Added

randn function (random sampling from a normal distribution)
For custom builds, OpenMP support (through the new openmp_threshold compile option, disabled by default). This requires your compiler to support OpenMP.
Add support for web exports (wasm32).

Changed

Contiguous scalar assignment (e.g. array.set(0)) is now about 20x as fast as before.
Mask assignments with scalars are now a bit faster.

Fixed

Assignment from boolean to boolean arrays didn't work properly.
Setting with a mask of incompatible shape to the array didn't properly fail.

Version 0.5 - 2024-10-07

Added

Bounds checks are now enabled everywhere.
Negative indices are now supported everywhere.
Added boolean mask indexing, e.g. a.set(5, nd.greater(a, 5)).
Added index list indexing, e.g. a.set(5, [[0, 1], [4, 2]]).
Added a basic convolve function.
Added the sliding_window_view function.
Added array.copy() and nd.copy(array) functions.
Added positive and negative functions.
Added count_nonzero functions.
Added concatenate, hstack and vstack functions.
Added split, hsplit and vsplit functions.
Added the tile function.
Added scalar optimizations for binary functions. This will greatly accelerate calls like nd.add(array, 5), at the cost of some binary size. This behavior can be disabled with the build flag NUMDOT_DISABLE_SCALAR_OPTIMIZATION.
nd.matmul can now handle matrix-vector multiplication.

Changed

array.set(x) should now be slightly faster when only a single element is updated.
Accelerated sum_product. This also affects matmul, dot, and convolve operations.

Fixed

nd.range now behaves properly when called as nd.range(x, null) (i.e. range from x to end).
NDArray interpretation inside of Arrays would result in inhomogeneous shape errors.
Fixed NDArray.to_godot_array() producing garbage data and shapes.
Fixed NDArray.to_packed_xxx producing arrays that were too large.
Fixed zeros_like and similar producing garbage arrays when the dtype is not given.

Version 0.4 - 2024-10-03

Added

Added NDRandomGenerator, created by nd.default_rng. It offers .random() for floats, .integers for ints and .spawn() for child generators.
Added new namespaces ndb, ndf and ndi, for full tensor reductions to bool, float and int, respectively.
Added nd.median.
NDArray is now iterable over the outermost dimension.
NDArray conversion functions to and from Color, Vector2, Vector3, Vector4, Vector2i, Vector3i, Vector4i, PackedVector2Array, PackedVector3Array, PackedVector4Array and PackedColorArray.
Added nd.as_array shorthands for every data type, e.g. nd.float32.
(Now really) added the logical_xor function.
Added nd.eye.
Added nd.empty_like, nd.full_like, nd.ones_like and nd.zeros_like.
Added NDArray.strides(), NDArray.strides_layout(), and NDArray.strides_offset(), through which you can inspect the strides properties of an NDArray / NDArray view.

Changed

nd.array and nd.as_array, NDArray.get_float, NDArray.get_int, NDArray.get_bool are now up to 2x faster.
NDArray.to_godot_array now slices into the outermost dimension instead of flattening the array. To get floats and ints directly, use .to_packed_xxx.
NDArray.to_packed_xxx now requires 0D or 1D arrays to work. If the array is 2D, the conversion is not trivial, and a reshape should be used first.
NumDot now uses Vector4i as a surrogate for range objects. They are represented as (bitmask, start, stop, step). This optimizes range creation, interpretation and memory use.

Version 0.3 - 2024-09-25

Added

Added the dot and sum_product functions.
Added the matmul function.
nd.array([...]) can now handle more complex array inputs, e.g. an array of Vector2i.
Added the stack and unstack functions.
Added NDArray to_bool and get_bool functions.
nd.full now supports bools and arrays for the fill value.
Axes, shape and permutation parameters now have support for more different argument types (including NDArrays).
Added NUMDOT_COPY_FOR_ALL_INPLACE_OPERATIONS flag. This flag allows custom builds to de-optimize in-place operations even for optimal types. This reduces the binary size.
Added NUMDOT_OPTIMIZE_ALL_INPLACE_OPERATIONS flag. This flag allows custom builds to optimize all in-place operations, even for non-optimal target types. This increases the binary size a lot and is not recommended.

Changed

In-place operations with optimal destination types are now optimized by default.
Removed NUMDOT_ASSIGN_INPLACE_DIRECTLY_INSTEAD_OF_COPYING_FIRST compile flag.

Fixed

NDArray set didn't honor the index parameters, and didn't broadcast.

Version 0.2 - 2024-09-20

Added

Added an in-place API to NDArray objects, mirroring the nd API. In-place functions can substantially improve performance for small arrays, because creation of intermediate types is avoided.
Added the NUMDOT_ASSIGN_INPLACE_DIRECTLY_INSTEAD_OF_COPYING_FIRST compiler flag, which improves performance of same-type assignment while increasing the binary size.
Added the norm function (l0, l1, l2 and linf supported).
Added the logical_xor function.
Added the any and all functions.
Added the square function.
Added the clip function.
nd.array can now interpret multi-dimensional boolean arrays.
Documentation is now available in the editor.
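
For reference, the four norm orders mentioned above have the following standard definitions, sketched here in plain Python for a 1-D input. The string order names and the norm helper itself are hypothetical and chosen for illustration; these notes do not specify how NumDot's norm function takes its order argument.

```python
import math


def norm(values, order):
    """Vector norm of a flat sequence for the orders l0, l1, l2 and linf."""
    if order == "l0":
        return sum(1 for v in values if v != 0)         # count of non-zero entries
    if order == "l1":
        return sum(abs(v) for v in values)              # sum of magnitudes
    if order == "l2":
        return math.sqrt(sum(abs(v) ** 2 for v in values))  # Euclidean length
    if order == "linf":
        return max((abs(v) for v in values), default=0)     # largest magnitude
    raise ValueError(f"unsupported order: {order}")


vec = [3, -4, 0]
print(norm(vec, "l0"))    # 2
print(norm(vec, "l1"))    # 7
print(norm(vec, "l2"))    # 5.0
print(norm(vec, "linf"))  # 4
```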

Changed

Reduced the binary size by half. In exchange, the performance of operations that need a cast before running decreased by ~25%. The C define NUMDOT_CAST_INSTEAD_OF_COPY_FOR_ARGUMENTS lets you revert to the old behavior.
Optimized the compiler arguments for the release binary. On web, it optimizes for size (~30% decrease). For downloadable binaries, it optimizes for performance (2% to 30% increase). You can use custom builds to change the default behavior.

Fixed

Reduction functions now behave properly when casting (they used to crash or produce meaningless results).
Array creation could often lead to the wrong dtype.
nd.prod erroneously evaluated as nd.sum.

Version 0.1 - 2024-09-17

Initial release.
