Up to date

This page is up to date for NumDot stable. If you still find outdated information, please open an issue.

Optimizing Performance of Operations

If your mathematical operation runs too slowly, consider these steps to optimize it:

Question your Algorithm: Are you using the optimal algorithm for the task? Perhaps it is possible to:
- Use an algorithm with better runtime complexity (e.g. O(n log n) instead of O(n^2)).
- Avoid slow functions, or replace them with cheaper approximations where precision allows. A famous example of such an optimization is the fast inverse square root.
Optimize NumDot Use: You may be able to speed up your algorithm with the NumDot-specific tricks documented in this article.
Custom Build: Once you are sure you have optimized everything you can, you can substantially speed up your algorithm by implementing it in C++ and interfacing with xtensor directly. This is documented in the articles for Custom Builds.
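To check whether a change actually helps, time the operation before and after. The sketch below assumes Godot 4 (for `Time.get_ticks_usec()` and inline lambdas). `time_usec` is a hypothetical helper and is not part of NumDot. The times it reports also include GDScript call overhead.

```gdscript
extends Node

# Hypothetical helper (not part of NumDot): calls `operation` `repeats` times
# and returns the average wall-clock time of one call, in microseconds.
func time_usec(operation: Callable, repeats: int = 100) -> float:
    var start := Time.get_ticks_usec()
    for _i in repeats:
        operation.call()
    return float(Time.get_ticks_usec() - start) / repeats

func _ready():
    var a := nd.ones([1000, 2])
    var b := nd.ones([1000, 2])
    print("nd.add: %.1f us per call" % time_usec(func(): nd.add(a, b)))
```

Averaging over many repeats smooths out timer resolution and one-off effects such as the first allocation.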

Vectorization

The most common mistake when using tensor math libraries is not vectorizing enough. This means iterating manually over elements when a single broadcasting operation could process them all at once.

Consider the following example:

var vectors := nd.zeros([1000, 2])
for i in vectors.shape()[0]:
    var prog := i / 10.0
    vectors.set(Vector2(sin(prog), cos(prog)), i)

Vectorized, the same computation runs much, much faster, because 1000 separate calls into NumDot shrink to a handful:

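var vectors := nd.zeros([1000, 2])
# One call computes all 1000 angles; dividing by 10.0 mirrors i / 10.0 in the loop.
var prog := nd.divide(nd.arange(1000), 10.0)
# &":" selects all rows; column 0 receives the sines, column 1 the cosines.
vectors.set(nd.sin(prog), &":", 0)
vectors.set(nd.cos(prog), &":", 1)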

As a rule of thumb: The fewer calls you make to NumDot, the faster your algorithm executes.
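The same rule applies to simpler per-row updates. The sketch below assumes that `get_float` accepts one index per dimension, the same way `set` accepts indices in the examples above. The function names are hypothetical. The loop makes three NumDot calls per row, while broadcasting a Vector2 across every row takes a single call.

```gdscript
extends Node

# Manual iteration: three NumDot calls per row (two reads, one write).
# Modifies `positions` in place.
func offset_rows_loop(positions: NDArray, offset: Vector2) -> void:
    for i in positions.shape()[0]:
        var x := positions.get_float(i, 0) + offset.x
        var y := positions.get_float(i, 1) + offset.y
        positions.set(Vector2(x, y), i)

# Broadcasting: one call adds `offset` to every row and returns a new array.
func offset_rows(positions: NDArray, offset: Vector2) -> NDArray:
    return nd.add(positions, offset)
```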

In-Place Operations

Every operation in nd allocates a new array for its result. Avoiding these allocations, especially for operations repeated every frame, can improve performance by up to 2x.

Consider this example:

var positions: NDArray
var velocities: NDArray

func _ready():
    positions = nd.default_rng().random([1000, 2])
    velocities = nd.ones([1000, 2])

func _process(_delta):
    positions = nd.add(positions, velocities)

It would be much faster to directly assign to positions using in-place operations:

# [...]

func _process(_delta):
    positions.assign_add(positions, velocities)
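In-place means the existing array is modified, so every reference to it sees the change. By contrast, nd.add rebinds the variable to a newly allocated array. The small sketch below illustrates the difference. The variable names are illustrative, and it uses only calls shown elsewhere on this page.

```gdscript
extends Node

func _ready():
    var velocities := nd.ones([3, 2])
    var positions := nd.ones([3, 2])
    var alias := positions  # A second reference to the same NDArray object.

    # nd.add allocates a new array and rebinds `positions`;
    # `alias` still refers to the original array, which is unchanged.
    positions = nd.add(positions, velocities)
    assert(alias.get_float(0, 0) == 1.0)

    # assign_add writes alias + velocities into the existing buffer,
    # so no new array is allocated and every reference sees the result.
    alias.assign_add(alias, velocities)
    assert(alias.get_float(0, 0) == 2.0)
```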

Godot Conversions

NDArray has accelerated methods that return Godot Variant types directly:

# Slow: This access creates an intermediate 0-D tensor.
var f: float = array.get(5).to_float()

# Fast: This access directly returns a float.
var f: float = array.get_float(5)

# Slow: Conversion is not accelerated (one call per element; assumes a 1-D array).
var packed := PackedFloat32Array()
packed.resize(array.size())
for i in array.shape()[0]:
    packed[i] = array.get_float(i)

# Fast: Conversion is accelerated.
var packed := array.to_float32_array()

For reductions over all axes, NumDot can also skip the intermediate tensor. The ndb, ndf and ndi variants return a bool, float and int, respectively:

if ndb.all(tensor):
    print("All")

if ndf.mean(tensor) > 0.2:
    print("mean > 0.2")

if ndi.sum(tensor) > 10:
    print("sum > 10")

Unintentional Promotions

GDScript's built-in int and float types are 64-bit. Operations on 32-bit types may execute faster, partly because they move half as much memory. However, it is easy to unintentionally promote a type back to 64 bits:

var array := nd.ones([2, 5], nd.DType.Float32)

# Unintentional promotion to 64 bit: 2.5 is a 64-bit GDScript float.
var result_f64 := nd.multiply(array, 2.5)

# Result stays 32-bit
var result_f32 := nd.multiply(array, nd.float32(2.5))

Cache Constants

When operations run every frame, avoid unnecessarily re-creating constants:

var positions: NDArray

func _ready():
    positions = nd.default_rng().random([1000, 2])

func _process(_delta):
    # Intermediate tensor created every frame
    positions = nd.add(positions, Vector2(5, 6))

Consider storing the constant tensor:

var positions: NDArray
var velocity := nd.array(Vector2(5, 6))

func _ready():
    positions = nd.default_rng().random([1000, 2])

func _process(_delta):
    # Use of existing tensor accelerates the call.
    positions = nd.add(positions, velocity)

In extreme situations, this may apply even to calls to nd.range and similar!
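For example, a wave animation that needs the row indices every frame can create them once with nd.arange instead of rebuilding them in every `_process` call. The sketch below assumes a Godot 4 node script and uses only calls shown elsewhere on this page. `indices` and `_time` are illustrative names.

```gdscript
extends Node

var positions := nd.zeros([1000, 2])
# Created once: the row indices 0..999 are the same every frame.
var indices := nd.arange(1000)
var _time := 0.0

func _process(delta):
    _time += delta
    # Calling nd.arange(1000) here instead would build a new tensor every frame.
    # nd.divide and nd.add still allocate; caching only removes the arange.
    var phase := nd.add(nd.divide(indices, 10.0), _time)
    positions.set(nd.sin(phase), &":", 0)
    positions.set(nd.cos(phase), &":", 1)
```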
