I have found it!
A lightweight, open-source,
platform-independent
nd-array library
for the JVM
Platform-independent,
extensible, and
compatible
with any JVM language
Completely open-source
and free to use,
MIT-licensed
Vendor-agnostic
GPU acceleration
through OpenCL
Why Neureka?
Not only is it a flexible nd-array library for general-purpose use, it is also a tensor library for doing deep learning.
Neureka trains your neural network using a computation graph recorder.
This is contrary to the approach found in other frameworks such as TensorFlow, Theano, Caffe, and CNTK,
which require the definition of a computation graph ahead of time.
In that case a developer has to build a neural network structure that
cannot change during runtime.
Neureka, on the other hand, uses the recorded
computation graph to apply a technique called
reverse-mode auto-differentiation.
This technique allows your network structure to change arbitrarily during
runtime, with no additional lag or overhead.
This powerful feature was inspired by PyTorch,
which also uses a dynamic computation graph to achieve such a high degree
of flexibility.
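Because the graph is recorded while the forward pass runs, ordinary control flow can reshape the network on every call. Here is a minimal Groovy sketch of that idea, using only the Tensor calls shown in the example further below (the random branch merely stands in for any decision made at runtime):
var x = Tensor.of(2d).setRqsGradient(true)
var w = Tensor.of(3d)
var y = x * w                      // the forward pass records the graph as it runs
if ( new Random().nextBoolean() )  // network structure decided at runtime...
    y = ( y + x ) ** 2             // ...this branch simply records additional nodes
y.backward(1)                      // reverse-mode auto-differentiation over whichever
                                   // graph was actually recorded in this call
println x.getGradient()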
Why Java?
Although Java is a robust and safe language, it
is oftentimes considered too verbose and explicit for
simple prototyping or more exploratory workloads...
Therefore, popular machine learning and tensor / deep learning libraries
rely on Python, which in many cases offers a more concise syntax.
So one might come to wonder: why would anybody ever build
a deep learning library for Java?
The answer is simple!
Nobody did!
This library was written for all JVM languages:
Groovy, Kotlin, Scala, and Jython, just to name a few.
Take a look at the following example!
Neureka can be used from any language which compiles to, or understands,
JVM bytecode!
var x = Tensor.of(3d).setRqsGradient(true) // a tensor which requires a gradient
var b = Tensor.of(-4d)                     // bias
var w = Tensor.of(2d)                      // weight
var y = ((x+b)*w)**2                       // the forward pass records the computation graph
y.backward(1)                              // back-propagate (reverse-mode auto-differentiation)
// x.getGradient(): "(1):[-8]"
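The reported gradient checks out by hand: y = ((x+b)*w)**2, so dy/dx = 2*(x+b)*w**2 = 2*(3-4)*2**2 = -8.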
If you prefer fast prototyping
with Jupyter, then Neureka can be used there too.
BeakerX
is a Jupyter extension that supports many JVM
languages, such as Groovy, Scala, Clojure, Kotlin, and Java.
Performance
Not only are the operations within the default backend implemented to be as generalized, modular, and concise as possible, they are also optimized for multi-threading and specifically designed to be auto-vectorized by the JVM into SIMD machine-code instructions.
Performance-wise, however, Neureka still has lots of room for improvement. Because it is a lightweight and highly extensible library with a consistent API designed to allow support for any backend, you can easily go the extra mile and improve performance for your specific use case, for example by implementing a more specialized kind of OpenCL kernel for convolution...
Currently Neureka is mostly held back
by the JVM not allowing for more memory-localized types and
by the lack of an API for consistent SIMD vectorization
(...take a look at the upcoming Vector API...).
This upcoming Vector API, alongside the introduction of inline/value types from Project Valhalla, will greatly benefit the performance of Neureka as well as improve machine learning on the JVM in general.
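To give a sense of what that Vector API looks like, here is a minimal Groovy sketch of an explicitly vectorized element-wise multiplication using the JDK's incubating jdk.incubator.vector module. This is plain JDK code, not part of Neureka's backend, and the method is only illustrative:
// Requires a recent JDK, launched with: --add-modules jdk.incubator.vector
import jdk.incubator.vector.FloatVector

def multiply(float[] a, float[] b, float[] c) {
    def species = FloatVector.SPECIES_PREFERRED   // widest SIMD shape the CPU supports
    int upper = species.loopBound(a.length)
    int i = 0
    while (i < upper) {                           // vectorized main loop
        def va = FloatVector.fromArray(species, a, i)
        def vb = FloatVector.fromArray(species, b, i)
        va.mul(vb).intoArray(c, i)                // one SIMD lane-wise multiplication
        i += species.length()
    }
    while (i < a.length) {                        // scalar tail for the remaining elements
        c[i] = a[i] * b[i]
        i++
    }
}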