Why Can't RNNs Learn Math? Automata-Inspired RNNs for Exact Computation

Simon Bührer, Preprint, July 2025

PDF
Code

Abstract

RNNs are Turing-complete in theory but usually fail at exact arithmetic in practice. This work compiles p-stack automata into trainable RNNs, using stack splitting so the state grows linearly instead of exponentially, and a five-layer Clipped-ReLU network that reproduces the automaton exactly. It then looks at why training is so hard: the loss has a narrow V-shaped basin around the optimum that gradient descent tends to miss. Initialising close to that basin raises accuracy from 71.7% to about 90% across 13 arithmetic and bitwise operations.

Why Can't RNNs Learn Math? Automata-Inspired RNNs for Exact Computation

Abstract

Tags