Most neural network projects start with data. Lots of data. Images, text, sensor readings - whatever you’re trying to learn, you need thousands of examples.
Our lunar lander project had zero training data. No expert demonstrations. No recorded trajectories. Just a physics simulation and a vague goal: “land softly without crashing.”
So we didn’t train a network. We evolved one.
The Challenge
The problem was deceptively simple. A lunar lander starts somewhere in the sky. It has limited fuel. Gravity pulls it down. The goal is to land gently on a designated pad.
A human can learn this in a few tries. You fire the thrusters to slow your descent, adjust your angle, touch down softly. Easy.
But how do you specify that mathematically? What’s the function that maps sensor readings to thruster controls? What’s the reward signal that captures “land softly”?
A typical reinforcement learning setup would hand out a reward at each timestep, guiding the network toward good behavior. But we wanted something more interesting. We wanted the network to develop its own strategy.
Enter CTRNN
Continuous-Time Recurrent Neural Networks are weird. Unlike feedforward networks that process input-output pairs, CTRNNs evolve over time. They have internal state. Memory. They can anticipate, plan, adapt.
The math is elegant:
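dy_i/dt = (-y_i + Σ_j w_ij * σ(y_j + θ_j) + I_i) / τ_i
Each neuron’s state y_i changes at a rate set by its own decay, the weighted outputs of the other neurons (w_ij is the connection from neuron j to neuron i, θ_j is a bias), its external input I_i, and its time constant τ_i. The network isn’t just computing - it’s existing through time.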
This makes them a natural fit for control tasks. A CTRNN doesn’t just react to the current sensor reading. It integrates information over time, building up what amounts to an internal model of the system’s dynamics.
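To make the "integrates over time" part concrete, here is a minimal sketch of one simulation step. It is not the project's actual code: it assumes forward-Euler integration with a step size dt, a logistic sigmoid, and a per-neuron bias added inside the sigmoid, and the function names are made up for illustration.

```python
import math

def sigmoid(x):
    # Numerically safe logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def ctrnn_step(y, inputs, weights, biases, taus, dt=0.01):
    """Advance the neuron states y by one forward-Euler step of size dt.

    weights[i][j] is the connection from neuron j to neuron i.
    Integration scheme and dt are assumptions, not taken from the project.
    """
    n = len(y)
    outputs = [sigmoid(y[j] + biases[j]) for j in range(n)]
    new_y = []
    for i in range(n):
        total = sum(weights[i][j] * outputs[j] for j in range(n))
        dydt = (-y[i] + total + inputs[i]) / taus[i]
        new_y.append(y[i] + dt * dydt)
    return new_y

# With no connections and no input, a neuron's state simply decays toward zero.
state = [1.0]
for _ in range(3):
    state = ctrnn_step(state, [0.0], [[0.0]], [0.0], [1.0], dt=0.1)
```

Calling the step function in a loop, feeding sensor readings in as inputs and reading thruster commands off a few output neurons, is what gives the network its memory: the state carries over from one call to the next.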
The trick is finding the right weights and time constants.
The Evolution
We used a genetic algorithm. The process:
- Generate a population of random CTRNNs (100 individuals)
- Run each network in the lunar lander simulation
- Score the landing based on velocity, angle, and fuel use
- Select the best performers (top 20%)
- Breed new networks by combining successful ones
- Mutate slightly to explore new strategies
- Repeat for hundreds of generations
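The loop above can be sketched in a few lines. This is an illustration under assumptions, not our exact implementation: uniform crossover, Gaussian mutation, and carrying the elite over unchanged are choices made for the sketch, and fitness stands in for "run the lander simulation and score the landing".

```python
import random

def evolve(fitness, genome_len, pop_size=100, elite_frac=0.2,
           generations=200, mutation_std=0.1, rng=None):
    """Evolve a flat list of floats that maximizes fitness(genome)."""
    if rng is None:
        rng = random.Random()
    # Start from random genomes (the initial range is an assumption).
    population = [[rng.uniform(-1.0, 1.0) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    n_elite = max(2, int(pop_size * elite_frac))
    for _ in range(generations):
        # Rank by score and keep the top performers.
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[:n_elite]
        children = []
        while len(elite) + len(children) < pop_size:
            mom, dad = rng.sample(elite, 2)
            # Uniform crossover: each gene comes from one parent at random.
            child = [rng.choice(pair) for pair in zip(mom, dad)]
            # Small Gaussian mutation on every gene.
            child = [g + rng.gauss(0.0, mutation_std) for g in child]
            children.append(child)
        population = elite + children
    # The last batch of children has not been ranked yet, so score once more.
    return max(population, key=fitness)

# Toy stand-in for the simulator: the best genome is all zeros.
def toy_fitness(genome):
    return -sum(g * g for g in genome)

best = evolve(toy_fitness, genome_len=5, generations=50, rng=random.Random(0))
```

Nothing in the loop knows what a genome means. It only ever sees a list of numbers and a score.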
No gradients. No backpropagation. No training data. Just simulated evolution.
The genome was simple: an array of floating-point numbers encoding connection weights, biases, and time constants. The genetic algorithm manipulated these numbers blindly, guided only by fitness scores.
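One way such a flat array could be unpacked into network parameters is sketched below. The layout (weights first, then biases, then time constants) and the squashing of time constants into a positive range are assumptions for illustration. The actual encoding may have differed.

```python
import math

def decode_genome(genome, n_neurons, tau_min=0.1, tau_max=5.0):
    """Split a flat genome into (weights, biases, taus) for n_neurons neurons.

    Assumed layout: n*n weights, then n biases, then n raw time-constant genes.
    """
    n = n_neurons
    expected = n * n + 2 * n
    if len(genome) != expected:
        raise ValueError(f"expected {expected} genes, got {len(genome)}")
    # weights[i][j] is the connection from neuron j to neuron i.
    weights = [list(genome[i * n:(i + 1) * n]) for i in range(n)]
    biases = list(genome[n * n:n * n + n])
    raw_taus = genome[n * n + n:]
    # Squash each raw gene into (tau_min, tau_max) so time constants stay
    # positive no matter what mutation does; 0.5 * (1 + tanh(g / 2)) is the
    # logistic function written in an overflow-safe form.
    taus = [tau_min + (tau_max - tau_min) * 0.5 * (1.0 + math.tanh(g / 2.0))
            for g in raw_taus]
    return weights, biases, taus

weights, biases, taus = decode_genome([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 0.0, 0.0], 2)
```

Keeping the time constants bounded matters because a mutation that pushed one to zero or below would make the dynamics blow up.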
Early generations were chaos. Networks that fired thrusters randomly. Networks that did nothing. Networks that somehow managed to flip upside-down before crashing. Spectacular failures.
But occasionally, one would get lucky. Maybe it fired the thruster at the right time, slowing descent just enough to survive impact. That network’s genes would spread to the next generation.
The Breakthrough
Around generation 50, something clicked. The best networks started showing consistent behavior. They would:
- Fire thrusters early to slow initial descent
- Pause to conserve fuel
- Fire again just before landing
- Cut thrust at the last moment
This wasn’t programmed. We didn’t teach the network about two-phase descent strategies. It emerged from selection pressure.
By generation 100, the top performers were landing consistently with minimal fuel use. Their internal dynamics had organized into a control strategy.
By generation 200, they were optimizing landing precision. Not just soft landings, but landing exactly on target.
The Insight
The key insight was that CTRNNs don’t just map inputs to outputs. Their dynamics form attractors. The network’s state space develops basins of attraction that correspond to behaviors.
When the lander is high and falling fast, the network state flows toward the “thrust hard” attractor. As it slows and approaches the ground, the state transitions to the “gentle descent” attractor. Near touchdown, it settles into “cut thrust.”
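A toy example shows what an attractor looks like in the smallest possible case. This is not one of the evolved networks: it is a single hand-picked neuron with a strong self-connection (w = 8, bias = -4, both chosen for illustration), which is enough to give it two stable resting states.

```python
import math

def settle(y0, w=8.0, theta=-4.0, tau=1.0, dt=0.01, steps=5000):
    """Integrate one self-connected neuron from state y0 and return where it ends up."""
    y = y0
    for _ in range(steps):
        out = 1.0 / (1.0 + math.exp(-(y + theta)))
        y += dt * (-y + w * out) / tau
    return y

# Two starting points on either side of the unstable midpoint at y = 4.
low = settle(3.0)    # falls into the low attractor
high = settle(5.0)   # climbs into the high attractor
```

Start the neuron slightly below 4 and it slides down to one resting state. Start it slightly above and it climbs to the other. An evolved network has many neurons and sensor inputs that reshape this landscape as the lander moves, but the principle is the same.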
These attractors weren’t designed. They evolved because they worked.
What’s fascinating is that different evolved networks developed different attractor landscapes. Some had sharp transitions between behaviors. Others had smooth gradients. All achieved the goal, but with subtly different strategies.
The Reality Check
This approach has limits. Evolution is slow. 200 generations times 100 individuals times 500 simulation steps - that’s 10 million physics updates. On my laptop, it took hours.
Reinforcement learning with gradient descent would probably be faster. But it would also require carefully shaping reward signals, dealing with sparse rewards, and tuning hyperparameters.
Evolution is dumb but reliable. You don’t need to specify how to learn. You just need to specify what success looks like.
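"What success looks like" can be as small as one scoring function. Here is a hedged sketch of the kind of thing we mean. The terms follow the criteria mentioned earlier (velocity, angle, fuel use, distance from the pad), but the parameter names and weightings are invented for illustration.

```python
def landing_fitness(speed, angle, fuel_used, pad_offset,
                    w_speed=1.0, w_angle=1.0, w_fuel=0.5, w_offset=0.5):
    """Score one landing attempt: higher is better, 0.0 is a perfect landing.

    speed      - total speed at ground contact
    angle      - tilt from upright, in radians
    fuel_used  - fraction of the tank burned, 0.0 to 1.0
    pad_offset - horizontal distance from the pad center
    All weights are illustrative assumptions.
    """
    penalty = (w_speed * abs(speed)
               + w_angle * abs(angle)
               + w_fuel * fuel_used
               + w_offset * abs(pad_offset))
    return -penalty

soft = landing_fitness(speed=0.5, angle=0.05, fuel_used=0.6, pad_offset=1.0)
hard = landing_fitness(speed=6.0, angle=0.40, fuel_used=0.2, pad_offset=1.0)
```

The function says nothing about when to fire the thrusters. It only ranks outcomes, and the soft landing outranks the hard one even though it burned more fuel.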
And for our use case - exploring how neural dynamics can solve control tasks - evolution was perfect. We weren’t trying to build the best lunar lander controller. We were trying to understand what kinds of controllers evolution discovers.
The Lesson
Combining CTRNNs with genetic algorithms is an old idea. Randall Beer’s work in the 90s, evolutionary robotics in the 2000s - this isn’t novel research.
But it’s still magical to watch. You define a problem, set up selection pressure, and let simulated evolution do its thing. No supervision. No hand-holding. Just differential survival of random variations.
And somehow, out of chaos, emerges competence.
The networks don’t just solve the task. They develop internal representations, dynamic strategies, and robust behaviors. All from selection on a simple fitness function.
It makes you wonder what other problems could be solved this way. Not because it’s the most efficient approach, but because it requires so little human insight.
You don’t need to know how to land a lunar lander. You just need to know what a good landing looks like.
Evolution figures out the rest.