The \supg neuron has a reset input that, when triggered, makes the neuron prematurely reset its own internal timer cycle, restarting it from 0.
The original intent was for this input to be connected to a touch sensor on the feet of the quadruped robot.
This serves a double purpose: it indirectly synchronises the leg cycles, and it improves behaviour on uneven terrain, where contact with the ground can happen prematurely.
This proved to be a really stable solution even on flat terrain.
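The timer-and-reset mechanism can be sketched as follows (an illustrative sketch, not the actual implementation; the class and method names are assumptions):

```python
class SupgTimer:
    """Sawtooth timer driving a single SUPG neuron.

    The phase ramps linearly from 0.0 to 1.0 over one period, then wraps
    around; a reset prematurely restarts the cycle from 0.
    """

    def __init__(self, period, offset=0.0):
        self.period = period             # cycle length in seconds
        self.elapsed = offset * period   # optional per-joint phase offset

    def step(self, dt):
        """Advance the timer by dt seconds and return the phase in [0, 1)."""
        self.elapsed = (self.elapsed + dt) % self.period
        return self.elapsed / self.period

    def reset(self):
        """Restart the cycle, e.g. on a foot-ground contact event."""
        self.elapsed = 0.0
```
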
% \todo[inline]{explain more about the stability of \supg in the related work}
The second component of a \supg neuron is the CPPN, a \emph{Compositional Pattern Producing Network}.
The purpose is to transform the sawtooth signal into a more complex signal capable of driving the joint into a meaningful and coordinated movement that will make the robot move forward.
The CPPN will have just one input (the timer) and one output, the signal to the joint, which roughly corresponds to the angle the joint should be at.
Since we potentially have more than one joint in our robot (theoretically infinitely many; in practice there is a hardware limit, as a too complex robot could collapse under its own weight if built in the real world), we need a technique to scale up this strategy.
The solution could be to evolve a different CPPN for every joint, or to have only one CPPN with multiple outputs.
These are both bad design decisions: the first greatly increases the search space complexity, and we want a method that is as fast as possible.
The second has a problem with the timer mechanism, since the timer would be shared instead of being independent for every joint/leg.
Moreover, neither of them exploits the fact that some servos should have very similar behaviours, while it is not known beforehand which ones.
The solution proposed in \cite{morse2013single} is to borrow the notion of substrate from the \hyperneat implementation \cite{gauci2010autonomous} and apply it to the positions of the different servos.
A standard \hyperneat implementation would have a CPPN with coordinate inputs; the outputs would then be used as new weights for an ANN or CPPN.
In \supg{s}, instead, the network is used directly.
The big difference between a normal CPPN and this implementation is that the CPPN is shared between all joints (remember, timers are not) and every joint has a different value for the coordinate inputs.
This is efficient because it lowers the complexity that the system has to learn and at the same time exploits the fact that some servos have similar outputs and others should have complementary ones.
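The shared-CPPN idea can be sketched as follows: one network is queried once per joint, each query using that joint's independent timer phase and its own substrate coordinates (the `cppn` callable and the coordinate layout are hypothetical):

```python
def drive_joints(cppn, joints, timers, dt):
    """Query one shared CPPN once per joint.

    cppn:   callable taking (phase, *coords) and returning a joint angle
    joints: mapping from joint name to its substrate coordinates
    timers: mapping from joint name to an independent sawtooth timer
    """
    angles = {}
    for name, coords in joints.items():
        phase = timers[name].step(dt)        # independent timer per joint
        angles[name] = cppn(phase, *coords)  # shared network, distinct coords
    return angles
```
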
To sum up, every CPPN has $1+n$ inputs, where $n$ is the number of coordinates.
A \supg neuron wraps a CPPN like the one described above, with a coordinate and a timer. It has only the timer reset as input\footnote{In \cite{morse2013single} the touch sensor on the foot was directly connected to the reset input} and only the joint angle as output.
\subsection{\supg implementation}
The \supg solution presented looked promising, but some adaptations were needed.
First of all, the current robot design for the \tol project does not include a touch sensor on the feet at all.
Just defining what a foot is in this modular robot design would be a complicated topic.
Even if that were resolved, to keep the reality gap as small as possible, the touch sensor solution would have to be practically feasible and implementable.
Vibration sensors for every block and bumpers were explored as possible solutions, but neither was a quick solution that could be easily implemented.
Another solution was needed.
Instead of entirely giving up on the reset trigger, the temporary solution adopted for this thesis was to create a new output for the CPPN.
If that output was bigger than a threshold (e.g. $x > 0.9$), then the timer trigger would be activated.
Another modification was to include the sensory data already present on the robot into the CPPN inside the \supg neuron, hoping this would give the network enough information to know when to trigger the timer reset.
Sensory data input comes from an \emph{Inertial Measurement Unit} (IMU) giving orientation, spin and acceleration data.
Other sensory data comes from two photoresistors positioned as eyes, plus a combined input that subtracts one photoresistor value from the other, effectively telling the network whether the light/mating area is towards the left or the right.
The combined input is not strictly necessary, as it is something the network could learn on its own, but since it is very likely to be important information to have, we decided to give the network a little advantage.
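One control step of the adapted \supg neuron can be sketched like this (an illustrative sketch under the assumptions above; function names and the exact input ordering are hypothetical, while the $0.9$ threshold comes from the text):

```python
RESET_THRESHOLD = 0.9  # from the text: an output above 0.9 triggers the reset

def supg_step(cppn, timer, coords, imu, light_left, light_right, dt):
    """One control step of the adapted SUPG neuron.

    Inputs: sawtooth phase, substrate coordinates, IMU data and the two
    photoresistor readings plus their difference (left minus right).
    Outputs: the joint angle, plus an internal reset flag derived from the
    extra CPPN output compared against a threshold.
    """
    phase = timer.step(dt)
    light_diff = light_left - light_right  # combined "where is the light" input
    angle, reset_signal = cppn(phase, *coords, *imu,
                               light_left, light_right, light_diff)
    if reset_signal > RESET_THRESHOLD:
        timer.reset()  # replaces the touch-sensor reset of the original design
    return angle
```
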
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Kickstarting the learning}
\label{ch:Method:SUPG:Kickstart}
After looking at some preliminary experiments, it became clear that there was an issue in the learning process: the network was starting from a blank slate, with a lot of inputs to work with.
It was testing a lot of controllers that sent a static position instead of a varying signal, meaning the robot would not move.
The reason is that the network did not make use of the timer input, and therefore had no time-varying input, only static ones.
A solution was proposed: start the first networks with a hard-coded connection between the timer and the joint angle output.
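The kickstart can be sketched as follows, assuming a genome represented as a list of connection genes (the data structure and names are illustrative, not the actual implementation):

```python
def kickstart_genome(genome, timer_input_id, angle_output_id, weight=1.0):
    """Seed an initial genome with a hard-wired timer -> angle connection,
    so the very first controllers already produce a time-varying signal."""
    connection = {
        "in": timer_input_id,
        "out": angle_output_id,
        "weight": weight,
        "enabled": True,
    }
    return genome + [connection]
```
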
As seen in the results in \ref{ch:Results}, this improved the learning speed.
\label{ch:Method:NEAT}
NEAT (\emph{NeuroEvolution of Augmenting Topologies}) was first presented in 2002 \cite{stanley:ec02} as an improvement over several \emph{Topology and Weight Evolving Artificial Neural Networks} methods (\tweann in short).
It's a Genetic Algorithm dedicated to the evolution of Artificial Neural Networks.
The most significant achievement of this genetic algorithm is the ability to optimise and complexify a network at the same time, with no predefined maximum complexity.
In addition, the output network is suitable for non-Markovian tasks as well, as NEAT can generate recurrent connections that can represent memory.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Speciation}
An important mechanism introduced in NEAT is the use of speciation to protect innovation.
Being an Evolutionary Algorithm that increases complexity over time, NEAT starts with the simplest configuration possible and slowly adds complexity.
The introduction of new nodes in an organism transforms a possibly good linear classifier into a non-optimised, non-linear one.
The argument here is that the new gene is capable of introducing new and better results, but at the very beginning it will show lower fitness values; therefore it needs some sort of protection while waiting for an evaluation of its full potential.
With the division of organisms into species and explicit fitness sharing, an innovation is protected long enough to prove its usefulness.
The fitness of a species is increased by the fitness results of its individuals and decreased by the number of organisms in the species.
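In the formulation of \cite{stanley:ec02}, explicit fitness sharing adjusts the fitness $f_i$ of organism $i$ according to its distance $\delta$ from every other organism:
\[
f'_i = \frac{f_i}{\sum_{j=1}^{n} \operatorname{sh}\bigl(\delta(i,j)\bigr)},
\]
where $\operatorname{sh}(\delta(i,j))$ is $1$ when organism $j$ belongs to the same species as organism $i$ and $0$ otherwise; the denominator therefore reduces to the number of organisms in the species of $i$.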
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Reproduction}
The species champion always duplicates itself into the new generation (elitism), without any mutation.
The remaining slots are then assigned to randomly picked organisms of the species (species champion included).
Every organism has a chance of mutating or mating with another organism. There is also a small chance of the mate being outside its species.
The selection of the parents is just an equally random choice between all organisms in the species.
This method has been adopted in favour of a roulette system because the species population usually remains low and there wasn't any measured benefit from using a more complicated selection mechanism.
% \todo[inline]{find citation for this, or just directly quote the code}
% from code:
% //Note: You don't get much advantage from a roulette here
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Mutation operator}
If a mutation occurs in an organism, it can be of three different types.
The first one is a weight mutation, which does not modify the network complexity but only searches for optimisations of the current organism configuration.
Weight mutation is used far more often than the other types of mutation because it needs to occur more frequently.
Every newly modified network needs to be evaluated with different weights before we can decide the usefulness of a new node or connection.
The second type of mutation adds a new connection. It randomly selects two existing nodes in the network and creates a new connection between them. The new weight is just a random float in the $[0,1]$ range. There is a chance that the new link will be a recurrent one.
The third type of mutation adds a new node.
It works by splitting an existing connection into two new connections with the node in the middle.
Of the two new connections, only the second one (middle to end) gets the previous weight, while the first one (start to middle) gets a weight of $1.0$.
This particular choice of weights makes the new network, with its added node, initially equivalent to the previous one.
New generations will explore the search space and exploit the new connections by mutating the link weights and adding further connections.
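The add-node split can be sketched as follows (connection genes represented as simple dicts for illustration; the assumption that the original link is disabled rather than removed follows the usual NEAT convention):

```python
def add_node_mutation(connections, conn_index, new_node_id):
    """Split one connection into two, inserting a node in the middle.

    The start->middle link gets weight 1.0 and the middle->end link keeps
    the old weight, so the new network initially behaves like the old one.
    """
    old = connections[conn_index]
    old["enabled"] = False  # the original link is disabled, not removed
    first = {"in": old["in"], "out": new_node_id,
             "weight": 1.0, "enabled": True}
    second = {"in": new_node_id, "out": old["out"],
              "weight": old["weight"], "enabled": True}
    return connections + [first, second]
```
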
A mutation that removes a node or a link is intentionally missing.
This is because NEAT has been developed around the idea that the final network should be as minimal as possible, evolving from a minimal network and slowly developing only the important traits.
The removal of useless traits is generally delegated to the extinction of an organism or an entire species, without the need to do so explicitly.
This choice also has the side advantage of reducing the search space.
% \todo[inline]{Verify this last statement}
\subsubsection{Crossover operator}
An important component of NEAT, and an improvement over previous \tweann methods, is the introduction of a crossover operator that enables crossover only on similar genes.
This process is also present in nature and is called \emph{synapsis}.
It happens for a very specific reason: if any gene could crossover with any other gene, then with increasing complexity the number of offspring not fit to survive would simply be too high.
In the same way, most of the offspring resulting from a crossover of two different networks done without any criteria would not be a mix or combination of the parents, but more like a genetic abomination.
NEAT is able to efficiently achieve a form of \emph{artificial synapsis} by introducing \emph{historical markings} on new genes.
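Gene alignment via historical markings can be sketched as follows (genomes represented as dicts keyed by innovation number; the representation is illustrative, but the inheritance rule — matching genes chosen randomly, disjoint and excess genes taken from the fitter parent — follows \cite{stanley:ec02}):

```python
import random

def crossover(parent_a, parent_b, a_is_fitter=True):
    """Align genes by innovation number before recombining.

    Matching genes are inherited randomly from either parent; disjoint and
    excess genes are inherited from the fitter parent only.
    """
    fitter, other = (parent_a, parent_b) if a_is_fitter else (parent_b, parent_a)
    child = {}
    for innovation, gene in fitter.items():
        if innovation in other:
            # same historical origin: pick either parent's version
            child[innovation] = random.choice([gene, other[innovation]])
        else:
            # disjoint/excess gene: comes from the fitter parent
            child[innovation] = gene
    return child
```
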
The neurons in the ANN are positioned in what is called a \emph{substrate}.
The substrate is a geometrical space with $n$ dimensions.
Each neuron of the ANN is placed in the substrate and can be located with coordinates.
This setup with an indirect encoding and a substrate with a coordinate system allows the network to exploit symmetry and other geometrical properties; e.g. an ANN that needs to recognise a butterfly in a picture can learn the shape of the butterfly wings as one indirect feature instead of learning them separately, as two completely different features.
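The substrate idea can be sketched as a CPPN queried with the coordinates of every pair of neurons, the returned value becoming the connection weight (a minimal illustration of the indirect encoding; the `cppn` callable is hypothetical):

```python
def build_weights(cppn, source_coords, target_coords):
    """Query the CPPN once per neuron pair.

    The CPPN receives the coordinates of a source neuron and a target
    neuron in the substrate; its output is the weight of the connection
    between them, so geometric patterns in the CPPN become regularities
    in the resulting network.
    """
    weights = {}
    for i, src in enumerate(source_coords):
        for j, dst in enumerate(target_coords):
            weights[(i, j)] = cppn(*src, *dst)  # geometry in, weight out
    return weights
```
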