@@ -108,7 +108,7 @@ It also differs because the system has to do incremental learning using computat
Also interesting to note is research done on non-predefined pedal walking, learned on both simulated and real hardware robots\cite{Zykov04evolvingdynamic}.
They propose a genetic algorithm for gait learning on a robot that resembles a worm and has embedded computational abilities.
Their approach is to learn directly on the hardware platform and they don't talk about transferability of the brains.
Their approach is to learn directly on the hardware platform and they do not talk about transferability of the brains.
Other interesting works that we find are studies on modular robots and learning techniques \cite{bourquin2004self, marbach2005online}.
...
...
@@ -195,12 +195,12 @@ In summary, they work quite good within this framework, but they cannot provide
They consist of groups of neurons capable of producing an output with rhythmic and patterned characteristics without the need of an external input.
They are used by animals to generate movements that can vary from simple ones like walking to more complex ones like dancing.
CPGs can be simulated in software.
They can be used as open-loop controllers since, like their natural counterpart, they don't need external signals.
They can be used as open-loop controllers since, like their natural counterpart, they do not need external signals.
CPGs can also be implemented effectively as closed-loop controllers.
Advantages of CPG controllers are (from \cite{Ijspeert1014739}):
%discussed by Ijspeert, he in the specific identifies five properties:
\begin{itemize}
\item Resistance to perturbation, CPGs don't depend on outside inputs and moreover, even after a perturbation such as a change in the configuration variables, they rapidly return to their rhythmic patterns.
\item Resistance to perturbation, CPGs do not depend on outside inputs and moreover, even after a perturbation such as a change in the configuration variables, they rapidly return to their rhythmic patterns.
\item Convenience, CPGs are handy to use in distributed implementations, particularly useful in case of modular robots.
\item Dimensionality reduction of the control problem, thanks to their need for only a few control parameters.
\item Easy implementation of sensory feedback, that can be simply added as a coupling variable in the CPG equation.
...
...
@@ -343,9 +343,9 @@ Their interactions result in iterative improvement of the quality of problem sol
\rlpower\cite{kober2009learning, dangelo2014hyperneat}, abbreviation for \textbf{R}einforcement \textbf{L}earning with \textbf{Po}licy \textbf{L}earning by \textbf{W}eighting \textbf{E}xploration with the \textbf{R}eturns, is a reinforcement learning algorithm particularly effective for on-line gait development.
\rlpower\cite{kober2009learning, dangelo2014hyperneat}, abbreviation for \textbf{R}einforcement \textbf{L}ear\-ning with \textbf{Po}licy \textbf{L}ear\-ning by \textbf{W}eighting \textbf{E}xploration with the \textbf{R}eturns, is a reinforcement learning algorithm particularly effective for on-line gait development.
The \rlpower implementation follows the description by Jens Kober and Jan Peters \cite{kober2009learning}.
As described by Kober and Peters, the strength of \rlpower is the fact that despite being a reinforcement learning algorithm it doesn't explore the whole state and action space.
As described by Kober and Peters, the strength of \rlpower is the fact that despite being a reinforcement learning algorithm it does not explore the whole state and action space.
Such extensive exploration would in fact require an exorbitant amount of time due to the complexity of a robot.
\rlpower instead relies on a local reinforcement learning method improving the movements using previous actions.
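To make the idea of a local, return-weighted improvement concrete, the following Python fragment gives a minimal sketch of one update step in the spirit of \cite{kober2009learning}. It is not the implementation used in this work: the function name \texttt{power\_update}, the representation of a rollout as a pair of exploration offset and return, and the parameter \texttt{n\_best} are assumptions introduced only for illustration.

```python
def power_update(theta, rollouts, n_best=3):
    """One return-weighted update of the policy parameters (illustrative).

    theta    : current policy parameters, a list of floats
    rollouts : list of (epsilon, ret) pairs; epsilon is the exploration
               offset that was added to theta, ret its non-negative return
    n_best   : hypothetical number of top-ranked rollouts used in the update
    """
    # Keep only the rollouts with the highest returns.
    best = sorted(rollouts, key=lambda r: r[1], reverse=True)[:n_best]
    total = sum(ret for _, ret in best)
    if total <= 0.0:
        # No informative return was observed: keep the current parameters.
        return list(theta)
    # Shift each parameter by the return-weighted mean of the offsets.
    return [
        t + sum(eps[i] * ret for eps, ret in best) / total
        for i, t in enumerate(theta)
    ]
```

The new parameters are obtained by moving the current ones towards the exploration offsets of the best previous rollouts, each weighted by its return, so the search stays local and no sweep of the whole state and action space is required.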
@@ -35,10 +35,9 @@ The second component of a \supg neuron is the CPPN network it's a \emph{Composit
The purpose is to transform the sawtooth signal into a more complex signal capable of driving the joint into a meaningful and coordinated movement that will make the robot move forward.
The CPPN will have just one input (the timer) and one output, the signal to the joint, which roughly corresponds to the angle the joint should be at.
Since we don't have only one joint in our robot, but theoretically infinite ones (while practically there is a hardware limit, as a too complex robot could collapse under its own weight if implemented in the real world), we need a technique to scale up this strategy.
Since we potentially have more than one joint in our robot (theoretically infinitely many; in practice there is a hardware limit, as a too complex robot could collapse under its own weight if implemented in the real world), we need a technique to scale up this strategy.
A solution could be to evolve a different CPPN for every joint, or to have only one CPPN with multiple outputs.
These are both bad design decisions, since the
first one greatly increases the complexity of the search space, and we want a method that is as fast as possible.
These are both bad design decisions, since the first one greatly increases the complexity of the search space, and we want a method that is as fast as possible.
The second would instead have a problem with the timer mechanism, since the timer would be shared instead of being independent for every joint/leg.
Moreover, neither of them exploits the fact that some
servos should have very similar behaviors, and it is not known beforehand which ones.
...
...
@@ -53,7 +52,8 @@ To sum up, every CPPN has $1+n$ inputs, where $n$ is the number of coordinates c
A \supg neuron wraps a CPPN like the one described above, together with a coordinate and a timer. It has only the timer reset as input\footnote{In \cite{morse2013single} the touch sensor on the foot was directly connected to the reset input.} and only the joint angle as output.
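The structure of such a neuron can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name \texttt{SupgNeuron}, the \texttt{period} parameter and the use of a plain callable in place of an evolved CPPN are hypothetical and do not come from \cite{morse2013single} or from the implementation described here.

```python
class SupgNeuron:
    """Wraps a CPPN together with a coordinate and a sawtooth timer."""

    def __init__(self, cppn, coords, period):
        self.cppn = cppn            # callable: (timer, *coords) -> joint signal
        self.coords = tuple(coords)  # the n coordinates fed to the CPPN
        self.period = period        # assumed duration of one sawtooth ramp
        self.elapsed = 0.0

    def reset(self):
        """The only input of the neuron: restart the timer."""
        self.elapsed = 0.0

    def step(self, dt):
        """Advance the timer by dt and return the joint angle signal."""
        self.elapsed = (self.elapsed + dt) % self.period
        timer = self.elapsed / self.period  # sawtooth value in [0, 1)
        return self.cppn(timer, *self.coords)
```

Because the coordinates are passed as additional inputs, one and the same callable can be shared by all neurons while each neuron keeps its own independent timer.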
\subsection{\supg implementation}
The \supg solution presented looked promising, but some adaptations were needed. First of all, the current robot design for the \tol project doesn't include a touch sensor on the feet at all.
The \supg solution presented looked promising, but some adaptations were needed.
First of all, the current robot design for the \tol project does not include a touch sensor on the feet at all.
Just defining what a foot is in this modular robot design would be a really complicated topic.
Even if that was resolved, to keep the reality gap as small as possible, the touch sensor solution should be practically feasible and implementable.
Vibration sensors for every block and bumpers were explored as possible solutions, but none of them could be implemented quickly and easily.
...
...
@@ -70,7 +70,7 @@ The combined input is not strictly necessary as is something the network could l
\subsubsection{Kickstarting the learning}
\label{ch:Method:SUPG:Kickstart}
After looking at some preliminary experiments it became clear that there was an issue in the learning process: the network was starting from scratch, with a lot of inputs to work with.
It was testing a lot of controllers that were sending a static position instead of a varying signal, meaning the robot wouldn't move.
It was testing a lot of controllers that were sending a static position instead of a varying signal, meaning the robot would not move.
The reason is that the network did not make use of the timer input, and therefore had only static inputs and no time-varying one.
A solution was proposed to start the first networks with a hard-coded connection between the timer and the joint angle output.
As seen in the results in \ref{ch:Results}, this improved the learning speed for the robots.
...
...
@@ -119,7 +119,7 @@ This method has been adopted in favour of a roulette system because the species
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsubsection{Mutation operator}
If a mutation occurs on an organism, it can be of three different types.
The first one is just a weight mutation, it doesn't modify the network complexity but only searches for an optimization over the current organism configuration.
The first one is just a weight mutation, which does not modify the network complexity but only searches for an optimization over the current organism configuration.
Weight mutation is used far more often than the other types of mutation because it needs to occur more frequently.
Every newly modified network needs to be evaluated with different weights before we can decide the usefulness of a new node or connection.
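The interplay between frequent weight mutations and rare structural ones can be illustrated with a short Python sketch. All names and numbers below are assumptions made for the example: the probabilities in \texttt{MUTATION\_PROBS}, the labels of the two structural mutations and the Gaussian perturbation are hypothetical and are not the settings used in this work.

```python
import random

# Hypothetical probabilities; weight mutation is by far the most likely.
MUTATION_PROBS = (("weight", 0.8), ("add_connection", 0.15), ("add_node", 0.05))


def choose_mutation(rng):
    """Pick one of the three mutation types according to MUTATION_PROBS."""
    r = rng.random()
    cumulative = 0.0
    for name, p in MUTATION_PROBS:
        cumulative += p
        if r < cumulative:
            return name
    return MUTATION_PROBS[-1][0]


def mutate_weights(weights, rng, sigma=0.1, rate=0.8):
    """Perturb connection weights; the network topology is left unchanged."""
    return [
        w + rng.gauss(0.0, sigma) if rng.random() < rate else w
        for w in weights
    ]
```

With such a distribution, a network that has just gained a node or a connection is typically evaluated under several different weight settings before the next structural change occurs.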
@@ -280,10 +280,10 @@ Increasing the number of evaluations would not fit into the creation of the \tol
Instead, we looked into improving the algorithm and investigated what slowed down the learning so much.
% Looking at the runs we noticed that several individuals didn't move at all during all their evaluation time. Racing was a good solution to save simulation time and recognize these individuals soon, but more could be done (see second experiment).
% Looking at the runs we noticed that several individuals did not move at all during all their evaluation time. Racing was a good solution to save simulation time and recognize these individuals soon, but more could be done (see second experiment).
% Compared to \rlpower, the final results weren't so great.
% Compared to \rlpower, the final results were not so great.
% \todo[inline]{add comparison chart: \rlpower vs this thesis result}