% Commit ab08d0fa authored by Matteo De Carlo: ``Restyle the thesis'' (parent 1e8413fc); pipeline #128 passed in 1 minute and 25 seconds.
% \begin{itemize}
% \item SUPG is working but not as fast as expected. Touch sensors may be needed
% to achieve good performance. Also, learning is slower than RLPower (open vs.\ closed
% loop).
% \todo[inline]{\textit{From the introduction}\newline
% Results show that the proposed controller architecture achieves a good gait, but the results are not as good as compared to other open-loop architectures.
% Results also show that the architecture is sometimes capable of reaching some kind of directed locomotion, but not targeted locomotion.\newline
% More fine tuning of the experiment and more training time would be needed to improve results.
% But we are generally not impressed by the results and we do not think this solution should be investigated any further.
% }
% \end{itemize}
Our research goals were to:
\begin{itemize}
\item \textbf{\textit{Train the proposed controller to achieve reasonable gaits}}, i.e.\ make the robot move.
\item \textbf{\textit{Train the proposed controller to achieve targeted locomotion, in the form of phototaxis}}, i.e.\ make the robot move towards the light.
\end{itemize}
To determine whether we achieved these goals, we ask ourselves three questions:
\begin{itemize}
\item[$\diamond$] Is the robot capable of moving?
\item[$\diamond$] Is the robot capable of learning to move in at least one direction?
\item[$\diamond$] Do multiple simultaneous targets interfere with the learning?
\end{itemize}
\section{Is the robot capable of moving?}
Our first goal was to make the robot develop some \emph{movement} in any direction.
In Chapter~\ref{ch:Results:gait}, we determined that the robot is capable of moving and generating some gaits; therefore, we consider our first goal fully achieved.
\section{Is the robot capable of learning to move at least in one direction?}
Our second goal was to achieve \emph{targeted locomotion}.
We found that the robot is capable of learning to move in a single direction, a behaviour known as \textbf{directed locomotion}.
All robots, when asked to learn a single direction, achieved some result, although some directions are more difficult to learn than others, depending on the shape of the robot.
However, alternative open-loop controllers accomplish the same task more efficiently, which is not what we originally intended.
The robot is not capable of learning to move towards a target, that is, of \textbf{targeted locomotion}.
We can demonstrate this by watching the replays: in the long term, the robots move in the opposite direction for some targets.
The task is too complicated for our controller to learn all at once.
% A mesh of multiple controllers that are capable of simpler tasks is the step we would take next.
\section{Considerations about the approach}
We did not achieve our research goal, and we have reflected on what went wrong and what could have been done to improve the results.
We conclude that a black-box approach for such a complex task was a leap too big for our algorithm to make.
Evidence that the complexity of the problem was our main obstacle can be found in the results: every time we decreased the complexity, there was a corresponding boost in performance.
For example, when we introduced the kickstarting hack, only the simple shapes (those with fewer controlled joints) benefited;
when we separated the light targets into different tasks, we found a decisive increase in all the results, with some more than doubling.
The proposed \supg algorithm \cite{morse2013single} did not fulfil its promise of being a fast and stable algorithm.
A possible cause is that \supg excelled in a very specific configuration, where the environment was simulated and the robot was a quadruped animat with perfect touch sensors hard-wired into the algorithm.
The original paper presents a much simpler problem with a smaller configuration space: the objective of the original \supg algorithm was to develop a gait for a more symmetrical robot that could go in any direction, with no sense of the outside world apart from the touch sensors.
In our case, we needed much more flexibility and complexity, as we needed to control different, potentially asymmetrical robots.
The objective to learn is also more difficult, as the robot has to have a deeper understanding of the objective and adequately alternate rotation and directional movement.
The \supg algorithm simply showed its limits.
We think that, with a 50- or 100-fold increase in the number of evaluations, some real results could be achieved.
We did not explore this, as it falls outside the scope of the thesis, where the proposed environment is about learning in the real world using real hardware.
% Instead we suggest looking at other methodologies that can create results in reasonable times.
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% \subsection{Are the multiple targets at the same time interfering with the learning?}
% Clearly for some shapes the separation of targets meant a great improvement over the single results.
% This is probably due to some shapes needing completely different gaits for locomotion in different directions and the inability of the controller to learn to separate actions like rotate first then move forward.
% We tested that all directions were learned better when learned separately.
% This shows us that the simplification of the problem brought us to a better result.
% We conclude that the problem is too complex to be learned all at once by our proposed solution.
\section{Future works}
This thesis demonstrates that a naive approach is not enough; therefore, we discourage further work on an all-in-one, black-box controller solution.
With these results in mind, our suggestion is to take a more modular approach: simpler objectives to learn for simple controller modules that can be later combined into a more complex controller.
Some inspiration could be drawn from \cite{benbrahim1997biped} in which Benbrahim and Franklin design a modular controller (each module defines a behaviour) with a central system that continuously switches between behaviours depending on the need.
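As a purely illustrative sketch of this idea (the \texttt{Behaviour} interface, the module names, and the applicability-based selection rule are our own assumptions, not taken from \cite{benbrahim1997biped}), a central system switching between simple behaviour modules could look like:

```python
# Hypothetical sketch of a modular controller: each module implements one
# simple behaviour, and a central switcher picks one per control step.
# All names and the selection rule are illustrative assumptions.
from abc import ABC, abstractmethod


class Behaviour(ABC):
    """One independently trained behaviour module (e.g. 'rotate', 'forward')."""

    @abstractmethod
    def applicability(self, sensors: dict) -> float:
        """How well this behaviour fits the current sensor state, in [0, 1]."""

    @abstractmethod
    def actuate(self, sensors: dict) -> list:
        """Joint commands for one control step."""


class RotateTowardsLight(Behaviour):
    def applicability(self, sensors):
        # Prefer rotating while the light is far from the robot's heading
        # (light_angle is assumed normalised to [-1, 1]).
        return abs(sensors["light_angle"])

    def actuate(self, sensors):
        direction = 1.0 if sensors["light_angle"] > 0 else -1.0
        # Opposite commands to the two sides of the robot produce a turn.
        return [direction * 0.5, -direction * 0.5]


class MoveForward(Behaviour):
    def applicability(self, sensors):
        # Prefer moving forward once the robot roughly faces the light.
        return 1.0 - abs(sensors["light_angle"])

    def actuate(self, sensors):
        return [1.0, 1.0]


class SwitchingController:
    """Central system that continuously switches between behaviours."""

    def __init__(self, behaviours):
        self.behaviours = behaviours

    def step(self, sensors):
        # Activate the behaviour that best matches the current situation.
        active = max(self.behaviours, key=lambda b: b.applicability(sensors))
        return active.actuate(sensors)


controller = SwitchingController([RotateTowardsLight(), MoveForward()])
print(controller.step({"light_angle": 0.9}))   # rotation module dominates
print(controller.step({"light_angle": 0.05}))  # forward module dominates
```

Each module could then be trained in isolation on its own simple objective, matching the decomposition into simpler learning tasks suggested above.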
Another (non-exclusive) approach that could be taken is to offload the learning into a simulated world.
Through the mixed use of an arena in the real world, accelerated simulations, and GPU computing, we think it is possible to achieve interesting results.
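As a minimal sketch of such an offloaded setup (the toy \texttt{simulate} fitness function and the worker-per-simulator scheme are placeholder assumptions; a real implementation would call out to an accelerated physics simulator for each evaluation), candidate controllers could be evaluated concurrently:

```python
# Minimal sketch: offload controller evaluations to parallel workers,
# each of which would drive one accelerated simulator instance.
# The dummy `simulate` fitness is a placeholder assumption.
from concurrent.futures import ThreadPoolExecutor


def simulate(candidate):
    """Placeholder for one accelerated simulation run.

    Returns a toy fitness: distance covered minus an energy penalty.
    A real implementation would run the robot in a physics engine.
    """
    distance = sum(candidate)
    energy = sum(c * c for c in candidate)
    return distance - 0.1 * energy


def evaluate_population(population, workers=4):
    """Evaluate all candidates concurrently, one simulation per worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, population))


if __name__ == "__main__":
    population = [[0.1 * i, 0.2 * i] for i in range(8)]
    scores = evaluate_population(population)
    best = population[scores.index(max(scores))]
    print(best)
```

In this scheme, the learning loop stays the same; only the expensive fitness evaluations are farmed out, which is where the accelerated simulations and GPU computing would pay off.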
% \chapter{Future improvements}
% \begin{itemize}
% \item establish HyperNEAT coordinates programmatically from the body
% \item being able to load a brain and visualise the tests
% \item make the brain transferable to real hardware
% (make light sensors realistic in simulation)
% \item adjust evolution parameters to improve over speed and quality of the evolution
% \item evolve the position of the sensors in the body (a bad position could lead to a bad evaluation)
% \item use bits from TD-learning or Q-learning to improve learning
% \end{itemize}