This post describes the planned setup for distributed computing on the Raspberry Pi Cluster.
A basic distributed system
A very basic distributed system will have one Raspberry Pi acting as a master machine. This will coordinate work among all of the slaves. Eventually, we will move away from having a single point of failure to having multiple masters. This will mean that if we lose the master Raspberry Pi, the cluster will continue to run.
The advantage of this setup is that it will be easy to work with. With only one machine controlling what the Raspberry Pis are doing, management will be relatively easy. Tasks can be submitted to, and distributed from, the single node.
To submit jobs to the cluster, you would send the job to the master node, which would then schedule it to run at a later time.
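As a rough illustration of this flow, here is a minimal in-memory sketch in Python. Nothing in it talks over a network: the `Master` class, the `submit` and `dispatch` methods, the node names and the round-robin policy are all assumptions made for the example, not the cluster's actual design.

```python
from collections import deque
from itertools import cycle


class Master:
    """Hypothetical in-memory stand-in for the master node."""

    def __init__(self, slave_names):
        self.queue = deque()              # jobs waiting to be scheduled
        self.slaves = cycle(slave_names)  # assumed policy: round-robin

    def submit(self, job):
        """Accept a job now; it only runs later, when dispatch() is called."""
        self.queue.append(job)

    def dispatch(self):
        """Hand every queued job to the next slave in turn."""
        assignments = []
        while self.queue:
            job = self.queue.popleft()
            assignments.append((next(self.slaves), job))
        return assignments


master = Master(["pi-1", "pi-2", "pi-3"])
for job in ["job-a", "job-b", "job-c", "job-d"]:
    master.submit(job)

assignments = master.dispatch()
for slave, job in assignments:
    print(f"{job} -> {slave}")
```

Because every job passes through the one `Master` object, it is easy to see what the cluster is doing, but it is also clear that nothing gets scheduled if that object goes away.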
As noted above, however, the master is a single point of failure, and a truly distributed system wouldn’t use this setup.
Still, it is a good basic setup for trying out many of the concepts in distributed computing.
A more advanced setup, a dynamic master
An improvement to the initial setup is that, instead of always having the same master, you dynamically choose the master when the nodes start up. There will need to be some mediation where the nodes talk to each other and decide on a master. It is very important that all nodes decide on the same master, and we will look at this problem in the future.
An added benefit of this system is how it behaves when the master fails. The nodes can detect the failure and decide on a new master. This would work in exactly the same way as the initial startup, when there is no master yet.
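To make the idea concrete, here is a small Python sketch of one possible election rule: every node picks the lowest node name among the nodes it believes are alive. This rule is an assumption chosen for simplicity, not the method the cluster will necessarily use, and `elect_master` and the node names are hypothetical. It only gives the same answer on every node if they all agree on which nodes are alive, which is exactly the hard part left for a future post.

```python
def elect_master(alive_nodes):
    """Pick the master as the smallest name among nodes believed alive."""
    if not alive_nodes:
        raise ValueError("no nodes available to elect a master")
    return min(alive_nodes)


# Initial startup: there is no master yet, so the nodes elect one.
alive = {"pi-1", "pi-2", "pi-3"}
first_master = elect_master(alive)

# The master fails; the remaining nodes run the same rule again.
alive.discard(first_master)
second_master = elect_master(alive)

print(first_master, second_master)
```

The same function handles both startup and failover, which mirrors the point above that recovering from a failed master looks just like starting with no master at all.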
Once the old master comes back online, it can ask the nodes who the current master is and join as a slave.
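A minimal sketch of that rejoin step, again in plain Python with no real networking. The `ClusterNode` class, its `current_master` and `rejoin` methods, and the node names are all made up for illustration; a real node would ask its peers over the network and would have to cope with peers that disagree or do not answer.

```python
class ClusterNode:
    """Hypothetical node that remembers its role and who it thinks is master."""

    def __init__(self, name, role="unknown", master=None):
        self.name = name
        self.role = role
        self.master = master

    def current_master(self):
        """Answer 'who is the master?' from this node's point of view."""
        return self.master

    def rejoin(self, peers):
        """Ask peers who the master is and join as a slave under it."""
        for peer in peers:
            master = peer.current_master()
            if master is not None:
                self.master = master
                self.role = "slave"
                return master
        return None  # nobody knows of a master; an election would be needed


# pi-2 took over as master while pi-1 was offline.
pi2 = ClusterNode("pi-2", role="master", master="pi-2")
pi3 = ClusterNode("pi-3", role="slave", master="pi-2")

# pi-1 comes back and does not assume it is still the master.
pi1 = ClusterNode("pi-1")
found = pi1.rejoin([pi3, pi2])
print(pi1.name, "joined as", pi1.role, "under", found)
```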
This system would be more akin to a standard distributed system, as it avoids the single point of failure that comes with a single fixed master. While a fixed master is fine for a small cluster, when you are running many machines the chance of at least one of them failing becomes high. Since you cannot always predict when a machine will fail, you won’t know in advance when you will need to change the master.
To submit jobs to this cluster, you could send them to any node in the network, which would then forward them on to the master. This means that you don’t need to know who the master is; you can pick any machine that is online.
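The forwarding idea can be sketched in a few lines of Python. This is only an in-memory model under stated assumptions: `ForwardingNode`, its `submit` method and the `network` dictionary are hypothetical names, every node is assumed to already know who the master is, and the forwarded message is assumed never to get lost.

```python
class ForwardingNode:
    """Hypothetical node that accepts jobs if it is master, else forwards them."""

    def __init__(self, name, master_name, network):
        self.name = name
        self.master_name = master_name  # assumed to be known and correct
        self.network = network          # name -> node, stands in for the LAN
        self.accepted = []              # jobs this node accepted as master
        network[name] = self

    def submit(self, job):
        """Return the name of the node that finally accepted the job."""
        if self.name == self.master_name:
            self.accepted.append(job)
            return self.name
        # Not the master: pass the job on to whoever is.
        return self.network[self.master_name].submit(job)


network = {}
for name in ["pi-1", "pi-2", "pi-3"]:
    ForwardingNode(name, master_name="pi-2", network=network)

# The client picks any online node without knowing who the master is.
accepted_by = network["pi-3"].submit("job-a")
print("job-a accepted by", accepted_by)
```

In this model `submit` always returns, so the client always learns that its job was accepted; on a real network that reply is not guaranteed.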
There are some more complicated problems here around knowing whether your submission was accepted, especially when you don’t know if the master will ever receive your message. I shall write more about this in a future blog post.