October 24, 2020

Raspberry Pi Cluster Node – 16 Python 3 Codebase Refactor

By Chewett

This post builds on my previous posts in the Raspberry Pi Cluster series by improving the codebase for Python 3.

Moving to Python 3

Python 2 reached end of life on January 1st, 2020, so applications should ideally no longer be using it. Plenty of applications still do, but they really should have been migrated by now.

In this case, I am using this tutorial to migrate the code from Python 2 and perform some refactoring.

Changing Hashbangs

The first change is to replace every hashbang with a Python 3 compatible one. This means all the hashbangs that currently read:

#!/usr/bin/env python2.7

will be converted to:

#!/usr/bin/env python3

For now I am just marking them with python3 rather than a specific minor version. While I would like to use the most recent Python version, I want the code to run on a variety of systems, so I will keep it generic to Python 3.
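As a rough sketch of how a script could protect itself when it is started with the wrong interpreter anyway, a version check can sit directly under the hashbang. The guard and its message are my own illustration, not something taken from the cluster code.

```python
#!/usr/bin/env python3
import sys

# Hypothetical guard: stop early with a readable message if the script is
# started by a Python 2 interpreter (for example via "python script.py").
# The syntax here is kept valid for Python 2 so the check itself can run.
if sys.version_info[0] < 3:
    sys.exit("This script requires Python 3")

running_version = "%d.%d" % (sys.version_info[0], sys.version_info[1])
print("Running under Python " + running_version)
```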

Changing relative imports

Previously I used implicit relative imports, which meant that Python also looked in the current package for any module being imported. If ModuleA imports ModuleB, Python 2 will find it when both are in the same package directory.

Python 3 no longer performs implicit relative imports, so inside a package a plain import of ModuleB will no longer find it even if it is in the same directory as ModuleA.

You are able to use a relative import by prefixing the module location with a dot. Note that the dot is only accepted in the "from" form of the import statement. For example:

Python 2 – import ModuleB
Python 3 – from . import ModuleB

Both work the same way, but in Python 3 we now have to be explicit about the relative import. I am going to go through the code and be explicit about the location of each module.
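The following is a minimal, self-contained sketch of an explicit relative import. It builds a throwaway package in a temporary directory so it can be run anywhere; the names cluster_pkg, module_a and module_b are hypothetical and do not come from the cluster repository.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk so the example is self-contained.
# cluster_pkg, module_a and module_b are hypothetical names.
tmp_dir = tempfile.mkdtemp()
pkg_dir = Path(tmp_dir) / "cluster_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "module_b.py").write_text(
    "def greet():\n"
    "    return 'hello from module_b'\n"
)
# module_a reaches its sibling module_b with an explicit relative import.
(pkg_dir / "module_a.py").write_text(
    "from . import module_b\n"
    "\n"
    "def run():\n"
    "    return module_b.greet()\n"
)

# Make the temporary directory importable, then load module_a via its package.
sys.path.insert(0, tmp_dir)
module_a = importlib.import_module("cluster_pkg.module_a")
result = module_a.run()
print(result)
```

Relative imports only resolve when the module is loaded as part of a package, which is why the sketch imports cluster_pkg.module_a rather than running module_a.py directly as a script.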

Unicode Handling

Python 3 handles Unicode in a much better way, but it means a number of functions now expect binary data (bytes) instead of strings. For example, in the socket handling code, data must now be encoded to bytes before it is sent, and received data arrives as bytes.

This can be done using the encode function available on strings and the decode function available on binary data. Both accept the character set to encode or decode with, and I pass it explicitly.

I am going to be using utf-8 to transfer data back and forth between the nodes, so I will encode and decode with utf-8. This means my socket handling code requires a small tweak as below:

Sending data and encoding as utf-8:
self.sock.send(payload.encode("utf-8"))

Receiving data and decoding as utf-8:
self._buffered_string += data.decode("utf-8")
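To see the round trip in isolation, here is a small sketch that uses socket.socketpair() to stand in for two connected nodes within one process. The payload text and variable names are made up for illustration. Unlike the one-line snippets above, it collects the raw bytes and decodes once at the end, which avoids a decode error if a multi-byte character happens to be split across two recv() calls. It also uses sendall(), since send() may transmit only part of the data.

```python
import socket

# socketpair() returns two already-connected sockets, standing in for the
# two ends of a node connection purely for illustration.
sender, receiver = socket.socketpair()

payload = "temperature=21.5°C"  # a str containing a non-ASCII character
sender.sendall(payload.encode("utf-8"))  # str -> bytes before sending
sender.close()

received = bytearray()
while True:
    data = receiver.recv(1024)  # recv() returns bytes in Python 3
    if not data:  # empty bytes means the other end closed the connection
        break
    received.extend(data)
receiver.close()

buffered_string = received.decode("utf-8")  # bytes -> str after receiving
print(buffered_string)
```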

Removing References to Master / Slave

Since I last worked on the cluster the world has moved on, and there has been a lot of reflection on the language we use. The terms master/slave are steeped in racial undertones and therefore I am removing them.

If renaming the scripts makes one person feel more included, it is for the best.

The three main scripts have been renamed to the following.

basic_primary.py
basic_secondary.py
basic_webserver.py

All references in the code to master/slave have been replaced with primary/secondary.

Python Doc Comments

I previously, and wrongly, neglected Python docblocks, opting instead for detailed descriptions in the tutorials. While documentation in the tutorials is helpful, adding docblocks to the code will also be useful.

I am going through all the code adding docblocks. This shouldn’t be a massive undertaking as the code was already commented; it was only the functions and classes that lacked a description.
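As an illustration of what a docblock adds, here is a small hypothetical function with a docstring. The function, its name and the ":param:" layout are my own assumptions for the example and are not taken from the cluster code, which may use a different docblock style.

```python
def split_address(address):
    """Split a "host:port" string into its host and port.

    :param address: Address string in the form "host:port".
    :return: Tuple of (host, port) where port is an int.
    """
    host, port = address.rsplit(":", 1)
    return host, int(port)


# The docstring is available at runtime, for example through help().
print(split_address.__doc__)
```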

Misc Code changes

A few changes were made to bring the code in line with PEP 8:

PEP 8 – two spaces before an inline comment
PEP 8 – an inline comment must have a space between # and the comment text
Variables should be snake_case, not camelCase
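A before and after of these three rules might look like the following. The variable is hypothetical, and the "before" line is kept as a comment so the block still runs.

```python
# Before (camelCase name, one space before the comment, no space after '#'):
#   nodeCount = 4 #number of nodes
#
# After: snake_case name, two spaces before the inline comment,
# and a space between '#' and the comment text.
node_count = 4  # number of nodes

print(node_count)
```

The two comment rules correspond to the E261 and E262 checks in the pycodestyle tool, should you want to find them automatically.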

Refactoring the DataPackager

The DataPackager is a small set of functions used to create the payload to send over the socket. In addition it handles receiving data from the socket and piecing the payload back together.

After reviewing the code I found that partial messages were stored on the class rather than on each object, so using multiple DataPackagers on multiple sockets could cause message corruption.

In addition to this, sending messages with the DataPackager previously required you to format the messages first. This is now handled directly by the DataPackager, which means you just pass in the payload to send.

These changes, including moving it to its own object, should make it easier to use and make it possible to use several in the same script.
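The idea can be sketched as below. This is not the actual DataPackager from the repository: the class name, the method names, the JSON encoding and the "\r\n" delimiter are all assumptions made for the example. What it demonstrates is the design point, namely that each object owns its own buffer, so partial messages arriving on two sockets cannot be mixed together.

```python
import json


class PackagerSketch:
    """Hypothetical stand-in for the DataPackager, one instance per socket."""

    DELIMITER = "\r\n"  # assumed message terminator, not taken from the repo

    def __init__(self):
        # Stored on the instance, so every packager has its own buffer.
        self._buffered_string = ""

    def pack(self, payload):
        """Format a payload for sending; the caller only passes the data."""
        return json.dumps(payload) + self.DELIMITER

    def feed(self, text):
        """Add received text and return a list of any completed payloads."""
        self._buffered_string += text
        # Everything before the last delimiter is complete; the remainder
        # (possibly empty) stays buffered until more text arrives.
        *complete, self._buffered_string = self._buffered_string.split(
            self.DELIMITER
        )
        return [json.loads(message) for message in complete]


sender = PackagerSketch()
wire_a = sender.pack({"type": "hello", "node": "a"})
wire_b = sender.pack({"type": "hello", "node": "b"})

# Two receivers, fed interleaved partial messages as two sockets might be.
from_a = PackagerSketch()
from_b = PackagerSketch()
first = from_a.feed(wire_a[:10])
second = from_b.feed(wire_b[:10])
done_a = from_a.feed(wire_a[10:])
done_b = from_b.feed(wire_b[10:])
print(done_a, done_b)
```

Had the buffer been a single class-level value shared by both receivers, the two half-messages would have been joined into one string and neither would have parsed correctly.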

Summary of improving the codebase

In this tutorial the cluster codebase is moved to Python 3 and generally cleaned up and improved. This will allow the latest language features and tools to be used with the cluster and improve development.

In addition, the increased comments and improvements should make it easier to add further functionality to the cluster.

The full code is available on GitHub. Any comments or questions can be raised there as issues or posted below.

Tags: Distributed Computing, Python, Python 3, Raspberry Pi, Raspberry Pi cluster, Raspberry Pi OS
