In this post I talk about the work I did to improve the Halite 3 Bot testing gym to speed up iterations and compare various versions of my bot.
Why I spent time Improving the Halite Gym
During the Halite 3 competition, one piece of the provided code was a simple gym system. This Python script allowed you to enter various bots and would pit them against each other.
It would then rank the bots using the same ranking system that the competition itself used.
This was designed so that you could test various versions of your bot against each other in an automated way. You could specify a number of matches to run and after each match the ranking would be recalculated.
As an automation engineer, this system piqued my interest, since it meant I could automatically compare my bots against multiple different versions.
After reviewing the system I had a few objectives that I wanted to complete.
Automate running matches continually
Allow matches to run over multiple computers on multiple operating systems
Improve the bot selection for matches
Store interesting replays
Below are the details of what I accomplished during the competition.
Converting the database to MySQL
Originally the gym used a simple SQLite database to hold the bot information. This works well for a small portable system, but I was planning to use it much more heavily.
The first change I made to the gym was to convert it over to use a MySQL database configured on my PC. This was relatively easy, as the existing SQL was largely independent of SQLite-specific features.
This made it easy to connect to the database from different computers and brought some speed improvements. It also laid the way for some larger changes described later.
Improving Bot selection
Originally the bot selection worked by picking two or four random bots to run.
For a standard tournament this works fine, but when I was iterating my bots I wanted the newer ones to run more often.
After trying various weighting systems, I changed the selection code so that it would always pick the bot that had fought in the fewest battles.
Once that bot had been selected, the other one or three bots would be picked randomly. This meant that over the lifetime of the gym the bot with the lowest matches would be run more times, forcing it to compete and be judged against the others.
The random selection of the other bots meant that it was not always playing against the newer bots.
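The selection scheme above can be sketched as follows. This is a minimal sketch only: the field names (name, games_played) are hypothetical, and the real gym would read these counts from the database.

```python
import random

def select_bots(bots, match_size=4):
    """Pick bots for a match: the least-played bot always participates.

    `bots` is a list of dicts with "name" and "games_played" keys
    (hypothetical field names -- the real gym schema may differ).
    """
    # Always include the bot with the fewest recorded battles, so a
    # newly added bot is evaluated as quickly as possible.
    least_played = min(bots, key=lambda b: b["games_played"])
    # Fill the remaining slots with a random sample of the other bots,
    # so the new bot is not always matched against the same opponents.
    others = [b for b in bots if b is not least_played]
    return [least_played] + random.sample(others, match_size - 1)
```

Over many matches this converges on roughly equal battle counts, because whichever bot falls behind is picked first in every subsequent match.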
Multi-threading the Battles
Running the battles is purely limited by compute power, and to me this is an easy problem to scale across computers; it fits into the category of “embarrassingly parallel” problems. However, there were some tweaks I needed to make first.
Stopping race conditions with ranking the bots
Looking over the gym code, the only thing stopping me from running more than one battle at a time was the ranking calculation. If each ranking update was not applied atomically, race conditions could cause the rankings to become incorrect.
To do this I decided on using a MySQL feature called “Named locks”. This allows you to obtain a lock with a specific name from the MySQL server. Until you release the lock or your connection is closed the lock will be retained.
Any other thread can check whether the lock is held and wait until it is released. Before any rank was calculated, the thread would acquire the rank calculation lock.
Once obtained it would perform the rank calculation maths and update the database. Then once it had been completed it would release the lock and allow another thread to update the database.
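A rough sketch of how a MySQL named lock can wrap the ranking update from Python is shown below. The lock name and the ranking-update step are hypothetical, and the cursor is assumed to come from a MySQL driver such as mysql-connector or PyMySQL; the GET_LOCK/RELEASE_LOCK calls themselves are standard MySQL functions.

```python
from contextlib import contextmanager

RANK_LOCK = "gym_rank_calculation"  # hypothetical lock name

@contextmanager
def named_lock(cursor, name, timeout=30):
    """Hold a MySQL named lock for the duration of the block.

    GET_LOCK blocks until the lock is free (or the timeout expires) and
    returns 1 on success; RELEASE_LOCK frees it. MySQL also releases the
    lock automatically if the connection is closed.
    """
    cursor.execute("SELECT GET_LOCK(%s, %s)", (name, timeout))
    (acquired,) = cursor.fetchone()
    if acquired != 1:
        raise TimeoutError(f"could not acquire lock {name!r}")
    try:
        yield
    finally:
        cursor.execute("SELECT RELEASE_LOCK(%s)", (name,))
        cursor.fetchone()

# Usage with a real MySQL cursor (recalculate_rankings is hypothetical):
# with named_lock(cur, RANK_LOCK):
#     recalculate_rankings(cur)
```

Because the lock lives on the MySQL server rather than in the gym process, it serialises the ranking maths across every worker, whether they run on the same PC or on different machines.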
This allowed me to run multiple copies of the gym on the same PC. This sped up checking the status of a new bot because I was able to more quickly fight it against previous bots.
Running it on multiple computers (and operating systems)
Since the battles are primarily compute bound, I eventually hit a limit on how many threads one PC could successfully run. Adding more slowed the matches down overall and caused bots to start timing out.
Since the race condition around updating the rankings had already been dealt with, the only remaining problem was how the original gym stored bot details.
Originally, when adding a bot you provided a command line to run it. For Python bots this is something along the lines of python MyBot.py.
However, now that these commands were run on multiple systems, this began to cause issues. In my case, while python resolved to Python 3.5 on my Windows system, under Ubuntu it resolved to Python 2.7.
Since I was using Python 3-specific syntax, the bots then failed to run. To resolve this I extended the bot tables to store a path to the script and a runtime.
The new runtime table held a list of named runtimes, such as Python 2 and Python 3.
In the database each bot would specify the path to the script and the runtime needed.
Each host would then have a configuration file which mapped the runtimes to specific paths. When the bot details were passed to a machine to run, the runtime name would be resolved to that machine's specific binary path before being prepended to the command.
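The host-side resolution can be sketched like this. The host names, runtime names, and binary paths are all hypothetical; in practice the mapping would be loaded from each worker's configuration file rather than hard-coded.

```python
# Hypothetical per-host runtime configuration -- in practice each worker
# machine would load its own mapping from a local config file.
RUNTIMES = {
    "windows-desktop": {"python3": r"C:\Python35\python.exe"},
    "ubuntu-worker":   {"python3": "/usr/bin/python3"},
}

def build_command(host, runtime, script_path):
    """Resolve a bot's (runtime, script) pair into a host-specific command."""
    binary = RUNTIMES[host][runtime]
    return f"{binary} {script_path}"
```

Storing only the runtime name and script path in the database keeps the bot records portable: the same bot row produces a Windows command on one machine and a Linux command on another.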
With these changes I was able to run the bot ranking code across multiple computers to very rapidly iterate bots.
Storing Interesting replays
I never managed to implement storing interesting replays; however, the plan was to have a metric to decide what counts as interesting.
This was going to be based on whether one of the following conditions occurred:
If any bot crashed – This suggests a logic error
If any bot timed out – This suggests the bot needs better time management (bots are allowed only limited time to run)
If any bot lost to a much weaker bot or won against a much stronger one (according to the ranking) – As this may indicate a weakness to a specific strategy
If any bot collected a large amount, or very small amount of halite – As this may indicate a weakness or strength with the bot and the given map
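The planned filter could have looked something like the sketch below. The fields on the result dict and the threshold values are all hypothetical; the real numbers would have needed tuning against actual match data.

```python
def is_interesting(result, rating_gap=10.0, halite_low=0.2, halite_high=0.9):
    """Decide whether a finished match's replay is worth keeping.

    `result` is a hypothetical per-match summary; the thresholds are
    illustrative defaults, not tuned values.
    """
    # A crash suggests a logic error worth inspecting.
    if result["crashed"]:
        return True
    # A timeout suggests the bot needs better time management.
    if result["timed_out"]:
        return True
    # An upset (winner ranked far below the loser) may reveal a
    # weakness to a specific strategy.
    if result["winner_rating"] + rating_gap < result["loser_rating"]:
        return True
    # Unusually high or low halite collection may indicate a strength
    # or weakness of the bot on the given map.
    fraction = result["halite_collected"] / result["halite_available"]
    if fraction < halite_low or fraction > halite_high:
        return True
    return False
```

A filter like this would let the gym discard the bulk of routine replays while keeping the handful worth reviewing by hand.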
During the competition I manually reviewed anything that looked odd to check the state of the automated matches.
Using the Automated Halite Tester during the Competition
During the competition I quickly found that I could run many thousands of matches between the development of each new bot. This meant I was able to quickly compare the different versions of my bot.
Overnight, since I was able to dedicate multiple machines to the tester, I could run millions of matches. To utilise this power I created bots which could be configured with various parameters.
Each night I would submit multiple versions of each bot with the parameters tuned differently. By running many matches across these bots I was able to determine optimal parameters. As an example, one area that was tuned was the amount of halite that would be collected on a cell before moving on.
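Generating the nightly sweep of parameterised bot versions can be sketched with a simple grid expansion. The parameter names and values here are hypothetical: only the cell-halite threshold is mentioned above, and the second parameter is invented for illustration.

```python
import itertools

# Hypothetical tunable parameters. "max_cell_halite" stands for the
# amount of halite collected on a cell before moving on, as described
# above; "return_threshold" is an invented second parameter.
PARAMETER_GRID = {
    "max_cell_halite": [25, 50, 75],
    "return_threshold": [600, 800],
}

def bot_variants(grid):
    """Yield one parameter dict per combination, for a nightly sweep."""
    names = list(grid)
    for values in itertools.product(*(grid[n] for n in names)):
        yield dict(zip(names, values))
```

Each generated dict becomes one bot entry in the gym, and the ranking after a night of matches points at the best-performing combination.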
One concern with this was finding local maxima – or ending up with a bot that was only good at beating my own bots. To reduce this risk, I wrote a number of bots in different languages and with different approaches.
Overall I enjoyed the competition and the challenges of automating it. By utilising my automation expertise I was able to create a system which could evaluate my bots, with the advantage that I didn't need to submit them to the competition in order to run many matches.