Halite is over. It was only meant to cost me a few days, but proved way too much fun to be a one-weekend stand. Hundreds of hours have been trucked into this project and I have no regrets.

My bot seems poised to grab something around the #8 spot1. If it were an ATP-rated tennis player, that would make it Dominic Thiem. If it were a Times-rated university, it would be Imperial College London. If it were an IMDb-rated film, it would be Lord of the Rings: The Return of the King.

Close enough.

Seems legit.

Halite does not seem to play close at the top, regardless how great a film Return of the King is. At time of writing, the gap in trueskill mu between 1st and 10th is the same as the gap between 10th and 116th. The god bots have a significant edge, and their write-ups are worth reading.

The relationship between rank and trueskill mu is telling; rather than flattening out as you approach the nosebleed ranks, the gap in skill only widens.

This could just as easily be an artefact of the trueskill algorithm, but if you take it at face-value then the conclusion is valid. The plateaus are caused by rating cutoffs in the finals, where bots below a certain rank where frozen from rating (thanks @janzert).

This will be about a story of a mortal’s journey amongst monsters. I’ll summarise DexBot’s core loop, give a brief overview of the Halite metagame, go into more depth on DexBot’s expansion algorithm in particular, and finally close it out by rambling at will about some tricks I found useful.

Shoutouts

One of my critical sources of power was compute resources. My other critical source of power was colleague, comrade, consultant and confidante @Shummie. Without him kicking my ass in December I would have burned out and quit.

Core Algorithm

DexBot passes a set of planned moves to progressively more executive strategy functions that mutate the moves before they are pushed to the Halite game engine. Its power lies in its expansion. Most top bots get the better of DexBot in combat, but it can hang with the best at growth.

The bot maintains a GameMap class that holds expressions and abstractions about the state of the game and a Moveset object with an API for adding, mutating and iterating over moves. These are refreshed at the start of each turn and passed together to the following functions, in order.

  1. A Combat Handler that routes ‘wartime’ pieces that exist near an interface to battle.
  2. An Expansion Handler that routes ‘peacetime’ pieces to new lands.
  3. A Move Resolver that jointly handles pathing, avoiding the strength cap and maintaining a chessboard pattern.
  4. A Nonaggression Handler that has absolute veto power over any moves that would open up an interface with the enemy if such an aggression is judged undesirable.

DexBot is written in Python and uses numpy and scipy extensively. These tools are seriously fast and allow a clean expression of Halite state and computation in the form of matrices and filters2. They also make testing new ideas wonderfully easy. Need Dijkstra’s algorithm? There’s a highly optimised function for that. Want to try a different shortest path algorithm? Knock yourself out. Need to find the convex hull of your cells? Here’s something lying around that can do it in 20 microseconds. Oh, you ran into an assignment problem? Put the textbook down hotshot, scipy already implemented it.

I was green when I started, and these three months have been a hyperbolic time chamber for learning scipy and numpy.

Halite Metagame

To conceptualise why certain aspects of DexBot behave the way they do, it’s important to understand aspects of the Halite metagame. This is everything that I understand to be common strategy and understanding among invested players.

  • Expansion.
    • Maps are generally quite imbalanced, your starting zone is usually not as good to expand in as some other points on the map. Expanding strictly in a blob is suboptimal. It’s generally good to have some tunnelling behaviour to reach better spots. Too much tunnelling slows you down and makes your internal pathing inefficient.
    • Moving over high production areas is bad. Moving at all is bad. Movement is a necessary evil.

Effective tunnelling. Straight to the prize. From this finals game.

  • Combat.
    • Everything is about overkill. Avoid getting hit with splash, while hitting as much splash as possible3.
    • Throughput is important. Patterns with zero overkill exposure can’t deliver enough energy to big battles.
  • Chessboard Movement.
    • Moving in a chessboard pattern is an easy shortcut to having decent combat with a very low technical overhead4. It also helps control the amount of traffic since adjacent pieces will agglomerate naturally.
    • The very top bots are able to control traffic and combat without needing a chessboard. Just like supplemental oxygen on Everest, when you’re really good you don’t need it.

DexBot’s chessboard movement in combat against @nmalaguti and @acouette. From this finals game.

  • Non-Aggression Pact (NAP).
    • Fighting is bad, sort of. If two players out of three are able to coexist without fighting, they will almost certainly split first and second. Deliberately avoiding combat is huge in Diamond games. I think the emergent NAP meta is one of the most interesting parts of Halite, and I regret that it wasn’t given full opportunity to develop.

@timfoden and I dancing the dance of NAP. From this finals game.

Detail

Combat

DexBot’s combat is drab and uninteresting. It makes no predictions about enemy movements, and each piece is unaware of its friends and the group’s exposure to overkill. If I had more time this would be the primary target for improvement.

Imgur

Emblematic graphs for DexBot vs @erdman from this finals game. Despite being ahead in a lot of respects, overkill damage tells a singular story.

Expansion

DexBot’s expansion algorithm is the one thing that has kept it relevant.

A popular strategy for top bots is to ‘paint’ a heuristic over all owned cells and have cells take the best move of their immediate neighbours. This is a great way to handle pathing, and something I would probably steal in a full do-over. DexBot considers instead the problem of matching internal pieces to the optimal border cell, even if it is many turns away. This is recalculated and reissued each turn.

The question of how to move each owned cell in order to effect fastest expansion is a hard one, but it can be replaced with something much simpler: what is the value of capturing each border cell? We will attempt to quantify this.

This value for an unowned border cell \(i\), \(V_i\), can be broken down into two components: the immediate value that capturing it will convey on its own merit based on its production and the cost to capture it – call it \(G_i\) for Growth – and its value in the improvement it brings to our position by making new tiles accessible – call it \(T_i\) for Tunnel.

There are various ways to approach the immediate value problem; it’s a function of the cell in question’s production, its strength, how quickly a capturing force can be mobilised and how much production will be sacrificed in movement costs to get there. One reasonable approximation is to consider the cell’s production divided by its time to make a full return on investment. Once the costs to capture a border cell are returned, how much production capacity has been gained, and how much time was spent achieving it? This can be thought of as targeting the gradient of the production versus time graph: we want to reach a point where our strength has been recouped and our production has been inflated as quickly as possible, favouring high production tiles that can be repaid quickly.

If a border cell’s production is \(P_i\) and the strength is \(S_i\) then the time to make a full return is:

This ignores the time and cost to reach cell \(i\), but has the convenient property of being invariant on the arrangement of owned cells. This is passed through a low-sigma scipy Gaussian filter in the final bot to make the surface smoother and reduce movement jitter.

The second quantity concerning the value from accessibility is much harder to model. Any representation needs to consider the value of a large number of faraway cells weighted by how reachable they are. Further, it needs to direct the bot to dig efficient and fast tunnels to reach faraway areas.

DexBot solves this with a calculation of the shortest paths between all pairs of cells for a map. This is an expensive operation. Fortunately it only has to happen once, and scipy is more than happy to help.

The distances between all pairs of cells is the matrix \(D\), where the distance between cells \(i\) and \(j\) is \(D_{ij}\). This ‘distance’ is heuristic-driven, whereby lower-strength cells are easier to traverse. The set of all unowned cells for which \(i\) is the closest border cell according to \(D\) is \(C_i\). This idea is easier to understand visually.

For owned cells (white), and immediate potential captures (black diamonds), unowned cells are coloured according to the border cell from which they can be most easily reached. The rightmost border cell allows efficient access to close to a third of the map, whereas the bottommost border cell is the most efficient path to only a small number of cells. For each \(i\), each set \(C_i\) has a different colour in this map.

DexBot defines the accessibility value \(T_i\) as:

That is, each border cell receives a boost in perceived value corresponding to the value of every unowned cell to which it is the closest weighted by how reachable that unowned cell is.

The heuristic distance itself was formulated partly by intuition and partly by testing. It is defined as the number of turns it would cost to capture a destination cell, using the production of the origin cell, over neighbours only.

DexBot’s expansion heuristic while growing over a 50x50 map. Redder corresponds to a greater value.

Assigning owned cells to border cells in the main loop requires a handful more definitions and hyperparameters. Define the number of moves to reach cell \(j\) from cell \(i\) as \(\Delta_{ij}\), and the number of turns cell \(i\) would have to grow in order to have enough strength to capture border cell \(j\) as:

The expansion handler iterates over controlled, peacetime cells \(i\) and assigns them a target border cell, \(j\) according to:

Where \(k\) and \(b\) are tuned constants.

In English, the value of moving a particular owned cell to capture a particular border cell is given by the immediate value of the border cell divided by the number of turns to reach it and capture it, plus the global value of the cell divided by the distance plus a constant.

Higher values of \(k\) lead to a more optimistic, tunnel-happy bot while lower values of \(k\) cause greedier, less range-aware expansion. \(b\) controls the falloff of global value, wherein higher \(b\) values will encourage cells to move further to reach global value.

This gives a natural and reasonably intuitive trade-off between expanding locally and hunting for a global maximum, with a manageable number of hyperparameters. There is no need to switch between tunnel and expansion ‘modes’. Further, the only expensive part of the computation is to generate \(D\), which can be done with scipy.sparse.dijkstra in less than ten seconds up front within the Halite init provision and then held constant for the entire game.5

Patterns for success with numpy and scipy

As I said before, DexBot is built from numpy. Its main power is in its blazing fast array arithmetic, allowing algebraic operations to be performed at map-scale with C-like performance. It also provides much nicer containers for any-dimensional matrix information than anything in base Python. I cover most of this goodness in the numpy starter’s OverkillBot implementation.

Here is some useful Halite logic to add to your quiver:

unravel_index

Frequently a two-dimensional array of values appears, and its maximum is of grave importance. Finding the coordinates of the maximum point in an ndarray is weird-looking but fast:

np.unravel_index(my_array.argmax(), my_array.shape)

where

Iterating over coordinates in a matrix where a condition is met is also common. Grabbing this for an ndarray is, again, weird-looking but fast.

np.transpose(np.where(my_array == my_value))

Particularly handy shorthand for grabbing all locs where my_array is nonzero:

np.transpose(np.nonzero(my_array))

np.where can also be used to index ndarrays and manipulate them, for instance:

my_array[np.where(my_array == 0)] = -1

roll

This is a killer function for Halite with its global wraparound: np.roll allows an ndarray to be moved through an axis and wrapped around the other side.

my_array_offset = np.roll(my_array, x_amount, 0)

scipy filters

These are awesome, fast and highly customisable. They allow you to move a spotlight-style function over an ndarray for any size and shape.

As an example, say a Halite competitor wanted to calculate which cells are on her immediate border. That requires a double loop over coordinates! What a pain!

Not so.

def plus_filter(X, f):
    """Scans a +-shaped filter over the input matrix X, applies
    the reducer function f and returns a new matrix with the same
    dimensions of X containing the reduced values.
    """
    footprint = np.array([[0, 1, 0],
                          [1, 1, 1],
                          [0, 1, 0]])
    proc = generic_filter(X, f, footprint=footprint, mode='wrap')
    return proc

borders = plus_filter(owned, max) - owned

This makes calcuting all sorts of boundary and neighbour-aware logic with ndarrays a cinch.

Effective testing

I had access to three behemoth servers throughout the competition, each with four Xeon E7-8890 v4’s. That’s 96 physical CPUs per box. Testing changes is a dream when two thousand games can be computed and crunched in a flash. Throughout development I maintained the previous version of my bot as a reference and would play games against it in heads up and 4-way matches to benchmark changes.

I wrote an R script to spawn Halite sessions across multiple threads for fixed set of seeds, number of opponents and map dimension. Holding seed set constant allows marginal changes in power to be caught more easily without having to rely on the law of large numbers. Winning three more games out of a thousand than a previous iteration is only meaningful for constant seeds.

This can be pushed too far. In an analogue to machine learning, spurious tuning can make a bot overfit to a set of seeds. Fortunately training data for Halite is unlimited and burned seeds can be replaced any time.

Testing seems to increase in reliability with how genetically different the reference bot is. Testing against a close-to-clone results in warped, mirrored combat and the over-representation in effect of small changes. The best testing environment I had was always after a large rewrite, or when I had access to @BigBallerShotCaller’s bots for sparring. This makes for another machine learning analogue: similar learners don’t ensemble well.

Makefiles

Makefiles are amazing. I only know the basics, but the basics are enough to have a dynamite workflow. They can be used essentially as project-specific functions and aliases.

The fun parts of my repo’s Makefile are:

clean:
    rm *.hlt
    rm *.log

run:
    halite -d "30 31" "python3 ReferenceBot.py" "python3 MyBot.py"

archive:
    today=$$(date +%Y-%m-%d.%H:%M:%S); \
      zip archives/dexbot_1.2e_$$today.zip dexlib/*.py halitesrc/*.py MyBot.py

The plaque of .hlt and .log files that build up during development can get annoying, and make clean saves valuable keystrokes over running two separate rm’s.

Having to type out or reverse-i-search the command to run the Halite environment costs valuable microseconds, so make run triggers a 30x31 game between the current bot and the reference bot. Look forward to forgetting how to spell ‘halite’, because you don’t need that word in your command line vocabulary any more.

The last one is definitely the most useful. make archive zips up all the relevant files and folders for the bot to run into a timestamped file. What a lifesaver!

Parting thoughts

The Halite community has been tremendous fun to be a part of and I have learned a lot. @truell20 and @Sydriax deserve huge credit for putting this game together.

Competitions with leaderboards are dangerous and addictive. Don’t try this at home.

Until 2.0.

  1. I actually finished 7th thanks to a late little boost from TrueSkill RNG. 

  2. @erdman’s Python code doesn’t use numpy but is clean as a whistle. Massive props. 

  3. Top bots seem to just focus on ‘avoid getting hit’, since this is easier than anticipating moves. 

  4. A cool implementation of this is to only allow pieces that satisfy x + y + turn_number % 2 == 0 to move. 

  5. Holding \(D\) constant is only strictly valid for games without opponents, but it’s close enough. Enemy-controlled cells and cells with zero strength are zeroed out, but meddling enemies can distort the perception of closeness.