An overview of our Hanabi rule evaluator

Following on from our existing work on Hanabi, Piers and I have been working on a web-based UI that lets people submit rule-based agents, have them evaluated, and get the results back. This will form part of our workshop at the IGGI symposium next week.

Requirements

No software to install

The first requirement meant that we were probably looking at something web-based. We have no idea what hardware the attendees will have, or what restrictions will be in place on it. Our framework is written in Java, so even if attendees had the JRE installed, we cannot assume they would have (or even be able to install) an IDE and compiler to develop the agents.

We therefore needed to make it web-based. This usually isn’t too much of a problem (my hatred of JavaScript aside); we’ve built interactive web pages before and even put together some WebSocket demos.

Needs to look good

Secondly, we wanted it to look good – the demo we were building this for would be presented to games industry professionals as well as academics. While we can expect academics to be quite forgiving of functional but aesthetically plain UIs, to appeal to the games industry professionals we wanted something visually appealing.

Fast to execute

Thirdly, our usual data collections can run for days on large server clusters, playing thousands of Hanabi games – the aggregation is done offline by manually transferring data from the cluster to our workstations and running pandas scripts. This wasn’t going to work for a live 2-hour workshop. The processing time is largely due to the sheer volume of data we collect and some of our more complex agents. The rule-based agents we were planning to have the workshop attendees build are extremely fast to execute (I can run 60,000 games using the rule-based agents on my office machine in about 5 minutes). To keep execution time under 5 seconds even under heavy load, we’re limiting each agent to 100 different deck orderings (each ordering playing the 2, 3, 4 and 5 player versions of the game, so 400 games per evaluation).
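To make that concrete, here’s a minimal sketch of the capped evaluation loop – the Agent and Game types are hypothetical stand-ins for our framework’s real classes, not its actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the framework's real types.
interface Agent { /* picks a move for the current game state */ }

class Game {
    Game(int deckSeed, int players, Agent agent) { /* would deal the seeded deck */ }
    int play() { return 0; } // stub: play the game out and return the final score
}

public class CappedEvaluation {
    static final int DECK_ORDERINGS = 100;
    static final int[] PLAYER_COUNTS = {2, 3, 4, 5};

    // 100 orderings x 4 player counts = 400 games per submitted agent.
    static List<Integer> evaluate(Agent agent) {
        List<Integer> scores = new ArrayList<>();
        for (int seed = 0; seed < DECK_ORDERINGS; seed++) {
            for (int players : PLAYER_COUNTS) {
                // Fixed seeds mean every agent sees identical deck orderings,
                // so results are directly comparable across submissions.
                scores.add(new Game(seed, players, agent).play());
            }
        }
        return scores;
    }
}
```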

Easy to use

Lastly, we needed the UI to be easy to use. The conference is attended by a range of people from both industry and academia, across the full breadth of IGGI’s research areas. We can’t assume that attendees will be programmers or have a programming background. In addition, people will only have a very short time to use the tool (the workshop is only 2 hours long). There shouldn’t be an overly steep learning curve, so that they can get the most out of the session. Whatever UI we came up with should be usable by non-programmers and shouldn’t need a lot of explaining.

Technical Architecture

After scrawling on our office whiteboard for a bit, we came up with a plan. We’d build an interactive web UI using Bootstrap and jQuery. It’d make REST requests to a Tomcat server that would do the processing and return the results. We’d display them on the interface – done.

Simple, in principle. Hosting was fairly easy: thanks to FOSS Galaxy we had Docker hosts for Tomcat and static file hosting for the front end. The drag-and-drop interface in JavaScript required some Googling, but we found jQuery UI quite handy for that. For displaying the data, we decided on bar charts of average scores as well as a table displaying the information numerically.

The Tomcat server is a Docker container with a public port exposed. It takes a POST request in the form of a comma-separated list of internal rule IDs and returns the GameStats result objects of the games played as a JSON list, serialized with Gson.
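In outline, the endpoint looks something like the sketch below. The servlet path, the Evaluator helper, and the simplified GameStats are illustrative assumptions, not our actual class names:

```java
import com.google.gson.Gson;
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Simplified stand-in for the framework's per-game result object.
class GameStats { int score; int moves; int livesLost; }

// Stand-in for the agent builder and game runner.
class Evaluator {
    static List<GameStats> run(String[] ruleIds) {
        return Collections.emptyList(); // stub: build the agent, play the games
    }
}

@WebServlet("/evaluate") // illustrative path
public class EvaluateServlet extends HttpServlet {
    private final Gson gson = new Gson();

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // The request body is a comma-separated list of rule IDs, e.g. "3,17,4,9".
        String body = req.getReader().lines().collect(Collectors.joining());
        String[] ruleIds = body.trim().split(",");

        List<GameStats> results = Evaluator.run(ruleIds);

        // Serialize the GameStats list to JSON with Gson, as the real server does.
        resp.setContentType("application/json");
        resp.getWriter().write(gson.toJson(results));
    }
}
```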

The actual evaluation is performed using the same runners that we developed for use on the cluster, except that rather than taking their arguments on the command line, they are passed directly to the Java object. None of our code uses static variables, and games are single-threaded. An agent that makes an illegal move (which should be impossible given the ruleset) or fails to provide a move is given zero for that game.
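The forfeit rule boils down to a guard in the game loop. A minimal sketch, with hypothetical interfaces standing in for the framework’s real game and agent types:

```java
// Hypothetical interfaces standing in for the framework's real types.
interface Move {}
interface State {}
interface Agent { Move nextMove(State state); }
interface Game {
    boolean isOver();
    State currentState();
    boolean isLegal(Move move);
    void apply(Move move);
    int score();
}

class Runner {
    // A missing or illegal move forfeits the game: the agent scores zero.
    static int playOut(Game game, Agent agent) {
        while (!game.isOver()) {
            Move move = agent.nextMove(game.currentState());
            if (move == null || !game.isLegal(move)) {
                return 0;
            }
            game.apply(move);
        }
        return game.score(); // game ended normally: count the fireworks
    }
}
```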

Design

[Screenshot of the Hanabi web interface: our finished web interface]

The interface dynamically updates the bar chart as data is received from the server, and allows different agents to be compared using a histogram of scores. When testing agent performance, we found score histograms one of the easiest ways to visualise how an agent was doing.
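For a sense of the aggregation behind that chart: Hanabi scores range from 0 to 25, so the histogram is just 26 buckets of game counts. A quick sketch:

```java
import java.util.List;

public class ScoreHistogram {
    // Hanabi scores run from 0 to 25, so 26 buckets cover every game.
    static int[] build(List<Integer> scores) {
        int[] buckets = new int[26];
        for (int score : scores) {
            buckets[score]++;
        }
        return buckets;
    }
}
```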

The last 3 agents the user has developed are displayed on the graph, with their vital stats (average score, moves, and lives) shown underneath. Piers also implemented a “revert” feature to allow people to restore one of their previous agents.

On the right is a list of every rule in our framework. In Hanabi, there are 3 different kinds of moves: tell, discard, and play. Each rule has an icon telling the user its type, and an info button that opens the Javadoc page for that rule in a modal dialog. When building agents ourselves we had trouble finding the rules we were after, so we added a filter box to narrow down the available rules.

The evaluate button sends the request to the server and disables itself until a response is received. This prevents users from mashing the button when the response isn’t instantaneous.

Conclusion

Overall, I’m pretty happy with how it turned out. The underlying framework is fairly robust and made writing the Tomcat components relatively straightforward. The agent-loading framework, which allows agents to be instantiated from strings, makes this kind of evaluation fairly easy. This would not have been possible without the awesome work that Piers did, so major thanks to him for making this thing a reality.

It would have been nice to have a working web-based client for Hanabi that let people play games with their bots, but we didn’t really have the time to build that. We do have a Java UI that allows for it, though, if people really are interested in doing so…

Now, all we have to do is present it…
