Last time, we talked about how to have an actor make plans. We provided an expensive algorithm for reaching decisions, a simple breadth-first search through a space of possible worlds. It worked, but it was hideously expensive. Let’s make it a little better.
The most obvious change is to be smarter about which nodes we search and in what order. Breadth-first search analyzes every possibility one step from now, then every possibility two steps from now, and so on until your computer starts sobbing. We can switch to a sort of best-first search instead.
In breadth-first search, we maintain a queue of nodes. When we visit a possible world, we put a tuple into that queue consisting of the possible world and the path we used to get there. For convenience, we also store the score of that possible world in the same tuple (our crude approximation of its expected utility). We keep popping items from the queue, and each time, we insert all the worlds we can reach in one decision back into the queue.
In best-first search, we use a priority queue instead. The queue is keyed on the world’s score. This lets us repeatedly take the best world out of the queue each time and explore possibilities from there.
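As a sketch of the idea (the function and variable names here are mine, not from any particular engine), best-first search over possible worlds might look like this:

```python
import heapq
import itertools

def best_first_search(start_world, score, successors, budget):
    """Expand the highest-scoring world seen so far, up to `budget` visits.

    score(world) is our crude approximation of expected utility;
    successors(world) yields (decision, next_world) pairs.
    """
    counter = itertools.count()  # tie-breaker so the heap never compares worlds
    # heapq is a min-heap, so we negate scores to pop the best world first
    frontier = [(-score(start_world), next(counter), start_world, [])]
    best_path, best_score = [], score(start_world)
    while frontier and budget > 0:
        budget -= 1
        neg_score, _, world, path = heapq.heappop(frontier)
        if -neg_score > best_score:
            best_score, best_path = -neg_score, path
        for decision, next_world in successors(world):
            heapq.heappush(frontier, (-score(next_world), next(counter),
                                      next_world, path + [decision]))
    return best_path, best_score
```

The counter is there because heapq breaks score ties by comparing the next tuple element, and two arbitrary world objects may not be comparable.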
This works. It’s the standard solution. It has a glaring problem.
Let’s say the actor, Scarlett, wants a television. Secondarily, she wants money, at a rate of $10,000 equivalent to one TV. She has an option of preordering a television for $200, to arrive in thirty days. Or she can just go to work and earn $5. We’ve got enough processing budget to analyze four billion nodes, plenty to breadth-first search far enough to get the television in. (Four billion nodes is 31 days’ worth of simulation for all possible decisions.)
But we’re doing best-first search. So we’ll look at the first two options, see that one loses us $200 and the other gains us $5, and we’ll prefer the second, since it puts us $205 ahead of the other. And we’ll keep going on that branch, keep choosing to work, never choosing to buy the television. What can we do about this?
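To make the trap concrete, here is a toy version of Scarlett's choice (the world model is a simplification of the example above, with my own names). Even after a hundred expansions, the preorder branch never gets popped from the queue:

```python
import heapq
import itertools

# Toy model: a world is (cash, days_until_tv), where days_until_tv stays
# None until Scarlett preorders.  Numbers are from the example above.
def score(cash, days_until_tv):
    return cash + (10_000 if days_until_tv == 0 else 0)

counter = itertools.count()
frontier = [(-score(0, None), next(counter), 0, None, [])]
last_path = []
for _ in range(100):  # a hundred expansions, far more than the TV needs
    _, _, cash, tv, path = heapq.heappop(frontier)
    last_path = path
    tick = tv - 1 if tv else tv  # one day passes on any pending preorder
    heapq.heappush(frontier, (-score(cash + 5, tick), next(counter),
                              cash + 5, tick, path + ["work"]))
    if tv is None:  # preordering is still an option
        heapq.heappush(frontier, (-score(cash - 200, 30), next(counter),
                                  cash - 200, 30, path + ["preorder"]))

print(set(last_path))  # {'work'}: the preorder branch is never expanded
```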
Instead of just looking at the state of the world immediately after making the decision, we can look ahead thirty days, assuming we always make some nominal decision (probably the decision to do nothing). This greatly increases the cost of evaluating each world, but it avoids stupidity like this. Our previous budget of four billion nodes shrinks to about 140 million, since each evaluation now carries a thirty-day simulation. That budget covered 31 days of decisions before and only covers 26 now, but every node we evaluate shows us a potential world up to 56 days from now.
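A sketch of that scoring rule, with illustrative names: take the candidate decision, coast on the nominal decision for the rest of the horizon, and score the world you land in. The step and score functions below are a toy model of the television example, not a real engine.

```python
def lookahead_score(world, decision, nominal, step, score, horizon=30):
    """Score a decision by taking it, then coasting on a nominal decision
    for `horizon` days before measuring utility."""
    world = step(world, decision)
    for _ in range(horizon):
        world = step(world, nominal)
    return score(world)

# Toy model: a world is (cash, days_until_tv), with days_until_tv
# None until a preorder is placed.
def step(world, decision):
    cash, tv_days = world
    if decision == "work":
        cash += 5
    elif decision == "preorder" and tv_days is None:
        cash, tv_days = cash - 200, 30
    if tv_days not in (None, 0):
        tv_days -= 1  # the preorder gets one day closer to arriving
    return (cash, tv_days)

def score(world):
    cash, tv_days = world
    return cash + (10_000 if tv_days == 0 else 0)

# With the 30-day lookahead, preordering finally beats a day's work:
print(lookahead_score((0, None), "preorder", "nothing", step, score))  # 9800
print(lookahead_score((0, None), "work", "nothing", step, score))      # 5
```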
This obviously isn’t perfect. Let’s say Scarlett, in addition to earning money, has bills to pay. Bills cost $4.50 per day, and she has $200 in savings. Working for 26 days nets her another $13 after bills, leaving her $213. If she then buys the television, she’s down to $13, and our lookahead assumes she does nothing for the next 30 days, racking up $135 in bills. That sees her $122 in the red. And in Scarlett’s home town of Southumberland, there’s no such thing as credit; there’s debtor’s prison. As far as the lookahead is concerned, if she buys the TV, she’ll end up in prison next month, with no access to television and no money in the bank.
We can change this to make her nominal decision “go to work” instead of “do nothing”. I’m still worried about similar problems that will require more manual intervention to fix.
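Plugging in the numbers from Scarlett's situation shows why the choice of nominal decision matters; this is just the arithmetic above in code form, with my own names for things.

```python
BILLS_PER_DAY = 4.50
DAILY_WAGE = 5.00

def coast(cash, nominal, days=30):
    """Project cash forward under a fixed nominal decision."""
    for _ in range(days):
        if nominal == "work":
            cash += DAILY_WAGE
        cash -= BILLS_PER_DAY
    return cash

# Right after preordering the television, Scarlett has $0 in the bank.
print(coast(0.0, "nothing"))  # -135.0: the lookahead predicts debtor's prison
print(coast(0.0, "work"))     # 15.0: the preorder now looks survivable
```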
Echo utility backwards to the point of the decision
We created the world. We created the option to preorder a television, so we’ve created delayed gratification. We can have our AI handle this explicitly, as long as the events are structured in a way that allows it. We can, for instance, use a linear scale in utility for delayed gratification, where the actor achieves X% of the utility of a television per day until it arrives.
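A minimal sketch of that linear scale (the function name and signature are mine):

```python
def linear_delayed_utility(full_utility, days_elapsed, days_total):
    """Accrue an even fraction of the reward's utility per day of waiting."""
    fraction = min(days_elapsed / days_total, 1.0)  # cap once it has arrived
    return full_utility * fraction

# Halfway through a 30-day preorder of a $10,000-equivalent television:
print(linear_delayed_utility(10_000, 15, 30))  # 5000.0
```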
Alternatively, we can use the concept of net present value. We might say that a television in a month is worth 5% less than a television today, and we compound that every month. That means we’ll prefer to preorder the television for next month, and we’d even prefer a preorder that takes five years to arrive. But around 75 months, it’s worth more to Scarlett to work than to take the time and money ordering the television. At 150 months’ delay, she wouldn’t even take time off work to get a free television.
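Those break-even points are easy to check with a few lines of code (the helper names are mine):

```python
def net_present_value(utility, months_delay, monthly_discount=0.05):
    """Discount future utility by 5% per month, compounded monthly."""
    return utility * (1 - monthly_discount) ** months_delay

# The television is worth $10,000-equivalent.  Ordering costs $200 plus a
# day's wages ($5); a free television still costs the day's wages.
order_break_even = next(m for m in range(300)
                        if net_present_value(10_000, m) < 205)
free_break_even = next(m for m in range(300)
                       if net_present_value(10_000, m) < 5)
print(order_break_even, free_break_even)  # 76 149
```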
This is utterly straightforward in simple cases like this: you have an event, you look up the utility of the result, you calculate a net present value from that. What about more complex cases? You could just assume the result happened now and simulate the world for one or two steps to see what happens. That’s another crude approximation, and it requires you to simulate your entire world, but it only costs you a simulated day or two each time.
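As a sketch, assuming you already have step and score functions for your world, that estimate might look like the following; every name here is an illustrative placeholder.

```python
def estimate_event_utility(world, apply_event, step, score,
                           npv_factor, steps=2):
    """Pretend the delayed result lands today, simulate a step or two so
    its consequences play out, then discount the utility gain back to
    the present using a precomputed net-present-value factor."""
    baseline = score(world)
    hypothetical = apply_event(world)  # the result, as if it happened now
    for _ in range(steps):
        hypothetical = step(hypothetical)
    return (score(hypothetical) - baseline) * npv_factor
```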
These are all simple, straightforward techniques, but they require you to be able to simulate quickly.
You can create a neural network that attempts to associate decisions with effects. This is a more holistic approach, and once you build the neural network, it’s going to be reasonably fast. However, neural networks require a lot of time and data to train, and they’re often rather large. Since we’re talking about procedurally generated actors, the network has to match the current collection of actors as well as the current state of the world.
Given the amount of data to match and the range of possibility generally available, you’re going to spend a ton of time with the game running a different AI just to generate training data for a giant neural network.
In my case, I’m talking about a game that might have a large number of players. That would end up with me paying for a bunch of large AI servers just hosting this neural network, and players would have to be online in order to play. If I ever shut down the servers (for instance, because they cost a couple hundred dollars a month to keep running and I’m not seeing any revenue), my players wouldn’t be able to play the game.
Neural networks are not an option for me. But if you’re generating a novel, you might be able to use them.
Just be warned, neural networks aren’t good at explaining their reasoning. It’s usually important for a story that characters be able to explain their reasoning.
Next time, we’ll look at incorporating other actors’ actions into one’s plans.