MUD life: time

We’ve been talking about procedural content generation for a MUD, but it would be rather sad to have a procedurally generated static MUD. A place where nothing ever changes.

If we want to talk about change over time, we need a notion of time.

MUD procedural generation: drawing the walls

This is the second post in a series on procedural generation for a MUD, which is a text-only MMO, or a multiplayer Zork. The eventual goal is to create a large, varied MUD containing multiple cities, towns, and wilderness biomes, and a wide array of NPCs and items, as the basis for more personalized and customized work.

Last time, we spoke about placing towers for the city walls. Let’s create the walls themselves.

MUD procedural generation: outlining city walls

This starts a series on procedural generation for a MUD, which is a text-only MMO, or a multiplayer Zork. The eventual goal is to create a large, varied MUD containing multiple cities, towns, and wilderness biomes, and a wide array of NPCs and items, as the basis for more personalized and customized work.

Today we’re looking at generating the outline of a city: the city walls.

Subtex: doing a BBCode with LaTeX

I created Subtex, a LaTeX-inspired markup language for ebooks. Here, I’m going to talk about what caused me to create this language, what it does, and what’s interesting about it.

The Atlantis browser concept review

My reaction to this is a giant NO.

Extracting a browser kernel that runs services on top, that’s great. The rest, not so much. Right now, I have to trust Apple or the KDE devs or Google or Opera or Microsoft that my browser isn’t spying on me. Then I use RequestPolicy and Privacy Badger and sometimes NoScript to ensure the page itself isn’t doing bad things. But with Atlantis, I have to trust that every single website that contains anything I care about has done research on every component it depends on. And I already distrust these websites, by and large.

The examples didn’t have any security on them. They referenced compilers and stuff via unencrypted links. No checksums. No signing. So my bank does due diligence and determines that the parser and renderer and compiler it’s using are secure — but I get MITM’d and all my bank details are siphoned off to Russian hackers. Or the bank uses HTTPS and the server hosting the parser and renderer gets hacked and I get sad.

So instead my bank hosts everything itself. And there’s a bug in the RPC handler that allows arbitrary code execution. I know about it, upstream knows about it, there’s a fix released…but my bank is still using a three-year-old version and the Russian hackers are starting to feel sorry for me.

Fun, yes?

Security fixes

Fortunately, the security story is moderately straightforward. We have a central repository of trusted services. You can request specific versions of a service, but the browser doesn’t guarantee that it will honor your version request. For instance, if you request mshtml11.3.7, the browser might give you mshtml11.3.12. In point of fact, we’ll only support major.minor version requests; you always get the most recent patch version.

A service build is automatically retired after eighteen months to mitigate security risks. This is why the browser won’t always honor your version requests. You might have asked for mshtml4.0, but nobody’s been maintaining that for a decade and more, so the browser will give you the nearest equivalent.
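To make the policy concrete, here’s a sketch of how that resolution might work. Everything here — the function names, the build-list shape, the cutoff expressed in milliseconds — is my own guess at an implementation, not anything from the talk:

```javascript
// Builds older than ~eighteen months are treated as retired.
const MAX_AGE_MS = 18 * 30 * 24 * 60 * 60 * 1000;

function cmpVersion(a, b) {
	const pa = a.split('.').map(Number), pb = b.split('.').map(Number);
	for (let i = 0; i < 3; i++) {
		if ((pa[i] || 0) !== (pb[i] || 0)) return (pa[i] || 0) - (pb[i] || 0);
	}
	return 0;
}

function resolveVersion(builds, requested, now) {
	// builds: [{version: '11.3.12', released: <ms timestamp>}, ...]
	const live = builds.filter(b => now - b.released < MAX_AGE_MS);
	// Only major.minor is honored; any requested patch number is ignored.
	const [maj, min] = requested.split('.').map(Number);
	const candidates = live.filter(b => {
		const [bMaj, bMin] = b.version.split('.').map(Number);
		return bMaj === maj && bMin === min;
	});
	// If nothing maintained matches, fall back to the nearest live equivalent
	// (here, crudely: the newest maintained build of any version).
	const pool = candidates.length ? candidates : live;
	if (!pool.length) return null;
	return pool.reduce((a, b) => (cmpVersion(a.version, b.version) >= 0 ? a : b)).version;
}
```

Requesting `11.3.7` in this scheme gets you the newest maintained `11.3.x` patch; requesting a long-dead `4.0` gets you whatever the repository considers the closest living build.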

Since we’re using a silo for trusted services, we can use a bunch of things like signed builds and certificate pinning to reduce your ability to muck about with my trusted resources.

Finally, Atlantis internally has an RPC mechanism defined. You can post arbitrary data to arbitrary pages. That’s a problem. You need a way to lock that down. Without a means of restricting it, I can construct a page that will fuzz your other open tabs. Perhaps you require a handle to a page in order to send RPCs, and the only way of getting a page ID is by opening it (or receiving the ID via an RPC). Perhaps there are named RPC channels that a page must enroll in, and the browser automatically drops RPCs that aren’t supported.
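Here’s a minimal sketch of the named-channel variant. The router shape and method names are my own invention, purely to show how RPCs on unenrolled channels could be dropped:

```javascript
// A page must enroll in a channel before the browser delivers RPCs on it.
function makeRpcRouter() {
	const channels = new Map(); // channel name -> (pageId -> handler)

	return {
		enroll(pageId, channel, handler) {
			if (!channels.has(channel)) channels.set(channel, new Map());
			channels.get(channel).set(pageId, handler);
		},
		// Returns true if delivered. Unenrolled channels are silently dropped,
		// so a hostile page can't fuzz tabs that never asked for its messages.
		send(targetPageId, channel, payload) {
			const enrolled = channels.get(channel);
			if (!enrolled || !enrolled.has(targetPageId)) return false;
			enrolled.get(targetPageId)(payload);
			return true;
		}
	};
}
```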

Privacy

One big thing that web devs tend to want is analytics. I’m not so keen on being tracked. It’s straightforward in Firefox to reduce tracking: suppress the HTTP referer header, add a pre-request hook that will disallow cross-origin requests contrary to a defined policy, and delete cookies on schedule. Maybe pass the Do Not Track header too.

How do I do that in Atlantis?

I have to trust Atlantis to pass the referer header, then I have to use a local proxy for everything it does. That works okay for HTTP, but it doesn’t work with HTTPS. With HTTPS, the referer header is encrypted and my proxy can’t see it. Or my proxy needs to have a certificate that the browser implicitly trusts for every site, and I have to disable certificate pinning in the browser.

This goes into the general realm of promoting user agency vs promoting developer agency.

Technical aspects

Atlantis uses abstract syntax trees as the basis of everything.

Abstract syntax trees let you abstract over different programming languages a little. Not much. You’re stuck with one set of semantics. It’s like trying to implement C on the JVM — you just can’t do it. You can do some of it, but you don’t get pointers and you don’t get structs, so you’ll end up writing a virtual machine on top of the JVM to support those.

So that’s a constraint that limits you a lot. The obvious alternative is LLVM bitcode. Google Chrome’s NaCl (PNaCl, if I recall correctly) uses LLVM, so it’s possible to sandbox it.

The other problem I have with the version presented is the rendering system. It builds a bitmap image and sends it off to the browser. I’m not sure how that will work for accessibility. I’m not sure how to highlight text and copy it that way. It’s good enough for research, not good enough for real life. And if you punt this to the service developer, 95% of them will ignore accessibility entirely, and 30% of them will forget about copy/paste.

What’s good?

If you split up a browser into multiple services running on a base, things get nicer on the technical side.

The core of the browser can just be a few process-oriented APIs, a series of drawing primitives, IO, etc. That’s simple enough to implement on its own.

Independent parties can develop services to implement, say, SVG or MathJax. And with agreed-upon APIs, I can use the same service on IE and Firefox. This is good for web standards — they can be implemented more quickly, and it’s easier to track down the source of incompatibilities when you can insert the W3C version of MathJax into Firefox, observe how it renders, and then swap Gecko out for Trident to see if there’s a bad interaction between Gecko and W3C:MathJax that’s messing up the output.

Then I can implement browser addons as services that the user brings in. For special purposes, when the user allows, pages can do nonstandard things too, implementing their own services. For instance, the Elm programming language provides a moderately different layout system that tends to be pixel-based. (The relatively recent html package offers access to DOM, but the older stuff doesn’t.) That could be implemented as a new rendering service. Or if we find a way to provide sandboxed GPU access, we could get a Unity3D service. Or with DRM, a page could supply a service that converts encrypted audio to WAV.

There’s a lot of possibility here. And I’m sure that James Mickens has considered some of it. A one-hour talk isn’t the best for conveying the full depth of your vision. I’m excited to see his continuing work.

Not Pope Innocent

Fresh off the press, my new hit single, Not Pope Innocent!


I think I did it again
I made you believe it’s God’s holy war
Oh, Catholics
I might write papal bulls
But that doesn’t mean I’m your next saint
Cause to win at politics
That is just how I want to beat
(di Sezze)

Oops, I did it again
Invaded the Jews, got lost in crusades
(Oh one church, true church)
Oops, you think I’ve God’s love
That I was sent from abo-o-ove
I’m not Pope Innocent

You see my pawns are arranged
You’re wasting away
Wishing the cardinals would all vote your way
I snort, awaiting the day
Can’t you see you’re a fool flubbing political plays?
But to win at politics
That is just how I want to beat
(di Sezze, Sezze)

Oops, I did it again
Bribed the synod, rewrote church doctrine
Oh Catholics, Catholics
Oops, you think it’s God’s plan
The Son of Ma-a-a-an
I’m not Pope Innocent

(Pope Alexander! Wait, before you go, there’s something I want you to have.)
(It’s slimy! But wait, isn’t this…?)
(Yeah, yes it is.)
(But I thought the Spear of Longinus was at the bottom of the Dead Sea.)
(Well, Your Holiness, I went down and brought it back for you.)
(I grant you lands and my blessing for your service to the Holy Church.)

Oops I did it again to your church
Got lost in crusades oh synod
Oops you think that I’m sent from above
I’m not Pope Innocent.

(In the interests of historical accuracy, I must point out that Pope Alexander III did not, in fact, call any crusades.)

Poor worldbuilding in the Firebird Trilogy

I’ve been seeing a lot of recommendations for the Firebird’s Son trilogy by Darth Marrs. They praise the worldbuilding. I really don’t see it, and I’m tired of repeating why, so I’m cataloguing it here.

The Firebird’s Son trilogy is a work of Harry Potter fan fiction. The premise is that, in the UK, there is a strongly matriarchal society, there are about three women for every man, and birth rates are rather low. Furthermore, men and women bond the first time a man has sex, and that bond transfers magic power from the man to the woman.

The sheer number of things that don’t make sense in the story is mind-boggling.

Men are rare — or not

Men are rare and therefore valuable, right? Except almost nothing in the story reflects that.

If men were rare, they would be protected, from infancy to death. What we actually see is a surprising degree of callousness. Boys are allowed to play Quidditch, for instance, despite the risk of injury. And men are even allowed to join the DMLE.

Why? It doesn’t make any sense!

Furthermore, malnutrition can erode your magic power, which (as we’ll discuss soon) impacts your ability to form bonds. But nobody checked on Harry to see that he was being well fed.

Men lose magic power

Men are expected to enter a bond and lose most of their magic. Then, if they have enough magic left, they’re expected to enter another bond. And again, and again, until they’re barely able to hold a wand.

(Squibs, meanwhile, supposedly tend to leave magical society because it’s too painful to see magic without being able to perform it. How much worse must it be to be able to do so little when only a few years prior you could do so much more!)

Boys need some amount of magic training; otherwise they’re a danger to themselves and others. However, they don’t need as much as women. It’s a waste of their time, and a cruelty to their hopes, to teach them more magic than they need to control their power.

It would be sensible for men to take administrative positions that benefit less from magic. Accounting should be a popular course for boys.

Bonding

Bonding is spoken of as if it’s serious business. On the ground, people treat it as if it’s important and weighty. At the administrative level, nobody seems to care much about it.

Bonding is the core of family structure, so it’s not something you want to jump into. Also, bonding someone too young will kill them or render them catatonic. So a sensible person would try to keep boys and girls separate, right? At least in unsupervised conditions — kids aren’t going to molest each other under a teacher’s watchful eye.

But Darth Marrs thinks it’s sensible to keep everyone in the same boarding school, on the same Quidditch teams, with easy access to each other. The only vague sop to separating them he institutes is that they can’t have classes together until fourth year — as if kids under fourteen are only ever sexually active inside a classroom.

The covens (old, powerful groups of witches) tend to control bonding. They would want to crack down on unauthorized bonding, so segregated classes would be high on their agenda. Women who bond without the covens’ consent tend to be in for a rough time (though they can gain status from having a powerful husband), and we don’t really see love matches. Plus, if there’s a tacit attempt to get men to rebel more often, bonding won’t help.

Really, no authority figure benefits from unauthorized bonding, and all of them should be in favor of gender segregation.

Declining population

One problem the series tries to drive home is the declining population in Magical Britain. Everyone talks about it, so you’d expect someone would do something about it.

Combine that with the high gender imbalance and the limits on polygyny, and you should naturally get a high rate of halfbloods with magical mothers. Do we? It doesn’t seem like it. If we did, we’d see population growth, since some 70% of the population is capable of bearing young. Instead of two children per mother merely breaking even, as in real life, two children would give a 40% increase per generation.
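The arithmetic here is simple enough to write down. The 70% figure comes from the premise; the rest is just multiplication (the function name is mine):

```javascript
// Children per capita in the next generation: children per mother times
// the fraction of the population able to bear young.
function generationGrowth(childrenPerMother, fractionFertile) {
	return childrenPerMother * fractionFertile;
}

// Real life: ~50% of the population can bear young, so two children per
// mother roughly breaks even. With 70% fertile women, two children per
// mother yields 1.4 children per capita: 40% growth per generation.
```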

If that doesn’t work: men can’t form a second bond without their first bond-holder’s consent, but a bonded man can still father children without forming new bonds. So you can have a high degree of polygyny without lots of bonding, right? Except nobody has thought of that.

If that also doesn’t work, artificial insemination.

And if that also doesn’t work, research other methods of conception. Find a way to terminate bonds without hurting everyone to ensure that men always have bondmates that can bear children. Try something else. Try something!

But nobody tries anything. Nobody cares.

Similarly, Muggleborns tend to die because they don’t have a witch’s breastmilk. But there’s no attempt to locate them and find them wetnurses, which would add to the magical population.

Nitpicks

Everything above was major, story-changing stuff, things that would require the plot to be scrapped or at least major portions of the story to be rewritten. There are also plenty of minor things that could be fixed with smaller modifications:

  • Eleven-year-olds engage in sexual banter.
  • Harry is an Aether, which is rare and unusual and indicates upheaval. Is it worth keeping him around and in possession of his faculties? Why isn’t he tracked more closely?
  • There are not just traditions but important social structures around polygyny. The current situation has been around for nine generations. Is that really long enough to create these social structures?
  • Kingsley Shacklebolt is a member of the DMLE. Safety aside, does he have enough magic power to do his job?
  • Students take 14 hours per week of Muggle Studies. It’s unclear how many years the teacher covers, but if it’s even two full years, she’s working very long hours. (Recall that there are at least two classes.)
  • Lucius Malfoy went to Azkaban. Does it have any impact on fertility? The bonds? If so, it probably wouldn’t be allowed, but it doesn’t seem likely that Dementors would leave you in a good condition for begetting more children.
  • Magical people are “genetically incompatible” with Muggles starting when they turn forty. THAT’S NOT HOW GENETICS WORKS.

What can we learn from this?

Overall, it seems like Darth Marrs had an idea of what style of world he wanted and tried to patch it to make it make sense. But he stopped after one set of patches, ending up with something that only bears up under five minutes of scrutiny, if that. This might be better than average, but it’s still lazy or inept, and I can’t tolerate it.

If you want bulletproof worldbuilding, you need to be adversarial about it. You need to be able to ask in one moment “how do I make this world into one that I want?” and in the next “how might people living in this world break it?” You have to turn the pillars of your universe into problems to be solved and then try to solve them.

Or if that’s too much, you can simply say: there are people working on these problems, but they’ve encountered difficulties and have no working results yet.

Darth Marrs didn’t, and the story suffers for it.

Effective enchanting in GURPS

This post is about GURPS. It’s a tabletop roleplaying game, vaguely similar to D&D.

Enchanting

Enchanting in GURPS is a moderately debated topic.

The cost of entry for enchanting is Magery 2 and ten spells from different colleges. Then you need the Enchant spell at effective skill 15, plus skill 15 for whatever other spells you want to enchant on an item. That’s only 35 character points, and it’s normal to start with 100-150. So far, so good.

Enchanting costs energy points. There are two ways to enchant: quick and dirty, or slow and sure. With quick and dirty enchantment, you spend your own energy in the form of fatigue points to create an item quickly. With slow and sure enchantment, you use ambient magic instead.

Assuming your energy reserves are large enough and you buy level 20 on the Recover Energy spell, you can enchant 528 points of magic items per day using the Quick and Dirty method. This requires you to have at least 48 points of FP / Energy Reserve available.

You can enchant magic items to produce any effect you could create with a spell. There are a bare handful of spells you can enchant for 50 energy. Useful spells tend to start around 400 energy. To buy an Energy Reserve for enchanting, you must spend at least 1.5 character points per point of Energy Reserve — so if you want to enchant a vaguely useful spell with the Quick and Dirty method, you need to spend six hundred character points.
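The character-point arithmetic, using the post’s figures (1.5 character points per point of Energy Reserve; useful spells starting around 400 energy), sketched as a tiny function of my own naming:

```javascript
// Character points needed to Quick-and-Dirty enchant an item of a given
// energy cost, assuming the whole cost is paid from an Energy Reserve
// bought at 1.5 character points per energy point.
function qndCharacterPoints(energyCost, pointsPerEnergy = 1.5) {
	return energyCost * pointsPerEnergy;
}
```

A 400-energy item works out to 600 character points — several starting characters’ worth.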

But wait! There’s Slow and Sure enchantment!

Slow and Sure enchanting doesn’t require any huge energy reserve. Instead, it requires time. Specifically, it requires one day per energy point of the item. Divide that by the number of people involved in the enchantment — oh, but reduce the effective skills for enchanting by one for every person you add. And you can’t interrupt the process without the enchantment regressing.

For instance, I’ve got a ton of downtime between campaign arcs. I decide to take my enchanter and make some sweet magic item. A Staff of Lightning, specifically. I must already know Lightning at skill level 15 or more to make this item. And at the end, I have a Staff. It cost me 800 days and $1200 to build.

How good is this?

When you know a spell at skill 15, you can cast it with one less energy point, but you need to say a few quiet words or make a small gesture to cast. If you are gagged and your fingers are bound, you can’t cast a spell at skill 15 — but you can use a magic item. So building the Staff of Lightning was a bit of a tradeoff, yes?

Not really. Your captors aren’t going to leave you a staff but bind your fingers and mouth. If you enchanted a Ring of Lightning, that would be more reasonable, though in a setting with magic items, you’ll probably be searched for jewelry. (Time to make a Necklace of Lightning instead and tie it around your thigh.) Spells, on the other hand, can’t be removed from you.

So that’s two disadvantages to the item — extra FP to cast, -5%; can be stolen, -30% — and one advantage — no signature, +20%. So it’s objectively worse than just casting the spell, according to the game’s builtin cost metrics.

I could alternatively have done self-study the whole time. During the enchantment process, I gained eight character points to be divided between Lightning and Enchant — probably evenly. Evening self-study earns me as many points again, but on any skills I want.

If I had gone for self-study alone, I could have gotten Lightning to skill 20. This would reduce my casting signature, just like building the magic item. 125% of the time to get +35% the advantage.

The Invisible Hand?

The only use for magic items, then, is purchasing them, not producing them yourself, right?

You can spend thirty points on Very Wealthy and twenty on Independent Income, then save up for the better part of a year to buy that Staff of Lightning outright. But you could instead have spent those fifty points to buy Magery 1 and Lightning at skill 15, with six other okay spells to try.

But points spent on Wealth get a superlinear payout. I could spend 75 points on Multimillionaire 1 and ten on Independent Income. This gets me enough starting cash to buy three magic items like a Ring of Lightning, and I can buy another every month. Good, yes? I do need to spend at least fifteen points on Magery, though — Lightning has an effect based on my level of Magery, and that’s not inherited from item creation.

So, I spent 100 points on a character who can cast three spells and gets an extra spell or two free per month. I could alternately have spent a hundred points on Magery 3 and a fair number of spells that I could use from the start of the campaign.

When should I enchant something for my party?

As far as I can tell, it makes sense to seek a magic item when you have a spell you must be able to cast, but you can’t spend the time it would require to learn that spell and can’t hire someone who knows it. (You have to be a mage to use most magic items, and you need to have access to the related college. So there’s no avoiding Magery.)

So presumably you are not a single-college mage, and the spell has a large number of prerequisites, and you’ve got a giant pile of cash.

But there’s another side to this: you need enough of a market for magic items that people have already produced the item you need. Since it’s pretty much the realm of multimillionaires to purchase magic items, that’s not really happening. As a GM, I strive to produce consistent worlds, so I could only allow a bare handful of magic items, crafted by tinkerers and people thinking more of their legacy than their own power.

Perhaps in advanced age, mages tend to take up either enchanting or teaching, so with a thousand mages in existence you tend to get another couple dozen magic items a year. But this still makes magic items rare and especially valuable.

In short, magic items are not a commodity, and you wouldn’t usually want to use one, so I’m not sure why they’re even in the game.

Making magic items better

The “can be stolen” limitation isn’t going away. We can reduce it by enchanting bracers, chokers, and rings instead of staves.

The extra FP cost is troublesome. Most of the interesting spells cost at least two FP. At skill 15, that’s a 50% reduction — you can cast the spell twice as often on your own. But we can address that by embedding powerstones.

An embedded powerstone gives you two extra points of use per day. (Double in a high mana environment.) So if you enchant up a ten-point powerstone — which requires knowing the Powerstone spell at level 15 — you can store twenty FP worth of item use. You need at least 20 FP to start with in order to cast Powerstone, though.

While this doesn’t fix the underlying economics of enchanting, you can add an Ally group of enchanters (6-10 individuals, appears fairly often, Duty to assist them with their enchanting, etc). Unfortunately, to take advantage of a group of enchanters, the leader of the enchantment needs to know the spell at skill 15 + the number of assistants. Everyone else needs the spell at skill 15 to contribute any appreciable amount of energy. So that won’t get you any magic items for spells you can’t cast, though it might get you a magic item in a few hours that gives you an effective +5 to skill on one spell (in exchange for costing more fatigue).

I need more power!

In a higher-tech environment, TL5 and above, you can use electricity to power spells with the Draw Power spell (Magic 180). You get ten energy points per kilowatt-hour of electricity. This makes quick and dirty enchanting much easier.

You do need at least 60 FP to maintain this spell for the full hour it takes to cast the most basic enchantment — or get a friend who knows Draw Power and Lend Energy, and they can restore energy to you while you enchant. (You could even enchant this into an item…)

Now you get ten energy points per hour of casting per kilowatt of generation. Get six portable generators off Amazon and you can draw all the power you need that way, for up to twelve hours of enchanting — more, if your friend refuels them for you.

If you’re looking at a magitech world, you can produce a windmill and power it by enchanting Wind (Magic 195). A moderately sized windmill could easily produce enough power for one person to enchant all day long with the Quick and Dirty method.

If you use this technique, you still need to add Power 2 to your magic items to make them useful to you. That will cost you ten hours.

With this technique, the Vigil spell (Magic 138), and a rotating team of people (or magic item) casting Lend Energy, you can produce a magic item worth over 3,000 energy points.

The fixed costs are the generator (which I’d put at $50,000 to $200,000, 5% maintenance per year), the Wind enchantment ($20,000 using default calculations, 5% maintenance per year), and the enchanting lab (wild guess $5,000 to $10,000; 20% maintenance per year). The labor costs are two Comfortable salaries for one hour per hundred points of enchantment, plus a surcharge over 12 hours. (A comfortable salary works out to $14/hour at TL5, $19/hour at TL6, etc.)

With this, a normal magic item at TL6 will cost $50 + $28 for every 100 energy points, with a margin for profit. (The rate increases around 1200 energy points.) A Staff of Lightning will probably retail for $350. Adding Power 2 to it will double the price.

That’s less than a broadsword. On the other hand, you can use this technique to produce broadswords with Shape Metal, which will reduce the cost of a sword. And there are all sorts of things you can do with magic: create food with it; create water, turn it into hydrogen fuel cells, and turn that hydrogen into Essential Fuel to power your spells; rings of Illusion Disguise and Warmth in lieu of clothing…
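The pricing model above — $50 base plus $28 per 100 energy points, before the profit margin and ignoring the rate increase past 1200 energy points — comes down to a one-line formula. The function name is mine:

```javascript
// Production cost of a normal TL6 magic item under the model above.
// Power 2 doubles the price.
function itemCost(energyPoints, withPower2 = false) {
	const base = 50 + 28 * (energyPoints / 100);
	return withPower2 ? base * 2 : base;
}
```

An 800-energy item costs $274 to produce under this model; the $350 retail figure above includes the profit margin.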

Can we save low-tech magic?

If you have a TL4-ish campaign, how do you make magic items worthwhile?

Add Power 1 by default

Right now, there are advantages to a few magic items even without Power 1. An Earring of Freedom or Ring of Blink helps when you’re trapped. But these tend to be marginal. You’re in a situation where you can’t use your innate magic freely enough, so you turn to a magic item instead.

But for most items, you’ll use them when you’re already free to use spells. If you enchanted the item, it’s worse than casting the spell. At least with Power 1, you get even with casting directly.

Base item power on Enchant

Right now, you need to know the spell at skill 15 in order to enchant it, and the power of the item is based on how well you know the equivalent spell. If we relaxed that and made it so that a high Enchant skill increases the quality of the item above your skill level with the spell, you have an incentive to enchant things.

For instance, I learned Lightning at skill 12, and I have Enchant at skill 25. I produce a Staff of Lightning that acts as if I’m casting with skill 19, perhaps (halfway between, round down).

Improve Slow and Sure speed

If Slow and Sure enchanting provides one energy point per hour rather than per day, you can afford magic items without being a multimillionaire. You can craft them in your downtime — your Staff of Lightning will only take you three months and change, the equivalent of two character points of self-study. But that’s still worse than just casting the spell, in most ways, and you probably spent 15 points on Enchant and five points on miscellaneous spells you wouldn’t otherwise get.

Split the party

If your party gets split often and there are a few essential mage spells, enchanting can ensure that nobody’s without those essential spells.

You might need to relax the ubiquitous “Only usable by mages” requirements, or give everyone Magery 0.

Cut energy costs

If you divide the energy costs for magic items by a reasonable factor, like 5, you improve Quick and Dirty enchanting tremendously and make Slow and Sure enchanting feasible.

Add more energy sources

Paut is an expensive alchemical concoction that you can consume to get energy to spend on casting. There are alternate rules allowing rare materials to defray the energy costs of item creation.

You could rule that some fraction of the energy cost of an item can be paid with items, and you can price those items appropriately for your campaign. For instance, 80% of the item cost can come from mandrake and wolfsbane extracts and gold-inlaid obsidian, at a minimum of $1/point. That Staff of Lightning with Power 2 costs $1440 plus 360 days’ labor instead of 1800 days’ labor. You dropped the cost by 75% or more. That’s still only available for Wealthy people for now, but by TL6 they’ll be commonplace, even without turning electricity into enchantments.
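That split is easy to compute. The 80% material fraction and $1/point floor are the example numbers above; the function itself is a sketch of mine:

```javascript
// Split an item's energy cost between purchased materials and Slow-and-Sure
// labor: up to `materialFraction` of the cost is paid in materials at
// `dollarsPerPoint`, the remainder in days of enchanting (one point per day).
function materialSplit(energyCost, materialFraction = 0.8, dollarsPerPoint = 1) {
	const materialPoints = Math.floor(energyCost * materialFraction);
	return {
		dollars: materialPoints * dollarsPerPoint,
		laborDays: energyCost - materialPoints
	};
}
```

For the 1800-energy Staff of Lightning with Power 2, this reproduces the $1440 plus 360 days quoted above.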

Who needs humans?

Instead of enchanting one item, I enchant a golem to enchant items.

I might only be able to get one type of item from a golem. And they might only be able to do slow and sure enchanting. But imagine a giant warehouse with nothing but golems in it, each churning out magic items.

It might take a year to produce a decent golem, but they tend to last, don’t need rest, and don’t ask for wages. After ten years, I could have ten golems, each producing one item per year, for a total of 45 items produced. If you’re sticking with hand-crafting everything, you’ll only have ten items to show for it.


GURPS enchanting doesn’t afford feasible magic items until you mix in technology, but there are plenty of things you can do to make a plausible industry from it.

Human experimentation for fun and profit

I want to experiment on my users. How do I do it?

Yesterday I talked about creating a configuration service. We’re going to leverage that. An experiment is just a configuration rule that’s sharded among your userbase.

But is it that simple? Usually not. Let’s dive in!

Choosing a treatment

Iacta alea est

The easiest way to go is to just toss the dice.

You define your treatments and their percentages and roll 1d100. The user gets into whatever treatment corresponds to the value on the die. For instance:

function getTreatment(treatments, control) {
	// Roll in [0, 100) and walk the treatment list; each treatment claims
	// a slice of the range proportional to its percentage.
	var value = Math.random() * 100;
	for (var i = 0; i < treatments.length; i++) {
		value -= treatments[i].percent;
		if (value < 0) {
			return treatments[i].value;
		}
	}
	// Percentages summed to less than 100; the remainder is the control group.
	return control;
}

What's this good for? Things where you're okay with changing behavior between requests. Things where your users don't need consistency. Probably where your users won't notice a lot. Like Google's 41 shades of blue.

Introduce a discriminator

So you determined you want each user to have a consistent experience. Once they enter an experiment, they're in it until the experiment finishes. How do we do that?

The simplest way is to introduce a pivot value, something unique to the user:

function toHash(str) {
	var hash = 1;
	for (var i = 0; i < str.length; i++) {
		// Clamp to an unsigned 32-bit integer each round; otherwise the hash
		// outgrows Number.MAX_SAFE_INTEGER on long strings and the modulo
		// below loses precision.
		hash = (hash * 33 + str.charCodeAt(i)) >>> 0;
	}
	return hash;
}

function getTreatment(pivot, treatments, control) {
	// pivot % 100 is in 0..99, so strict less-than gives each treatment
	// exactly its percentage of the range.
	var value = pivot % 100;
	for (var i = 0; i < treatments.length; i++) {
		value -= treatments[i].percent;
		if (value < 0) {
			return treatments[i].value;
		}
	}
	return control;
}

config.treatment = getTreatment(toHash(user.email), treatments, control);

What's great about this? It's simple, that's pretty much it.

What's terrible about it? The same users get the first treatment in every experiment. If you want to roll out updates to 1% of your users at a time, the same person always gets the least tested, bleeding edge stuff every time. That's not so nice, and it opens you up to luck effects much more.

The victorious solution

Quite simple: instead of basing your pivot only on the user, you base it on the user and the experiment. For instance:

var experiment = 'home screen titlebar style - 2016-06-12';
var pivot = toHash(user.email + experiment);
config.treatment = getTreatment(pivot, treatments, control);

This effectively randomizes your position between experiments but keeps it consistent for each experiment. We'll have to adjust the API to make it easier and more obvious how to do the right thing:

function getTreatment(userId, experimentId, treatments, control) { ... }
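Filled in, that adjusted signature might look like the following sketch; it reuses the toHash helper (bounded to 32 bits here so the modulo behaves on long strings):

```javascript
// 32-bit string hash, so pivot % 100 stays meaningful for long inputs.
function toHash(str) {
	var hash = 5381;
	for (var i = 0; i < str.length; i++) {
		hash = (hash * 33 + str.charCodeAt(i)) >>> 0;
	}
	return hash;
}

// Hash user id and experiment id together: stable within an experiment,
// uncorrelated across experiments.
function getTreatment(userId, experimentId, treatments, control) {
	var value = toHash(userId + experimentId) % 100;
	for (var i = 0; i < treatments.length; i++) {
		value -= treatments[i].percent;
		if (value < 0) {
			return treatments[i].value;
		}
	}
	return control;
}
```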

Dependencies

You will often have several simultaneous experiments. Sometimes you'll need a person to be enrolled in one specific experimental treatment for another experiment to even make sense. How do we do this?

First off, we'll adjust our treatment API so that, instead of an array of treatments, you send a JS object:

var homeScreenTreatments = {
	control: {value: {bgColor: 'black', fontSize: 10, bold: true}},
	t1: {percent: 10, value: {bgColor: 'black', fontSize: 12, bold: false}},
	t2: {percent: 10, value: {bgColor: 'gray', fontSize: 10, bold: true}}
};

Next, we'll stash our treatment decisions in the framework (in a new cache for each script run). Then we'll let you query that later. For instance:

var homeScreenExp = 'home screen titlebar style';
config.homeScreen = getTreatment(
	user.email, homeScreenExp, homeScreenTreatments);
// 50 lines later...
if (hasTreatment(homeScreenExp, 't2')) {
	config.fullNightModeEnabled = false;
}
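One way the framework might implement that per-run cache is sketched below; the names mirror the calls above, but the details are assumptions, not the real implementation:

```javascript
// Per-run record of which treatment each experiment chose.
// The framework would reset this at the start of every script run.
var assignments = {};

function toHash(str) {
	var hash = 5381;
	for (var i = 0; i < str.length; i++) {
		hash = (hash * 33 + str.charCodeAt(i)) >>> 0;
	}
	return hash;
}

function getTreatment(userId, experimentId, treatments) {
	var value = toHash(userId + experimentId) % 100;
	var names = Object.keys(treatments);
	for (var i = 0; i < names.length; i++) {
		if (names[i] === 'control') continue;
		value -= treatments[names[i]].percent;
		if (value < 0) {
			assignments[experimentId] = names[i];
			return treatments[names[i]].value;
		}
	}
	assignments[experimentId] = 'control';
	return treatments.control.value;
}

function hasTreatment(experimentId, treatmentName) {
	return assignments[experimentId] === treatmentName;
}
```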

We could alternatively bake experiments into the rule infrastructure: a rule would declare the config section it supplies, its treatments, and their percentages. But that tends to end up as a complex UI that does 90% of what users need in an inflexible way, so it's troublesome.

However, what we want to do is store a collection of experimental treatments on the config object. We'll get into that later, but it looks like:

config.experiments = {
	'home screen titlebar style': 't2',
	'wake up message': 't5'
};

Incremental releases

Another common thing people want to do is roll out new features gradually. Sometimes I want a feature to reach fixed percentages of my users at fixed times. One option is to introduce a "rule series": a collection of rules, each with a start and end date, where no two rules are allowed to overlap.

So I set up a rule series "roll-out-voice-search" with a simple set of rules:

// in the UI, I set this rule to be effective 2016-06-10 to 2016-06-15
config.voiceSearchEnabled = getTreatment(
	user.email,
	'roll-out-voice-search',
	{
		control: {value: false},
		enabled: {value: true, percent: 1}
	});

And I make a few more rules, for 10%, 50%, and 100%, effective in adjacent date ranges.

But this is a common pattern. So we can simplify it:

config.voiceSearchEnabled = gradualRollout({
	user: user.email,
	rollout: 'roll-out-voice-search',
	start: '2016-06-10',
	finish: '2016-06-25',
	enabled: {value: true},
	disabled: {value: false}
});

And we can very easily interpret that as a linear rollout over the course of fifteen days, keyed to the user's email address.
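A sketch of that interpretation follows; `now` is passed in explicitly here to keep the sketch testable, which is an assumption on my part rather than part of the API above:

```javascript
function toHash(str) {
	var hash = 5381;
	for (var i = 0; i < str.length; i++) {
		hash = (hash * 33 + str.charCodeAt(i)) >>> 0;
	}
	return hash;
}

// The enabled percentage climbs linearly from 0 at `start` to 100 at
// `finish`; a user's stable pivot decides when they cross over.
function gradualRollout(opts, now) {
	var start = Date.parse(opts.start);
	var finish = Date.parse(opts.finish);
	var fraction = (now - start) / (finish - start);
	fraction = Math.max(0, Math.min(1, fraction));
	var pivot = toHash(opts.user + opts.rollout) % 100;
	return pivot < fraction * 100 ? opts.enabled.value : opts.disabled.value;
}
```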

Metrics

You don't just assign experiment treatments to people and forget about them. You want to track what happens. That means the client needs to know which treatments it received, and the full configuration is an obtuse place to dig that out of. You want to see experimental treatments directly, by name, not as a bunch of configuration values you have to reverse-engineer back into a treatment.

Separately, you need a system to record client events, and you submit the experiment treatments to it as tags. Then you can correlate treatments to behavior.
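A hypothetical sketch of the client side, where recordEvent and the event shape are made up for illustration:

```javascript
// Every client event carries the treatment map as tags, so the metrics
// backend can correlate behavior with experiments.
var eventLog = [];

function recordEvent(name, experiments) {
	eventLog.push({name: name, time: Date.now(), tags: experiments});
}

recordEvent('voiceSearch.used', {
	'home screen titlebar style': 't2',
	'wake up message': 't5'
});
```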

Speed

One complaint you might have is that this approach always fires every rule in sequence, and that's slow. The Rete algorithm is used in a wide variety of rule engines and is faster than naive reevaluation, so we should use that here, right?

Wrong. The Rete algorithm is complex and requires us to build up a large data structure. That data structure pays off when a small portion of the input changes between evaluations, letting me avoid recalculating the whole result.

In my case, I'm getting a series of new configurations, and each one is unrelated to the last. I might get a call for one collection of rules and then not get a call for it in the next hour. Or a rule might throw an error and leave the Rete data structure in an invalid state. Or I might have to abort processing, again leaving the data structure in an invalid state.

Future directions

The main target here is to look at what people are doing and try to provide more convenient ways of doing it.

We also want to provide the ability to mark portions of metadata as private information, to be redacted from our logs.

IP geolocation would be handy, allowing us to tell people where the client is located rather than relying on the client to self-report. We can grab a country-level GeoIP database for $25/month, city-level for $100/month. This would be strictly opt-in, possibly with an additional fee.

Finally, we have to turn this into a proper service. Slap a REST API in front of it, add in HMAC authentication and API usage reporting, service health metrics, and load balancers.

That concludes our short series on creating an experiment system.

Configuration as a service

I’m working on a rule engine targeted toward configuration-as-a-service and experiment configuration. Since it’s nontrivial and not much exists in this space, I thought I’d talk about it here for a bit.

Configuration as a service? Huh?

There are a few things this can be used for.

Recall when Google wanted to test out 41 different shades of blue for search result links? They used an experiment system to enroll randomized segments of the userbase into each treatment. That’s one use case we want to support.

Let’s say I’m implementing a phone app and it’s got a new feature that I want to get out as soon as possible. I need to QA it on each device, but I’m pretty sure it’ll just work. So I ship my update, but I keep the feature off by default. Then I add a rule to my configuration service to turn it on for the devices I’ve QA’ed it on. As I finish QA on a given device, I update the rule to turn the feature on for that device.
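That rule could be as small as this sketch (the device code names and the flag name are hypothetical):

```javascript
// input/output are normally injected by the rule framework; defined
// inline here so the sketch stands alone.
var input = {device: {name: 'bullhead'}};
var output = {};

// Feature stays off by default in the shipped app; the rule turns it on
// only for devices that have passed QA.
var qaApprovedDevices = ['bullhead', 'angler'];
output.newCameraUiEnabled =
	qaApprovedDevices.indexOf(input.device.name) !== -1;
```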

Or maybe I need to take legal steps in order to provide a feature in a given country. The client sends its location, and I’ve added rules to determine if that location is one where I can legally enable that feature. It might also include, for instance, which of my API endpoints it should use to store any server-side data — some countries require user data to remain in EU borders.
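A sketch of such a rule; the country list and endpoint URLs are invented for illustration:

```javascript
// input/output are normally injected by the rule framework.
var input = {country: 'DE'};
var output = {};

var euCountries = ['DE', 'FR', 'IT', 'ES', 'NL'];
var inEu = euCountries.indexOf(input.country) !== -1;

// Enable the feature only where it's legal, and keep EU user data on
// EU servers.
output.featureEnabled = inEu;
output.storageEndpoint = inEu
	? 'https://eu.api.example.com'
	: 'https://us.api.example.com';
```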

What are we implementing?

We want to offer a multitenant service so that you can pay us a bit of money and get our glorious configuration service.

You will submit JSON metadata to us and get JSON configuration back. You will enter in rules in a UI; we’ll execute those rules against the metadata to get your configuration. The rule UI will let you say: this rule comes into effect on this date, stops on that date; it’s got this priority; let’s test it against this sample configuration… Not too complex, but some complexity, because real people need it.
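To make that concrete, here's one plausible shape for stored rules and the selection logic the UI implies; the field names are assumptions, not a finalized schema:

```javascript
// Only rules inside their effective window run, ordered by priority.
function activeRules(rules, now) {
	return rules
		.filter(function (r) {
			return Date.parse(r.effectiveFrom) <= now &&
				now < Date.parse(r.effectiveTo);
		})
		.sort(function (a, b) { return a.priority - b.priority; });
}

var rules = [
	{id: 'old', priority: 1, effectiveFrom: '2016-01-01', effectiveTo: '2016-02-01'},
	{id: 'current', priority: 2, effectiveFrom: '2016-06-01', effectiveTo: '2016-07-01'}
];
```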

There are two basic parts: first, a service to execute rules; then, a website to manage rules. In between we have a rule engine.

Any significant caveats?

We’re running a configuration / experimentation service. We want third parties to use it. That means security.

We need to prevent you from calling System.exit() in the middle of your rules and bringing down our service. All that normal, lovely sandboxing stuff. Timeouts, too.

Also, you’re updating your rules pretty frequently. We need to be able to reload them on the fly.

Rules are code, and code can have bugs. We’ll have to watch for thrown exceptions and report them.
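In sketch form, the rule-running loop might watch for exceptions like this (reportError stands in for whatever reporting pipeline we build):

```javascript
var reportedErrors = [];

function reportError(ruleId, err) {
	reportedErrors.push({rule: ruleId, message: String(err && err.message || err)});
}

// A rule that throws is reported and skipped; the remaining rules
// still run against the shared output.
function runRules(rules, input) {
	var output = {};
	rules.forEach(function (rule) {
		try {
			rule.fn(input, output);
		} catch (e) {
			reportError(rule.id, e);
		}
	});
	return output;
}
```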

What’s already out there?

Drools

The heavy hitter, Drools has been around since the dinosaurs roamed the earth. It’s not easy to work with. It takes way too much code to initialize it, and most of that code is creating sessions and factories and builders and containers that have no discernible purpose. If you try to read the code to figure out what it all means, prepare for disappointment: it’s a snarl of interfaces and fields set via dependency injection and implementations in separate repositories.

Drools rules accept Java objects and produce output by mutating their inputs. That means I need a real Java class for input and another for output. Their rule workbench lets you create your Java classes, but that means you need to publish your project to Maven. And loading multiple versions of a rule is an exercise in pain.

On the plus side, it gives you a rule workbench out of the box, and it has a reasonable security story. However, it doesn’t have any way to limit execution time that I’ve found, meaning you have to run rules in a separate thread and kill them if they take too long. This isn’t nice.

Easy Rules

The new kid on the block, it uses Java as a rule language, which brings us to JAR hell like Drools. Unfortunately, it doesn’t supply a workbench, it doesn’t offer a way to provide inputs and retrieve outputs, and it doesn’t have any sandboxing or time limits. At least the code is relatively straightforward to navigate.

Everyone else

OpenRules is based on Excel. Let’s not go there.

N-Cube uses Groovy as a DSL, which implies compiling to a JAR. It’s also got almost no documentation.

There are several others that haven’t been updated since 2008.

So they all suck?

No. They’re built for people who want to deploy a set of rules for their application within their application. They’re for people who trust the people writing business rules. We are building a service whose sole purpose is to supply a rule engine, where untrusted people are executing code.

When you are building a service specifically for one task, especially a multitenant one, you shouldn’t be surprised when off-the-shelf libraries for similar tasks fall short of your needs.

What do we do?

The core thing that our service does is run user code. Let’s bring in a scripting engine. And since we’re going to accept JSON and emit JSON, let’s use a language that makes that natural. Let’s use Javascript.

The Rhino scripting engine makes it easy to run code and easy to filter which classes a script is allowed to use. Let’s just use that. Now we accept a rule from a user, wrap it in a light bit of code, and run it:

// we inject inputString as the raw json string
var input = JSON.parse(inputString);
var output = {};
// insert user code here

When we want to run it, we can just write:

Context ctx = Context.enter();
try {
	ctx.setClassShutter(name -> {
		// forbid the script from accessing any Java classes
		// (as a practical matter, I probably want to allow a JsonObject implementation)
		return false;
	});
	if (rule.compiledScript == null) {
		compile(rule);
	}
	Scriptable scope = ctx.initStandardObjects();
	// a plain Java String put into the scope behaves as a JS string
	scope.put("inputString", scope, inputString);
	rule.compiledScript.exec(ctx, scope);
	response.write(scope.get("output", scope));
} finally {
	// Rhino requires pairing every Context.enter() with an exit()
	Context.exit();
}

That’s not the whole story — we want to limit the amount of time it has to finish executing, set up logging and helper functions, all that jazz. We need to locate the rule somehow. We probably have multiple rules to run, and we have to propagate partial output objects between them (or merge them after). We also have to determine what order they should run in.

But, for what this does, it’s maybe half as much code as Drools takes.
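To give a flavor of one of those missing pieces, propagating partial outputs could start as a shallow merge; the conflict policy here (later rules win) is an assumption, and a real service would have to decide how to handle nested objects too:

```javascript
// Merge one rule's partial output into the accumulated config.
// Shallow: on a key conflict, the later rule simply overwrites.
function mergeOutputs(accumulated, partial) {
	Object.keys(partial).forEach(function (key) {
		accumulated[key] = partial[key];
	});
	return accumulated;
}
```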

What’s so much better about your approach?

The first huge advantage is that I’m using a scripting engine, one that doesn’t shove a bunch of classes into the global classloader. That means I can update everything on the fly. I’d get the same if I made Drools talk JSON, but that’s harder than writing my own engine.

Compared to Drools or EasyRules, I don’t have to maintain a build server and figure out how to build and package a java project I generate for each rule. I just shove some text into a database.

Javascript handles JSON objects quite well, which means not having to create a Java class for every input and output. That is the largest part of savings — Drools would be acceptable if it could talk JSON.

The people writing these rules are likely to be developers, not managers or analysts. They probably know Javascript, or can fake it pretty well.

What’s the catch?

Drools is huge and complex for three reasons.

First, it had significant development going on in an age when huge, complex coding was de rigueur in Java.

Second, it had a separation between API and implementation enforced for historical and practical reasons.

And third, it solves complex problems.

You want your rules to just work. Drools has a lot of thought behind it to determine what “just working” should look like and make sure it happens. We haven’t put in that thought. I think the naive approach is pretty close to the intuitive result, but I haven’t verified that.

The rules accept and generate JSON. This means you lose type safety. On the other hand, the API accepts and generates JSON anyway, so this is pushing things a step further. Not great, but not the end of the world.

Javascript is kind of ugly, and we’re promoting its use. It’s going to be a bit crufty and verbose at times. The point of business rules in the Drools language or what-not is so that managers can read the rules, and we’re kind of missing that.

What do these rules look like?

An example rule:

if (input.device.name == 'bacon') {
	output.message = 'Congrats on your OnePlus One!';
}
if (input.device.name == 'bullhead') {
	output.message = 'Congrats on your Nexus 5X!';
}
if (input.device.uptime > 31 * 24 * 60 * 60) {
	output.sideMessage = "It's been a month. You might want to reboot your phone.";
}
output.homeScreenTreatment = Treatments.choose(
	'homeScreenTreatment',
	input.userId,
	{
		control:  {value: {backgroundColor: 'black'}},
		grayBg:   {percent: 5, value: {backgroundColor: 'gray'}},
		grayBold: {percent: 5, value: {backgroundColor: 'gray', bold: true}}
	}
);

I’ll talk a bit more about the experiment side next time.