The Atlantis browser concept review

My reaction to this is a giant NO.

Extracting a browser kernel that runs services on top, that’s great. The rest, not so much. Right now, I have to trust Apple or the KDE devs or Google or Opera or Microsoft that my browser isn’t spying on me. Then I use RequestPolicy and Privacy Badger and sometimes NoScript to ensure the page itself isn’t doing bad things. But with Atlantis, I have to trust that every single website that contains anything I care about has done research on every component it depends on. And I already distrust these websites, by and large.

The examples didn’t have any security on them. They referenced compilers and stuff via unencrypted links. No checksums. No signing. So my bank does due diligence and determines that the parser and renderer and compiler it’s using are secure — but I get MITM’d and all my bank details are siphoned off to Russian hackers. Or the bank uses HTTPS and the server hosting the parser and renderer gets hacked and I get sad.

So instead my bank hosts everything itself. And there’s a bug in the RPC handler that allows arbitrary code execution. I know about it, upstream knows about it, there’s a fix released…but my bank is still using a three year old version and the Russian hackers are starting to feel sorry for me.

Fun, yes?

Security fixes

Fortunately, the security story is moderately straightforward. We have a central repository of trusted services. You can request specific versions of a service, but the browser doesn’t guarantee that it will honor your version request. For instance, if you request mshtml11.3.7, the browser might give you mshtml11.3.12. In point of fact, we’ll only support major.minor version requests; you always get the most recent patch version.

A service build is automatically retired after eighteen months to mitigate security risks. This is why the browser won’t always honor your version requests. You might have asked for mshtml4.0, but nobody’s been maintaining that for a decade and more, so the browser will give you the nearest equivalent.
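
To make that concrete, here’s a rough sketch of how that resolution might work; the registry shape and function names are invented for illustration, not anything Atlantis defines.

// Hypothetical sketch: resolve a major.minor request against the trusted
// repository, always returning the newest patch build that hasn't been retired.
var RETIREMENT_AGE_MS = 18 * 30 * 24 * 60 * 60 * 1000; // roughly eighteen months

function resolveService(registry, name, major, minor, now) {
	var builds = registry[name] || [];
	var live = builds.filter(function (b) {
		return now - b.releaseDate < RETIREMENT_AGE_MS;
	});
	var candidates = live.filter(function (b) {
		return b.major === major && b.minor === minor;
	});
	if (candidates.length === 0) {
		// every build of the requested version has been retired;
		// fall back to the nearest maintained equivalent
		candidates = live;
	}
	candidates.sort(function (a, b) {
		return (b.major - a.major) || (b.minor - a.minor) || (b.patch - a.patch);
	});
	return candidates[0]; // undefined if nothing maintained exists at all
}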

Since we’re using a silo for trusted services, we can use a bunch of things like signed builds and certificate pinning to reduce your ability to muck about with my trusted resources.

Finally, Atlantis internally has an RPC mechanism defined. You can post arbitrary data to arbitrary pages. That’s a problem. You need a way to lock that down. Without a means of restricting it, I can construct a page that will fuzz your other open tabs. Perhaps you require a handle to a page in order to send RPCs, and the only way of getting a page ID is by opening it (or receiving the ID via an RPC). Perhaps there are named RPC channels that a page must enroll in, and the browser automatically drops RPCs that aren’t supported.
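
Here’s a rough sketch of the named-channel variant (the function names are mine, not anything Atlantis defines): a page enrolls in the channels it is willing to handle, and the kernel drops everything else.

// Hypothetical sketch of named RPC channels inside the browser kernel.
function Page(id) {
	this.id = id;
	this.rpcHandlers = {}; // channel name -> handler function
}

// A page enrolls in a channel it wants to receive RPCs on.
Page.prototype.enroll = function (channel, handler) {
	this.rpcHandlers[channel] = handler;
};

// The kernel delivers an RPC only if the target page enrolled in that channel.
function deliverRpc(targetPage, channel, payload) {
	var handler = targetPage.rpcHandlers[channel];
	if (!handler) {
		return false; // not enrolled: silently drop the RPC
	}
	handler(payload);
	return true;
}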

Privacy

One big thing that web devs tend to want is analytics. I’m not so keen on being tracked. It’s straightforward in Firefox to reduce tracking: suppress the HTTP referer header, add a pre-request hook that will disallow cross-origin requests contrary to a defined policy, and delete cookies on schedule. Maybe pass the Do Not Track header too.
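
As a point of comparison, here’s a minimal sketch of the first two steps using the WebExtensions API (the cross-origin policy check, policyAllows, is a placeholder of mine):

// Strip the Referer header and add DNT on every outgoing request.
browser.webRequest.onBeforeSendHeaders.addListener(
	function (details) {
		var headers = details.requestHeaders.filter(function (h) {
			return h.name.toLowerCase() !== 'referer';
		});
		headers.push({name: 'DNT', value: '1'});
		return {requestHeaders: headers};
	},
	{urls: ['<all_urls>']},
	['blocking', 'requestHeaders']
);

// Cancel cross-origin requests that the (placeholder) policy doesn't allow.
browser.webRequest.onBeforeRequest.addListener(
	function (details) {
		if (!details.originUrl) {
			return {}; // top-level navigation; let it through
		}
		var origin = new URL(details.originUrl).origin;
		var target = new URL(details.url).origin;
		if (origin !== target && !policyAllows(origin, target)) {
			return {cancel: true};
		}
		return {};
	},
	{urls: ['<all_urls>']},
	['blocking']
);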

How do I do that in Atlantis?

Since I can’t trust Atlantis not to send the referer header, I have to route everything it does through a local proxy. That works okay for HTTP, but it doesn’t work with HTTPS: the referer header travels encrypted, so my proxy can’t see it. Or my proxy needs a certificate that the browser implicitly trusts for every site, and I have to disable certificate pinning in the browser.

This goes into the general realm of promoting user agency vs promoting developer agency.

Technical aspects

Atlantis uses abstract syntax trees as the basis of everything.

Abstract syntax trees let you abstract over different programming languages a little. Not much. You’re stuck with one set of semantics. It’s like trying to implement C on the JVM — you just can’t do it. You can do some of it, but you don’t get pointers and you don’t get structs, so you’ll end up writing a virtual machine on top of the JVM to support those.

So that’s a constraint that limits you a lot. The obvious alternative is LLVM bitcode. Google Chrome’s NaCl (PNaCl, specifically), if I recall correctly, uses LLVM bitcode, so it’s possible to sandbox it.

The other problem I have with the version presented is the rendering system. It builds a bitmap image and sends it off to the browser. I’m not sure how that will work for accessibility. I’m not sure how to highlight text and copy it that way. It’s good enough for research, not good enough for real life. And if you punt this to the service developer, 95% of them will ignore accessibility entirely, and 30% of them will forget about copy/paste.

What’s good?

If you split up a browser into multiple services running on a base, things get nicer on the technical side.

The core of the browser can just be a few process-oriented APIs, a series of drawing primitives, IO, etc. That’s simple enough to implement on its own.

Independent parties can develop services to implement, say, SVG or MathJAX. And with agreed-upon APIs, I can use the same service on IE and Firefox. This is good for web standards: they can be implemented more quickly, and it’s easier to track down the source of incompatibilities when you can insert the W3C version of MathJAX into Firefox, observe how it renders, and then swap Gecko out for Trident to see whether a bad interaction between Gecko and W3C:MathJAX is what’s messing up the output.

Then I can implement browser addons as services that the user brings in. For special purposes, when the user allows, pages can do nonstandard things too, implementing their own services. For instance, the Elm programming language provides a moderately different layout system that tends to be pixel-based. (The relatively recent html package offers access to DOM, but the older stuff doesn’t.) That could be implemented as a new rendering service. Or if we find a way to provide sandboxed GPU access, we could get a Unity3D service. Or with DRM, a page could supply a service that converts encrypted audio to WAV.

There’s a lot of possibility here. And I’m sure that James Mickens has considered some of it. A one-hour talk isn’t the best for conveying the full depth of your vision. I’m excited to see his continuing work.

Human experimentation for fun and profit

I want to experiment on my users. How do I do it?

Yesterday I talked about creating a configuration service. We’re going to leverage that. An experiment is just a configuration rule that’s sharded among your userbase.

But is it that simple? Usually not. Let’s dive in!

Choosing a treatment

Iacta alea est

The easiest way to go is to just toss the dice.

You define your treatments and their percentages and roll 1d100. The user gets into whatever treatment corresponds to the value on the die. For instance:

function getTreatment(treatments, control) {
	var value = Math.random() * 100;
	for (var i = 0; i < treatments.length; i++) {
		value -= treatments[i].percent;
		if (value < 0) {
			return treatments[i].value;
		}
	}
	return control;
}

What's this good for? Things where you're okay with changing behavior between requests. Things where your users don't need consistency. Probably where your users won't notice a lot. Like Google's 41 shades of blue.

Introduce a discriminator

So you determined you want each user to have a consistent experience. Once they enter an experiment, they're in it until the experiment finishes. How do we do that?

The simplest way is to introduce a pivot value, something unique to the user:

function toHash(str) {
	var hash = 1;
	for (var i = 0; i < str.length; i++) {
		// keep the hash in 32-bit unsigned range so long strings don't
		// degrade into imprecise floats and skew the modulo below
		hash = (hash * 33 + str.charCodeAt(i)) >>> 0;
	}
	return hash;
}

function getTreatment(pivot, treatments, control) {
	var value = pivot % 100;
	for (var i = 0; i < treatments.length; i++) {
		value -= treatments[i].percent;
		if (value < 0) {
			return treatments[i].value;
		}
	}
	return control;
}

config.treatment = getTreatment(toHash(user.email), treatments, control);

What's great about this? It's simple, that's pretty much it.

What's terrible about it? The same users get the first treatment in every experiment. If you want to roll out updates to 1% of your users at a time, the same person always gets the least tested, bleeding edge stuff every time. That's not so nice, and it opens you up to luck effects much more.

The victorious solution

Quite simple: instead of basing your pivot only on the user, you base it on the user and the experiment. For instance:

var experiment = 'home screen titlebar style - 2016-06-12';
var pivot = toHash(user.email + experiment);
config.treatment = getTreatment(pivot, treatments, control);

This effectively randomizes your position between experiments but keeps it consistent for each experiment. We'll have to adjust the API to make it easier and more obvious how to do the right thing:

function getTreatment(userId, experimentId, treatments, control) { ... }

Dependencies

You will often have several simultaneous experiments. Sometimes you'll need a person to be enrolled in one specific experimental treatment for another experiment to even make sense. How do we do this?

First off, we'll adjust our treatment API so that, instead of an array of treatments, you send a JS object:

var homeScreenTreatments = {
	control: {value: {bgColor: 'black', fontSize: 10, bold: true}},
	t1: {value: {bgColor: 'black', fontSize: 12, bold: false}},
	t2: {value: {bgColor: 'gray', fontSize: 10, bold: true}}
};

Next, we'll stash our treatment decisions in the framework (in a new cache for each script run). Then we'll let you query that later. For instance:

var homeScreenExp = 'home screen titlebar style';
config.homeScreen = getTreatment(
	user.email, homeScreenExp, homeScreenTreatments);
// 50 lines later...
if (hasTreatment(homeScreenExp, 't2')) {
	config.fullNightModeEnabled = false;
}
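
Here's a minimal sketch of that per-run cache, assuming each non-control bucket carries a percent field (as in the rule example in the previous post) and control takes whatever is left over; the storage and helper names are mine:

// Per-script-run cache of experiment decisions, so later rules can query them.
var treatmentCache = {};

function getTreatment(userId, experimentId, treatments) {
	var value = toHash(userId + experimentId) % 100;
	var chosen = 'control';
	for (var name in treatments) {
		if (name === 'control' || !treatments[name].percent) continue;
		value -= treatments[name].percent;
		if (value < 0) {
			chosen = name;
			break;
		}
	}
	treatmentCache[experimentId] = chosen;
	return treatments[chosen].value;
}

function hasTreatment(experimentId, treatmentName) {
	return treatmentCache[experimentId] === treatmentName;
}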

We could alternatively bake experiments into the rule infrastructure itself: a rule would specify the config section it supplies, its treatments, and their percentages. That tends to end up as a complex UI that does 90% of what users need in an inflexible way, which is going to be troublesome.

Whichever way we go, we want to store the collection of experimental treatments on the config object. We'll get into that later, but it looks like:

config.experiments = {
	'home screen titlebar style': 't2',
	'wake up message': 't5'
};

Incremental releases

Another common thing people want to do is roll out new features gradually. Sometimes I want to roll it out to fixed percentages of my users at fixed times. One option is to introduce a "rule series", which is a collection of rules, each with a start and end date. No two rules are allowed to overlap.

So I set up a rule series "roll-out-voice-search" with a simple set of rules:

// in the UI, I set this rule to be effective 2016-06-10 to 2016-06-15
config.voiceSearchEnabled = getTreatment(
	user.email,
	'roll-out-voice-search',
	{
		control: {value: false},
		enabled: {value: true, percent: 1}
	});

And I make a couple more rules, for 10%, 50%, and 100%, effective in adjacent date ranges.

But this is a common pattern. So we can simplify it:

config.voiceSearchEnabled = gradualRollout({
	user: user.email,
	rollout: 'roll-out-voice-search',
	start: '2016-06-10',
	finish: '2016-06-25',
	enabled: {value: true},
	disabled: {value: false}
});

And we can easily interpret that as a linear rollout over the course of fifteen days, based on the user's email address.
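
Here's a sketch of how that interpretation could work, reusing the toHash helper from earlier (the date handling is deliberately crude):

// Hypothetical sketch: ramp the enabled percentage linearly from 0 to 100
// between start and finish, then bucket the user the same way as before.
function gradualRollout(opts) {
	var now = Date.now();
	var start = Date.parse(opts.start);
	var finish = Date.parse(opts.finish);
	var fraction = (now - start) / (finish - start);
	var percent = Math.max(0, Math.min(100, fraction * 100));

	var pivot = toHash(opts.user + opts.rollout);
	return pivot % 100 < percent ? opts.enabled.value : opts.disabled.value;
}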

Metrics

You don't just assign experiment treatments to people and forget about them. You want to track what happens. That means the client needs to know your entire configuration, but the entire configuration is awkward to work with: you want to see experimental treatments directly, by name, not as a bunch of configuration values you have to backtrack into the treatment that produced them.

Separately, you need a system to record client events, and you submit the experiment treatments to it as tags. Then you can correlate treatments to behavior.
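
For instance, assuming some client-side event recorder (analytics.record is an invented name), tagging might look like:

// Tag every recorded event with the treatments this client is in, so
// behavior can be correlated with experiments later.
analytics.record('search_performed', {
	queryLength: 23,
	experiments: config.experiments // {'home screen titlebar style': 't2', ...}
});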

Speed

One complaint you might have is that this approach always fires every rule in sequence, and that's slow. The Rete algorithm is used in a wide variety of rule engines and is faster than naive reevaluation, so we should use that here, right?

Wrong. The Rete algorithm is complex and requires us to build up a large data structure. That data structure pays off when a small portion of the input changes between evaluations, letting you avoid recalculating the whole result.

In my case, I'm getting a series of new configurations, and each one is unrelated to the last. I might get a call for one collection of rules and then not get a call for it in the next hour. Or a rule might throw an error and leave the Rete data structure in an invalid state. Or I might have to abort processing, again leaving the data structure in an invalid state.

Future directions

The main target here is to look at what people are doing and try to provide more convenient ways of doing it.

We also want to provide the ability to mark portions of metadata as private information, to be redacted from our logs.

IP geolocation would be handy, allowing us to tell people where the client is located rather than relying on the client to self-report. We can grab a country-level GeoIP database for $25/month, city-level for $100/month. This would be strictly opt-in, possibly with an additional fee.

Finally, we have to turn this into a proper service. Slap a REST API in front of it, add in HMAC authentication and API usage reporting, service health metrics, and load balancers.

That concludes this short piece on creating an experiment system.

Configuration as a service

I’m working on a rule engine targeted toward configuration-as-a-service and experiment configuration. Since it’s nontrivial and not much exists in this space, I thought I’d talk about it here for a bit.

Configuration as a service? Huh?

There are a few things this can be used for.

Recall when Google wanted to test out 41 different shades of blue for search result links? They used an experiment system to enroll randomized segments of the userbase into each treatment. That’s one use case we want to support.

Let’s say I’m implementing a phone app and it’s got a new feature that I want to get out as soon as possible. I need to QA it on each device, but I’m pretty sure it’ll just work. So I ship my update, but I keep the feature off by default. Then I add a rule to my configuration service to turn it on for the devices I’ve QA’ed it on. As I finish QA on a given device, I update the rule to turn the feature on for that device.

Or maybe I need to take legal steps in order to provide a feature in a given country. The client sends its location, and I’ve added rules to determine whether that location is one where I can legally enable the feature. The configuration might also include, for instance, which of my API endpoints the client should use to store any server-side data, since some countries require user data to remain within EU borders.
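
In the kind of rule language described later in this post, those last two cases might look roughly like this (the field names and endpoints are illustrative, not part of any real API):

// Turn the new feature on only for devices that have passed QA.
var qaApproved = ['bullhead', 'bacon'];
output.newFeatureEnabled = qaApproved.indexOf(input.device.name) >= 0;

// Enable the feature only where it's legal, and keep EU data in the EU.
var allowedCountries = ['US', 'CA', 'GB', 'DE'];
output.featureEnabled = allowedCountries.indexOf(input.location.country) >= 0;
output.apiEndpoint = input.location.inEU
	? 'https://eu.api.example.com'
	: 'https://us.api.example.com';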

What are we implementing?

We want to offer a multitenant service so that you can pay us a bit of money and get our glorious configuration service.

You will submit JSON metadata to us and get JSON configuration back. You will enter in rules in a UI; we’ll execute those rules against the metadata to get your configuration. The rule UI will let you say: this rule comes into effect on this date, stops on that date; it’s got this priority; let’s test it against this sample configuration… Not too complex, but some complexity, because real people need it.
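
To make the shapes concrete, a metadata/configuration pair might look something like this (every field here is invented for illustration):

// Metadata the client submits:
var metadata = {
	userId: 'alice@example.com',
	device: {name: 'bullhead', uptime: 2851200},
	location: {country: 'DE', inEU: true},
	appVersion: '3.4.1'
};

// Configuration the service hands back after running the rules:
var configuration = {
	voiceSearchEnabled: true,
	apiEndpoint: 'https://eu.api.example.com',
	homeScreenTreatment: {backgroundColor: 'gray', bold: true},
	experiments: {'home screen titlebar style': 't2'}
};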

There are two basic parts: first, a service to execute rules; then, a website to manage rules. In between we have a rule engine.

Any significant caveats?

We’re running a configuration / experimentation service. We want third parties to use it. That means security.

We need to prevent you from calling System.exit() in the middle of your rules and bringing down our service. All that normal, lovely sandboxing stuff. Timeouts, too.

Also, you’re updating your rules pretty frequently. We need to be able to reload them on the fly.

Rules are code, and code can have bugs. We’ll have to watch for thrown exceptions and report them.

What’s already out there?

Drools

The heavy hitter, Drools has been around since the dinosaurs roamed the earth. It’s not easy to work with. It takes way too much code to initialize it, and most of that code is creating sessions and factories and builders and containers that have no discernible purpose. If you try to read the code to figure out what it all means, prepare for disappointment: it’s a snarl of interfaces and fields set via dependency injection and implementations in separate repositories.

Drools rules accept Java objects and produce output by mutating their inputs. That means I need a real Java class for input and another for output. Their rule workbench lets you create your Java classes, but that means you need to publish your project to Maven. And loading multiple versions of a rule is an exercise in pain.

On the plus side, it gives you a rule workbench out of the box, and it has a reasonable security story. However, it doesn’t have any way to limit execution time that I’ve found, meaning you have to run rules in a separate thread and kill them if they take too long. This isn’t nice.

Easy Rules

The new kid on the block, it uses Java as a rule language, which brings us to JAR hell like Drools. Unfortunately, it doesn’t supply a workbench, it doesn’t offer a way to provide inputs and retrieve outputs, and it doesn’t have any sandboxing or time limits. At least the code is relatively straightforward to navigate.

Everyone else

OpenRules is based on Excel. Let’s not go there.

N-Cube uses Groovy as a DSL, which implies compiling to a JAR. It’s also got almost no documentation.

There are several others that haven’t been updated since 2008.

So they all suck?

No. They’re built for people who want to deploy a set of rules for their own application, inside that application. They’re for people who trust the people writing the business rules. We are building a service whose sole purpose is to supply a rule engine, where untrusted people are executing code.

When you are building a service specifically for one task, you shouldn’t be surprised when off-the-shelf components don’t cut it.

When you are building a multitenant service, libraries performing similar tasks often fall short of your needs.

What do we do?

The core thing that our service does is run user code. Let’s bring in a scripting engine. And since we’re going to accept JSON and emit JSON, let’s use a language that makes that natural. Let’s use Javascript.

The Rhino scripting engine makes it easy to run code and easy to filter which classes a script is allowed to use. Let’s just use that. Now we accept a rule from a user, wrap it in a light bit of code, and run it:

// we inject inputString as the raw json string
var input = JSON.parse(inputString);
var output = {};
// insert user code here
// afterward, serialize the result so the host can read it back as a plain string
var outputString = JSON.stringify(output);

When we want to run it, we can just write:

Context ctx = Context.enter();
try {
	ctx.setClassShutter(name -> {
		// forbid the script from accessing any Java classes
		// (as a practical matter, I probably want to allow a JsonObject implementation)
		return false;
	});
	if (rule.compiledScript == null) {
		compile(rule);
	}
	Scriptable scope = ctx.initStandardObjects();
	// Rhino treats a plain java.lang.String as a script string, so no wrapping needed
	scope.put("inputString", scope, inputString);
	rule.compiledScript.exec(ctx, scope);
	// the wrapper script left the serialized result in outputString
	response.write(Context.toString(scope.get("outputString", scope)));
} finally {
	Context.exit();
}

That’s not the whole story — we want to limit the amount of time it has to finish executing, set up logging and helper functions, all that jazz. We need to locate the rule somehow. We probably have multiple rules to run, and we have to propagate partial output objects between them (or merge them after). We also have to determine what order they should run in.

But, for what this does, it’s maybe half as much code as Drools takes.

What’s so much better about your approach?

The first huge advantage is that I’m using a scripting engine, one that doesn’t shove a bunch of classes into the global classloader. That means I can update everything on the fly. I’d get the same if I made Drools talk JSON, but that’s harder than writing my own engine.

Compared to Drools or EasyRules, I don’t have to maintain a build server and figure out how to build and package a Java project generated for each rule. I just shove some text into a database.

Javascript handles JSON objects quite well, which means not having to create a Java class for every input and output. That is the largest part of savings — Drools would be acceptable if it could talk JSON.

The people writing these rules are likely to be developers, not managers or analysts. They probably know Javascript, or can fake it pretty well.

What’s the catch?

Drools is huge and complex for three reasons.

First, it saw significant development in an age when huge, complex coding was de rigueur in Java.

Second, it had a separation between API and implementation enforced for historical and practical reasons.

And third, it solves complex problems.

You want your rules to just work. Drools has a lot of thought behind it to determine what “just working” should look like and make sure it happens. We haven’t put in that thought. I think the naive approach is pretty close to the intuitive result, but I haven’t verified that.

The rules accept and generate JSON. This means you lose type safety. On the other hand, the API accepts and generates JSON anyway, so this is pushing things a step further. Not great, but not the end of the world.

Javascript is kind of ugly, and we’re promoting its use. It’s going to be a bit crufty and verbose at times. The point of business rules in the Drools language or what-not is so that managers can read the rules, and we’re kind of missing that.

What do these rules look like?

An example rule:

if (input.device.name == 'bacon') {
	output.message = 'Congrats on your OnePlus One!';
}
if (input.device.name == 'bullhead') {
	output.message = 'Congrats on your Nexus 5X!';
}
if (input.device.uptime > 31 * 24 * 60 * 60) {
	output.sideMessage = "It's been a month. You might want to reboot your phone.";
}
output.homeScreenTreatment = Treatments.choose(
	'homeScreenTreatment',
	input.userId,
	{
		control:  {value: {backgroundColor: 'black'}},
		grayBg:   {percent: 5, value: {backgroundColor: 'gray'}},
		grayBold: {percent: 5, value: {backgroundColor: 'gray', bold: true}}
	}
);

I’ll talk a bit more about the experiment side next time.

Why I’m not in the D community

D is a great programming language in many ways. It’s got a host of features to make your life easier. It’s got syntax that’s familiar to anyone who knows Java, which is almost every programmer these days. It does away with a lot of cruft by making the syntax lighter and by making reasonable assumptions for you.

On the library front, they took the concept behind LINQ and kicked it up to eleven. It’s pretty awesome overall. There’s a working coroutine implementation, and it’s pretty efficient, plus you can subclass the Fiber class and provide your own scheduler. The standard library is mostly okay, missing some things you’d expect it to have. There’s a package manager, but it’s pretty new. There’s no corporate support for anything, though — no AWS client, no Google API client, no first-party datastore drivers, nothing. So get used to writing your own stuff.

Still, on the whole, it’s a reasonable option for some use cases, and I’ve been working off and on to create a MUD in D.

But I’m leaving the newsgroup, I’m not going to report any bugs, and I’m staying off the IRC channel. And I’m probably never going back.

Why? Because D’s community is garbage.

If you want a programming language to gain adoption, you need to make it friendly to novices. You need to make it easy to learn. You need a standard library with good documentation. You don’t have to change the features that your language exposes, necessarily, but you do need to provide the resources people need in order to start using the language.

Hardly a day goes by without people on the newsgroup expressing or implying a strange sort of pride in how obtuse D is to learn, or in how hard the documentation is to understand quickly. When people point out problems, there is always someone eager to pipe up that it isn’t a problem because they managed to learn it, or that it’s okay for something to be presented in entirely the wrong way because the data shown is data that needs to be available.

Say something needs to be improved and people will derisively ask “Where’s your pull request?”

This isn’t a good attitude to have.

To be clear, this isn’t everyone. It’s maybe one in ten. Walter and Andrei, most importantly, don’t do this. But they do nothing to stop it.

So I will use D, when it’s appropriate. I will even release open source projects in D. But I won’t join in the wider community.