Category Archives: development

Mirror, mirror on the wall — who’s the scrappiest scraper of all?

I implemented Silvermirror, a website mirroring tool. It’s got some interesting aspects that make it better than wget in some cases. Let’s dive in!


MUD life: time

We’ve been talking about procedural content generation for a MUD, but it would be rather sad to have a procedurally generated static MUD. A place where nothing ever changes.

If we want to talk about change over time, we need a notion of time.


The Atlantis browser concept review

My reaction to this is a giant NO.

Extracting a browser kernel that runs services on top, that’s great. The rest, not so much. Right now, I have to trust Apple or the KDE devs or Google or Opera or Microsoft that my browser isn’t spying on me. Then I use RequestPolicy and Privacy Badger and sometimes NoScript to ensure the page itself isn’t doing bad things. But with Atlantis, I have to trust that every single website that contains anything I care about has done research on every component it depends on. And I already distrust these websites, by and large.

The examples didn’t have any security on them. They referenced compilers and stuff via unencrypted links. No checksums. No signing. So my bank does due diligence and determines that the parser and renderer and compiler it’s using are secure — but I get MITM’d and all my bank details are siphoned off to Russian hackers. Or the bank uses HTTPS and the server hosting the parser and renderer gets hacked and I get sad.

So instead my bank hosts everything itself. And there’s a bug in the RPC handler that allows arbitrary code execution. I know about it, upstream knows about it, there’s a fix released…but my bank is still using a three year old version and the Russian hackers are starting to feel sorry for me.

Fun, yes?

Security fixes

Fortunately, the security story is moderately straightforward. We have a central repository of trusted services. You can request specific versions of a service, but the browser doesn’t guarantee that it will honor your version request. For instance, if you request mshtml11.3.7, the browser might give you mshtml11.3.12. In point of fact, we’ll only support major.minor version requests; you always get the most recent patch version.

A service build is automatically retired after eighteen months to mitigate security risks. This is why the browser won’t always honor your version requests. You might have asked for mshtml4.0, but nobody’s been maintaining that for a decade and more, so the browser will give you the nearest equivalent.
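To make that concrete, here's a toy sketch of the resolution policy in Javascript. The build list, the function name, and the exact strings are invented for illustration:

// toy sketch: resolve a major.minor request against the builds we still maintain
var maintainedBuilds = {
	mshtml: ['11.3.12', '11.3.7', '11.2.4']	// newest first; retired builds aren't listed
};

function resolveService(name, requestedMajorMinor) {
	var builds = maintainedBuilds[name];
	for (var i = 0; i < builds.length; i++) {
		// honor the major.minor request if any maintained patch build matches
		if (builds[i].indexOf(requestedMajorMinor + '.') === 0) {
			return builds[i];
		}
	}
	// otherwise hand back the nearest equivalent we still maintain
	return builds[0];
}

resolveService('mshtml', '11.3');	// '11.3.12', even if you had 11.3.7 in mind
resolveService('mshtml', '4.0');	// '11.3.12': nobody maintains 4.0 anymore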

Since we’re using a silo for trusted services, we can use a bunch of things like signed builds and certificate pinning to reduce your ability to muck about with my trusted resources.

Finally, Atlantis internally has an RPC mechanism defined. You can post arbitrary data to arbitrary pages. That’s a problem. You need a way to lock that down. Without a means of restricting it, I can construct a page that will fuzz your other open tabs. Perhaps you require a handle to a page in order to send RPCs, and the only way of getting a page ID is by opening it (or receiving the ID via an RPC). Perhaps there are named RPC channels that a page must enroll in, and the browser automatically drops RPCs that aren’t supported.
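To make the channel idea concrete, here's a hypothetical sketch of what enrollment could look like from a page's point of view. The atlantis.rpc namespace, the method names, and the channel string are all invented; this is just one way the lockdown might be expressed:

// hypothetical API sketch: a page declares which RPC channels it will accept
atlantis.rpc.enroll('example-bank/transfer-status', function (message, senderId) {
	// the browser delivers only messages on channels this page enrolled in;
	// RPCs aimed at channels we never enrolled in are dropped before we see them
	console.log('transfer ' + message.transferId + ' is now ' + message.status);
});

// a sender must hold a handle to the target page and name a channel;
// there is no "post arbitrary data to arbitrary pages" anymore
var targetPageId = atlantis.pages.open('https://bank.example.com/transfers');
atlantis.rpc.send(targetPageId, 'example-bank/transfer-status', {
	transferId: 42,
	status: 'complete'
});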

Privacy

One big thing that web devs tend to want is analytics. I’m not so keen on being tracked. It’s straightforward in Firefox to reduce tracking: suppress the HTTP referer header, add a pre-request hook that will disallow cross-origin requests contrary to a defined policy, and delete cookies on schedule. Maybe pass the Do Not Track header too.

How do I do that in Atlantis?

Atlantis doesn’t give me those hooks, so I’m left routing everything it does through a local proxy and stripping the referer header there. That works okay for HTTP, but it doesn’t work with HTTPS: the referer header is encrypted and my proxy can’t see it. Or my proxy needs a certificate that the browser implicitly trusts for every site, and I have to disable certificate pinning in the browser.

This goes into the general realm of promoting user agency vs promoting developer agency.

Technical aspects

Atlantis uses abstract syntax trees as the basis of everything.

Abstract syntax trees let you abstract over different programming languages a little. Not much. You’re stuck with one set of semantics. It’s like trying to implement C on the JVM: you can’t do it faithfully. You can do some of it, but you don’t get pointers and you don’t get structs, so you end up writing a virtual machine on top of the JVM to support them.

So that’s a constraint that limits you a lot. The obvious alternative is LLVM bitcode. Google Chrome’s NaCl (PNaCl, specifically), if I recall correctly, uses LLVM, so it’s possible to sandbox it.

The other problem I have with the version presented is the rendering system. It builds a bitmap image and sends it off to the browser. I’m not sure how that will work for accessibility. I’m not sure how to highlight text and copy it that way. It’s good enough for research, not good enough for real life. And if you punt this to the service developer, 95% of them will ignore accessibility entirely, and 30% of them will forget about copy/paste.

What’s good?

If you split up a browser into multiple services running on a base, things get nicer on the technical side.

The core of the browser can just be a few process-oriented APIs, a series of drawing primitives, IO, etc. That’s simple enough to implement on its own.

Independent parties can develop services to implement, say, SVG or MathJax. And with agreed-upon APIs, I can use the same service on IE and Firefox. This is good for web standards — they can be implemented more quickly, and it’s easier to track down the source of incompatibilities when you can insert the W3C version of MathJax into Firefox, observe how it renders, and then swap Gecko out for Trident to see whether a bad interaction between Gecko and W3C:MathJax is what’s messing up the output.

Then I can implement browser addons as services that the user brings in. For special purposes, when the user allows, pages can do nonstandard things too, implementing their own services. For instance, the Elm programming language provides a moderately different layout system that tends to be pixel-based. (The relatively recent html package offers access to DOM, but the older stuff doesn’t.) That could be implemented as a new rendering service. Or if we find a way to provide sandboxed GPU access, we could get a Unity3D service. Or with DRM, a page could supply a service that converts encrypted audio to WAV.

There’s a lot of possibility here. And I’m sure that James Mickens has considered some of it. A one-hour talk isn’t the best for conveying the full depth of your vision. I’m excited to see his continuing work.

Human experimentation for fun and profit

I want to experiment on my users. How do I do it?

Yesterday I talked about creating a configuration service. We’re going to leverage that. An experiment is just a configuration rule that’s sharded among your userbase.

But is it that simple? Usually not. Let’s dive in!

Choosing a treatment

Iacta alea est

The easiest way to go is to just toss the dice.

You define your treatments and their percentages and roll 1d100. The user gets into whatever treatment corresponds to the value on the die. For instance:

function getTreatment(treatments, control) {
	var value = Math.random() * 100;
	for (var i = 0; i < treatments.length; i++) {
		value -= treatments[i].percent;
		if (value < 0) {
			return treatments[i].value;
		}
	}
	return control;
}

What's this good for? Things where you're okay with changing behavior between requests. Things where your users don't need consistency. Probably where your users won't notice a lot. Like Google's 41 shades of blue.

Introduce a discriminator

So you determined you want each user to have a consistent experience. Once they enter an experiment, they're in it until the experiment finishes. How do we do that?

The simplest way is to introduce a pivot value, something unique to the user:

function toHash(str) {
	var hash = 1;
	for (var i = 0; i < str.length; i++) {
		// keep the hash in 32-bit range so long strings don't silently lose precision
		hash = ((hash * 33) + str.charCodeAt(i)) | 0;
	}
	// the bitwise math can go negative, and the pivot needs to be non-negative
	return Math.abs(hash);
}

function getTreatment(pivot, treatments, control) {
	var value = pivot % 100;
	for (var i = 0; i < treatments.length; i++) {
		value -= treatments[i].percent;
		if (value < 0) {
			return treatments[i].value;
		}
	}
	return control;
}

config.treatment = getTreatment(toHash(user.email), treatments, control);

What's great about this? It's simple, that's pretty much it.

What's terrible about it? The same users get the first treatment in every experiment. If you want to roll out updates to 1% of your users at a time, the same person always gets the least tested, bleeding edge stuff every time. That's not so nice, and it opens you up to luck effects much more.

The victorious solution

Quite simple: instead of basing your pivot only on the user, you base it on the user and the experiment. For instance:

var experiment = 'home screen titlebar style - 2016-06-12';
var pivot = toHash(user.email + experiment);
config.treatment = getTreatment(pivot, treatments, control);

This effectively randomizes your position between experiments but keeps it consistent for each experiment. We'll have to adjust the API to make it easier and more obvious how to do the right thing:

function getTreatment(userId, experimentId, treatments, control) { ... }

Dependencies

You will often have several simultaneous experiments. Sometimes you'll need a person to be enrolled in one specific experimental treatment for another experiment to even make sense. How do we do this?

First off, we'll adjust our treatment API so that, instead of an array of treatments, you send a JS object:

var homeScreenTreatments = {
	control: {value: {bgColor: 'black', fontSize: 10, bold: true}},
	t1: {percent: 10, value: {bgColor: 'black', fontSize: 12, bold: false}},
	t2: {percent: 10, value: {bgColor: 'gray', fontSize: 10, bold: true}}
};

Next, we'll stash our treatment decisions in the framework (in a new cache for each script run). Then we'll let you query that later. For instance:

var homeScreenExp = 'home screen titlebar style';
config.homeScreen = getTreatment(
	user.email, homeScreenExp, homeScreenTreatments);
// 50 lines later...
if (hasTreatment(homeScreenExp, 't2')) {
	config.fullNightModeEnabled = false;
}
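
Behind the scenes, that per-run cache can be tiny. A minimal sketch, assuming getTreatment records its decision under the experiment name; treatmentCache, recordTreatment, and the internals of hasTreatment are my invention:

// per-run cache of experiment decisions; the framework resets this before each script run
var treatmentCache = {};

function recordTreatment(experimentId, treatmentName) {
	treatmentCache[experimentId] = treatmentName;
}

// getTreatment calls recordTreatment(experimentId, chosenName) whenever it picks a treatment
function hasTreatment(experimentId, treatmentName) {
	return treatmentCache[experimentId] === treatmentName;
}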

We could alternatively bake experiments into the rule infrastructure itself: a rule would specify the config section it supplies, its treatments, and their percentages. That tends to end up as a complex UI that does 90% of what users need in an inflexible way, so it's going to be troublesome.

However, what we want to do is store a collection of experimental treatments on the config object. We'll get into that later, but it looks like:

config.experiments = {
	'home screen titlebar style': 't2',
	'wake up message': 't5'
};

Incremental releases

Another common thing people want to do is roll out new features gradually. Sometimes I want to roll it out to fixed percentages of my users at fixed times. One option is to introduce a "rule series", which is a collection of rules, each with a start and end date. No two rules are allowed to overlap.

So I set up a rule series "roll-out-voice-search" with a simple set of rules:

// in the UI, I set this rule to be effective 2016-06-10 to 2016-06-15
config.voiceSearchEnabled = getTreatment(
	user.email,
	'roll-out-voice-search',
	{
		control: {value: false},
		enabled: {value: true, percent: 1}
	});

And I make a couple more rules, for 10%, 50%, and 100%, effective in adjacent date ranges.

But this is a common pattern. So we can simplify it:

config.voiceSearchEnabled = gradualRollout({
	user: user.email,
	rollout: 'roll-out-voice-search',
	start: '2016-06-10',
	finish: '2016-06-25',
	enabled: {value: true},
	disabled: {value: false}
});

And we can very easily interpret that as a linear rollout over the course of fifteen days, based on the user's email address.
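
A minimal sketch of how the framework might do that interpretation, reusing the toHash pivot trick from earlier. The now parameter and the exact option names are assumptions, not a fixed API:

function gradualRollout(opts, now) {
	var start = Date.parse(opts.start);
	var finish = Date.parse(opts.finish);
	// fraction of the rollout window that has elapsed, clamped to [0, 1]
	var elapsed = ((now || Date.now()) - start) / (finish - start);
	var enabledPercent = 100 * Math.min(1, Math.max(0, elapsed));
	// same pivot trick as before: hash the user together with the rollout name
	var pivot = toHash(opts.user + opts.rollout) % 100;
	return pivot < enabledPercent ? opts.enabled.value : opts.disabled.value;
}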

Metrics

You don't just assign experiment treatments to people and forget about it. You want to track what happens. That means the client needs to know which treatments it was given, and the raw configuration is an awkward thing to dig that out of. So you want the experimental treatments exposed directly, by name, rather than as a pile of configuration values you have to reverse-engineer back into a treatment.

Separately, you need a system to record client events, and you submit the experiment treatments to it as tags. Then you can correlate treatments to behavior.
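
As a sketch of what that can look like on the client, assuming the config.experiments map from above; recordEvent is an invented helper, and the actual sending is elided:

// hypothetical client-side sketch: every event carries the treatment names as tags
function recordEvent(name, properties) {
	var event = {
		name: name,
		properties: properties || {},
		tags: config.experiments,	// e.g. {'home screen titlebar style': 't2'}
		timestamp: new Date().toISOString()
	};
	// POST the event to your metrics service here; downstream, treatments
	// can then be correlated with behavior
	return event;
}

recordEvent('home-screen-viewed', {durationMs: 1240});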

Speed

One complaint you might have is that this approach always fires every rule in sequence, and that's slow. The Rete algorithm is used in a wide variety of rule engines and is faster than naive reevaluation, so we should use that here, right?

Wrong. The Rete algorithm is complex and requires us to build up a large data structure. That data structure is used when a small portion of the input changes, letting me avoid recalculating the whole result.

In my case, I'm getting a series of new configurations, and each one is unrelated to the last. I might get a call for one collection of rules and then not get a call for it in the next hour. Or a rule might throw an error and leave the Rete data structure in an invalid state. Or I might have to abort processing, again leaving the data structure in an invalid state. So Rete's bookkeeping buys us nothing here; firing every rule in sequence is the simpler and safer choice.

Future directions

The main target here is to look at what people are doing and try to provide more convenient ways of doing it.

We also want to provide the ability to mark portions of metadata as private information, to be redacted from our logs.

IP geolocation would be handy, allowing us to tell people where the client is located rather than relying on the client to self-report. We can grab a country-level GeoIP database for $25/month, city-level for $100/month. This would be strictly opt-in, possibly with an additional fee.

Finally, we have to turn this into a proper service. Slap a REST API in front of it, add in HMAC authentication and API usage reporting, service health metrics, and load balancers.

That concludes this short piece on creating an experiment system.

Configuration as a service

I’m working on a rule engine targeted toward configuration-as-a-service and experiment configuration. Since it’s nontrivial and not much exists in this space, I thought I’d talk about it here for a bit.

Configuration as a service? Huh?

There are a few things this can be used for.

Recall when Google wanted to test out 41 different shades of blue for search result links? They used an experiment system to enroll randomized segments of the userbase into each treatment. That’s one use case we want to support.

Let’s say I’m implementing a phone app and it’s got a new feature that I want to get out as soon as possible. I need to QA it on each device, but I’m pretty sure it’ll just work. So I ship my update, but I keep the feature off by default. Then I add a rule to my configuration service to turn it on for the devices I’ve QA’ed it on. As I finish QA on a given device, I update the rule to turn the feature on for that device.

Or maybe I need to take legal steps in order to provide a feature in a given country. The client sends its location, and I’ve added rules to determine if that location is one where I can legally enable that feature. It might also include, for instance, which of my API endpoints it should use to store any server-side data — some countries require user data to remain in EU borders.
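
In the rule style we'll land on later in this post (plain Javascript reading an input object and writing to an output object), those last two scenarios might look roughly like this; the field and flag names are invented for illustration:

// ship the feature off by default, then enable it per device as QA finishes
var qaedDevices = ['bacon', 'bullhead'];
output.newCameraPipelineEnabled = qaedDevices.indexOf(input.device.name) >= 0;

// only enable the feature where it's legal, and keep EU user data in the EU
var allowedCountries = ['US', 'CA', 'GB', 'DE'];
output.voiceRecordingEnabled = allowedCountries.indexOf(input.location.country) >= 0;
output.storageEndpoint = input.location.inEU ? 'https://eu.api.example.com' : 'https://us.api.example.com';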

What are we implementing?

We want to offer a multitenant service so that you can pay us a bit of money and get our glorious configuration service.

You will submit JSON metadata to us and get JSON configuration back. You will enter rules in a UI; we’ll execute those rules against the metadata to get your configuration. The rule UI will let you say: this rule comes into effect on this date, stops on that date; it’s got this priority; let’s test it against this sample configuration… Not too complex, but with some complexity, because real people need it.
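
Concretely, a round trip might look something like this; the field names are illustrative rather than a fixed schema:

// metadata the client submits:
{
	"userId": "alice@example.com",
	"device": {"name": "bullhead", "osVersion": "6.0.1", "uptime": 86400},
	"location": {"country": "DE"}
}

// configuration we hand back after running the rules:
{
	"voiceSearchEnabled": true,
	"homeScreen": {"bgColor": "gray", "fontSize": 10, "bold": true},
	"experiments": {"home screen titlebar style": "t2"}
}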

There are two basic parts: first, a service to execute rules; then, a website to manage rules. In between we have a rule engine.

Any significant caveats?

We’re running a configuration / experimentation service. We want third parties to use it. That means security.

We need to prevent you from calling System.exit() in the middle of your rules and bringing down our service. All that normal, lovely sandboxing stuff. Timeouts, too.

Also, you’re updating your rules pretty frequently. We need to be able to reload them on the fly.

Rules are code, and code can have bugs. We’ll have to watch for thrown exceptions and report them.

What’s already out there?

Drools

The heavy hitter, Drools has been around since the dinosaurs roamed the earth. It’s not easy to work with. It takes way too much code to initialize it, and most of that code is creating sessions and factories and builders and containers that have no discernible purpose. If you try to read the code to figure out what it all means, prepare for disappointment: it’s a snarl of interfaces and fields set via dependency injection and implementations in separate repositories.

Drools rules accept Java objects and produce output by mutating their inputs. That means I need a real Java class for input and another for output. Their rule workbench lets you create your Java classes, but that means you need to publish your project to Maven. And loading multiple versions of a rule is an exercise in pain.

On the plus side, it gives you a rule workbench out of the box, and it has a reasonable security story. However, it doesn’t have any way to limit execution time that I’ve found, meaning you have to run rules in a separate thread and kill them if they take too long. This isn’t nice.

Easy Rules

The new kid on the block, it uses Java as a rule language, which brings us to JAR hell like Drools. Unfortunately, it doesn’t supply a workbench, it doesn’t offer a way to provide inputs and retrieve outputs, and it doesn’t have any sandboxing or time limits. At least the code is relatively straightforward to navigate.

Everyone else

OpenRules is based on Excel. Let’s not go there.

N-Cube uses Groovy as a DSL, which implies compiling to a JAR. It’s also got almost no documentation.

There are several others that haven’t been updated since 2008.

So they all suck?

No. They’re built for people who want to deploy a set of rules for their application within their application. They’re for people who trust the people writing business rules. We are building a service whose sole purpose is to supply a rule engine, where untrusted people are executing code.

When you are building a multitenant service whose specific job is to run untrusted rules, you shouldn’t be surprised when off-the-shelf libraries built for a different task fall short of your needs.

What do we do?

The core thing that our service does is run user code. Let’s bring in a scripting engine. And since we’re going to accept JSON and emit JSON, let’s use a language that makes that natural. Let’s use Javascript.

The Rhino scripting engine makes it easy to run code and easy to filter which classes a script is allowed to use. Let’s just use that. Now we accept a rule from a user, wrap it in a light bit of code, and run it:

// we inject inputString as the raw json string
var input = JSON.parse(inputString);
var output = {};
// insert user code here

When we want to run it, we can just write:

Context ctx = Context.enter();
ctx.setClassShutter(name -> {
	// forbid it from accessing any java objects
	// (as a practical matter, I probably want to allow a JsonObject implementation)
	return false;
});
if (rule.compiledScript == null) {
	compile(rule);
}
Scriptable scope = ctx.initStandardObjects();
// hand the raw JSON string to the script; Rhino treats Java strings as JS strings
scope.put("inputString", scope, inputString);
rule.compiledScript.exec(ctx, scope);
// "output" is a JS object; in practice you'd JSON-serialize it before writing it out
response.write(scope.get("output", scope));

That’s not the whole story — we want to limit the amount of time it has to finish executing, set up logging and helper functions, all that jazz. We need to locate the rule somehow. We probably have multiple rules to run, and we have to propagate partial output objects between them (or merge them after). We also have to determine what order they should run in.

But, for what this does, it’s maybe half as much code as Drools takes.

What’s so much better about your approach?

The first huge advantage is that I’m using a scripting engine, one that doesn’t shove a bunch of classes into the global classloader. That means I can update everything on the fly. I’d get the same if I made Drools talk JSON, but that’s harder than writing my own engine.

Compared to Drools or EasyRules, I don’t have to maintain a build server and figure out how to build and package a java project I generate for each rule. I just shove some text into a database.

Javascript handles JSON objects quite well, which means not having to create a Java class for every input and output. That is the largest part of savings — Drools would be acceptable if it could talk JSON.

The people writing these rules are likely to be developers, not managers or analysts. They probably know Javascript, or can fake it pretty well.

What’s the catch?

Drools is huge and complex for three reasons.

First, it had significant development going on in an age when huge, complex code was de rigueur in Java.

Second, it had a separation between API and implementation enforced for historical and practical reasons.

And third, it solves complex problems.

You want your rules to just work. Drools has a lot of thought behind it to determine what “just working” should look like and make sure it happens. We haven’t put in that thought. I think the naive approach is pretty close to the intuitive result, but I haven’t verified that.

The rules accept and generate JSON. This means you lose type safety. On the other hand, the API accepts and generates JSON anyway, so this is pushing things a step further. Not great, but not the end of the world.

Javascript is kind of ugly, and we’re promoting its use. It’s going to be a bit crufty and verbose at times. The point of business rules in the Drools language or what-not is so that managers can read the rules, and we’re kind of missing that.

What do these rules look like?

An example rule:

if (input.device.name == 'bacon') {
	output.message = 'Congrats on your OnePlus One!';
}
if (input.device.name == 'bullhead') {
	output.message = 'Congrats on your Nexus 5X!';
}
if (input.device.uptime > 31 * 24 * 60 * 60) {
	output.sideMessage = "It's been a month. You might want to reboot your phone.";
}
output.homeScreenTreatment = Treatments.choose(
	'homeScreenTreatment',
	input.userId,
	{
		control:  {value: {backgroundColor: 'black'}},
		grayBg:   {percent: 5, value: {backgroundColor: 'gray'}},
		grayBold: {percent: 5, value: {backgroundColor: 'gray', bold: true}}
	}
);

I’ll talk a bit more about the experiment side next time.

Why I’m not in the D community

D is a great programming language in many ways. It’s got a host of features to make your life easier. It’s got syntax that’s familiar to anyone who knows Java, which is almost every programmer these days. It does away with a lot of cruft by making the syntax lighter and by making reasonable assumptions for you.

On the library front, they took the concept behind LINQ and kicked it up to eleven. It’s pretty awesome overall. There’s a working coroutine implementation, and it’s pretty efficient, plus you can subclass the Fiber class and provide your own scheduler. The standard library is mostly okay, missing some things you’d expect it to have. There’s a package manager, but it’s pretty new. There’s no corporate support for anything, though — no AWS client, no Google API client, no first-party datastore drivers, nothing. So get used to writing your own stuff.

Still, on the whole, it’s a reasonable option for some use cases, and I’ve been working off and on to create a MUD in D.

But I’m leaving the newsgroup, I’m not going to report any bugs, and I’m staying off the IRC channel. And I’m probably never going back.

Why? Because D’s community is garbage.

If you want a programming language to gain adoption, you need to make it friendly to novices. You need to make it easy to learn. You need a standard library with good documentation. You don’t have to change the features that your language exposes, necessarily, but you do need to provide the resources people need in order to start using the language.

Hardly a day goes by without people on the newsgroup expressing or implying a strange sort of pride in how obtuse D is to learn, or how the documentation isn’t easy to understand quickly. When people point out problems, there is always someone eager to pipe up that it isn’t a problem because they managed to learn it, or it’s okay that something is presented in entirely the wrong way because the data that’s shown is data that needs to be available.

Say something needs to be improved and people will derisively ask “Where’s your pull request?”

This isn’t a good attitude to have.

To be clear, this isn’t everyone. It’s maybe one in ten. Walter and Andrei, most importantly, don’t do this. But they do nothing to stop it.

So I will use D, when it’s appropriate. I will even release open source projects in D. But I won’t join in the wider community.

A Return to Go

I’ve switched jobs and am using even more Go. While I previously talked about Go, it was a while ago, and I was using it inside Google, with a vastly different build system than the one inflicted on the outside world. So I have a new perspective on it, and I’m updating my opinion.

Concurrency

Concurrency is why you write code in Go rather than any other language, right? It’s Go’s shining feature. Aside from goroutines, you’re pretty much left with C with a facelift and garbage collection, and that’s probably not the thing you most want for application or service development.

Go’s not concurrent. It’s as concurrent as Node.js; you just don’t need to structure your code around callbacks. That’s it.

In Java, I have true concurrency. I’m not saying this because of OS threads versus cooperative multitasking. No, the problem with Go’s concurrency is that it’s entirely hidden from you. Java actually gives you thread objects. It lets you cancel the execution of a thread and check on its status. And that’s what I need, much of the time.

In my spare time, I’m writing a MUD. This involves a ton of AI routines, scripts, and user input handlers, each of which is easier to write as a sequential operation. So I want some sort of multitasking, and since I’m estimating a huge MUD world might need over half a million threads, OS threads won’t do. Can I use goroutines?

No.

I need a reliable system. That means a process that checks over each scripted object to ensure it’s actually running its script. How do I do that in Go? …well, I have zero access to the default scheduler, so I can’t ask it for a list of running goroutines. I can’t get a goroutine object and ask it whether it’s running. I could use a waitgroup for each scripted object and deferred execution so that when that object’s goroutine exits for whatever reason I can see it, which is slightly annoying and has to be repeated everywhere.

I need actions to happen on schedule. I can handle the whole MUD being 50ms slow for one scheduling tick (which is planned to be 250ms); I could handle one script being slightly late for one tick; but I can’t handle long-term clock drift. Also, the order of execution might be dictated by game rules — players always go first, ordered by their speed stat; NPCs go second, ordered similarly; items and rooms go last in arbitrary order. This is much easier to handle if I write my own scheduler.

I need to suspend tasks. I wrote a script for an NPC sailor to wander around, singing and quaffing ale, but halfway through, a player attacks her. I need to be able to suspend this singing and quaffing task to handle combat. In Go’s world, I need to check whether the NPC is currently in combat after every yield. This is unusable.

What language does this stuff right? Well, the best I’ve seen is D. Fibers in D are much more of an afterthought than in Go, yet in D I can write my own scheduler, get a reference to a Fiber object to check on its status, and even cancel further execution of a fiber all in the standard library.

What if you’re stuck with Java? Well, most of the time, you aren’t manipulating shared state anyway. You need to ensure that your database library is threadsafe or just instantiate a new adapter with each task, and probably similarly with a couple other things, but you can pretty much ignore the fact that things are running in multiple threads and be okay 95% of the time. Just use a threaded ExecutorService and be done.

Type system

I thought the type system was a bit anemic before. Now I view it as an enemy.

Interfaces are not met by accident. They are planned. People write code to match an interface, and Go gives them no way to say so. There is no way to tell the language: here’s this type, and oh by the way, just ensure for me that it matches the io.Reader interface. (The closest thing is the side-channel assertion idiom, var _ io.Reader = (*MyType)(nil), which is a convention rather than part of the type’s declaration.) So you get compilation errors at call sites because the type doesn’t match the interface you designed it to match. This is the opposite of what I want.

Virtual dispatch only happens through interfaces, and people do not use interfaces by default. They use concrete types by default. This means testing is ugly. I end up having to write interfaces for other people’s code a lot.

Covariant return types are not allowed. Interfaces operate on exact match only. I wrote an interface for a Redis client for testing. Then I realized that I couldn’t instantiate the return type for one of the methods with sensible values — it had private fields with public getters. So I had to write a wrapper struct for the Redis client that simply forwarded the relevant method but had a slightly different return type. (It actually isn’t possible, given the language, to solve this problem in a sensible manner. That doesn’t mean it’s less painful or that Rob Pike is any less at fault; it just means he messed up earlier and it only became apparent here.)

Syntax and parsing

Go’s syntax looks a bit funky at first. And then, eventually, it hits you: this language was not designed to make it easy for you to read and write it. It isn’t designed to make it fast for you to understand what’s written. It’s instead designed to reduce the amount of lookahead the compiler has to do, to simplify the amount of work parsing takes.

Why do I have to type “type Foo struct” rather than just “Foo struct”? The latter is consistent with the little-endian nature of Go, where types follow variables. But if you had “Foo struct” and “bar func”, that would increase the amount of lookahead the compiler had to do. Similarly, with functions, Go could have followed a strategy similar to the C family of syntaxes. But that would require more lookahead to implement.

It’s certainly not to help me read things faster. Remove the “func” and “type” keywords and I can read code just as fast. I can write it slightly faster. It’s only for the compiler’s benefit that I have to write these keywords.

This is backwards. This is perverse. A team of five to ten people decided they wanted to do slightly less work, so everyone else has to do more. Normally we pay a small team big money to spend more effort so that a lot of other people can each do slightly less, and we consider that a good tradeoff. Here we get the exact opposite, and people seem to love it. I don’t understand.

There are other problems I have with the syntax. The compiler requires a := to create and initialize a new variable, while it just uses = for an assignment to an existing variable. There’s a special variable, _, which indicates “throw this value away”.

However, there are two other variable names that you will reuse very, very often: err and ok. err is the default variable name (by the documentation, not by language features) for an error. Many things return errors in addition to something else, and most of the time you’ll write something like val, err := tryGetValue(). In fairness, := does allow this as long as at least one variable on the left is new, so reusing err in the same scope works; but the moment that statement lands in a nested scope, := quietly creates a brand new err instead of reusing the outer one (more on shadowing below).

I’m thinking of pre-declaring at least err so I can always use = for it, but I don’t think that would save me thanks to multiple return values.

All in all, this looks like two features that seem great in isolation (different syntax for declaring with initialization versus assignment, added to multiple return values) not working together very well in practice. But I’ve never even seen anyone use multiple return values aside from returning an error with a single value, so…

Also, Go says that all loops are special cases of for loops. You create an infinite loop with for { doStuff() }. You create a while loop with for booleanExpression { doStuff() }. This hides programmer intent. Not ideal.

Constant initialization with iota is magic. You can write:


const (
  B = 1 << (iota * 10)
  KB
  MB
  GB
  TB
)

This gives you the constants you would expect given the names. The thing to remember is that iota auto-increments each time you use it, and a constant without an initializer acts as if you had copied and pasted the previous constant’s initializer… The first time I read this sort of code, I had no clue what it meant. (Also, it started with _ = iota, which confused things slightly more.) I thought I’d get sequential values, like every other language, incrementing from the previous given value. Or, if the language were especially clever, equal increments.

Magic is only good for making a programmer feel clever.

(As an aside, I praised D for its concurrency earlier. The standard library contains a lot of code written by someone who likes feeling clever. This means you have to write things like dur!"msecs"(15) rather than a more sensible construct like Duration.fromMillis(15). Even though I don’t have to modify that code, I depend on it, so I have to spend effort to understand an API expressed in templates and metaprogramming rather than simpler constructs.)

Shadowing declarations

We spoke a few moments ago about the problems with multiple assignments. Here’s a kicker: every time you create a new scope (which is, roughly, every time you have a new set of curly braces), you can freely shadow declarations.

What does that mean? Well, let’s take this snippet dealing with Redis:


var cursor int64 = 0
for {
  cursor, keys, err := redis.Scan(cursor, "prefix:*", 10)
  if cursor == 0 {
    break
  }
  // ...
}
if cursor != 0 {
  // We stopped early. Do something special.
}

Redis’s SCAN call takes an input cursor indicating where to start and emits an output cursor indicating where to start next time. Simple, right? Obviously correct. Wrong.

You created a new variable named keys. But you can’t separate that one variable’s new-ness from the other variables. Go assumes that you want to make all the variables anew. So instead of doing the right thing, updating cursor each time through the loop, you get a brand new variable.

What actually happens: the right-hand side is evaluated with the outer cursor (still 0), and the result goes into a brand new inner cursor that dies at the end of each iteration. The outer cursor never updates, so you ask Redis for the same first page of keys over and over, forever, and the “stopped early” check after the loop never fires either. The fix is to declare keys and err before the loop so you can use plain =, which is easy to forget and annoying to write.

Conclusions

People mock Javascript for its awfulness, but Go isn’t far behind. Use Go instead of Node.js if you want, but since there are bajillions more Javascript devs than Go devs, you’d be better off using Node.js. Dart’s even significantly more popular than Go, so if you want static typing, that’s an option. (Or you can use TypeScript with Node.js, but you still have to deal with a lot of JS’s oddities.)