Twitter Takes Characters, Gives Access in Return

I avoided Twitter for a long time because I’m long-winded. “140 characters? How on Earth could I get my point across?!?” I like to set the context for what I’m talking about and make my points clear.

When everyone gets to do that, though, the result is a lot of stuff to sift through–like the entire rest of the Internet. This blog, for example, is some obscure, long-winded fringe colony. Few visit, and those who do don’t take the time to read it. Hell, I wouldn’t.

By limiting the complete freedom of speech to 140 characters, though, everyone gets the same small space to get their point across. Even if they break their post into pieces, they still have to make a conscious decision to do so. It’s a lot of work to string together one of those multi-posts.

The end result is increased access. Because a lot of noise is stripped out, one can quickly sift through to what’s interesting and get to the point. It also cuts down on the influence of noisier individuals and allows better access to people who get large streams of stuff sent to them regularly, like celebrities. Folks with whom I’d never have had a chance to interact might get to see–because they have less to sift through–some clever shit I came up with.


Of course, you can’t get every point across on such a short-winded platform, but you’re not supposed to. That’s what other platforms are for, like this one. As for a public forum that provides access to those who would otherwise be out of reach, Twitter is amazing.

Arranging Fundamentals

A friend of mine who does some work I shouldn’t talk about–DC living for you–once explained something that struck me as odd. Say you have two unclassified documents, document A and document B. Staple documents A and B together–now you may have a classified document. The mere juxtaposition of two pieces of information is itself information, enough to change the status of the documents.

This was so striking, in fact, that I dwelled on it for a bit and began to realize that this is literally what I do as a computational linguist–re-arrange strings in meaningful ways.

Once you start thinking about this, it appears in a lot of things, aside from the arrangement of textual information. Geographic location is just this in action: real estate is valued by what real estate is next to it; if I own a car on a different continent, that car is essentially worthless unless I’m on that continent; I’m the Emperor of the Moon of the Wholly Circumferential Lunar Empire, yet they refuse to give me a diplomat plate.

Why should information feel so different? After all, re-arranging information is fundamentally what you learn to do in school, and I’ve been in school for 21 years.

I suppose it’s just that. Especially having studied physics, one learns to boil any problem down to its principal components, down to the key relationships that apply, and to demonstrate with those relationships how the facts have come to be. You become a master at re-arranging information in the right way, and when you become a master at re-arranging information, re-arranging information feels cheap; knowing the principal components is the key to finding the solution.

I suppose this is why the juxtaposition of two documents as something meaningful is so striking–if you already had access to the two documents, you already had access to the principal components. As a master of re-arrangement, nothing else is required.

This couldn’t be further from the truth, though. The arrangement of things does matter, and it’s often incredibly complex. If the fundamental components were all that mattered, then if you memorized this chart of the fundamental particles of matter, you would know everything there is to know about everything.



But knowing this chart, you don’t know everything about everything. Their combinations allow a certain freedom, and how the uncertainties left by that freedom are realized is also interesting.

Those uncertainties are just arrangements, but they’re important. They explain how cats are different from birds, why cruising in the passing lane makes you a complete asshole, and why I keep writing this essay despite having far more pressing shit on my plate. All of these things, in many ways individually, are due to arrangements of arrangements of arrangements of fundamental particles, so far removed that the black box of the atomic nucleus has little (obvious) bearing on the outcome of their combination, aside from making it possible amongst another infinitude of possibilities.

This line of thinking is common outside of physics. For example, after recent events at UVa, some have argued in favor of shutting down fraternities, indefinitely. Counter-arguments in the comments, however, went along the lines of “well, if you kick them out of the frats, they’re still rapists.” They’re treating the members of the organizations as fundamental, principal components, and are arguing that by dividing the principal components up, you do nothing to negate the evil contained in those components.

There’s a lot of places this could go, but I’ve made my point here, loosely enough. Arrangement is information, and it matters. Fundamental components are good to know–they give space for juxtaposition to happen–but the interesting stuff happens in how things are arranged. It’s why we’re more than quarks.

The Context of the Gold Record

Voyager 1 and 2 were two robotic probes launched into the outer solar system in 1977. Their timing was impeccable: they launched in a rare window where a single vehicle could encounter all four gas giants without requiring any course corrections aside from gravity assists. To this day, Voyager 2’s pictures of Uranus and Neptune are the highest resolution available.


Attached to each probe was a gold record. The records contain a variety of images and sounds of Earth, from music to rain, to “hello” in a number of languages. As a kid, when I first heard about this, I took the idea of the messages for granted; aliens could pick it up and know who we are and stuff, maybe drop in and say hello.

The deeper meaning of the records wasn’t abundantly clear until I remembered the context of the 1970s; it’s striking how different things are now than they were then.

These days, I can pull up any song I want, any time I want. I can rip it, copy it, and aside from the opinions of a few douche-canoes at the RIAA, distribute it to whomever I want at no apparent cost except that of the electricity to run my computer.

At the time, though, music was etched into plastic–not with lasers but with needles–worn down with each precious play. Data was transmitted on physical media. The existence of that physical world was punctuated each waking moment with the impending doom of nuclear war. Carl Sagan–in large part responsible for arranging the record–comes back to the theme of nuclear war throughout his multi-part television series Cosmos, often noting the precarious position of humanity in the 20th century. Impending nuclear war comes across as quaint–at least it did when I first was dwelling on this–but it was a possible ending to his–and our–world.

Sagan and others suggested future humans may be able to pick up the gold record. Aside from its being a novel object–a one-of-a-kind artifact–it was unclear to me why this was meaningful. However, in the context of atomic bombs blowing the living hell out of every city on earth, it was far more obvious. The plastic records of human civilization would have melted away, perhaps further lost through strife of war, revolution, or change.

The gold record carried the idea that somehow, by some means, our culture had endured. Something else–distant descendants or alien life–could pick the sounds of us up, decipher our message, and know us–know what we heard and what we felt, know “Johnny B. Goode” and “Der Hölle Rache kocht in meinem Herzen.” It wasn’t just a message to aliens, but to the post-apocalyptic humans who would have had no way to know us, with our cities and records reduced to glass and ash. Through the gold records, in some way, even if we failed to create peace on earth and lost ourselves, we would have endured in the eternal memory of anyone, human or otherwise, that could have come after us.

The Distortion That is Learning

Ada Lovelace, they were talking about the other day, on the radio. They pointed out that she was the first person–working under Charles Babbage–to describe the computer as more than an adding machine. It could add, of course, but more importantly, it could follow instructions. It was–more than a mere calculator–a decision maker.

I’d been thinking about this the last few months–that the role of a programmer isn’t just to give instructions, but to bestow meaning into the machine. Of the infinitudes of programs accessible to a programmer, they choose the ones which are meaningful. Otherwise, we could just generate programs at random and call it a day.

There seemed to be something profound about this convergence of thoughts–that Lovelace and I had been thinking about the same thing, as if the whole universe pointed to my own thoughts.

That, of course, is absurdly egotistical at best. I’ve seen myself wander into this thought a number of times, though, and I’ve paid more attention as I’ve seen it arise. In some cases, it’s something I’d probably heard before, but at the time, had nothing particularly interesting to do with that fact. With nothing to peg the idea to, it wandered back to the hyperuranion. Only when I had some context to apply it to–a probabilistic dimple in my brain etched deep enough to pull the idea in–did the fact seem so profound.

It’s a lot like digging through a pile of Legos, wherein the digger develops an ever-changing myopia. With a certain problem at hand, some Legos are extremely prominent, relevant to the problem that needs to be solved at that moment. Others are just noise and join the irrelevant static of the rest of the pile.

As the digger builds, though, the process changes. The experience gained from the process of building–or simply progress in building–changes the needs of the process. What was once a piece of noise is now very valuable, once one sees a fit in what’s being built. The digger’s own perspective, through the learning that’s done through building, becomes distorted from how it had previously been.

The same goes for any other learning process. As one works, one’s apparent needs change, and what once seemed irrelevant can suddenly pop out as a solution. The way one sees the world literally changes as learning occurs; the world, though, is the same as it was.

Actors and Actions

This summer–out of town, meeting many new people–I encountered far more often the unenviable dilemma of explaining my dissertation topic. Unintentionally, though, I turned it into an experiment.

Linguistics: where talking about an experiment becomes another experiment.

Typically, when introducing the topic, I presented a set of verbs, “arrest, search, apprehend, try, convict” and asked what nouns came to mind. Most folks drew a blank. At first I thought it was a fluke, but after a sustained near-0% success rate, and failing so frequently to explain to so many people what I was doing, I got my head out of my ass and admitted that I was explaining wrong.

So instead of giving them verbs and asking what nouns came to mind, I gave them “police and suspect” and asked them what words come to mind. “arrest, search…” It worked like a charm.

It’s easy to think of the actors and the actions associated with them as interchangeable, and then to emphasize the extracted product of the process (Chambers and Jurafsky 2009). After all, that list of verbs is a project result. However, coreference chains–strings of co-referring nouns–are employed at the first step, so it’s more sensible to convey the process nouns-first. Then, in a way, the listener becomes the project, and that’s way more interesting for them and you.

Furthermore, this may signal a need to alter the schema construction process. Verbs are compared to one another, and though their similarity depends on their coreferent arguments, the choice of comparison depends on grammatical/referent collocations of verbs, not the juxtaposition of two actors. In this direction, the pair of actors I prompted listeners with is similar to those in Balasubramanian et al. 2013, retaining a pairwise relationship between role fillers through the extraction process.

In the end, it’s the nouns I’m interested in. On my 2nd Qualifying Paper, I looked at narratives related to police. Fundamentally, I was interested in what the system told me about police and how they interacted with other argument types: suspects, bystanders, etc. A noun-centric generation process may provide results more suited to this sort of analysis.

A noun-centric process may also improve performance in more challenging domains. I noticed while analyzing movie reviews that, although the means of describing films and reviewer sentiment about them varied, particular roles remained constant throughout the domain: the reviewer, the director, characters in a plot synopsis, the film itself. Since that’s where I’m headed, that seems to be the way to think about things.

Synchronous Narratives, Small Data, and Measure Veracity

I’m, at the moment, looking for a particular problem to work on for my dissertation. It feels a bit backwards the way I’m going about it–I know what kind of solution I want to deploy, but I’m looking for a problem to solve with it. It’s a bit like running around the house with a hammer, looking for nails to hit, or running around with a new saw, cutting up wood into bits for the hell of it. The danger is that I could end up cutting all my wood up into tiny shavings, having had a blast with the saw but finding myself homeless at the end of the day.

My tool in this case isn’t a saw, but the abstraction of narrative schemata. The idea is that, using dependency parses and coreference chains, you can extract which verbs are likely to co-occur with a shared referent. For example, arrest, search, and detain often share role fillers of some kind–police, suspect, or some expression referring to one of those two.
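The co-occurrence idea above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the toy chains below are invented, a parser and coreference resolver are assumed to have already produced them, and scoring pairs with pointwise mutual information is one common choice rather than necessarily the measure used in this project. The names `count_pairs` and `pmi` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Toy input (invented for illustration): each document is a list of
# coreference chains, and each chain lists the (verb, dependency) slots
# that its mentions fill.
docs = [
    [  # document 1
        [("arrest", "subj"), ("search", "subj"), ("detain", "subj")],  # police
        [("arrest", "obj"), ("search", "obj"), ("convict", "obj")],    # suspect
    ],
    [  # document 2
        [("arrest", "subj"), ("search", "subj")],                      # police
        [("arrest", "obj"), ("detain", "obj")],                        # suspect
    ],
]


def count_pairs(docs):
    """Count how often two verb slots are filled by the same chain."""
    pair_counts = Counter()
    event_counts = Counter()
    for chains in docs:
        for chain in chains:
            events = sorted(set(chain))  # canonical order for pair keys
            event_counts.update(events)
            pair_counts.update(combinations(events, 2))
    return pair_counts, event_counts


def pmi(pair, pair_counts, event_counts):
    """Pointwise mutual information of a pair of verb slots."""
    if pair_counts[pair] == 0:
        return float("-inf")  # the two slots never shared a referent
    p_pair = pair_counts[pair] / sum(pair_counts.values())
    p_a = event_counts[pair[0]] / sum(event_counts.values())
    p_b = event_counts[pair[1]] / sum(event_counts.values())
    return math.log(p_pair / (p_a * p_b))


pair_counts, event_counts = count_pairs(docs)
police_pair = (("arrest", "subj"), ("search", "subj"))
print(police_pair, pmi(police_pair, pair_counts, event_counts))
```

Pairs with high scores are candidates for belonging to the same schema; on data this small the numbers mean nothing, which is part of why corpus size matters for this approach.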

A corpus of news contains all kinds of relationships like those, buried inside the language data itself. Ideally, these represent some sort of shared world knowledge that can be applied to other tasks. To demonstrate that this isn’t mere idealism is what I’m looking to do my dissertation on at the moment.

Back in the spring, I took my first attempt at this, and it went ok. My hypothesis–one of convenience, mostly–didn’t pan out, but there were interesting trends in the data. That resulted in a problem, though; I had two things to sort out: was my hypothesis wrong? Was the measure I used to determine that fact suitable for doing so? There was some minor evidence that the measure was suitable, but nothing conclusive.

Instead, I started sniffing around for other hypotheses–things someone else had already thought of, and that may be demonstrable with narrative schemata as an overlying application. Per my typical procrastination, I stumbled upon a recent article on Salon that critiques national press coverage of Rick Perry, claiming that narratives presented in the national press diverge wildly from those presented in Texas papers.

With an author having shown this qualitatively, it’s ripe for quantitative replication. It would make a great experiment for showing the veracity of whatever measure I end up devising.

The difficulty comes in with corpus building. There isn’t a corpus of these texts lying around. I’d have to dig them up myself, from numerous scattered sources. Additionally, the number of sources is likely to be limited. I may be able to obtain a few hundred articles if I’m relentless. Prior work on schemata began with millions of articles. The robustness of the approach may be questionable, in this case.

Of course, the difference in size may be the source of an interesting result in and of itself, but it’s not what I’d set out to demonstrate when searching for a problem that demonstrates the veracity of my measure.

Thirteen Years Later

“They hate our freedoms.”

There’s little that pisses me off more than this sentence. It’s been used to justify thirteen years of warfare and widespread state surveillance, and it’s complete bullshit.

They hate our freedoms as much as we hate theirs; it’s really got nothing to do with why what happened did. It’s the conception of justice as vengeance. It’s the belief that our way of life is average, normal, and optimal, and imposing it on others is, in a universal sense, justifiable. It neglects centuries of Western powers rampaging through the Middle East in pursuit of religious or economic gain and in the process leaving power vacuums and anger, leading to further intervention, leading to further power vacuums and anger.

No matter who is responsible, the cycle of suffering will continue. While someone over here mumbles “mrrca” repeatedly, asserting their own patriotic righteousness, that same mumbling sounds like “Allahu akbar” on the other side of the globe. They both believe in the absolute truth and justice of their own side, and to push on one another reinforces and maintains that sentiment. The equal and opposite reaction is evil, and the push becomes good. The justification is built on a selective compassion for dead compatriots. Death begets death, the slaughter continues, and as a solution, violence allows for one and only one victory condition: suppression or annihilation of the opposition.

So, here we are again, bombing what festered in a power vacuum we created. Nor will this be the last time. Nor will the next time be the last time.

Bombs are expensive. As long as there is someone to drop the bomb on, there are those who sell bombs. Bombs are good for business. Bombs guided by expensive electronics are even better.

George Orwell rolls in his grave.

Broke the Turing Test

There have been reports that some AI “passed” the Turing Test. Let’s delve into this.

First, let’s start with what the Turing Test is, or even who Turing was. Alan Turing established many of the theoretical foundations of modern computing–in the 1940s. He was largely responsible for breaking German secret codes. He was way ahead of his time–60 years or so.

The Turing Test works like this–if you have some artificial intelligence inside a computer chatting with you and you have some person chatting to you through a computer, can you tell the difference? If you can, the AI has failed the Turing Test. If you can’t, the AI has passed the Turing Test.

So what about this AI?  “…the Eugene Goostman program managed to persuade 33 percent of people that it was a 13-year-old boy from Odessa, Ukraine.” That’s the trick here. First of all, that’s not the highest bar in the world, 33%. If three people examined the system, one of them got duped. Still, though, it’s larger than nothing. I’m waiting on the research paper to see how significant the bar is. Sometimes caveats like this are required in AI research.

However, the real gimmick is the “a 13-year-old boy from Odessa, Ukraine” part. If you can’t make your AI fluent, make your AI simulate someone who isn’t. I don’t think that’s really what Turing intended, but I’d like to congratulate Veselov et al. on finding a loophole in the test. It took 65 years.


~/.ssh/config Noob Problems

I did a computer rebuild a month or two ago, and I couldn’t seem to get my ssh config file to work. I set up some aliases for a few servers I connect to, and nothing would happen when I actually tried to connect. However, if I typed the whole address in from the command line, no problems.

As an example, one of these was called “armstrong.” Turning on verbose mode made the problem clear. When trying to use the alias, ssh tried to connect to a different IP address than it did with the fully written-out address.

ssh refuses to use a config file unless the permissions for that file are set appropriately–that is, the file should be readable and writable by the user who owns it, and not writable by anyone else.

How could that be the problem? I’m the only user on this machine.

But I’m not. When I created the files, I used sudo, because sudo is magic computer sauce that makes everything work. So technically, the ~/.ssh/config file belonged to the root user, not to me, and because of that, ssh refused to use it.

So, sudo is magic sauce. It works pretty good on a lot of things, but for some things, it ruins them.
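The ownership problem described above can be checked from a script. What follows is a rough sketch, not OpenSSH’s actual logic: the function name `config_problems` is made up for illustration, and the three checks are assumptions about the most likely ways a config file ends up ignored (wrong owner, writable by others, unreadable by you).

```python
import os
import stat


def config_problems(path):
    """Return a list of reasons ssh might ignore or reject the file at `path`."""
    problems = []
    st = os.stat(path)
    # A file created with sudo ends up owned by root (uid 0), not by you.
    if st.st_uid != os.getuid():
        problems.append("owned by uid %d, not the current user" % st.st_uid)
    # Write access for group or others counts as too permissive.
    if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
        problems.append("writable by group or others")
    # A root-owned file with owner-only permissions cannot be read by you.
    if not os.access(path, os.R_OK):
        problems.append("not readable by the current user")
    return problems


config_path = os.path.expanduser("~/.ssh/config")
if os.path.exists(config_path):
    for problem in config_problems(config_path):
        print(config_path + ": " + problem)
```

If it reports a wrong owner, the usual repair is to chown the file back to yourself (that one step does need sudo) and then chmod it to 600.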

And don’t forget,

~/.ssh/config

must be made with

vim ~/.ssh/config

and not with

sudo vim ~/.ssh/config


Blog blog blah: Introductions

I’ve done quite a bit of blogging in the past; I kept a blog from my Junior year of high school up until recently, a solid eight years. There’s a lot that happened in that time, like growing up, so I decided it’s time to start again fresh.

I’ll be posting fun stuff here–short little research projects that don’t look like they’re publishable, thoughts about politics, music, etc. Basically, whatever is too long for a tweet.