The Distortion That is Learning

They were talking about Ada Lovelace on the radio the other day. They pointed out that she was the first person–working under Charles Babbage–to describe the computer as more than an adding machine. It could add, of course, but more importantly, it could follow instructions. It was–more than a mere calculator–a decision maker.

I’d been thinking about this for the last few months–that the role of a programmer isn’t just to give instructions, but to imbue the machine with meaning. Of the infinitude of programs accessible to a programmer, they choose the ones that are meaningful. Otherwise, we could just generate programs at random and call it a day.

There seemed to be something profound about this convergence–that Lovelace and I had been thinking about the same thing, as if the whole universe pointed back to my own thoughts.

That, of course, is absurdly egotistical at best. I’ve caught myself wandering into this thought a number of times, though, and I’ve paid more attention as I’ve seen it arise. In some cases, it’s something I’d probably heard before, but at the time I had nothing particularly interesting to do with it. With nothing to peg the idea to, it wandered back to the hyperuranium. Only when I had some context to apply it to–a probabilistic dimple in my brain etched deep enough to pull the idea in–did the fact suddenly seem so profound.

It’s a lot like digging through a pile of Legos, wherein the digger develops an ever-changing myopia. With a certain problem at hand, some Legos are extremely prominent, relevant to the problem that needs to be solved at that moment. Others are just noise and join the irrelevant static of the rest of the pile.

As the digger builds, though, the process changes. The experience gained from building–or simply the progress made–changes the needs of the process. What was once noise becomes very valuable, once one sees where it fits in what’s being built. The digger’s own perspective, through the learning done in building, becomes distorted from how it had been before.

The same goes for any other learning process. As one works, one’s apparent needs change, and what once seemed irrelevant can suddenly pop out as a solution. The way one sees the world literally changes as learning occurs; the world, though, is the same as it was.

Actors and Actions

This summer–out of town, meeting many new people–I found myself facing, far more often than usual, the unenviable dilemma of explaining my dissertation topic. Unintentionally, though, I turned it into an experiment.

Linguistics: where talking about an experiment becomes another experiment.

Typically, when introducing the topic, I presented a set of verbs, “arrest, search, apprehend, try, convict,” and asked what nouns came to mind. Most folks drew a blank. At first I thought it was a fluke, but after a sustained near-0% success rate, and failing so frequently to explain to so many people what I was doing, I got my head out of my ass and admitted that I was explaining it wrong.

So instead of giving them verbs and asking what nouns came to mind, I gave them “police and suspect” and asked what words came to mind. “arrest, search…” It worked like a charm.

It’s easy to think of the actors and the actions associated with them as interchangeable, and then to emphasize the extracted product of the process (Chambers and Jurafsky 2009). After all, that list of verbs is a project result. However, coreference chains–strings of co-referring nouns–are employed at the very first step, so it’s more sensible to convey the process nouns-first. Then, in a way, the listener becomes the project, and that’s way more interesting for them and for you.

Furthermore, this may signal a need to alter the schema construction process. Verbs are compared to one another, and though their similarity depends on their co-referent arguments, the choice of comparison depends on grammatical/referent collocations of verbs, not the juxtaposition of two actors. In this direction, the pair of actors I prompted listeners with is closer to Balasubramanian et al. (2013), which retains a pairwise relationship between role fillers throughout the extraction process.
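To make the contrast concrete, here is a rough sketch of what a noun-pair-first counting step might look like, in the spirit of Balasubramanian et al. (2013) but not an implementation of it. The triple format and the choice of head normalization below are my own illustrative assumptions:

    # A rough sketch of a noun-pair-first counting step, assuming each
    # document has already been reduced offline to (subject head, verb
    # lemma, object head) triples. The triple format is an illustrative
    # assumption, not the pipeline from Balasubramanian et al. (2013).
    from collections import Counter, defaultdict

    def verbs_by_actor_pair(triples):
        """triples: iterable of (subject_head, verb_lemma, object_head),
        e.g. ("police", "arrest", "suspect")."""
        table = defaultdict(Counter)
        for subj, verb, obj in triples:
            # Keep the actor pair intact; the verbs attach to the pair,
            # not to each actor independently.
            table[(subj, obj)][verb] += 1
        return table

    # table[("police", "suspect")].most_common(5) would then surface the
    # "arrest, search, apprehend..." cluster directly from the actor pair.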

In the end, it’s the nouns I’m interested in. In my second Qualifying Paper, I looked at narratives related to police. Fundamentally, I was interested in what the system told me about police and how they interacted with other argument types: suspects, bystanders, etc. A noun-centric generation process may provide results better suited to this sort of analysis.

A noun-centric process may also improve performance in more challenging domains. While analyzing movie reviews, I noticed that, although the ways of describing films and reviewers’ sentiment about them varied, particular roles remained constant throughout the domain: the reviewer, the director, characters in a plot synopsis, the film itself. Since that’s where I’m headed, that seems to be the way to think about things.

Synchronous Narratives, Small Data, and Measure Veracity

At the moment, I’m looking for a particular problem to work on for my dissertation. The way I’m going about it feels a bit backwards–I know what kind of solution I want to deploy, but I’m looking for a problem to solve with it. It’s a bit like running around the house with a hammer, looking for nails to hit, or running around with a new saw, cutting up wood into bits for the hell of it. The danger is that I could end up cutting all my wood into tiny shavings, having had a blast with the saw but finding myself homeless at the end of the day.

My tool in this case isn’t a saw, but the abstraction of narrative schemata. The idea is that, using dependency parses and coreference chains, you can extract which verbs are likely to co-occur with a shared referent. For example, arrest, search, and detain often share role fillers of some kind–police, suspect, or something referring to one of those two.
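As a minimal sketch of that extraction step, assuming documents have already been dependency-parsed and coreference-resolved offline and flattened into (verb lemma, dependency role, chain id) triples (a format I’m inventing here purely for illustration), the counting and scoring might look something like this:

    # A minimal sketch of protagonist-based co-occurrence counting, in the
    # spirit of Chambers and Jurafsky. Each document is assumed to have been
    # parsed and coreference-resolved already, and flattened to
    # (verb_lemma, dep_role, chain_id) triples; that format is an
    # illustrative assumption, not any particular tool's output.
    from collections import Counter
    from itertools import combinations
    from math import log

    def count_cooccurrences(documents):
        event_counts = Counter()   # how often each (verb, role) event occurs
        pair_counts = Counter()    # how often two events share a coref chain
        for doc in documents:
            by_chain = {}
            for verb, role, chain in doc:
                by_chain.setdefault(chain, set()).add((verb, role))
            for events in by_chain.values():
                event_counts.update(events)
                # Any two events attached to the same chain share a referent.
                for a, b in combinations(sorted(events), 2):
                    pair_counts[(a, b)] += 1
        return event_counts, pair_counts

    def pmi(pair, event_counts, pair_counts):
        """Pointwise mutual information between two (verb, role) events."""
        a, b = pair
        p_ab = pair_counts[pair] / sum(pair_counts.values())
        p_a = event_counts[a] / sum(event_counts.values())
        p_b = event_counts[b] / sum(event_counts.values())
        return log(p_ab / (p_a * p_b))

The sketch stops at pairwise scores; the schemata themselves come from chaining together events whose pairwise scores are high, which is the part I gloss over here.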

A corpus of news contains all kinds of relationships like those, buried inside the language data itself. Ideally, they represent some sort of shared world knowledge that can be applied to other tasks. Demonstrating that this isn’t mere idealism is what I’m hoping to build my dissertation around at the moment.

Back in the spring, I made my first attempt at this, and it went okay. My hypothesis–one of convenience, mostly–didn’t pan out, but there were interesting trends in the data. That left me with a problem, though: two things to sort out. Was my hypothesis wrong? Was the measure I used to test it suitable for doing so? There was some minor evidence that the measure was suitable, but nothing conclusive.

Instead, I started sniffing around for other hypotheses–things someone else had already thought of, and that might be demonstrable with narrative schemata as an overlying application. Per my typical procrastination, I stumbled upon a recent article on Salon that critiques national press coverage of Rick Perry, claiming that the narratives presented in the national press diverge wildly from those presented in Texas papers.

With the author having shown this qualitatively, the claim is ripe for quantitative replication. It would make a great experiment for demonstrating the veracity of whatever measure I end up devising.

The difficulty comes in with corpus building. There isn’t a corpus of these texts lying around; I’d have to dig them up myself, from numerous scattered sources. Additionally, the number of sources is likely to be limited–I may be able to obtain a few hundred articles if I’m relentless, whereas prior work on schemata began with millions of articles. The robustness of the approach may be questionable in this case.

Of course, the difference in size may be the source of an interesting result in and of itself, but it’s not what I set out to demonstrate when searching for a problem that shows the veracity of my measure.