In the class I’ve been teaching this summer, we’ve spent the last few days working with a parsed version of the Donald Trump speech corpus that Ryan McDermott posted to GitHub a few days ago. One of my students mentioned that Donald Trump had made a speech where he said, quote, “Bing bing, bong bong bong, bing bing.”
I was wondering if this particular speech was actually in the corpus. As a teaching activity, we started searching for instances of /[Bb][io]ng/. I also wanted to see what the parser would do with a string like “bing bong bing bing bong”. It seemed possible that the parser would treat this as a normal sentence and produce something like:
[NP bing bong] [VP bing [NP bing bong]]
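For anyone who wants to try the search at home, here is a minimal sketch in Python. It is not the exact code from class: the function names and the sample string are made up for illustration, and the `\b` word boundaries are an addition to the pattern above so that words like “bombing” don’t count as hits.

```python
import re

# The pattern from the post, wrapped in word boundaries (an addition)
# so that "bombing" or "climbing" are not reported as matches.
BING_BONG = re.compile(r"\b[Bb][io]ng\b")

# A run of bing/bong tokens separated only by spaces, commas, or periods.
CLUSTER = re.compile(r"\b[Bb][io]ng\b(?:[\s,.]+[Bb][io]ng\b)*")


def find_bings(text):
    """Return every bing/bong token in `text`, in order of appearance."""
    return BING_BONG.findall(text)


def find_clusters(text):
    """Return each uninterrupted run of bing/bong tokens as one string."""
    return [match.group(0) for match in CLUSTER.finditer(text)]


# Hypothetical sample: the first sentence is invented, the rest is adapted
# from quotes in the corpus.
sample = "The bombing stopped. You press a button. Bing, bing. Then bing, bing, bing."

print(len(find_bings(sample)))
print(find_clusters(sample))
```

Grouping the hits into runs is what makes the “clusters of three” easy to spot, since each run comes back as a single string.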
Another student asked why we were doing this: why search for such an obscure, nonsense lexical item when we could be searching for something that is actually meaningful?
The answer I had, in part, was that it’s not that obscure. As it turns out, these items are quite characteristic of Trump’s speech. In this corpus alone, which lacks the famous original “bing bing, bong bong” speech cited above, “bing” appears 24 times (16 if you remove duplicates), often in clusters of three:
“And that’s what we ended up getting–the king of teleprompters. But, so when I look at these things here I say you know what, it’s so much easier, it would be so nice, just bah, pa, bah, pa, bah, bing, bing, bing. No problems, get off stage, everybody falls asleep and that’s the end of that. But we have to do something about these teleprompters.”
“I hear where they don’t want me to use the hairspray. They want me to use the pump because the other one, which I really like better than going bing, bing, bing, and then it comes out in big globs, right? And then you’re stuck in your hair and you say, ‘Oh my God, I have to take a shower again. My hair’s all screwed up.’ ”
“You know, in the old days everything was better right? The car seats. You’d sit in your car and you want to move forward and back, you press a button. Bing, bing. Now, you have to open up things, press a computer, takes you 15 minutes.”
“You know, when you have so many people running – we had 17 and then they started to drop. Ding. Bing. I love it. I love it.”
“On the budget – I’m really good at these things – economy, budgets. I sort of expected this. On the budget, Trump – this is with 15 people remaining – Trump 51%. Everyone else bing.”
“In Paris, I call him the guy with the dirty filthy hat. Okay? Not a smart guy. A dummy. Puts people in there – mastermind – bing, bing, bing, it’s like shooting everybody. You’ve got to be a mastermind.”
“I was like the establishment. They’d all come to me, and I’d give them all money I write checks sometimes to Senators whatever the max – bing, bing, bing.”
The communicative goals of these tokens could fill an entire discourse paper, but let’s just stick with the basics for now. He seems to use “bing” to indicate some kind of quick, repetitive action. It doesn’t seem to carry any particular sentiment, since the contexts are all over the map: bribing senators, competitors dropping out of the race, committing mass murder, conveniently moving a car seat, being annoyed with pump-style hairspray, politicians reading off teleprompters.
It’s undoubtedly characteristic of his speech, though. To say that it’s a mere aberration, something to ignore, is prescriptive. If we look at counts of lemmas throughout the corpus (using spaCy, which is a little easier to break out than digging through CoreNLP’s XML), the lemma “bing” appears only 11 times; the other 13 tokens were lemmatized as “be.” In those cases, the lemmatizer assumed “bing” was a VBG (a gerund or present participle), essentially a misspelling of “being.”
Counting all 24 occurrences, Trump said “bing” more often in this corpus than he said any of the following:
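One way to catch this kind of lemmatizer merge is to tabulate, for each surface form, which lemmas it was assigned. The sketch below is a hedged illustration rather than the code behind the numbers above: `lemmas_by_surface` is a made-up helper, and the toy `pairs` list is invented. With spaCy, the pairs could be built from a processed document as `(tok.text, tok.lemma_)` for each token.

```python
from collections import Counter, defaultdict


def lemmas_by_surface(pairs):
    """Map each lowercased surface form to a Counter of its assigned lemmas."""
    table = defaultdict(Counter)
    for text, lemma in pairs:
        table[text.lower()][lemma.lower()] += 1
    return table


# Invented toy data standing in for real (token text, lemma) pairs.
pairs = [
    ("Bing", "bing"),
    ("bing", "be"),    # the lemmatizer treated this one as "being"
    ("bing", "bing"),
    ("being", "be"),
]

table = lemmas_by_surface(pairs)

# Any surface form with more than one lemma deserves a second look.
suspicious = {form: dict(lemmas) for form, lemmas in table.items() if len(lemmas) > 1}
print(suspicious)
```

On the real corpus, a table like this should show “bing” split between the lemmas “bing” and “be,” which is the 11 versus 13 split described above.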
- situation: 23
- donor: 21
- dangerous: 21
- migration: 20
- weak: 20
- economic: 19
- freedom: 18
- mexican: 18
- illegally: 14
- muslim: 13
- god: 11
- kasich: 11
- bernardino: 10
- criminal: 9
- hispanic: 9
- chinese: 8
Among many, many other word types. You can get the full list of lemma counts here (when I get around to posting it), though note that “bing” appears at 11 in that list because the other 13 tokens were erroneously merged with “be.”
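The comparison above boils down to filtering a count table by a threshold. Here is a small sketch, assuming the counts are already in a dictionary; `rarer_than` is a made-up name, and the dictionary holds only a handful of the figures quoted in this post.

```python
def rarer_than(counts, target):
    """Return (word, count) pairs whose count is below `target`'s, highest first."""
    threshold = counts[target]
    below = [(word, n) for word, n in counts.items() if n < threshold]
    # Sort by descending count, breaking ties alphabetically.
    return sorted(below, key=lambda item: (-item[1], item[0]))


# A few of the counts quoted in the post, with "bing" at its surface count.
counts = {"bing": 24, "situation": 23, "donor": 21, "muslim": 13, "god": 11}
print(rarer_than(counts, "bing"))

# With the merged lemma count instead, every word here drops off the list.
merged = dict(counts, bing=11)
print(rarer_than(merged, "bing"))
```

The second call shows why the merge matters: at 11, “bing” no longer outranks any of these words.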
To go back to the critical student’s original question, though, it’s a difference in expectations, I suspect. While NLP tools are helpful, they don’t totally address the problem of meaning in text. Meaning is still in large part up to the programmer using the tool, not the tool itself. There’s still a lot of work to be done in that regard, in any application. Sometimes “bing bing bong bong” is really the best we can do.