Via a Tyler Cowen tweet, Robert Fortner writes,
The accuracy of computer speech recognition flat-lined in 2001, before reaching human levels.
Raw computer power is not enough to solve complex problems. That is why it is silly to predict that we will have brain emulation once a computer has as many logic gates as the human brain has neurons. Five years ago, I noticed a pattern in the over-optimism of Ray Kurzweil’s predictions.
Generally speaking, the more open-ended the problem and the more adaptive the machine needs to be to provide a solution, the farther we are from a technological solution.
READER COMMENTS
Jeremy, Alabama
May 4 2010 at 8:33am
I enjoy Kurzweil’s books, but I agree with you.
Kurzweil’s singularity is not just computation, though. It is biological and nano, too. For instance, he shows that brain scanning resolution is doubling every year, with the prospect that injectable nano machines will take over when non-invasive techniques fall off. Then computers can duplicate the patterns uncovered by high-resolution brain scans. And so on.
There are troubling signs that progress is becoming linear rather than exponential, e.g. computers going multi-core instead of adding MHz, heat dissipation problems, unexpected complexities in gene interaction, commercial and legal frictions in deploying products, political frictions reducing risk-taking and investment, international frictions such as Iran or WMD/terror.
Michael Keenan
May 4 2010 at 8:34am
This was posted on reddit, and the top-ranking comment there (with 234 points) reproduced a comment on the blog post by Jeff Foley. He argued that progress is still being made in speech recognition. I’ll reproduce it here:
[Pasted comment elided. The remainder of Foley’s comment can be found on Fortner’s blog at http://robertfortner.posterous.com/the-unrecognized-death-of-speech-recognition#pcomment_commentunit_2423299. Please do not paste entire comments to EconLog that are available elsewhere. A link with a summary or quote is sufficient.–Econlib Ed.]
mattmc
May 4 2010 at 9:49am
I spent a couple of years researching computational simulations of biological neurons and neural networks. Equating a single neuron with a single logic gate is simply wrong. Even a simulation of a single synapse has to account for things like the concentrations of neurotransmitters in the vesicles on both sides. It took a lot of gates (and a lot of software).
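For a sense of scale, here is a minimal sketch of roughly the crudest synapse-plus-neuron simulation one can write; all the constants are illustrative, not a real biophysical model. Even this crude version has to track continuous concentrations and voltages evolving over time, which is already far more state than a single logic gate holds.

```python
# Illustrative constants only; a real model would track many more quantities.
dt = 0.1            # time step, ms
tau_nt = 5.0        # ms, decay of neurotransmitter concentration in the cleft
tau_m = 20.0        # ms, membrane time constant of the postsynaptic neuron
v_thresh = 0.3      # arbitrary firing threshold

nt = 0.0            # neurotransmitter concentration (arbitrary units)
v = 0.0             # postsynaptic membrane potential (arbitrary units)
presyn_spike_steps = {50, 55, 60}   # presynaptic spikes arrive at these steps

for step in range(1000):
    if step in presyn_spike_steps:
        nt += 1.0                    # vesicle release bumps cleft concentration
    nt -= dt * nt / tau_nt           # reuptake/diffusion: exponential decay
    v += dt * (nt - v) / tau_m       # synaptic drive charges a leaky membrane
    if v >= v_thresh:
        print(f"postsynaptic spike at t = {step * dt:.1f} ms")
        v = 0.0                      # reset after firing
```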
Point two, I think some of the statistical work being done by Google on the speech recognition front is pretty good, and unpublished. The progress they have been making using statistical techniques for machine translation is also impressive. The Google Voice service is getting better. I just got these messages yesterday:
“Hi Matt, It’s Mom, Could you give me a call. Thanks. Love you.”
“Hi Matt, It’s George, I was just calling to let you know the car is ready to go if you could pick it up anytime until 9 o’clock. Thanks bye bye. “
Sam
May 4 2010 at 9:59am
Yes, what mattmc says above. The brain is not an electrical system; it’s an electrochemical system. Once you start having to model the chemistry you can go arbitrarily far “down” — is it enough to model concentrations of neurotransmitters, or do you need to account for the possibility that different neurotransmitters have different effects? Do you need to account for drugs possibly blocking neuroreceptors? Is it enough to use a high-level model of the dynamics, or do you need to model the quantum chemistry of the active site, the drug, and the neurotransmitter to accurately predict the dynamics?
mdb
May 4 2010 at 12:04pm
That should read: generic speech recognition flat-lined in 2001.
The programs that you “train” to your voice work amazingly well.
Chris T
May 4 2010 at 1:03pm
I’ve been arguing this for a while now. Transistors and neurons are not comparable (Kurzweil and his followers also ignore all of the other biochemical processes going on in the brain at any given time). Thankfully we don’t need them to be equivalent. It’s surprising how many things you can reduce to logical sequences.
As far as science/technology slowing down, I’m not seeing it.
Josh
May 4 2010 at 4:41pm
I have a theory that there is a limit to how sophisticated a computer program can be for a given amount of programming work, and that the relationship is roughly logarithmic (i.e. exponentially more programming complexity for each linear increase in functionality).
Computers today have very limited abilities – interestingly, most of their ability is in things that humans are terrible at like math – and the programming is relatively straightforward.
As we try to do more complex things like pattern recognition (of which speech recognition is one example), what you’d love is for the programming of such an algorithm to be basically the same as for doing mathematical tasks, just more so. But I have a feeling that as you approach human levels of sophistication, you’d be just as happy programming at the genetic level as trying to get a computer to do it. And that we may find that the human brain is about as simple a device to program for its given level of complexity as is theoretically allowable.
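A hedged way to make that conjecture concrete: if functionality grows only logarithmically with programming effort, then effort grows exponentially with functionality, so each extra increment of capability costs several times as much work as the last. The numbers below are purely illustrative, not measurements.

```python
# Purely illustrative: if functionality ~ log(effort), then effort ~ exp(functionality),
# so each extra unit of capability costs roughly e times as much programming work.
import math

for functionality in range(1, 6):
    effort = math.exp(functionality)
    print(f"functionality {functionality}: ~{effort:,.0f} units of programming effort")
```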
MernaMoose
May 5 2010 at 2:12am
Josh,
Interesting theory. My pet theory is that software has not yet reached the Middle Stone Age on the evolutionary scale. Think “bear skin wrapped around your waist, wooden club in hand”. But I think the computers we’ve developed to date are only a first approximation. Who knows what comes next.
As far as progress slowing down, in a sense it is. But it’s easy to misunderstand the causes.
CPU speeds have been flat for a few years now. But it’s not because we don’t know how to make them go faster. It’s that nobody has found an economically viable way to implement faster CPUs. The cooling systems just get too complex, expensive, and unreliable.
We will find a way past this temporary bottleneck. But this problem is far different from the problem of, say, the Romans not building steam engines because they had no freaking clue how to do it.
In the world of pattern recognition, speech patterns are one of the easiest problems to tackle. You have signal amplitude and frequency over time to worry about. That’s a lot simpler than 3-D temporal imaging problems, for example trying to get software to use a camera and pick a face out of a crowd of people walking by.
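To make that concrete, here is a sketch of the representation recognizers typically start from: a spectrogram, which is just amplitude broken out by frequency over time. It assumes NumPy and SciPy are available, and the file name is hypothetical.

```python
# Assumes SciPy is installed; "utterance.wav" is a hypothetical input file.
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, samples = wavfile.read("utterance.wav")
freqs, times, power = spectrogram(samples, fs=rate, nperseg=400, noverlap=240)

# 'power' is a 2-D array: rows are frequency bins, columns are short time windows.
# Acoustic models work from features like these, which is one reason speech is a
# lower-dimensional problem than picking a face out of 3-D video of a moving crowd.
print(power.shape)
```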
We’re pushing up against harder problems than ever before. But it remains true that regular progress continues to be made. I can do things with computers today that were nigh-impossible even five years ago. For example, home video editing is now coming within reach.
Changes in technology have forced changes in how business is done. This is forcing changes in how our whole social order works. I believe social adaptation lags at the tail end, and manifests itself as friction.
Ignorance (today in the form of nay-saying environmentalists who are simply and expressly opposed to technological advances) and the growth of The State and socialism are the biggest obstacles that I see in front of us.
But ignorance, combined with the fact that we really haven’t figured out how to govern ourselves very well, has been our biggest obstacle since roughly the beginning of recorded history.
Mark Bahner
May 5 2010 at 12:35pm
If this post confirmed my biases, I would let it slide. But it conflicts with my bias, which is that “Massive improvements in hardware alone can overcome limited improvements in software.” So I won’t give the “flat line since 2001” a free pass.
First of all, as is pointed out by a commenter, Robert Fortner’s graph was a pretty egregious cherry-pick of the reference that Fortner cited:
“The rest of the story”…showing Fortner’s cherry-pick
Fortner chose the “Switchboard” curve, which ENDED in 2002. So obviously that curve could have “flatlined” in 2001, and it wouldn’t mean anything. Notice how Fortner did not reproduce the “conversational speech” curve, which continues to 2004, and shows improvement through 2004.
Perhaps more important, the whole figure that Fortner references could be interpreted in many ways, since there are many curves that start and stop at various times.
As also pointed out by mdb, there is the question of whether these results are “trained” versus “untrained” voice recognition.
I simply can’t believe that a computer with massively more RAM (a 2010 personal computer versus a 2001 personal computer) and a faster, multi-core processor can’t dramatically improve the performance of speech recognition software, even if that software had not changed a bit. And it makes sense that the speech recognition software would be revised to take advantage of the improvements in microprocessors and RAM.
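One hedged illustration of how raw hardware can translate into accuracy: recognition decoders keep a beam of candidate hypotheses at each time step, and more RAM and faster processors allow a wider beam, which means fewer search errors even if the underlying models are unchanged. The sketch below is a generic beam search, not Nuance’s or Google’s actual decoder; the expand function and scoring scheme are assumed for illustration.

```python
# Generic beam-search sketch; 'expand' and its scoring are assumptions, not a real API.
from heapq import nlargest

def beam_search(frames, expand, beam_width):
    """frames: a sequence of acoustic frames; expand(hyp, frame) yields
    (new_hyp, step_score) pairs. Returns the best-scoring hypothesis."""
    beam = [((), 0.0)]                       # (partial transcript, log score)
    for frame in frames:
        candidates = []
        for hyp, score in beam:
            for new_hyp, step_score in expand(hyp, frame):
                candidates.append((new_hyp, score + step_score))
        # More memory and a faster CPU permit a larger beam_width here, one way
        # 2010 hardware can beat 2001 hardware running essentially similar software.
        beam = nlargest(beam_width, candidates, key=lambda c: c[1])
    return max(beam, key=lambda c: c[1])[0]
```

Running the same search with beam_width=16 versus beam_width=512 is exactly the “better hardware, similar software” trade-off described above.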
Here’s an article in which Nuance claims out-of-the-box 99% accuracy for Version 9 of Naturally Speaking:
Nuance discusses latest version of Naturally Speaking
From that article:
“The accuracy rate, or what percentage of words the software spells correctly by itself, varies depending on sound quality and how a person talks, Revis said. But Nuance has improved it by 20 percent since NaturallySpeaking 8 was introduced in 2004, according to the company.
Version 8 could reach 99 percent, but only after the user read a prepared script, Revis said. Now users can get that level of accuracy right after installing the software and starting it up, though a script is still available if a user isn’t satisfied with the results on the first try. In any case, the software can continue learning on its own just through normal use, Revis added.”
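A hedged reading of those numbers: if “improved it by 20 percent” means a 20 percent relative reduction in word error rate (a common way such gains are reported), the figures are at least arithmetically consistent. The version 8 out-of-box accuracy below is an assumption for illustration, not a number from the article.

```python
# Hedged arithmetic check; the version 8 out-of-box accuracy is assumed, not reported.
v8_accuracy = 0.9875                 # illustrative assumption
v8_error = 1 - v8_accuracy           # 1.25% word error rate
v9_error = v8_error * 0.8            # 20% relative reduction in errors
print(f"implied version 9 accuracy: {1 - v9_error:.4f}")   # prints 0.9900
```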
Comments are closed.