Thursday, October 4, 2012

Where Speech Recognition Is Going

                 Everyone’s heard of Siri, Apple’s voice recognition software for iPhones that can perform many actions when the user simply speaks into the phone. Voice recognition technology seems most prominent in smartphones, but one of the biggest companies in this industry, Nuance Communications, wants it in many more products. The company is widely believed to be behind Siri’s voice interface and has plans in the works with numerous other companies to bring the technology to other products. Not too long ago, products with touchscreens swept through the markets and became the most desired merchandise. The next trend to sweep through the markets is voice recognition, and Siri is just the beginning.
                 Already, work is being done to build this voice interface into televisions, cars, computers, and even wearable computers. The hope is that someone who is busy driving, or eating dinner while watching TV, would not have to use their hands to perform actions. The driver could address the technology and find directions or the nearest movie theater without having to take his or her eyes off the road. The person eating wouldn't have to put his or her fork down to change the channel or put on a movie. Nuance wants all of these actions to be hands-free, eliminating the need to hit a bunch of buttons on your GPS or punch in the buttons on your TV remote. Simply by speaking, you will have this voice interface perform your task.
                Siri may be a good start, but there are still kinks that need to be worked out. First, Nuance hopes to address privacy concerns that people may have while using such software. Many voice recognition systems raise this concern because they cannot distinguish between different voices. Nuance hopes to offer an option so that only the owner of a product with this technology can speak to it, so that strangers cannot ask a phone questions to find out private information. There are also plans to let the software hear over background noise, or recognize that you are in a conversation and not speaking to it, to prevent accidental communication with the software.
                One of the biggest concerns users would have is whether this software is listening to or recording conversations. Nuance hopes to develop technology that can go from “sleep-mode” to knowing that a person is talking to it. That means it is always paying attention for trigger words that would let it know that a person is asking it to do something. Does that mean the software can hear all conversations at all times? Can this be hacked, and can strangers listen in on private conversations? Also, would private conversations be recorded and stored somewhere when they were not meant to be heard by the phone?
These issues clearly need to be addressed before the improved software is out on the market, but all in all this technology is headed in the right direction. While there are privacy issues, the benefits are substantial: the technology could reach many more products than just phones and computers, and it lets people perform tasks hands-free. It seems that the concerns for privacy can be addressed, and that this technology would be a great addition to many products.
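
For readers curious how the “sleep-mode” idea described above might work, here is a minimal sketch in Python. It is an illustration under loud assumptions, not Nuance's design: the `WakeWordGate` class, the trigger phrase "hello device", and the choice to work on already-transcribed text rather than audio are all made up for this example. The point is only that a device can check each utterance for a trigger and throw everything else away.

```python
from dataclasses import dataclass, field


@dataclass
class WakeWordGate:
    """Toy model of a device that sleeps until it hears a trigger phrase.

    Hypothetical throughout: it works on text that has already been
    transcribed, whereas a real device would have to process audio.
    """
    trigger: str = "hello device"  # made-up trigger phrase
    awake: bool = False
    commands: list = field(default_factory=list)

    def hear(self, utterance: str) -> None:
        text = utterance.lower().strip()
        if not self.awake:
            # Asleep: look only for the trigger; anything else is discarded.
            if text.startswith(self.trigger):
                rest = text[len(self.trigger):].strip(" ,")
                if rest:
                    # A command was spoken right after the trigger.
                    self.commands.append(rest)
                else:
                    # Trigger alone: wake up and wait for the next utterance.
                    self.awake = True
            return
        # Awake: treat this utterance as the command, then go back to sleep.
        self.commands.append(text)
        self.awake = False


gate = WakeWordGate()
gate.hear("so what did you think of the movie")            # ignored, not stored
gate.hear("Hello device, find the nearest movie theater")  # kept as a command
print(gate.commands)
```

In this sketch the dinner-table conversation never reaches `commands`, which is the behavior the privacy questions above are asking about. Whether a real product discards untriggered speech the same way is exactly what would need to be verified.
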



  1. I believe that speech recognition is a great innovation to be used throughout companies. Nuance has many uses for speech recognition software that can increase business efficiency in different ways. It serves as an aid in communication and direction, and it also reduces the time it takes for specific tasks to be done. When employees need to multitask, speech recognition software can become almost like another mind, helping a person get one task done while he or she is performing another.

    This blog discusses the use of speech recognition in televisions, cars, and computers. One example of computer software that I believe is worth mentioning is Nuance’s Dragon software. This software takes the words you speak and turns them into text to create a document, write an e-mail, or search the web. Dragon creates text three times faster than typing by hand, which can reduce the time a task takes in a company and allow the next task to be taken on sooner. It is adaptable enough to be used on the iPhone, iPod, iPad, or a compatible Android device. It applies to many different people with all sorts of motives. It is especially beneficial for busy professionals, bloggers, reporters, inspectors, contractors, consultants, business owners, two-finger typists, and people who have carpal tunnel or arthritis. This software is adaptable not only in its uses, but also in the people who can use it.

    Voice recognition is an interesting innovation. I strongly agree that this type of technology is futuristic. In today’s work environment, the pace is getting faster, the expectations are getting higher, and people need to be able to adapt and perform under stressful conditions. With the help of software like Dragon, people don’t have to worry about their minds working faster than their hands. This software will keep up with their speech and allow them to continue to process thoughts instead of diverting some of their focus to typing.

    Nuance has put a typing challenge on its website to show potential buyers how efficient Dragon really is. The challenge gives you a sentence to type, and as you are typing, a voice recording says the same sentence for Dragon to enter as text. After you have finished typing, the challenge compares how many words per minute you typed with how many Dragon entered. To see how effective this software really is, take the test yourself and see how much time this program could save you:

    As for the privacy aspect of speech recognition technology, I believe it is important to create a settings preference in which the user can decide when to be recorded, which mode he or she would like the recorder to be in, such as sleep mode, and which words he or she would like to make trigger words for the recorder to wake up and start recording. I also think speech recognition devices should hold a log of all recordings, where the user can go in to see what has been recorded, what needs to be deleted, and what can be saved. These adjustments can definitely be worked on and implemented into this sort of technology for full efficiency.

  2. It seems as if every article in the news is about how each company is trying to put itself one step ahead of the competition. When one invention is brought to the market, it is continuously altered to reap its full benefits. Voice recognition took electronics by storm when it became popular through the iPhone 4S with Siri.

    I am all for Nuance trying to take the concept of voice recognition up a notch. When people have the ability to communicate or complete mundane tasks without having to stop their current task, they not only save time but also work more efficiently. The implementation of voice recognition in cars will allow drivers to keep their hands on the wheel at all times and decrease the number of accidents.

    However, I think voice activation, when used with the television to start a movie, for example, encourages laziness. As exemplified in the blog, a person does not have to put down their fork in order to start a movie. I think this is becoming a serious issue in America. People are relying on technology too heavily. According to recent calculations, about 35.7 percent of adults in America are considered obese, along with 16.9 percent of children. I partially blame the temptations of technology for motivating lethargic habits. At some point people need to just get up, “put the fork down,” and do things manually.

    In contrast to this downside of voice recognition, Nuance’s upcoming software has been noted to have a positive effect in the medical industry. It allows doctors to communicate with other branches of the hospital more quickly and cuts down on document turnaround time. When used in a positive light, voice recognition could help society reach great heights.

    As with most technology, I believe its value is based upon its purpose. When it is used to increase effectiveness in the medical field, for example, or to improve the safety of citizens, it is beneficial. When it targets consumers who lack the desire to act for themselves, however, it is not justified.

  3. Chris, great article and post here. I would like to comment on where I believe voice recognition technology stands now as well as some of the questions I have for its future.

    The first thing that comes to my mind when I think of voice recognition these days is Apple’s Siri. Apple being the first company to come out with a phone this interactive got people all over the world excited. Now don’t get me wrong, the information and abilities Siri provides to its user are incredible. From being able to ask it about the weather to being informed about the traffic, Siri seems to do it all. But from my experience and the experiences that those around me have had, I don’t think Siri is really all that great. As I said, the attention it receives is due to the innovation and not necessarily its effectiveness. There are way too many times, in my opinion, where Siri fails to recognize your request or fails to act upon it properly. I know that a particular growing concern with this voice recognition system is that it has trouble picking up on and interpreting requests coming from foreign users. Maybe this is why Professor Tallon elected to go with the Galaxy over the 4S…

    That is my take specifically on Siri, but as for the technology as a whole, my feelings are much more positive. One thing about the technology that I specifically support is how it allows for hands-free driving. Being able to talk on your phone or enter an address into the GPS without using your hands provides an enormous amount of safety for yourself as well as the other drivers out on the road. Additionally, if the technology were used with televisions, users would not have to worry about finding the clicker or washing the mess off their hands to change the channel, as they would be able to interact with their television verbally. Again, some of these advantages may seem minor, but the fast-paced lifestyles most of us live these days call for many things to be done simultaneously. This technology enables users to multitask in a very efficient manner, and whether it provides safety or convenience, I believe people will pay for it.

    To bring this all together, I believe that this technology has a lot of potential, and your post shows us some examples of that. But, in my opinion, especially with things like Siri (as explained above), there is still a lot of “maturing” that the technology has to go through. If a company like Nuance is able to come in and work on perfecting this technology…we might never need our hands again!

  4. Chris, I truly enjoyed reading your article and found it very interesting because “Speech Recognition” is a big topic right now, not only in smartphones but in other devices as well. This type of technological advancement is relevant to many IT organizations, including hardware, software, and professional services firms.

    Websites are also trying to adopt this fascinating technology to help them grow. For example, as of August 2012 Google is beginning to make some changes to its search menu to help make searches faster and more relevant. To achieve this, Google is bringing its Siri-like voice search feature to the iPad and iPhone in an update to its free Google Search app. This updated feature will use speech-recognition technology to understand spoken, natural-language search requests. Though the update has not yet been approved, Google is currently working with Apple to work out the kinks and get it up and running.

    As Chris mentioned, one of the major companies that focus on voice recognition is Nuance. Nuance is a multinational computer software technology corporation that provides speech and imaging applications. Its best-known use of speech recognition is through the Siri voice recognition application on Apple’s iPhone 4S. Nuance provides other voice recognition software as well. For example, Nuance offers “Dragon NaturallySpeaking,” which is a desktop dictation solution. “Dragon NaturallySpeaking” can create documents and emails from speech. The computer can be controlled by one’s voice and can quickly capture one’s thoughts and ideas. Nuance claims that this application gives someone the freedom to interact with documents instead of performing the manual task of typing. Dragon technology can be useful because most of the time we think faster than we can type, so it might help writers avoid writer’s block. Healthcare organizations are also trying to get a grasp on speech recognition to further enable the capture of the patient’s whole clinical story. The patient’s exam would not simply be recorded by the doctor or nurse and then transcribed later by a typist; instead, the recordings would automatically be transcribed in real time as the patient or doctor is speaking. Nuance is also trying to improve call center interactions with its software solutions. The obvious benefit of having this type of technology in the call center is a better customer service experience.

    All of these advancements in voice recognition are a good start to making life faster and more efficient, but as Chris says, there are still kinks that need to be ironed out. Nuance is working to address privacy concerns while these types of software are still used in a limited capacity. Privacy is a main concern when using voice recognition because the software cannot distinguish one voice from another. This can be a major problem if a company or healthcare organization is using voice recognition and patients’ files get mixed up or, even worse, shared with the public. Nuance is working to make it an option for only the owner of the product, or only users approved through a specific security configuration, to use the applications. This way private information will not be shared with the wrong people. Nuance is also trying to tune out background noise and home in on the specific voice. As Chris states, these issues are very important to have sorted out before any truly monumental breakthrough applications are widely used. Overall, Nuance is headed in the right direction, and this technology is creating quite a buzz.


  5. I find speech recognition technology extremely intriguing, but I wondered how accurate Nuance’s software really is. After doing some research, I found a review of Nuance’s “Dragon NaturallySpeaking” program. The author compared it to the Speech Recognition utility that is found for free on most Windows computers. Unsurprisingly, Nuance surpassed Microsoft in the area of accuracy. The author has used Dragon NaturallySpeaking for interviewing purposes for years and can vouch for the improvement in the newest edition.

    A user can improve the accuracy of Nuance’s Dragon program by customizing a profile. The user selects a “speech model,” specifying his or her accent and language. You can also select the type of computer you are using and whether or not you’re using a Bluetooth device. If there is a certain word that Dragon is having trouble recognizing, you can train it by manually typing in the word that you are trying to say. Another way to improve Dragon is by reading certain passages that are included with the program. The more that Dragon is exposed to your voice, the more accurate it is at recording your speech. The author found that using a headset gives the best performance, but this may be inconvenient for some people. I find Nuance’s “Dragon NaturallySpeaking” innovative and fascinating, but I wondered which businesses can benefit from this type of technology.

    Hospitals can gain a significant advantage from voice recognition technology. Sutter Health facilities in Sacramento, California, installed “a new radiology voice recognition system.” By no longer reporting patient diagnoses manually, hospitals can cut costs and save time. As a doctor analyzes a patient’s test results on a computer screen and speaks aloud, the speech is recorded and instantly sent to the patient’s record. The patient’s waiting time is decreased, freeing up hospital space for more patients. There is also less room for error because the program sends reminders to the doctor based on the treatment and diagnosis.

    Overall, I find voice recognition technology today to be extremely accurate. I think that it would be wise for businesses to invest in the equipment and software because it will eventually lead to reduced costs and significant time savings.