Speech Recognizer API – what do you need to know?

In iOS 10 we can use the SFSpeechRecognizer API, which allows transcription of live speech or pre-recorded audio files. The result of such transcription is not only the text itself, but also alternative interpretations of the audio, the duration of spoken words and a confidence level for each recognized word (in the range 0.0 – 1.0). The API supports more than 50 languages. Using the SFSpeechRecognizer API in an application is straightforward and boils down to four steps.

Adding appropriate keys along with their descriptions to the Info.plist file

  • NSSpeechRecognitionUsageDescription – a key whose description tells the user what the application will use speech recognition for.
  • NSMicrophoneUsageDescription – required if you use the microphone to analyze live speech (see the example below).
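
For illustration, the corresponding Info.plist entries could look like this (the description strings below are placeholders – they are shown to the user in the permission prompts, so adjust them to your application):

<key>NSSpeechRecognitionUsageDescription</key>
<string>Your speech will be transcribed to text within the app.</string>
<key>NSMicrophoneUsageDescription</key>
<string>The microphone is used to capture live speech for transcription.</string>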

Requesting the user's permission to use SFSpeechRecognizer

import Speech

SFSpeechRecognizer.requestAuthorization { authStatus in
    /*
        The callback may not be called on the main thread. Add an
        operation to the main queue to update the record button's state.
    */
    OperationQueue.main.addOperation {
        switch authStatus {
        case .authorized:
            //..
            break
        case .denied:
            //..
            break
        case .restricted:
            //..
            break
        case .notDetermined:
            //..
            break
        }
    }
}

Creating a speech recognition request

There are two types of such a request:

  • SFSpeechURLRecognitionRequest – used to analyze speech from a previously recorded audio file
  • SFSpeechAudioBufferRecognitionRequest – used to analyze live speech (using the microphone)

For both request types you will need the SFSpeechRecognizer class to perform the analysis of the speech.

//..
let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "pl-PL"))
//..
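
Note that SFSpeechRecognizer(locale:) returns nil for unsupported locales (the full list is available via SFSpeechRecognizer.supportedLocales()), and even a supported recognizer may be temporarily unavailable, for example without a network connection. A minimal sketch of such a check, using the same "pl-PL" locale as above:

// Verify the locale is supported and the recognizer can currently be used.
if let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "pl-PL")), recognizer.isAvailable {
    //.. safe to create and start a recognition request
} else {
    //.. fall back, e.g. disable the record button
}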

SFSpeechURLRecognitionRequest

Handling such a request comes down to a few lines of code:

let fileURL = URL(fileURLWithPath: Bundle.main.path(forResource: "audio", ofType: "mp3")!)
let request = SFSpeechURLRecognitionRequest(url: fileURL)
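
Creating the request by itself does not start the transcription – it still has to be handed to the recognizer. A minimal sketch, using the speechRecognizer instance created earlier:

speechRecognizer?.recognitionTask(with: request) { result, error in
    if let result = result, result.isFinal {
        // The whole file has been processed.
        print(result.bestTranscription.formattedString)
    }
    //.. handle error
}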

SFSpeechAudioBufferRecognitionRequest

Handling analysis of live speech requires a little more work. You additionally need AVAudioEngine to capture audio from the device's microphone. The last element necessary for speech recognition is SFSpeechRecognitionTask.

//..
recognitionRequest = SFSpeechAudioBufferRecognitionRequest()

guard let inputNode = audioEngine.inputNode else { fatalError("Audio engine has no input node") }
guard let recognitionRequest = recognitionRequest else { fatalError("Unable to create a SFSpeechAudioBufferRecognitionRequest object") }

// Configure the request so that results are returned before the audio recording is finished
recognitionRequest.shouldReportPartialResults = true

// A recognition task represents a speech recognition session.
// We keep a reference to the task so that it can be cancelled.
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { result, error in
    //..
}

// Feed the microphone audio into the recognition request.
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
    self.recognitionRequest?.append(buffer)
}

audioEngine.prepare()
try audioEngine.start()

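When the user finishes speaking, the session should be closed explicitly – stop the engine, remove the tap and tell the request that no more audio will arrive. A sketch, assuming the properties from the snippet above:

// Stop capturing audio and signal that the audio stream has ended.
audioEngine.stop()
audioEngine.inputNode?.removeTap(onBus: 0)
recognitionRequest?.endAudio()
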
SFSpeechRecognitionResult includes bestTranscription as well as alternative transcriptions.
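
Each transcription consists of SFTranscriptionSegment values carrying the per-word details mentioned at the beginning – substring, timestamp, duration and confidence. A sketch of reading them inside the result handler:

if let result = result {
    // The most likely interpretation of the audio.
    print(result.bestTranscription.formattedString)

    // Per-word details of the best transcription.
    for segment in result.bestTranscription.segments {
        print(segment.substring, segment.timestamp, segment.duration, segment.confidence)
    }

    // Alternative interpretations of the same audio.
    for transcription in result.transcriptions {
        print(transcription.formattedString)
    }
}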

