Speech Recognizer API – what do you need to know?
In iOS 10 we can use the SFSpeechRecognizer API, which transcribes speech either in real time or from pre-recorded audio files. The result of a transcription is not only text, but also alternative interpretations of the audio, the duration of spoken words and a confidence level for each recognized word (in the range 0.0 to 1.0). The API supports more than 50 languages. Using SFSpeechRecognizer in an application is straightforward and boils down to four steps.

Adding the appropriate keys, with their descriptions, to Info.plist

  • NSSpeechRecognitionUsageDescription – explains to the user what the application will use speech recognition for.
  • NSMicrophoneUsageDescription – required if you use the microphone to analyze live speech.
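iOS terminates an app that requests speech recognition or microphone access without the matching usage description, so it can help to verify the keys early, for example in a debug build. A minimal sketch under that assumption: missingUsageDescriptions(in:keys:) is a hypothetical helper, and only Bundle.object(forInfoDictionaryKey:) comes from Foundation.

```swift
import Foundation

// The two keys described above; NSMicrophoneUsageDescription matters only for live speech.
let requiredUsageKeys = ["NSSpeechRecognitionUsageDescription", "NSMicrophoneUsageDescription"]

/// Hypothetical helper: returns the keys that have no non-empty string value
/// in the given bundle's Info.plist.
func missingUsageDescriptions(in bundle: Bundle, keys: [String]) -> [String] {
    return keys.filter { key in
        guard let text = bundle.object(forInfoDictionaryKey: key) as? String else {
            return true // key is absent or its value is not a string
        }
        return text.isEmpty
    }
}

// Check once at startup and report any missing descriptions.
let missing = missingUsageDescriptions(in: .main, keys: requiredUsageKeys)
if !missing.isEmpty {
    print("Info.plist lacks usage descriptions for: \(missing.joined(separator: ", "))")
}
```

The check only reads the bundle's Info.plist, so it is cheap enough to run on every launch during development.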

Requesting permission to use SFSpeechRecognizer

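import Speech

SFSpeechRecognizer.requestAuthorization { authStatus in
    // The callback may not be called on the main thread. Add an
    // operation to the main queue to update the record button's state.
    OperationQueue.main.addOperation {
        switch authStatus {
        case .authorized:
            break // Access granted: enable the record button.
        case .denied:
            break // The user denied access to speech recognition.
        case .restricted:
            break // Speech recognition is restricted on this device.
        case .notDetermined:
            break // Authorization has not been requested yet.
        }
    }
}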
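Calling requestAuthorization on every launch is not necessary: SFSpeechRecognizer.authorizationStatus() returns the current status without showing a prompt. A minimal sketch that asks only while the status is still .notDetermined; ensureSpeechAuthorization(completion:) is a hypothetical helper built on those two framework calls.

```swift
import Speech

/// Hypothetical helper: prompts only when the user has not been asked yet and
/// reports on the main queue whether speech recognition may be used.
func ensureSpeechAuthorization(completion: @escaping (Bool) -> Void) {
    // Always deliver the answer on the main queue, whichever path is taken.
    let report: (Bool) -> Void = { granted in
        OperationQueue.main.addOperation { completion(granted) }
    }
    switch SFSpeechRecognizer.authorizationStatus() {
    case .authorized:
        report(true)
    case .notDetermined:
        // Shows the system prompt; the callback may run on a background queue.
        SFSpeechRecognizer.requestAuthorization { status in report(status == .authorized) }
    default:
        // .denied or .restricted: recognition stays unavailable until the user
        // or the device policy changes it.
        report(false)
    }
}
```

A view controller could call this before enabling its record button and simply toggle the button in the completion closure.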

Creating a speech recognition request

There are two types of such requests:

  • SFSpeechURLRecognitionRequest – used to analyze speech from a previously recorded audio file.
  • SFSpeechAudioBufferRecognitionRequest – used to analyze live speech (from the microphone).

Both requests need an SFSpeechRecognizer instance to perform the analysis of the speech. It is created for a specific locale, here Polish:

let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "pl-PL"))
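SFSpeechRecognizer(locale:) is a failable initializer: it returns nil for an unsupported locale, and even a valid recognizer reports through isAvailable whether it can be used at the moment. A minimal sketch of checking both before recording; makeRecognizer(identifier:) is a hypothetical helper, while supportedLocales(), isAvailable and locale belong to SFSpeechRecognizer.

```swift
import Speech

/// Hypothetical helper: returns a recognizer for `identifier` only if the locale is
/// supported and recognition is available right now.
func makeRecognizer(identifier: String) -> SFSpeechRecognizer? {
    // init?(locale:) returns nil when the locale is not supported.
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: identifier)) else {
        return nil
    }
    // isAvailable can be false temporarily, e.g. when server-based recognition is unreachable.
    return recognizer.isAvailable ? recognizer : nil
}

// Identifiers of all supported locales, e.g. to populate a language picker.
let supported = SFSpeechRecognizer.supportedLocales().map { $0.identifier }.sorted()
print("\(supported.count) supported locales")

if let recognizer = makeRecognizer(identifier: "pl-PL") {
    print("Ready to recognize speech in \(recognizer.locale.identifier)")
}
```

Because availability can change while the app runs, an object adopting SFSpeechRecognizerDelegate can also be set as the recognizer's delegate to receive speechRecognizer(_:availabilityDidChange:).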


Creating a request for a pre-recorded audio file takes just a few lines of code:

import Speech
// Locate the pre-recorded file audio.mp3 in the app bundle.
let fileURL = Bundle.main.url(forResource: "audio", withExtension: "mp3")!
let request = SFSpeechURLRecognitionRequest(url: fileURL)
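The file request is then passed to the recognizer's recognitionTask(with:resultHandler:) method, which returns an SFSpeechRecognitionTask and calls the handler with results. A minimal sketch, assuming the caller only needs the final text; transcribeFile(at:with:completion:) is a hypothetical helper, while the Speech calls are the framework's own.

```swift
import Speech

/// Hypothetical helper: transcribes a pre-recorded file and reports the best final
/// transcription, or the error, through `completion`.
@discardableResult
func transcribeFile(at fileURL: URL,
                    with recognizer: SFSpeechRecognizer,
                    completion: @escaping (Result<String, Error>) -> Void) -> SFSpeechRecognitionTask {
    let request = SFSpeechURLRecognitionRequest(url: fileURL)
    request.shouldReportPartialResults = false // only the final result is needed
    // Keep the returned task if recognition may need to be cancelled later.
    return recognizer.recognitionTask(with: request) { result, error in
        if let error = error {
            completion(.failure(error))
        } else if let result = result, result.isFinal {
            completion(.success(result.bestTranscription.formattedString))
        }
    }
}

// Usage with the locale and file name from the article.
if let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "pl-PL")),
   let url = Bundle.main.url(forResource: "audio", withExtension: "mp3") {
    transcribeFile(at: url, with: recognizer) { outcome in
        switch outcome {
        case .success(let text): print(text)
        case .failure(let error): print("Recognition failed: \(error)")
        }
    }
}
```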


Analyzing live speech requires a little more work: you additionally need AVAudioEngine to capture audio from the device's microphone. The last element necessary for speech recognition is SFSpeechRecognitionTask, which runs a request through the recognizer and delivers its results.

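import Speech
// Inside a throwing method; audioEngine, speechRecognizer (optional), recognitionRequest and recognitionTask are properties.
recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
let inputNode = audioEngine.inputNode // non-optional since the iOS 11 SDK
guard let recognitionRequest = recognitionRequest else { fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object") }
// Configure request so that results are returned before audio recording is finished
recognitionRequest.shouldReportPartialResults = true
// A recognition task represents a speech recognition session.
// We keep a reference to the task so that it can be cancelled.
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest) { result, error in
    if let result = result { print(result.bestTranscription.formattedString) }
    if error != nil || result?.isFinal == true {
        self.audioEngine.stop() // recognition ended or failed: stop capturing audio
        inputNode.removeTap(onBus: 0)
    }
}
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
    recognitionRequest.append(buffer) // feed microphone audio to the request
}
audioEngine.prepare()
try audioEngine.start()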
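The live-speech code starts the engine but does not show how recording ends. A minimal sketch of stopping it, following the common pattern of stopping the engine and calling endAudio() on the buffer request; stopRecording is a hypothetical helper whose parameters correspond to audioEngine, recognitionRequest and recognitionTask.

```swift
import Speech

/// Hypothetical helper: stops live capture and either lets the recognizer finish the
/// speech it has already received or abandons recognition.
func stopRecording(audioEngine: AVAudioEngine,
                   request: SFSpeechAudioBufferRecognitionRequest?,
                   task: SFSpeechRecognitionTask?,
                   discardResults: Bool = false) {
    audioEngine.stop()      // stop pulling audio from the microphone
    request?.endAudio()     // mark the end of audio input for the request
    if discardResults {
        task?.cancel()      // abandon recognition without waiting for a final result
    }
    // Without cancel(), the task processes the audio already received and delivers a
    // final result to its handler, a natural place to remove the input tap.
}
```

A record button would typically call this when the user taps it a second time, passing discardResults: true only when the user explicitly cancels.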

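The SFSpeechRecognitionResult passed to the task's result handler contains bestTranscription, the most likely transcription, and transcriptions, an array of alternative transcriptions.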
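As the introduction mentions, a result carries more than plain text: each SFTranscription is split into SFTranscriptionSegment values exposing substring, timestamp, duration and confidence. A minimal sketch that prints the alternatives and the per-word details of the best transcription; describe(_:) is a hypothetical helper.

```swift
import Speech

/// Hypothetical helper: prints the alternative transcriptions of a result and, for the
/// best transcription, each recognized word with its timing and confidence.
func describe(_ result: SFSpeechRecognitionResult) {
    // transcriptions are sorted in descending order of confidence.
    for (index, transcription) in result.transcriptions.enumerated() {
        print("#\(index): \(transcription.formattedString)")
    }
    for segment in result.bestTranscription.segments {
        // timestamp and duration are in seconds; confidence ranges from 0.0 to 1.0
        // and may be 0 in partial (non-final) results.
        print("\(segment.substring): starts at \(segment.timestamp)s, "
              + "lasts \(segment.duration)s, confidence \(segment.confidence)")
    }
}
```

Inside a recognition task's result handler this could be called as if let result = result, result.isFinal { describe(result) }, so that confidence values are reported for the final result.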
