Speech Recognizer API – what do you need to know?

Adding appropriate keys, along with their descriptions, to the Info.plist file
- NSSpeechRecognitionUsageDescription – a key whose description tells the user why the app uses SFSpeechRecognizer.
- NSMicrophoneUsageDescription – required if you use the microphone to analyze live speech.
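A minimal Info.plist fragment for these two keys might look like this (the description strings are placeholders; write your own user-facing explanations):

```xml
<key>NSSpeechRecognitionUsageDescription</key>
<string>The app transcribes your speech to text.</string>
<key>NSMicrophoneUsageDescription</key>
<string>The app records audio to recognize live speech.</string>
```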
Requesting authorization for SFSpeechRecognizer
SFSpeechRecognizer.requestAuthorization { authStatus in
    /*
       The callback may not be called on the main thread. Add an
       operation to the main queue to update the record button's state.
    */
    OperationQueue.main.addOperation {
        switch authStatus {
        case .authorized:
            //..
        case .denied:
            //..
        case .restricted:
            //..
        case .notDetermined:
            //..
        }
    }
}
Creating a speech recognition request
There are two types of such a request:
- SFSpeechURLRecognitionRequest – used to analyze speech from a previously recorded audio file.
- SFSpeechAudioBufferRecognitionRequest – used to analyze live speech (captured with a microphone).
For both request types, an instance of the SFSpeechRecognizer class is needed to perform the analysis.
//..
let speechRecognizer = SFSpeechRecognizer(locale: Locale(identifier: "pl-PL"))
//..
SFSpeechURLRecognitionRequest
Handling such a request comes down to a few lines of code:
let fileURL = URL(fileURLWithPath: Bundle.main.path(forResource: "audio", ofType: "mp3")!)
let request = SFSpeechURLRecognitionRequest(url: fileURL)
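Putting the pieces together, here is a sketch of transcribing a bundled audio file (the file name, locale, and result handling are illustrative assumptions):

```swift
import Speech

func transcribeBundledFile() {
    // Hypothetical bundled resource; replace with your own file
    guard let path = Bundle.main.path(forResource: "audio", ofType: "mp3") else { return }
    let fileURL = URL(fileURLWithPath: path)

    // The recognizer may be nil for unsupported locales, or temporarily unavailable
    guard let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "pl-PL")),
          recognizer.isAvailable else { return }

    let request = SFSpeechURLRecognitionRequest(url: fileURL)
    recognizer.recognitionTask(with: request) { result, error in
        if let result = result, result.isFinal {
            // The full transcription of the file
            print(result.bestTranscription.formattedString)
        } else if let error = error {
            print("Recognition failed: \(error)")
        }
    }
}
```

Note the `isAvailable` check: recognition can be unavailable even for a supported locale, e.g. without a network connection.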
SFSpeechAudioBufferRecognitionRequest
Handling live speech analysis requires a little more work. You additionally need AVAudioEngine to capture speech from the device's microphone. The next and final element necessary for speech recognition is SFSpeechRecognitionTask.
//..
recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
// In current SDKs, inputNode is non-optional, so no guard is needed here
let inputNode = audioEngine.inputNode
guard let recognitionRequest = recognitionRequest else {
    fatalError("Unable to create a SFSpeechAudioBufferRecognitionRequest object")
}

// Configure the request so that results are returned before the audio recording is finished
recognitionRequest.shouldReportPartialResults = true

// A recognition task represents a speech recognition session.
// We keep a reference to the task so that it can be cancelled.
recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
    //..
}

let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer: AVAudioPCMBuffer, when: AVAudioTime) in
    self.recognitionRequest?.append(buffer)
}

audioEngine.prepare()
try audioEngine.start()
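When the user finishes speaking, the session should be torn down. A sketch of the cleanup, assuming the same audioEngine, recognitionRequest, and recognitionTask properties as above:

```swift
func stopRecording() {
    audioEngine.stop()
    audioEngine.inputNode.removeTap(onBus: 0)

    // Signal that no more audio is coming; the task can then deliver a final result
    recognitionRequest?.endAudio()

    recognitionRequest = nil
    recognitionTask = nil
}
```

Calling `endAudio()` lets the task finish gracefully with a final transcription; use `recognitionTask?.cancel()` instead if you want to discard the session immediately.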
SFSpeechRecognitionResult includes bestTranscription as well as an array of alternative transcriptions.
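Inside the recognition callback you can inspect these properties; a brief sketch:

```swift
recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
    guard let result = result else { return }

    // The most likely transcription so far
    print(result.bestTranscription.formattedString)

    // Alternative candidate transcriptions
    for transcription in result.transcriptions {
        print(transcription.formattedString)
    }
}
```

With `shouldReportPartialResults` enabled, this callback fires repeatedly as the transcription is refined; check `result.isFinal` to detect the last delivery.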