STT) Azure STT Technical Review


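Hello, this is Chaeni of "A Coding Newbie's Learning Log".

This is a personal post, so it may contain errors or inaccurate information.

A project came up that needs STT (Speech-To-Text), so I did a technical review of the STT services offered by Google, Naver, and Azure.

The review criteria are as follows.

Does it support streaming?
How well does it recognize Korean?
Is it cheap?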

Azure

Supported languages: 100+ (Language support)
Azure official homepage

Price

Assuming only real-time transcription on the Standard tier is used, the price is as follows.

1 minute: $0.0167 (about 23 KRW)
1 hour: $1.00 (about 1,384 KRW)
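
To make the arithmetic above easier to reuse, here is a rough sketch of the cost estimate. The $1.00 per hour rate and the exchange rate of about 1,384 KRW per USD are simply the figures quoted above; the name `estimate_cost` is an illustrative choice of mine, and actual Azure pricing may differ by region and over time.

```python
# Rough cost estimate for real-time transcription on the Standard tier.
USD_PER_HOUR = 1.00   # hourly rate quoted above
KRW_PER_USD = 1384    # assumption: implied by "$1.00 is about 1,384 KRW" above


def estimate_cost(minutes: float) -> tuple[float, float]:
    """Return (usd, krw) for the given number of audio minutes."""
    usd = USD_PER_HOUR * minutes / 60
    return usd, usd * KRW_PER_USD


if __name__ == "__main__":
    for minutes in (1, 60, 600):
        usd, krw = estimate_cost(minutes)
        print(f"{minutes:>4} min: ${usd:.4f} (about {krw:,.0f} KRW)")
```

Plugging in the expected monthly audio minutes for a project gives a quick ballpark to compare against the other services.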

Resource limits

Trying out the service

Install and set up Anaconda (skip if already installed)

$ brew install --cask anaconda
$ /opt/homebrew/anaconda3/bin/conda init zsh
$ source ~/.zshrc

Create and activate a virtual environment

$ conda create -n azureSttSample python=3.13
$ conda activate azureSttSample

⭐️ Be sure to use Python 3.7 or later ⭐️
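
If you want the sample to fail fast on an old interpreter, a small guard like the one below could go at the top of the script. This is only a sketch: `MIN_VERSION` and `python_is_supported` are names I made up, and the 3.7 minimum is just the one stated in the note above.

```python
import sys

MIN_VERSION = (3, 7)  # minimum version stated in the note above


def python_is_supported(version=sys.version_info) -> bool:
    """Return True if the given interpreter version is at least MIN_VERSION."""
    return tuple(version[:2]) >= MIN_VERSION


if __name__ == "__main__":
    if not python_is_supported():
        raise SystemExit(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ is required")
    print("Python version OK:", sys.version.split()[0])
```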

Install packages

Create a requirements.txt file

azure-cognitiveservices-speech==1.41.1
setuptools==75.1.0
wheel==0.44.0

$ pip install -r requirements.txt

Example code

import azure.cognitiveservices.speech as speechsdk

def save_text_to_file(text):
    if text.strip():
        with open("transcription_output.txt", "a", encoding="utf-8") as file:
            file.write(f"{text}\n")

def speech_recognize_continuous_async_from_microphone():
    """Performs continuous speech recognition asynchronously with input from the microphone."""
    speech_config = speechsdk.SpeechConfig(subscription="YOUR SECRET KEY", region="YOUR REGION")
    # The default language is "en-US", so set Korean explicitly.
    speech_config.speech_recognition_language = "ko-KR"
    # Treat 250 ms of silence as the end of a phrase.
    speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "250")
    # No audio config is passed, so the default microphone is used.
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

    done = False

    def recognizing_cb(evt: speechsdk.SpeechRecognitionEventArgs):
        print(f"RECOGNIZING: {evt.result.text}")

    def recognized_cb(evt: speechsdk.SpeechRecognitionEventArgs):
        save_text_to_file(evt.result.text)

    def stop_cb(evt: speechsdk.SessionEventArgs):
        """callback that signals to stop continuous recognition"""
        print('CLOSING on {}'.format(evt))
        nonlocal done
        done = True

    # Connect callbacks to the events fired by the speech recognizer
    speech_recognizer.recognizing.connect(recognizing_cb)
    speech_recognizer.recognized.connect(recognized_cb)
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Perform recognition. `start_continuous_recognition_async` asynchronously initiates the
    # continuous recognition operation. Other tasks can be performed on this thread while
    # recognition starts; wait on result_future.get() to know when initialization is done.
    # Call stop_continuous_recognition_async() to stop recognition.
    result_future = speech_recognizer.start_continuous_recognition_async()

    result_future.get()  # Wait for the void future so we know engine initialization is done.
    print('Continuous Recognition is now running, say something.')

    while not done:
        # No real parallel work to do on this thread, so just wait for the user to type "stop".
        # Can't exit the function or speech_recognizer will go out of scope and be destroyed while running.
        print('type "stop" then enter when done')
        stop = input()
        if stop.lower() == "stop":
            print('Stopping async recognition.')
            # Wait for the stop request to complete before leaving the function.
            speech_recognizer.stop_continuous_recognition_async().get()
            break

    print("recognition stopped, main thread can exit now.")


speech_recognize_continuous_async_from_microphone()
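
The sample hardcodes the key and region as placeholder strings. One way to keep real secrets out of the source file is to read them from environment variables, roughly as sketched below. The variable names `SPEECH_KEY` and `SPEECH_REGION` and the helper `load_speech_settings` are assumptions for illustration, so use whatever names you actually export.

```python
import os


def load_speech_settings(env=os.environ):
    """Read the key and region from environment variables instead of hardcoding them.

    SPEECH_KEY and SPEECH_REGION are assumed variable names, not something the SDK requires.
    """
    key = env.get("SPEECH_KEY")
    region = env.get("SPEECH_REGION")
    if not key or not region:
        raise RuntimeError("Set SPEECH_KEY and SPEECH_REGION before running the sample")
    return key, region


# Usage inside the sample (requires the azure-cognitiveservices-speech package):
# key, region = load_speech_settings()
# speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
```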

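As with the other services, you can see that Azure also has you configure speech recognition first.

The speech_config.set_property(speechsdk.PropertyId.Speech_SegmentationSilenceTimeoutMs, "250") setting
treats 250 ms without speech as the end of a sentence and returns the recognition result at that point. A detailed explanation is available in How to recognize speech.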
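
To get a feel for what the silence timeout does, here is a toy model in plain Python. It is not how Azure implements segmentation internally; it only illustrates the idea that a gap of at least `timeout_ms` between words closes the current phrase. The function `split_on_silence` and the word timings are made up for illustration.

```python
def split_on_silence(words, timeout_ms=250):
    """Group (text, start_ms, end_ms) words into phrases.

    A new phrase starts whenever the gap between one word's end and the
    next word's start is at least timeout_ms.
    """
    phrases, current, last_end = [], [], None
    for text, start_ms, end_ms in words:
        if last_end is not None and start_ms - last_end >= timeout_ms:
            phrases.append(" ".join(current))
            current = []
        current.append(text)
        last_end = end_ms
    if current:
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    # Made-up timings for illustration only.
    words = [("hello", 0, 400), ("everyone", 450, 900),
             ("today", 1300, 1700), ("we", 1750, 1900)]
    print(split_on_silence(words, timeout_ms=250))
    print(split_on_silence(words, timeout_ms=500))
```

In this model a shorter timeout cuts the speech into more, shorter phrases, while a longer timeout keeps waiting and merges them. That is the trade-off to keep in mind when tuning the value for fast speakers versus speakers who pause often.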

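Performance test

I played a sample video for about 10 minutes and tested how well the speech was recognized.

See the file below for the results!

Azure was the easiest to implement, and the performance was decent too!

But it is also relatively expensive ㅠ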

stt_azure_result.txt
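
I only judged the result by eye, but if you want a number, one common option is the character error rate (CER) against a hand-made reference transcript. The sketch below is an assumption about how that could be done, not something I ran for this post; `edit_distance` and `character_error_rate` are illustrative names, and you would need to write the reference transcript yourself.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / reference length, ignoring whitespace."""
    ref, hyp = "".join(ref.split()), "".join(hyp.split())
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)
```

Whitespace is ignored here because Korean spacing is often inconsistent between a human transcript and STT output; whether that is the right choice depends on what you want to measure.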
