Speech detection using pyaudio


김논리 2021. 3. 16. 10:13

For basic pyaudio usage, see the post below.

ungodly-hour.tistory.com/35

Basic usage of pyaudio (written for Mac OS): pyaudio can be thought of as a kind of wrapper module that makes the portaudio library usable from Python.

Let's use the audio volume information read through pyaudio to detect the start and end of a spoken utterance.

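The utterance is divided into the following states.

Listening: before the utterance starts.
Speech started: in the Listening state, a vol value at or above the threshold is treated as the start of speech.
Speech ended: in the Speech started state, if vol values stay below the threshold for 3 seconds or more, it is treated as the end of speech.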
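The three states above can be sketched as a small function over a list of volume readings, one per second. This is a simplified illustration that runs on made-up numbers instead of a microphone; the function name `detect_speech` and its parameters are hypothetical and not part of pyaudio.

```python
def detect_speech(volumes, threshold=2000, silence_seconds=3):
    """Return (start_index, end_index) for a list of volume readings.

    start_index: first reading at or above the threshold (None if none).
    end_index: reading that completes `silence_seconds` consecutive
    below-threshold readings after the start (None if never reached).
    """
    start = None
    quiet_run = 0
    for i, vol in enumerate(volumes):
        if start is None:
            # Listening: wait for the first loud reading
            if vol >= threshold:
                start = i
            continue
        # Speech started: count consecutive quiet readings
        if vol < threshold:
            quiet_run += 1
            if quiet_run >= silence_seconds:
                return start, i  # Speech ended
        else:
            quiet_run = 0
    return start, None


readings = [100, 150, 3000, 2500, 500, 2600, 300, 200, 100, 50]
print(detect_speech(readings))  # (2, 8)
```

A short dip below the threshold (index 4 here) does not end the utterance, because the quiet counter is reset as soon as a loud reading follows.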

The vol value of a chunk is the maximum sample value in that chunk. The end-of-speech check uses the average of the vol values over roughly one second.
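As a rough sketch of that volume calculation, the snippet below computes the peak of one chunk of 16-bit signed PCM bytes and the average of several peaks. The helper names `chunk_volume` and `mean_volume` are hypothetical, and the byte strings are synthetic stand-ins for microphone data.

```python
from array import array


def chunk_volume(data):
    """Peak value of one chunk of 16-bit signed PCM bytes."""
    return max(array('h', data))


def mean_volume(chunks):
    """Average of the per-chunk peaks over a list of chunks."""
    peaks = [chunk_volume(c) for c in chunks]
    return sum(peaks) / len(peaks)


# synthetic chunks built from known sample values
c1 = array('h', [0, 100, -200]).tobytes()
c2 = array('h', [300, -5]).tobytes()
print(chunk_volume(c1))       # 100
print(mean_volume([c1, c2]))  # 200.0
```

Note that `max()` over signed samples only sees positive peaks: the sample -200 in the first chunk is ignored. Taking the maximum of the absolute values would be one alternative if negative peaks should count.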

import pyaudio
from array import array
from collections import deque
from queue import Queue, Full
from threading import Thread

# const values for mic streaming
CHUNK = 1024
BUFF = CHUNK * 10  # not used below
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
CHUNKS_PER_SECOND = int(RATE / CHUNK)  # 15 chunks, roughly 0.96 seconds

# const values for silence detection
SILENCE_THRESHOLD = 2000
SILENCE_SECONDS = 3

def main():
    q = Queue()
    Thread(target=listen, args=(q,)).start()

# define listen function for threading
def listen(q):
    # open stream
    audio = pyaudio.PyAudio()
    stream = audio.open(
        format=FORMAT,
        channels=CHANNELS,
        rate=RATE,
        input=True,
        input_device_index=2,  # machine-specific; adjust for your input device
        frames_per_buffer=CHUNK
    )

    try:
        # FIXME: release initial noisy data (1sec)
        for _ in range(CHUNKS_PER_SECOND):
            stream.read(CHUNK, exception_on_overflow=False)

        is_started = False
        # one boolean per second: True if that second's average volume
        # was below the threshold
        vol_que = deque(maxlen=SILENCE_SECONDS)

        print('start listening')
        while True:
            # temporary variable to store sum of volume for 1 second
            vol_sum = 0

            # read data for 1 second in chunks
            for _ in range(CHUNKS_PER_SECOND):
                data = stream.read(CHUNK, exception_on_overflow=False)

                # get max volume of chunked data and update sum of volume
                vol = max(array('h', data))
                vol_sum += vol

                # if status is listening, check the volume value
                if not is_started and vol >= SILENCE_THRESHOLD:
                    print('start of speech detected')
                    is_started = True

                # if status is speech started, write data
                if is_started:
                    try:
                        q.put_nowait(data)
                    except Full:
                        # only raised if q was created with a maxsize;
                        # the chunk is dropped in that case
                        pass

            # if status is speech started, update volume queue and check silence
            if is_started:
                vol_que.append(vol_sum / CHUNKS_PER_SECOND < SILENCE_THRESHOLD)
                if len(vol_que) == SILENCE_SECONDS and all(vol_que):
                    print('end of speech detected')
                    break
    finally:
        # close stream even if reading fails
        stream.stop_stream()
        stream.close()
        audio.terminate()


if __name__ == '__main__':
    main()
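The code above only puts the captured chunks into the queue and never reads them back. One possible way to consume them, once the listening thread has finished, is to drain the queue into a WAV file with the standard `wave` module. This is a minimal sketch under assumptions: the function name `drain_to_wav` is hypothetical, the defaults mirror the 16-bit, mono, 16000 Hz settings used above, and the example fills the queue with synthetic bytes rather than microphone data.

```python
import io
import wave
from queue import Queue, Empty


def drain_to_wav(q, target, channels=1, sample_width=2, rate=16000):
    """Write every chunk currently in `q` to `target` as WAV data.

    `target` may be a file name or a writable, seekable file object.
    Returns the number of chunks written.
    """
    count = 0
    with wave.open(target, 'wb') as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # 2 bytes per sample for 16-bit audio
        wf.setframerate(rate)
        while True:
            try:
                data = q.get_nowait()
            except Empty:
                break  # queue is drained
            wf.writeframes(data)
            count += 1
    return count


# synthetic example: two chunks of four 16-bit samples each
q = Queue()
q.put(b'\x01\x00' * 4)
q.put(b'\x02\x00' * 4)
buf = io.BytesIO()
print(drain_to_wav(q, buf))  # 2
```

Because `get_nowait()` stops at the first empty read, this approach is only suitable after the producer has stopped; a live consumer would need a blocking `get()` and some end-of-stream signal.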
