ChatGPT - Raspberry Pi

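note

This document was translated by AI. If you find anything inaccurate or in need of improvement, please report it in the comment section at the bottom of the page or on the Issues page below.
https://github.com/Seeed-Studio/wiki-documents/issues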

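This project uses a Raspberry Pi 5 to combine voice input, large language model responses, and voice output. ReSpeaker Lite serves as the audio input and output device, enabling seamless interaction with ChatGPT and a speech recognition service.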

Hardware Required

Getting Started

First, follow the Raspberry Pi setup documentation and connect the Raspberry Pi to the network.

note

Make sure your Python version is 3.7.1 or later.
To check the version, run:

python3 --version
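
The same check can also be made from inside a script, which may be useful if the assistant is started automatically rather than from a terminal. The following is a minimal sketch; the helper name `meets_minimum` is made up for illustration and is not part of the original project.

```python
import sys

MIN_VERSION = (3, 7, 1)  # minimum version stated in the note above


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the given interpreter version is at least `minimum`."""
    # Compare only (major, minor, micro); the remaining fields are not numeric.
    return tuple(version_info[:3]) >= minimum


if __name__ == "__main__":
    found = sys.version.split()[0]
    if not meets_minimum():
        raise SystemExit(f"Python 3.7.1 or later is required, found {found}")
    print(f"Python version OK: {found}")
```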

Install Libraries

sudo apt update
sudo apt install python3-pip python3-dev
sudo apt install portaudio19-dev
pip3 install pyaudio
pip3 install speechrecognition
pip3 install openai
# The script imports pydub, which needs ffmpeg to decode MP3
pip3 install pydub
sudo apt install ffmpeg

On Raspberry Pi 5, run the following command to configure ReSpeaker Lite. It forces the PipeWire clock rate to 16 kHz:

pw-metadata -n settings 0 clock.force-rate 16000

To make the change persistent, uncomment and edit the default.clock.rate line in /etc/pipewire/pipewire.conf (copy the file from /usr/share/ first).
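
As a rough illustration of that edit, the sketch below uncomments the `default.clock.rate` line in the text of a config file and sets it to 16000. It assumes the line looks like `#default.clock.rate = 48000` with arbitrary spacing, which may differ between PipeWire versions. The function name `set_clock_rate` is hypothetical, and the sketch works only on a string, so writing the result back to /etc/pipewire/pipewire.conf (with root privileges) is left to you.

```python
import re

# Matches an optionally commented-out "default.clock.rate = <number>" line.
# Assumption: the value is a plain integer and the key is written exactly like this.
_RATE_LINE = re.compile(
    r"^([ \t]*)#?[ \t]*(default\.clock\.rate[ \t]*=[ \t]*)\d+",
    re.MULTILINE,
)


def set_clock_rate(conf_text, rate=16000):
    """Return `conf_text` with default.clock.rate uncommented and set to `rate`."""
    new_text, count = _RATE_LINE.subn(
        lambda m: f"{m.group(1)}{m.group(2)}{rate}", conf_text
    )
    if count == 0:
        raise ValueError("no default.clock.rate line found")
    return new_text


if __name__ == "__main__":
    sample = (
        "context.properties = {\n"
        "    #default.clock.rate          = 48000\n"
        "}\n"
    )
    print(set_clock_rate(sample))
```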

tip

Command to adjust the ReSpeaker Lite volume:

alsamixer

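Code

This Python code implements a simple voice assistant that recognizes a wake word, converts the user's spoken command to text, generates a response with gpt-4o-mini, and converts that response to speech for playback.

The device first waits for the wake word and then listens for the user's command. Once a command is received, the program generates a response with gpt-4o-mini and plays it back as speech. If command recognition fails three times in a row, the program returns to waiting for the wake word, and you must say the wake word again to start a new voice session.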
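
The session logic described above can be sketched without any audio hardware. In this illustration, `run_session`, `next_command`, and `respond` are hypothetical names: `next_command` stands in for speech recognition (returning `None` on a failed attempt) and `respond` stands in for the model call plus playback. It is a simplified model of the control flow, not the script used in the steps of this guide.

```python
MAX_FAILURES = 3  # failed attempts in a row before returning to wake-word mode


def run_session(next_command, respond, max_failures=MAX_FAILURES):
    """Handle commands after a wake word until `max_failures` misses in a row.

    `next_command()` returns the recognized text, or None if recognition failed.
    `respond(text)` is called once for every recognized command.
    Returns the number of commands handled in this session.
    """
    failures = 0
    handled = 0
    while failures < max_failures:
        command = next_command()
        if command is None:
            failures += 1
            continue
        failures = 0  # a successful command resets the failure counter
        respond(command)
        handled += 1
    return handled


if __name__ == "__main__":
    # Scripted input: two commands, then three failures that end the session.
    scripted = iter(["what time is it", None, "tell me a joke", None, None, None])
    replies = []
    count = run_session(lambda: next(scripted), replies.append)
    print(count, replies)
```

Passing the recognizer and the responder in as functions keeps the counting logic testable on a machine with no microphone attached.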

Step 1: Set the API key

export OPENAI_API_KEY='your-api-key-here'
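
An `export` only applies to the current shell session, so the variable is lost when a new terminal is opened. Since `OpenAI()` reads the key from the `OPENAI_API_KEY` environment variable, it may help to check for it before starting the assistant. A minimal sketch follows; the helper name `read_api_key` is an assumption for illustration.

```python
import os


def read_api_key(env=os.environ):
    """Return the API key from `env`, or raise a clear error if it is missing."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before starting the assistant"
        )
    return key


if __name__ == "__main__":
    try:
        read_api_key()
        print("OPENAI_API_KEY is set")  # the key itself is deliberately not printed
    except RuntimeError as err:
        print(err)
```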

Step 2: Create a new Python file and enter the following code:

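import subprocess
from pathlib import Path

import speech_recognition as sr
from openai import OpenAI
from pydub import AudioSegment

# The client reads the API key from the OPENAI_API_KEY environment variable
client = OpenAI()

def text_to_speech(text):
    # Audio files are written next to this script
    speech_file_path = Path(__file__).parent / "speech.mp3"
    wav_file_path = Path(__file__).parent / "speech.wav"

    # Generate speech with the OpenAI TTS API and save it as MP3
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=text
    )
    response.stream_to_file(speech_file_path)

    # Convert the MP3 to WAV so that aplay can play it
    audio = AudioSegment.from_mp3(str(speech_file_path))
    audio.export(str(wav_file_path), format="wav")
    subprocess.run(["aplay", str(wav_file_path)])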



# Initialize the speech recognizer and the microphone
recognizer = sr.Recognizer()
microphone = sr.Microphone()

# Define the wake word
WAKE_WORD = "hi"

def listen_for_wake_word():
    with microphone as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        print("Listening for the wake word...")

        while True:
            audio = recognizer.listen(source)
            # audio = recognizer.listen(source, timeout=5, phrase_time_limit=5)
            try:
                text = recognizer.recognize_google(audio).lower()
                # Substring match: "hi" also matches words such as "this"
                if WAKE_WORD in text:
                    print(f"Wake word '{WAKE_WORD}' detected.")
                    text_to_speech("Hi, what can I do for you?")
                    return True
            except sr.UnknownValueError:
                continue
            except sr.RequestError as e:
                print(f"Could not request results; {e}")
                continue

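# Number of failed recognition attempts in a row
i = 0

def listen_for_command():
    global i
    with microphone as source:
        print("Listening for a command...")
        try:
            # listen() raises WaitTimeoutError if no speech starts within 5 seconds
            audio = recognizer.listen(source, timeout=5, phrase_time_limit=5)
            command = recognizer.recognize_google(audio)
            print(f"You said: {command}")
            i = 0
            return command
        except sr.WaitTimeoutError:
            print("No speech detected")
            i = i + 1
        except sr.UnknownValueError:
            print("Could not understand the audio")
            i = i + 1
        except sr.RequestError as e:
            print(f"Could not request results; {e}")
            i = i + 1
        return None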


def get_gpt_response(prompt):
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Your name is speaker. You can answer all kinds of questions for me."},
            {"role": "user", "content": prompt}
        ]
    )

    # Join the paragraphs with spaces so the reply is passed to TTS as one block
    content_string = completion.choices[0].message.content
    paragraphs = content_string.split('\n\n')
    combined_content = ' '.join(paragraphs)
    return combined_content




def main():
    global i
    while True:
        # Block until the wake word is heard
        flag = listen_for_wake_word()
        while flag:
            user_input = listen_for_command()
            if i == 3:
                # Three failed attempts in a row: go back to waiting for the wake word
                flag = False
                i = 0
            if user_input:
                gpt_response = get_gpt_response(user_input)
                print(f"GPT response: {gpt_response}")
                text_to_speech(gpt_response)


if __name__ == "__main__":
    main()

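Step 3: Run the Python file. Do not name the file openai.py, because that name shadows the openai package and breaks the import. The name voice_assistant.py is used here only as an example.

python3 voice_assistant.py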

You're all set. Say "Hi" to wake it up and start talking!
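
The script opens the system default input device through `sr.Microphone()`. If the wake word is never detected, one possible cause is that ReSpeaker Lite is not the default device. The sketch below selects the microphone by name instead. It assumes the device name contains "respeaker", which should be checked against the output of `sr.Microphone.list_microphone_names()` on your system. The helper names `find_device_index` and `open_microphone` are hypothetical.

```python
def find_device_index(names, keyword="respeaker"):
    """Return the index of the first device whose name contains `keyword`, else None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        # Some entries may have no name; skip them.
        if name and keyword in name.lower():
            return index
    return None


def open_microphone(keyword="respeaker"):
    """Open the matching input device, falling back to the system default."""
    # Imported here so that find_device_index can be used without audio libraries.
    import speech_recognition as sr

    index = find_device_index(sr.Microphone.list_microphone_names(), keyword)
    # device_index=None makes speech_recognition use the default input device.
    return sr.Microphone(device_index=index)
```

To try it, `microphone = sr.Microphone()` in the script could be replaced with `microphone = open_microphone()`.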
