discord voice #338

Closed
wants to merge 20 commits into from

tcm390
Collaborator

@tcm390 tcm390 commented Nov 15, 2024

related: #244

This PR aims to improve and fix the Discord bot's voice functionality and enhance the code structure for better usability.

1. Fix Bot Voice Singleton Issue
Resolved an issue where the bot's voice functionality failed because a single service instance was shared as a singleton across all service types. Introduced a Map so that each service type has its own singleton instance, which guarantees that exactly one instance of each subclass is created and reused.
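
A minimal sketch of the per-subclass singleton idea described above, not the PR's actual code. The class names `SpeechService` and `TranscriptionService` are hypothetical and only illustrate two service types sharing one base class.

```typescript
// Sketch only: a base class that keeps one instance per concrete subclass.
abstract class Service {
    // Keyed by constructor, so each subclass gets its own slot.
    private static instances = new Map<Function, Service>();

    static getInstance<T extends Service>(this: new () => T): T {
        let instance = Service.instances.get(this);
        if (!instance) {
            instance = new this();
            Service.instances.set(this, instance);
        }
        return instance as T;
    }
}

// Hypothetical subclasses, for illustration only.
class SpeechService extends Service {
    readonly kind = "speech";
}

class TranscriptionService extends Service {
    readonly kind = "transcription";
}
```

With a single shared `instance` field, whichever service is requested first would be returned for every later request; keying the Map by constructor avoids that.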

2. Add shouldRespond Function
Added a shouldRespond function for the bot's voice feature to control when the bot should respond to user voice inputs.

3. Refactor Transcription Process
Refactored the transcription process by adding a debounce function, ensuring voice messages are processed only when silence is detected.
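
A rough sketch of the debounce idea, under the assumption that audio arrives as chunks and that silence means no chunk has arrived for a fixed interval. The class name `DebouncedTranscriber` and the default of 1500 ms are hypothetical; the PR's actual `DEBOUNCE_TRANSCRIPTION_THRESHOLD` value is not shown in this thread.

```typescript
// Hypothetical default; the real threshold used by the PR is not shown here.
const DEFAULT_SILENCE_MS = 1500;

class DebouncedTranscriber {
    private chunks: Uint8Array[] = [];
    private timer: ReturnType<typeof setTimeout> | null = null;
    private readonly onSilence: (audio: Uint8Array) => void;
    private readonly silenceMs: number;

    constructor(onSilence: (audio: Uint8Array) => void, silenceMs: number = DEFAULT_SILENCE_MS) {
        this.onSilence = onSilence;
        this.silenceMs = silenceMs;
    }

    // Buffer the chunk and restart the silence timer.
    push(chunk: Uint8Array): void {
        this.chunks.push(chunk);
        if (this.timer !== null) clearTimeout(this.timer);
        this.timer = setTimeout(() => this.flush(), this.silenceMs);
    }

    // Hand the buffered audio to the callback and reset the buffer.
    flush(): void {
        if (this.timer !== null) {
            clearTimeout(this.timer);
            this.timer = null;
        }
        if (this.chunks.length === 0) return;
        const total = this.chunks.reduce((n, c) => n + c.length, 0);
        const audio = new Uint8Array(total);
        let offset = 0;
        for (const c of this.chunks) {
            audio.set(c, offset);
            offset += c.length;
        }
        this.chunks = [];
        this.onSilence(audio);
    }
}
```

Each new chunk cancels the pending timer, so the callback fires only once the speaker has been quiet for the full interval.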

4. Enable Bot to Respond to Text Messages in Voice Channels
Updated the bot to handle text messages sent in voice channels.

5. Move Template Functions to templates.ts
Relocated template-related functions to a new templates.ts file for reusability in messages.ts and voice.ts.

6. Add Optional DISCORD_VOICE_CHANNEL_ID in .env
Introduced a new optional environment variable, DISCORD_VOICE_CHANNEL_ID, in the .env file to allow users to specify a voice channel the bot should join.
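
A small sketch of how an optional variable like this might be read. The helper name `resolveVoiceChannelId` is hypothetical, and what the bot does when the variable is unset is not described in this PR, so the sketch simply returns `null` in that case.

```typescript
// Sketch only: treat a missing or blank DISCORD_VOICE_CHANNEL_ID as "not configured".
function resolveVoiceChannelId(env: Record<string, string | undefined>): string | null {
    const raw = env["DISCORD_VOICE_CHANNEL_ID"];
    const id = raw === undefined ? "" : raw.trim();
    return id.length > 0 ? id : null;
}
```

In a Node.js process this would typically be called as `resolveVoiceChannelId(process.env)`.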

7. Implement Audio Playback Interrupt Mechanism
Added a sliding-window buffer that monitors the user's audio volume while the agent is speaking. If the average volume of the user's audio exceeds a defined threshold, the user is treated as actively speaking, and the agent's current audio playback is stopped to avoid overlap.
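
A minimal sketch of the sliding-window check, assuming 16-bit PCM frames and using the peak amplitude of each frame as its volume. The class name `VolumeWindow`, the window size, and the threshold are hypothetical; the PR's actual values and volume measure are not shown in this thread.

```typescript
// Sketch only: average the per-frame peak amplitude over the last `size` frames.
class VolumeWindow {
    private readonly peaks: number[] = [];
    private readonly size: number;
    private readonly threshold: number;

    constructor(size: number, threshold: number) {
        this.size = size;
        this.threshold = threshold;
    }

    // Returns true when the window is full and its average exceeds the threshold.
    push(frame: Int16Array): boolean {
        const peak = frame.reduce((max, sample) => Math.max(max, Math.abs(sample)), 0);
        this.peaks.push(peak);
        if (this.peaks.length > this.size) this.peaks.shift();
        if (this.peaks.length < this.size) return false;
        const average = this.peaks.reduce((a, b) => a + b, 0) / this.peaks.length;
        return average > this.threshold;
    }
}
```

A caller would feed each incoming user frame to `push` while the agent is speaking and stop playback when it returns `true`. Averaging over several frames keeps a single loud click from interrupting the agent.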

@tcm390 marked this pull request as draft November 15, 2024 22:50
@tcm390 changed the base branch from main to tcm-single-instance-issue November 15, 2024 23:10
@ponderingdemocritus marked this pull request as ready for review November 20, 2024 00:32
@@ -88,108 +87,6 @@ export type InterestChannels = {
};
};

const discordShouldRespondTemplate =
Contributor

@tcm390 discordShouldRespondTemplate and discordMessageHandlerTemplate were moved to templates.ts, but there are still remnants of similar templates being imported and used in messages.ts and voice.ts. Ensure all references are consistently updated to prevent confusion or inconsistent behavior.

Collaborator Author

Thank you for the review! I’m not entirely sure what you mean by "similar templates," as I didn’t see any others explicitly named as templates. However, I did notice some exported functions in messages.ts and voice.ts, so I moved them to utils.ts for better organization. Could you confirm whether that is what you were referring to?

00a64dc

1f2a7f3

Contributor

@tcm390 Thanks for making those changes. This looks good to me!

}
);

console.log("responseMemories: ", responseMemories);
Contributor

@tcm390 There is a lot of logging (console.log) throughout the code, which could flood the console in a production environment. It would be better to introduce a more sophisticated logging mechanism that allows setting log levels (e.g., info, debug, error).
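
As a rough illustration of the suggestion above (a sketch, not code from this PR), a leveled logger could look like the following. The `Logger` class and its level names are hypothetical; an existing logging library would serve the same purpose.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// Higher number means more severe.
const LEVEL_ORDER: Record<LogLevel, number> = { debug: 10, info: 20, warn: 30, error: 40 };

class Logger {
    private readonly minLevel: LogLevel;

    constructor(minLevel: LogLevel = "info") {
        this.minLevel = minLevel;
    }

    // True when a message at `level` should be emitted.
    shouldLog(level: LogLevel): boolean {
        return LEVEL_ORDER[level] >= LEVEL_ORDER[this.minLevel];
    }

    debug(...args: unknown[]): void {
        if (this.shouldLog("debug")) console.debug(...args);
    }

    info(...args: unknown[]): void {
        if (this.shouldLog("info")) console.info(...args);
    }

    warn(...args: unknown[]): void {
        if (this.shouldLog("warn")) console.warn(...args);
    }

    error(...args: unknown[]): void {
        if (this.shouldLog("error")) console.error(...args);
    }
}
```

A call such as `console.log("responseMemories: ", responseMemories)` would then become `logger.debug("responseMemories: ", responseMemories)` and stay silent unless the minimum level is set to `debug`.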

name,
userName
);
}, DEBOUNCE_TRANSCRIPTION_THRESHOLD);
Contributor

@tcm390 There is a potential risk in the way voice transcriptions are handled, where intermittent silence may cause loss of transcription progress. You might want to ensure the transcription buffer persists until a clear "end of input" state is detected, rather than relying solely on timeouts.

Collaborator Author

Thanks for pointing this out! Could you elaborate on what you have in mind? I can't think of a better way to implement it and would appreciate your insights.

Contributor

Hey @tcm390, just to clarify, my earlier comment was more of a suggestion to ensure there are no potential issues with the implementation. I don't actually have a concrete idea on how to go about solving this. It was just something that came to mind, and I wanted to flag it in case it helps.

@tcm390 changed the base branch from tcm-single-instance-issue to main November 21, 2024 16:49
@ponderingdemocritus
Contributor

Can you fix the conflicts?

@@ -23,304 +19,33 @@ import {
UUID,
} from "@ai16z/eliza/src/types.ts";
import { stringToUuid } from "@ai16z/eliza/src/uuid.ts";
Contributor

Remove the full-path imports everywhere, e.g. /src/uuid.ts

@tcm390 closed this Nov 26, 2024
@tcm390
Collaborator Author

tcm390 commented Nov 26, 2024

Closing this PR due to too many conflicts. I’ll create a new PR with a clean base to address the issues.

@lalalune deleted the tcm-discord-voice branch March 8, 2025 00:01
4 participants