# Fish Audio MCP Server
An MCP (Model Context Protocol) server that connects Fish Audio's Text-to-Speech API to LLMs such as Claude, so speech can be synthesized from natural-language requests.
## What is Fish Audio?
[Fish Audio](https://fish.audio/) is a cutting-edge Text-to-Speech platform that offers:
- **State-of-the-art voice synthesis** with natural-sounding output
- **Voice cloning capabilities** to create custom voice models
- **Multilingual support** including English, Japanese, Chinese, and more
- **Low-latency streaming** for real-time applications
- **Fine-grained control** over speech prosody and emotions
This MCP server brings Fish Audio's powerful capabilities directly to your LLM workflows.
## Features
- **High-Quality TTS**: Leverage Fish Audio's state-of-the-art TTS models
- **Streaming Support**: Real-time audio streaming for low-latency applications
- **Multiple Voices**: Support for custom voice models via reference IDs
- **Smart Voice Selection**: Select voices by ID, name, or tags
- **Voice Library Management**: Configure and manage multiple voice references
- **Flexible Configuration**: Environment variable-based configuration
- **Multiple Audio Formats**: Support for MP3, WAV, PCM, and Opus
- **Easy Integration**: Simple setup with any MCP-compatible client
## Quick Start
### Installation
You can run this MCP server directly using npx:
```bash
npx @alanse/fish-audio-mcp-server
```
Or install it globally:
```bash
npm install -g @alanse/fish-audio-mcp-server
```
### Configuration
1. Get your Fish Audio API key from [Fish Audio](https://fish.audio/)
2. Set up environment variables:
```bash
export FISH_API_KEY=your_fish_audio_api_key_here
```
3. Add the server to your MCP client's configuration:
#### Single Voice Mode (Simple)
```json
{
  "mcpServers": {
    "fish-audio": {
      "command": "npx",
      "args": ["-y", "@alanse/fish-audio-mcp-server"],
      "env": {
        "FISH_API_KEY": "your_fish_audio_api_key_here",
        "FISH_MODEL_ID": "speech-1.6",
        "FISH_REFERENCE_ID": "your_voice_reference_id_here",
        "FISH_OUTPUT_FORMAT": "mp3",
        "FISH_STREAMING": "false",
        "FISH_LATENCY": "balanced",
        "FISH_MP3_BITRATE": "128",
        "FISH_AUTO_PLAY": "false",
        "AUDIO_OUTPUT_DIR": "~/.fish-audio-mcp/audio_output"
      }
    }
  }
}
```
#### Multiple Voice Mode (Advanced)
```json
{
  "mcpServers": {
    "fish-audio": {
      "command": "npx",
      "args": ["-y", "@alanse/fish-audio-mcp-server"],
      "env": {
        "FISH_API_KEY": "your_fish_audio_api_key_here",
        "FISH_MODEL_ID": "speech-1.6",
        "FISH_REFERENCES": "[{'reference_id':'id1','name':'Alice','tags':['female','english']},{'reference_id':'id2','name':'Bob','tags':['male','japanese']},{'reference_id':'id3','name':'Carol','tags':['female','japanese','anime']}]",
        "FISH_DEFAULT_REFERENCE": "id1",
        "FISH_OUTPUT_FORMAT": "mp3",
        "FISH_STREAMING": "false",
        "FISH_LATENCY": "balanced",
        "FISH_MP3_BITRATE": "128",
        "FISH_AUTO_PLAY": "false",
        "AUDIO_OUTPUT_DIR": "~/.fish-audio-mcp/audio_output"
      }
    }
  }
}
```
## Environment Variables
| Variable | Description | Default | Required |
|----------|-------------|---------|----------|
| `FISH_API_KEY` | Your Fish Audio API key | - | Yes |
| `FISH_MODEL_ID` | TTS model to use (s1, speech-1.5, speech-1.6) | `s1` | Optional |
| `FISH_REFERENCE_ID` | Default voice reference ID (single reference mode) | - | Optional |
| `FISH_REFERENCES` | Multiple voice references (see below) | - | Optional |
| `FISH_DEFAULT_REFERENCE` | Default reference ID when using multiple references | - | Optional |
| `FISH_OUTPUT_FORMAT` | Default audio format (mp3, wav, pcm, opus) | `mp3` | Optional |
| `FISH_STREAMING` | Enable streaming mode (HTTP/WebSocket) | `false` | Optional |
| `FISH_LATENCY` | Latency mode (normal, balanced) | `balanced` | Optional |
| `FISH_MP3_BITRATE` | MP3 bitrate (64, 128, 192) | `128` | Optional |
| `FISH_AUTO_PLAY` | Auto-play audio and enable real-time playback | `false` | Optional |
| `AUDIO_OUTPUT_DIR` | Directory for audio file output | `~/.fish-audio-mcp/audio_output` | Optional |
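The variables above can be read into a single typed object at startup. The sketch below is illustrative only, not the server's actual `src/utils/config.ts`: the names `FishConfig` and `loadConfig` are hypothetical, it treats only the exact string `"true"` as enabling a boolean flag, and it takes the environment as a plain record (pass `process.env` in Node) so it can be tested without touching the real environment. Defaults mirror the table.

```typescript
// Hypothetical sketch: build a typed configuration from environment variables,
// using the defaults listed in the table above.
type Env = Record<string, string | undefined>;
type OutputFormat = "mp3" | "wav" | "pcm" | "opus";
type Latency = "normal" | "balanced";

const BITRATES = [64, 128, 192] as const;
type Mp3Bitrate = (typeof BITRATES)[number];

interface FishConfig {
  apiKey: string;
  modelId: string;
  referenceId?: string;
  outputFormat: OutputFormat;
  streaming: boolean;
  latency: Latency;
  mp3Bitrate: Mp3Bitrate;
  autoPlay: boolean;
  outputDir: string;
}

// Returns `value` if it is one of `allowed`; otherwise throws a descriptive error.
function oneOf<T extends string>(name: string, value: string, allowed: readonly T[]): T {
  if ((allowed as readonly string[]).includes(value)) return value as T;
  throw new Error(`${name} must be one of ${allowed.join(", ")} (got "${value}")`);
}

function loadConfig(env: Env): FishConfig {
  const apiKey = env.FISH_API_KEY;
  if (!apiKey) throw new Error("FISH_API_KEY environment variable is required");

  // Cast first, then verify at runtime that the number really is an allowed bitrate.
  const bitrate = Number(env.FISH_MP3_BITRATE ?? "128") as Mp3Bitrate;
  if (!BITRATES.includes(bitrate)) {
    throw new Error(`FISH_MP3_BITRATE must be one of ${BITRATES.join(", ")}`);
  }

  return {
    apiKey,
    modelId: env.FISH_MODEL_ID ?? "s1",
    referenceId: env.FISH_REFERENCE_ID || undefined,
    outputFormat: oneOf("FISH_OUTPUT_FORMAT", env.FISH_OUTPUT_FORMAT ?? "mp3", ["mp3", "wav", "pcm", "opus"] as const),
    streaming: env.FISH_STREAMING === "true", // only the exact string "true" enables it
    latency: oneOf("FISH_LATENCY", env.FISH_LATENCY ?? "balanced", ["normal", "balanced"] as const),
    mp3Bitrate: bitrate,
    autoPlay: env.FISH_AUTO_PLAY === "true",
    outputDir: env.AUDIO_OUTPUT_DIR ?? "~/.fish-audio-mcp/audio_output",
  };
}
```

Failing fast on a missing key or an out-of-range value surfaces configuration mistakes when the server starts rather than on the first tool call.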
### Configuring Multiple Voice References
You can configure multiple voice references in two ways:
#### JSON Array Format (Recommended)
Use the `FISH_REFERENCES` environment variable with a JSON array:
```bash
FISH_REFERENCES='[
  {"reference_id":"id1","name":"Alice","tags":["female","english"]},
  {"reference_id":"id2","name":"Bob","tags":["male","japanese"]},
  {"reference_id":"id3","name":"Carol","tags":["female","japanese","anime"]}
]'
FISH_DEFAULT_REFERENCE="id1"
```
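Reading this variable amounts to parsing a JSON array and checking each entry. A minimal sketch, assuming each entry needs a non-empty `reference_id` while `name` and `tags` are optional; `parseReferencesJson` and `VoiceReference` are hypothetical names. It expects standard double-quoted JSON as in the example above; `JSON.parse` rejects the single-quoted form used in the Multiple Voice Mode config, so this sketch would need a conversion step to accept that.

```typescript
// Hypothetical sketch: parse FISH_REFERENCES (a JSON array) into typed voice references.
interface VoiceReference {
  reference_id: string;
  name?: string;
  tags?: string[];
}

function parseReferencesJson(raw: string): VoiceReference[] {
  const parsed: unknown = JSON.parse(raw); // throws SyntaxError on malformed JSON
  if (!Array.isArray(parsed)) throw new Error("FISH_REFERENCES must be a JSON array");

  return parsed.map((item: unknown, i: number): VoiceReference => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`FISH_REFERENCES[${i}] must be an object`);
    }
    const { reference_id, name, tags } = item as Record<string, unknown>;
    if (typeof reference_id !== "string" || reference_id === "") {
      throw new Error(`FISH_REFERENCES[${i}].reference_id must be a non-empty string`);
    }
    return {
      reference_id,
      name: typeof name === "string" ? name : undefined,
      // Keep only string tags; a missing or malformed list becomes empty.
      tags: Array.isArray(tags) ? tags.filter((t): t is string => typeof t === "string") : [],
    };
  });
}
```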
#### Individual Format (Backward Compatibility)
Use numbered environment variables:
```bash
FISH_REFERENCE_1_ID=id1
FISH_REFERENCE_1_NAME=Alice
FISH_REFERENCE_1_TAGS=female,english
FISH_REFERENCE_2_ID=id2
FISH_REFERENCE_2_NAME=Bob
FISH_REFERENCE_2_TAGS=male,japanese
```
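The numbered form can be gathered by counting up from 1. In this sketch (hypothetical `parseNumberedReferences`), scanning stops at the first missing `FISH_REFERENCE_<n>_ID` and tags are split on commas; whether the server itself tolerates gaps in the numbering is an assumption not covered here.

```typescript
// Hypothetical sketch: collect FISH_REFERENCE_<n>_ID / _NAME / _TAGS into a list.
interface VoiceReference {
  reference_id: string;
  name?: string;
  tags: string[];
}

type Env = Record<string, string | undefined>;

function parseNumberedReferences(env: Env): VoiceReference[] {
  const refs: VoiceReference[] = [];
  for (let n = 1; ; n++) {
    const id = env[`FISH_REFERENCE_${n}_ID`];
    if (!id) break; // assumption: the first gap in the numbering ends the list
    const tags = (env[`FISH_REFERENCE_${n}_TAGS`] ?? "")
      .split(",")
      .map((t) => t.trim())
      .filter((t) => t.length > 0);
    refs.push({ reference_id: id, name: env[`FISH_REFERENCE_${n}_NAME`], tags });
  }
  return refs;
}
```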
## Usage
Once configured, the Fish Audio MCP server provides two tools to LLMs.
### Tool 1: `fish_audio_tts`
Generates speech from text using Fish Audio's TTS API.
#### Parameters
- `text` (required): Text to convert to speech (max 10,000 characters)
- `reference_id` (optional): Voice model reference ID
- `reference_name` (optional): Select voice by name
- `reference_tag` (optional): Select voice by tag
- `streaming` (optional): Enable streaming mode
- `format` (optional): Output format (mp3, wav, pcm, opus)
- `mp3_bitrate` (optional): MP3 bitrate (64, 128, 192)
- `normalize` (optional): Enable text normalization (default: true)
- `latency` (optional): Latency mode (normal, balanced)
- `output_path` (optional): Custom output file path
- `auto_play` (optional): Automatically play the generated audio
- `websocket_streaming` (optional): Use WebSocket streaming instead of HTTP
- `realtime_play` (optional): Play audio in real-time during WebSocket streaming
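As a TypeScript shape, the parameter list above might look like the interface below. This is a hypothetical sketch rather than the tool's published schema: `TtsArgs` and `validateTtsArgs` are illustrative names, and because an LLM supplies the arguments as JSON, the enumerated fields are re-checked at runtime instead of trusting the static types. The 10,000-character limit is measured here with `String.prototype.length` (UTF-16 code units), which may differ from how the API counts characters.

```typescript
// Hypothetical shape of the fish_audio_tts arguments described above.
interface TtsArgs {
  text: string;
  reference_id?: string;
  reference_name?: string;
  reference_tag?: string;
  streaming?: boolean;
  format?: "mp3" | "wav" | "pcm" | "opus";
  mp3_bitrate?: 64 | 128 | 192;
  normalize?: boolean; // documented default: true
  latency?: "normal" | "balanced";
  output_path?: string;
  auto_play?: boolean;
  websocket_streaming?: boolean;
  realtime_play?: boolean;
}

const MAX_TEXT_LENGTH = 10_000;
const FORMATS: readonly string[] = ["mp3", "wav", "pcm", "opus"];
const BITRATES: readonly number[] = [64, 128, 192];
const LATENCIES: readonly string[] = ["normal", "balanced"];

// Returns a list of problems; an empty list means the arguments passed these checks.
function validateTtsArgs(args: TtsArgs): string[] {
  const errors: string[] = [];
  if (typeof args.text !== "string" || args.text.trim().length === 0) {
    errors.push("text is required and must not be empty");
  } else if (args.text.length > MAX_TEXT_LENGTH) {
    // .length counts UTF-16 code units, not user-perceived characters.
    errors.push(`text has ${args.text.length} characters; the maximum is ${MAX_TEXT_LENGTH}`);
  }
  if (args.format !== undefined && !FORMATS.includes(args.format)) {
    errors.push(`format must be one of ${FORMATS.join(", ")}`);
  }
  if (args.mp3_bitrate !== undefined && !BITRATES.includes(args.mp3_bitrate)) {
    errors.push(`mp3_bitrate must be one of ${BITRATES.join(", ")}`);
  }
  if (args.latency !== undefined && !LATENCIES.includes(args.latency)) {
    errors.push(`latency must be one of ${LATENCIES.join(", ")}`);
  }
  return errors;
}
```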
**Voice Selection Priority**: reference_id > reference_name > reference_tag > default
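The selection priority can be expressed as a short fall-through function. A minimal sketch with hypothetical names (`resolveReferenceId`, `VoiceSelection`); it assumes names and tags match case-insensitively and that an unmatched name or tag falls through to the next rule, whereas the server might report an error instead.

```typescript
// Hypothetical sketch of: reference_id > reference_name > reference_tag > default.
interface VoiceReference {
  reference_id: string;
  name?: string;
  tags?: string[];
}

interface VoiceSelection {
  reference_id?: string;
  reference_name?: string;
  reference_tag?: string;
}

function resolveReferenceId(
  sel: VoiceSelection,
  refs: VoiceReference[],
  defaultId?: string,
): string | undefined {
  // 1. An explicit ID wins, even if it is not in the configured list.
  if (sel.reference_id) return sel.reference_id;

  // 2. Name match (case-insensitive in this sketch).
  if (sel.reference_name) {
    const wanted = sel.reference_name.toLowerCase();
    const byName = refs.find((r) => r.name?.toLowerCase() === wanted);
    if (byName) return byName.reference_id;
  }

  // 3. First configured reference carrying the tag.
  if (sel.reference_tag) {
    const tag = sel.reference_tag.toLowerCase();
    const byTag = refs.find((r) => r.tags?.some((t) => t.toLowerCase() === tag));
    if (byTag) return byTag.reference_id;
  }

  // 4. Configured default; undefined means no reference ID would be sent.
  return defaultId;
}
```

With the Alice/Bob/Carol references from the configuration examples, `reference_tag: "anime"` resolves to `id3`, and an empty selection resolves to the default `id1`.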
### Tool 2: `fish_audio_list_references`
Lists all configured voice references.
#### Parameters
No parameters required.
#### Returns
- List of configured voice references with their IDs, names, and tags
- Default reference ID
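The exact response format of `fish_audio_list_references` is not specified here. This sketch (hypothetical `formatReferenceList`) renders references in the same style as the "List Available Voices" example further down, marking the default reference.

```typescript
// Hypothetical sketch: one line per reference, with the default flagged.
interface VoiceReference {
  reference_id: string;
  name?: string;
  tags?: string[];
}

function formatReferenceList(refs: VoiceReference[], defaultId?: string): string {
  if (refs.length === 0) return "No voice references configured.";
  return refs
    .map((r) => {
      const label = r.name ?? "(unnamed)";
      const tags = r.tags && r.tags.length > 0 ? ` - Tags: ${r.tags.join(", ")}` : "";
      const marker = r.reference_id === defaultId ? " [Default]" : "";
      return `- ${label} (id: ${r.reference_id})${tags}${marker}`;
    })
    .join("\n");
}
```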
### Examples
#### Basic Text-to-Speech
```
User: "Generate speech saying 'Hello, world! Welcome to Fish Audio TTS.'"
Claude: I'll generate speech for that text using Fish Audio TTS.
[Uses fish_audio_tts tool with text parameter]
Result: Audio file saved to ./audio_output/tts_2025-01-03T10-30-00.mp3
```
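The file name in this example follows a `tts_<timestamp>.<format>` pattern. Below is a sketch that reproduces it, assuming a UTC timestamp from `Date.prototype.toISOString` with colons replaced by hyphens (colons are not valid in Windows file names) and assuming `~` in `AUDIO_OUTPUT_DIR` means the user's home directory. The helper names are hypothetical; real code would more likely use Node's `path.join` and `os.homedir()` than joining strings with `/`.

```typescript
// Hypothetical sketch: derive a timestamped output path like the one in the example.

function timestampedFileName(date: Date, format: string): string {
  // "2025-01-03T10:30:00.000Z" -> "2025-01-03T10-30-00"
  const stamp = date.toISOString().slice(0, 19).replace(/:/g, "-");
  return `tts_${stamp}.${format}`;
}

// Expands a leading "~" to the given home directory (assumption about the server's behavior).
function expandHome(dir: string, homeDir: string): string {
  if (dir === "~") return homeDir;
  if (dir.startsWith("~/")) return homeDir.replace(/\/+$/, "") + dir.slice(1);
  return dir;
}

function defaultOutputPath(outputDir: string, homeDir: string, date: Date, format: string): string {
  const dir = expandHome(outputDir, homeDir).replace(/\/+$/, ""); // drop trailing slashes
  return `${dir}/${timestampedFileName(date, format)}`;
}
```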
#### Using Custom Voice by ID
```
User: "Generate speech with voice model xyz123 saying 'This is a custom voice test'"
Claude: I'll generate speech using the specified voice model.
[Uses fish_audio_tts tool with text and reference_id parameters]
Result: Audio generated with custom voice model xyz123
```
#### Using Voice by Name
```
User: "Use Alice's voice to say 'Hello from Alice'"
Claude: I'll generate speech using Alice's voice.
[Uses fish_audio_tts tool with reference_name: "Alice"]
Result: Audio generated with Alice's voice
```
#### Using Voice by Tag
```
User: "Generate Japanese speech saying 'こんにちは' with an anime voice"
Claude: I'll generate Japanese speech with an anime-style voice.
[Uses fish_audio_tts tool with reference_tag: "anime"]
Result: Audio generated with anime voice style
```
#### List Available Voices
```
User: "What voices are available?"
Claude: I'll list all configured voice references.
[Uses fish_audio_list_references tool]
Result:
- Alice (id: id1) - Tags: female, english [Default]
- Bob (id: id2) - Tags: male, japanese
- Carol (id: id3) - Tags: female, japanese, anime
```
#### HTTP Streaming Mode
```
User: "Generate a long speech in streaming mode about the benefits of AI"
Claude: I'll generate the speech in streaming mode for faster response.
[Uses fish_audio_tts tool with streaming: true]
Result: Streaming audio saved to ./audio_output/tts_2025-01-03T10-35-00.mp3
```
#### WebSocket Real-time Streaming
```
User: "Stream and play in real-time: 'Welcome to the future of AI'"
Claude: I'll stream the speech via WebSocket and play it in real-time.
[Uses fish_audio_tts tool with websocket_streaming: true, realtime_play: true]
Result: Audio streamed and played in real-time via WebSocket
```
## Development
### Local Development
1. Clone the repository:
```bash
git clone https://github.com/da-okazaki/mcp-fish-audio-server.git
cd mcp-fish-audio-server
```
2. Install dependencies:
```bash
npm install
```
3. Create `.env` file:
```bash
cp .env.example .env
# Edit .env with your API key
```
4. Build the project:
```bash
npm run build
```
5. Run in development mode:
```bash
npm run dev
```
### Testing
Run the test suite:
```bash
npm test
```
### Project Structure
```
mcp-fish-audio-server/
├── src/
│   ├── index.ts          # MCP server entry point
│   ├── tools/
│   │   └── tts.ts        # TTS tool implementation
│   ├── services/
│   │   └── fishAudio.ts  # Fish Audio API client
│   ├── types/
│   │   └── index.ts      # TypeScript definitions
│   └── utils/
│       └── config.ts     # Configuration management
├── tests/                # Test files
├── audio_output/         # Default audio output directory
├── package.json
├── tsconfig.json
└── README.md
```
## API Documentation
### Fish Audio Service
The service provides two main methods:
1. **generateSpeech**: Standard TTS generation
- Returns audio buffer
- Suitable for short texts
- Simpler to consume, since the complete audio arrives in one buffer
2. **generateSpeechStream**: Streaming TTS generation
- Returns audio stream
- Suitable for long texts
- Real-time processing
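The difference between the two methods comes down to when audio bytes are handed on. The sketch below uses hypothetical helpers over a generic `AsyncIterable<Uint8Array>` standing in for the response body; it does not reproduce the actual signatures of `generateSpeech` or `generateSpeechStream`.

```typescript
// Hypothetical sketch contrasting buffered and streaming consumption of audio chunks.

// Buffered: wait for every chunk, then return one contiguous byte array.
async function collectToBuffer(chunks: AsyncIterable<Uint8Array>): Promise<Uint8Array> {
  const parts: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of chunks) {
    parts.push(chunk);
    total += chunk.length;
  }
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}

// Streaming: hand each chunk to a consumer (player, file writer) as soon as it arrives.
// Returns the total number of bytes forwarded.
async function forwardChunks(
  chunks: AsyncIterable<Uint8Array>,
  onChunk: (chunk: Uint8Array) => void | Promise<void>,
): Promise<number> {
  let total = 0;
  for await (const chunk of chunks) {
    await onChunk(chunk);
    total += chunk.length;
  }
  return total;
}

// Stand-in audio source for demonstration: yields three small chunks.
async function* fakeAudio(): AsyncGenerator<Uint8Array> {
  yield new Uint8Array([1, 2]);
  yield new Uint8Array([3]);
  yield new Uint8Array([4, 5, 6]);
}
```

The buffered helper holds the whole clip in memory before returning, while the streaming helper only holds the chunk currently being processed, which is what makes low-latency playback possible.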
### Error Handling
The server handles various error scenarios:
- **INVALID_API_KEY**: Invalid or missing API key
- **NETWORK_ERROR**: Connection issues with Fish Audio API
- **INVALID_PARAMS**: Invalid request parameters
- **QUOTA_EXCEEDED**: API rate limit exceeded
- **SERVER_ERROR**: Fish Audio server errors
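One way to produce these codes is to classify the HTTP status of a failed request. The mapping below is an assumption for illustration (for example, treating 402 and 429 as quota problems) and is not taken from the server's source; `FishAudioError`, `classifyError` and `toFishAudioError` are hypothetical names.

```typescript
// Hypothetical sketch: map a failed request onto the error codes listed above.
type FishErrorCode =
  | "INVALID_API_KEY"
  | "NETWORK_ERROR"
  | "INVALID_PARAMS"
  | "QUOTA_EXCEEDED"
  | "SERVER_ERROR";

class FishAudioError extends Error {
  readonly code: FishErrorCode;
  readonly status?: number;

  constructor(code: FishErrorCode, message: string, status?: number) {
    super(message);
    this.name = "FishAudioError";
    this.code = code;
    this.status = status;
  }
}

// `status` is undefined when no HTTP response arrived (DNS failure, timeout, refused connection).
function classifyError(status: number | undefined): FishErrorCode {
  if (status === undefined) return "NETWORK_ERROR";
  if (status === 401 || status === 403) return "INVALID_API_KEY";
  if (status === 402 || status === 429) return "QUOTA_EXCEEDED"; // assumed mapping
  if (status >= 500) return "SERVER_ERROR";
  if (status >= 400) return "INVALID_PARAMS";
  throw new Error(`status ${status} is not an error`);
}

function toFishAudioError(status: number | undefined, detail: string): FishAudioError {
  const code = classifyError(status);
  return new FishAudioError(code, `${code}: ${detail}`, status);
}
```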
## Troubleshooting
### Common Issues
1. **"FISH_API_KEY environment variable is required"**
- Ensure you've set the `FISH_API_KEY` environment variable
- Check that the API key is valid
2. **"Network error: Unable to reach Fish Audio API"**
- Check your internet connection
- Verify Fish Audio API is accessible
- Check for proxy/firewall issues
3. **"Text length exceeds maximum limit"**
- Split long texts into smaller chunks
- Maximum supported length is 10,000 characters
4. **Audio files not appearing**
- Check the `AUDIO_OUTPUT_DIR` path exists
- Ensure write permissions for the directory
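Splitting long text can be automated by cutting at the last sentence boundary that fits under the limit, falling back to whitespace and finally to a hard cut. A minimal sketch with a hypothetical `splitText` helper; it treats any `.`, `!`, `?` or their full-width CJK forms as a sentence end (so abbreviations such as "e.g." may cause early cuts) and measures length in UTF-16 code units. Each chunk could then be sent in a separate `fish_audio_tts` call.

```typescript
// Hypothetical sketch: split text into chunks of at most `maxLen` UTF-16 code units.
const MAX_TEXT_LENGTH = 10_000;
const SENTENCE_END = ".!?。！？";

// Returns an exclusive end index in [1, head.length] at which to cut `head`.
function findCut(head: string): number {
  for (let i = head.length - 1; i > 0; i--) {
    if (SENTENCE_END.includes(head.charAt(i))) return i + 1; // keep the punctuation
  }
  for (let i = head.length - 1; i > 0; i--) {
    if (/\s/.test(head.charAt(i))) return i;
  }
  return head.length; // no natural boundary: hard cut (may split a surrogate pair)
}

function splitText(text: string, maxLen: number = MAX_TEXT_LENGTH): string[] {
  if (maxLen < 1) throw new RangeError("maxLen must be at least 1");
  const chunks: string[] = [];
  let rest = text.trim();
  while (rest.length > maxLen) {
    const cut = findCut(rest.slice(0, maxLen));
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```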
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
- [Fish Audio](https://fish.audio/) for providing the excellent TTS API
- [Anthropic](https://anthropic.com/) for creating the Model Context Protocol
- The MCP community for inspiration and examples
## Support
For issues, questions, or contributions, please visit the [GitHub repository](https://github.com/da-okazaki/mcp-fish-audio-server).
## Changelog
See [CHANGELOG.md](CHANGELOG.md) for a detailed list of changes.