MCP Server
speech-mine ships an MCP (Model Context Protocol) server that exposes all its capabilities as tools for Claude Code and other MCP-compatible clients.
Installation
One command, no clone needed:
This uses uvx to pull the published version from PyPI and run the server. Restart Claude Code after running it.
If you cloned the repo: the included .mcp.json configures the server automatically when you open the project in Claude Code.
Available tools
| Tool | Description |
|---|---|
| `search_transcript` | Fuzzy-search a transcript CSV for a word or phrase |
| `get_transcript_stats` | Word count, speaker list, duration, and confidence |
| `read_transcript` | Export transcript data as utterances, segments, words, or JSON |
| `format_transcript` | Convert a CSV to a human-readable script file |
| `extract_audio` | Transcribe audio with speaker diarization (spawns subprocess) |
| `chunk_audio` | Split a WAV file into timed segments from a YAML config |
Usage examples
Once installed, Claude Code can use the tools directly. Some examples of what you can ask:
Format output.csv into a readable script and save to script.txt,
mapping SPEAKER_00 to "Alice" and SPEAKER_01 to "Bob" using speakers.json
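Behind a prompt like the one above, the MCP client issues a `tools/call` request to the server. The following is a rough sketch of what that request could look like, assuming the argument names listed in the tool reference for format_transcript. The client builds this for you; the request `id` is arbitrary, and the exact arguments Claude Code chooses may differ.

```python
import json

# JSON-RPC 2.0 request an MCP client sends to invoke a tool.
# Argument names follow the format_transcript tool reference;
# the file names are the ones mentioned in the example prompt.
request = {
    "jsonrpc": "2.0",
    "id": 1,  # arbitrary request id chosen by the client
    "method": "tools/call",
    "params": {
        "name": "format_transcript",
        "arguments": {
            "input_csv": "output.csv",
            "output_txt": "script.txt",
            "speakers_json": "speakers.json",
        },
    },
}

print(json.dumps(request, indent=2))
```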
Tool reference
search_transcript
| Parameter | Description |
|---|---|
| `csv_path` | Path to the transcript CSV |
| `query` | Word, phrase, or sentence to search for |
| `min_similarity` | Minimum similarity score (0.0–1.0, default 0.0) |
| `max_similarity` | Maximum similarity score (0.0–1.0, default 1.0) |
| `top_k` | Maximum results to return (default 10) |
| `output_type` | "utterance" (default) or "timestamp" |
| `metadata_path` | Optional path to the companion _metadata.json |
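To illustrate how `min_similarity`, `max_similarity`, and `top_k` interact, here is a minimal sketch of a similarity-window search. It is not speech-mine's implementation: the scoring algorithm the tool uses is not stated here, so Python's `difflib.SequenceMatcher` stands in for it, and `fuzzy_search` and the sample utterances are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity ratio between two strings (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_search(utterances, query, min_similarity=0.0, max_similarity=1.0, top_k=10):
    """Score each utterance, keep scores inside the window, return the best top_k."""
    scored = [(similarity(query, text), text) for text in utterances]
    in_window = [(s, t) for s, t in scored if min_similarity <= s <= max_similarity]
    in_window.sort(key=lambda pair: pair[0], reverse=True)
    return in_window[:top_k]


# Hypothetical utterances, for illustration only.
utterances = [
    "we should ship the release on friday",
    "the release is shipping friday",
    "lunch is at noon",
]
results = fuzzy_search(utterances, "ship the release friday", min_similarity=0.5, top_k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

Raising `min_similarity` trims weak matches, while lowering `max_similarity` below 1.0 excludes exact matches, which can help when looking for near-misses such as transcription errors.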
get_transcript_stats
read_transcript
| Parameter | Description |
|---|---|
| `csv_path` | Path to the transcript CSV |
| `format_type` | "utterances" (default), "segments", "words", or "json" |
| `metadata_path` | Optional path to the companion _metadata.json |
format_transcript
| Parameter | Description |
|---|---|
| `input_csv` | Path to the transcript CSV |
| `output_txt` | Destination path for the formatted script |
| `speakers_json` | Optional JSON mapping SPEAKER_00 labels to real names |
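As a sketch of what the `speakers_json` file might contain, the snippet below writes a flat JSON object keyed by diarization label. The flat `{"SPEAKER_00": "Alice"}` shape is an assumption based on the parameter description above; check it against a file that works with your installed version.

```python
import json
from pathlib import Path

# Assumed shape: a flat object mapping each diarization label to a display name.
speakers = {
    "SPEAKER_00": "Alice",
    "SPEAKER_01": "Bob",
}

path = Path("speakers.json")
path.write_text(json.dumps(speakers, indent=2), encoding="utf-8")

# Read the file back to confirm it is valid JSON.
loaded = json.loads(path.read_text(encoding="utf-8"))
print(loaded)
```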
extract_audio
Runs `speech-mine extract` in a subprocess. Can take several minutes for long files.
| Parameter | Description |
|---|---|
| `input_file` | Path to audio (.wav, .mp3, .ogg, .flac, .m4a, .webm) |
| `output_csv` | Destination path for the output CSV |
| `hf_token` | HuggingFace access token (required for pyannote) |
| `model` | Whisper model size (default: large-v3) |
| `device` | "auto" (default), "cpu", or "cuda" |
| `compute_type` | "float16" (default), "int8", or "float32" |
| `num_speakers` | Exact speaker count if known |
| `min_speakers` | Minimum speakers (default 1) |
| `max_speakers` | Maximum speakers |
| `batch_size` | WhisperX transcription batch size (default 16, reduce if OOM) |
| `language` | Language code e.g. "en", "fr" (auto-detected if omitted) |
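With this many optional parameters, it can help to assemble the arguments in one place. The sketch below uses a hypothetical helper, `build_extract_args`, that drops options left as `None` (on the assumption that the server then applies its documented defaults) and rejects an inverted speaker range. `HF_TOKEN` is a hypothetical environment variable name and the file names are placeholders.

```python
import os


def build_extract_args(input_file, output_csv, hf_token, **optional):
    """Assemble an extract_audio argument dict, dropping options left as None."""
    args = {"input_file": input_file, "output_csv": output_csv, "hf_token": hf_token}
    args.update({k: v for k, v in optional.items() if v is not None})
    lo, hi = args.get("min_speakers"), args.get("max_speakers")
    if lo is not None and hi is not None and lo > hi:
        raise ValueError("min_speakers must not exceed max_speakers")
    return args


# HF_TOKEN is a hypothetical environment variable; file names are placeholders.
args = build_extract_args(
    "interview.wav",
    "output.csv",
    os.environ.get("HF_TOKEN", ""),
    device="cpu",
    compute_type="int8",  # one of the documented compute types
    batch_size=8,         # below the default of 16 to reduce memory use
    num_speakers=2,
    language=None,        # left unset so the language is auto-detected
)
print(sorted(args))
```

Reading the token from the environment keeps it out of prompts and shell history; how you supply it in practice depends on your client setup.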
chunk_audio
| Parameter | Description |
|---|---|
| `audio_file` | Path to the input .wav file |
| `config_file` | Path to the YAML config defining chunk boundaries |
| `output_dir` | Directory to write output chunk files |
| `fade_in` | Fade-in in milliseconds (default 0) |
| `fade_out` | Fade-out in milliseconds (default 0) |
| `padding` | Silence padding in milliseconds (default 0) |
YAML config format: