Tenidus/Higgs-Audio-v2-Gradio-Interface: This is a Gradio/Web interface for Higgs Audio v2. This is more than a basic web interface for Higgs Audio.

Higgs-Audio-v2-Gradio-Interface

This is a Gradio web interface for Higgs Audio v2. It includes chunking for long-form audio generation and multi-speaker support.

Generate Audio tab options - New ability to upload the text for speech generation from a .TXT file

Automatically reads and cleans the text (removes indents, normalizes spacing, removes improper characters)
Removes leading/trailing whitespace and tabs from each line
Preserves SPEAKER tags and other formatting needed for TTS
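The cleanup steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the code in higgs_audio_gradio.py. `clean_transcript` is a hypothetical name, and "improper characters" is assumed here to mean non-printable characters such as zero-width spaces, byte-order marks or control codes. Square-bracket tags like [SPEAKER0] or [laugh] are ordinary printable text, so they pass through untouched.

```python
import re


def clean_transcript(raw: str) -> str:
    """Hypothetical sketch of cleaning text read from an uploaded .txt file."""
    cleaned_lines = []
    for line in raw.splitlines():
        # Assumption: "improper" = non-printable (zero-width chars, BOM, control codes).
        # Whitespace is kept for now so words separated by tabs/NBSP don't get glued together.
        line = "".join(ch for ch in line if ch.isprintable() or ch.isspace())
        # Collapse tabs, non-breaking spaces and repeated spaces into one space,
        # then drop leading/trailing whitespace (indents) from the line.
        line = re.sub(r"\s+", " ", line).strip()
        cleaned_lines.append(line)
    # [SPEAKERn] and sound-effect tags are printable, so they survive unchanged.
    return "\n".join(cleaned_lines).strip()


if __name__ == "__main__":
    sample = "\t\t[SPEAKER0]  Hello\tthere!\u200b  \n    [SPEAKER1] Hi.\u00a0Bye.\n"
    print(clean_transcript(sample))
```

Blank lines inside the text are kept (as empty lines) in this sketch, on the assumption that paragraph breaks may matter for TTS.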

This is more than a basic web interface for Higgs: it lets you customize all of the options that Higgs v2 has to offer. It does NOT require a modified installation of Higgs v2; you simply add one file to the "higgs-audio/examples" directory. Installation steps are listed below. Before Text to…

Here is a list of the options that are customizable in the GUI:

model_path - Provides the option to specify the location of custom models.
audio_tokenizer - Provides the option to specify the location of custom tokenizers.
max_new_tokens - Slider for setting the maximum number of new tokens to generate. Default is 2048.
device - Select whether to load the model on CUDA, MPS, or CPU, or choose Auto to select automatically.
use_static_kv_cache - Option to enable or disable the static KV cache.
transcript - The text to be converted into audio. Long-form generation is supported!
scene_prompt - The scene description prompt to use for generation (optional). Visit https://github.com/boson-ai/higgs-audio/tree/main/examples for scene prompt details.
ref_audio - Reference audio for custom voice cloning. Select one of the standard/built-in voices or upload your own custom voice. Uploaded custom voices are placed in the default higgs-audio/examples/voice_prompts folder.
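The "Auto" device option can be pictured as a simple fallback chain. The sketch below is an assumption about how such a choice might be resolved. The priority order CUDA, then MPS, then CPU is a guess and not taken from the project, and `resolve_device` / `detect_backends` are hypothetical names. Only the `torch.cuda.is_available()` and `torch.backends.mps.is_available()` checks are real PyTorch calls.

```python
from typing import Tuple

# The four choices offered by the "device" option in the GUI.
VALID_CHOICES = {"auto", "cuda", "mps", "cpu"}


def detect_backends() -> Tuple[bool, bool]:
    """Return (cuda_available, mps_available); both False if torch is not installed."""
    try:
        import torch
    except ImportError:
        return False, False
    mps = getattr(torch.backends, "mps", None)  # absent on very old torch builds
    return torch.cuda.is_available(), bool(mps is not None and mps.is_available())


def resolve_device(choice: str, cuda_available: bool, mps_available: bool) -> str:
    """Hypothetical resolution of the GUI's device option to a torch device string."""
    choice = choice.strip().lower()
    if choice not in VALID_CHOICES:
        raise ValueError(f"unknown device option: {choice!r}")
    if choice != "auto":
        return choice  # an explicit selection is honoured as-is
    # Assumed priority for "Auto": NVIDIA GPU, then Apple Silicon GPU, then CPU.
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(resolve_device("Auto", *detect_backends()))
```

Passing the availability flags in as arguments keeps the decision logic testable on machines without a GPU.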

Sampling Parameters:

temperature - Slider for setting the sampling temperature
top_k - Slider for setting the Top K value
top_p - Slider for setting the Top P value
ras_win_len - Slider for setting the RAS window length
ras_win_max_num_repeat - Slider for setting the maximum number of times to repeat the RAS window
seed - Manually set the random seed
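To show what the temperature, top_k, top_p and seed sliders control, here is a generic sketch of the standard temperature / top-k / nucleus (top-p) sampling technique applied to one step of logits. It is not Higgs Audio's own sampler and does not model RAS. `sample_token` is a hypothetical name, and the defaults used here (top_k=0 and top_p=1.0 meaning "disabled", temperature 0 meaning greedy) are conventions assumed for the sketch, not the GUI's defaults.

```python
import math
import random
from typing import List, Optional


def sample_token(
    logits: List[float],
    temperature: float = 1.0,
    top_k: int = 0,
    top_p: float = 1.0,
    seed: Optional[int] = None,
) -> int:
    """Pick one token id from raw logits (generic sketch, not Higgs' own sampler)."""
    if temperature <= 0:
        # Convention assumed here: non-positive temperature means greedy decoding.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]  # <1 sharpens, >1 flattens the distribution
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]  # top-k: keep only the k most likely ids
    peak = max(scaled[i] for i in order)
    weights = [math.exp(scaled[i] - peak) for i in order]  # numerically stable softmax
    total = sum(weights)
    kept_ids, kept_probs, cumulative = [], [], 0.0
    for idx, w in zip(order, weights):
        p = w / total
        kept_ids.append(idx)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:  # top-p: stop once the kept ids cover top_p of the mass
            break
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]


if __name__ == "__main__":
    print(sample_token([1.0, 5.0, 2.0], temperature=0.8, top_k=2, top_p=0.9, seed=123))
```

Lower temperature, smaller top_k, or smaller top_p all narrow the set of candidates, which is why they reduce variation. Fixing the seed makes a run repeatable.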

Advanced Options:

chunk_method - Selects the text chunking method. Options are "None", "Speaker", and "Word".
chunk_max_word_num - Slider for setting the maximum number of words in each chunk. Only used when "Word" chunking is selected.
chunk_max_num_turns - Slider for setting the maximum number of speaker turns in each chunk. Only used when "Speaker" chunking is selected.
generation_chunk_buffer_size - Slider for setting the maximum number of chunks to keep in the buffer. Reference audios and ‘max_chunk_buffer’ chunks of generated audio are always kept.
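The three chunking methods and the chunk buffer can be illustrated with a small sketch. This is an assumption about the general idea only, not the splitting logic Higgs Audio actually uses (which may handle punctuation, language or tag edge cases differently). `chunk_by_words`, `chunk_by_speaker`, `chunk_text` and `context_for_next_chunk` are hypothetical names, and the buffer function uses plain lists as stand-ins for audio tokens.

```python
import re
from typing import List

_SPEAKER_TAG = re.compile(r"\[SPEAKER\d+\]")


def chunk_by_words(text: str, max_words: int) -> List[str]:
    """'Word' method: consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def chunk_by_speaker(text: str, max_turns: int) -> List[str]:
    """'Speaker' method: group at most max_turns [SPEAKERn] turns per chunk."""
    # Each tag starts a new turn; text before the first tag becomes its own turn.
    bounds = sorted({0, *(m.start() for m in _SPEAKER_TAG.finditer(text))})
    ends = bounds[1:] + [len(text)]
    turns = [text[a:b].strip() for a, b in zip(bounds, ends)]
    turns = [t for t in turns if t]  # drop empty/whitespace-only pieces
    return ["\n".join(turns[i:i + max_turns]) for i in range(0, len(turns), max_turns)]


def chunk_text(text: str, method: str, max_words: int, max_turns: int) -> List[str]:
    if method == "Word":
        return chunk_by_words(text, max_words)
    if method == "Speaker":
        return chunk_by_speaker(text, max_turns)
    return [text]  # "None": the whole transcript is generated in one pass


def context_for_next_chunk(reference_audio: list, generated: list, buffer_size: int) -> list:
    """Reference audio is always kept; only the last buffer_size generated chunks are."""
    recent = generated[-buffer_size:] if buffer_size > 0 else []  # [-0:] would keep all
    return reference_audio + recent


if __name__ == "__main__":
    script = "[SPEAKER0] Hi.\n[SPEAKER1] Hello.\n[SPEAKER0] Bye."
    print(chunk_text(script, "Speaker", max_words=50, max_turns=2))
```

Keeping only a bounded tail of generated chunks is what stops the conditioning context from growing without limit on very long texts, while the reference audio keeps the voice consistent.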

Additional Notes:

The model auto-loads before generation and auto-unloads afterward to save memory
You can manually initialize the model in the "Model Setup" tab to keep it loaded between generations
Select a voice prompt from the dropdown, or choose "None" for a random voice
Upload a .txt file to automatically populate the transcript (you can still edit it)
For multi-speaker: Use [SPEAKER0], [SPEAKER1] tags in your transcript
For sound effects: Use tags like [laugh], [music], [applause]
Adjust the temperature for more or less variation in speech
Use chunking for very long texts
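The auto-load / auto-unload behaviour, and the way a manual initialization keeps the model resident, can be sketched as a small lifecycle wrapper. This is a hypothetical illustration, not the interface's actual code. `ModelManager` is an invented name, and `loader` stands in for whatever function builds the Higgs model. In a real PyTorch setup, freeing GPU memory usually also requires that no other references remain and, optionally, a call to `torch.cuda.empty_cache()`.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional


class ModelManager:
    """Hypothetical sketch: load per generation unless pinned via manual init."""

    def __init__(self, loader: Callable[[], Any]):
        self._loader = loader              # stand-in for the real model constructor
        self._model: Optional[Any] = None
        self.keep_loaded = False           # True after manual init in "Model Setup"

    @property
    def is_loaded(self) -> bool:
        return self._model is not None

    def initialize(self) -> None:
        """Manual initialization: load now and keep the model between generations."""
        if self._model is None:
            self._model = self._loader()
        self.keep_loaded = True

    @contextmanager
    def session(self) -> Iterator[Any]:
        """Auto-load for one generation; auto-unload afterwards unless pinned."""
        if self._model is None:
            self._model = self._loader()
        try:
            yield self._model
        finally:
            if not self.keep_loaded:
                self._model = None  # drop our reference so memory can be reclaimed


if __name__ == "__main__":
    manager = ModelManager(loader=lambda: "fake-model")
    with manager.session() as model:
        print("generating with", model)
    print("still loaded?", manager.is_loaded)
```

Putting the unload in a `finally` clause means the model is released even if a generation fails partway through.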

Installation:

Download "higgs_audio_gradio.py" and place it in the "higgs-audio/examples" folder (if you installed Higgs Audio in a custom location, replace "higgs-audio" with the correct directory name).

Activate the Python environment you created when installing Higgs Audio:

If you installed using Option 2 (venv):

Navigate to the installation directory
source higgs_audio_env/bin/activate
and run:
pip install gradio

If you installed using Option 3 (conda):

Navigate to the installation directory
conda activate ./conda_env
and run:
pip install gradio

If you installed using Option 4 (uv):

Navigate to the installation directory
source .venv/bin/activate
and run:
uv pip install gradio
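To confirm that gradio was installed into the environment you just activated (and not into a different interpreter), a quick check like the following can be run with that environment's python. It is an optional helper and not part of the project. `missing_packages` is a hypothetical name, while `importlib.util.find_spec` is the standard-library call that returns None when a top-level package cannot be found.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that the current interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["gradio"])
    if missing:
        print("Missing:", ", ".join(missing), "- activate the environment and install them")
    else:
        print("gradio is available in this environment")
```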

Running the Gradio Interface:

Navigate to "higgs-audio/examples" (or your custom folder name)
and run:
python higgs_audio_gradio.py

Screenshots:

Model Setup tab

Generate Audio tab Options - New Ability to Upload Custom Voices

Advanced Options (Under Generate Audio tab)
