The STAR user client is the frontend interface to this text-to-speech relay system. With it, you can connect to any coagulator you know about and then synthesize text into audio that can either be played through speakers or rendered to audio files.
If you are trying to learn how to host a coagulator so that your friends can share voices, the coagulator quickstart guide on STAR's GitHub will help you do that.
Almost all of the STAR client's functionality can be accessed via keyboard shortcuts, usually alt+letter.
To run the STAR client, simply click on STAR.exe or the equivalent on other platforms.
The first time the client is executed, you will see a simple screen with a status message informing you that the host is not configured, along with 3 buttons (Run Locally, Options and Exit). If you are not part of a group and do not wish to use STAR's networking features, simply click Run Locally and you should be set to go! Otherwise, a valid configuration file containing at least one remote host must exist to use this client, so one pass through the options dialog is needed. In that case, click the Options button, then set a valid host address to connect to, "ws://user:password@samtupy.com:7774" without the quotes, for example.
Once you've clicked the Run Locally button, or OK in the options dialog after setting a host, you will be returned to the main STAR client screen if applicable, except that the status message will now have switched to the word connecting. When a connection is successfully established, the true main screen of STAR will appear.
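As a rough illustration of how the example host address breaks down, here is a minimal Python sketch using only the standard library. The function name `describe_host` is hypothetical, and accepting `wss` alongside `ws` is an assumption; STAR's own validation may differ.

```python
from urllib.parse import urlsplit

def describe_host(address):
    """Split a host address such as ws://user:password@host:port into its parts."""
    parts = urlsplit(address)
    # Assumption: only websocket schemes make sense here.
    if parts.scheme not in ("ws", "wss"):
        raise ValueError(f"expected a ws:// or wss:// address, got {address!r}")
    if not parts.hostname:
        raise ValueError("address has no host name")
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
    }

print(describe_host("ws://user:password@samtupy.com:7774"))
```

The user name and password are optional in this sketch, so an address without credentials still parses.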
From this point your connection choice is stored in a configuration file, meaning that restarting the STAR client will simply reconnect to the saved host or spin up a local coagulator and providers and connect to that based on your saved preference. You can of course change this preference from the options dialog at any time.
Once you've successfully connected to a remote coagulator, you will be presented with the actual main client interface, which provides access to the bulk of the program's functionality.
From here, you can:
The controls on this screen are as follows:
You can also press alt+backspace to pause or resume any playing speech from any place in the main screen.
This dialog contains several options you can configure to customize the client, with the only one you are required to alter being host. You can access it at any time by pressing alt+o from the client's main screen. The controls are as follows:
{voice}
will be replaced with the name of the voice being previewed.
The main function of this program involves being able to provide several lines of speech (all with different voices and parameters) that are then synthesized to wav files for use in dramatized tts audio productions.
To facilitate this, STAR parses the text in the script field based on a simple specification that allows the end user to denote what voice each line of text is spoken with, and what parameters such as rate and pitch should be applied to that line.
A typical line looks like:
Microsoft Sam: Hello, everybody knows me!
First a voice name is provided, then a colon and a space to denote the end of the voice name, and then the text that should be synthesized on that line. Only a partial voice name is required, for example the voice name "david" would resolve to "Microsoft David Desktop English United States."
It is also possible to provide parameters to the voice, for example to make Microsoft sam speak slower one might type:
Microsoft Sam<r=-5>: Aaaawch, that hurts!
The available parameters are r for rate and p for pitch, though the minimum and maximum values, or even whether the parameters are supported at all, are left up to each STAR provider/speech engine. Each parameter=value pair should be separated by a space if a line contains more than one of them, for example <r=1 p=5> to make a voice speak slightly faster and significantly higher in pitch.
If the first non-whitespace character in a line is a semicolon (;), STAR will treat the line as a comment and will not process it.
Whitespace at the beginning of all lines is trimmed during parsing, meaning that indenting parts of your script is possible should you desire.
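The line format described so far can be sketched in a few lines of Python. This is only an illustration of the rules above, not STAR's actual parser: the name `parse_speech_line` and the regular expression are assumptions, and parameter values are kept as text because their valid ranges depend on the provider.

```python
import re

# Assumed shape of a speech line: Voice Name<r=1 p=5>: text to speak
LINE_RE = re.compile(r"^(?P<voice>[^<:]+?)\s*(?:<(?P<params>[^>]*)>)?:\s+(?P<text>.+)$")

def parse_speech_line(line):
    """Return (voice, params, text), or None for blank lines, comments and non-speech lines."""
    line = line.strip()  # indentation is trimmed during parsing
    if not line or line.startswith(";"):
        return None  # a leading semicolon marks a comment
    match = LINE_RE.match(line)
    if match is None:
        return None
    params = {}
    for pair in (match.group("params") or "").split():
        key, _, value = pair.partition("=")
        params[key] = value  # kept as text; limits are up to the provider
    return match.group("voice"), params, match.group("text")

print(parse_speech_line("   Microsoft Sam<r=-5>: Aaaawch, that hurts!"))
```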
A common issue involves selecting the desired voice from a list of similarly named voices. For example, one voice might be called "Paul" while another is called "Espeak Paul English US." In this case, you can put a numeric specifier before a voice name to select alternate occurrences of that voice. Such a line might look like:
2.david: Hi, I am the second david!
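One way to picture partial names and numeric specifiers is the following Python sketch. It assumes that matching is a case-insensitive substring search and that occurrences are counted in the order the voices are listed; the function `resolve_voice` and the sample voice list are hypothetical, and STAR's real matching rules may differ.

```python
def resolve_voice(spec, voices):
    """Pick a voice from `voices` using an optional 'N.' prefix and a partial name."""
    index = 1  # without a specifier, take the first match
    head, dot, rest = spec.partition(".")
    if dot and head.strip().isdigit():
        index, spec = int(head), rest
    needle = spec.strip().lower()
    matches = [voice for voice in voices if needle in voice.lower()]
    if not 1 <= index <= len(matches):
        return None  # no such occurrence
    return matches[index - 1]

voices = [
    "Paul",
    "Espeak Paul English US",
    "Microsoft David Desktop English United States",
]
print(resolve_voice("2.paul", voices))
```

Because the result depends on list order, a numeric specifier can point at a different voice when the set of connected voices changes.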
The next great feature supported in these scripts is voice aliases, or character names. This allows you to refer to a voice by a shorthand identifier instead of by either a full voice name or a numeric identifier which might change based on what voices are connected. If the first non-whitespace character on a line is a vertical bar (|), the line is treated as extra metadata instead of as a speech line. The only currently supported metadata is the definition of a voice alias. For example, consider this script:
|john = Adult Male #8, American English TruVoice
|rs5 = RoboSoft Five
|sam=Microsoft Sam
;scene
John: aaaaaa I'm being attacked by a robot!
rs5: Get ready, for you will die now!
Sam: I'll save you!
rs5: nooooooooooooooaaaaaooooo!
John: Thanks Sam.
The above example shows the usage of comments to denote the characters from the scenes, and shows how, by defining the character alias rs5 for example, we can avoid needing to type RoboSoft Five over and over again in the script, which can be a huge speed boost. Any whitespace is trimmed from the aliases, so space around the equals sign is optional, and an alias defined anywhere in your script will affect the entire document, i.e. you can safely place your aliases at the bottom of your script if you like. Aliases can include default rate and pitch parameters, such as |MadMike = Microsoft Mike<p=9 r=3> for example. If a script line then contains any parameters, those parameters will override the defaults in the voice alias.
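The alias rules can be sketched as a two-pass approach in Python: collect every alias in the document first, then apply them to speech lines. The helper names are hypothetical, and two behaviors are assumptions rather than documented facts: aliases are matched case-insensitively (suggested by |john being used as John in the example), and line parameters override alias defaults one parameter at a time.

```python
import re

# Assumed shape of an alias target: Voice Name<p=9 r=3>
ALIAS_TARGET = re.compile(r"^(?P<voice>[^<]+?)\s*(?:<(?P<params>[^>]*)>)?$")

def parse_params(text):
    """Turn 'p=9 r=3' into {'p': '9', 'r': '3'}."""
    return dict(pair.split("=", 1) for pair in text.split() if "=" in pair)

def collect_aliases(script):
    """Scan the whole script so aliases work wherever they are defined."""
    aliases = {}
    for raw in script.splitlines():
        line = raw.strip()
        if not line.startswith("|"):
            continue
        name, equals, target = line[1:].partition("=")
        match = ALIAS_TARGET.match(target.strip())
        if not equals or not name.strip() or match is None:
            continue
        defaults = parse_params(match.group("params") or "")
        aliases[name.strip().lower()] = (match.group("voice"), defaults)
    return aliases

def apply_alias(voice, line_params, aliases):
    """Resolve an alias and let parameters given on the line win over the defaults."""
    target, defaults = aliases.get(voice.strip().lower(), (voice, {}))
    return target, {**defaults, **line_params}

aliases = collect_aliases("|MadMike = Microsoft Mike<p=9 r=3>\n|rs5=RoboSoft Five")
print(apply_alias("madmike", {"r": "-2"}, aliases))
```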
Though they will not be fully explained here as Balabolka provides its own documentation, it is worth noting that it is possible to use the typical voice tags supported by the Balabolka program, with the exception of the {{Audio=}} tag. For example,
Sam: Hi there, that's cool! {{Voice=mary}} yeah it really is!
would cause the voice to switch from Sam to mary halfway through synthesis.
If you really want to get the {{Audio=}} tag working with the balcony provider, you must download balcon.zip and place the libsamplerate.dll file alongside balcon.exe. Be sure you really want to do this though particularly if you want to share voices with others, as it provides access to any audio file on your system given a path! For example, if the audio tag is enabled, the following line would embed the specified sounds into the generated stream.
Sam: Hi there, {{Audio=C:\Windows\Media\Chord.wav}} that's cool! {{Voice=mary}} yeah it really is! {{Audio=C:\windows\media\notify.wav}}
The final supported feature in the script format allows for selective rendering. Often, you might want to simply tweak one or 2 lines, or continuously add new voice clips to your audio production as you are sound designing. While you could just paste only the part of your script you wish to render into the field, this would mess up any counters which would allow you to account for what order the voice clips should play in. Instead, you can wrap groups of lines in < and > characters to select only the lines contained within to render, like this.
|john = Adult Male #8, American English TruVoice
|rs5 = RoboSoft Five
|sam=Microsoft Sam
;scene
John: aaaaaa I'm being attacked by a robot!
rs5: Get ready, for you will die now!
Sam: I'll save you!
rs5: nooooooooooooooaaaaaooooo!
John: Thanks Sam.
<
Sam: You're welcome, but now as payment you must go publicly proclaim me to be the best tts voice that's ever existed!
John: Are you kidding? Dream on!
>
Here, we've selected only the final 2 lines in the script for rendering, but those lines will maintain the proper file counter. While this is also useful for editing existing lines, it may be somewhat less useful when adding lines in the middle of your script, as doing so will invalidate the incrementing file/clip counter for any lines after the addition/deletion point. You can have as many blocks of selected lines as you wish. Another useful trick is that you can effectively disable render selection blocks without deleting them. Since nested levels of < and > characters are ignored, you can first wrap some lines in selection blocks so that only those lines are rendered. If you then place one large selection block around your entire script, the entire script will be rendered again, and you can simply remove the selection tokens from the beginning and end of the document to re-enable the previously configured render selection blocks.
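The selection behavior can be pictured with this Python sketch. It is an approximation under stated assumptions: markers sit alone on their own lines as in the example, clip numbers start at 1 and count every speech line whether or not it is selected, and nesting is tracked with a simple depth counter so that an outer block re-selects everything. The function `select_for_render` is hypothetical.

```python
def select_for_render(script):
    """Return [(clip_number, line)] for the lines that would be rendered."""
    numbered = []
    depth = 0
    saw_marker = False
    counter = 0
    for raw in script.splitlines():
        line = raw.strip()
        if line == "<":
            depth += 1
            saw_marker = True
        elif line == ">":
            depth = max(depth - 1, 0)
        elif line and not line.startswith((";", "|")):
            counter += 1  # every speech line advances the clip counter
            numbered.append((counter, line, depth > 0))
    # With no selection markers at all, the whole script is rendered.
    return [(n, text) for n, text, selected in numbered if selected or not saw_marker]

script = """;scene
John: one
rs5: two
<
Sam: three
>
John: four
"""
for number, text in select_for_render(script):
    print(number, text)
```

In this sketch only the third clip is selected, and it keeps the number it would have had in a full render.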
The STAR client package comes shipped with a couple of voice providers, making it easy to allow your computer's voices to be used by anybody else using the same coagulator as you are.
In the Windows package, these are balcony.exe and sammy.exe. For macOS voice sharing, you will currently need to download the STAR source code and run the macsay.py provider in your Mac's terminal.
Each provider has a --configure command line option. So if you run balcony.exe --configure, for example, a dialog will appear allowing you to:
You can then run balcony.exe or sammy.exe standalone and the voices will be shared using the set configuration. It's common to create shortcuts to the providers and place them in the shell:startup location accessible from the run dialog, causing voices to be shared to a list of coagulators on system boot.
This update to STAR contains all changes to the project that have taken place over the last 4+ months, including a slightly better visual UI, more providers, the coagulator web frontend, security and stability improvements, and bug fixes.
This is a major update to STAR which includes a complete user client rewrite and consequently the introduction of several useful features.
This is a tool created with the intention of making it possible for small groups of friends to create tts audio skits and dramas with increased collaboration, or even so that somebody can network all of their local voices together with fewer cables and hassles. By no means is this intended to deprive voice creators of income, hurt them in any way, or disrespect their terms of service. Sharing access to voices that disallow such distribution in their license agreements, particularly beyond small groups of friends, goes against the intended use I had in mind for this project, and I expressly disclaim any responsibility for such misuse of the program. Please use this tool respectfully!