xVASynth Usage Guide


If sticking with the vanilla game's voice sets, then the options are:



I've never used Haskill, Martin or Uriel, so I can't comment on those.

Female Redguards is the lowest-quality voice among the model versions available as of today, so avoid it whenever possible. If you must use it, set the Pacing to 1.2 or 1.3 for slightly better results (remember to reset it to 1.0 when switching to another model).



The voices that typically need the least tweaking are Male Elves, Male Imperial and Male Argonians/Khajiit. Male and Female Nords are both fairly reliable too. Female Elves take quite a bit of effort to get acceptable results.


Whenever possible, use Bespoke HiFi GAN as the vocoder. A few models lack this option, in which case you have to use WaveGlow instead. (I don't know the difference between WaveGlow and BigWaveGlow; I asked in the Comments on the Skyrim xVASynth page two months ago but haven't had an answer.)



You should download and enable the ARPAbet Dictionaries, as this improves pronunciation. Note that these expect American spelling, so if you're used to spelling words correctly, you will just have to accept that not everyone on the planet spells English words the English way. There is a Custom Dictionary, so you can always add your own entries with the English spelling if you wish.

There is an "Enable All" button at the bottom; click it to save time. If you ever find that a word in a Dictionary doesn't sound right, you can always untick just that word (or correct its ARPAbet and save your change).

Words with more than one pronunciation (such as "to live" and "live on channel 5") have multiple entries in the Dictionary, and you just type the one you want (e.g. "live" for one pronunciation and "live(2)" for the other). Note that some voice models do not support ARPAbet and will pronounce "live(2)" as "livetwo" instead. For "the", I find it quicker to just type "thee" when I want the alternative pronunciation, as this works with all voice models.



If a word doesn't sound right (mainly words not in a Dictionary, or with a model that doesn't support ARPAbet), try spelling it the way it sounds rather than the way it is actually spelled; this usually gives the result you want.

xVASynth ignores dashes, so hyphenated words run together in the generated voice. Replace each dash with a space before generating.

Most voices will reset ALL tweaks you have made to a voice line if you edit the text, so make sure that every word sounds correct before you start tweaking the pitch and energy.

I recommend prefixing each sentence with a single space; this tends to improve how the first word sounds.
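If you prepare your lines in a text file before pasting them into xVASynth, the dash and leading-space tips above can be applied in bulk. This is only a minimal Python sketch, not part of xVASynth itself: `prepare_line` is a hypothetical helper, it assumes one sentence per input line, and treating en and em dashes the same as ordinary hyphens is my own assumption.

```python
import re

# Hyphen-minus, en dash and em dash. (Assumption: the tip above only says
# "dashes"; including en/em dashes is a guess.)
DASHES = re.compile(r"[-\u2013\u2014]+")


def prepare_line(text: str) -> str:
    """Hypothetical helper: clean one sentence before pasting it into xVASynth."""
    # Replace every run of dashes with a space so hyphenated words stay separate.
    text = DASHES.sub(" ", text)
    # Collapse any repeated or trailing whitespace left behind.
    text = " ".join(text.split())
    # Prefix with a single space to help the first word sound right.
    return " " + text


if __name__ == "__main__":
    for line in ["Well-known sell-swords, never trust them.", "  Ready? "]:
        print(repr(prepare_line(line)))
```

You still need to check each line by ear afterwards, since this only handles the two mechanical fixes.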

Not all voice models support Energy tweaking. If the circles don't show, the voice model doesn't support it.



Gotta say: this ABSOLUTELY beats having to mess around for hours trying to produce something reasonable.

