SillyTavern AI guide

Arztpla · Post by **Arztpla** » Fri Nov 03, 2023 11:40 pm

Updated: Nov 03, 2023

Foreword
At first, I will focus only on text generation so that you end up with a working chat AI.
Later I will update this topic with instructions for image generation, text to speech and so on (as I learn about them myself).
Even though I'm a software engineer, I am not advanced with any of these tools yet, and I don't know a lot about them. I will try to help you out as much as I can.

I have written this over a couple of days, and I hate rereading things I've already written, so there are probably quite a few mistakes or unclear instructions. Let me know, and I'll fix them.
If you want to get into these tools, I highly recommend that you read more about them on your own. Some of the tools have a Discord or Reddit where you can ask questions or get help. Just remember to keep it SFW (safe for work); in most of these places NSFW is not allowed.
SillyTavern has a Discord server where you need to be a member for 7 days before you get access to the NSFW channels.

If you have never played with any of these tools before, it will probably take you a lot of time to get everything working. I've tried to be as descriptive as possible. Setting up can be infuriating, but it is worth it in the end. I would guess it takes a technical person roughly 10 hours to set up, and way less for a power user. It would be impossible for non-technical people.
There should also be a few YouTube videos for those who prefer that.
I found a tool called "AI-Toolbox" which is probably not well known, so you will probably not find any guides on how to use it to install all of the other AI applications.

First, you should check your computer. Plenty of RAM is a must. The AI applications load your selected model into RAM so that it does not have to be read from your drive every time a response is generated. This will most likely take up 4-12 GB of RAM. I have 24 GB of RAM, and it sometimes sits at 22-23 GB usage (with both text and image generation running idle).

Also, your graphics card should have 10+ GB of VRAM. With less, you could constantly run into "out of memory" errors.
I am using a 1080 Ti 11GB, and with my current settings I haven't run into "out of memory" in a while.

Again, you need to spend time learning everything. Use Google, YouTube etc.

Useful links

SillyTavern documentation: https://docs.sillytavern.app/

Github links:
SillyTavern: https://github.com/SillyTavern/SillyTavern
SillyTavern-Extras: https://github.com/SillyTavern/SillyTavern-extras
Oobabooga/text-generation-webui: https://github.com/oobabooga/text-generation-webui
stable-diffusion-webui: https://github.com/AUTOMATIC1111/stable-diffusion-webui

Places to find AI Models:
Civit AI: https://civitai.com/models
Hugging Face: https://huggingface.co/models

Premade SillyTavern characters
ChubAI: https://www.chub.ai/

Terminology

Explainers

AI model

There are different types of AI models, and each has its own purpose.
The types we hear about most often are "Text Generation" and "Image Generation".
ChatGPT is running a "Text Generation" AI.

The AI "predicts" the output based on the model.
For text generation this means that the AI will try to predict the next sequence in a given text.
If the model was trained on Harry Potter books, the AI would try to predict the next sequence in a Harry Potter style.
If the model was trained on erotic novels, the AI would output erotic-style text.

For image generation the AI will try to predict what an image will look like for the given prompt/text.
If the image generation model was trained on cartoon images, the output image would be a cartoon image.
If the image generation model was trained on porn pictures it would output porn style images.

One way you could think of models: "The model is the AI's style".

Also worth mentioning: if the model was not trained on erotica or nudity, the AI will not be able to output that kind of content. Because the AI does not know what the concept is, it cannot shape its output accordingly.
For instance, if the image generation model was never exposed to images of boobs, it would not know how to output an image for a prompt that includes "large boobs".
The same goes for a text generation model: if it was never exposed to the word "dildo", it would not know what that means.

Last thing: your AI is not learning or training over time as you use it.
It has to be specifically trained on your usage if you want it to learn.
The model is a checkpoint in its training.
Think of it as talking to yourself from 10 years ago: they would not know what you have learned over the past 10 years, and they would need to go through all the same experiences to gain that knowledge.

Summary of the different tools/applications

AI-Toolbox: automatically installs all requirements for the specific AI applications.
Oobabooga/Text-generation-webui: hosts the text generation AI.
Stable-Diffusion-webui: hosts the image generation AI.
SillyTavern: brings everything together. It manages your characters, communicates with the AI hosts and controls text to speech. Basically a hub that controls all the other applications.
SillyTavern-Extras: a link between SillyTavern and extensions, covering community extensions and things the developers did not want to be part of the main application.

Installation

We will use AI-Toolbox, a community-made tool for installing various AI applications.

At first we will install only the applications required to get running: SillyTavern and text generation.
First we need AI-Toolbox itself, which we will use to install the rest of the applications.

AI-toolbox

URL: https://github.com/deffcolony/ai-toolbox/tree/main
1. Click the link above this text
2. Click the dropdown button with the text "<> Code"
3. Click "Download ZIP" to download the AI-Toolbox

Now that you have downloaded AI-Toolbox, you need to extract the zip file into a folder. This folder will be your installation location for all of the different AI applications.
Keep in mind that all of the AI applications together can easily take up over 100GB. Also think about which drive you use, since the speed of the drive has a huge effect on the applications: they need to read the models, which sometimes take up 10GB each.
4. Extract the ZIP file to your preferred installation location
5. Now you should have AI-Toolbox installed and be ready to install the AI applications.
I will now refer to this installation location as the "AI-Toolbox root". For me the AI-Toolbox root is "E:\ai-toolbox-main", and I will use my own paths as examples when describing which folder I am talking about.
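
If you want to check the free space on a drive before extracting, a few lines of Python can do it. This is only a sketch: the 100 GB threshold is the rough estimate from above, `has_enough_space` is a made-up helper name, and "." should be swapped for your own AI-Toolbox root.

```python
import shutil


def has_enough_space(path: str, required_gb: float = 100.0) -> bool:
    """Return True if the drive holding `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


# Checks the drive of the current folder; replace "." with your install location.
print("Enough free space:", has_enough_space(".", required_gb=100.0))
```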

SillyTavern & Textgen/oobabooga/text-generation-webui

Navigate to the SillyTavern folder inside of your AI-Toolbox root: "E:\ai-toolbox-main\sillytavern"
This folder contains 5 files: an icon and two install scripts each for Windows and Linux.
.bat for Windows
.sh for Linux

1. Run the st-install file for your system (st-install.bat for Windows)
It will open a console window and ask what you want to install.
You will need to type a number and press enter to confirm your choice (just like with TeaseAI).

It will throw an error saying something like "Windows cannot find vs_buildtools.exe".
Just click OK on this error. I made a bug report about it: https://github.com/deffcolony/ai-toolbox/issues/2
In short, something is not being installed, and I don't know what it is for or whether it is needed. If you run into any problems running SillyTavern or its features, this could be a potential cause.

It will install SillyTavern itself and all the tools required to run it.
You can select SillyTavern + Extras if you later want to use image generation and text to speech.

After the console closes or prints "Complete", it should have created two folders ("SillyTavern" & "SillyTavern-extras") inside "E:\ai-toolbox-main\sillytavern".

You can now run "Start.bat" inside the SillyTavern folder: "E:\ai-toolbox-main\sillytavern\SillyTavern\Start.bat"
SillyTavern does not "need" the extras to be running for the application to work.
Take a look at SillyTavern's documentation: https://docs.sillytavern.app/
Anything mentioned under the "Extras" category requires the extras application to be running. The extras application is located in "E:\ai-toolbox-main\sillytavern\SillyTavern-extras"

After you have started SillyTavern, it should open a web page that is hosted locally by the console window you just started. If the web page does not open automatically, you can open it yourself in a new tab: "http://localhost:8000/". The web address is also printed in the console window.
"127.0.0.1" is the same as "localhost".
Take a look around and click on everything. I will explain enough to get it running for you, but there are a lot of small settings you can adjust.

Now we will install the Text Generation AI application.
1. Run the textgen-launcher.bat file located in the Oobabooga folder "E:\ai-toolbox-main\oobabooga\textgen-launcher.bat"
2. Select the option to "Install textgen"

While waiting for text-generation-webui to install you can look further down and look for "Downloading SillyTavern Characters". It will walk you through downloading a character for SillyTavern.

When the installation is finished and you have decided whether you want a shortcut for text-generation-webui, the console window will return to its "main menu". We will keep this console window open for now and come back to it later.

Before starting it, you can download a text generation model. I recommend starting with this one: https://huggingface.co/Undi95/Xwin-MLew ... 4_k_m.gguf
Open the link and press the "download" button to download the model file. Save that file to: "E:\ai-toolbox-main\oobabooga\text-generation-webui\models". All your text generation models should be downloaded into this folder.
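
To confirm that the download landed in the right place, something like the following lists the .gguf files in the models folder. This is a minimal sketch: `list_gguf_models` is a hypothetical helper, and the path is simply the example path used in this guide.

```python
from pathlib import Path


def list_gguf_models(models_dir: str) -> list[str]:
    """Return the names of all .gguf files directly inside models_dir, sorted."""
    folder = Path(models_dir)
    if not folder.is_dir():
        return []  # folder missing: nothing installed there yet
    return sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".gguf"
    )


# Example path from this guide; adjust it to your own AI-Toolbox root.
print(list_gguf_models(r"E:\ai-toolbox-main\oobabooga\text-generation-webui\models"))
```

An empty list means either the folder does not exist or no model has been saved there yet.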

One last thing we need to configure before starting text-generation-webui:
In the text-generation-webui root folder (E:\ai-toolbox-main\oobabooga\text-generation-webui)
we are going to edit the file called "CMD_FLAGS.txt": E:\ai-toolbox-main\oobabooga\text-generation-webui\CMD_FLAGS.txt
In this file, any line that starts with "#" is a comment.
Make a new line at the bottom and add "--api" to it. (Remember to save the file.)
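
The same edit can be scripted if you prefer. The sketch below appends "--api" on its own line unless an uncommented line already contains it. `ensure_flag` is a hypothetical helper written for this guide, not part of text-generation-webui, and it assumes the file is plain UTF-8 text.

```python
from pathlib import Path


def ensure_flag(flags_file: str, flag: str = "--api") -> bool:
    """Append `flag` on its own line unless an uncommented line already has it.

    Returns True if the file was changed.
    """
    path = Path(flags_file)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # comment lines do not count
        if flag in stripped.split():
            return False  # flag is already active
    if text and not text.endswith("\n"):
        text += "\n"  # make sure the flag starts on a new line
    path.write_text(text + flag + "\n", encoding="utf-8")
    return True


# Usage (example path from this guide):
# ensure_flag(r"E:\ai-toolbox-main\oobabooga\text-generation-webui\CMD_FLAGS.txt")
```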

[Screenshot: oobabooga_cmd_flags_example.PNG]

Now head back to the console we left open before (Not the one hosting SillyTavern, but the one with the main menu)
Select option "2" to run text-generation-webui
It will open a new console window that is hosting text-generation-webui.

If it fails at some point and doesn't continue, just close the console window and rerun textgen with the launcher script: E:\ai-toolbox-main\oobabooga\textgen-launcher.bat
It failed for me, but after rerunning it, it started successfully.

After starting successfully it should look like this:

[Screenshot: oobabooga_startup_example.PNG]

Notice it says "Running on local URL: http://0.0.0.0:7910"
It might say "localhost" or "127.0.0.1" instead, but all of these just mean that it is running on your own computer.
The number after ":" is the port that the web page is running on. It might be different for you.
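
One thing to watch out for: "0.0.0.0" means the server listens on all network interfaces, and some systems (Windows in particular) will not open that address in a browser. If the page does not load, try the same port on "127.0.0.1". The sketch below shows how the printed URL splits into host and port and how to swap the host; `browser_url` is a made-up helper name, and it assumes the URL includes the "http://" part.

```python
from urllib.parse import urlsplit, urlunsplit


def browser_url(printed_url: str) -> str:
    """Swap the bind-all address 0.0.0.0 for 127.0.0.1, keeping the port."""
    parts = urlsplit(printed_url)
    host = "127.0.0.1" if parts.hostname == "0.0.0.0" else parts.hostname
    netloc = f"{host}:{parts.port}" if parts.port else host
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(browser_url("http://0.0.0.0:7910"))  # http://127.0.0.1:7910
```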

Open the textgen web page with your browser.
First head to the "Model" tab at the top.

1. Then open the dropdown menu at the top. Select the Xwin-MLewd model you downloaded before (this is where we select which AI model we want to use).

2. Make sure that the option "llama.cpp" is selected where it says "Model Loader".

3. You will need to set the options in the small window with "n-gpu-layers" and "threads".
You can read about all the different options here: https://github.com/oobabooga/text-gener ... b#llamacpp

Notably, you should set "n-gpu-layers", "threads" & "threads_batch" according to the link above.
Here is a screenshot of the settings I have used.

[Screenshot: oobabooga_model_settings_example.PNG]

What are cores?

It refers to your CPU cores. I use a Ryzen 7 1800X, which has 8 cores.
The number of virtual cores is generally double the number of physical cores.
You can easily see how many cores you have:
Press ctrl+alt+delete and click on "Task Manager".
Head to the "Performance" tab and select "CPU".
There it shows how many cores and virtual cores you have. I have outlined them in the following screenshot.
"Cores" is your CPU's physical cores, "Logical processors" is your CPU's virtual cores.

[Screenshot: task_manager.PNG]
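
If you have Python at hand, you can also read the logical processor count from a script. Python's standard library only reports logical processors, so the physical core count below is just a guess based on the "double" rule of thumb above (an assumption that does not hold for every CPU). `guess_physical_cores` is a hypothetical helper; which count goes into which setting is for the linked documentation to say.

```python
import os


def guess_physical_cores(logical: int) -> int:
    """Estimate physical cores, assuming two logical processors per core."""
    return max(1, logical // 2)


logical = os.cpu_count() or 1  # logical processors ("virtual cores")
print(f"logical processors: {logical}")
print(f"estimated physical cores: {guess_physical_cores(logical)}")
```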

After you have set the values accordingly, you can press the "Save Settings" button so the values are saved.
Finally, you will need to press "Load" to load the actual model.

Now we just need to connect SillyTavern to the textgen application.

Back in SillyTavern, open the connection tab.
Remember to select Text Gen WebUI for the API.
Then you should be able to click connect.
If it connects successfully, the red dot at the bottom will turn green.

[Screenshot: sillytavern_connect.PNG]

You can double check the blocking API URL and the streaming API URL by taking a look at the textgen console window.
Scroll to the top of the console.
It lists the URLs that you are supposed to use in SillyTavern.

[Screenshot: oobabooga_startup_example.PNG]

In the screenshot it mentions these two urls:
Blocking API URL: http://0.0.0.0:5000/api
Streaming API URL: ws://0.0.0.0:5005/api/v1/stream
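
If SillyTavern refuses to connect, it helps to first check whether anything is listening on those ports at all. The sketch below only tests that a TCP connection can be opened; it does not talk to the API itself. `port_is_open` is a hypothetical helper, and ports 5000 and 5005 are the ones from the screenshot, so yours may differ.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, timed out, or unreachable


for name, port in [("blocking API", 5000), ("streaming API", 5005)]:
    print(name, "reachable:", port_is_open("127.0.0.1", port))
```

If both report False, textgen is probably not running or was started without "--api".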

When you are connected, you can select a character from the character list. You should already have downloaded one.
When a character is selected, it should display a chat window with an opening message in the chat.
You can respond to it, and the AI should take a few seconds to respond.

Downloading SillyTavern Characters

Head over to https://www.chub.ai/ and find a character you want to try out. Remember to turn on NSFW at the top to show NSFW characters.

When you are on the character page, click the purple button that says "V2". That should download a .png file. I outlined the button in the screenshot.

[Screenshot: aerjhaerh.PNG]

Now that you have downloaded the character .png, you can import it into SillyTavern.
Open the character list window in SillyTavern and click the import character button. I outlined the buttons in the screenshot.

[Screenshot: arejrfgtjsnrts.PNG]

Now your new character should be ready to use.

The character information (description etc.) is embedded in the .png file. So the .png is not just an image, it is the entire character.
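
For the curious, the embedded data can be read back out with a short script. The sketch below rests on an assumption about the card format that this guide does not confirm: that the character is stored as base64-encoded JSON in a PNG text chunk with the keyword "chara". The function names are made up for illustration, and the parser skips CRC checks to stay short.

```python
import base64
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_text_chunks(png_bytes: bytes) -> dict[str, bytes]:
    """Collect all tEXt chunks of a PNG as {keyword: raw value}."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        # Each chunk: 4-byte length, 4-byte type, data, 4-byte CRC.
        length, kind = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        data = png_bytes[pos + 8:pos + 8 + length]
        if kind == b"tEXt":
            keyword, _, value = data.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = value
        if kind == b"IEND":
            break
        pos += 12 + length
    return chunks


def read_character(png_bytes: bytes) -> dict:
    """Decode the character JSON, assuming it sits base64-encoded under 'chara'."""
    raw = read_text_chunks(png_bytes).get("chara")
    if raw is None:
        raise ValueError("no 'chara' text chunk found")
    return json.loads(base64.b64decode(raw))


# Usage (hypothetical file name):
# with open("character.png", "rb") as f:
#     print(read_character(f.read()).get("name"))
```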

valdez7 · Post by **valdez7** » Mon Nov 06, 2023 5:51 pm

I used Silly Tavern a lot over the summer back when you could use Slaude or Sage/Poe as the text generation model. Once those got neutered I pretty much lost interest. How do the local models compare? When I first tried them some months back, I wasn't getting very good results.

rotta · Post by **rotta** » Tue Nov 07, 2023 2:56 pm

Nice guide. I definitely recommend playing around with local text generation. I've been playing around with it now and then and here's my five cents to add to the OP.

The results depend heavily on the model you use, and the memory (VRAM) of your video card determines which models you can use.

I highly recommend the quantised models since you can fit a larger model to your card with little drawback, kind of like a zipped file. The 13B models are way better than 7B and if you can fit a 7B on your card you should be able to fit a quantised 13B. But this is something you need to experiment with and is probably the most difficult part. Some models just don't work with some character cards, so you might need to edit the character card or try a different model.

Finally, you can offload some of the work to your CPU if you have a weaker GPU. GGUF models generally speaking are meant for offloading and GPTQ models are meant to be loaded completely to your GPU.
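
As a rough back-of-the-envelope check on why a quantised 13B can fit where an unquantised 7B does, the weight size is roughly parameters times bits per weight divided by 8. The sketch below uses assumed figures (16 bits per weight for an unquantised model, about 4.5 for a 4-bit style quant) and ignores context and other overhead, so treat the numbers as ballpark only.

```python
def approx_model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough weight size in GB: parameters x bits per weight / 8 bits per byte."""
    return params_billion * bits_per_weight / 8


print(approx_model_size_gb(7, 16))    # unquantised 7B at 16 bits per weight
print(approx_model_size_gb(13, 4.5))  # 13B at an assumed ~4.5 bits per weight
```

Under these assumptions the 7B comes out at about 14 GB and the quantised 13B at a bit over 7 GB.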

Give it a try and have fun.

valdez7 · Post by **valdez7** » Tue Nov 07, 2023 4:40 pm

Thanks for the input! I'm still new to all the 7B/13B quantized non-quantized jargon, so it can be confusing as to where to start. My card is a 3060 (12GB VRAM). It works well enough for image-generation (SD / Comfy UI), so hopefully I can get some halfway decent text-generation going, too!

That's interesting about certain character cards working better/worse with different models. Any guidance as to what to look out for there or what things to change if the results aren't good?

Arztpla · Post by **Arztpla** » Wed Nov 08, 2023 1:25 am

Haven't had the best experience running 13B because of my card. I recently found this https://rentry.co/ayumi_erp_rating
Zephyr 7B seems to run well for me, and is somewhat better than the Xwin-MLewd I originally recommended

swk727 · Post by **swk727** » Wed Nov 15, 2023 5:26 am

Having trouble getting this to work on my end. I'm following the guide up until you load textgen in the browser. It says it's running on the same local URL as in your post, but it isn't able to actually load anything in the browser at that URL.

Electro · Post by **Electro** » Thu Nov 16, 2023 2:17 am

Arztpla wrote: ↑Wed Nov 08, 2023 1:25 am Haven't had the best experience running 13B because of my card. I recently found this https://rentry.co/ayumi_erp_rating
Zephyr 7B seems to run well for me, and is somewhat better than the Xwin-MLewd I originally recommended

There's a post in the Undi95 Hugging Face discussions saying that the Xwin-MLewd 7B was discovered to have failed, so it is essentially only Xwin. I've had a really good experience with Xwin-MLewd 13B though, even though I need to be patient because I only have 2 GB of VRAM and 16 GB of RAM. Undi95 also has Toppy 7B, perhaps that does it for you. I found it doesn't always follow the directions I give it via system/assistant prompts, but sometimes it's pretty good; this seems to be common for 7B models. I haven't played with it inside SillyTavern. If performance is really what you need, there are other 7B models that seem to work reasonably well, such as some of the various Mistral-based ones. OpenHermes v2 (I've had bad luck with 2.5) seems not to be censored, or perhaps my initial system and assistant prompts have simply walked past any NSFW censorship that may have been there.

H2SO4Nudes · Post by **H2SO4Nudes** » Wed Nov 29, 2023 5:59 am

Thank you for your post. After a week I got the stuff running on my machine (8 i7 cores, 32GB RAM, GTX 1660 Ti).
The response times for zephyr-7b-alpha.Q5_K_M.gguf are under 30 seconds. I use Silero TTS and Vosk STT to have both hands free

Unfortunately, converting numbers into words doesn't always work. Silero can't say "10" but only "ten". I try to explain this to the character in the example chats.
The talking heads are a funny feature and bring some life into the chat.
With some script modifications in Silero and Vosk I am able to use my native language. Unfortunately, Libre Translate has problems with ambiguous terms. That's why I'm stuck using Google Translate as the last online service required.
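
For the number problem mentioned above, one possible workaround is to spell out digits before the text reaches the TTS engine, instead of relying on the character to do it. The sketch below is English-only, handles just 0 to 99, and is not taken from Silero or SillyTavern; the function names are made up for illustration.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 99 in English."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def spell_out_numbers(text: str) -> str:
    """Replace standalone 1-2 digit numbers with words; longer ones stay as-is."""
    return re.sub(r"\b\d{1,2}\b", lambda m: number_to_words(int(m.group())), text)


print(spell_out_numbers("Count to 10."))  # Count to ten.
```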

The definitions of user, character and scenario are a science in themselves. Here is an interesting link: https://rentry.org/chai-pygmalion-tips.

If you give the user a list of toys and the character a few chat examples, then the list of toys will be "processed". You probably won't be able to get a sensible JOI at all. I'm currently reading up on programming external plugins.
The idea would be to send certain trigger words to the JOI script, which would then start reciting the text generated by the script via TTS.

sotherbee · Post by **sotherbee** » Fri Dec 01, 2023 6:27 am

Worked, but obnoxiously slow and unstable. One short response in 1-3 minutes. call_connection_lost errors and so on.
Will wait until there will be a better method to use it.

My specs: i5 13400F, RAM 32GB, Geforce RTX 4060.

H2SO4Nudes · Post by **H2SO4Nudes** » Fri Dec 01, 2023 7:40 am

I finished my first attempt at an extension:
https://github.com/H2SO4H2SO4/joi-extension
Maybe someone has use for it.

H2SO4Nudes · Post by **H2SO4Nudes** » Fri Dec 15, 2023 4:15 pm

I now have a Linux computer with an MSI GeForce RTX 4060 Ti Ventus 2X Black 16G installed.
I can even load mxlewd-l2-20b.Q5_K_M.gguf now.
The response times are negligible.
If I understand this correctly, only the memory of the graphics card is important.

I have also written a new extension with which you can control your Intiface-compatible vibrator.
https://github.com/H2SO4H2SO4/sin-tiface

What's still missing is a function so that the AI can "see" you.
With Motion (https://github.com/Motion-Project/motion) you can create images whenever you move.
You can then use clip-interrogator (https://github.com/pharmapsychotic/clip-interrogator) to generate a prompt for the image.
What's still missing is a comparison of whether the prompt matches the characters' instructions.
Your dominatrix says: kneel down. Kneeling down triggers Motion and creates a new image. For the new image, clip-interrogator generates a prompt that contains "man kneeling". Your dominatrix can now check whether you are really kneeling.

Electro · Post by **Electro** » Sat Dec 16, 2023 12:06 am

VRAM is the biggest factor for local LLM performance, because you want to load as many layers onto the GPU as you can so the CPU does less work (without cranking it so high that it spills back into system RAM and causes a lot of swapping). There's also a tensor count that's a factor, and any layers that aren't offloaded to the GPU take up system RAM and are still handled by the CPU, so that's a factor too. For what it's worth, I'm using a laptop with an i7 1165G7 CPU and 16 GB of RAM, and depending on the model I can only offload 4-6 layers without overflowing my laptop's puny 2 GB GeForce MX350 and crashing, but offloading just that little bit helps a ton for performance. It's still slow compared to a better GPU that could handle more, but I'm not about to replace a 2 year old machine yet. I didn't buy it with AI in mind, but my next laptop will have as much VRAM as is available, now that AI has taken off. It seems like dedicated laptop GPUs must not carry the same price tag for manufacturers as the same-specced ones sold by computer shops for desktop machines, because looking at the price tags on graphics cards is painful.
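
A rough way to pick a starting value for the number of GPU layers is to divide the model size evenly over its layers and see how many fit in VRAM after leaving some headroom. This is only a sketch under stated assumptions: layers are treated as equal in size, the 1 GB reserve is a guess, and the 9 GB / 40 layer figures are illustrative values for a 13B quant, not measurements. Expect to adjust the result by trial and error.

```python
def layers_that_fit(model_size_gb: float, n_layers: int,
                    vram_gb: float, reserve_gb: float = 1.0) -> int:
    """Estimate how many layers fit in VRAM, assuming equally sized layers."""
    per_layer_gb = model_size_gb / n_layers
    usable_gb = max(0.0, vram_gb - reserve_gb)  # keep headroom for context etc.
    return min(n_layers, int(usable_gb / per_layer_gb))


print(layers_that_fit(model_size_gb=9.0, n_layers=40, vram_gb=2.0))   # small laptop GPU
print(layers_that_fit(model_size_gb=9.0, n_layers=40, vram_gb=16.0))  # whole model fits
```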

H2SO4Nudes · Post by **H2SO4Nudes** » Sat Dec 16, 2023 5:17 am

I've been reading a lot lately to find out what's technically possible.
Laptops always seem to have the GPU as a bottleneck. If the laptop has Thunderbolt, an eGPU can be considered:
https://egpu.io/best-egpu-buyers-guide/
Unfortunately not cheap.
I'm already thinking about running a second graphics card this way. But I'm not sure whether distributing the layers across 2 cards works in terms of performance.
It's also not clear to me whether using 70B models improves the experience enough to make the additional cost worth it.

In any case, the potential for using AI in our specific case is considerable. I suspect in a few years you will be able to simply install a complete solution on your laptop.

Electro · Post by **Electro** » Sun Dec 17, 2023 1:16 am

I don't know how to feel about 70B, but I can run smaller quants of larger models such as 20B q3_k_m and 23B q3_k_s and it seems to produce great results, or at least good enough for what I'm using it for, but as I'd expect the performance isn't great running one of those but the performance seems similar to larger file size quants for a 13B at q5_k_m, 9-10 gigabyte model sizes.

If I want raw speed, the best I've found has been OpenHermes-2 at q4 with a 4.11 GB model size, where I can toss a third of the model at the GPU. I can only imagine what it would be like to have 12 GB of VRAM or more and throw these 10 GB models at it, but I imagine I'd get sub-minute times for short responses. For what it's worth, I have the patience to type my side of the chat and do something else while waiting a few minutes for a response from a larger model, so spending over a grand on a new machine seems like a stretch. I could afford it, it just doesn't feel like a great use of my money. ..but opinions are exactly that, and I'd be a bit more swayed if I was already in the market for a new machine.

I expect in 2-3 years I'll be looking for a new laptop and by then the stretch to a 70B model might not be too far out there and likely running a middle of the road 20-30Bish model might just be fitting for great performance at a grand or so, but then again right now a 4090 16gb vram laptop is $2499, 16gb vram laptop with a 3080Ti is $1599, 12gb 4080 is $1,999. Not sure on the performance difference of the 3080Ti vs a 4090 would be though but I only found one example of the Ti and the non-Ti only has half the VRAM and it makes me think it was a typo on the vram amount. In any case 16gb vram I expect would fully load one of these 10gb models along with any overhead for context and maybe even let a slightly larger quant fit in. ..but rolling to a 70B model even at a q2 quant at a 30gb model size with only half or less in the vram I imagine would be a crawlfest by comparison, but I don't really know because I don't have the equipment to test. A 20B q5_k_m though at 14gb rolling on a 16gb vram likely would be one nice sweet spot. Technology is technology and I'm sure we will have 32gb vram in not too much time, especially since AI is putting the demand on it and I'm sure AMD and maybe someone else will step in with a competitive response to the current Nvidia CUDA setups we have today but I'm happy with the 13B and 20B responses I'm getting for now, despite mediocre performance on my lipstick potato of a laptop.

H2SO4Nudes · Post by **H2SO4Nudes** » Sun Dec 17, 2023 5:29 am

I'll stay on 20B for now. The difference to 7B is clear. So far I haven't experienced any strange endless loops.
If I'm bored at some point, I can do a test with a 70B model and take several minutes for the response times.
You still get an impression of the quality of the answers.

At the moment I'm still trying to set up a somewhat realistic BDSM session. The problem is more that the AI is becoming too creative.
At the moment I'm trying to write an extension that manages several scenes, each of which is dynamically loaded into the Author's Note.
This allows you to define a little more precisely what the dominatrix has to do.

Example: Hot Wax Torture scene
User has to light candle.
Dominatrix asks if the wax is liquid.
User says yes.
Dominatrix lets the user drip wax onto the left nipple and asks how it feels.
...etc

You can still have a free conversation with the AI. But a certain pattern is followed.
The scenes are managed in Lores and loaded in a random order.
It is problematic to recognize when a scene is finished. The Objective Extension method (https://docs.sillytavern.app/extras/ext ... objective/) somehow doesn't work:
'Pause your roleplay. Determine if this task is completed: [{{task}}]. To do this, examine the most recent messages. Your response must only contain either true or false, and nothing else. Example output: true'

At the moment I let the Dominatrix end every scene with a fixed phrase to signal that a change of scene is necessary.
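
The fixed-phrase approach described above can be sketched as follows. This is not the actual extension (SillyTavern extensions are not written in Python); it only illustrates the logic of shuffling scenes, watching each reply for an end phrase, and switching the scene text that would go into the Author's Note. `END_PHRASE`, `SceneQueue` and the scene names are all hypothetical.

```python
import random

END_PHRASE = "this scene is over"  # hypothetical fixed phrase the character ends with


def scene_finished(message: str, end_phrase: str = END_PHRASE) -> bool:
    """True if the character's message contains the agreed end phrase."""
    return end_phrase.lower() in message.lower()


class SceneQueue:
    """Hands out scenes in random order, advancing when a scene is finished."""

    def __init__(self, scenes, seed=None):
        self._pending = list(scenes)
        random.Random(seed).shuffle(self._pending)
        self.current = self._pending.pop() if self._pending else None

    def on_message(self, message: str):
        """Check a new reply and return the scene that should be active now."""
        if self.current is not None and scene_finished(message):
            self.current = self._pending.pop() if self._pending else None
        return self.current  # None once all scenes are done


queue = SceneQueue(["scene A", "scene B", "scene C"])
print("Active scene:", queue.current)
```

A plain substring match is cheap and predictable, which sidesteps the problem of asking the model itself to judge whether a task is complete.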
