Usage Examples

Image-to-Image (Default Mode)

  1. Connect a TOP to the first input of StreamDiffusionTD
  2. SD Mode is set to img2img by default
  3. Use Step Sliders to control denoising:
    • Higher values (40-49) = closer to input image
    • Lower values (1-20) = more AI transformation
  4. Modify prompt and adjust settings to refine output
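The step-slider behavior above can be sketched as a simple mapping. The helper below is an illustration only, not the operator's actual internals; the 50-step slider range and the linear mapping are assumptions:

```python
def steps_to_strength(steps, max_steps=50):
    # Hypothetical mapping: the more steps retained, the less denoising,
    # so higher slider values stay closer to the input image.
    steps = max(1, min(max_steps - 1, steps))
    return 1.0 - steps / max_steps
```

For example, a slider value of 45 maps to a low strength (close to the input), while 10 maps to a high strength (heavy transformation).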

Text-to-Image Mode

Generate images from text without an input image:

  1. Set SD Mode to txt2img in Settings 2
  2. Configure your prompt
  3. Uses seed-based generation
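Seed-based generation means the same seed reproduces the same starting noise, hence the same image for a fixed prompt and settings. A pure-Python stand-in for the seed-derived latents (not the operator's code):

```python
import random

def seed_noise(seed, n=4):
    # Stand-in for seed-derived starting latents: same seed, same noise.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]
```

Changing the seed while keeping the prompt fixed gives a different starting point, and therefore a different image.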

Model Selection

Built-in Options

  • stabilityai/sd-turbo - Best supported, under 8GB VRAM
  • stabilityai/sdxl-turbo - Higher quality, needs ~24GB VRAM for all features
  • prompthero/openjourney-v4 - Alternative style

LoRA Compatibility Warning

LoRA support is experimental/untested. Use v0.2.99 for confirmed LoRA support.

sd-turbo is SD 2.1-based, so SD 1.5 LoRAs will NOT work with it.

  Model           LoRA Version Needed
  sd-turbo        SD 2.1 LoRAs (rare)
  SD 1.5 models   SD 1.5 LoRAs
  SDXL models     SDXL LoRAs
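The compatibility rule above reduces to an architecture match. The model names come from this page; the lookup helper itself is illustrative:

```python
# Base architecture of the built-in models (from this page).
BASE_ARCH = {
    "stabilityai/sd-turbo": "SD 2.1",   # SD 2.1-based: SD 1.5 LoRAs will not work
    "stabilityai/sdxl-turbo": "SDXL",
}

def lora_compatible(model_id, lora_arch):
    # A LoRA only works when it matches the base model's architecture.
    return BASE_ARCH.get(model_id) == lora_arch
```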

IP Adapter Workflow

Use reference images to guide generation style:

  1. Enable IP Adapter in Settings 1 (before starting stream)
  2. Connect reference image to IP Adapter Image parameter
  3. Adjust IP Adapter Scale (0.0-1.0) to control influence
  4. Click IP Adapter Update pulse to apply new reference image
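Conceptually, IP Adapter Scale controls how strongly the reference image conditions generation. A toy sketch of that behavior; the linear blend below is an illustration of the 0.0-1.0 range, not the actual attention math:

```python
def apply_ip_scale(base_cond, ref_cond, scale):
    # Clamp to the documented 0.0-1.0 range, then blend:
    # 0.0 ignores the reference, 1.0 follows it fully.
    scale = max(0.0, min(1.0, scale))
    return [(1 - scale) * b + scale * r for b, r in zip(base_cond, ref_cond)]
```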

FaceID Mode

  1. Enable FaceID toggle in IP Adapter section
  2. Provide face reference image
  3. Generated output will preserve facial features

Requirements:

  • Requires insightface package (installed automatically)
  • Uses buffalo_l model for face detection
  • Only works with SD 1.5 and SDXL models, NOT sd-turbo (SD 2.1 architecture)

Note: the IP Adapter choice must be made before the TensorRT engine is built; it cannot be toggled afterward.

StreamV2V / Cached Attention (New in v0.3.1)

Video-to-video temporal consistency is back in v0.3.1 using cached attention maps. This smooths frame-to-frame transitions for video input.

Setup

  1. Go to Settings 2
  2. Enable Cached Attention (Cattenable)
  3. Set Max Frames (how many frames to cache, default 3)
  4. Set Interval (how often the cache updates, default 1)
  5. On Models page, set Acceleration to tensorrt (required)
  6. Start stream

Tips

  • Resolution is locked to the TRT engine build dimensions (e.g., a 512x512 engine only works at 512x512 with V2V)
  • Lower max frames = less VRAM, less temporal smoothing
  • Higher max frames = more VRAM, more consistency between frames
  • Works with img2img mode for best results
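Max Frames and Interval behave like a small rolling cache over incoming frames. A sketch of that behavior under stated assumptions; the class name and details are illustrative, not the operator's implementation:

```python
from collections import deque

class FrameCache:
    """Rolling cache: keeps at most max_frames entries,
    refreshed every `interval` incoming frames."""
    def __init__(self, max_frames=3, interval=1):
        self.frames = deque(maxlen=max_frames)  # oldest entries fall off
        self.interval = interval
        self._count = 0

    def push(self, frame):
        # Only cache every `interval`-th frame; others pass through uncached.
        if self._count % self.interval == 0:
            self.frames.append(frame)
        self._count += 1
```

This makes the VRAM trade-off visible: a larger `max_frames` keeps more history (more smoothing, more memory), a larger `interval` refreshes the cache less often.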

FX Processors

Two built-in FX processors ship with the operator. Add them from the FX Processors parameter page. All parameters update live without restarting.

Quick Start

  1. Go to the FX Processors parameter page
  2. Click + to add a processor
  3. Select from the dropdown (feedback_loop or feedback_grade)
  4. Adjust parameters to taste

Common Setup: Infinite Zoom with Color Grading

  • Add feedback_loop with zoom: 1.02, feedback_strength: 0.7
  • Add feedback_grade with strength: 0.5 and adjust brightness/contrast/hue as needed
  • Both run in image_pre, so their effects compound through the feedback loop each frame
  • Small values go a long way since everything accumulates. Try hue_degrees: 2 for slow color cycling

You can also create your own custom processors and drop them into the custom_processors/ folder. See the FX Processors page for full parameter reference and custom processor guide.

ControlNet

Input Routing

  Backend    ControlNet Input
  Local      TOP input 2
  Daydream   TOP input 1

This switches automatically based on Backend selection.
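The routing table reads as a one-line lookup. The operator performs this switch internally; the function below is only an illustration of the mapping:

```python
def controlnet_input(backend):
    # Local backend reads ControlNet conditioning from TOP input 2,
    # Daydream from TOP input 1 (see the routing table above).
    return {"Local": 2, "Daydream": 1}[backend]
```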

Setup

  1. Enable ControlNet on the ControlNet page before starting stream
  2. Select ControlNet model matching your base model:
    • For SDXL: xinsir/controlnet-depth-sdxl-1.0, xinsir/controlnet-canny-sdxl-1.0, xinsir/controlnet-tile-sdxl-1.0
  3. Adjust weight to control influence

Preprocessor Options

  • canny - Edge detection
  • depth - Depth estimation (CPU, higher VRAM usage)
  • depth_tensorrt - Depth estimation (GPU accelerated, auto-builds on first use, ~60% faster)
  • hed - Holistically-nested edge detection
  • external - Use pre-processed input
  • feedback - Use previous output as conditioning

Multi-ControlNet (New in v0.3.1)

Dual ControlNet streaming is verified (e.g., depth_tensorrt + canny). To use multiple ControlNets, add additional CN blocks with the + button.

Note: Dual ControlNet on 24GB GPUs runs near the VRAM ceiling (~23.5 GB on a 4090) with reduced FPS (4-9 FPS). Single ControlNet is recommended for most workflows.

Depth TRT Auto-Build

When using depth_tensorrt as a preprocessor, the TRT engine builds automatically the first time (~2 minutes on a 4090). After that, it loads instantly. This gives roughly 60% better FPS and uses a fraction of the VRAM compared to the regular depth preprocessor.
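The auto-build behaves like a build-once, load-forever cache keyed on the engine file. A generic sketch of that pattern, assuming a file path and a builder callback (both hypothetical names):

```python
from pathlib import Path

def get_engine(path, build_fn):
    # First call: no engine on disk, so build it (slow, ~2 min on a 4090).
    # Later calls: the file already exists and loads immediately.
    p = Path(path)
    if not p.exists():
        p.write_bytes(build_fn())
    return p.read_bytes()
```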

Loading Custom Models

Via HuggingFace ID

  1. Find model on huggingface.co
  2. Copy the path (e.g., stabilityai/sd-turbo)
  3. Paste into “Model Id” parameter
  4. Start stream - downloads automatically on first use
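HuggingFace model IDs follow an org/name shape. A quick sanity check before pasting into Model Id; the regex below is a loose illustration, not the official Hub grammar:

```python
import re

def looks_like_hf_id(model_id):
    # Loose check for the "org/name" shape, e.g. "stabilityai/sd-turbo".
    return re.fullmatch(r"[\w.\-]+/[\w.\-]+", model_id) is not None
```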

Via Local File

Local .safetensors path support is unverified. Use HuggingFace IDs for reliable model loading.

Working Models List

After successful streaming (200+ frames), models are saved to: StreamDiffusion/streamdiffusionTD/working_models.json

These appear in the “My Models” dropdown.
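working_models.json can be read back with the standard library. The file's exact schema isn't documented here, so the list-of-IDs shape assumed below is a guess; adjust if the real format differs:

```python
import json
from pathlib import Path

def load_working_models(path):
    # Assumed shape: a JSON list of model ID strings.
    p = Path(path)
    if not p.exists():
        return []
    return json.loads(p.read_text())
```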

Cloud Mode (Daydream)

Zero installation workflow:

  1. Set Backend to “Daydream”
  2. Enter API key
  3. Click Start Stream

Cloud Features

  • No GPU required locally
  • Works on Mac
  • Multi-ControlNet support
  • IP Adapter with FaceID

Note: FX Processors, StreamV2V, and custom processors are only available with the Local backend.


VRAM Budget Reference (RTX 4090 / 24 GB)

All values are nvidia-smi readings and include the ~5 GB Windows baseline.

  Config                                   VRAM      FPS    Notes
  Base SDXL-turbo TRT (512x512)            ~18 GB    ~26    Baseline
  + Single ControlNet (canny)              ~18 GB    ~15    +2.4 GB per CN TRT engine
  + Depth preprocessor (PyTorch)           ~25 GB    ~9     PyTorch depth adds ~5 GB
  + Depth preprocessor (depth_tensorrt)    ~18 GB    ~14.5  TRT engine ~52 MB, auto-builds first use
  + Dual CN (depth_tensorrt + canny)       ~23.5 GB  4-9    At VRAM ceiling on 4090
  + IPAdapter                              +4.3 GB   ~19
  + StreamV2V (cached attention)           ~13 GB    ~21    Requires peft package
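The table supports a rough headroom check against a 24 GB card. The figures below are copied from the table; the helper and its config keys are illustrative:

```python
# Approximate nvidia-smi totals from the table above
# (RTX 4090, includes the ~5 GB Windows baseline).
VRAM_GB = {
    "base_sdxl_turbo_trt": 18.0,
    "single_cn_canny": 18.0,
    "depth_pytorch": 25.0,
    "depth_tensorrt": 18.0,
    "dual_cn": 23.5,
    "streamv2v": 13.0,
}

def fits(config, budget_gb=24.0):
    # True when the configuration stays within the VRAM budget.
    return VRAM_GB[config] <= budget_gb
```

For example, the PyTorch depth preprocessor already exceeds 24 GB on its own, while depth_tensorrt leaves comfortable headroom.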