Expression System Documentation

Overview

The new expression system provides full control over facial expressions by generating separate sprite assets for each expression type. This allows for smooth, dynamic expression switching based on face tracking data.

Key Changes

1. Blank Base Character

The AI now generates a character with a blank face (no eyes, no mouth, no eyebrows). This allows us to overlay expression assets without visual conflicts.

2. Expression Grid Layout

The generated sprite sheet uses a 3-row grid format:

┌─────────────────────────────────────────────────────────┐
│ ROW 1: BASE CHARACTER (full width)                      │
│ ┌─────────────────────────────────────────────────────┐ │
│ │  Blank face - no features (hair, head, body only)   │ │
│ └─────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────┤
│ ROW 2: EYE EXPRESSIONS (6 variants, equal spacing)      │
│ ┌─────┬─────┬─────┬─────┬─────┬─────┐                  │
│ │NTRL │HPPY │SRPR │ANGRY│ SAD │BLINK│                  │
│ └─────┴─────┴─────┴─────┴─────┴─────┘                  │
├─────────────────────────────────────────────────────────┤
│ ROW 3: MOUTH EXPRESSIONS (6 variants, equal spacing)    │
│ ┌─────┬─────┬──────┬──────┬──────┬──────┐              │
│ │NTRL │HAPPY│TALK  │WIDE  │FROWN │O-SHP │              │
│ └─────┴─────┴──────┴──────┴──────┴──────┘              │
└─────────────────────────────────────────────────────────┘
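As a starting point for rigging, the grid above can be turned into default rectangles. This is a minimal sketch, not the project's code: the `Rect` shape (`x`, `y`, `width`, `height`), the helper names `sliceRow` and `defaultGridRects`, the equal row heights, and the assumption that the six cells span the full sheet width are all illustrative. The layout only promises six equally spaced cells per row, so the boxes would still need adjusting by hand.

```typescript
// Assumed rectangle shape in sprite-sheet pixels; the project's Rect type is not shown here.
interface Rect { x: number; y: number; width: number; height: number; }

// Left-to-right cell order, taken from the grid and the expression tables.
const EYE_ORDER = ['NEUTRAL', 'HAPPY', 'SURPRISED', 'ANGRY', 'SAD', 'BLINK'] as const;
const MOUTH_ORDER = ['NEUTRAL', 'HAPPY', 'OPEN_TALK', 'WIDE_OPEN', 'FROWN', 'O_SHAPE'] as const;

// Split one row into equally wide cells, keyed by expression name.
function sliceRow<K extends string>(
  keys: readonly K[],
  y: number,
  rowHeight: number,
  sheetWidth: number,
): Record<K, Rect> {
  const cellWidth = sheetWidth / keys.length;
  const row = {} as Record<K, Rect>;
  keys.forEach((key, index) => {
    row[key] = { x: index * cellWidth, y, width: cellWidth, height: rowHeight };
  });
  return row;
}

// Assumption: the three rows have equal height and the cells fill the whole width.
function defaultGridRects(sheetWidth: number, sheetHeight: number) {
  const rowHeight = sheetHeight / 3;
  return {
    baseFace: { x: 0, y: 0, width: sheetWidth, height: rowHeight } as Rect,
    eyes: sliceRow(EYE_ORDER, rowHeight, rowHeight, sheetWidth),
    mouth: sliceRow(MOUTH_ORDER, 2 * rowHeight, rowHeight, sheetWidth),
  };
}
```

For a 600×300 sheet this gives 100×100 cells, with the BLINK eye cell starting at x = 500 in the second row.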

3. Expression Types

Eye Expressions (6 types)

| Type | Description | Trigger |
|------|-------------|---------|
| NEUTRAL | Normal open eyes, relaxed | Default state |
| HAPPY | Eyes curved upward, slightly closed | Smile detection (future) |
| SURPRISED | Wide open, circular eyes | Mouth open > 70% |
| ANGRY | Eyebrows angled down, narrowed | Emotion detection (future) |
| SAD | Eyebrows up, eyes droopy downward | Emotion detection (future) |
| BLINK | Both eyes fully closed (curves) | Blink detection (active) |

Mouth Expressions (6 types)

| Type | Description | Trigger |
|------|-------------|---------|
| NEUTRAL | Small closed mouth, straight line | Mouth open < 10% |
| HAPPY | Closed mouth curved upward | Smile detection (future) |
| OPEN_TALK | Medium open mouth for vowels | Mouth open ≥ 10% and < 30% |
| WIDE_OPEN | Large open mouth for shouting | Mouth open ≥ 30% |
| FROWN | Mouth curved downward | Emotion detection (future) |
| O_SHAPE | Small circular open mouth | Phoneme detection (future) |

File Changes

src/shared/types.ts

export enum ExpressionType {
  NEUTRAL = 'NEUTRAL',
  HAPPY = 'HAPPY',
  SURPRISED = 'SURPRISED',
  ANGRY = 'ANGRY',
  SAD = 'SAD',
  BLINK = 'BLINK',
  OPEN_TALK = 'OPEN_TALK',
  WIDE_OPEN = 'WIDE_OPEN',
  FROWN = 'FROWN',
  O_SHAPE = 'O_SHAPE',
}

export interface AvatarConfig {
  imageUrl: string;
  baseFace?: Rect;           // Blank face area
  eyes?: {                    // Eye expression rects
    [ExpressionType.NEUTRAL]?: Rect;
    [ExpressionType.HAPPY]?: Rect;
    // ... etc
  };
  mouth?: {                   // Mouth expression rects
    [ExpressionType.NEUTRAL]?: Rect;
    [ExpressionType.HAPPY]?: Rect;
    // ... etc
  };
  riggingReference?: { ... };
  activeEyeExpression?: ExpressionType;
  activeMouthExpression?: ExpressionType;
}
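Because every rect in `eyes` and `mouth` is optional, a renderer needs some rule for an expression that has not been rigged yet. The sketch below shows one possible rule, falling back to NEUTRAL. The helper name `resolveExpressionRect`, the `Rect` shape, and the fallback behaviour itself are assumptions for illustration and are not confirmed by the source.

```typescript
// Trimmed copy of the enum so the snippet compiles on its own.
enum ExpressionType {
  NEUTRAL = 'NEUTRAL',
  HAPPY = 'HAPPY',
  BLINK = 'BLINK',
}

// Assumed rectangle shape; the project's Rect type is not shown here.
interface Rect { x: number; y: number; width: number; height: number; }

// Same idea as the `eyes` / `mouth` fields: every expression rect is optional.
type ExpressionRects = Partial<Record<ExpressionType, Rect>>;

// Return the rect for the active expression. If that expression is unrigged
// (or no expression is active), fall back to the NEUTRAL rect when one exists.
function resolveExpressionRect(
  rects: ExpressionRects | undefined,
  active: ExpressionType | undefined,
): Rect | undefined {
  if (!rects) return undefined;
  const wanted = active ?? ExpressionType.NEUTRAL;
  return rects[wanted] ?? rects[ExpressionType.NEUTRAL];
}
```

With only NEUTRAL rigged, asking for BLINK returns the NEUTRAL rect, so the avatar keeps a face instead of showing the blank base.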

src/renderer/services/geminiService.ts

Updated prompt to generate:

  • Blank base character (no facial features)
  • 6 eye expressions in row 2
  • 6 mouth expressions in row 3
  • Consistent sizing for easy extraction

src/renderer/components/RiggingEditor.tsx

Complete redesign:

  • Tab system: Switch between Eyes and Mouth rigging
  • Expression selector: Preview individual expressions
  • Color-coded boxes: Each expression has unique color
  • Base Face box: Define the blank character area
  • 12 expression boxes total: 6 eyes + 6 mouths

src/renderer/components/Studio.tsx

Dynamic expression rendering:

  • getCurrentEyeExpression(): Maps tracking data to eye expression
  • getCurrentMouthExpression(): Maps mouth openness to mouth expression
  • Automatic expression switching based on:
    • Blink detection → BLINK
    • Mouth openness → NEUTRAL / OPEN_TALK / WIDE_OPEN
    • Mouth openness above 0.7 → SURPRISED eyes (no separate surprise detector yet)

Expression Switching Logic

Current Implementation

// Eye expression selection
const getCurrentEyeExpression = (): ExpressionType => {
  if (trackingData.isBlinkingLeft || trackingData.isBlinkingRight) {
    return ExpressionType.BLINK;
  }
  
  if (trackingData.mouthOpen > 0.7) {
    return ExpressionType.SURPRISED;
  }
  
  return ExpressionType.NEUTRAL; // Default
};

// Mouth expression selection
const getCurrentMouthExpression = (): ExpressionType => {
  const mouthOpen = trackingData.mouthOpen;
  
  if (mouthOpen < 0.1) return ExpressionType.NEUTRAL;
  if (mouthOpen < 0.3) return ExpressionType.OPEN_TALK;
  return ExpressionType.WIDE_OPEN;
};

Expression Flow

┌──────────────────┐
│ Face Tracking    │
│ Data Input       │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ mouthOpen: 0.05  │──────┐
│ isBlinking: true │      │
└────────┬─────────┘      │
         │                │
         ▼                ▼
┌──────────────────┐  ┌──────────────┐
│ Eye Expression   │  │ Mouth        │
│ Selector         │  │ Expression   │
│                  │  │ Selector     │
│ BLINK (priority) │  │ NEUTRAL      │
└────────┬─────────┘  └──────┬───────┘
         │                   │
         └────────┬──────────┘
                  │
                  ▼
         ┌────────────────┐
         │ Render Avatar  │
         │ with selected  │
         │ expressions    │
         └────────────────┘
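The flow above can also be written as one pure function that takes a tracking sample and returns both selections, which makes the thresholds easy to unit test outside the component. The field names and thresholds come from the selectors shown earlier; the `TrackingSample` interface and the `selectExpressions` name are assumptions for this sketch.

```typescript
// Trimmed copy of the enum, limited to the members the current triggers use.
enum ExpressionType {
  NEUTRAL = 'NEUTRAL',
  SURPRISED = 'SURPRISED',
  BLINK = 'BLINK',
  OPEN_TALK = 'OPEN_TALK',
  WIDE_OPEN = 'WIDE_OPEN',
}

// Assumed shape of one tracking sample, using the fields read by the selectors.
interface TrackingSample {
  mouthOpen: number;        // 0 (closed) to 1 (fully open)
  isBlinkingLeft: boolean;
  isBlinkingRight: boolean;
}

function selectExpressions(sample: TrackingSample): { eye: ExpressionType; mouth: ExpressionType } {
  // Eyes: a blink on either side wins, then a very open mouth reads as surprise.
  let eye = ExpressionType.NEUTRAL;
  if (sample.isBlinkingLeft || sample.isBlinkingRight) {
    eye = ExpressionType.BLINK;
  } else if (sample.mouthOpen > 0.7) {
    eye = ExpressionType.SURPRISED;
  }

  // Mouth: three bands split at 0.1 and 0.3.
  let mouth = ExpressionType.WIDE_OPEN;
  if (sample.mouthOpen < 0.1) {
    mouth = ExpressionType.NEUTRAL;
  } else if (sample.mouthOpen < 0.3) {
    mouth = ExpressionType.OPEN_TALK;
  }

  return { eye, mouth };
}
```

The sample in the diagram (`mouthOpen: 0.05`, blinking) yields BLINK eyes with a NEUTRAL mouth. A sample with `mouthOpen: 0.8` and no blink yields SURPRISED eyes with a WIDE_OPEN mouth.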

Rigging Workflow

Step 1: Generate Avatar

User enters prompt → AI generates sprite sheet with:
- Row 1: Blank character
- Row 2: 6 eye expressions
- Row 3: 6 mouth expressions

Step 2: Rig Expressions

1. Adjust Base Face box (yellow) around blank character
2. Switch to "Eyes" tab
3. For each eye expression:
   - Click expression name to highlight
   - Drag/resize box to match asset
4. Switch to "Mouth" tab
5. For each mouth expression:
   - Click expression name to highlight
   - Drag/resize box to match asset
6. Click "Finish Rigging"
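Before finishing, it can help to check that none of the 13 boxes (Base Face plus 12 expressions) was skipped. The following is a hedged sketch of such a check. `findUnriggedBoxes`, `RiggedBoxes`, and the `Rect` shape are hypothetical names, and the source does not say whether RiggingEditor performs any validation of this kind.

```typescript
// Assumed rectangle shape; the project's Rect type is not shown here.
interface Rect { x: number; y: number; width: number; height: number; }

const EYE_TYPES = ['NEUTRAL', 'HAPPY', 'SURPRISED', 'ANGRY', 'SAD', 'BLINK'] as const;
const MOUTH_TYPES = ['NEUTRAL', 'HAPPY', 'OPEN_TALK', 'WIDE_OPEN', 'FROWN', 'O_SHAPE'] as const;

// Simplified stand-in for the rigging-related fields of AvatarConfig.
interface RiggedBoxes {
  baseFace?: Rect;
  eyes?: { [expression: string]: Rect | undefined };
  mouth?: { [expression: string]: Rect | undefined };
}

// List every box that still has no rect, e.g. ['baseFace', 'eyes.SAD'].
function findUnriggedBoxes(config: RiggedBoxes): string[] {
  const missing: string[] = [];
  if (!config.baseFace) missing.push('baseFace');
  EYE_TYPES.forEach((type) => {
    if (!config.eyes?.[type]) missing.push(`eyes.${type}`);
  });
  MOUTH_TYPES.forEach((type) => {
    if (!config.mouth?.[type]) missing.push(`mouth.${type}`);
  });
  return missing;
}
```

An empty result means all boxes are placed. Otherwise the returned names could be shown next to the "Finish Rigging" button.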

Step 3: Live Animation

System automatically switches expressions based on:
- Your blinks → Eye BLINK
- Your mouth opening → Mouth OPEN_TALK / WIDE_OPEN
- Mouth open past 70% → Eye SURPRISED

Future Enhancements

Planned Features

  1. Manual Expression Override

    • Hotkeys to force specific expressions
    • Emotion wheel UI for manual selection
  2. Advanced Triggers

    // Future: Audio-based phoneme detection
    if (phoneme === 'AH') return ExpressionType.OPEN_TALK;
    if (phoneme === 'OO') return ExpressionType.O_SHAPE;
    
    // Future: Eyebrow tracking
    if (eyebrowsRaised) return ExpressionType.SURPRISED;
    if (eyebrowsFurrowed) return ExpressionType.ANGRY;
    
  3. Expression Blending

    • Smooth transitions between expressions
    • Intensity-based blending (e.g., 50% happy + 50% neutral)
  4. Preset Management

    • Save expression configurations
    • Share rigging presets between avatars
  5. More Expressions

    • Additional eye variants (wink, heart eyes, etc.)
    • Mouth shapes for specific phonemes
    • Eyebrow-only expressions layer

Testing Tips

During Rigging

  1. Zoom in on sprite sheet for precise box placement
  2. Use consistent sizes for similar expression types
  3. Test all expressions by clicking through them before finishing
  4. Check the cyan face reference guide - it should encompass the face area

During Studio Use

  1. Wait for calibration (1 second after camera starts)
  2. Good lighting improves expression detection
  3. Center your face in camera for best results
  4. Exaggerate expressions initially to test range

Troubleshooting

| Issue | Solution |
|-------|----------|
| Expressions don't align | Re-rig with more precise box placement |
| Blinking not detected | Increase camera lighting, face camera directly |
| Mouth stuck open | Check mouthOpen threshold in Studio.tsx |
| Wrong expression showing | Verify riggingReference calculation in RiggingEditor |
| Expressions too small/large | Ensure all expression assets are same size in sprite sheet |
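For the "too small/large" case, the rigged boxes in one row can be compared against the row's median size to find the odd one out. This is an illustrative sketch: `findSizeOutliers`, the `Rect` shape, and the 10% default tolerance are assumptions and are not part of the project.

```typescript
// Assumed rectangle shape; the project's Rect type is not shown here.
interface Rect { x: number; y: number; width: number; height: number; }

// Median of a non-empty list of numbers.
function median(values: number[]): number {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Return the names of rects whose width or height differs from the row's
// median by more than `tolerance` (a fraction, 0.1 = 10%).
function findSizeOutliers(rects: { [expression: string]: Rect }, tolerance: number = 0.1): string[] {
  const names = Object.keys(rects);
  if (names.length === 0) return [];
  const medianWidth = median(names.map((key) => rects[key].width));
  const medianHeight = median(names.map((key) => rects[key].height));
  return names.filter((key) =>
    Math.abs(rects[key].width - medianWidth) > tolerance * medianWidth ||
    Math.abs(rects[key].height - medianHeight) > tolerance * medianHeight,
  );
}
```

Running it once on the eye rects and once on the mouth rects keeps the comparison within a row, since eyes and mouths do not need to share a size.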

Code Architecture

src/
├── shared/types.ts              # ExpressionType enum, AvatarConfig interface
├── renderer/
│   ├── services/
│   │   └── geminiService.ts     # AI prompt for expression generation
│   ├── components/
│   │   ├── AvatarCreator.tsx    # Generate/upload avatar
│   │   ├── RiggingEditor.tsx    # Rig all expressions
│   │   └── Studio.tsx           # Dynamic expression switching
│   └── hooks/
│       └── useFaceTracking.ts   # Provides trackingData for triggers

Summary

The new expression system provides:

  • Full control over all facial features
  • Dynamic switching based on face tracking
  • Modular design - easy to add new expressions
  • Clean separation - blank base + overlay expressions
  • Future-proof - ready for audio/emotion integration

This replaces the previous 2-expression system (just blink/talk) with 6 eye and 6 mouth variants that switch automatically from face tracking data.