Rigging System Improvements

Problem

The original rigging system mapped coordinates inconsistently between two spaces:

  • Avatar image coordinates (from rigging editor)
  • MediaPipe face tracking coordinates (from webcam)

As a result, avatar features did not stay aligned with the user's face as it moved.

Solution Overview

1. Face Reference System (src/shared/types.ts)

Added riggingReference to AvatarConfig:

riggingReference?: {
  faceCenter: { x: number; y: number }; // Center point between eyes
  faceWidth: number;  // Normalized width of face at eye level
  faceHeight: number; // Normalized height from brow to chin
};

2. Rigging Editor Calculations (src/renderer/components/RiggingEditor.tsx)

The editor now calculates face reference points when rigging is complete:

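const calculateRiggingReference = () => {
  // Eye centers in normalized (0-1) image coordinates
  const leftEyeCenterX = leftEye.x + leftEye.w / 2;
  const rightEyeCenterX = rightEye.x + rightEye.w / 2;
  const leftEyeCenterY = leftEye.y + leftEye.h / 2;
  const rightEyeCenterY = rightEye.y + rightEye.h / 2;

  // Face center is the midpoint between the eye centers
  const faceCenter = {
    x: (leftEyeCenterX + rightEyeCenterX) / 2,
    y: (leftEyeCenterY + rightEyeCenterY) / 2,
  };

  // Face width: distance between eye centers scaled by 2.5 to approximate full face width
  const faceWidth = Math.abs(rightEyeCenterX - leftEyeCenterX) * 2.5;

  // Face height from brow to chin (browY and chinY are not defined in this excerpt)
  const faceHeight = chinY - browY;

  return { faceCenter, faceWidth, faceHeight };
};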

Visual Guide: A cyan dashed box shows the calculated "Face Reference Area" during rigging.
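For checking the arithmetic in isolation, here is a self-contained sketch of the same calculation as a pure function. It assumes rigging boxes are `{ x, y, w, h }` rectangles in normalized 0-1 image coordinates with `(x, y)` at the top-left corner, and it takes the brow and chin y-positions as explicit parameters because the excerpt above does not show where they come from. The `Rect` type, the `computeRiggingReference` name, and the `browY`/`chinY` parameters are illustrative assumptions, not the editor's actual API.

```typescript
// Hypothetical shapes: normalized (0-1) image coordinates, (x, y) = top-left corner.
interface Rect {
  x: number;
  y: number;
  w: number;
  h: number;
}

interface RiggingReference {
  faceCenter: { x: number; y: number };
  faceWidth: number;
  faceHeight: number;
}

// Multiplier from inter-eye distance to approximate face width (the 2.5 heuristic above).
const EYE_DISTANCE_TO_FACE_WIDTH = 2.5;

// Sketch: derive the face reference from the two eye boxes plus brow/chin y-positions.
function computeRiggingReference(
  leftEye: Rect,
  rightEye: Rect,
  browY: number,
  chinY: number,
): RiggingReference {
  const leftCenter = { x: leftEye.x + leftEye.w / 2, y: leftEye.y + leftEye.h / 2 };
  const rightCenter = { x: rightEye.x + rightEye.w / 2, y: rightEye.y + rightEye.h / 2 };

  return {
    // Midpoint between the two eye centers
    faceCenter: {
      x: (leftCenter.x + rightCenter.x) / 2,
      y: (leftCenter.y + rightCenter.y) / 2,
    },
    // Horizontal eye-center distance scaled up to approximate the full face width
    faceWidth: Math.abs(rightCenter.x - leftCenter.x) * EYE_DISTANCE_TO_FACE_WIDTH,
    // Vertical extent from brow to chin (image y grows downward)
    faceHeight: chinY - browY,
  };
}

const ref = computeRiggingReference(
  { x: 0.35, y: 0.4, w: 0.1, h: 0.05 },
  { x: 0.55, y: 0.4, w: 0.1, h: 0.05 },
  0.3,
  0.8,
);
console.log(ref);
```

With eye centers at x = 0.4 and x = 0.6, the face center lands at about (0.5, 0.425) and the face width at about 0.2 × 2.5 = 0.5 of the image width.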

3. Auto-Calibration (src/renderer/components/Studio.tsx)

On first face detection, the system:

  1. Waits 1 second for stable tracking
  2. Stores initial face position as calibrationOffset
  3. All subsequent movements are relative to this offset
const relX = trackingData.translationX - calibrationOffset.x;
const relY = trackingData.translationY - calibrationOffset.y;
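The three calibration steps can be sketched as a small state machine. This version assumes the caller passes a timestamp in milliseconds with each tracking sample (or `null` when no face is detected) and interprets "stable tracking" simply as one second of uninterrupted detection. The `AutoCalibrator` class, its `update` method, and the restart-on-lost-face behavior are hypothetical, not the Studio component's actual code.

```typescript
// Hypothetical subset of the per-frame tracking data used for calibration.
interface TranslationSample {
  translationX: number;
  translationY: number;
}

type Offset = { x: number; y: number };

class AutoCalibrator {
  private readonly settleMs: number;
  private firstSeenMs: number | null = null;
  private offset: Offset | null = null;

  constructor(settleMs = 1000) {
    this.settleMs = settleMs;
  }

  get isCalibrated(): boolean {
    return this.offset !== null;
  }

  // Pass null when no face is detected. Returns movement relative to the
  // calibration offset, or null while calibrating or when no face is present.
  update(sample: TranslationSample | null, nowMs: number): Offset | null {
    let offset = this.offset;
    if (offset === null) {
      if (sample === null) {
        // Face lost before calibration finished: restart the settle timer
        this.firstSeenMs = null;
        return null;
      }
      const start = this.firstSeenMs ?? nowMs;
      this.firstSeenMs = start;
      if (nowMs - start < this.settleMs) return null; // still waiting for stable tracking
      // Tracking has been continuous for settleMs: store the neutral position
      offset = { x: sample.translationX, y: sample.translationY };
      this.offset = offset;
    }
    if (sample === null) return null;
    return { x: sample.translationX - offset.x, y: sample.translationY - offset.y };
  }
}

const calibrator = new AutoCalibrator();
const during = calibrator.update({ translationX: 0.2, translationY: -0.1 }, 0);
const settled = calibrator.update({ translationX: 0.2, translationY: -0.1 }, 1000);
const moved = calibrator.update({ translationX: 0.5, translationY: 0.1 }, 1033);
console.log(during, settled, moved);
```

The first sample returns `null` (still calibrating), the sample at 1000 ms stores the offset and reports zero movement, and later samples report movement relative to that neutral pose.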

4. Feature Position Mapping (src/renderer/components/Studio.tsx)

Features are now positioned relative to the face center:

const calculateFeaturePosition = (featureRect: Rect, featureType: 'eye' | 'mouth') => {
  const { faceCenter, faceWidth, faceHeight } = avatar.riggingReference;

  // Feature center in normalized rigging (image) coordinates
  const featureCenterX = featureRect.x + featureRect.w / 2;
  const featureCenterY = featureRect.y + featureRect.h / 2;

  // Feature position relative to the face center in rigging space
  const relX = featureCenterX - faceCenter.x;
  const relY = featureCenterY - faceCenter.y;

  // Scale relative positions by face width/height to match tracking scale
  const scaledX = relX * faceWidth * avatarPosition.scale;
  const scaledY = relY * faceHeight * avatarPosition.scale;

  return { x: scaledX, y: scaledY };
};
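A standalone version of the mapping, with the rigging reference and avatar scale passed in explicitly instead of read from component state, makes the arithmetic easy to verify. The `Rect` shape, the `mapFeatureToFaceSpace` name, and its parameter list are assumptions for illustration; the scaling itself mirrors the formula above.

```typescript
// Hypothetical shapes: normalized (0-1) image coordinates, (x, y) = top-left corner.
interface Rect {
  x: number;
  y: number;
  w: number;
  h: number;
}

interface RiggingReference {
  faceCenter: { x: number; y: number };
  faceWidth: number;
  faceHeight: number;
}

// Same arithmetic as calculateFeaturePosition, with all inputs passed explicitly.
function mapFeatureToFaceSpace(
  featureRect: Rect,
  ref: RiggingReference,
  scale: number,
): { x: number; y: number } {
  // Feature center in rigging space
  const centerX = featureRect.x + featureRect.w / 2;
  const centerY = featureRect.y + featureRect.h / 2;

  // Offset from the face center, scaled by face size and avatar scale
  return {
    x: (centerX - ref.faceCenter.x) * ref.faceWidth * scale,
    y: (centerY - ref.faceCenter.y) * ref.faceHeight * scale,
  };
}

const reference: RiggingReference = {
  faceCenter: { x: 0.5, y: 0.375 },
  faceWidth: 0.5,
  faceHeight: 0.5,
};

// A mouth box centered horizontally, below the eyes
const mouth = mapFeatureToFaceSpace({ x: 0.4375, y: 0.5, w: 0.125, h: 0.25 }, reference, 2);
// A left-eye box to the left of the face center, at eye level
const leftEyePos = mapFeatureToFaceSpace({ x: 0.25, y: 0.3125, w: 0.125, h: 0.125 }, reference, 2);
console.log(mouth, leftEyePos);
```

Features left of or above the face center come out negative, so eyes and mouth keep their layout relative to the tracked head position.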

5. Exponential Smoothing (src/renderer/hooks/useFaceTracking.ts)

Added smooth interpolation to prevent jittery movements:

const smoothingFactor = 0.15; // Lower = smoother but more lag

const smooth = (current: number, target: number) => {
  return current + (target - current) * smoothingFactor;
};

// Apply to all continuous values
const smoothedData = {
  rotationX: smooth(prevDataRef.current.rotationX, newData.rotationX),
  rotationY: smooth(prevDataRef.current.rotationY, newData.rotationY),
  // ... etc
};

The blink detection threshold was also raised from 0.5 to 0.6 to make blink detection more reliable.
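A generic form of the smoothing step, applying the same exponential moving average to every numeric field of a tracking record, might look like the sketch below. The `smoothRecord` helper, the `TrackingSample` shape, and the choice to pass non-numeric fields (such as a boolean blink flag) through unsmoothed are assumptions, not the hook's actual code. With α = 0.15, each frame closes 15% of the remaining gap, so after n frames a fraction 0.85^n of a sudden change is still outstanding (0.85^4 ≈ 0.52, 0.85^5 ≈ 0.44, so about half the change is absorbed within five frames).

```typescript
// Exponential moving average: move a fraction `alpha` of the way toward the target.
function ema(current: number, target: number, alpha: number): number {
  return current + (target - current) * alpha;
}

// Smooth every numeric field present in both records; other fields
// (e.g. a boolean blink flag) are copied from the newest sample unchanged.
function smoothRecord<T extends Record<string, unknown>>(prev: T, next: T, alpha: number): T {
  const out: Record<string, unknown> = { ...next };
  for (const key of Object.keys(next)) {
    const p = prev[key];
    const n = next[key];
    if (typeof p === "number" && typeof n === "number") {
      out[key] = ema(p, n, alpha);
    }
  }
  return out as T;
}

// Hypothetical tracking record shape for illustration
type TrackingSample = { rotationX: number; rotationY: number; blink: boolean };

const SMOOTHING_FACTOR = 0.15;
let state: TrackingSample = { rotationX: 0, rotationY: 0, blink: false };
const target: TrackingSample = { rotationX: 1, rotationY: -1, blink: true };

// Two frames of the same target: rotationX moves 0 -> 0.15 -> ~0.2775
state = smoothRecord(state, target, SMOOTHING_FACTOR);
state = smoothRecord(state, target, SMOOTHING_FACTOR);
console.log(state);
```

Passing discrete values such as blink straight through avoids smearing an on/off signal into fractional values that a threshold would then have to reinterpret.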

Coordinate Flow

┌─────────────────────────────────────────────────────────────┐
│  RIGGING PHASE                                              │
│  ┌─────────────────┐                                        │
│  │  Avatar Image   │  User places boxes on:                 │
│  │  (Normalized)   │  - Left/Right Eye (Red/Blue)           │
│  │  0-1 coords     │  - Mouth (Green)                       │
│  └────────┬────────┘  - Main Body (Yellow)                  │
│           │                                                 │
│           ▼                                                 │
│  Calculate riggingReference:                                │
│  - faceCenter (between eyes)                                │
│  - faceWidth (eye distance × 2.5)                           │
│  - faceHeight (brow to chin)                                │
└───────────┬─────────────────────────────────────────────────┘
            │
            ▼
┌─────────────────────────────────────────────────────────────┐
│  STUDIO PHASE                                               │
│  ┌─────────────────┐                                        │
│  │  Webcam Feed    │  MediaPipe detects:                    │
│  │  (Real-time)    │  - translationX/Y (-1 to 1)            │
│  │                 │  - rotationX/Y/Z                       │
│  └────────┬────────┘  - mouthOpen, blink                    │
│           │                                                 │
│           ▼                                                 │
│  1. Auto-calibrate (store initial offset)                   │
│  2. Calculate relative movement                             │
│  3. Apply smoothing (EMA with α=0.15)                       │
│  4. Map rigging coords to tracking scale                    │
│           │                                                 │
│           ▼                                                 │
│  ┌─────────────────┐                                        │
│  │  Render Avatar  │  - Position from tracking              │
│  │  (Composited)   │  - Features from riggingReference      │
│  └─────────────────┘                                        │
└─────────────────────────────────────────────────────────────┘

Key Benefits

Before             | After
------------------ | ---------------------------------------------
Fixed positions    | Dynamic face-relative positioning
No calibration     | Auto-calibration on startup
Jittery movement   | Smooth exponential interpolation
No visual feedback | Face reference guide during rigging
Unreliable blinks  | Blink threshold raised from 0.5 to 0.6
Scale mismatches   | Proper scale mapping via faceWidth/faceHeight

Testing Tips

  1. Rigging Phase:

    • Ensure the cyan "Face Reference Area" encompasses the entire face
    • Eye boxes should be centered on pupils
    • Mouth box should cover the lip area
  2. Studio Phase:

    • Wait for "Calibrating..." indicator to disappear
    • Start with face centered in camera
    • Move head slowly to test tracking range

Future Improvements

  • Manual calibration button for re-centering
  • Adjustable smoothing factor (UI slider)
  • Face outline overlay for alignment verification
  • Multiple face support
  • Save/load rigging presets