# Rigging System Improvements
## Problem
The original rigging system mapped coordinates inconsistently between two spaces:

- Avatar image coordinates (from the rigging editor)
- MediaPipe face tracking coordinates (from the webcam)

As a result, avatar features did not align properly with the user's face movements.
## Solution Overview
### 1. **Face Reference System** (`src/shared/types.ts`)
Added `riggingReference` to `AvatarConfig`:
```typescript
riggingReference?: {
  faceCenter: { x: number; y: number }; // Center point between eyes
  faceWidth: number; // Normalized width of face at eye level
  faceHeight: number; // Normalized height from brow to chin
};
```
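Because `riggingReference` is optional, an avatar saved without it has to be handled by any code that reads the field. The following is a minimal sketch of a guard, assuming a trimmed-down `AvatarConfig` (the real interface presumably has more fields); `hasValidRiggingReference` is a hypothetical helper name, not something from the codebase.

```typescript
// Trimmed-down stand-ins for the types in src/shared/types.ts (assumption).
interface RiggingReference {
  faceCenter: { x: number; y: number };
  faceWidth: number;
  faceHeight: number;
}

interface AvatarConfig {
  riggingReference?: RiggingReference;
}

// Hypothetical helper: true only when the reference exists and is usable.
function hasValidRiggingReference(
  avatar: AvatarConfig,
): avatar is AvatarConfig & { riggingReference: RiggingReference } {
  const ref = avatar.riggingReference;
  if (!ref) return false;
  const allFinite = [ref.faceCenter.x, ref.faceCenter.y, ref.faceWidth, ref.faceHeight]
    .every((value) => Number.isFinite(value));
  // Zero or negative extents would make any later scaling meaningless.
  return allFinite && ref.faceWidth > 0 && ref.faceHeight > 0;
}
```

Callers can then branch on the guard and fall back to whatever positioning they used before when it returns `false`.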
### 2. **Rigging Editor Calculations** (`src/renderer/components/RiggingEditor.tsx`)
The editor now calculates face reference points when rigging is complete:
```typescript
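const calculateRiggingReference = () => {
  // Eye centers in normalized (0-1) image coordinates
  const leftEyeCenter = { x: leftEye.x + leftEye.w / 2, y: leftEye.y + leftEye.h / 2 };
  const rightEyeCenter = { x: rightEye.x + rightEye.w / 2, y: rightEye.y + rightEye.h / 2 };
  // Face center is the midpoint between the two eye centers
  const faceCenter = {
    x: (leftEyeCenter.x + rightEyeCenter.x) / 2,
    y: (leftEyeCenter.y + rightEyeCenter.y) / 2,
  };
  // Face width is estimated as 2.5 × the horizontal distance between eye centers
  const faceWidth = Math.abs(rightEyeCenter.x - leftEyeCenter.x) * 2.5;
  // Face height from brow to chin (browY and chinY are computed elsewhere in the editor)
  const faceHeight = chinY - browY;
  return { faceCenter, faceWidth, faceHeight };
};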
```
**Visual Guide**: A cyan dashed box shows the calculated "Face Reference Area" during rigging.
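
To make the arithmetic concrete, here is a self-contained sketch of the same calculation with sample numbers. It assumes eye boxes are `{ x, y, w, h }` rectangles in normalized coordinates and takes `browY` and `chinY` as explicit parameters, since this document does not show where the editor obtains them. The function name `computeRiggingReference` and all values are illustrative.

```typescript
// Box in normalized (0-1) image coordinates (field names are an assumption).
interface Rect { x: number; y: number; w: number; h: number; }

interface RiggingReference {
  faceCenter: { x: number; y: number };
  faceWidth: number;
  faceHeight: number;
}

// Hypothetical standalone version of the editor calculation.
function computeRiggingReference(
  leftEye: Rect,
  rightEye: Rect,
  browY: number,
  chinY: number,
): RiggingReference {
  const leftCenter = { x: leftEye.x + leftEye.w / 2, y: leftEye.y + leftEye.h / 2 };
  const rightCenter = { x: rightEye.x + rightEye.w / 2, y: rightEye.y + rightEye.h / 2 };
  return {
    // Midpoint between the two eye centers
    faceCenter: {
      x: (leftCenter.x + rightCenter.x) / 2,
      y: (leftCenter.y + rightCenter.y) / 2,
    },
    // Eye distance scaled by the 2.5 factor used in the editor
    faceWidth: Math.abs(rightCenter.x - leftCenter.x) * 2.5,
    faceHeight: chinY - browY,
  };
}

// Eye centers land at x = 0.3125 and x = 0.6875, both at y = 0.4375.
const exampleReference = computeRiggingReference(
  { x: 0.25, y: 0.375, w: 0.125, h: 0.125 },
  { x: 0.625, y: 0.375, w: 0.125, h: 0.125 },
  0.25,
  0.75,
);
// exampleReference: faceCenter (0.5, 0.4375), faceWidth 0.9375, faceHeight 0.5
```

With eyes 0.375 apart, the 2.5 multiplier yields a face width of 0.9375, which is nearly the full image width. Checking the cyan reference box against the artwork is a quick way to see whether that multiplier suits a given avatar.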
### 3. **Auto-Calibration** (`src/renderer/components/Studio.tsx`)
Once a face is first detected, the system:
1. Waits 1 second for tracking to stabilize
2. Stores the initial face position as `calibrationOffset`
3. Treats all subsequent movements as **relative** to this offset
```typescript
const relX = trackingData.translationX - calibrationOffset.x;
const relY = trackingData.translationY - calibrationOffset.y;
```
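The three steps above can be sketched as a small state holder. This is not the code in `Studio.tsx`: `createAutoCalibrator` is a hypothetical name, the caller supplies timestamps, and the sketch only waits for the delay to elapse rather than measuring how stable the tracking actually is.

```typescript
interface Offset {
  x: number;
  y: number;
}

// Hypothetical factory; delayMs mirrors the 1-second wait described above.
function createAutoCalibrator(delayMs: number = 1000) {
  let firstSeenAt: number | null = null;
  let calibrationOffset: Offset | null = null;

  return {
    // Feed one tracking sample. Returns null while calibrating,
    // then the movement relative to the stored offset.
    update(translationX: number, translationY: number, nowMs: number): Offset | null {
      const startedAt = firstSeenAt ?? nowMs; // first face detection starts the clock
      firstSeenAt = startedAt;
      let base = calibrationOffset;
      if (base === null) {
        if (nowMs - startedAt < delayMs) return null; // still calibrating
        base = { x: translationX, y: translationY };
        calibrationOffset = base;
      }
      return { x: translationX - base.x, y: translationY - base.y };
    },
    // Forget the stored offset so the next sample starts a new calibration.
    reset(): void {
      firstSeenAt = null;
      calibrationOffset = null;
    },
  };
}

const calibrator = createAutoCalibrator();
calibrator.update(0.25, -0.125, 0);                    // null: calibrating
calibrator.update(0.25, -0.125, 1000);                 // { x: 0, y: 0 }: offset stored
const moved = calibrator.update(0.5, 0.125, 1016);     // { x: 0.25, y: 0.25 }
```

The `reset` method is included only to show how a re-centering action could reuse the same state.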
### 4. **Feature Position Mapping** (`src/renderer/components/Studio.tsx`)
Features are now positioned relative to the face center:
```typescript
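const calculateFeaturePosition = (featureRect: Rect, featureType: 'eye' | 'mouth') => {
  // riggingReference is optional; returning a zero offset here is an assumed fallback
  if (!avatar.riggingReference) return { x: 0, y: 0 };
  const { faceCenter, faceWidth, faceHeight } = avatar.riggingReference;
  // Center of the feature box in rigging space
  const featureCenterX = featureRect.x + featureRect.w / 2;
  const featureCenterY = featureRect.y + featureRect.h / 2;
  // Calculate feature position relative to face center in rigging space
  const relX = featureCenterX - faceCenter.x;
  const relY = featureCenterY - faceCenter.y;
  // Scale relative positions by face width/height to match tracking scale
  const scaledX = relX * faceWidth * avatarPosition.scale;
  const scaledY = relY * faceHeight * avatarPosition.scale;
  return { x: scaledX, y: scaledY };
};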
```
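A worked example may help when checking the mapping by hand. The sketch below is a pure-function variant of the calculation above, with `scale` standing in for `avatarPosition.scale`. The `Rect` field names (`x`, `y`, `w`, `h`), the function name `featureOffset`, and all numbers are illustrative assumptions.

```typescript
interface Rect { x: number; y: number; w: number; h: number; }

interface RiggingReference {
  faceCenter: { x: number; y: number };
  faceWidth: number;
  faceHeight: number;
}

// Hypothetical pure version of the feature mapping.
function featureOffset(feature: Rect, ref: RiggingReference, scale: number) {
  // Center of the feature box in rigging space
  const centerX = feature.x + feature.w / 2;
  const centerY = feature.y + feature.h / 2;
  return {
    // Offset from the face center, scaled by the face extents and avatar scale
    x: (centerX - ref.faceCenter.x) * ref.faceWidth * scale,
    y: (centerY - ref.faceCenter.y) * ref.faceHeight * scale,
  };
}

const reference: RiggingReference = {
  faceCenter: { x: 0.5, y: 0.4375 },
  faceWidth: 0.9375,
  faceHeight: 0.5,
};

// Mouth box centered at (0.5, 0.6875): directly below the face center.
const mouthOffset = featureOffset({ x: 0.375, y: 0.625, w: 0.25, h: 0.125 }, reference, 2);
// mouthOffset: { x: 0, y: 0.25 }

// Left eye box centered at (0.3125, 0.4375): level with the face center.
const eyeOffset = featureOffset({ x: 0.25, y: 0.375, w: 0.125, h: 0.125 }, reference, 2);
// eyeOffset: { x: -0.3515625, y: 0 }
```

A feature centered exactly on `faceCenter` maps to `(0, 0)`, so it moves only with the avatar's own tracked position.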
### 5. **Exponential Smoothing** (`src/renderer/hooks/useFaceTracking.ts`)
Added smooth interpolation to prevent jittery movements:
```typescript
const smoothingFactor = 0.15; // Lower = smoother but more lag
const smooth = (current: number, target: number) => {
  return current + (target - current) * smoothingFactor;
};
// Apply to all continuous values
const smoothedData = {
  rotationX: smooth(prevDataRef.current.rotationX, newData.rotationX),
  rotationY: smooth(prevDataRef.current.rotationY, newData.rotationY),
  // ... etc
};
```
The blink detection threshold was also raised from `0.5` to `0.6` to make blink detection more reliable.
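
To estimate how much lag a given smoothing factor adds: with the update rule above, the remaining distance to a fixed target shrinks by a factor of (1 − α) each frame, so after n frames it is (1 − α)^n of the original gap. The sketch below counts frames for a step from 0 to 1. `framesToReach` is a hypothetical helper for experimentation, not part of the hook.

```typescript
// Same update rule as in the hook, with the factor passed in explicitly.
function smooth(current: number, target: number, alpha: number): number {
  return current + (target - current) * alpha;
}

// Frames needed for a step from 0 to 1 to cover `fraction` of the distance.
function framesToReach(fraction: number, alpha: number): number {
  if (!(alpha > 0 && alpha <= 1)) throw new RangeError("alpha must be in (0, 1]");
  if (!(fraction > 0 && fraction < 1)) throw new RangeError("fraction must be in (0, 1)");
  let value = 0;
  let frames = 0;
  while (value < fraction) {
    value = smooth(value, 1, alpha);
    frames += 1;
  }
  return frames;
}

const framesAtDefault = framesToReach(0.9, 0.15); // 15
const framesAtHalf = framesToReach(0.9, 0.5);     // 4
```

With α = 0.15 a sudden head movement takes 15 frames to be 90% reflected in the avatar, which would be about half a second if tracking ran at 30 frames per second (the frame rate is an assumption; this document does not state it).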
## Coordinate Flow
```
┌─────────────────────────────────────────────────────────────┐
│                        RIGGING PHASE                        │
│  ┌─────────────────┐                                        │
│  │  Avatar Image   │  User places boxes on:                 │
│  │  (Normalized)   │  - Left/Right Eye (Red/Blue)           │
│  │  0-1 coords     │  - Mouth (Green)                       │
│  └────────┬────────┘  - Main Body (Yellow)                  │
│           │                                                 │
│           ▼                                                 │
│  Calculate riggingReference:                                │
│  - faceCenter (between eyes)                                │
│  - faceWidth (eye distance × 2.5)                           │
│  - faceHeight (brow to chin)                                │
└───────────┬─────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                        STUDIO PHASE                         │
│  ┌─────────────────┐                                        │
│  │  Webcam Feed    │  MediaPipe detects:                    │
│  │  (Real-time)    │  - translationX/Y (-1 to 1)            │
│  │                 │  - rotationX/Y/Z                       │
│  └────────┬────────┘  - mouthOpen, blink                    │
│           │                                                 │
│           ▼                                                 │
│  1. Auto-calibrate (store initial offset)                   │
│  2. Calculate relative movement                             │
│  3. Apply smoothing (EMA with α=0.15)                       │
│  4. Map rigging coords to tracking scale                    │
│           │                                                 │
│           ▼                                                 │
│  ┌─────────────────┐                                        │
│  │  Render Avatar  │  - Position from tracking              │
│  │  (Composited)   │  - Features from riggingReference      │
│  └─────────────────┘                                        │
└─────────────────────────────────────────────────────────────┘
```
## Key Benefits
| Before | After |
|--------|-------|
| ❌ Fixed positions | ✅ Dynamic face-relative positioning |
| ❌ No calibration | ✅ Auto-calibration on startup |
| ❌ Jittery movement | ✅ Smooth exponential interpolation |
| ❌ No visual feedback | ✅ Face reference guide during rigging |
| ❌ Unreliable blinks | ✅ Improved blink threshold (0.6) |
| ❌ Scale mismatches | ✅ Proper scale mapping via faceWidth/Height |
## Testing Tips
1. **Rigging Phase**:
   - Ensure the cyan "Face Reference Area" encompasses the entire face
   - Eye boxes should be centered on the pupils
   - Mouth box should cover the lip area
2. **Studio Phase**:
   - Wait for the "Calibrating..." indicator to disappear
   - Start with your face centered in the camera
   - Move your head slowly to test the tracking range
## Future Improvements
- [ ] Manual calibration button for re-centering
- [ ] Adjustable smoothing factor (UI slider)
- [ ] Face outline overlay for alignment verification
- [ ] Multiple face support
- [ ] Save/load rigging presets