# Expression System Documentation
## Overview
The new expression system provides **full control over facial expressions** by generating a separate sprite asset for each expression type. Eye and mouth expressions are then selected independently and switched dynamically based on face tracking data.
## Key Changes
### 1. **Blank Base Character**
The AI now generates a character with a **blank face** (no eyes, no mouth, no eyebrows). This allows us to overlay expression assets without visual conflicts.
### 2. **Expression Grid Layout**
The generated sprite sheet uses a **3-row grid format**:
```
┌─────────────────────────────────────────────────────────┐
│ ROW 1: BASE CHARACTER (full width)                      │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Blank face - no features (hair, head, body only)    │ │
│ └─────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────┤
│ ROW 2: EYE EXPRESSIONS (6 variants, equal spacing)      │
│ ┌─────┬─────┬─────┬─────┬─────┬─────┐                   │
│ │NTRL │HPPY │SRPR │ANGRY│ SAD │BLINK│                   │
│ └─────┴─────┴─────┴─────┴─────┴─────┘                   │
├─────────────────────────────────────────────────────────┤
│ ROW 3: MOUTH EXPRESSIONS (6 variants, equal spacing)    │
│ ┌─────┬─────┬─────┬─────┬─────┬─────┐                   │
│ │NTRL │SMILE│TALK │WIDE │FROWN│O-SHP│                   │
│ └─────┴─────┴─────┴─────┴─────┴─────┘                   │
└─────────────────────────────────────────────────────────┘
```
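As a starting point for rigging, default boxes can be derived from the grid geometry. The sketch below assumes three rows of equal height and six equal-width cells spanning the full sheet width. Neither is guaranteed for an AI-generated sheet, so the result is only an initial guess to adjust by hand. The `Rect` shape and the `defaultGridRects` and `splitRow` helpers are hypothetical and not part of the codebase.

```typescript
// Assumed Rect shape; the real definition lives in src/shared/types.ts.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Column order taken from the grid diagram (the SMILE cell maps to HAPPY).
const EYE_ORDER = ['NEUTRAL', 'HAPPY', 'SURPRISED', 'ANGRY', 'SAD', 'BLINK'] as const;
const MOUTH_ORDER = ['NEUTRAL', 'HAPPY', 'OPEN_TALK', 'WIDE_OPEN', 'FROWN', 'O_SHAPE'] as const;

// Split one row into equal-width cells, keyed by expression name.
function splitRow(
  names: readonly string[],
  sheetWidth: number,
  y: number,
  rowHeight: number,
): Record<string, Rect> {
  const cellWidth = sheetWidth / names.length;
  const cells: Record<string, Rect> = {};
  names.forEach((name, i) => {
    cells[name] = { x: i * cellWidth, y, width: cellWidth, height: rowHeight };
  });
  return cells;
}

// Assumption: the sheet is divided into three rows of equal height.
function defaultGridRects(sheetWidth: number, sheetHeight: number) {
  const rowHeight = sheetHeight / 3;
  const baseFace: Rect = { x: 0, y: 0, width: sheetWidth, height: rowHeight };
  return {
    baseFace,
    eyes: splitRow(EYE_ORDER, sheetWidth, rowHeight, rowHeight),
    mouth: splitRow(MOUTH_ORDER, sheetWidth, 2 * rowHeight, rowHeight),
  };
}
```

For a 1200×900 sheet, `defaultGridRects(1200, 900).eyes['BLINK']` would be the sixth cell of the second row: `{ x: 1000, y: 300, width: 200, height: 300 }`.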
### 3. **Expression Types**
#### Eye Expressions (6 types)
| Type | Description | Trigger |
|------|-------------|---------|
| `NEUTRAL` | Normal open eyes, relaxed | Default state |
| `HAPPY` | Eyes curved upward, slightly closed | Smile detection (future) |
| `SURPRISED` | Wide open, circular eyes | Mouth open > 70% |
| `ANGRY` | Eyebrows angled down, narrowed | Emotion detection (future) |
| `SAD` | Eyebrows up, eyes droopy downward | Emotion detection (future) |
| `BLINK` | Both eyes fully closed (curves) | Blink detected on either eye (active) |
#### Mouth Expressions (6 types)
| Type | Description | Trigger |
|------|-------------|---------|
| `NEUTRAL` | Small closed mouth, straight line | Mouth open < 10% |
| `HAPPY` | Closed mouth curved upward | Smile detection (future) |
| `OPEN_TALK` | Medium open mouth for vowels | Mouth open ≥ 10% and < 30% |
| `WIDE_OPEN` | Large open mouth for shouting | Mouth open ≥ 30% |
| `FROWN` | Mouth curved downward | Emotion detection (future) |
| `O_SHAPE` | Small circular open mouth | Phoneme detection (future) |
## File Changes
### `src/shared/types.ts`
```typescript
export enum ExpressionType {
  NEUTRAL = 'NEUTRAL',
  HAPPY = 'HAPPY',
  SURPRISED = 'SURPRISED',
  ANGRY = 'ANGRY',
  SAD = 'SAD',
  BLINK = 'BLINK',
  OPEN_TALK = 'OPEN_TALK',
  WIDE_OPEN = 'WIDE_OPEN',
  FROWN = 'FROWN',
  O_SHAPE = 'O_SHAPE',
}

export interface AvatarConfig {
  imageUrl: string;
  baseFace?: Rect; // Blank face area
  eyes?: { // Eye expression rects
    [ExpressionType.NEUTRAL]?: Rect;
    [ExpressionType.HAPPY]?: Rect;
    // ... etc
  };
  mouth?: { // Mouth expression rects
    [ExpressionType.NEUTRAL]?: Rect;
    [ExpressionType.HAPPY]?: Rect;
    // ... etc
  };
  riggingReference?: { ... };
  activeEyeExpression?: ExpressionType;
  activeMouthExpression?: ExpressionType;
}
```
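Because every rect in `eyes` and `mouth` is optional, code that reads them has to handle expressions that were never rigged. One possible approach, sketched here, is to fall back to `NEUTRAL`. The `resolveRect` helper, the `ExpressionRects` alias, and the `Rect` shape are assumptions for illustration; the enum is a trimmed local copy so the example is self-contained.

```typescript
// Trimmed local copy of the enum, just enough for this sketch.
enum ExpressionType {
  NEUTRAL = 'NEUTRAL',
  HAPPY = 'HAPPY',
  BLINK = 'BLINK',
}

// Assumed Rect shape; the real definition lives in src/shared/types.ts.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical alias for the shape of AvatarConfig.eyes / AvatarConfig.mouth.
type ExpressionRects = Partial<Record<ExpressionType, Rect>>;

// Hypothetical helper: return the rect for the requested expression,
// or the NEUTRAL rect when that expression has not been rigged.
// Returns undefined if neither exists.
function resolveRect(
  rects: ExpressionRects | undefined,
  wanted: ExpressionType,
): Rect | undefined {
  return rects?.[wanted] ?? rects?.[ExpressionType.NEUTRAL];
}
```

With only `NEUTRAL` rigged, `resolveRect(eyes, ExpressionType.BLINK)` returns the neutral rect, so the avatar keeps its eyes instead of rendering a blank face.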
### `src/renderer/services/geminiService.ts`
Updated prompt to generate:
- Blank base character (no facial features)
- 6 eye expressions in row 2
- 6 mouth expressions in row 3
- Consistent sizing for easy extraction
### `src/renderer/components/RiggingEditor.tsx`
Complete redesign:
- **Tab system**: Switch between Eyes and Mouth rigging
- **Expression selector**: Preview individual expressions
- **Color-coded boxes**: Each expression has unique color
- **Base Face box**: Define the blank character area
- **12 expression boxes total**: 6 eyes + 6 mouths
### `src/renderer/components/Studio.tsx`
Dynamic expression rendering:
- `getCurrentEyeExpression()`: Maps tracking data to eye expression
- `getCurrentMouthExpression()`: Maps mouth openness to mouth expression
- Automatic expression switching based on:
  - Blink detection → `BLINK`
  - Mouth openness → `NEUTRAL` / `OPEN_TALK` / `WIDE_OPEN`
  - Surprise detection → `SURPRISED` (mouth open > 70%)
## Expression Switching Logic
### Current Implementation
```typescript
// Eye expression selection: a blink on either eye takes priority over surprise
const getCurrentEyeExpression = (): ExpressionType => {
  if (trackingData.isBlinkingLeft || trackingData.isBlinkingRight) {
    return ExpressionType.BLINK;
  }
  if (trackingData.mouthOpen > 0.7) {
    return ExpressionType.SURPRISED;
  }
  return ExpressionType.NEUTRAL; // Default
};

// Mouth expression selection: thresholds on mouth openness
const getCurrentMouthExpression = (): ExpressionType => {
  const mouthOpen = trackingData.mouthOpen;
  if (mouthOpen < 0.1) return ExpressionType.NEUTRAL;
  if (mouthOpen < 0.3) return ExpressionType.OPEN_TALK;
  return ExpressionType.WIDE_OPEN;
};
```
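The selectors above read `trackingData` from the enclosing component, which makes the threshold boundaries awkward to test. One option, sketched below, is a pure variant that takes the tracking data as a parameter. The `TrackingData` interface, the constant names, and the `select...` function names are assumptions; only the field names and threshold values come from the code above, and `mouthOpen` is assumed to be in the range 0 to 1.

```typescript
// Trimmed local copy of the enum, just enough for this sketch.
enum ExpressionType {
  NEUTRAL = 'NEUTRAL',
  SURPRISED = 'SURPRISED',
  BLINK = 'BLINK',
  OPEN_TALK = 'OPEN_TALK',
  WIDE_OPEN = 'WIDE_OPEN',
}

// Assumed shape; field names mirror the trackingData usage above.
interface TrackingData {
  isBlinkingLeft: boolean;
  isBlinkingRight: boolean;
  mouthOpen: number; // assumed range 0..1
}

// Thresholds taken from the current implementation.
const TALK_THRESHOLD = 0.1;
const WIDE_THRESHOLD = 0.3;
const SURPRISE_THRESHOLD = 0.7;

// Blink on either eye wins; otherwise a very open mouth reads as surprise.
function selectEyeExpression(t: TrackingData): ExpressionType {
  if (t.isBlinkingLeft || t.isBlinkingRight) return ExpressionType.BLINK;
  if (t.mouthOpen > SURPRISE_THRESHOLD) return ExpressionType.SURPRISED;
  return ExpressionType.NEUTRAL;
}

// Lower bounds are inclusive: exactly 0.1 is OPEN_TALK, exactly 0.3 is WIDE_OPEN.
function selectMouthExpression(t: TrackingData): ExpressionType {
  if (t.mouthOpen < TALK_THRESHOLD) return ExpressionType.NEUTRAL;
  if (t.mouthOpen < WIDE_THRESHOLD) return ExpressionType.OPEN_TALK;
  return ExpressionType.WIDE_OPEN;
}
```

Inside the component, the existing functions could then delegate, for example `selectEyeExpression(trackingData)`, while unit tests call the pure versions with boundary values such as `0.1`, `0.3`, and `0.7`.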
### Expression Flow
```
┌──────────────────┐
│  Face Tracking   │
│    Data Input    │
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ mouthOpen: 0.05  │─────────┐
│ isBlinking: true │         │
└────────┬─────────┘         │
         │                   │
         ▼                   ▼
┌──────────────────┐  ┌──────────────┐
│  Eye Expression  │  │    Mouth     │
│     Selector     │  │  Expression  │
│                  │  │   Selector   │
│ BLINK (priority) │  │   NEUTRAL    │
└────────┬─────────┘  └──────┬───────┘
         │                   │
         └────────┬──────────┘
                  │
                  ▼
         ┌────────────────┐
         │ Render Avatar  │
         │ with selected  │
         │  expressions   │
         └────────────────┘
```
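The final "Render Avatar" step can be thought of as stacking layers: the blank base first, then the selected eye and mouth sprites on top. The sketch below only computes the source rects in draw order. Where each layer lands on screen depends on the rigging reference, which is not covered here. `SpriteConfig`, `Layer`, and `buildLayers` are hypothetical names, and the `Rect` shape is assumed.

```typescript
// Assumed Rect shape; the real definition lives in src/shared/types.ts.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Minimal stand-in for the parts of AvatarConfig used in this sketch.
interface SpriteConfig {
  baseFace?: Rect;
  eyes?: { [expression: string]: Rect | undefined };
  mouth?: { [expression: string]: Rect | undefined };
}

interface Layer {
  name: 'base' | 'eyes' | 'mouth';
  src: Rect; // region of the sprite sheet to copy from
}

// Hypothetical helper: source rects in draw order (base, eyes, mouth),
// skipping any part that has not been rigged.
function buildLayers(config: SpriteConfig, eye: string, mouth: string): Layer[] {
  const layers: Layer[] = [];
  if (config.baseFace) layers.push({ name: 'base', src: config.baseFace });
  const eyeRect = config.eyes?.[eye];
  if (eyeRect) layers.push({ name: 'eyes', src: eyeRect });
  const mouthRect = config.mouth?.[mouth];
  if (mouthRect) layers.push({ name: 'mouth', src: mouthRect });
  return layers;
}
```

On a 2D canvas, each layer's `src` could supply the source arguments of the nine-argument form of `CanvasRenderingContext2D.drawImage`, with the destination rectangle coming from the rigging data.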
## Rigging Workflow
### Step 1: Generate Avatar
```
User enters prompt → AI generates sprite sheet with:
- Row 1: Blank character
- Row 2: 6 eye expressions
- Row 3: 6 mouth expressions
```
### Step 2: Rig Expressions
```
1. Adjust Base Face box (yellow) around blank character
2. Switch to "Eyes" tab
3. For each eye expression:
   - Click expression name to highlight
   - Drag/resize box to match asset
4. Switch to "Mouth" tab
5. For each mouth expression:
   - Click expression name to highlight
   - Drag/resize box to match asset
6. Click "Finish Rigging"
```
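Before "Finish Rigging" is accepted, it may be useful to confirm that the base face and all 12 expression boxes exist. The check below is a sketch of that idea, not the editor's actual behavior. `RigDraft` and `findMissingBoxes` are hypothetical names, and the `Rect` shape is assumed.

```typescript
// Assumed Rect shape; the real definition lives in src/shared/types.ts.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Expression names per tab, matching the eye and mouth tables.
const EYE_TYPES = ['NEUTRAL', 'HAPPY', 'SURPRISED', 'ANGRY', 'SAD', 'BLINK'];
const MOUTH_TYPES = ['NEUTRAL', 'HAPPY', 'OPEN_TALK', 'WIDE_OPEN', 'FROWN', 'O_SHAPE'];

// Minimal stand-in for the rigging state being edited.
interface RigDraft {
  baseFace?: Rect;
  eyes?: { [expression: string]: Rect | undefined };
  mouth?: { [expression: string]: Rect | undefined };
}

// Hypothetical helper: list every box that still needs to be placed,
// e.g. ['baseFace', 'eyes.BLINK', 'mouth.O_SHAPE'].
function findMissingBoxes(draft: RigDraft): string[] {
  const missing: string[] = [];
  if (!draft.baseFace) missing.push('baseFace');
  for (const type of EYE_TYPES) {
    if (!draft.eyes?.[type]) missing.push(`eyes.${type}`);
  }
  for (const type of MOUTH_TYPES) {
    if (!draft.mouth?.[type]) missing.push(`mouth.${type}`);
  }
  return missing;
}
```

An empty result means all 13 boxes (1 base + 6 eyes + 6 mouths) are in place; otherwise the returned labels could be shown next to the tabs that still need work.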
### Step 3: Live Animation
```
System automatically switches expressions based on:
- Your blinks → Eye BLINK
- Your mouth opening → Mouth OPEN_TALK / WIDE_OPEN
- Wide mouth → Eye SURPRISED
```
## Future Enhancements
### Planned Features
1. **Manual Expression Override**
   - Hotkeys to force specific expressions
   - Emotion wheel UI for manual selection
2. **Advanced Triggers**
   ```typescript
   // Future: Audio-based phoneme detection
   if (phoneme === 'AH') return ExpressionType.OPEN_TALK;
   if (phoneme === 'OO') return ExpressionType.O_SHAPE;
   // Future: Eyebrow tracking
   if (eyebrowsRaised) return ExpressionType.SURPRISED;
   if (eyebrowsFurrowed) return ExpressionType.ANGRY;
   ```
3. **Expression Blending**
   - Smooth transitions between expressions
   - Intensity-based blending (e.g., 50% happy + 50% neutral)
4. **Preset Management**
   - Save expression configurations
   - Share rigging presets between avatars
5. **More Expressions**
   - Additional eye variants (wink, heart eyes, etc.)
   - Mouth shapes for specific phonemes
   - Eyebrow-only expressions layer
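
As a rough illustration of the planned expression blending, a transition could be modeled as a linear crossfade between the outgoing and incoming sprites. This is only one possible design, not a committed implementation. `crossfadeOpacity` and its parameters are hypothetical.

```typescript
// Hypothetical crossfade: opacity of the outgoing ("from") and incoming ("to")
// expression at `elapsedMs` into a transition lasting `durationMs`.
function crossfadeOpacity(
  elapsedMs: number,
  durationMs: number,
): { from: number; to: number } {
  // A non-positive duration means an instant switch.
  const progress =
    durationMs <= 0 ? 1 : Math.min(1, Math.max(0, elapsedMs / durationMs));
  return { from: 1 - progress, to: progress };
}
```

Halfway through a 100 ms transition, `crossfadeOpacity(50, 100)` gives `{ from: 0.5, to: 0.5 }`, which corresponds to the "50% happy + 50% neutral" case. On a 2D canvas the two values could be applied through `globalAlpha` before drawing each sprite.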
## Testing Tips
### During Rigging
1. **Zoom in** on sprite sheet for precise box placement
2. **Use consistent sizes** for similar expression types
3. **Test all expressions** by clicking through them before finishing
4. **Check the cyan face reference guide** - it should encompass the face area
### During Studio Use
1. **Wait for calibration** (1 second after camera starts)
2. **Good lighting** improves expression detection
3. **Center your face** in camera for best results
4. **Exaggerate expressions** initially to test range
## Troubleshooting
| Issue | Solution |
|-------|----------|
| Expressions don't align | Re-rig with more precise box placement |
| Blinking not detected | Increase camera lighting, face camera directly |
| Mouth stuck open | Check `mouthOpen` threshold in `Studio.tsx` |
| Wrong expression showing | Verify `riggingReference` calculation in `RiggingEditor` |
| Expressions too small/large | Ensure all expression assets are same size in sprite sheet |
## Code Architecture
```
src/
├── shared/types.ts              # ExpressionType enum, AvatarConfig interface
├── renderer/
│   ├── services/
│   │   └── geminiService.ts     # AI prompt for expression generation
│   ├── components/
│   │   ├── AvatarCreator.tsx    # Generate/upload avatar
│   │   ├── RiggingEditor.tsx    # Rig all expressions
│   │   └── Studio.tsx           # Dynamic expression switching
│   └── hooks/
│       └── useFaceTracking.ts   # Provides trackingData for triggers
```
## Summary
The new expression system provides:
- **Full control** over all facial features
- **Dynamic switching** based on face tracking
- **Modular design** - easy to add new expressions
- **Clean separation** - blank base + overlay expressions
- **Future-proof** - ready for audio/emotion integration
This is a **major improvement** over the previous 2-expression system (blink and talk only): the avatar now has 6 eye and 6 mouth expressions that are selected independently of each other.