Expression System Documentation
Overview
The new expression system provides full control over facial expressions by generating separate sprite assets for each expression type. This allows for smooth, dynamic expression switching based on face tracking data.
Key Changes
1. Blank Base Character
The AI now generates a character with a blank face (no eyes, no mouth, no eyebrows). This allows us to overlay expression assets without visual conflicts.
2. Expression Grid Layout
The generated sprite sheet uses a 3-row grid format:
┌─────────────────────────────────────────────────────────┐
│ ROW 1: BASE CHARACTER (full width) │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Blank face - no features (hair, head, body only) │ │
│ └─────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────┤
│ ROW 2: EYE EXPRESSIONS (6 variants, equal spacing) │
│ ┌─────┬─────┬─────┬─────┬─────┬─────┐ │
│ │NTRL │HPPY │SRPR │ANGRY│ SAD │BLINK│ │
│ └─────┴─────┴─────┴─────┴─────┴─────┘ │
├─────────────────────────────────────────────────────────┤
│ ROW 3: MOUTH EXPRESSIONS (6 variants, equal spacing) │
│ ┌─────┬─────┬──────┬──────┬──────┬──────┐ │
│ │NTRL │SMILE│TALK │WIDE │FROWN │O-SHP │ │
│ └─────┴─────┴──────┴──────┴──────┴──────┘ │
└─────────────────────────────────────────────────────────┘
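As a starting point for rigging, the grid above can be turned into default boxes. The following is a minimal sketch, not the project's actual code: the `Rect` shape, the helper names `splitRow` and `guessGridRects`, and the equal-height rows are all assumptions. Generated sheets may not follow the grid exactly, so these boxes would still need manual adjustment in the rigging editor.

```typescript
// Assumed Rect shape; the real definition in src/shared/types.ts is not shown here.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Column order taken from the grid diagram (SMILE corresponds to HAPPY in the mouth table).
const EYE_ORDER = ['NEUTRAL', 'HAPPY', 'SURPRISED', 'ANGRY', 'SAD', 'BLINK'] as const;
const MOUTH_ORDER = ['NEUTRAL', 'HAPPY', 'OPEN_TALK', 'WIDE_OPEN', 'FROWN', 'O_SHAPE'] as const;

// Split one row into equally wide cells, left to right.
function splitRow<K extends string>(
  keys: readonly K[],
  sheetWidth: number,
  top: number,
  rowHeight: number,
): Record<K, Rect> {
  const cellWidth = sheetWidth / keys.length;
  const cells = {} as Record<K, Rect>;
  keys.forEach((key, i) => {
    cells[key] = { x: i * cellWidth, y: top, width: cellWidth, height: rowHeight };
  });
  return cells;
}

// Assumption: the three rows have equal height.
function guessGridRects(sheetWidth: number, sheetHeight: number) {
  const rowHeight = sheetHeight / 3;
  const baseFace: Rect = { x: 0, y: 0, width: sheetWidth, height: rowHeight };
  return {
    baseFace,
    eyes: splitRow(EYE_ORDER, sheetWidth, rowHeight, rowHeight),
    mouth: splitRow(MOUTH_ORDER, sheetWidth, 2 * rowHeight, rowHeight),
  };
}
```

For a 1200×900 sheet this yields 200-pixel-wide cells, with the eye row starting at y = 300 and the mouth row at y = 600.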
3. Expression Types
Eye Expressions (6 types)
| Type | Description | Trigger |
|---|---|---|
| NEUTRAL | Normal open eyes, relaxed | Default state |
| HAPPY | Eyes curved upward, slightly closed | Smile detection (future) |
| SURPRISED | Wide open, circular eyes | Mouth open > 70% |
| ANGRY | Eyebrows angled down, narrowed | Emotion detection (future) |
| SAD | Eyebrows up, eyes droopy downward | Emotion detection (future) |
| BLINK | Both eyes fully closed (curves) | Blink detection (active) |
Mouth Expressions (6 types)
| Type | Description | Trigger |
|---|---|---|
| NEUTRAL | Small closed mouth, straight line | Mouth open < 10% |
| HAPPY | Closed mouth curved upward | Smile detection (future) |
| OPEN_TALK | Medium open mouth for vowels | Mouth open 10-30% |
| WIDE_OPEN | Large open mouth for shouting | Mouth open > 30% |
| FROWN | Mouth curved downward | Emotion detection (future) |
| O_SHAPE | Small circular open mouth | Phoneme detection (future) |
File Changes
src/shared/types.ts
export enum ExpressionType {
NEUTRAL = 'NEUTRAL',
HAPPY = 'HAPPY',
SURPRISED = 'SURPRISED',
ANGRY = 'ANGRY',
SAD = 'SAD',
BLINK = 'BLINK',
OPEN_TALK = 'OPEN_TALK',
WIDE_OPEN = 'WIDE_OPEN',
FROWN = 'FROWN',
O_SHAPE = 'O_SHAPE',
}
export interface AvatarConfig {
imageUrl: string;
baseFace?: Rect; // Blank face area
eyes?: { // Eye expression rects
[ExpressionType.NEUTRAL]?: Rect;
[ExpressionType.HAPPY]?: Rect;
// ... etc
};
mouth?: { // Mouth expression rects
[ExpressionType.NEUTRAL]?: Rect;
[ExpressionType.HAPPY]?: Rect;
// ... etc
};
riggingReference?: { ... };
activeEyeExpression?: ExpressionType;
activeMouthExpression?: ExpressionType;
}
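Because every rect in `AvatarConfig` is optional, a renderer has to decide what to draw when the active expression has not been rigged. One possible approach, sketched here under assumptions, is to fall back to `NEUTRAL`. The `Rect` shape and the helper `resolveRect` are hypothetical, and a string-literal union stands in for the `ExpressionType` enum (its values match the enum's string values) so the snippet is self-contained.

```typescript
// Assumed Rect shape; not confirmed by the source.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Stand-in for the ExpressionType enum; the string values are identical.
type ExpressionType =
  | 'NEUTRAL' | 'HAPPY' | 'SURPRISED' | 'ANGRY' | 'SAD' | 'BLINK'
  | 'OPEN_TALK' | 'WIDE_OPEN' | 'FROWN' | 'O_SHAPE';

type ExpressionRects = Partial<Record<ExpressionType, Rect>>;

// Hypothetical helper: return the rect for the requested expression,
// falling back to NEUTRAL when that expression has no rect.
function resolveRect(
  rects: ExpressionRects | undefined,
  active: ExpressionType | undefined,
): Rect | undefined {
  if (!rects) return undefined;
  const wanted: ExpressionType = active ?? 'NEUTRAL';
  return rects[wanted] ?? rects['NEUTRAL'];
}

// Only NEUTRAL is rigged, so a BLINK request falls back to the NEUTRAL rect.
const eyes: ExpressionRects = {
  NEUTRAL: { x: 0, y: 300, width: 200, height: 300 },
};
const rect = resolveRect(eyes, 'BLINK');
```

Returning `undefined` when nothing is rigged leaves the caller free to skip the overlay entirely and show only the blank base face.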
src/renderer/services/geminiService.ts
Updated prompt to generate:
- Blank base character (no facial features)
- 6 eye expressions in row 2
- 6 mouth expressions in row 3
- Consistent sizing for easy extraction
src/renderer/components/RiggingEditor.tsx
Complete redesign:
- Tab system: Switch between Eyes and Mouth rigging
- Expression selector: Preview individual expressions
- Color-coded boxes: Each expression has unique color
- Base Face box: Define the blank character area
- 12 expression boxes total: 6 eyes + 6 mouths
src/renderer/components/Studio.tsx
Dynamic expression rendering:
- getCurrentEyeExpression(): Maps tracking data to eye expression
- getCurrentMouthExpression(): Maps mouth openness to mouth expression
- Automatic expression switching based on:
  - Blink detection → BLINK
  - Mouth openness → NEUTRAL / OPEN_TALK / WIDE_OPEN
  - Surprise detection → SURPRISED (when mouth very open)
Expression Switching Logic
Current Implementation
// Eye expression selection
const getCurrentEyeExpression = (): ExpressionType => {
if (trackingData.isBlinkingLeft || trackingData.isBlinkingRight) {
return ExpressionType.BLINK;
}
if (trackingData.mouthOpen > 0.7) {
return ExpressionType.SURPRISED;
}
return ExpressionType.NEUTRAL; // Default
};
// Mouth expression selection
const getCurrentMouthExpression = (): ExpressionType => {
const mouthOpen = trackingData.mouthOpen;
if (mouthOpen < 0.1) return ExpressionType.NEUTRAL;
if (mouthOpen < 0.3) return ExpressionType.OPEN_TALK;
return ExpressionType.WIDE_OPEN;
};
Expression Flow
┌──────────────────┐
│ Face Tracking │
│ Data Input │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ mouthOpen: 0.05 │──────┐
│ isBlinking: true │ │
└────────┬─────────┘ │
│ │
▼ ▼
┌──────────────────┐ ┌──────────────┐
│ Eye Expression │ │ Mouth │
│ Selector │ │ Expression │
│ │ │ Selector │
│ BLINK (priority) │ │ NEUTRAL │
└────────┬─────────┘ └──────┬───────┘
│ │
└────────┬──────────┘
│
▼
┌────────────────┐
│ Render Avatar │
│ with selected │
│ expressions │
└────────────────┘
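The flow above can be traced with the same sample values shown in the diagram. This sketch restates the two selectors from Studio.tsx as pure functions that take the tracking data as a parameter, which makes them easy to test in isolation. The `TrackingData` interface is an assumption based on the three fields the selectors read, the function names are hypothetical, and a string-literal union stands in for the `ExpressionType` enum.

```typescript
// Stand-in for the ExpressionType enum; the string values are identical.
type ExpressionType =
  | 'NEUTRAL' | 'HAPPY' | 'SURPRISED' | 'ANGRY' | 'SAD' | 'BLINK'
  | 'OPEN_TALK' | 'WIDE_OPEN' | 'FROWN' | 'O_SHAPE';

// Assumed shape: only the fields the selectors actually use.
interface TrackingData {
  mouthOpen: number; // 0 (closed) to 1 (fully open)
  isBlinkingLeft: boolean;
  isBlinkingRight: boolean;
}

// Blink is checked first, so it takes priority over SURPRISED.
function selectEyeExpression(t: TrackingData): ExpressionType {
  if (t.isBlinkingLeft || t.isBlinkingRight) return 'BLINK';
  if (t.mouthOpen > 0.7) return 'SURPRISED';
  return 'NEUTRAL';
}

// Mouth thresholds: below 0.1 NEUTRAL, below 0.3 OPEN_TALK, otherwise WIDE_OPEN.
function selectMouthExpression(t: TrackingData): ExpressionType {
  if (t.mouthOpen < 0.1) return 'NEUTRAL';
  if (t.mouthOpen < 0.3) return 'OPEN_TALK';
  return 'WIDE_OPEN';
}

// Values from the diagram: mouth nearly closed while blinking.
const sample: TrackingData = { mouthOpen: 0.05, isBlinkingLeft: true, isBlinkingRight: true };
console.log(selectEyeExpression(sample), selectMouthExpression(sample)); // BLINK NEUTRAL
```

The eye and mouth selections are independent, so a blink with a wide-open mouth (`mouthOpen: 0.9`) gives `BLINK` for the eyes and `WIDE_OPEN` for the mouth rather than `SURPRISED`.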
Rigging Workflow
Step 1: Generate Avatar
User enters prompt → AI generates sprite sheet with:
- Row 1: Blank character
- Row 2: 6 eye expressions
- Row 3: 6 mouth expressions
Step 2: Rig Expressions
1. Adjust Base Face box (yellow) around blank character
2. Switch to "Eyes" tab
3. For each eye expression:
- Click expression name to highlight
- Drag/resize box to match asset
4. Switch to "Mouth" tab
5. For each mouth expression:
- Click expression name to highlight
- Drag/resize box to match asset
6. Click "Finish Rigging"
Step 3: Live Animation
System automatically switches expressions based on:
- Your blinks → Eye BLINK
- Your mouth opening → Mouth OPEN_TALK / WIDE_OPEN
- Wide mouth → Eye SURPRISED
Future Enhancements
Planned Features
- Manual Expression Override
  - Hotkeys to force specific expressions
  - Emotion wheel UI for manual selection
- Advanced Triggers
  // Future: Audio-based phoneme detection
  if (phoneme === 'AH') return ExpressionType.OPEN_TALK;
  if (phoneme === 'OO') return ExpressionType.O_SHAPE;
  // Future: Eyebrow tracking
  if (eyebrowsRaised) return ExpressionType.SURPRISED;
  if (eyebrowsFurrowed) return ExpressionType.ANGRY;
- Expression Blending
  - Smooth transitions between expressions
  - Intensity-based blending (e.g., 50% happy + 50% neutral)
- Preset Management
  - Save expression configurations
  - Share rigging presets between avatars
- More Expressions
  - Additional eye variants (wink, heart eyes, etc.)
  - Mouth shapes for specific phonemes
  - Eyebrow-only expressions layer
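Intensity-based blending is not implemented yet, but one way it could work is a linear cross-fade, where each expression overlay is drawn with an opacity equal to its weight. The sketch below is purely illustrative: `blendWeights` and `BlendWeights` are hypothetical names, and a string-literal union stands in for the `ExpressionType` enum.

```typescript
// Stand-in for the ExpressionType enum; the string values are identical.
type ExpressionType =
  | 'NEUTRAL' | 'HAPPY' | 'SURPRISED' | 'ANGRY' | 'SAD' | 'BLINK'
  | 'OPEN_TALK' | 'WIDE_OPEN' | 'FROWN' | 'O_SHAPE';

// Opacity per expression; expressions that are absent are not drawn.
type BlendWeights = Partial<Record<ExpressionType, number>>;

// Linear cross-fade: intensity 0 shows only `from`, intensity 1 shows only `to`.
// Values outside [0, 1] are clamped.
function blendWeights(
  from: ExpressionType,
  to: ExpressionType,
  intensity: number,
): BlendWeights {
  const t = Math.min(1, Math.max(0, intensity));
  const weights: BlendWeights = {};
  if (from === to) {
    weights[from] = 1;
    return weights;
  }
  weights[from] = 1 - t;
  weights[to] = t;
  return weights;
}

// "50% happy + 50% neutral" from the list above.
const half = blendWeights('NEUTRAL', 'HAPPY', 0.5);
console.log(half); // { NEUTRAL: 0.5, HAPPY: 0.5 }
```

A renderer could feed these weights into whatever opacity control it already uses when drawing each overlay. A smooth transition then amounts to ramping `intensity` from 0 to 1 over a few frames.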
Testing Tips
During Rigging
- Zoom in on sprite sheet for precise box placement
- Use consistent sizes for similar expression types
- Test all expressions by clicking through them before finishing
- Check the cyan face reference guide - it should encompass the face area
During Studio Use
- Wait for calibration (1 second after camera starts)
- Good lighting improves expression detection
- Center your face in camera for best results
- Exaggerate expressions initially to test range
Troubleshooting
| Issue | Solution |
|---|---|
| Expressions don't align | Re-rig with more precise box placement |
| Blinking not detected | Increase camera lighting, face camera directly |
| Mouth stuck open | Check mouthOpen threshold in Studio.tsx |
| Wrong expression showing | Verify riggingReference calculation in RiggingEditor |
| Expressions too small/large | Ensure all expression assets are same size in sprite sheet |
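The last row of the table can be checked programmatically before rendering. This is a sketch under assumptions: the `Rect` shape and the helper `findSizeMismatches` are hypothetical, and the 2-pixel default tolerance is an arbitrary choice. It compares every rigged box against the first one and reports those whose size differs.

```typescript
// Assumed Rect shape; not confirmed by the source.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical check: list the expressions whose box width or height differs
// from the first box by more than `tolerance` pixels.
function findSizeMismatches(rects: Record<string, Rect>, tolerance = 2): string[] {
  const entries = Object.entries(rects);
  if (entries.length === 0) return [];
  const reference = entries[0][1];
  return entries
    .filter(([, r]) =>
      Math.abs(r.width - reference.width) > tolerance ||
      Math.abs(r.height - reference.height) > tolerance)
    .map(([name]) => name);
}

const eyeRects: Record<string, Rect> = {
  NEUTRAL: { x: 0, y: 300, width: 100, height: 50 },
  BLINK: { x: 200, y: 300, width: 101, height: 50 }, // within tolerance
  SAD: { x: 400, y: 300, width: 140, height: 50 },   // clearly wider
};
console.log(findSizeMismatches(eyeRects)); // [ 'SAD' ]
```

Eyes and mouths would be checked separately, since the two rows need not share a size.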
Code Architecture
src/
├── shared/types.ts # ExpressionType enum, AvatarConfig interface
├── renderer/
│ ├── services/
│ │ └── geminiService.ts # AI prompt for expression generation
│ ├── components/
│ │ ├── AvatarCreator.tsx # Generate/upload avatar
│ │ ├── RiggingEditor.tsx # Rig all expressions
│ │ └── Studio.tsx # Dynamic expression switching
│ └── hooks/
│ └── useFaceTracking.ts # Provides trackingData for triggers
Summary
The new expression system provides:
- ✅ Full control over all facial features
- ✅ Dynamic switching based on face tracking
- ✅ Modular design - easy to add new expressions
- ✅ Clean separation - blank base + overlay expressions
- ✅ Future-proof - ready for audio/emotion integration
This replaces the previous 2-expression system (blink and talk only) with 6 eye and 6 mouth variants that switch independently, giving VTuber avatars a much wider expressive range.