attempting to fix rigging and asking AI to generate the images slightly differently

itsamejms 2026-06-07 16:54:46 +01:00
parent 2fc6f8c476
commit b120219b4f
10 changed files with 2847 additions and 252 deletions

290
EXPRESSION_SYSTEM.md Normal file

@ -0,0 +1,290 @@
# Expression System Documentation
## Overview
The new expression system provides **full control over facial expressions** by generating separate sprite assets for each expression type. This allows for smooth, dynamic expression switching based on face tracking data.
## Key Changes
### 1. **Blank Base Character**
The AI now generates a character with a **blank face** (no eyes, no mouth, no eyebrows). This allows us to overlay expression assets without visual conflicts.
### 2. **Expression Grid Layout**
The generated sprite sheet uses a **3-row grid format**:
```
┌─────────────────────────────────────────────────────────┐
│ ROW 1: BASE CHARACTER (full width) │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Blank face - no features (hair, head, body only) │ │
│ └─────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────┤
│ ROW 2: EYE EXPRESSIONS (6 variants, equal spacing) │
│ ┌─────┬─────┬─────┬─────┬─────┬─────┐ │
│ │NTRL │HPPY │SRPR │ANGRY│ SAD │BLINK│ │
│ └─────┴─────┴─────┴─────┴─────┴─────┘ │
├─────────────────────────────────────────────────────────┤
│ ROW 3: MOUTH EXPRESSIONS (6 variants, equal spacing) │
│ ┌─────┬─────┬──────┬──────┬──────┬──────┐ │
│ │NTRL │SMILE│TALK │WIDE │FROWN │O-SHP │ │
│ └─────┴─────┴──────┴──────┴──────┴──────┘ │
└─────────────────────────────────────────────────────────┘
```
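A minimal sketch of how this layout could be mapped to default crop rectangles, assuming the three rows share the sheet height equally and the six cells in rows 2 and 3 share the width equally. The document does not state exact proportions, so treat these numbers as placeholders; `gridCellRect` is a hypothetical helper and `Rect` is a local stand-in for the type in `src/shared/types.ts`.

```typescript
// Local stand-in for the Rect type in src/shared/types.ts (normalized 0-1 coords).
interface Rect { x: number; y: number; w: number; h: number; }

const ROWS = 3;          // base character, eyes, mouths
const CELLS_PER_ROW = 6; // six variants per expression row

// Hypothetical helper: rect of cell `col` in row `row` (both zero-based),
// ASSUMING equal row heights and equal cell widths.
function gridCellRect(row: number, col: number): Rect {
  const h = 1 / ROWS;
  if (row === 0) {
    // Row 1 holds the blank base character across the full width.
    return { x: 0, y: 0, w: 1, h };
  }
  const w = 1 / CELLS_PER_ROW;
  return { x: col * w, y: row * h, w, h };
}

// Example: the BLINK eyes are the sixth cell of the eye row.
const blinkEyes = gridCellRect(1, 5);
console.log(blinkEyes);
```

Generated sheets may not match an ideal grid exactly, so rects like these would only serve as starting positions for the manual rigging step.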
### 3. **Expression Types**
#### Eye Expressions (6 types)
| Type | Description | Trigger |
|------|-------------|---------|
| `NEUTRAL` | Normal open eyes, relaxed | Default state |
| `HAPPY` | Eyes curved upward, slightly closed | Smile detection (future) |
| `SURPRISED` | Wide open, circular eyes | Mouth open > 70% |
| `ANGRY` | Eyebrows angled down, narrowed | Emotion detection (future) |
| `SAD` | Eyebrows up, eyes droopy downward | Emotion detection (future) |
| `BLINK` | Both eyes fully closed (curves) | Blink detection (active) |
#### Mouth Expressions (6 types)
| Type | Description | Trigger |
|------|-------------|---------|
| `NEUTRAL` | Small closed mouth, straight line | Mouth open < 10% |
| `HAPPY` | Closed mouth curved upward | Smile detection (future) |
| `OPEN_TALK` | Medium open mouth for vowels | Mouth open 10-30% |
| `WIDE_OPEN` | Large open mouth for shouting | Mouth open ≥ 30% |
| `FROWN` | Mouth curved downward | Emotion detection (future) |
| `O_SHAPE` | Small circular open mouth | Phoneme detection (future) |
## File Changes
### `src/shared/types.ts`
```typescript
export enum ExpressionType {
  NEUTRAL = 'NEUTRAL',
  HAPPY = 'HAPPY',
  SURPRISED = 'SURPRISED',
  ANGRY = 'ANGRY',
  SAD = 'SAD',
  BLINK = 'BLINK',
  OPEN_TALK = 'OPEN_TALK',
  WIDE_OPEN = 'WIDE_OPEN',
  FROWN = 'FROWN',
  O_SHAPE = 'O_SHAPE',
}

export interface AvatarConfig {
  imageUrl: string;
  baseFace?: Rect; // Blank face area
  eyes?: { // Eye expression rects
    [ExpressionType.NEUTRAL]?: Rect;
    [ExpressionType.HAPPY]?: Rect;
    // ... etc
  };
  mouth?: { // Mouth expression rects
    [ExpressionType.NEUTRAL]?: Rect;
    [ExpressionType.HAPPY]?: Rect;
    // ... etc
  };
  riggingReference?: { ... };
  activeEyeExpression?: ExpressionType;
  activeMouthExpression?: ExpressionType;
}
```
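One way the optional per-expression rects might be consumed is with a fallback to `NEUTRAL` when an expression has not been rigged. This is a sketch under assumptions: the fallback rule and the `resolveExpressionRect` helper are hypothetical, and a string-literal union stands in for the `ExpressionType` enum to keep the snippet self-contained.

```typescript
// Local stand-ins for the types in src/shared/types.ts.
type Rect = { x: number; y: number; w: number; h: number };
type ExpressionName = 'NEUTRAL' | 'HAPPY' | 'SURPRISED' | 'ANGRY' | 'SAD' | 'BLINK';
type ExpressionRects = Partial<Record<ExpressionName, Rect>>;

// Hypothetical helper: return the rect for the wanted expression,
// or the NEUTRAL rect if that expression was never rigged (assumed rule).
function resolveExpressionRect(
  rects: ExpressionRects | undefined,
  wanted: ExpressionName,
): Rect | undefined {
  return rects?.[wanted] ?? rects?.NEUTRAL;
}

// Example values only; real rects come from the rigging editor.
const eyes: ExpressionRects = {
  NEUTRAL: { x: 0.0, y: 0.4, w: 0.15, h: 0.1 },
  BLINK: { x: 0.75, y: 0.4, w: 0.15, h: 0.1 },
};

console.log(resolveExpressionRect(eyes, 'BLINK')); // the BLINK rect
console.log(resolveExpressionRect(eyes, 'HAPPY')); // falls back to the NEUTRAL rect
```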
### `src/renderer/services/geminiService.ts`
Updated prompt to generate:
- Blank base character (no facial features)
- 6 eye expressions in row 2
- 6 mouth expressions in row 3
- Consistent sizing for easy extraction
### `src/renderer/components/RiggingEditor.tsx`
Complete redesign:
- **Tab system**: Switch between Eyes and Mouth rigging
- **Expression selector**: Preview individual expressions
- **Color-coded boxes**: Each expression has unique color
- **Base Face box**: Define the blank character area
- **12 expression boxes total**: 6 eyes + 6 mouths
### `src/renderer/components/Studio.tsx`
Dynamic expression rendering:
- `getCurrentEyeExpression()`: Maps tracking data to eye expression
- `getCurrentMouthExpression()`: Maps mouth openness to mouth expression
- Automatic expression switching based on:
  - Blink detection → `BLINK`
  - Mouth openness → `NEUTRAL` / `OPEN_TALK` / `WIDE_OPEN`
  - Surprise detection → `SURPRISED` (when mouth very open)
## Expression Switching Logic
### Current Implementation
```typescript
// Eye expression selection
const getCurrentEyeExpression = (): ExpressionType => {
  if (trackingData.isBlinkingLeft || trackingData.isBlinkingRight) {
    return ExpressionType.BLINK;
  }
  if (trackingData.mouthOpen > 0.7) {
    return ExpressionType.SURPRISED;
  }
  return ExpressionType.NEUTRAL; // Default
};

// Mouth expression selection
const getCurrentMouthExpression = (): ExpressionType => {
  const mouthOpen = trackingData.mouthOpen;
  if (mouthOpen < 0.1) return ExpressionType.NEUTRAL;
  if (mouthOpen < 0.3) return ExpressionType.OPEN_TALK;
  return ExpressionType.WIDE_OPEN;
};
```
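The same thresholds can be written as pure functions of a tracking sample, which makes the boundaries easy to check in isolation. This is a sketch, not the code in `Studio.tsx`: `TrackingSample`, `selectEye`, and `selectMouth` are hypothetical names, and string literals stand in for `ExpressionType`.

```typescript
type EyeExpression = 'BLINK' | 'SURPRISED' | 'NEUTRAL';
type MouthExpression = 'NEUTRAL' | 'OPEN_TALK' | 'WIDE_OPEN';

// Field names follow the trackingData usage shown above.
interface TrackingSample {
  isBlinkingLeft: boolean;
  isBlinkingRight: boolean;
  mouthOpen: number; // 0 (closed) to 1 (fully open)
}

function selectEye(t: TrackingSample): EyeExpression {
  // A blink on either eye takes priority over everything else.
  if (t.isBlinkingLeft || t.isBlinkingRight) return 'BLINK';
  if (t.mouthOpen > 0.7) return 'SURPRISED';
  return 'NEUTRAL';
}

function selectMouth(t: TrackingSample): MouthExpression {
  if (t.mouthOpen < 0.1) return 'NEUTRAL';
  if (t.mouthOpen < 0.3) return 'OPEN_TALK';
  return 'WIDE_OPEN';
}

const blinking: TrackingSample = { isBlinkingLeft: true, isBlinkingRight: true, mouthOpen: 0.05 };
const shouting: TrackingSample = { isBlinkingLeft: false, isBlinkingRight: false, mouthOpen: 0.8 };

console.log(selectEye(blinking), selectMouth(blinking)); // BLINK NEUTRAL
console.log(selectEye(shouting), selectMouth(shouting)); // SURPRISED WIDE_OPEN
```

Note that a mouth openness of exactly `0.3` already selects `WIDE_OPEN`, and exactly `0.1` selects `OPEN_TALK`, because both comparisons are strict.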
### Expression Flow
```
┌──────────────────┐
│ Face Tracking │
│ Data Input │
└────────┬─────────┘
┌──────────────────┐
│ mouthOpen: 0.05 │──────┐
│ isBlinking: true │ │
└────────┬─────────┘ │
│ │
▼ ▼
┌──────────────────┐ ┌──────────────┐
│ Eye Expression │ │ Mouth │
│ Selector │ │ Expression │
│ │ │ Selector │
│ BLINK (priority) │ │ NEUTRAL │
└────────┬─────────┘ └──────┬───────┘
│ │
└────────┬──────────┘
┌────────────────┐
│ Render Avatar │
│ with selected │
│ expressions │
└────────────────┘
```
## Rigging Workflow
### Step 1: Generate Avatar
```
User enters prompt → AI generates sprite sheet with:
- Row 1: Blank character
- Row 2: 6 eye expressions
- Row 3: 6 mouth expressions
```
### Step 2: Rig Expressions
```
1. Adjust Base Face box (yellow) around blank character
2. Switch to "Eyes" tab
3. For each eye expression:
- Click expression name to highlight
- Drag/resize box to match asset
4. Switch to "Mouth" tab
5. For each mouth expression:
- Click expression name to highlight
- Drag/resize box to match asset
6. Click "Finish Rigging"
```
### Step 3: Live Animation
```
System automatically switches expressions based on:
- Your blinks → Eye BLINK
- Your mouth opening → Mouth OPEN_TALK / WIDE_OPEN
- Wide mouth → Eye SURPRISED
```
## Future Enhancements
### Planned Features
1. **Manual Expression Override**
   - Hotkeys to force specific expressions
   - Emotion wheel UI for manual selection
2. **Advanced Triggers**
   ```typescript
   // Future: Audio-based phoneme detection
   if (phoneme === 'AH') return ExpressionType.OPEN_TALK;
   if (phoneme === 'OO') return ExpressionType.O_SHAPE;
   // Future: Eyebrow tracking
   if (eyebrowsRaised) return ExpressionType.SURPRISED;
   if (eyebrowsFurrowed) return ExpressionType.ANGRY;
   ```
3. **Expression Blending**
   - Smooth transitions between expressions
   - Intensity-based blending (e.g., 50% happy + 50% neutral)
4. **Preset Management**
   - Save expression configurations
   - Share rigging presets between avatars
5. **More Expressions**
   - Additional eye variants (wink, heart eyes, etc.)
   - Mouth shapes for specific phonemes
   - Eyebrow-only expressions layer
## Testing Tips
### During Rigging
1. **Zoom in** on sprite sheet for precise box placement
2. **Use consistent sizes** for similar expression types
3. **Test all expressions** by clicking through them before finishing
4. **Check the cyan face reference guide** - it should encompass the face area
### During Studio Use
1. **Wait for calibration** (1 second after camera starts)
2. **Good lighting** improves expression detection
3. **Center your face** in camera for best results
4. **Exaggerate expressions** initially to test range
## Troubleshooting
| Issue | Solution |
|-------|----------|
| Expressions don't align | Re-rig with more precise box placement |
| Blinking not detected | Increase camera lighting, face camera directly |
| Mouth stuck open | Check mouthOpen threshold in Studio.tsx |
| Wrong expression showing | Verify riggingReference calculation in RiggingEditor |
| Expressions too small/large | Ensure all expression assets are same size in sprite sheet |
## Code Architecture
```
src/
├── shared/types.ts # ExpressionType enum, AvatarConfig interface
├── renderer/
│ ├── services/
│ │ └── geminiService.ts # AI prompt for expression generation
│ ├── components/
│ │ ├── AvatarCreator.tsx # Generate/upload avatar
│ │ ├── RiggingEditor.tsx # Rig all expressions
│ │ └── Studio.tsx # Dynamic expression switching
│ └── hooks/
│ └── useFaceTracking.ts # Provides trackingData for triggers
```
## Summary
The new expression system provides:
- ✅ **Full control** over all facial features
- ✅ **Dynamic switching** based on face tracking
- ✅ **Modular design** - easy to add new expressions
- ✅ **Clean separation** - blank base + overlay expressions
- ✅ **Future-proof** - ready for audio/emotion integration
This replaces the previous 2-expression system (blink and talk only) with 6 eye and 6 mouth expressions that are rigged and switched independently.

167
RIGGING_IMPROVEMENTS.md Normal file

@ -0,0 +1,167 @@
# Rigging System Improvements
## Problem
The original rigging system had no consistent coordinate mapping between:
- Avatar image coordinates (from rigging editor)
- MediaPipe face tracking coordinates (from webcam)
This caused avatar features to not align properly with the user's face movements.
## Solution Overview
### 1. **Face Reference System** (`src/shared/types.ts`)
Added `riggingReference` to `AvatarConfig`:
```typescript
riggingReference?: {
faceCenter: { x: number; y: number }; // Center point between eyes
faceWidth: number; // Normalized width of face at eye level
faceHeight: number; // Normalized height from brow to chin
};
```
### 2. **Rigging Editor Calculations** (`src/renderer/components/RiggingEditor.tsx`)
The editor now calculates face reference points when rigging is complete:
```typescript
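const calculateRiggingReference = () => {
  // Horizontal centers of the two eye boxes (normalized image coordinates)
  const leftEyeCenterX = leftEye.x + leftEye.w / 2;
  const rightEyeCenterX = rightEye.x + rightEye.w / 2;
  // Face center is midpoint between eyes
  const faceCenterX = (leftEyeCenterX + rightEyeCenterX) / 2;
  const faceCenterY = (leftEye.y + leftEye.h / 2 + rightEye.y + rightEye.h / 2) / 2;
  // Face width is distance between eye centers (normalized), scaled by 2.5
  const faceWidth = Math.abs(rightEyeCenterX - leftEyeCenterX) * 2.5;
  // Face height from brow to chin (browY and chinY are not defined in this excerpt)
  const faceHeight = chinY - browY;
  return { faceCenter: { x: faceCenterX, y: faceCenterY }, faceWidth, faceHeight };
};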
```
**Visual Guide**: A cyan dashed box shows the calculated "Face Reference Area" during rigging.
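To make the arithmetic concrete, here is a self-contained sketch of the same calculation with example numbers. `riggingReferenceFromEyes` is a hypothetical name, and `browY` / `chinY` are passed in as parameters because the document does not show where they come from.

```typescript
interface Rect { x: number; y: number; w: number; h: number; }

interface RiggingReference {
  faceCenter: { x: number; y: number };
  faceWidth: number;
  faceHeight: number;
}

// Hypothetical standalone version of the calculation described above.
function riggingReferenceFromEyes(
  leftEye: Rect,
  rightEye: Rect,
  browY: number, // assumed input: normalized y of the brow line
  chinY: number, // assumed input: normalized y of the chin
): RiggingReference {
  const leftCx = leftEye.x + leftEye.w / 2;
  const rightCx = rightEye.x + rightEye.w / 2;
  const leftCy = leftEye.y + leftEye.h / 2;
  const rightCy = rightEye.y + rightEye.h / 2;
  return {
    faceCenter: { x: (leftCx + rightCx) / 2, y: (leftCy + rightCy) / 2 },
    faceWidth: Math.abs(rightCx - leftCx) * 2.5, // eye distance x 2.5
    faceHeight: chinY - browY,
  };
}

// Eye centers at x = 0.30 and x = 0.50 give an eye distance of 0.20,
// so faceWidth comes out at roughly 0.5 and faceCenter at roughly (0.4, 0.45).
const reference = riggingReferenceFromEyes(
  { x: 0.25, y: 0.4, w: 0.1, h: 0.1 },
  { x: 0.45, y: 0.4, w: 0.1, h: 0.1 },
  0.35,
  0.75,
);
console.log(reference);
```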
### 3. **Auto-Calibration** (`src/renderer/components/Studio.tsx`)
On first face detection, the system:
1. Waits 1 second for stable tracking
2. Stores initial face position as `calibrationOffset`
3. All subsequent movements are **relative** to this offset
```typescript
const relX = trackingData.translationX - calibrationOffset.x;
const relY = trackingData.translationY - calibrationOffset.y;
```
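The three calibration steps could be sketched as a small stateful helper. This is an illustration under assumptions, not the code in `Studio.tsx`: `createCalibrator` is a hypothetical name, timestamps are passed in explicitly, and the offset is taken from the first sample that arrives once the delay has elapsed.

```typescript
interface Point { x: number; y: number; }

const CALIBRATION_DELAY_MS = 1000; // the 1 second wait described above

// Hypothetical helper: returns an update function that yields the position
// relative to the calibrated pose, or null while calibration is pending.
function createCalibrator(delayMs: number = CALIBRATION_DELAY_MS) {
  let firstSeenAt: number | null = null;
  let offset: Point | null = null;

  return function update(pos: Point, nowMs: number): Point | null {
    if (firstSeenAt === null) firstSeenAt = nowMs; // first face detection
    if (offset === null) {
      if (nowMs - firstSeenAt < delayMs) return null; // still waiting for stable tracking
      offset = { x: pos.x, y: pos.y }; // store calibrationOffset
    }
    return { x: pos.x - offset.x, y: pos.y - offset.y };
  };
}

const update = createCalibrator();
console.log(update({ x: 0.20, y: -0.10 }, 0));    // null (still calibrating)
console.log(update({ x: 0.25, y: -0.10 }, 1000)); // offset stored, relative position is zero
console.log(update({ x: 0.35, y: 0.00 }, 1100));  // roughly { x: 0.1, y: 0.1 }
```

Returning `null` during the wait gives the caller a natural place to show a "Calibrating..." indicator.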
### 4. **Feature Position Mapping** (`src/renderer/components/Studio.tsx`)
Features are now positioned relative to the face center:
```typescript
const calculateFeaturePosition = (featureRect: Rect, featureType: 'eye' | 'mouth') => {
  const { faceCenter, faceWidth, faceHeight } = avatar.riggingReference;
  // Center of the feature box in rigging (normalized image) space
  const featureCenterX = featureRect.x + featureRect.w / 2;
  const featureCenterY = featureRect.y + featureRect.h / 2;
  // Calculate feature position relative to face center in rigging space
  const relX = featureCenterX - faceCenter.x;
  const relY = featureCenterY - faceCenter.y;
  // Scale relative positions by face width/height to match tracking scale
  const scaledX = relX * faceWidth * avatarPosition.scale;
  const scaledY = relY * faceHeight * avatarPosition.scale;
  return { x: scaledX, y: scaledY };
};
```
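A worked example of this mapping, written as a standalone function so the numbers can be followed by hand. `featureOffset` and `avatarScale` are hypothetical names standing in for `calculateFeaturePosition` and `avatarPosition.scale`, and the feature center is assumed to be the center of its box.

```typescript
interface Rect { x: number; y: number; w: number; h: number; }

interface RiggingReference {
  faceCenter: { x: number; y: number };
  faceWidth: number;
  faceHeight: number;
}

// Offset of a feature from the face center, scaled by the face dimensions.
function featureOffset(
  feature: Rect,
  ref: RiggingReference,
  avatarScale: number,
): { x: number; y: number } {
  const cx = feature.x + feature.w / 2; // assumed: feature anchor is the box center
  const cy = feature.y + feature.h / 2;
  return {
    x: (cx - ref.faceCenter.x) * ref.faceWidth * avatarScale,
    y: (cy - ref.faceCenter.y) * ref.faceHeight * avatarScale,
  };
}

const ref: RiggingReference = {
  faceCenter: { x: 0.4, y: 0.45 },
  faceWidth: 0.5,
  faceHeight: 0.4,
};
const mouthBox: Rect = { x: 0.35, y: 0.55, w: 0.1, h: 0.05 };

// Mouth center is (0.4, 0.575): directly below the face center by 0.125,
// which scales to roughly y = 0.125 * 0.4 = 0.05.
console.log(featureOffset(mouthBox, ref, 1));
```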
### 5. **Exponential Smoothing** (`src/renderer/hooks/useFaceTracking.ts`)
Added smooth interpolation to prevent jittery movements:
```typescript
const smoothingFactor = 0.15; // Lower = smoother but more lag
const smooth = (current: number, target: number) => {
return current + (target - current) * smoothingFactor;
};
// Apply to all continuous values
const smoothedData = {
rotationX: smooth(prevDataRef.current.rotationX, newData.rotationX),
rotationY: smooth(prevDataRef.current.rotationY, newData.rotationY),
// ... etc
};
```
Also raised the blink detection threshold from `0.5` to `0.6` to make blink detection more reliable.
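To get a feel for the lag that a smoothing factor of `0.15` introduces, the sketch below applies the same update rule to a step change. With a constant target, the remaining error after n frames is (1 - α)^n of the initial error. `framesToSettle` is a hypothetical helper added for illustration.

```typescript
const SMOOTHING_FACTOR = 0.15; // value quoted above

function smooth(current: number, target: number, alpha: number = SMOOTHING_FACTOR): number {
  return current + (target - current) * alpha;
}

// Smallest number of frames after which the remaining error is at or
// below `fraction` of the initial error, for a constant target.
function framesToSettle(fraction: number, alpha: number = SMOOTHING_FACTOR): number {
  return Math.ceil(Math.log(fraction) / Math.log(1 - alpha));
}

// Simulate a sudden head turn: the target jumps from 0 to 1.
let value = 0;
const frames = framesToSettle(0.05); // 19 frames to get within 5% at alpha = 0.15
for (let i = 0; i < frames; i++) {
  value = smooth(value, 1);
}
console.log(frames, value);
```

A larger factor shortens this settling time at the cost of letting more tracking jitter through, which is the trade-off an adjustable smoothing control would expose.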
## Coordinate Flow
```
┌─────────────────────────────────────────────────────────────┐
│ RIGGING PHASE │
│ ┌─────────────────┐ │
│ │ Avatar Image │ User places boxes on: │
│ │ (Normalized) │ - Left/Right Eye (Red/Blue) │
│ │ 0-1 coords │ - Mouth (Green) │
│ └────────┬────────┘ - Main Body (Yellow) │
│ │ │
│ ▼ │
│ Calculate riggingReference: │
│ - faceCenter (between eyes) │
│ - faceWidth (eye distance × 2.5) │
│ - faceHeight (brow to chin) │
└───────────┬─────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ STUDIO PHASE │
│ ┌─────────────────┐ │
│ │ Webcam Feed │ MediaPipe detects: │
│ │ (Real-time) │ - translationX/Y (-1 to 1) │
│ │ │ - rotationX/Y/Z │
│ └────────┬────────┘ - mouthOpen, blink │
│ │ │
│ ▼ │
│ 1. Auto-calibrate (store initial offset) │
│ 2. Calculate relative movement │
│ 3. Apply smoothing (EMA with α=0.15) │
│ 4. Map rigging coords to tracking scale │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Render Avatar │ - Position from tracking │
│ │ (Composited) │ - Features from riggingReference │
│ └─────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
## Key Benefits
| Before | After |
|--------|-------|
| ❌ Fixed positions | ✅ Dynamic face-relative positioning |
| ❌ No calibration | ✅ Auto-calibration on startup |
| ❌ Jittery movement | ✅ Smooth exponential interpolation |
| ❌ No visual feedback | ✅ Face reference guide during rigging |
| ❌ Unreliable blinks | ✅ Improved blink threshold (0.6) |
| ❌ Scale mismatches | ✅ Proper scale mapping via faceWidth/Height |
## Testing Tips
1. **Rigging Phase**:
   - Ensure the cyan "Face Reference Area" encompasses the entire face
   - Eye boxes should be centered on pupils
   - Mouth box should cover the lip area
2. **Studio Phase**:
   - Wait for "Calibrating..." indicator to disappear
   - Start with face centered in camera
   - Move head slowly to test tracking range
## Future Improvements
- [ ] Manual calibration button for re-centering
- [ ] Adjustable smoothing factor (UI slider)
- [ ] Face outline overlay for alignment verification
- [ ] Multiple face support
- [ ] Save/load rigging presets

1869
pnpm-lock.yaml generated Normal file

File diff suppressed because it is too large


@ -77,23 +77,25 @@ const App: React.FC = () => {
};
const handleRiggingComplete = (data: {
leftEye: Rect, rightEye: Rect, mouth: Rect, skinColor: string,
textureClosedEye: Rect, textureOpenMouth: Rect,
mainBody: Rect, chromaKeyColor: string
baseFace: Rect;
eyes: { [key: string]: Rect };
mouth: { [key: string]: Rect };
skinColor: string;
riggingReference: { faceCenter: { x: number; y: number }; faceWidth: number; faceHeight: number }
}) => {
if (generatedData) {
setAvatar({
imageUrl: generatedData.url,
name: generatedData.name,
description: '',
leftEye: data.leftEye,
rightEye: data.rightEye,
baseFace: data.baseFace,
eyes: data.eyes,
mouth: data.mouth,
skinColor: data.skinColor,
textureClosedEye: data.textureClosedEye,
textureOpenMouth: data.textureOpenMouth,
mainBody: data.mainBody,
chromaKeyColor: data.chromaKeyColor
chromaKeyColor: 'AI_AUTO',
riggingReference: data.riggingReference,
activeEyeExpression: undefined,
activeMouthExpression: undefined
});
setAppState(AppState.STUDIO);
}


@ -1,9 +1,7 @@
import React, { useState } from 'react';
import { analyzeAvatarImage } from '../services/visionService';
import { stitchAssets, fileToDataUrl } from '../services/imageService';
import { generateAvatarImage } from '../services/geminiService';
import LoadingSpinner from './LoadingSpinner';
import { Rect } from '../../shared/types';
const placeholderGenerate = async (prompt: string) => {
const text = encodeURIComponent((prompt || '').substring(0, 80));
@ -12,8 +10,7 @@ const placeholderGenerate = async (prompt: string) => {
interface AvatarCreatorProps {
onAvatarGenerated: (url: string, name: string, initialData?: {
leftEye?: Rect, rightEye?: Rect, mouth?: Rect, skinColor?: string,
mainBody?: Rect, textureClosedEye?: Rect, textureOpenMouth?: Rect
skinColor?: string
}) => void;
}
@ -22,7 +19,7 @@ const AvatarCreator: React.FC<AvatarCreatorProps> = ({ onAvatarGenerated }) => {
const [prompt, setPrompt] = useState('');
const [name, setName] = useState('');
const [status, setStatus] = useState<'idle' | 'generating' | 'analyzing' | 'stitching'>('idle');
const [status, setStatus] = useState<'idle' | 'generating' | 'stitching'>('idle');
const [error, setError] = useState<string | null>(null);
const [baseFile, setBaseFile] = useState<File | null>(null);
@ -38,14 +35,8 @@ const AvatarCreator: React.FC<AvatarCreatorProps> = ({ onAvatarGenerated }) => {
try {
const imageUrl = await generateAvatarImage(prompt);
setStatus('analyzing');
const analysisData = await analyzeAvatarImage(imageUrl);
if (analysisData) {
onAvatarGenerated(imageUrl, name, analysisData);
} else {
onAvatarGenerated(imageUrl, name);
}
// No automatic analysis - user will rig expressions manually
onAvatarGenerated(imageUrl, name, {});
} catch (err) {
console.error(err);
setError("Failed to generate avatar. Please try again.");
@ -62,58 +53,18 @@ const AvatarCreator: React.FC<AvatarCreatorProps> = ({ onAvatarGenerated }) => {
try {
const baseDataUrl = await fileToDataUrl(baseFile);
const baseAnalysis = await analyzeAvatarImage(baseDataUrl);
let blinkDataUrl, blinkAnalysis;
if (blinkFile) {
blinkDataUrl = await fileToDataUrl(blinkFile);
blinkAnalysis = await analyzeAvatarImage(blinkDataUrl);
// Simple stitch if blink/talk files provided, otherwise just use base
let imageUrl = baseDataUrl;
if (blinkFile || talkFile) {
const blinkDataUrl = blinkFile ? await fileToDataUrl(blinkFile) : undefined;
const talkDataUrl = talkFile ? await fileToDataUrl(talkFile) : undefined;
const result = await stitchAssets(baseDataUrl, blinkDataUrl, talkDataUrl);
imageUrl = result.imageUrl;
}
let talkDataUrl, talkAnalysis;
if (talkFile) {
talkDataUrl = await fileToDataUrl(talkFile);
talkAnalysis = await analyzeAvatarImage(talkDataUrl);
}
const { imageUrl, mainBody, textureClosedEye: stitchBlinkRect, textureOpenMouth: stitchTalkRect } = await stitchAssets(baseDataUrl, blinkDataUrl, talkDataUrl);
const mapRect = (r: Rect, container: Rect) => ({
x: container.x + r.x * container.w,
y: container.y + r.y * container.h,
w: r.w * container.w,
h: r.h * container.h
});
let initialData: any = {
mainBody,
textureClosedEye: stitchBlinkRect,
textureOpenMouth: stitchTalkRect
};
if (baseAnalysis) {
initialData.leftEye = mapRect(baseAnalysis.leftEye, mainBody);
initialData.rightEye = mapRect(baseAnalysis.rightEye, mainBody);
initialData.mouth = mapRect(baseAnalysis.mouth, mainBody);
initialData.skinColor = baseAnalysis.skinColor;
}
if (blinkAnalysis && stitchBlinkRect) {
const be = blinkAnalysis;
const minX = Math.min(be.leftEye.x, be.rightEye.x);
const minY = Math.min(be.leftEye.y, be.rightEye.y);
const maxX = Math.max(be.leftEye.x + be.leftEye.w, be.rightEye.x + be.rightEye.w);
const maxY = Math.max(be.leftEye.y + be.leftEye.h, be.rightEye.y + be.rightEye.h);
const eyesRect = { x: minX, y: minY, w: maxX - minX, h: maxY - minY };
initialData.textureClosedEye = mapRect(eyesRect, stitchBlinkRect);
}
if (talkAnalysis && stitchTalkRect) {
initialData.textureOpenMouth = mapRect(talkAnalysis.mouth, stitchTalkRect);
}
onAvatarGenerated(imageUrl, name, initialData);
onAvatarGenerated(imageUrl, name, {});
} catch (err) {
console.error(err);
setError("Failed to process uploaded images. Please ensure they are valid image files.");
@ -160,8 +111,8 @@ const AvatarCreator: React.FC<AvatarCreatorProps> = ({ onAvatarGenerated }) => {
</h2>
<p className="text-slate-400">
{mode === 'generate'
? 'Describe your dream VTuber model. Gemini will generate a character sheet with expression assets.'
: 'Upload your existing character art. We support separate files for blink and talk variants.'
? 'Describe your dream VTuber model. Gemini will generate a character sheet with multiple expression assets.'
: 'Upload your existing character art. Supports separate files for different expressions.'
}
</p>
</div>
@ -187,23 +138,28 @@ const AvatarCreator: React.FC<AvatarCreatorProps> = ({ onAvatarGenerated }) => {
placeholder="e.g., A cyberpunk anime girl with neon blue hair, glowing headphones, wearing a futuristic jacket..."
className="w-full h-32 bg-slate-900/50 border border-slate-600 rounded-xl px-4 py-3 text-white placeholder-slate-500 focus:ring-2 focus:ring-cyan-500 focus:border-transparent transition-all outline-none resize-none"
/>
<p className="text-xs text-slate-500 mt-2">
💡 The AI will generate a sprite sheet with 6 eye expressions and 6 mouth expressions, plus a blank base face.
</p>
</div>
) : (
<div className="space-y-4">
<div className="p-4 bg-slate-900/50 rounded-xl border border-slate-600 border-dashed">
<label className="block text-sm font-bold text-slate-300 mb-2">Base Model (Required)</label>
<input type="file" accept="image/*" onChange={(e) => handleFileChange(e, setBaseFile)} className="text-sm text-slate-400 file:mr-4 file:py-2 file:px-4 file:rounded-full file:border-0 file:text-sm file:font-semibold file:bg-cyan-500/10 file:text-cyan-400 hover:file:bg-cyan-500/20"/>
<p className="text-xs text-slate-500 mt-1">The main look of your character (Eyes Open, Mouth Closed).</p>
<p className="text-xs text-slate-500 mt-1">The main look of your character with blank face (no features).</p>
</div>
<div className="grid grid-cols-1 md:grid-cols-2 gap-4">
<div className="p-4 bg-slate-900/50 rounded-xl border border-slate-600 border-dashed">
<label className="block text-sm font-bold text-slate-300 mb-2">Closed Eyes (Optional)</label>
<label className="block text-sm font-bold text-slate-300 mb-2">Expression Sheet (Optional)</label>
<input type="file" accept="image/*" onChange={(e) => handleFileChange(e, setBlinkFile)} className="text-sm text-slate-400 file:mr-4 file:py-2 file:px-4 file:rounded-full file:border-0 file:text-sm file:font-semibold file:bg-purple-500/10 file:text-purple-400 hover:file:bg-purple-500/20"/>
<p className="text-xs text-slate-500 mt-1">Sprite sheet with all eye/mouth expressions.</p>
</div>
<div className="p-4 bg-slate-900/50 rounded-xl border border-slate-600 border-dashed">
<label className="block text-sm font-bold text-slate-300 mb-2">Open Mouth (Optional)</label>
<label className="block text-sm font-bold text-slate-300 mb-2">Alternate (Optional)</label>
<input type="file" accept="image/*" onChange={(e) => handleFileChange(e, setTalkFile)} className="text-sm text-slate-400 file:mr-4 file:py-2 file:px-4 file:rounded-full file:border-0 file:text-sm file:font-semibold file:bg-pink-500/10 file:text-pink-400 hover:file:bg-pink-500/20"/>
<p className="text-xs text-slate-500 mt-1">Additional expression variants.</p>
</div>
</div>
</div>
@ -228,9 +184,7 @@ const AvatarCreator: React.FC<AvatarCreatorProps> = ({ onAvatarGenerated }) => {
<div className="flex items-center justify-center gap-3">
<LoadingSpinner />
<span>
{status === 'generating' ? 'Dreaming up Sheet...' :
status === 'stitching' ? 'Processing Assets...' :
'Analyzing Features...'}
{status === 'generating' ? 'Dreaming up Character...' : 'Processing Assets...'}
</span>
</div>
) : (


@ -1,17 +1,41 @@
import React, { useState, useRef, useEffect } from 'react';
import { Rect } from '../../shared/types';
import { Rect, ExpressionType } from '../../shared/types';
interface RiggingEditorProps {
imageUrl: string;
initialData?: { leftEye: Rect; rightEye: Rect; mouth: Rect; skinColor: string };
initialData?: {
baseFace?: Rect;
eyes?: { [key: string]: Rect };
mouth?: { [key: string]: Rect };
skinColor?: string
};
onComplete: (data: {
leftEye: Rect; rightEye: Rect; mouth: Rect; skinColor: string;
textureClosedEye: Rect; textureOpenMouth: Rect;
mainBody: Rect; chromaKeyColor: string;
baseFace: Rect;
eyes: { [key: string]: Rect };
mouth: { [key: string]: Rect };
skinColor: string;
riggingReference: { faceCenter: { x: number; y: number }; faceWidth: number; faceHeight: number };
}) => void;
}
type ActiveFeature = 'leftEye' | 'rightEye' | 'mouth' | 'textureClosedEye' | 'textureOpenMouth' | 'mainBody' | null;
type ActiveFeature =
| 'baseFace'
| `eye-${ExpressionType}`
| `mouth-${ExpressionType}`
| null;
const EXPRESSION_LABELS: Record<string, string> = {
[ExpressionType.NEUTRAL]: 'Neutral',
[ExpressionType.HAPPY]: 'Happy',
[ExpressionType.SURPRISED]: 'Surprised',
[ExpressionType.ANGRY]: 'Angry',
[ExpressionType.SAD]: 'Sad',
[ExpressionType.BLINK]: 'Blink',
[ExpressionType.OPEN_TALK]: 'Talk',
[ExpressionType.WIDE_OPEN]: 'Wide',
[ExpressionType.FROWN]: 'Frown',
[ExpressionType.O_SHAPE]: 'O-Shape',
};
const ResizableBox: React.FC<{
rect: Rect;
@ -20,7 +44,8 @@ const ResizableBox: React.FC<{
isActive: boolean;
onUpdate: (rect: Rect) => void;
onActivate: () => void;
}> = ({ rect, color, label, isActive, onUpdate, onActivate }) => {
visible?: boolean;
}> = ({ rect, color, label, isActive, onUpdate, onActivate, visible = true }) => {
const boxRef = useRef<HTMLDivElement>(null);
const [isDragging, setIsDragging] = useState(false);
const [isResizing, setIsResizing] = useState(false);
@ -87,6 +112,8 @@ const ResizableBox: React.FC<{
};
}, [isDragging, isResizing, rect, onUpdate]);
if (!visible) return null;
return (
<div
ref={boxRef}
@ -118,28 +145,86 @@ const ResizableBox: React.FC<{
};
const RiggingEditor: React.FC<RiggingEditorProps> = ({ imageUrl, initialData, onComplete }) => {
const [leftEye, setLeftEye] = useState<Rect>(initialData?.leftEye || { x: 0.25, y: 0.4, w: 0.1, h: 0.1 });
const [rightEye, setRightEye] = useState<Rect>(initialData?.rightEye || { x: 0.45, y: 0.4, w: 0.1, h: 0.1 });
const [mouth, setMouth] = useState<Rect>(initialData?.mouth || { x: 0.35, y: 0.55, w: 0.1, h: 0.05 });
// Base face (blank character)
const [baseFace, setBaseFace] = useState<Rect>(initialData?.baseFace || { x: 0.05, y: 0.05, w: 0.65, h: 0.9 });
const [mainBody, setMainBody] = useState<Rect>({ x: 0.05, y: 0.05, w: 0.65, h: 0.9 });
// Eye expressions - initialize with defaults
const defaultEyeRect: Rect = { x: 0.7, y: 0.05, w: 0.25, h: 0.15 };
const [eyes, setEyes] = useState<{ [key: string]: Rect }>({
[ExpressionType.NEUTRAL]: initialData?.eyes?.[ExpressionType.NEUTRAL] || { ...defaultEyeRect, y: 0.05 },
[ExpressionType.HAPPY]: initialData?.eyes?.[ExpressionType.HAPPY] || { ...defaultEyeRect, y: 0.22 },
[ExpressionType.SURPRISED]: initialData?.eyes?.[ExpressionType.SURPRISED] || { ...defaultEyeRect, y: 0.39 },
[ExpressionType.ANGRY]: initialData?.eyes?.[ExpressionType.ANGRY] || { ...defaultEyeRect, y: 0.56 },
[ExpressionType.SAD]: initialData?.eyes?.[ExpressionType.SAD] || { ...defaultEyeRect, y: 0.73 },
[ExpressionType.BLINK]: initialData?.eyes?.[ExpressionType.BLINK] || { ...defaultEyeRect, y: 0.90 },
});
const [textureClosedEye, setTextureClosedEye] = useState<Rect>({ x: 0.7, y: 0.1, w: 0.2, h: 0.2 });
const [textureOpenMouth, setTextureOpenMouth] = useState<Rect>({ x: 0.7, y: 0.5, w: 0.2, h: 0.2 });
// Mouth expressions - initialize with defaults
const defaultMouthRect: Rect = { x: 0.7, y: 0.05, w: 0.25, h: 0.15 };
const [mouth, setMouth] = useState<{ [key: string]: Rect }>({
[ExpressionType.NEUTRAL]: initialData?.mouth?.[ExpressionType.NEUTRAL] || { ...defaultMouthRect, y: 0.05 },
[ExpressionType.HAPPY]: initialData?.mouth?.[ExpressionType.HAPPY] || { ...defaultMouthRect, y: 0.22 },
[ExpressionType.OPEN_TALK]: initialData?.mouth?.[ExpressionType.OPEN_TALK] || { ...defaultMouthRect, y: 0.39 },
[ExpressionType.WIDE_OPEN]: initialData?.mouth?.[ExpressionType.WIDE_OPEN] || { ...defaultMouthRect, y: 0.56 },
[ExpressionType.FROWN]: initialData?.mouth?.[ExpressionType.FROWN] || { ...defaultMouthRect, y: 0.73 },
[ExpressionType.O_SHAPE]: initialData?.mouth?.[ExpressionType.O_SHAPE] || { ...defaultMouthRect, y: 0.90 },
});
const [skinColor, setSkinColor] = useState<string>(initialData?.skinColor || '#fcd3bf');
const [useAiBackground, setUseAiBackground] = useState<boolean>(true);
const [activeFeature, setActiveFeature] = useState<ActiveFeature>(null);
const [activeTab, setActiveTab] = useState<'eyes' | 'mouth'>('eyes');
const [visibleExpression, setVisibleExpression] = useState<ExpressionType>(ExpressionType.NEUTRAL);
// Calculate face reference points for mapping rigging to tracking
const calculateRiggingReference = () => {
// Use the neutral eye expression as reference
const neutralEyes = eyes[ExpressionType.NEUTRAL];
if (!neutralEyes) {
return {
faceCenter: { x: 0.5, y: 0.5 },
faceWidth: 0.3,
faceHeight: 0.4
};
}
// Assume eyes rect contains both eyes side by side
const faceCenterX = neutralEyes.x + neutralEyes.w / 2;
const faceCenterY = neutralEyes.y + neutralEyes.h / 2;
// Face width is approximately 2.5x the eye width
const faceWidth = neutralEyes.w * 2.5;
// Face height from brow to chin
const faceHeight = neutralEyes.h * 3.5;
return {
faceCenter: { x: faceCenterX, y: faceCenterY },
faceWidth,
faceHeight
};
};
const updateEye = (type: ExpressionType, rect: Rect) => {
setEyes(prev => ({ ...prev, [type]: rect }));
};
const updateMouth = (type: ExpressionType, rect: Rect) => {
setMouth(prev => ({ ...prev, [type]: rect }));
};
const getExpressionColor = (type: ExpressionType, index: number) => {
const colors = ['#ef4444', '#3b82f6', '#22c55e', '#f59e0b', '#8b5cf6', '#ec4899'];
return colors[index % colors.length];
};
return (
<div className="flex flex-col items-center h-full max-w-6xl mx-auto p-4">
<div className="text-center mb-6">
<h2 className="text-2xl font-bold text-white mb-2">Rig Your Character</h2>
<p className="text-slate-400 text-sm">
1. Adjust the <b>Main Body</b> (Yellow) to frame your character.<br/>
2. Match the <b>Targets</b> (Red/Blue/Green) to the face features.<br/>
3. Match the <b>Sources</b> (Purple/Orange) to the assets on the right.
1. Adjust the <b>Base Face</b> (Yellow) to frame your blank character.<br/>
2. For each expression, match the colored box to the corresponding asset on the right.<br/>
3. Use tabs to switch between eye and mouth expressions.
</p>
</div>
@ -154,60 +239,95 @@ const RiggingEditor: React.FC<RiggingEditorProps> = ({ imageUrl, initialData, on
/>
<div className="absolute inset-0 w-full h-full">
{/* Base Face */}
<ResizableBox
rect={mainBody} color="#facc15" label="Main Body"
isActive={activeFeature === 'mainBody'}
onUpdate={setMainBody} onActivate={() => setActiveFeature('mainBody')}
rect={baseFace}
color="#facc15"
label="Base Face (Blank)"
isActive={activeFeature === 'baseFace'}
onUpdate={setBaseFace}
onActivate={() => setActiveFeature('baseFace')}
/>
{/* Eye expressions - only show active one on main image for clarity */}
{Object.entries(eyes).map(([type, rect], index) => (
<ResizableBox
rect={leftEye} color="#ef4444" label="Left Eye Target"
isActive={activeFeature === 'leftEye'}
onUpdate={setLeftEye} onActivate={() => setActiveFeature('leftEye')}
/>
<ResizableBox
rect={rightEye} color="#3b82f6" label="Right Eye Target"
isActive={activeFeature === 'rightEye'}
onUpdate={setRightEye} onActivate={() => setActiveFeature('rightEye')}
/>
<ResizableBox
rect={mouth} color="#22c55e" label="Mouth Target"
isActive={activeFeature === 'mouth'}
onUpdate={setMouth} onActivate={() => setActiveFeature('mouth')}
key={`eye-${type}`}
rect={rect}
color={getExpressionColor(type as ExpressionType, index)}
label={`Eye: ${EXPRESSION_LABELS[type]}`}
isActive={activeFeature === `eye-${type}`}
onUpdate={(r) => updateEye(type as ExpressionType, r)}
onActivate={() => setActiveFeature(`eye-${type}`)}
visible={activeTab === 'eyes'}
/>
))}
{/* Mouth expressions */}
{Object.entries(mouth).map(([type, rect], index) => (
<ResizableBox
rect={textureClosedEye} color="#a855f7" label="Source: Closed Eyes"
isActive={activeFeature === 'textureClosedEye'}
onUpdate={setTextureClosedEye} onActivate={() => setActiveFeature('textureClosedEye')}
/>
<ResizableBox
rect={textureOpenMouth} color="#f97316" label="Source: Open Mouth"
isActive={activeFeature === 'textureOpenMouth'}
onUpdate={setTextureOpenMouth} onActivate={() => setActiveFeature('textureOpenMouth')}
key={`mouth-${type}`}
rect={rect}
color={getExpressionColor(type as ExpressionType, index)}
label={`Mouth: ${EXPRESSION_LABELS[type]}`}
isActive={activeFeature === `mouth-${type}`}
onUpdate={(r) => updateMouth(type as ExpressionType, r)}
onActivate={() => setActiveFeature(`mouth-${type}`)}
visible={activeTab === 'mouth'}
/>
))}
</div>
</div>
</div>
<div className="w-72 flex flex-col gap-4 bg-slate-800/50 p-6 rounded-xl border border-slate-700 h-full overflow-y-auto">
<div className="w-80 flex flex-col gap-4 bg-slate-800/50 p-6 rounded-xl border border-slate-700 h-full overflow-y-auto">
{/* Expression type tabs */}
<div className="flex gap-2">
<button
onClick={() => setActiveTab('eyes')}
className={`flex-1 py-2 rounded-lg font-bold text-sm transition-colors ${
activeTab === 'eyes'
? 'bg-cyan-500 text-white'
: 'bg-slate-700 text-slate-300 hover:bg-slate-600'
}`}
>
👁 Eyes
</button>
<button
onClick={() => setActiveTab('mouth')}
className={`flex-1 py-2 rounded-lg font-bold text-sm transition-colors ${
activeTab === 'mouth'
? 'bg-pink-500 text-white'
: 'bg-slate-700 text-slate-300 hover:bg-slate-600'
}`}
>
👄 Mouth
</button>
</div>
{/* Expression selector */}
<div className="space-y-2">
<div className="text-xs font-bold text-slate-400 uppercase border-b border-slate-700 pb-1">
{activeTab === 'eyes' ? 'Eye' : 'Mouth'} Expressions
</div>
{(activeTab === 'eyes' ? Object.keys(eyes) : Object.keys(mouth)).map((type) => (
<button
key={type}
onClick={() => setVisibleExpression(type as ExpressionType)}
className={`w-full text-left px-3 py-2 rounded-lg text-sm transition-colors ${
visibleExpression === type
? 'bg-slate-600 text-white border-l-4 border-cyan-400'
: 'text-slate-400 hover:bg-slate-700 hover:text-white'
}`}
>
{EXPRESSION_LABELS[type]}
</button>
))}
</div>
{/* Color picker */}
<div className="bg-slate-900/50 p-4 rounded-lg space-y-3">
<div>
<label className="block text-xs font-bold text-slate-400 mb-2 uppercase">Background Removal</label>
<div className="flex items-center justify-between p-2 bg-slate-800 rounded-lg border border-slate-700">
<span className="text-xs text-slate-300">AI Magic Removal</span>
<label className="relative inline-flex items-center cursor-pointer">
<input
type="checkbox"
className="sr-only peer"
checked={useAiBackground}
onChange={(e) => setUseAiBackground(e.target.checked)}
/>
<div className="w-9 h-5 bg-slate-600 peer-focus:outline-none rounded-full peer peer-checked:after:translate-x-full peer-checked:after:border-white after:content-[''] after:absolute after:top-[2px] after:left-[2px] after:bg-white after:border-gray-300 after:border after:rounded-full after:h-4 after:w-4 after:transition-all peer-checked:bg-cyan-500"></div>
</label>
</div>
</div>
<div>
<label className="block text-xs font-bold text-slate-400 mb-1 uppercase">Eyelid Skin Color</label>
<div className="flex items-center gap-3">
@ -222,39 +342,29 @@ const RiggingEditor: React.FC<RiggingEditorProps> = ({ imageUrl, initialData, on
</div>
</div>
<div className="space-y-3 flex-1">
<div className="text-xs font-bold text-slate-400 uppercase border-b border-slate-700 pb-1">Composition</div>
<div className="flex items-center gap-2 text-sm text-slate-300 cursor-pointer hover:text-white" onClick={() => setActiveFeature('mainBody')}>
<div className="w-3 h-3 bg-yellow-400 rounded-full shadow"></div> Main Body Crop
</div>
<div className="text-xs font-bold text-slate-400 uppercase border-b border-slate-700 pb-1 mt-4">Targets (Main Face)</div>
<div className="flex items-center gap-2 text-sm text-slate-300 cursor-pointer hover:text-white" onClick={() => setActiveFeature('leftEye')}>
<div className="w-3 h-3 bg-red-500 rounded-full shadow"></div> Left Eye
</div>
<div className="flex items-center gap-2 text-sm text-slate-300 cursor-pointer hover:text-white" onClick={() => setActiveFeature('rightEye')}>
<div className="w-3 h-3 bg-blue-500 rounded-full shadow"></div> Right Eye
</div>
<div className="flex items-center gap-2 text-sm text-slate-300 cursor-pointer hover:text-white" onClick={() => setActiveFeature('mouth')}>
<div className="w-3 h-3 bg-green-500 rounded-full shadow"></div> Mouth
</div>
<div className="text-xs font-bold text-slate-400 uppercase border-b border-slate-700 pb-1 mt-4">Sources (Right Side)</div>
<div className="flex items-center gap-2 text-sm text-slate-300 cursor-pointer hover:text-white" onClick={() => setActiveFeature('textureClosedEye')}>
<div className="w-3 h-3 bg-purple-500 rounded-full shadow"></div> Closed Eye Texture
</div>
<div className="flex items-center gap-2 text-sm text-slate-300 cursor-pointer hover:text-white" onClick={() => setActiveFeature('textureOpenMouth')}>
<div className="w-3 h-3 bg-orange-500 rounded-full shadow"></div> Open Mouth Texture
</div>
{/* Instructions */}
<div className="bg-slate-900/30 p-4 rounded-lg">
<div className="text-xs font-bold text-slate-400 mb-2">TIPS:</div>
<ul className="text-xs text-slate-500 space-y-1">
<li> Click expression name to preview</li>
<li> Drag boxes to position</li>
<li> Use corner handle to resize</li>
<li> Match boxes to the expression assets in rows 2 and 3 of the sprite sheet</li>
</ul>
</div>
<div className="mt-4">
<button
onClick={() => onComplete({
leftEye, rightEye, mouth, skinColor,
textureClosedEye, textureOpenMouth, mainBody,
chromaKeyColor: useAiBackground ? 'AI_AUTO' : ''
})}
onClick={() => {
const riggingReference = calculateRiggingReference();
onComplete({
baseFace,
eyes,
mouth,
skinColor,
riggingReference
});
}}
className="w-full py-4 bg-gradient-to-r from-cyan-500 to-blue-600 hover:from-cyan-400 hover:to-blue-500 text-white rounded-xl font-bold shadow-lg shadow-cyan-500/25 transform hover:scale-[1.02] transition-all"
>
Finish Rigging
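Review note: the editor asks the user to drag one box per expression onto the sheet. Since the prompt requests six equally spaced variants per row, the starting rects could be seeded from the grid so that only small adjustments are needed. The sketch below is a minimal illustration, not code from this commit: `splitRow`, `EYE_ORDER`, and the band position (`y: 0.5`, `h: 0.25`) are assumptions, and the generated image may not follow those proportions.

```typescript
// Normalized rectangle (0-1 of the sprite sheet), mirroring the x/y/w/h fields used in the diff.
interface Rect { x: number; y: number; w: number; h: number; }

// Split one horizontal band of the sheet into `count` equal-width cells, left to right.
function splitRow(row: Rect, count: number): Rect[] {
  const cellWidth = row.w / count;
  return Array.from({ length: count }, (_, i) => ({
    x: row.x + i * cellWidth,
    y: row.y,
    w: cellWidth,
    h: row.h,
  }));
}

// Order of the eye variants as listed for ROW 2 in the prompt.
const EYE_ORDER = ['NEUTRAL', 'HAPPY', 'SURPRISED', 'ANGRY', 'SAD', 'BLINK'] as const;

// Assumption: the eye row occupies the band from 50% to 75% of the sheet height.
const eyeCells = splitRow({ x: 0, y: 0.5, w: 1, h: 0.25 }, EYE_ORDER.length);

// Map each expression name to its seeded cell.
const defaultEyes: Record<string, Rect> = {};
EYE_ORDER.forEach((name, i) => {
  defaultEyes[name] = eyeCells[i];
});
```

The same helper would cover the mouth row with a different band and name list.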


@ -1,7 +1,7 @@
import React, { useEffect, useRef, useState } from 'react';
import { useFaceTracking } from '../hooks/useFaceTracking';
import { removeBackground } from '../services/visionService';
import { AvatarConfig, Rect } from '../../shared/types';
import { AvatarConfig, Rect, ExpressionType } from '../../shared/types';
import LoadingSpinner from './LoadingSpinner';
interface StudioProps {
@ -45,9 +45,46 @@ const Studio: React.FC<StudioProps> = ({ avatar, onBack }) => {
const videoRef = useRef<HTMLVideoElement>(null);
const [cameraReady, setCameraReady] = useState(false);
const [processedImageUrl, setProcessedImageUrl] = useState<string | null>(null);
const containerRef = useRef<HTMLDivElement>(null);
const [avatarPosition, setAvatarPosition] = useState({ x: 0, y: 0, scale: 1 });
const [calibrated, setCalibrated] = useState(false);
const [calibrationOffset, setCalibrationOffset] = useState({ x: 0, y: 0 });
const { trackingData, isLoading: isModelLoading, startTracking } = useFaceTracking(videoRef.current);
// Determine current expression based on tracking data
const getCurrentEyeExpression = (): ExpressionType => {
if (trackingData.isBlinkingLeft || trackingData.isBlinkingRight) {
return ExpressionType.BLINK;
}
// Map mouth openness to eye expression intensity
if (trackingData.mouthOpen > 0.7) {
return ExpressionType.SURPRISED;
}
// Default to neutral - could be enhanced with emotion detection
return ExpressionType.NEUTRAL;
};
const getCurrentMouthExpression = (): ExpressionType => {
const mouthOpen = trackingData.mouthOpen;
if (mouthOpen < 0.1) {
// Mouth closed - choose based on context (could add more logic)
return ExpressionType.NEUTRAL;
} else if (mouthOpen < 0.3) {
return ExpressionType.OPEN_TALK;
} else {
// 0.3 and above: the 0.3-0.6 band and anything wider both map to WIDE_OPEN
return ExpressionType.WIDE_OPEN;
}
};
const currentEyeExpression = getCurrentEyeExpression();
const currentMouthExpression = getCurrentMouthExpression();
useEffect(() => {
const startCamera = async () => {
try {
@ -99,31 +136,98 @@ const Studio: React.FC<StudioProps> = ({ avatar, onBack }) => {
}
}, [cameraReady, isModelLoading, startTracking]);
// Auto-calibrate on first face detection
useEffect(() => {
if (!calibrated && cameraReady && !isModelLoading && trackingData) {
// Wait for stable tracking
const timeout = setTimeout(() => {
// Store initial position as calibration offset
setCalibrationOffset({
x: trackingData.translationX,
y: trackingData.translationY
});
setCalibrated(true);
if ((window as any).electronLog) (window as any).electronLog.info('Face tracking calibrated');
}, 1000);
return () => clearTimeout(timeout);
}
}, [calibrated, cameraReady, isModelLoading, trackingData]);
// Calculate avatar position and scale based on face tracking
useEffect(() => {
if (!containerRef.current || !avatar.riggingReference || !calibrated) return;
const container = containerRef.current;
const containerRect = container.getBoundingClientRect();
// Use tracking translation relative to calibration offset
const relX = trackingData.translationX - calibrationOffset.x;
const relY = trackingData.translationY - calibrationOffset.y;
// Scale movement to container size (smoother, smaller movements)
const faceX = relX * containerRect.width * 0.25;
const faceY = relY * containerRect.height * 0.25;
// Rough depth cue derived from the vertical offset (relY); TrackingData carries no face-size value
const baseScale = 1 + relY * 0.15;
setAvatarPosition({
x: faceX,
y: faceY,
scale: baseScale
});
}, [trackingData.translationX, trackingData.translationY, avatar.riggingReference, calibrated, calibrationOffset]);
// Calculate feature position based on rigging reference and tracking
const calculateFeaturePosition = (featureRect: Rect, featureType: 'eye' | 'mouth') => {
if (!avatar.riggingReference || !featureRect) return { x: 0, y: 0 };
const { faceCenter, faceWidth, faceHeight } = avatar.riggingReference;
// Calculate feature position relative to face center in rigging space
const featureCenterX = featureRect.x + featureRect.w / 2;
const featureCenterY = featureRect.y + featureRect.h / 2;
const relX = featureCenterX - faceCenter.x;
const relY = featureCenterY - faceCenter.y;
// Scale relative positions by face width/height to match tracking scale
const scaledX = relX * faceWidth * avatarPosition.scale;
const scaledY = relY * faceHeight * avatarPosition.scale;
return {
x: scaledX,
y: scaledY
};
};
const getAvatarStyle = () => {
const smooth = (val: number) => Math.abs(val) < 0.02 ? 0 : val;
const rX = smooth(trackingData.rotationX);
const rY = smooth(trackingData.rotationY);
const rZ = smooth(trackingData.rotationZ);
const tX = smooth(trackingData.translationX);
const tY = smooth(trackingData.translationY);
const bounce = trackingData.mouthOpen > 0.1 ? -5 * trackingData.mouthOpen : 0;
// Apply rotation transforms
const rotation = `rotate(${rZ * 1}rad) perspective(500px) rotateX(${rX * 15}deg) rotateY(${rY * -25}deg)`;
// Apply translation from avatarPosition (which includes tracking)
const translation = `translate(${avatarPosition.x}px, ${avatarPosition.y}px)`;
// Apply scale
const scale = `scale(${avatarPosition.scale})`;
return {
transform: `
translate(${tX * 150}px, ${tY * 100 + bounce}px)
rotate(${rZ * 1}rad)
perspective(500px)
rotateX(${rX * 15}deg)
rotateY(${rY * -25}deg)
scale(${1 + trackingData.mouthOpen * 0.02})
`,
transform: `${translation} ${rotation} ${scale}`,
filter: `brightness(${1 + trackingData.mouthOpen * 0.05})`,
transition: 'transform 0.1s ease-out, filter 0.1s ease'
transition: 'transform 0.08s ease-out, filter 0.05s ease'
} as React.CSSProperties;
};
// Get current expression rects
const currentEyeRect = avatar.eyes?.[currentEyeExpression];
const currentMouthRect = avatar.mouth?.[currentMouthExpression];
return (
<div className="h-screen w-full flex flex-col bg-slate-900 overflow-hidden relative">
<video
@ -149,6 +253,15 @@ const Studio: React.FC<StudioProps> = ({ avatar, onBack }) => {
<div className="px-3 py-1 rounded-full text-xs font-bold bg-purple-500/20 text-purple-400 border border-purple-500/30">
{avatar.name}
</div>
{!calibrated && cameraReady && (
<div className="px-3 py-1 rounded-full text-xs font-bold bg-cyan-500/20 text-cyan-400 border border-cyan-500/30 animate-pulse">
Calibrating...
</div>
)}
<div className="px-3 py-1 rounded-full text-xs font-bold bg-slate-700/50 text-slate-300 border border-slate-600 flex gap-2">
<span>👁 {currentEyeExpression}</span>
<span>👄 {currentMouthExpression}</span>
</div>
</div>
</div>
@ -162,7 +275,7 @@ const Studio: React.FC<StudioProps> = ({ avatar, onBack }) => {
<div className="absolute inset-0 bg-gradient-to-t from-slate-900 via-transparent to-slate-900 pointer-events-none"></div>
<div className="relative w-[600px] h-[600px] flex items-center justify-center z-10">
<div ref={containerRef} className="relative w-[600px] h-[600px] flex items-center justify-center z-10">
{!processedImageUrl ? (
<div className="flex flex-col items-center justify-center gap-4">
<LoadingSpinner />
@ -173,10 +286,11 @@ const Studio: React.FC<StudioProps> = ({ avatar, onBack }) => {
className="relative w-full h-full flex items-center justify-center"
style={getAvatarStyle()}
>
{avatar.mainBody ? (
{/* Base blank face */}
{avatar.baseFace ? (
<Sprite
imageSrc={processedImageUrl}
sourceRect={avatar.mainBody}
sourceRect={avatar.baseFace}
className="w-full h-full object-contain drop-shadow-[0_0_15px_rgba(168,85,247,0.5)]"
/>
) : (
@ -187,64 +301,59 @@ const Studio: React.FC<StudioProps> = ({ avatar, onBack }) => {
/>
)}
{avatar.leftEye && avatar.textureClosedEye && (
<Sprite
imageSrc={processedImageUrl}
sourceRect={avatar.textureClosedEye}
{/* Current eye expression */}
{currentEyeRect && (
<div
className="absolute pointer-events-none z-20"
style={{
left: `${avatar.leftEye.x * 100}%`,
top: `${avatar.leftEye.y * 100}%`,
width: `${avatar.leftEye.w * 100}%`,
height: `${avatar.leftEye.h * 100}%`,
opacity: trackingData.isBlinkingLeft ? 1 : 0,
left: `calc(50% + ${calculateFeaturePosition(currentEyeRect, 'eye').x}px - ${currentEyeRect.w * 100 / 2}%)`,
top: `calc(50% + ${calculateFeaturePosition(currentEyeRect, 'eye').y}px - ${currentEyeRect.h * 100 / 2}%)`,
width: `${currentEyeRect.w * 100}%`,
height: `${currentEyeRect.h * 100}%`,
}}
>
<Sprite
imageSrc={processedImageUrl}
sourceRect={currentEyeRect}
className="w-full h-full"
style={{
transition: 'opacity 0.05s linear',
}}
/>
</div>
)}
{avatar.rightEye && avatar.textureClosedEye && (
<Sprite
imageSrc={processedImageUrl}
sourceRect={avatar.textureClosedEye}
className="absolute pointer-events-none z-20"
style={{
left: `${avatar.rightEye.x * 100}%`,
top: `${avatar.rightEye.y * 100}%`,
width: `${avatar.rightEye.w * 100}%`,
height: `${avatar.rightEye.h * 100}%`,
opacity: trackingData.isBlinkingRight ? 1 : 0,
transition: 'opacity 0.05s linear',
}}
/>
)}
{avatar.mouth && avatar.textureOpenMouth && (
{/* Current mouth expression */}
{currentMouthRect && (
<div
className="absolute pointer-events-none flex items-center justify-center z-10"
style={{
left: `${avatar.mouth.x * 100}%`,
top: `${avatar.mouth.y * 100}%`,
width: `${avatar.mouth.w * 100}%`,
height: `${avatar.mouth.h * 100}%`,
left: `calc(50% + ${calculateFeaturePosition(currentMouthRect, 'mouth').x}px)`,
top: `calc(50% + ${calculateFeaturePosition(currentMouthRect, 'mouth').y}px)`,
width: `${currentMouthRect.w * 100}%`,
height: `${currentMouthRect.h * 100}%`,
transform: 'translate(-50%, -50%)'
}}
>
{/* Skin-colored backing for open mouth */}
{trackingData.mouthOpen > 0.1 && (
<div
className="absolute w-[120%] h-[120%] transition-opacity duration-75"
style={{
backgroundColor: avatar.skinColor || '#fcd3bf',
opacity: trackingData.mouthOpen > 0.1 ? 1 : 0,
opacity: 1,
filter: 'blur(4px)',
borderRadius: '50%'
}}
/>
)}
<Sprite
imageSrc={processedImageUrl}
sourceRect={avatar.textureOpenMouth}
sourceRect={currentMouthRect}
className="w-full h-full"
style={{
opacity: trackingData.mouthOpen > 0.05 ? 1 : 0,
opacity: trackingData.mouthOpen > 0.05 ? 1 : 0.3,
transform: `scaleY(${0.8 + trackingData.mouthOpen * 0.5})`,
}}
/>
@ -291,6 +400,14 @@ const Studio: React.FC<StudioProps> = ({ avatar, onBack }) => {
<div className={`w-8 h-2 rounded-full ${trackingData.isBlinkingRight ? 'bg-pink-500' : 'bg-slate-700'}`}></div>
</div>
</div>
<div className="flex flex-col items-center">
<span className="text-xs text-slate-400 mb-1 font-mono">EXPRESSION</span>
<div className="flex gap-2 text-xs">
<span className="px-2 py-1 bg-cyan-500/20 text-cyan-400 rounded">{currentEyeExpression}</span>
<span className="px-2 py-1 bg-pink-500/20 text-pink-400 rounded">{currentMouthExpression}</span>
</div>
</div>
</div>
</div>
);
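Review note: the auto-calibration effect lists `trackingData` in its dependency array and clears its timeout in the cleanup, so frequent tracking updates may keep restarting the one-second wait. One possible alternative is to latch the offset from the sample stream itself. The sketch below assumes that approach; `createCalibrator` and `relativePosition` are hypothetical names, and the `0.25` gain simply mirrors the factor used in the position effect.

```typescript
interface Offset { x: number; y: number; }

// Latches a calibration offset once samples have been arriving for `delayMs`.
// The start time is recorded on the first sample only, so later samples do not restart the wait.
function createCalibrator(delayMs: number) {
  let startedAt: number | null = null;
  let offset: Offset | null = null;

  return {
    // Feed one tracking sample; returns the latched offset, or null while still waiting.
    update(nowMs: number, x: number, y: number): Offset | null {
      if (offset) return offset;
      if (startedAt === null) startedAt = nowMs;
      if (nowMs - startedAt >= delayMs) offset = { x, y };
      return offset;
    },
  };
}

// Position relative to the calibrated pose, scaled into container pixels.
function relativePosition(
  x: number,
  y: number,
  offset: Offset,
  width: number,
  height: number,
  gain = 0.25,
): Offset {
  return {
    x: (x - offset.x) * width * gain,
    y: (y - offset.y) * height * gain,
  };
}
```

In a component, the calibrator could live in a `useRef` and be fed from the tracking loop, which keeps the delay independent of re-renders.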


@ -8,6 +8,15 @@ export const useFaceTracking = (videoElement: HTMLVideoElement | null) => {
const faceLandmarkerRef = useRef<FaceLandmarker | null>(null);
const requestRef = useRef<number | null>(null);
const lastVideoTimeRef = useRef<number>(-1);
// Smoothing refs for exponential moving average
const smoothingFactor = 0.15; // Lower = smoother but more lag
const prevDataRef = useRef<TrackingData>({
rotationX: 0, rotationY: 0, rotationZ: 0,
translationX: 0, translationY: 0, mouthOpen: 0,
isBlinkingLeft: false, isBlinkingRight: false
});
const [trackingData, setTrackingData] = useState<TrackingData>({
rotationX: 0,
rotationY: 0,
@ -94,16 +103,36 @@ export const useFaceTracking = (videoElement: HTMLVideoElement | null) => {
const transX = (nose.x - 0.5) * 2;
const transY = (nose.y - 0.5) * 2;
setTrackingData({
// Apply exponential smoothing to continuous values
const smooth = (current: number, target: number) => {
return current + (target - current) * smoothingFactor;
};
const newData: TrackingData = {
rotationZ: roll,
rotationY: yaw,
rotationX: pitch,
translationX: transX,
translationY: transY,
mouthOpen,
isBlinkingLeft: eyeBlinkLeft > 0.5,
isBlinkingRight: eyeBlinkRight > 0.5
});
isBlinkingLeft: eyeBlinkLeft > 0.6, // Higher threshold for more reliable blink detection
isBlinkingRight: eyeBlinkRight > 0.6
};
// Smooth the data
const smoothedData = {
rotationX: smooth(prevDataRef.current.rotationX, newData.rotationX),
rotationY: smooth(prevDataRef.current.rotationY, newData.rotationY),
rotationZ: smooth(prevDataRef.current.rotationZ, newData.rotationZ),
translationX: smooth(prevDataRef.current.translationX, newData.translationX),
translationY: smooth(prevDataRef.current.translationY, newData.translationY),
mouthOpen: smooth(prevDataRef.current.mouthOpen, newData.mouthOpen),
isBlinkingLeft: newData.isBlinkingLeft,
isBlinkingRight: newData.isBlinkingRight
};
prevDataRef.current = smoothedData; // carry the smoothed state forward so the moving average accumulates
setTrackingData(smoothedData);
}
}
requestRef.current = requestAnimationFrame(predict);
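Review note: an exponential moving average feeds each smoothed output back in as the next "previous" value, which is what lets it damp jitter across many frames. The sketch below isolates that update rule from the hook; `emaStep` and `smoothSeries` are illustrative names, not functions from this repository.

```typescript
// One step of an exponential moving average: move `previous` toward `target` by `alpha`.
function emaStep(previous: number, target: number, alpha: number): number {
  return previous + (target - previous) * alpha;
}

// Smooth a stream of raw samples, carrying the smoothed value forward between frames.
function smoothSeries(samples: number[], alpha: number, initial = 0): number[] {
  let state = initial;
  return samples.map((sample) => {
    state = emaStep(state, sample, alpha); // the smoothed value, not the raw sample, becomes the new state
    return state;
  });
}
```

With a constant input the output approaches that input geometrically; a lower `alpha` gives a smoother but slower response, matching the comment on `smoothingFactor`.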


@ -40,18 +40,38 @@ export const generateAvatarImage = async (description: string): Promise<string>
const prompt = `
Create a VTuber character sheet with a flat 2D anime style.
LAYOUT:
1. MAIN CHARACTER (Left side, takes up 70% of width):
- Front-facing view, head and shoulders.
- Neutral expression, eyes open, mouth closed.
IMPORTANT: The base character face should be BLANK - NO eyes, NO mouth, NO eyebrows. Just the face outline, hair, and body. We will overlay separate expression assets.
2. EXPRESSION ASSETS (Right side, vertical column):
- Top: The same character's face with EYES CLOSED (for blinking).
- Bottom: The same character's face with MOUTH OPEN (for talking).
LAYOUT (grid format on white or green background):
Character Description: ${description}
ROW 1 - BASE CHARACTER (full width):
- Front-facing view, head and shoulders, BLANK FACE (no facial features)
Style: Vibrant, clean lines, solid white or green background for easy keying.
ROW 2 - EYE EXPRESSIONS (6 variants, equal spacing):
1. NEUTRAL: Normal open eyes, relaxed
2. HAPPY: Eyes curved upward in smile shape, slightly closed
3. SURPRISED: Wide open, circular eyes with visible iris
4. ANGRY: Eyebrows angled down, eyes narrowed with sharp shape
5. SAD: Eyebrows angled up, eyes looking downward, slightly droopy
6. BLINK: Both eyes fully closed (curved lines)
ROW 3 - MOUTH EXPRESSIONS (6 variants, equal spacing):
1. NEUTRAL: Small closed mouth, straight line
2. SMILE: Closed mouth curved upward
3. OPEN TALK: Medium open mouth for vowel sounds
4. WIDE OPEN: Large open mouth for shouting/laughing
5. FROWN: Mouth curved downward
6. O-SHAPE: Small circular open mouth
CHARACTER DESCRIPTION: ${description}
STYLE REQUIREMENTS:
- All expressions should be the SAME SIZE for easy swapping
- Clean lines, solid colors, no shading on expressions
- Expressions should be on transparent or solid color background
- Eyes should include eyebrows as part of the eye asset
- Consistent art style across all variants
- High contrast for easy extraction
`;
const response = await ai.models.generateContent({


@ -1,3 +1,16 @@
export enum ExpressionType {
NEUTRAL = 'NEUTRAL',
HAPPY = 'HAPPY',
SURPRISED = 'SURPRISED',
ANGRY = 'ANGRY',
SAD = 'SAD',
BLINK = 'BLINK',
OPEN_TALK = 'OPEN_TALK',
WIDE_OPEN = 'WIDE_OPEN',
FROWN = 'FROWN',
O_SHAPE = 'O_SHAPE',
}
export enum AppState {
SETUP = 'SETUP',
CREATION = 'CREATION',
@ -16,14 +29,38 @@ export interface AvatarConfig {
imageUrl: string;
name: string;
description: string;
leftEye?: Rect;
rightEye?: Rect;
mouth?: Rect;
// Base face (blank, no features)
baseFace?: Rect;
// Eye expressions - each expression type maps to a rect on the sprite sheet
eyes?: {
[ExpressionType.NEUTRAL]?: Rect;
[ExpressionType.HAPPY]?: Rect;
[ExpressionType.SURPRISED]?: Rect;
[ExpressionType.ANGRY]?: Rect;
[ExpressionType.SAD]?: Rect;
[ExpressionType.BLINK]?: Rect;
};
// Mouth expressions
mouth?: {
[ExpressionType.NEUTRAL]?: Rect;
[ExpressionType.HAPPY]?: Rect;
[ExpressionType.OPEN_TALK]?: Rect;
[ExpressionType.WIDE_OPEN]?: Rect;
[ExpressionType.FROWN]?: Rect;
[ExpressionType.O_SHAPE]?: Rect;
};
skinColor?: string;
textureClosedEye?: Rect;
textureOpenMouth?: Rect;
mainBody?: Rect;
chromaKeyColor?: string;
// Reference points for mapping rigging to face tracking (normalized 0-1)
// These define where features should be relative to the face center
riggingReference?: {
faceCenter: { x: number; y: number }; // Center point between eyes
faceWidth: number; // Normalized width of face at eye level
faceHeight: number; // Normalized height from brow to chin
};
// Currently active expressions (for studio)
activeEyeExpression?: ExpressionType;
activeMouthExpression?: ExpressionType;
}
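Review note: every entry in `eyes` and `mouth` is optional, so a lookup such as `avatar.eyes?.[currentEyeExpression]` can come back undefined when a variant was never rigged, and the overlay would then render nothing. A fallback to the neutral variant is one way to handle that. The helper below is a hypothetical sketch that uses plain string keys instead of the `ExpressionType` enum to stay self-contained.

```typescript
// Normalized rectangle, mirroring the x/y/w/h fields used in the diff.
interface Rect { x: number; y: number; w: number; h: number; }

// Expression name -> rect on the sprite sheet; any entry may be missing.
type ExpressionMap = { [key: string]: Rect | undefined };

// Return the rect for `type`, falling back to NEUTRAL when that variant was never rigged.
function resolveExpressionRect(map: ExpressionMap | undefined, type: string): Rect | undefined {
  if (!map) return undefined;
  return map[type] ?? map['NEUTRAL'];
}
```

If NEUTRAL is missing as well, the function still returns undefined, so callers would keep their existing guard.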
export interface TrackingData {