# 🎯 Ehyra LoRA Training – Z-Image Base
Ehyra is a character developed within the BrierStudios ecosystem, designed for use across visual media (illustrations, animations, avatar generation). This document covers the LoRA training pipeline using the Z-Image Base model.
## Overview

LoRA (Low-Rank Adaptation) allows us to fine-tune existing diffusion models with a small, trainable adapter, producing character-consistent output without retraining the entire base model.
```
┌──────────────────────────────────────────────────┐
│              LoRA Training Pipeline              │
│                                                  │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐   │
│   │ Dataset  │───▶│ Training │───▶│   LoRA   │   │
│   │   Prep   │    │  Config  │    │  Export  │   │
│   └──────────┘    └──────────┘    └──────────┘   │
│        │               │               │         │
│        ▼               ▼               ▼         │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐   │
│   │ Captions │    │ Kohya/SD │    │Validation│   │
│   │  & Tags  │    │ Scripts  │    │  & Test  │   │
│   └──────────┘    └──────────┘    └──────────┘   │
└──────────────────────────────────────────────────┘
```
## Prerequisites
| Requirement | Version | Notes |
|---|---|---|
| Python | 3.10+ | Required for training scripts |
| CUDA | 11.8+ | GPU training required |
| VRAM | 12GB+ minimum | 24GB recommended |
| Z-Image Base | v1.5+ | Base model for LoRA |
| Kohya_ss | Latest | Training framework |
| Dataset | 30–100 images | Character reference images |
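Before setting up the dataset, it can save time to confirm the GPU side of this table. The sketch below is a hypothetical helper, assuming PyTorch is installed in the training environment:

```python
# check_env.py: sanity-check Python, CUDA, and VRAM before training (hypothetical helper).
import sys

import torch

assert sys.version_info >= (3, 10), "Python 3.10+ is required"

if not torch.cuda.is_available():
    raise SystemExit("No CUDA GPU detected; GPU training is required")

props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"GPU: {props.name} | VRAM: {vram_gb:.1f} GB | CUDA: {torch.version.cuda}")

if vram_gb < 12:
    print("Warning: below the 12 GB minimum; lower batch_size or resolution")
```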
## Step 1: Dataset Preparation

### Image Collection
Gather 30–100 high-quality reference images of Ehyra. Quality over quantity.

Requirements:
- Minimum resolution: 512×512 pixels
- Recommended resolution: 768×768 or 1024×1024
- Format: PNG (lossless)
- Consistent character design across all images
Image distribution:
- 40% head/bust shots
- 30% full body
- 20% half body
- 10% detail shots (eyes, outfit details, accessories)
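These requirements are easy to check automatically. Below is a minimal sketch using Pillow, assuming the images live in `./dataset/ehyra` as in the training config later on:

```python
# validate_dataset.py: flag images that break the resolution and count rules (sketch).
from pathlib import Path

from PIL import Image

MIN_SIDE = 512  # minimum resolution per the requirements above
dataset = Path("./dataset/ehyra")

images = sorted(dataset.glob("*.png"))
print(f"{len(images)} PNG images found (target: 30-100)")

for path in images:
    with Image.open(path) as img:
        width, height = img.size
    if min(width, height) < MIN_SIDE:
        print(f"Too small ({width}x{height}): {path.name}")
```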
### Naming Convention

```
ehyra_<number>_<type>_<variant>.png
```

Examples:

```
ehyra_001_head_neutral.png
ehyra_002_full_casual.png
ehyra_003_bust_smiling.png
```
### Captioning

Each image needs a corresponding `.txt` caption file with the same name:

```
ehyra_001_head_neutral.png
ehyra_001_head_neutral.txt   ← Caption file
```

Caption format:

```
ehyra, <hair description>, <eye description>, <outfit description>, <pose>, <expression>, <background type>, <lighting>
```

Example caption:

```
ehyra, long silver hair, cyan tips, cyan eyes, dark techwear jacket, neon trim, forward facing, serious expression, dark background, neon rim lighting, cyberpunk style
```
### Tag Strategy

Use a trigger word as the first tag in every caption. This is the word that activates the LoRA during generation.

Trigger word: `ehyra`
Keep your trigger word unique and short. Avoid common words that might conflict with existing model concepts. "ehyra" works well because it's unique to this character.
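Because a missing or misplaced trigger word is a common cause of a LoRA that seems to do nothing, it's worth checking every caption before training. A minimal sketch:

```python
# check_captions.py: every image needs a .txt caption whose first tag is the trigger word.
from pathlib import Path

TRIGGER = "ehyra"
dataset = Path("./dataset/ehyra")

for img in sorted(dataset.glob("*.png")):
    caption_file = img.with_suffix(".txt")
    if not caption_file.exists():
        print(f"Missing caption: {img.name}")
        continue
    caption = caption_file.read_text(encoding="utf-8").strip()
    if not caption.startswith(TRIGGER):
        print(f"Trigger word is not the first tag in {caption_file.name}: {caption[:40]!r}")
```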
## Step 2: Training Configuration

### LoRA Parameters

Create a training config file, `ehyra_lora_config.toml` (referenced again in Step 3):
```toml
[model]
pretrained_model = "Z-Image/Base/v1.5"
trigger_word = "ehyra"

[network]
type = "lora"        # LoRA (not LoCon/full fine-tune)
dim = 32             # Rank dimension; a good balance for characters
alpha = 16           # Network alpha (dim/2 is standard)
dropout = 0.1        # Prevents overfitting
algo = "lora"        # Algorithm

[dataset]
image_dir = "./dataset/ehyra"
batch_size = 2
resolution = 768
enable_bucket = true     # Aspect ratio bucketing
min_bucket_reso = 384
max_bucket_reso = 1024

[training]
epochs = 15                  # 15 epochs for 50-80 images
learning_rate = 1e-4         # 0.0001
lr_scheduler = "cosine"
warmup_steps = 100
gradient_accumulation = 2
mixed_precision = "bf16"     # Use fp16 if no bf16 support
save_every_n_epochs = 3
seed = 42

[output]
output_dir = "./output/ehyra-lora"
output_name = "ehyra_v1_zimage"
save_model_as = "safetensors"
```
### Parameter Explanation

| Parameter | Value | Why |
|---|---|---|
| `dim` | 32 | Sufficient for character definition. Higher = more detail but risk of overfitting; lower = faster but less flexible. |
| `alpha` | 16 | Half of `dim`. Controls learning impact per step. |
| `dropout` | 0.1 | Gentle regularization to prevent memorizing training images. |
| `epochs` | 15 | With 50 images and batch size 2, this gives 25 steps per epoch, ~375 steps total. |
| `learning_rate` | 1e-4 | Standard for LoRA. Lower (5e-5) if overfitting, higher (2e-4) if underfitting. |
| `resolution` | 768 | Z-Image Base works well at 768. Use 1024 only if you have the VRAM. |
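The arithmetic behind the `epochs` row is worth making explicit; for a hypothetical 50-image dataset:

```python
# Step arithmetic for the config above, assuming a 50-image dataset.
images, batch_size, epochs, grad_accum = 50, 2, 15, 2

steps_per_epoch = images // batch_size       # 25 forward/backward passes per epoch
total_steps = steps_per_epoch * epochs       # 375 steps across the full run
optimizer_steps = total_steps // grad_accum  # 187 weight updates after accumulation

print(steps_per_epoch, total_steps, optimizer_steps)  # 25 375 187
```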
## Step 3: Training Execution

```bash
# Activate your training environment
conda activate kohya

# Start training
accelerate launch train_network.py \
  --config_file ehyra_lora_config.toml

# Or use the Kohya GUI and load ehyra_lora_config.toml
```
Monitor training:
- Watch the loss curve β it should decrease steadily
- If loss plateaus early, reduce learning rate
- If loss oscillates wildly, reduce learning rate and increase warmup
```bash
# View training logs
tensorboard --logdir ./output/ehyra-lora/logs
```
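If you prefer to inspect the loss curve programmatically rather than in the TensorBoard UI, the event files can be read directly. A sketch; the scalar tag name (`"loss"` here) is an assumption and depends on the training script, so check `ea.Tags()` for your run:

```python
# read_loss.py: pull loss scalars out of the TensorBoard event files (sketch).
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

ea = EventAccumulator("./output/ehyra-lora/logs")
ea.Reload()  # parse the event files on disk

# "loss" is an assumed tag name; inspect ea.Tags() to find the real one.
for event in ea.Scalars("loss"):
    print(event.step, event.value)
```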
## Step 4: Validation & Testing

After training, test the LoRA with various prompts:

### Test Prompts
```python
# Basic character test
prompt = "ehyra, 1girl, silver hair, cyan eyes, dark techwear, cyberpunk style, dark background"
negative = "lowres, bad anatomy, extra digits, worst quality, low quality, blurry"

# Expression variation
prompt = "ehyra, 1girl, silver hair, cyan eyes, smiling, happy expression, neon city background"

# Action pose
prompt = "ehyra, 1girl, silver hair, cyan eyes, dynamic pose, fighting stance, neon glow"

# Casual scene
prompt = "ehyra, 1girl, silver hair, cyan eyes, sitting, relaxed, casual outfit, indoor scene"
```
### Quality Checklist

- Character is recognizable as "ehyra" with LoRA weight 0.7–1.0
- Silver hair with cyan tips renders consistently
- Cyan eyes are visible and correctly colored
- Techwear outfit appears without explicit prompting
- No artifacts in the background when using low LoRA weights
- Character can switch expressions (happy, serious, surprised)
- Multiple poses work without breaking character details
- No bleeding of training image compositions
## Step 5: Export & Integration

### Export Format

```bash
# The output will be in safetensors format
ls ./output/ehyra-lora/
# ehyra_v1_zimage.safetensors
```
### Integration with Z-Image Base

In your generation pipeline:

```python
from diffusers import StableDiffusionPipeline

# Load the base model
pipe = StableDiffusionPipeline.from_pretrained("Z-Image/Base/v1.5")

# Load the LoRA weights
pipe.load_lora_weights("./output/ehyra-lora/ehyra_v1_zimage.safetensors")

# Generate with the character
image = pipe(
    prompt="ehyra, 1girl, silver hair, cyan eyes, serious expression, dark background",
    negative_prompt="lowres, bad anatomy, worst quality",
    num_inference_steps=30,
    guidance_scale=7.5,
    cross_attention_kwargs={"scale": 0.8},  # LoRA strength 0.8
).images[0]
```
### LoRA Weight Guidance

| Weight | Effect | Use Case |
|---|---|---|
| 0.3–0.5 | Subtle influence | Style mixing with other LoRAs |
| 0.6–0.8 | Balanced | Default generation, good character fidelity |
| 0.9–1.0 | Strong influence | Maximum character consistency, risk of artifacts |
| 1.0+ | Overfit territory | Rarely recommended; may produce training image copies |
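A practical way to pick a value from this table is to sweep the weight over a fixed prompt and seed, then compare the outputs side by side. A sketch reusing `pipe` and the `cross_attention_kwargs` mechanism from the integration example:

```python
# weight_sweep.py: render the same prompt and seed at several LoRA strengths (sketch).
import torch

prompt = "ehyra, 1girl, silver hair, cyan eyes, serious expression, dark background"

for scale in (0.3, 0.5, 0.7, 0.9, 1.0):
    generator = torch.Generator("cuda").manual_seed(42)  # same seed for every weight
    image = pipe(
        prompt,
        num_inference_steps=30,
        guidance_scale=7.5,
        generator=generator,
        cross_attention_kwargs={"scale": scale},  # LoRA strength
    ).images[0]
    image.save(f"ehyra_scale_{scale:.1f}.png")
```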
### Version Naming

```
ehyra_v<VERSION>_zimage.safetensors
```

Examples:

```
ehyra_v1_zimage.safetensors     ← First training run
ehyra_v1.1_zimage.safetensors   ← Bug fix / retune
ehyra_v2_zimage.safetensors     ← New dataset or major changes
```
## Troubleshooting

### Overfitting

Symptoms: Character only appears in training poses/scenes; images look identical to training data.

Solutions:

- Reduce `dim` from 32 to 16
- Reduce `learning_rate` from 1e-4 to 5e-5
- Increase `dropout` from 0.1 to 0.2
- Add more varied images to the dataset
- Reduce epochs from 15 to 10
### Underfitting

Symptoms: Character doesn't appear; LoRA seems to have no effect.

Solutions:

- Increase `dim` from 32 to 64
- Increase `learning_rate` from 1e-4 to 2e-4
- Increase epochs from 15 to 25
- Verify your captions include the trigger word
### Color Bleeding

Symptoms: Character colors leak into backgrounds or other elements.

Solutions:

- Improve captions: describe backgrounds separately
- Reduce LoRA weight during generation
- Add color-specific negative prompts
From dataset to deployment: the pipeline that brings characters to life. 🎯⚡