🎯 Ehyra LoRA Training: Z-Image Base

Ehyra is a character developed within the BrierStudios ecosystem, designed for use across visual media (illustrations, animations, avatar generation). This document covers the LoRA training pipeline using the Z-Image Base model.

Overview

LoRA (Low-Rank Adaptation) allows us to fine-tune existing diffusion models with a small, trainable adapter, producing character-consistent output without retraining the entire base model.
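The mechanism can be sketched in a few lines of NumPy: the frozen base weight `W` is augmented by a low-rank product `B @ A` scaled by `alpha / r`. This is an illustrative sketch (not the Z-Image implementation); the `dim`/`alpha` values mirror the training config used later in this document.

```python
import numpy as np

rng = np.random.default_rng(42)

d_out, d_in, r, alpha = 64, 64, 32, 16  # rank 32, alpha 16, as in the config below

W = rng.normal(size=(d_out, d_in))   # frozen base weight (never updated)
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))             # trainable up-projection, zero-initialized

def lora_forward(x, scale=1.0):
    """Base output plus the scaled low-rank update; `scale` is the LoRA weight."""
    return x @ W.T + scale * (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(1, d_in))
# With B zero-initialized, the adapter starts as a no-op:
assert np.allclose(lora_forward(x), x @ W.T)
```

Only `A` and `B` are trained, which is why the resulting adapter file is a few tens of megabytes instead of a full model checkpoint.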

┌─────────────────────────────────────────────────┐
│             LoRA Training Pipeline              │
│                                                 │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐     │
│  │ Dataset  │──→│ Training │──→│   LoRA   │     │
│  │   Prep   │   │  Config  │   │  Export  │     │
│  └──────────┘   └──────────┘   └──────────┘     │
│       │              │              │           │
│       ▼              ▼              ▼           │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐     │
│  │ Captions │   │ Kohya/SD │   │Validation│     │
│  │  & Tags  │   │ Scripts  │   │  & Test  │     │
│  └──────────┘   └──────────┘   └──────────┘     │
└─────────────────────────────────────────────────┘

Prerequisites

| Requirement  | Version       | Notes                          |
|--------------|---------------|--------------------------------|
| Python       | 3.10+         | Required for training scripts  |
| CUDA         | 11.8+         | GPU training required          |
| VRAM         | 12GB+ minimum | 24GB recommended               |
| Z-Image Base | v1.5+         | Base model for LoRA            |
| Kohya_ss     | Latest        | Training framework             |
| Dataset      | 30–100 images | Character reference images     |

Step 1: Dataset Preparation

Image Collection

Gather 30–100 high-quality reference images of Ehyra. Quality over quantity.

Requirements:

  • Minimum resolution: 512×512 pixels
  • Recommended resolution: 768×768 or 1024×1024
  • Format: PNG (lossless)
  • Consistent character design across all images

Image distribution:

  • 40% head/bust shots
  • 30% full body
  • 20% half body
  • 10% detail shots (eyes, outfit details, accessories)

Naming Convention

ehyra_<number>_<type>_<variant>.png

Examples:
ehyra_001_head_neutral.png
ehyra_002_full_casual.png
ehyra_003_bust_smiling.png
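A small regex check keeps the dataset consistent with this convention and doubles as a distribution counter. A sketch: the accepted `type` values (`head`, `bust`, `half`, `full`, `detail`) are assumptions inferred from the examples and distribution targets above.

```python
import re
from collections import Counter

# ehyra_<number>_<type>_<variant>.png
PATTERN = re.compile(r"^ehyra_(\d{3})_(head|bust|half|full|detail)_([a-z0-9]+)\.png$")

def check_filenames(names):
    """Return (counts per shot type, list of non-conforming filenames)."""
    counts, bad = Counter(), []
    for name in names:
        m = PATTERN.match(name)
        if m:
            counts[m.group(2)] += 1
        else:
            bad.append(name)
    return counts, bad

counts, bad = check_filenames([
    "ehyra_001_head_neutral.png",
    "ehyra_002_full_casual.png",
    "ehyra_003_bust_smiling.png",
    "ehyra_04_face_x.png",  # wrong number width and unknown type -> flagged
])
```

Comparing `counts` against the 40/30/20/10 distribution targets above shows at a glance which shot types the dataset still lacks.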

Captioning

Each image needs a corresponding .txt caption file with the same name:

ehyra_001_head_neutral.png
ehyra_001_head_neutral.txt ← Caption file

Caption format:

ehyra, <hair description>, <eye description>, <outfit description>, <pose>, <expression>, <background type>, <lighting>

Example caption:

ehyra, long silver hair, cyan tips, cyan eyes, dark techwear jacket, neon trim, forward facing, serious expression, dark background, neon rim lighting, cyberpunk style
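Assembling captions programmatically keeps the tag order consistent across the whole dataset. A minimal sketch; the field names (`hair`, `eyes`, `outfit`, …) are hypothetical labels mirroring the template above, not part of any tool's API.

```python
def build_caption(trigger, **fields):
    """Join the trigger word and tag fields in template order, skipping empty fields."""
    order = ["hair", "eyes", "outfit", "pose", "expression",
             "background", "lighting", "style"]
    parts = [trigger] + [fields[k] for k in order if fields.get(k)]
    return ", ".join(parts)

caption = build_caption(
    "ehyra",
    hair="long silver hair, cyan tips",
    eyes="cyan eyes",
    outfit="dark techwear jacket, neon trim",
    pose="forward facing",
    expression="serious expression",
    background="dark background",
    lighting="neon rim lighting",
    style="cyberpunk style",
)
```

This reproduces the example caption above exactly, with the trigger word guaranteed to be first.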

Tag Strategy

Use a trigger word as the first tag in every caption. This is the word that activates the LoRA during generation.

Trigger word: ehyra
Tip: Keep your trigger word unique and short. Avoid common words that might conflict with existing model concepts. "ehyra" works well because it's unique to this character.

Step 2: Training Configuration

LoRA Parameters

Create a training config file:

ehyra_lora_config.toml
[model]
pretrained_model = "Z-Image/Base/v1.5"
trigger_word = "ehyra"

[network]
type = "lora" # LoRA (not LoCon/Full)
dim = 32 # Rank dimension; a good balance for characters
alpha = 16 # Network alpha (dim/2 is standard)
dropout = 0.1 # Prevent overfitting
algo = "lora" # Algorithm

[dataset]
image_dir = "./dataset/ehyra"
batch_size = 2
resolution = 768
enable_bucket = true # Aspect ratio bucketing
min_bucket_reso = 384
max_bucket_reso = 1024

[training]
epochs = 15 # 15 epochs for 50-80 images
learning_rate = 1e-4 # 0.0001
lr_scheduler = "cosine"
warmup_steps = 100
gradient_accumulation = 2
mixed_precision = "bf16" # Use fp16 if no bf16 support
save_every_n_epochs = 3
seed = 42

[output]
output_dir = "./output/ehyra-lora"
output_name = "ehyra_v1_zimage"
save_model_as = "safetensors"

Parameter Explanation

| Parameter     | Value | Why |
|---------------|-------|-----|
| dim           | 32    | Sufficient for character definition. Higher = more detail but risk of overfit. Lower = faster but less flexibility. |
| alpha         | 16    | Half of dim. Controls learning impact per step. |
| dropout       | 0.1   | Gentle regularization to prevent memorizing training images. |
| epochs        | 15    | With 50 images and batch size 2, each epoch is 25 steps, so ~375 training steps total. |
| learning_rate | 1e-4  | Standard for LoRA. Lower (5e-5) if overfitting, higher (2e-4) if underfitting. |
| resolution    | 768   | Z-Image Base works well at 768. Use 1024 only if you have the VRAM. |

Step 3: Training Execution

# Activate your training environment
conda activate kohya

# Start training
accelerate launch train_network.py \
    --config_file ehyra_lora_config.toml

# Or use the Kohya GUI and load ehyra_lora_config.toml

Monitor training:

  • Watch the loss curve; it should decrease steadily
  • If loss plateaus early, reduce learning rate
  • If loss oscillates wildly, reduce learning rate and increase warmup

# View training logs
tensorboard --logdir ./output/ehyra-lora/logs
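The "plateaus early" check can be automated by comparing the mean loss of the two most recent windows of logged steps. A sketch; the 50-step window and 1% threshold are assumptions to tune against your own logs.

```python
def loss_plateaued(losses, window=50, min_rel_improvement=0.01):
    """True when the mean loss of the last `window` steps improved by less than
    `min_rel_improvement` relative to the window before it."""
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    prev = sum(losses[-2 * window:-window]) / window
    last = sum(losses[-window:]) / window
    return (prev - last) / prev < min_rel_improvement
```

Feed it the per-step loss values exported from TensorBoard; if it fires well before the final epoch, lower the learning rate as suggested above.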

Step 4: Validation & Testing

After training, test the LoRA with various prompts:

Test Prompts

# Basic character test
prompt = "ehyra, 1girl, silver hair, cyan eyes, dark techwear, cyberpunk style, dark background"
negative = "lowres, bad anatomy, extra digits, worst quality, low quality, blurry"

# Expression variation
prompt = "ehyra, 1girl, silver hair, cyan eyes, smiling, happy expression, neon city background"

# Action pose
prompt = "ehyra, 1girl, silver hair, cyan eyes, dynamic pose, fighting stance, neon glow"

# Casual scene
prompt = "ehyra, 1girl, silver hair, cyan eyes, sitting, relaxed, casual outfit, indoor scene"
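The four test prompts above share a common tag prefix, so they can be generated from a small scenario table instead of being maintained by hand. A sketch; the scenario names are hypothetical labels for this document's examples.

```python
BASE_TAGS = "ehyra, 1girl, silver hair, cyan eyes"
NEGATIVE = "lowres, bad anatomy, extra digits, worst quality, low quality, blurry"

SCENARIOS = {
    "basic": "dark techwear, cyberpunk style, dark background",
    "expression": "smiling, happy expression, neon city background",
    "action": "dynamic pose, fighting stance, neon glow",
    "casual": "sitting, relaxed, casual outfit, indoor scene",
}

def test_prompts():
    """Yield (name, prompt, negative) for each validation scenario."""
    for name, extra in SCENARIOS.items():
        yield name, f"{BASE_TAGS}, {extra}", NEGATIVE

prompts = {name: prompt for name, prompt, _ in test_prompts()}
```

Looping this table through your generation pipeline (one image per scenario per checkpoint) makes it easy to compare the LoRA snapshots saved every 3 epochs.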

Quality Checklist

  • Character is recognizable as "ehyra" with LoRA weight 0.7–1.0
  • Silver hair with cyan tips renders consistently
  • Cyan eyes are visible and correctly colored
  • Techwear outfit appears without explicit prompting
  • No artifacts in the background when using low LoRA weights
  • Character can switch expressions (happy, serious, surprised)
  • Multiple poses work without breaking character details
  • No bleeding of training image compositions

Step 5: Export & Integration

Export Format

# The output will be in safetensors format
ls ./output/ehyra-lora/
# ehyra_v1_zimage.safetensors

Integration with Z-Image Base

In your generation pipeline:

from diffusers import StableDiffusionPipeline

# Load base model
pipe = StableDiffusionPipeline.from_pretrained("Z-Image/Base/v1.5")

# Load LoRA weight
pipe.load_lora_weights("./output/ehyra-lora/ehyra_v1_zimage.safetensors")

# Generate with character
image = pipe(
    prompt="ehyra, 1girl, silver hair, cyan eyes, serious expression, dark background",
    negative_prompt="lowres, bad anatomy, worst quality",
    num_inference_steps=30,
    guidance_scale=7.5,
    cross_attention_kwargs={"scale": 0.8},  # LoRA strength 0.8
).images[0]

LoRA Weight Guidance

| Weight  | Effect            | Use Case                                              |
|---------|-------------------|-------------------------------------------------------|
| 0.3–0.5 | Subtle influence  | Style mixing with other LoRAs                         |
| 0.6–0.8 | Balanced          | Default generation, good character fidelity           |
| 0.9–1.0 | Strong influence  | Maximum character consistency, risk of artifacts      |
| 1.0+    | Overfit territory | Rarely recommended, may produce training image copies |
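For batch-generation scripts, the table above can be encoded as a small helper that keeps requested weights inside the recommended range. A sketch; the use-case names are hypothetical shorthand for the table rows.

```python
# (low, high) LoRA weight ranges from the table above
WEIGHT_RANGES = {
    "subtle": (0.3, 0.5),    # style mixing with other LoRAs
    "balanced": (0.6, 0.8),  # default generation
    "strong": (0.9, 1.0),    # maximum character consistency
}

def clamp_weight(weight, use_case="balanced"):
    """Clamp a requested LoRA weight into the recommended range for the use case."""
    low, high = WEIGHT_RANGES[use_case]
    return max(low, min(high, weight))
```

The clamped value is what you would pass as `cross_attention_kwargs={"scale": ...}` in the integration snippet above.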

Version Naming

ehyra_v<VERSION>_zimage.safetensors

Examples:
ehyra_v1_zimage.safetensors (first training run)
ehyra_v1.1_zimage.safetensors (bug fix / retune)
ehyra_v2_zimage.safetensors (new dataset or major changes)
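Tooling that manages multiple LoRA versions can extract the version from the filename with one regex. A sketch against the naming scheme above:

```python
import re

# ehyra_v<VERSION>_zimage.safetensors, where VERSION is "1", "1.1", "2", ...
VERSION_RE = re.compile(r"^ehyra_v(\d+(?:\.\d+)?)_zimage\.safetensors$")

def parse_version(filename):
    """Return the version string from a LoRA filename, or None if it doesn't match."""
    m = VERSION_RE.match(filename)
    return m.group(1) if m else None
```

This makes it trivial to list a model directory and pick the latest version programmatically.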

Troubleshooting

Overfitting

Symptoms: Character only appears in training poses/scenes; images look identical to training data.

Solutions:

  • Reduce dim from 32 to 16
  • Reduce learning_rate from 1e-4 to 5e-5
  • Increase dropout from 0.1 to 0.2
  • Add more varied images to the dataset
  • Reduce epochs from 15 to 10

Underfitting

Symptoms: Character doesn't appear; LoRA seems to have no effect.

Solutions:

  • Increase dim from 32 to 64
  • Increase learning_rate from 1e-4 to 2e-4
  • Increase epochs from 15 to 25
  • Verify your captions include the trigger word
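The hyperparameter fixes for both diagnoses can be captured as simple config transforms. A sketch: the numbers come from the bullets above, and the adjusted `alpha` values follow the alpha = dim/2 convention stated in the parameter table (an assumption for the over/underfit cases, which the bullets don't spell out).

```python
# Suggested adjustments from the troubleshooting bullets above;
# alpha is kept at dim/2 per the parameter table's convention (assumption).
OVERFIT_FIXES = {"dim": 16, "alpha": 8, "dropout": 0.2,
                 "learning_rate": 5e-5, "epochs": 10}
UNDERFIT_FIXES = {"dim": 64, "alpha": 32,
                  "learning_rate": 2e-4, "epochs": 25}

def retune(params, diagnosis):
    """Return a copy of the hyperparameters with the suggested fixes applied."""
    fixes = OVERFIT_FIXES if diagnosis == "overfit" else UNDERFIT_FIXES
    return {**params, **fixes}

base = {"dim": 32, "alpha": 16, "dropout": 0.1,
        "learning_rate": 1e-4, "epochs": 15}
```

`retune(base, "overfit")` yields the config for a lower-capacity retrain; the original `base` dict is left untouched.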

Color Bleeding

Symptoms: Character colors leak into backgrounds or other elements.

Solutions:

  • Improve captions β€” describe backgrounds separately
  • Reduce LoRA weight during generation
  • Add color-specific negative prompts

From dataset to deployment: the pipeline that brings characters to life. 🎯⚡