Bytecode Optimizer
The PgEngine bytecode optimizer is a multi-pass optimization system that transforms bytecode after compilation but before execution. It performs pattern-based transformations to improve runtime performance and reduce bytecode size.
Overview
The optimization system consists of:
PassManager: Orchestrates multiple optimization passes
BytecodePass: Base class for individual optimization passes
BytecodeRewriter: Pattern matching and bytecode transformation engine
Optimization Passes: Specific optimization implementations
The optimizer runs automatically after compilation (unless disabled) and can apply multiple iterations of each pass until no more optimizations are found.
Architecture
PassManager
The PassManager coordinates the execution of optimization passes:
Responsibilities:
Maintains ordered list of optimization passes
Executes passes in sequence
Handles multi-pass iterations
Provides debug output for optimization process
Manages the shared
BytecodeRewriterinstance
Configuration:
// Enable/disable optimization
vm->enableBytecodeOptimization();
vm->disableBytecodeOptimization();
// Enable debug output to see transformations
vm->enableOptimizationDebugging();
// List all registered passes
vm->listOptimizationPasses();
BytecodePass Interface
All optimization passes inherit from BytecodePass:
class BytecodePass {
public:
virtual ~BytecodePass() = default;
// Pass identification
virtual std::string getName() const = 0;
// Execute the optimization pass
virtual bool runPass(Chunk& chunk, BytecodeRewriter* rewriter) = 0;
// Indicates if pass changes bytecode size
virtual bool changesSize() const { return false; }
// Indicates if pass should run multiple times
virtual bool requiresMultiplePasses() const { return false; }
};
BytecodeRewriter
The BytecodeRewriter provides powerful pattern matching and transformation capabilities:
Pattern Elements:
match(opcode): Match specific opcodeanyOf(opcodes): Match any of multiple opcodeswildcard(): Match any opcodePredefined groups:
constant(),load(),store(),jump(), etc.
Advanced Features:
Capture matched instructions for analysis
Lambda-based transformations
Conditional pattern matching
Automatic jump offset adjustment
Safe bytecode rewriting
Optimization Passes
The VM includes several built-in optimization passes that run by default.
Constant Folding Pass
Purpose: Evaluate constant expressions at compile time
Name: ConstantFoldingPass
Transformations:
Arithmetic on Constants:
# Before OP_Constant 5 OP_Constant 3 OP_Add # After OP_Constant 8
Comparison on Constants:
# Before OP_Constant 10 OP_Constant 5 OP_Greater # After OP_True
Logical Operations:
# Before OP_True OP_False OP_And # After OP_False
Benefits:
Reduces instruction count
Eliminates runtime computation
Smaller constant table
Better cache utilization
Properties:
Changes bytecode size: Yes
Requires multiple passes: Yes (for nested expressions)
Increment Optimization Pass
Purpose: Convert addition patterns to specialized increment instructions
Name: IncrementOptimizationPass
Transformations:
Pre-increment pattern:
# Before
OP_Get_Local 0
OP_Constant 1
OP_Add
OP_Set_Local 0
OP_Pop
# After
OP_Incr_Local 0
Post-increment pattern:
# Before
OP_Get_Local 0
OP_Constant 1
OP_Add
OP_Set_Local 0
# After
OP_Short_Int 0
OP_Post_Incr_Local 0
Benefits:
5 instructions → 1-2 instructions
Faster execution (specialized opcode)
Smaller bytecode size
Common in loops
Fuse OP_Pop Pass
Purpose: Combine multiple consecutive pop operations
Name: FuseOpPop
Transformations:
Two pops:
# Before
OP_Pop
OP_Pop
# After
OP_PopN 2
Pop + PopN:
# Before
OP_Pop
OP_PopN 3
# After
OP_PopN 4
PopN + PopN:
# Before
OP_PopN 5
OP_PopN 3
# After
OP_PopN 8
Benefits:
Reduces instruction count
Better branch prediction
Fewer instruction fetches
Smaller bytecode
Properties:
Changes bytecode size: Yes
Requires multiple passes: Yes (for consecutive fusions)
Long Jump Optimization Pass
Purpose: Convert short jumps to long jumps when necessary, or long jumps to short jumps when possible
Name: LongJumpOptimizationPass
Transformations:
# Before (jump offset > 65535)
OP_Jump 0x1234
# After
OP_Long_Jump 0x00012345
# Before (long jump with offset < 65535)
OP_Long_Jump 0x0000FFFF
# After
OP_Jump 0xFFFF
Benefits:
Correct jumps for large functions
Smaller bytecode when long jumps unnecessary
Handles bytecode size changes from other passes
Remove Redundant Get/Define Global Pass
Purpose: Optimize patterns where globals are defined and immediately accessed
Name: RemoveDefGetGlobalRedundancy
Transformations:
# Before
OP_Constant <value>
OP_Define_Global <name>
OP_Get_Global <name>
# After
OP_Constant <value>
OP_Define_Global_Non_Popping <name>
Benefits:
Eliminates redundant global lookup
One less instruction
Value stays on stack
Simplify Constant Pass
Purpose: Replace constant access patterns with optimized variants
Name: SimplifyConstantPass
Transformations:
Small integers:
# Before
OP_Constant <index of integer 0-255>
# After
OP_Short_Int <value>
Constant globals:
# Before
OP_Constant <name>
OP_Define_Global
# After
OP_Define_Constant_Global <name>
Benefits:
Smaller bytecode
Faster execution (no constant table lookup)
Reduced constant pool pressure
Basic Operator Local Indexing Pass
Purpose: Optimize binary operations on local variables
Name: BasicOperatorLocalIndexing
Transformations:
Addition:
# Before
OP_Get_Local 0
OP_Get_Local 1
OP_Add
# After
OP_AddLL 0 1
Subtraction:
# Before
OP_Get_Local 0
OP_Get_Local 1
OP_Subtract
# After
OP_SubtractLL 0 1
Mixed constant/local:
# Before
OP_Get_Local 0
OP_Constant 5
OP_Subtract
# After
OP_SubtractLC 0 5
Benefits:
3 instructions → 1 instruction
Direct register-style operations
Eliminates stack manipulation
Significant performance gain in tight loops
Constant Variable Access Pass
Purpose: Optimize access to variables holding constant values
Name: ConstantVarAccess
Transformations:
Detects variables that are assigned constant values and never modified, then propagates those constants directly.
Pattern Matching Engine
Advanced Pattern Matching
The BytecodeRewriter provides a powerful pattern matching API:
Pattern Elements:
// Match specific opcode
PatternElement::match(OpCode::OP_Add)
// Match with capture
PatternElement::match(OpCode::OP_Constant, true)
// Match any of multiple opcodes
PatternElement::anyOf({OP_Add, OP_Subtract, OP_Multiply})
// Predefined groups
PatternElement::constant() // OP_Constant or OP_LongConstant
PatternElement::load() // OP_Get_Local or OP_Get_Global
PatternElement::store() // OP_Set_Local or OP_Set_Global
PatternElement::jump() // Any jump instruction
PatternElement::increment() // Any increment instruction
Captured Instructions:
Captured instructions provide access to:
Opcode
Operands (raw bytes)
Bytecode offset
Helper methods (e.g.,
getConstantIndex())
Example Pattern Matching
Simple pattern:
std::vector<PatternElement> pattern = {
PatternElement::match(OpCode::OP_Pop),
PatternElement::match(OpCode::OP_Pop),
};
rewriter->addAdvancedRule(pattern,
[](const std::vector<CapturedInstruction>&) {
std::vector<uint8_t> newCode;
newCode.push_back(static_cast<uint8_t>(OpCode::OP_PopN));
newCode.push_back(2);
return newCode;
});
Complex pattern with capture and condition:
// Detect: GET_LOCAL x, CONSTANT 1, ADD, SET_LOCAL x
std::vector<PatternElement> pattern = {
PatternElement::match(OpCode::OP_Get_Local, true), // Capture
PatternElement::constant(true), // Capture
PatternElement::match(OpCode::OP_Add),
PatternElement::match(OpCode::OP_Set_Local, true), // Capture
};
auto condition = [&chunk](const std::vector<CapturedInstruction>& captured) {
// Verify same local variable
if (captured[0].operands[0] != captured[2].operands[0]) {
return false;
}
// Verify constant is 1
Value constValue = chunk.constants[captured[1].getConstantIndex()];
return IS_INT(constValue) && AS_INT(constValue) == 1;
};
auto transform = [](const std::vector<CapturedInstruction>& captured) {
std::vector<uint8_t> newCode;
newCode.push_back(static_cast<uint8_t>(OpCode::OP_Incr_Local));
newCode.push_back(captured[0].operands[0]); // Local index
return newCode;
};
rewriter->addAdvancedRule(pattern, transform, condition);
Jump Offset Handling
The rewriter automatically handles jump offset adjustments when bytecode size changes:
Tracks all jump targets before rewriting
Adjusts forward and backward jumps
Handles both short (16-bit) and long (32-bit) jumps
Converts between short/long jumps as needed
Updates loop instructions (
OP_Loop)
This ensures that control flow remains correct after optimization.
Writing Custom Passes
Creating a New Pass
Inherit from BytecodePass:
class MyOptimizationPass : public BytecodePass {
public:
std::string getName() const override {
return "MyOptimizationPass";
}
bool runPass(Chunk& chunk, BytecodeRewriter* rewriter) override;
bool changesSize() const override { return true; }
bool requiresMultiplePasses() const override { return false; }
};
Implement the Pass Logic:
bool MyOptimizationPass::runPass(Chunk& chunk, BytecodeRewriter* rewriter) {
// Define pattern to match
std::vector<PatternElement> pattern = {
// Your pattern here
};
// Define transformation
auto transform = [](const std::vector<CapturedInstruction>& captured) {
std::vector<uint8_t> newCode;
// Generate optimized bytecode
return newCode;
};
// Optional: Add condition
auto condition = [&chunk](const std::vector<CapturedInstruction>& captured) {
// Return true if transformation should apply
return true;
};
// Register rule with rewriter
rewriter->addAdvancedRule(pattern, transform, condition);
// Apply rewrite rules
return rewriter->rewrite(chunk);
}
Register the Pass:
vm->addOptimizationPass(std::make_unique<MyOptimizationPass>());
Best Practices
Pattern Design:
Keep patterns specific enough to avoid false matches
Use captures only when needed (better performance)
Leverage predefined pattern groups
Test patterns with edge cases
Transformation Safety:
Verify operand bounds before access
Check constant table indices are valid
Ensure generated bytecode is well-formed
Consider side effects of original instructions
Conditions:
Use conditions to validate semantic correctness
Check for same variable indices
Verify constant values
Ensure no live values are discarded
Multiple Passes:
Set
requiresMultiplePasses()if pass can enable more optimizationsPassManager limits iterations to prevent infinite loops
Each iteration should make progress
Size Changes:
Set
changesSize()if instruction count changesRewriter handles jump offset updates automatically
Be aware of constant table impact
Debugging Optimizations
Enable Debug Output
vm->enableOptimizationDebugging();
This prints:
Original bytecode before all passes
Bytecode after each pass
Pass names and iteration counts
Which passes made changes
Example output:
[INFO] PassManager: Running 6 passes
[INFO] PassManager: Running pass: ConstantFoldingPass
=== BYTECODE BEFORE ConstantFoldingPass ===
0000 1 OP_Constant 0 '5'
0002 1 OP_Constant 1 '3'
0004 1 OP_Add
0005 1 OP_Debug_Print
=== BYTECODE AFTER ConstantFoldingPass ===
0000 1 OP_Constant 0 '8'
0002 1 OP_Debug_Print
[INFO] PassManager: Pass ConstantFoldingPass made changes (iteration 1)
Manual Pass Execution
Run specific passes for testing:
bool changed = passManager.runPass("ConstantFoldingPass", chunk);
Disassemble Bytecode
View bytecode at any point:
disassembleChunk(vm, chunk, "After My Pass");
Verification
After optimization, verify:
All jump targets are valid
Constant indices are in bounds
Stack depth is consistent
Local variable indices are valid
Line number information preserved
Performance Impact
Optimization Overhead
Pattern matching: ~1-5ms per 10,000 instructions
Constant folding: ~10-20% bytecode size reduction
Overall optimization: ~5-10ms for typical scripts
Significant runtime performance gains (10-50% faster)
Typical Improvements
Bytecode Size:
10-30% smaller after all passes
Fewer instructions to fetch and decode
Better instruction cache utilization
Execution Speed:
10-50% faster depending on code patterns
Loops benefit most (increment optimization)
Arithmetic-heavy code sees large gains
Function call overhead unchanged
Memory Usage:
Smaller constant tables
Fewer temporary stack values
Better cache locality
When to Disable
Consider disabling optimization for:
Very short scripts (overhead > benefit)
Scripts run once (no amortization of compilation cost)
Debugging (clearer bytecode correspondence to source)
Testing (verify optimizer correctness)
Example: Complete Custom Pass
Here’s a complete example implementing a pass that optimizes x = x * 2 to a left shift:
class MultiplyByTwoPass : public BytecodePass {
public:
std::string getName() const override {
return "MultiplyByTwoPass";
}
bool runPass(Chunk& chunk, BytecodeRewriter* rewriter) override {
if (!rewriter) return false;
// Pattern: GET_LOCAL x, CONSTANT 2, MULTIPLY, SET_LOCAL x
std::vector<PatternElement> pattern = {
PatternElement::match(OpCode::OP_Get_Local, true),
PatternElement::constant(true),
PatternElement::match(OpCode::OP_Multiply),
PatternElement::match(OpCode::OP_Set_Local, true),
};
// Condition: same local, constant is 2
auto condition = [&chunk](const std::vector<CapturedInstruction>& captured) {
// Same local variable?
if (captured[0].operands[0] != captured[2].operands[0]) {
return false;
}
// Constant is 2?
Value constValue = chunk.constants[captured[1].getConstantIndex()];
if (!IS_INT(constValue)) return false;
return AS_INT(constValue) == 2;
};
// Transform: use custom left shift operation
auto transform = [](const std::vector<CapturedInstruction>& captured) {
std::vector<uint8_t> newCode;
// OP_LeftShift_Local_1 <local_index>
newCode.push_back(static_cast<uint8_t>(OpCode::OP_LeftShift_Local_1));
newCode.push_back(captured[0].operands[0]);
return newCode;
};
rewriter->addAdvancedRule(pattern, transform, condition);
return rewriter->rewrite(chunk);
}
bool changesSize() const override { return true; }
bool requiresMultiplePasses() const override { return false; }
};
// Register the pass
vm->addOptimizationPass(std::make_unique<MultiplyByTwoPass>());
Pass Execution Order
The default pass order is optimized for maximum benefit:
Simplify Constant Pass: Enable other optimizations
Constant Folding Pass: Reduce expressions
Remove Redundant Get/Define Global: Simplify global access
Basic Operator Local Indexing: Optimize arithmetic
Increment Optimization Pass: Optimize loops
Fuse OP_Pop Pass: Combine stack operations
Constant Variable Access: Propagate constants
Long Jump Optimization Pass: Fix jumps (always last)
Pass order matters because:
Early passes enable later passes
Size-changing passes affect jump offsets
Jump optimization must run last
Future Enhancements
Potential Future Optimizations
Dead Code Elimination: Remove unreachable code
Loop Invariant Code Motion: Hoist constant expressions from loops
Inline Small Functions: Eliminate call overhead
Tail Call Optimization: Convert recursion to iteration
Peephole Optimizations: More specialized instruction patterns
Global Value Numbering: Eliminate redundant computations
Register Allocation: For potential JIT compiler
See Also
Virtual Machine (VM) Internals - Detailed VM architecture
Script Language API Reference - Script language API reference
Script Language Overview - Language overview