Bytecode Optimizer

The PgEngine bytecode optimizer is a multi-pass optimization system that transforms bytecode after compilation but before execution. It performs pattern-based transformations to improve runtime performance and reduce bytecode size.

Overview

The optimization system consists of:

PassManager: Orchestrates multiple optimization passes
BytecodePass: Base class for individual optimization passes
BytecodeRewriter: Pattern matching and bytecode transformation engine
Optimization Passes: Specific optimization implementations

The optimizer runs automatically after compilation (unless disabled) and can apply multiple iterations of each pass until no more optimizations are found.

Architecture

PassManager

The PassManager coordinates the execution of optimization passes:

Responsibilities:

Maintains ordered list of optimization passes
Executes passes in sequence
Handles multi-pass iterations
Provides debug output for optimization process
Manages the shared BytecodeRewriter instance

Configuration:

// Enable/disable optimization
vm->enableBytecodeOptimization();
vm->disableBytecodeOptimization();

// Enable debug output to see transformations
vm->enableOptimizationDebugging();

// List all registered passes
vm->listOptimizationPasses();

BytecodePass Interface

All optimization passes inherit from BytecodePass:

class BytecodePass {
public:
    virtual ~BytecodePass() = default;

    // Pass identification
    virtual std::string getName() const = 0;

    // Execute the optimization pass
    virtual bool runPass(Chunk& chunk, BytecodeRewriter* rewriter) = 0;

    // Indicates if pass changes bytecode size
    virtual bool changesSize() const { return false; }

    // Indicates if pass should run multiple times
    virtual bool requiresMultiplePasses() const { return false; }
};

BytecodeRewriter

The BytecodeRewriter provides powerful pattern matching and transformation capabilities:

Pattern Elements:

match(opcode): Match specific opcode
anyOf(opcodes): Match any of multiple opcodes
wildcard(): Match any opcode
Predefined groups: constant(), load(), store(), jump(), etc.

Advanced Features:

Capture matched instructions for analysis
Lambda-based transformations
Conditional pattern matching
Automatic jump offset adjustment
Safe bytecode rewriting

Optimization Passes

The VM includes several built-in optimization passes that run by default.

Constant Folding Pass

Purpose: Evaluate constant expressions at compile time

Name: ConstantFoldingPass

Transformations:

Arithmetic on Constants:

# Before
OP_Constant 5
OP_Constant 3
OP_Add

# After
OP_Constant 8

Comparison on Constants:

# Before
OP_Constant 10
OP_Constant 5
OP_Greater

# After
OP_True

Logical Operations:

# Before
OP_True
OP_False
OP_And

# After
OP_False

Benefits:

Reduces instruction count
Eliminates runtime computation
Smaller constant table
Better cache utilization

Properties:

Changes bytecode size: Yes
Requires multiple passes: Yes (for nested expressions)

Increment Optimization Pass

Purpose: Convert addition patterns to specialized increment instructions

Name: IncrementOptimizationPass

Transformations:

Pre-increment pattern:

# Before
OP_Get_Local 0
OP_Constant 1
OP_Add
OP_Set_Local 0
OP_Pop

# After
OP_Incr_Local 0

Post-increment pattern:

# Before
OP_Get_Local 0
OP_Constant 1
OP_Add
OP_Set_Local 0

# After
OP_Short_Int 0
OP_Post_Incr_Local 0

Benefits:

5 instructions → 1-2 instructions
Faster execution (specialized opcode)
Smaller bytecode size
Common in loops

Fuse OP_Pop Pass

Purpose: Combine multiple consecutive pop operations

Name: FuseOpPop

Transformations:

Two pops:

# Before
OP_Pop
OP_Pop

# After
OP_PopN 2

Pop + PopN:

# Before
OP_Pop
OP_PopN 3

# After
OP_PopN 4

PopN + PopN:

# Before
OP_PopN 5
OP_PopN 3

# After
OP_PopN 8

Benefits:

Reduces instruction count
Better branch prediction
Fewer instruction fetches
Smaller bytecode

Properties:

Changes bytecode size: Yes
Requires multiple passes: Yes (for consecutive fusions)

Long Jump Optimization Pass

Purpose: Convert short jumps to long jumps when necessary, or long jumps to short jumps when possible

Name: LongJumpOptimizationPass

Transformations:

# Before (jump offset > 65535)
OP_Jump 0x1234

# After
OP_Long_Jump 0x00012345

# Before (long jump with offset < 65535)
OP_Long_Jump 0x0000FFFF

# After
OP_Jump 0xFFFF

Benefits:

Correct jumps for large functions
Smaller bytecode when long jumps unnecessary
Handles bytecode size changes from other passes

Remove Redundant Get/Define Global Pass

Purpose: Optimize patterns where globals are defined and immediately accessed

Name: RemoveDefGetGlobalRedundancy

Transformations:

# Before
OP_Constant <value>
OP_Define_Global <name>
OP_Get_Global <name>

# After
OP_Constant <value>
OP_Define_Global_Non_Popping <name>

Benefits:

Eliminates redundant global lookup
One less instruction
Value stays on stack

Simplify Constant Pass

Purpose: Replace constant access patterns with optimized variants

Name: SimplifyConstantPass

Transformations:

Small integers:

# Before
OP_Constant <index of integer 0-255>

# After
OP_Short_Int <value>

Constant globals:

# Before
OP_Constant <name>
OP_Define_Global

# After
OP_Define_Constant_Global <name>

Benefits:

Smaller bytecode
Faster execution (no constant table lookup)
Reduced constant pool pressure

Basic Operator Local Indexing Pass

Purpose: Optimize binary operations on local variables

Name: BasicOperatorLocalIndexing

Transformations:

Addition:

# Before
OP_Get_Local 0
OP_Get_Local 1
OP_Add

# After
OP_AddLL 0 1

Subtraction:

# Before
OP_Get_Local 0
OP_Get_Local 1
OP_Subtract

# After
OP_SubtractLL 0 1

Mixed constant/local:

# Before
OP_Get_Local 0
OP_Constant 5
OP_Subtract

# After
OP_SubtractLC 0 5

Benefits:

3 instructions → 1 instruction
Direct register-style operations
Eliminates stack manipulation
Significant performance gain in tight loops

Constant Variable Access Pass

Purpose: Optimize access to variables holding constant values

Name: ConstantVarAccess

Transformations:

Detects variables that are assigned constant values and never modified, then propagates those constants directly.

Pattern Matching Engine

Advanced Pattern Matching

The BytecodeRewriter provides a powerful pattern matching API:

Pattern Elements:

// Match specific opcode
PatternElement::match(OpCode::OP_Add)

// Match with capture
PatternElement::match(OpCode::OP_Constant, true)

// Match any of multiple opcodes
PatternElement::anyOf({OP_Add, OP_Subtract, OP_Multiply})

// Predefined groups
PatternElement::constant()      // OP_Constant or OP_LongConstant
PatternElement::load()          // OP_Get_Local or OP_Get_Global
PatternElement::store()         // OP_Set_Local or OP_Set_Global
PatternElement::jump()          // Any jump instruction
PatternElement::increment()     // Any increment instruction

Captured Instructions:

Captured instructions provide access to:

Opcode
Operands (raw bytes)
Bytecode offset
Helper methods (e.g., getConstantIndex())

Example Pattern Matching

Simple pattern:

std::vector<PatternElement> pattern = {
    PatternElement::match(OpCode::OP_Pop),
    PatternElement::match(OpCode::OP_Pop),
};

rewriter->addAdvancedRule(pattern,
    [](const std::vector<CapturedInstruction>&) {
        std::vector<uint8_t> newCode;
        newCode.push_back(static_cast<uint8_t>(OpCode::OP_PopN));
        newCode.push_back(2);
        return newCode;
    });

Complex pattern with capture and condition:

// Detect: GET_LOCAL x, CONSTANT 1, ADD, SET_LOCAL x
std::vector<PatternElement> pattern = {
    PatternElement::match(OpCode::OP_Get_Local, true),    // Capture
    PatternElement::constant(true),                       // Capture
    PatternElement::match(OpCode::OP_Add),
    PatternElement::match(OpCode::OP_Set_Local, true),    // Capture
};

auto condition = [&chunk](const std::vector<CapturedInstruction>& captured) {
    // Verify same local variable
    if (captured[0].operands[0] != captured[2].operands[0]) {
        return false;
    }

    // Verify constant is 1
    Value constValue = chunk.constants[captured[1].getConstantIndex()];
    return IS_INT(constValue) && AS_INT(constValue) == 1;
};

auto transform = [](const std::vector<CapturedInstruction>& captured) {
    std::vector<uint8_t> newCode;
    newCode.push_back(static_cast<uint8_t>(OpCode::OP_Incr_Local));
    newCode.push_back(captured[0].operands[0]); // Local index
    return newCode;
};

rewriter->addAdvancedRule(pattern, transform, condition);

Jump Offset Handling

The rewriter automatically handles jump offset adjustments when bytecode size changes:

Tracks all jump targets before rewriting
Adjusts forward and backward jumps
Handles both short (16-bit) and long (32-bit) jumps
Converts between short/long jumps as needed
Updates loop instructions (OP_Loop)

This ensures that control flow remains correct after optimization.

Writing Custom Passes

Creating a New Pass

Inherit from BytecodePass:

class MyOptimizationPass : public BytecodePass {
public:
    std::string getName() const override {
        return "MyOptimizationPass";
    }

    bool runPass(Chunk& chunk, BytecodeRewriter* rewriter) override;

    bool changesSize() const override { return true; }
    bool requiresMultiplePasses() const override { return false; }
};

Implement the Pass Logic:

bool MyOptimizationPass::runPass(Chunk& chunk, BytecodeRewriter* rewriter) {
    // Define pattern to match
    std::vector<PatternElement> pattern = {
        // Your pattern here
    };

    // Define transformation
    auto transform = [](const std::vector<CapturedInstruction>& captured) {
        std::vector<uint8_t> newCode;
        // Generate optimized bytecode
        return newCode;
    };

    // Optional: Add condition
    auto condition = [&chunk](const std::vector<CapturedInstruction>& captured) {
        // Return true if transformation should apply
        return true;
    };

    // Register rule with rewriter
    rewriter->addAdvancedRule(pattern, transform, condition);

    // Apply rewrite rules
    return rewriter->rewrite(chunk);
}

Register the Pass:

vm->addOptimizationPass(std::make_unique<MyOptimizationPass>());

Best Practices

Pattern Design:

Keep patterns specific enough to avoid false matches
Use captures only when needed (better performance)
Leverage predefined pattern groups
Test patterns with edge cases

Transformation Safety:

Verify operand bounds before access
Check constant table indices are valid
Ensure generated bytecode is well-formed
Consider side effects of original instructions

Conditions:

Use conditions to validate semantic correctness
Check for same variable indices
Verify constant values
Ensure no live values are discarded

Multiple Passes:

Set requiresMultiplePasses() if pass can enable more optimizations
PassManager limits iterations to prevent infinite loops
Each iteration should make progress

Size Changes:

Set changesSize() if instruction count changes
Rewriter handles jump offset updates automatically
Be aware of constant table impact

Debugging Optimizations

Enable Debug Output

vm->enableOptimizationDebugging();

This prints:

Original bytecode before all passes
Bytecode after each pass
Pass names and iteration counts
Which passes made changes

Example output:

[INFO] PassManager: Running 6 passes
[INFO] PassManager: Running pass: ConstantFoldingPass

=== BYTECODE BEFORE ConstantFoldingPass ===
0000    1 OP_Constant         0 '5'
0002    1 OP_Constant         1 '3'
0004    1 OP_Add
0005    1 OP_Debug_Print

=== BYTECODE AFTER ConstantFoldingPass ===
0000    1 OP_Constant         0 '8'
0002    1 OP_Debug_Print

[INFO] PassManager: Pass ConstantFoldingPass made changes (iteration 1)

Manual Pass Execution

Run specific passes for testing:

bool changed = passManager.runPass("ConstantFoldingPass", chunk);

Disassemble Bytecode

View bytecode at any point:

disassembleChunk(vm, chunk, "After My Pass");

Verification

After optimization, verify:

All jump targets are valid
Constant indices are in bounds
Stack depth is consistent
Local variable indices are valid
Line number information preserved

Performance Impact

Optimization Overhead

Pattern matching: ~1-5ms per 10,000 instructions
Constant folding: ~10-20% bytecode size reduction
Overall optimization: ~5-10ms for typical scripts
Significant runtime performance gains (10-50% faster)

Typical Improvements

Bytecode Size:

10-30% smaller after all passes
Fewer instructions to fetch and decode
Better instruction cache utilization

Execution Speed:

10-50% faster depending on code patterns
Loops benefit most (increment optimization)
Arithmetic-heavy code sees large gains
Function call overhead unchanged

Memory Usage:

Smaller constant tables
Fewer temporary stack values
Better cache locality

When to Disable

Consider disabling optimization for:

Very short scripts (overhead > benefit)
Scripts run once (no amortization of compilation cost)
Debugging (clearer bytecode correspondence to source)
Testing (verify optimizer correctness)

Example: Complete Custom Pass

Here’s a complete example implementing a pass that optimizes x = x * 2 to a left shift:

class MultiplyByTwoPass : public BytecodePass {
public:
    std::string getName() const override {
        return "MultiplyByTwoPass";
    }

    bool runPass(Chunk& chunk, BytecodeRewriter* rewriter) override {
        if (!rewriter) return false;

        // Pattern: GET_LOCAL x, CONSTANT 2, MULTIPLY, SET_LOCAL x
        std::vector<PatternElement> pattern = {
            PatternElement::match(OpCode::OP_Get_Local, true),
            PatternElement::constant(true),
            PatternElement::match(OpCode::OP_Multiply),
            PatternElement::match(OpCode::OP_Set_Local, true),
        };

        // Condition: same local, constant is 2
        auto condition = [&chunk](const std::vector<CapturedInstruction>& captured) {
            // Same local variable?
            if (captured[0].operands[0] != captured[2].operands[0]) {
                return false;
            }

            // Constant is 2?
            Value constValue = chunk.constants[captured[1].getConstantIndex()];
            if (!IS_INT(constValue)) return false;
            return AS_INT(constValue) == 2;
        };

        // Transform: use custom left shift operation
        auto transform = [](const std::vector<CapturedInstruction>& captured) {
            std::vector<uint8_t> newCode;
            // OP_LeftShift_Local_1 <local_index>
            newCode.push_back(static_cast<uint8_t>(OpCode::OP_LeftShift_Local_1));
            newCode.push_back(captured[0].operands[0]);
            return newCode;
        };

        rewriter->addAdvancedRule(pattern, transform, condition);
        return rewriter->rewrite(chunk);
    }

    bool changesSize() const override { return true; }
    bool requiresMultiplePasses() const override { return false; }
};

// Register the pass
vm->addOptimizationPass(std::make_unique<MultiplyByTwoPass>());

Pass Execution Order

The default pass order is optimized for maximum benefit:

Simplify Constant Pass: Enable other optimizations
Constant Folding Pass: Reduce expressions
Remove Redundant Get/Define Global: Simplify global access
Basic Operator Local Indexing: Optimize arithmetic
Increment Optimization Pass: Optimize loops
Fuse OP_Pop Pass: Combine stack operations
Constant Variable Access: Propagate constants
Long Jump Optimization Pass: Fix jumps (always last)

Pass order matters because:

Early passes enable later passes
Size-changing passes affect jump offsets
Jump optimization must run last

Future Enhancements

Potential Future Optimizations

Dead Code Elimination: Remove unreachable code
Loop Invariant Code Motion: Hoist constant expressions from loops
Inline Small Functions: Eliminate call overhead
Tail Call Optimization: Convert recursion to iteration
Peephole Optimizations: More specialized instruction patterns
Global Value Numbering: Eliminate redundant computations
Register Allocation: For potential JIT compiler

Bytecode Optimizer

Overview

Architecture

PassManager

BytecodePass Interface

BytecodeRewriter

Optimization Passes

Constant Folding Pass

Increment Optimization Pass

Fuse OP_Pop Pass

Long Jump Optimization Pass

Remove Redundant Get/Define Global Pass

Simplify Constant Pass

Basic Operator Local Indexing Pass

Constant Variable Access Pass

Pattern Matching Engine

Advanced Pattern Matching

Example Pattern Matching

Jump Offset Handling

Writing Custom Passes

Creating a New Pass

Best Practices

Debugging Optimizations

Enable Debug Output

Manual Pass Execution

Disassemble Bytecode

Verification

Performance Impact

Optimization Overhead

Typical Improvements

When to Disable

Example: Complete Custom Pass

Pass Execution Order

Future Enhancements

Potential Future Optimizations

See Also

Related Files