Bytecode Optimizer

The PgEngine bytecode optimizer is a multi-pass optimization system that transforms bytecode after compilation but before execution. It performs pattern-based transformations to improve runtime performance and reduce bytecode size.

Overview

The optimization system consists of:

  • PassManager: Orchestrates multiple optimization passes

  • BytecodePass: Base class for individual optimization passes

  • BytecodeRewriter: Pattern matching and bytecode transformation engine

  • Optimization Passes: Specific optimization implementations

The optimizer runs automatically after compilation (unless disabled) and can apply multiple iterations of each pass until no more optimizations are found.

Architecture

PassManager

The PassManager coordinates the execution of optimization passes:

Responsibilities:

  • Maintains ordered list of optimization passes

  • Executes passes in sequence

  • Handles multi-pass iterations

  • Provides debug output for optimization process

  • Manages the shared BytecodeRewriter instance

Configuration:

// Enable/disable optimization
vm->enableBytecodeOptimization();
vm->disableBytecodeOptimization();

// Enable debug output to see transformations
vm->enableOptimizationDebugging();

// List all registered passes
vm->listOptimizationPasses();

BytecodePass Interface

All optimization passes inherit from BytecodePass:

class BytecodePass {
public:
    virtual ~BytecodePass() = default;

    // Pass identification
    virtual std::string getName() const = 0;

    // Execute the optimization pass
    virtual bool runPass(Chunk& chunk, BytecodeRewriter* rewriter) = 0;

    // Indicates if pass changes bytecode size
    virtual bool changesSize() const { return false; }

    // Indicates if pass should run multiple times
    virtual bool requiresMultiplePasses() const { return false; }
};

BytecodeRewriter

The BytecodeRewriter provides powerful pattern matching and transformation capabilities:

Pattern Elements:

  • match(opcode): Match specific opcode

  • anyOf(opcodes): Match any of multiple opcodes

  • wildcard(): Match any opcode

  • Predefined groups: constant(), load(), store(), jump(), etc.

Advanced Features:

  • Capture matched instructions for analysis

  • Lambda-based transformations

  • Conditional pattern matching

  • Automatic jump offset adjustment

  • Safe bytecode rewriting

Optimization Passes

The VM includes several built-in optimization passes that run by default.

Constant Folding Pass

Purpose: Evaluate constant expressions at compile time

Name: ConstantFoldingPass

Transformations:

  1. Arithmetic on Constants:

    # Before
    OP_Constant 5
    OP_Constant 3
    OP_Add
    
    # After
    OP_Constant 8
    
  2. Comparison on Constants:

    # Before
    OP_Constant 10
    OP_Constant 5
    OP_Greater
    
    # After
    OP_True
    
  3. Logical Operations:

    # Before
    OP_True
    OP_False
    OP_And
    
    # After
    OP_False
    

Benefits:

  • Reduces instruction count

  • Eliminates runtime computation

  • Smaller constant table

  • Better cache utilization

Properties:

  • Changes bytecode size: Yes

  • Requires multiple passes: Yes (for nested expressions)

Increment Optimization Pass

Purpose: Convert addition patterns to specialized increment instructions

Name: IncrementOptimizationPass

Transformations:

Pre-increment pattern:

# Before
OP_Get_Local 0
OP_Constant 1
OP_Add
OP_Set_Local 0
OP_Pop

# After
OP_Incr_Local 0

Post-increment pattern:

# Before
OP_Get_Local 0
OP_Constant 1
OP_Add
OP_Set_Local 0

# After
OP_Short_Int 0
OP_Post_Incr_Local 0

Benefits:

  • 5 instructions → 1-2 instructions

  • Faster execution (specialized opcode)

  • Smaller bytecode size

  • Common in loops

Fuse OP_Pop Pass

Purpose: Combine multiple consecutive pop operations

Name: FuseOpPop

Transformations:

Two pops:

# Before
OP_Pop
OP_Pop

# After
OP_PopN 2

Pop + PopN:

# Before
OP_Pop
OP_PopN 3

# After
OP_PopN 4

PopN + PopN:

# Before
OP_PopN 5
OP_PopN 3

# After
OP_PopN 8

Benefits:

  • Reduces instruction count

  • Better branch prediction

  • Fewer instruction fetches

  • Smaller bytecode

Properties:

  • Changes bytecode size: Yes

  • Requires multiple passes: Yes (for consecutive fusions)

Long Jump Optimization Pass

Purpose: Convert short jumps to long jumps when necessary, or long jumps to short jumps when possible

Name: LongJumpOptimizationPass

Transformations:

# Before (jump offset > 65535)
OP_Jump 0x1234

# After
OP_Long_Jump 0x00012345

# Before (long jump with offset < 65535)
OP_Long_Jump 0x0000FFFF

# After
OP_Jump 0xFFFF

Benefits:

  • Correct jumps for large functions

  • Smaller bytecode when long jumps unnecessary

  • Handles bytecode size changes from other passes

Remove Redundant Get/Define Global Pass

Purpose: Optimize patterns where globals are defined and immediately accessed

Name: RemoveDefGetGlobalRedundancy

Transformations:

# Before
OP_Constant <value>
OP_Define_Global <name>
OP_Get_Global <name>

# After
OP_Constant <value>
OP_Define_Global_Non_Popping <name>

Benefits:

  • Eliminates redundant global lookup

  • One less instruction

  • Value stays on stack

Simplify Constant Pass

Purpose: Replace constant access patterns with optimized variants

Name: SimplifyConstantPass

Transformations:

Small integers:

# Before
OP_Constant <index of integer 0-255>

# After
OP_Short_Int <value>

Constant globals:

# Before
OP_Constant <name>
OP_Define_Global

# After
OP_Define_Constant_Global <name>

Benefits:

  • Smaller bytecode

  • Faster execution (no constant table lookup)

  • Reduced constant pool pressure

Basic Operator Local Indexing Pass

Purpose: Optimize binary operations on local variables

Name: BasicOperatorLocalIndexing

Transformations:

Addition:

# Before
OP_Get_Local 0
OP_Get_Local 1
OP_Add

# After
OP_AddLL 0 1

Subtraction:

# Before
OP_Get_Local 0
OP_Get_Local 1
OP_Subtract

# After
OP_SubtractLL 0 1

Mixed constant/local:

# Before
OP_Get_Local 0
OP_Constant 5
OP_Subtract

# After
OP_SubtractLC 0 5

Benefits:

  • 3 instructions → 1 instruction

  • Direct register-style operations

  • Eliminates stack manipulation

  • Significant performance gain in tight loops

Constant Variable Access Pass

Purpose: Optimize access to variables holding constant values

Name: ConstantVarAccess

Transformations:

Detects variables that are assigned constant values and never modified, then propagates those constants directly.

Pattern Matching Engine

Advanced Pattern Matching

The BytecodeRewriter provides a powerful pattern matching API:

Pattern Elements:

// Match specific opcode
PatternElement::match(OpCode::OP_Add)

// Match with capture
PatternElement::match(OpCode::OP_Constant, true)

// Match any of multiple opcodes
PatternElement::anyOf({OP_Add, OP_Subtract, OP_Multiply})

// Predefined groups
PatternElement::constant()      // OP_Constant or OP_LongConstant
PatternElement::load()          // OP_Get_Local or OP_Get_Global
PatternElement::store()         // OP_Set_Local or OP_Set_Global
PatternElement::jump()          // Any jump instruction
PatternElement::increment()     // Any increment instruction

Captured Instructions:

Captured instructions provide access to:

  • Opcode

  • Operands (raw bytes)

  • Bytecode offset

  • Helper methods (e.g., getConstantIndex())

Example Pattern Matching

Simple pattern:

std::vector<PatternElement> pattern = {
    PatternElement::match(OpCode::OP_Pop),
    PatternElement::match(OpCode::OP_Pop),
};

rewriter->addAdvancedRule(pattern,
    [](const std::vector<CapturedInstruction>&) {
        std::vector<uint8_t> newCode;
        newCode.push_back(static_cast<uint8_t>(OpCode::OP_PopN));
        newCode.push_back(2);
        return newCode;
    });

Complex pattern with capture and condition:

// Detect: GET_LOCAL x, CONSTANT 1, ADD, SET_LOCAL x
std::vector<PatternElement> pattern = {
    PatternElement::match(OpCode::OP_Get_Local, true),    // Capture
    PatternElement::constant(true),                       // Capture
    PatternElement::match(OpCode::OP_Add),
    PatternElement::match(OpCode::OP_Set_Local, true),    // Capture
};

auto condition = [&chunk](const std::vector<CapturedInstruction>& captured) {
    // Verify same local variable
    if (captured[0].operands[0] != captured[2].operands[0]) {
        return false;
    }

    // Verify constant is 1
    Value constValue = chunk.constants[captured[1].getConstantIndex()];
    return IS_INT(constValue) && AS_INT(constValue) == 1;
};

auto transform = [](const std::vector<CapturedInstruction>& captured) {
    std::vector<uint8_t> newCode;
    newCode.push_back(static_cast<uint8_t>(OpCode::OP_Incr_Local));
    newCode.push_back(captured[0].operands[0]); // Local index
    return newCode;
};

rewriter->addAdvancedRule(pattern, transform, condition);

Jump Offset Handling

The rewriter automatically handles jump offset adjustments when bytecode size changes:

  • Tracks all jump targets before rewriting

  • Adjusts forward and backward jumps

  • Handles both short (16-bit) and long (32-bit) jumps

  • Converts between short/long jumps as needed

  • Updates loop instructions (OP_Loop)

This ensures that control flow remains correct after optimization.

Writing Custom Passes

Creating a New Pass

  1. Inherit from BytecodePass:

class MyOptimizationPass : public BytecodePass {
public:
    std::string getName() const override {
        return "MyOptimizationPass";
    }

    bool runPass(Chunk& chunk, BytecodeRewriter* rewriter) override;

    bool changesSize() const override { return true; }
    bool requiresMultiplePasses() const override { return false; }
};
  1. Implement the Pass Logic:

bool MyOptimizationPass::runPass(Chunk& chunk, BytecodeRewriter* rewriter) {
    // Define pattern to match
    std::vector<PatternElement> pattern = {
        // Your pattern here
    };

    // Define transformation
    auto transform = [](const std::vector<CapturedInstruction>& captured) {
        std::vector<uint8_t> newCode;
        // Generate optimized bytecode
        return newCode;
    };

    // Optional: Add condition
    auto condition = [&chunk](const std::vector<CapturedInstruction>& captured) {
        // Return true if transformation should apply
        return true;
    };

    // Register rule with rewriter
    rewriter->addAdvancedRule(pattern, transform, condition);

    // Apply rewrite rules
    return rewriter->rewrite(chunk);
}
  1. Register the Pass:

vm->addOptimizationPass(std::make_unique<MyOptimizationPass>());

Best Practices

Pattern Design:

  • Keep patterns specific enough to avoid false matches

  • Use captures only when needed (better performance)

  • Leverage predefined pattern groups

  • Test patterns with edge cases

Transformation Safety:

  • Verify operand bounds before access

  • Check constant table indices are valid

  • Ensure generated bytecode is well-formed

  • Consider side effects of original instructions

Conditions:

  • Use conditions to validate semantic correctness

  • Check for same variable indices

  • Verify constant values

  • Ensure no live values are discarded

Multiple Passes:

  • Set requiresMultiplePasses() if pass can enable more optimizations

  • PassManager limits iterations to prevent infinite loops

  • Each iteration should make progress

Size Changes:

  • Set changesSize() if instruction count changes

  • Rewriter handles jump offset updates automatically

  • Be aware of constant table impact

Debugging Optimizations

Enable Debug Output

vm->enableOptimizationDebugging();

This prints:

  • Original bytecode before all passes

  • Bytecode after each pass

  • Pass names and iteration counts

  • Which passes made changes

Example output:

[INFO] PassManager: Running 6 passes
[INFO] PassManager: Running pass: ConstantFoldingPass

=== BYTECODE BEFORE ConstantFoldingPass ===
0000    1 OP_Constant         0 '5'
0002    1 OP_Constant         1 '3'
0004    1 OP_Add
0005    1 OP_Debug_Print

=== BYTECODE AFTER ConstantFoldingPass ===
0000    1 OP_Constant         0 '8'
0002    1 OP_Debug_Print

[INFO] PassManager: Pass ConstantFoldingPass made changes (iteration 1)

Manual Pass Execution

Run specific passes for testing:

bool changed = passManager.runPass("ConstantFoldingPass", chunk);

Disassemble Bytecode

View bytecode at any point:

disassembleChunk(vm, chunk, "After My Pass");

Verification

After optimization, verify:

  • All jump targets are valid

  • Constant indices are in bounds

  • Stack depth is consistent

  • Local variable indices are valid

  • Line number information preserved

Performance Impact

Optimization Overhead

  • Pattern matching: ~1-5ms per 10,000 instructions

  • Constant folding: ~10-20% bytecode size reduction

  • Overall optimization: ~5-10ms for typical scripts

  • Significant runtime performance gains (10-50% faster)

Typical Improvements

Bytecode Size:

  • 10-30% smaller after all passes

  • Fewer instructions to fetch and decode

  • Better instruction cache utilization

Execution Speed:

  • 10-50% faster depending on code patterns

  • Loops benefit most (increment optimization)

  • Arithmetic-heavy code sees large gains

  • Function call overhead unchanged

Memory Usage:

  • Smaller constant tables

  • Fewer temporary stack values

  • Better cache locality

When to Disable

Consider disabling optimization for:

  • Very short scripts (overhead > benefit)

  • Scripts run once (no amortization of compilation cost)

  • Debugging (clearer bytecode correspondence to source)

  • Testing (verify optimizer correctness)

Example: Complete Custom Pass

Here’s a complete example implementing a pass that optimizes x = x * 2 to a left shift:

class MultiplyByTwoPass : public BytecodePass {
public:
    std::string getName() const override {
        return "MultiplyByTwoPass";
    }

    bool runPass(Chunk& chunk, BytecodeRewriter* rewriter) override {
        if (!rewriter) return false;

        // Pattern: GET_LOCAL x, CONSTANT 2, MULTIPLY, SET_LOCAL x
        std::vector<PatternElement> pattern = {
            PatternElement::match(OpCode::OP_Get_Local, true),
            PatternElement::constant(true),
            PatternElement::match(OpCode::OP_Multiply),
            PatternElement::match(OpCode::OP_Set_Local, true),
        };

        // Condition: same local, constant is 2
        auto condition = [&chunk](const std::vector<CapturedInstruction>& captured) {
            // Same local variable?
            if (captured[0].operands[0] != captured[2].operands[0]) {
                return false;
            }

            // Constant is 2?
            Value constValue = chunk.constants[captured[1].getConstantIndex()];
            if (!IS_INT(constValue)) return false;
            return AS_INT(constValue) == 2;
        };

        // Transform: use custom left shift operation
        auto transform = [](const std::vector<CapturedInstruction>& captured) {
            std::vector<uint8_t> newCode;
            // OP_LeftShift_Local_1 <local_index>
            newCode.push_back(static_cast<uint8_t>(OpCode::OP_LeftShift_Local_1));
            newCode.push_back(captured[0].operands[0]);
            return newCode;
        };

        rewriter->addAdvancedRule(pattern, transform, condition);
        return rewriter->rewrite(chunk);
    }

    bool changesSize() const override { return true; }
    bool requiresMultiplePasses() const override { return false; }
};

// Register the pass
vm->addOptimizationPass(std::make_unique<MultiplyByTwoPass>());

Pass Execution Order

The default pass order is optimized for maximum benefit:

  1. Simplify Constant Pass: Enable other optimizations

  2. Constant Folding Pass: Reduce expressions

  3. Remove Redundant Get/Define Global: Simplify global access

  4. Basic Operator Local Indexing: Optimize arithmetic

  5. Increment Optimization Pass: Optimize loops

  6. Fuse OP_Pop Pass: Combine stack operations

  7. Constant Variable Access: Propagate constants

  8. Long Jump Optimization Pass: Fix jumps (always last)

Pass order matters because:

  • Early passes enable later passes

  • Size-changing passes affect jump offsets

  • Jump optimization must run last

Future Enhancements

Potential Future Optimizations

  • Dead Code Elimination: Remove unreachable code

  • Loop Invariant Code Motion: Hoist constant expressions from loops

  • Inline Small Functions: Eliminate call overhead

  • Tail Call Optimization: Convert recursion to iteration

  • Peephole Optimizations: More specialized instruction patterns

  • Global Value Numbering: Eliminate redundant computations

  • Register Allocation: For potential JIT compiler

See Also