Bytecode Optimizer ================== The PgEngine bytecode optimizer is a multi-pass optimization system that transforms bytecode after compilation but before execution. It performs pattern-based transformations to improve runtime performance and reduce bytecode size. Overview -------- The optimization system consists of: * **PassManager**: Orchestrates multiple optimization passes * **BytecodePass**: Base class for individual optimization passes * **BytecodeRewriter**: Pattern matching and bytecode transformation engine * **Optimization Passes**: Specific optimization implementations The optimizer runs automatically after compilation (unless disabled) and can apply multiple iterations of each pass until no more optimizations are found. Architecture ------------ PassManager ~~~~~~~~~~~ The ``PassManager`` coordinates the execution of optimization passes: **Responsibilities**: * Maintains ordered list of optimization passes * Executes passes in sequence * Handles multi-pass iterations * Provides debug output for optimization process * Manages the shared ``BytecodeRewriter`` instance **Configuration**:: // Enable/disable optimization vm->enableBytecodeOptimization(); vm->disableBytecodeOptimization(); // Enable debug output to see transformations vm->enableOptimizationDebugging(); // List all registered passes vm->listOptimizationPasses(); BytecodePass Interface ~~~~~~~~~~~~~~~~~~~~~~ All optimization passes inherit from ``BytecodePass``: .. code-block:: cpp class BytecodePass { public: virtual ~BytecodePass() = default; // Pass identification virtual std::string getName() const = 0; // Execute the optimization pass virtual bool runPass(Chunk& chunk, BytecodeRewriter* rewriter) = 0; // Indicates if pass changes bytecode size virtual bool changesSize() const { return false; } // Indicates if pass should run multiple times virtual bool requiresMultiplePasses() const { return false; } }; BytecodeRewriter ~~~~~~~~~~~~~~~~ The ``BytecodeRewriter`` provides powerful pattern matching and transformation capabilities: **Pattern Elements**: * ``match(opcode)``: Match specific opcode * ``anyOf(opcodes)``: Match any of multiple opcodes * ``wildcard()``: Match any opcode * Predefined groups: ``constant()``, ``load()``, ``store()``, ``jump()``, etc. **Advanced Features**: * Capture matched instructions for analysis * Lambda-based transformations * Conditional pattern matching * Automatic jump offset adjustment * Safe bytecode rewriting Optimization Passes ------------------- The VM includes several built-in optimization passes that run by default. Constant Folding Pass ~~~~~~~~~~~~~~~~~~~~~~ **Purpose**: Evaluate constant expressions at compile time **Name**: ``ConstantFoldingPass`` **Transformations**: 1. **Arithmetic on Constants**:: # Before OP_Constant 5 OP_Constant 3 OP_Add # After OP_Constant 8 2. **Comparison on Constants**:: # Before OP_Constant 10 OP_Constant 5 OP_Greater # After OP_True 3. **Logical Operations**:: # Before OP_True OP_False OP_And # After OP_False **Benefits**: * Reduces instruction count * Eliminates runtime computation * Smaller constant table * Better cache utilization **Properties**: * Changes bytecode size: Yes * Requires multiple passes: Yes (for nested expressions) Increment Optimization Pass ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Purpose**: Convert addition patterns to specialized increment instructions **Name**: ``IncrementOptimizationPass`` **Transformations**: **Pre-increment pattern**:: # Before OP_Get_Local 0 OP_Constant 1 OP_Add OP_Set_Local 0 OP_Pop # After OP_Incr_Local 0 **Post-increment pattern**:: # Before OP_Get_Local 0 OP_Constant 1 OP_Add OP_Set_Local 0 # After OP_Short_Int 0 OP_Post_Incr_Local 0 **Benefits**: * 5 instructions → 1-2 instructions * Faster execution (specialized opcode) * Smaller bytecode size * Common in loops Fuse OP_Pop Pass ~~~~~~~~~~~~~~~~ **Purpose**: Combine multiple consecutive pop operations **Name**: ``FuseOpPop`` **Transformations**: **Two pops**:: # Before OP_Pop OP_Pop # After OP_PopN 2 **Pop + PopN**:: # Before OP_Pop OP_PopN 3 # After OP_PopN 4 **PopN + PopN**:: # Before OP_PopN 5 OP_PopN 3 # After OP_PopN 8 **Benefits**: * Reduces instruction count * Better branch prediction * Fewer instruction fetches * Smaller bytecode **Properties**: * Changes bytecode size: Yes * Requires multiple passes: Yes (for consecutive fusions) Long Jump Optimization Pass ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Purpose**: Convert short jumps to long jumps when necessary, or long jumps to short jumps when possible **Name**: ``LongJumpOptimizationPass`` **Transformations**:: # Before (jump offset > 65535) OP_Jump 0x1234 # After OP_Long_Jump 0x00012345 # Before (long jump with offset < 65535) OP_Long_Jump 0x0000FFFF # After OP_Jump 0xFFFF **Benefits**: * Correct jumps for large functions * Smaller bytecode when long jumps unnecessary * Handles bytecode size changes from other passes Remove Redundant Get/Define Global Pass ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Purpose**: Optimize patterns where globals are defined and immediately accessed **Name**: ``RemoveDefGetGlobalRedundancy`` **Transformations**:: # Before OP_Constant OP_Define_Global OP_Get_Global # After OP_Constant OP_Define_Global_Non_Popping **Benefits**: * Eliminates redundant global lookup * One less instruction * Value stays on stack Simplify Constant Pass ~~~~~~~~~~~~~~~~~~~~~~~ **Purpose**: Replace constant access patterns with optimized variants **Name**: ``SimplifyConstantPass`` **Transformations**: **Small integers**:: # Before OP_Constant # After OP_Short_Int **Constant globals**:: # Before OP_Constant OP_Define_Global # After OP_Define_Constant_Global **Benefits**: * Smaller bytecode * Faster execution (no constant table lookup) * Reduced constant pool pressure Basic Operator Local Indexing Pass ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Purpose**: Optimize binary operations on local variables **Name**: ``BasicOperatorLocalIndexing`` **Transformations**: **Addition**:: # Before OP_Get_Local 0 OP_Get_Local 1 OP_Add # After OP_AddLL 0 1 **Subtraction**:: # Before OP_Get_Local 0 OP_Get_Local 1 OP_Subtract # After OP_SubtractLL 0 1 **Mixed constant/local**:: # Before OP_Get_Local 0 OP_Constant 5 OP_Subtract # After OP_SubtractLC 0 5 **Benefits**: * 3 instructions → 1 instruction * Direct register-style operations * Eliminates stack manipulation * Significant performance gain in tight loops Constant Variable Access Pass ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Purpose**: Optimize access to variables holding constant values **Name**: ``ConstantVarAccess`` **Transformations**: Detects variables that are assigned constant values and never modified, then propagates those constants directly. Pattern Matching Engine ----------------------- Advanced Pattern Matching ~~~~~~~~~~~~~~~~~~~~~~~~~~ The ``BytecodeRewriter`` provides a powerful pattern matching API: **Pattern Elements**: .. code-block:: cpp // Match specific opcode PatternElement::match(OpCode::OP_Add) // Match with capture PatternElement::match(OpCode::OP_Constant, true) // Match any of multiple opcodes PatternElement::anyOf({OP_Add, OP_Subtract, OP_Multiply}) // Predefined groups PatternElement::constant() // OP_Constant or OP_LongConstant PatternElement::load() // OP_Get_Local or OP_Get_Global PatternElement::store() // OP_Set_Local or OP_Set_Global PatternElement::jump() // Any jump instruction PatternElement::increment() // Any increment instruction **Captured Instructions**: Captured instructions provide access to: * Opcode * Operands (raw bytes) * Bytecode offset * Helper methods (e.g., ``getConstantIndex()``) Example Pattern Matching ~~~~~~~~~~~~~~~~~~~~~~~~~ **Simple pattern**: .. code-block:: cpp std::vector pattern = { PatternElement::match(OpCode::OP_Pop), PatternElement::match(OpCode::OP_Pop), }; rewriter->addAdvancedRule(pattern, [](const std::vector&) { std::vector newCode; newCode.push_back(static_cast(OpCode::OP_PopN)); newCode.push_back(2); return newCode; }); **Complex pattern with capture and condition**: .. code-block:: cpp // Detect: GET_LOCAL x, CONSTANT 1, ADD, SET_LOCAL x std::vector pattern = { PatternElement::match(OpCode::OP_Get_Local, true), // Capture PatternElement::constant(true), // Capture PatternElement::match(OpCode::OP_Add), PatternElement::match(OpCode::OP_Set_Local, true), // Capture }; auto condition = [&chunk](const std::vector& captured) { // Verify same local variable if (captured[0].operands[0] != captured[2].operands[0]) { return false; } // Verify constant is 1 Value constValue = chunk.constants[captured[1].getConstantIndex()]; return IS_INT(constValue) && AS_INT(constValue) == 1; }; auto transform = [](const std::vector& captured) { std::vector newCode; newCode.push_back(static_cast(OpCode::OP_Incr_Local)); newCode.push_back(captured[0].operands[0]); // Local index return newCode; }; rewriter->addAdvancedRule(pattern, transform, condition); Jump Offset Handling ~~~~~~~~~~~~~~~~~~~~ The rewriter automatically handles jump offset adjustments when bytecode size changes: * Tracks all jump targets before rewriting * Adjusts forward and backward jumps * Handles both short (16-bit) and long (32-bit) jumps * Converts between short/long jumps as needed * Updates loop instructions (``OP_Loop``) This ensures that control flow remains correct after optimization. Writing Custom Passes ---------------------- Creating a New Pass ~~~~~~~~~~~~~~~~~~~ 1. **Inherit from BytecodePass**: .. code-block:: cpp class MyOptimizationPass : public BytecodePass { public: std::string getName() const override { return "MyOptimizationPass"; } bool runPass(Chunk& chunk, BytecodeRewriter* rewriter) override; bool changesSize() const override { return true; } bool requiresMultiplePasses() const override { return false; } }; 2. **Implement the Pass Logic**: .. code-block:: cpp bool MyOptimizationPass::runPass(Chunk& chunk, BytecodeRewriter* rewriter) { // Define pattern to match std::vector pattern = { // Your pattern here }; // Define transformation auto transform = [](const std::vector& captured) { std::vector newCode; // Generate optimized bytecode return newCode; }; // Optional: Add condition auto condition = [&chunk](const std::vector& captured) { // Return true if transformation should apply return true; }; // Register rule with rewriter rewriter->addAdvancedRule(pattern, transform, condition); // Apply rewrite rules return rewriter->rewrite(chunk); } 3. **Register the Pass**: .. code-block:: cpp vm->addOptimizationPass(std::make_unique()); Best Practices ~~~~~~~~~~~~~~ **Pattern Design**: * Keep patterns specific enough to avoid false matches * Use captures only when needed (better performance) * Leverage predefined pattern groups * Test patterns with edge cases **Transformation Safety**: * Verify operand bounds before access * Check constant table indices are valid * Ensure generated bytecode is well-formed * Consider side effects of original instructions **Conditions**: * Use conditions to validate semantic correctness * Check for same variable indices * Verify constant values * Ensure no live values are discarded **Multiple Passes**: * Set ``requiresMultiplePasses()`` if pass can enable more optimizations * PassManager limits iterations to prevent infinite loops * Each iteration should make progress **Size Changes**: * Set ``changesSize()`` if instruction count changes * Rewriter handles jump offset updates automatically * Be aware of constant table impact Debugging Optimizations ----------------------- Enable Debug Output ~~~~~~~~~~~~~~~~~~~ .. code-block:: cpp vm->enableOptimizationDebugging(); This prints: * Original bytecode before all passes * Bytecode after each pass * Pass names and iteration counts * Which passes made changes Example output:: [INFO] PassManager: Running 6 passes [INFO] PassManager: Running pass: ConstantFoldingPass === BYTECODE BEFORE ConstantFoldingPass === 0000 1 OP_Constant 0 '5' 0002 1 OP_Constant 1 '3' 0004 1 OP_Add 0005 1 OP_Debug_Print === BYTECODE AFTER ConstantFoldingPass === 0000 1 OP_Constant 0 '8' 0002 1 OP_Debug_Print [INFO] PassManager: Pass ConstantFoldingPass made changes (iteration 1) Manual Pass Execution ~~~~~~~~~~~~~~~~~~~~~ Run specific passes for testing: .. code-block:: cpp bool changed = passManager.runPass("ConstantFoldingPass", chunk); Disassemble Bytecode ~~~~~~~~~~~~~~~~~~~~ View bytecode at any point: .. code-block:: cpp disassembleChunk(vm, chunk, "After My Pass"); Verification ~~~~~~~~~~~~ After optimization, verify: * All jump targets are valid * Constant indices are in bounds * Stack depth is consistent * Local variable indices are valid * Line number information preserved Performance Impact ------------------ Optimization Overhead ~~~~~~~~~~~~~~~~~~~~~ * Pattern matching: ~1-5ms per 10,000 instructions * Constant folding: ~10-20% bytecode size reduction * Overall optimization: ~5-10ms for typical scripts * Significant runtime performance gains (10-50% faster) Typical Improvements ~~~~~~~~~~~~~~~~~~~~ **Bytecode Size**: * 10-30% smaller after all passes * Fewer instructions to fetch and decode * Better instruction cache utilization **Execution Speed**: * 10-50% faster depending on code patterns * Loops benefit most (increment optimization) * Arithmetic-heavy code sees large gains * Function call overhead unchanged **Memory Usage**: * Smaller constant tables * Fewer temporary stack values * Better cache locality When to Disable ~~~~~~~~~~~~~~~ Consider disabling optimization for: * Very short scripts (overhead > benefit) * Scripts run once (no amortization of compilation cost) * Debugging (clearer bytecode correspondence to source) * Testing (verify optimizer correctness) Example: Complete Custom Pass ------------------------------ Here's a complete example implementing a pass that optimizes ``x = x * 2`` to a left shift: .. code-block:: cpp class MultiplyByTwoPass : public BytecodePass { public: std::string getName() const override { return "MultiplyByTwoPass"; } bool runPass(Chunk& chunk, BytecodeRewriter* rewriter) override { if (!rewriter) return false; // Pattern: GET_LOCAL x, CONSTANT 2, MULTIPLY, SET_LOCAL x std::vector pattern = { PatternElement::match(OpCode::OP_Get_Local, true), PatternElement::constant(true), PatternElement::match(OpCode::OP_Multiply), PatternElement::match(OpCode::OP_Set_Local, true), }; // Condition: same local, constant is 2 auto condition = [&chunk](const std::vector& captured) { // Same local variable? if (captured[0].operands[0] != captured[2].operands[0]) { return false; } // Constant is 2? Value constValue = chunk.constants[captured[1].getConstantIndex()]; if (!IS_INT(constValue)) return false; return AS_INT(constValue) == 2; }; // Transform: use custom left shift operation auto transform = [](const std::vector& captured) { std::vector newCode; // OP_LeftShift_Local_1 newCode.push_back(static_cast(OpCode::OP_LeftShift_Local_1)); newCode.push_back(captured[0].operands[0]); return newCode; }; rewriter->addAdvancedRule(pattern, transform, condition); return rewriter->rewrite(chunk); } bool changesSize() const override { return true; } bool requiresMultiplePasses() const override { return false; } }; // Register the pass vm->addOptimizationPass(std::make_unique()); Pass Execution Order -------------------- The default pass order is optimized for maximum benefit: 1. **Simplify Constant Pass**: Enable other optimizations 2. **Constant Folding Pass**: Reduce expressions 3. **Remove Redundant Get/Define Global**: Simplify global access 4. **Basic Operator Local Indexing**: Optimize arithmetic 5. **Increment Optimization Pass**: Optimize loops 6. **Fuse OP_Pop Pass**: Combine stack operations 7. **Constant Variable Access**: Propagate constants 8. **Long Jump Optimization Pass**: Fix jumps (always last) Pass order matters because: * Early passes enable later passes * Size-changing passes affect jump offsets * Jump optimization must run last Future Enhancements ------------------- Potential Future Optimizations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * **Dead Code Elimination**: Remove unreachable code * **Loop Invariant Code Motion**: Hoist constant expressions from loops * **Inline Small Functions**: Eliminate call overhead * **Tail Call Optimization**: Convert recursion to iteration * **Peephole Optimizations**: More specialized instruction patterns * **Global Value Numbering**: Eliminate redundant computations * **Register Allocation**: For potential JIT compiler See Also -------- * :doc:`vm_internals` - Detailed VM architecture * :doc:`api` - Script language API reference * :doc:`overview` - Language overview Related Files ------------- Source code locations: * Pass manager: `bytecode_pass.cpp `_ * Pass interface: `bytecode_pass.h `_ * Bytecode rewriter: `bytecode_rewriter.cpp `_ * Rewriter header: `bytecode_rewriter.h `_ * Optimization passes directory: `pass/ `_