Virtual Machine (VM) Internals

This document provides detailed information about the internal architecture of the Columba Virtual Machine, including its execution model, memory management, and optimization strategies.

Overview

The Columba VM is a stack-based bytecode interpreter that executes compiled script code. It features:

Stack-based architecture: Operations manipulate values on a runtime stack
NaN-boxing: Efficient 64-bit value representation
Pool-based memory management: Object pools with reference counting
Bytecode optimization: Multi-pass optimization pipeline
Function pointer dispatch: Fast opcode execution
Profiling support: Built-in performance analysis tools

Architecture Components

Core VM Structure

The VM maintains several key components:

Call Frames

Each function call creates a call frame that stores:

Closure reference (function + captured upvalues)
Instruction pointer (IP) - current bytecode position
Stack slots pointer - function’s local variable base
Maximum of 64 nested call frames (FRAMES_MAX)

Value Stack

A fixed-size stack (FRAMES_MAX * 4096 elements) storing all runtime values:

Function arguments and return values
Local variables
Temporary computation results
Optimized for single 64-bit MOV instructions

Global Variables

Hash map storing global variable names and their values

Pools System

Separate memory pools for each object type (see Memory Management section)

Execution Model

Bytecode Interpretation

The VM executes bytecode through a function pointer dispatch system:

Instruction Fetch: Read opcode from current IP position
Dispatch: Call registered handler function for that opcode
Execute: Handler manipulates stack and updates VM state
Advance: Move IP to next instruction
Repeat: Continue until OP_Return or error

Example bytecode execution for addition:

OP_Get_Local  0    # Push local variable 0 onto stack
OP_Get_Local  1    # Push local variable 1 onto stack
OP_Add             # Pop two values, add them, push result
OP_Set_Local  2    # Pop result and store in local variable 2

Function Calls

Function calls follow this sequence:

Argument Setup: Arguments pushed onto stack
Call Instruction: OP_Call with argument count
Frame Creation: New call frame allocated
Closure Setup: Function closure and captured upvalues loaded
IP Update: Instruction pointer set to function start
Execution: Function body executes
Return: OP_Return pops frame and restores previous context

Closure Mechanics

Closures capture variables from their enclosing scope:

Upvalues

Reference variables from parent scopes
Can be “open” (pointing to stack) or “closed” (moved to heap)
Automatically closed when stack frame exits

Capture Process:

fun outer()
{
    var x = 10;        # Local variable in outer's frame
    fun inner()
    {
        return x;      # Captures x as upvalue
    }
    return inner;
}

When inner is created:

OP_Closure instruction creates closure object
Upvalue created pointing to x on outer’s stack frame
When outer returns, upvalue is “closed” - x moved to heap
Inner function maintains reference to closed upvalue

Value Representation (NaN-Boxing)

All values are represented as 64-bit integers using NaN-boxing, allowing efficient storage and type checking.

Encoding Scheme

Doubles: Standard IEEE 754 representation (any non-NaN bit pattern)

Tagged Values: Use IEEE 754 quiet NaN space:

[Sign][0x7FF][1][unused][TAG(3)][INDEX/PAYLOAD(47)]

Sign=0: Primary types (int, bool, string, closure, function, upvalue, class, native)
Sign=1: Extended types (instance, bound_method, vector, small_string)
Bits [49:47]: 3-bit type tag
Bits [46:0]: 47-bit index or payload

Primary Types (Sign bit = 0)

Type	Tag	Payload
Integer	0	47-bit signed integer (±70 trillion)
Boolean	1	0 (false) or 1 (true)
String	2	Index into string pool
Closure	3	Index into closure pool
Function	4	Index into function pool
Upvalue	5	Index into upvalue pool
Class	6	Index into class pool
Native	7	Index into native function pool

Extended Types (Sign bit = 1)

Type	Tag	Payload
Instance	0	Index into instance pool
BoundMethod	1	Index into bound method pool
Vector	2	Index into vector pool
SmallString	3	Inline string (up to 5 characters)

Benefits of NaN-Boxing

Space Efficient: All values fit in 64 bits (8 bytes)
Fast Type Checking: Single bit mask operation
Cache Friendly: Compact representation improves cache locality
Register Passing: Values fit in CPU registers
No Boxing Overhead: No separate heap allocation for primitives

Memory Management

Pool-Based Allocation

The VM uses separate memory pools for each object type:

String Pool: ElementType objects (variable-length strings)
Closure Pool: Function + captured upvalues
Function Pool: Compiled function objects (ObjFunction)
Upvalue Pool: Upvalue objects
Class Pool: Class definitions (Klass)
Native Function Pool: Native C++ function wrappers
Instance Pool: Class instances (ObjInstance)
Bound Method Pool: Instance + method pairs
Vector Pool: Dynamic arrays (tables)

Pool Advantages

Fast Allocation: No system malloc calls for common objects
Better Cache Locality: Objects of same type stored contiguously
Reduced Fragmentation: Fixed-size blocks per pool
Efficient Iteration: Pools can be traversed linearly

Reference Counting

Heap objects use reference counting for automatic memory management:

Tracking:

Value trackNewValue(Value v)
{
    pools.getRefCountVector(v)[index] = 1;
    return v;
}

Retaining:

Value retainValue(Value v)
{
    if (!requiresRefCount(v)) return v;
    if (pools.isConstant(v)) return v;
    pools.getRefCountVector(v)[index]++;
    return v;
}

Releasing:

bool releaseValue(Value v)
{
    if (!requiresRefCount(v)) return false;
    if (pools.isConstant(v)) return false;
    pools.getRefCountVector(v)[index]--;
    return (refCount == 0); // Should delete
}

Reference Counting Rules

Constants: Never ref-counted (live forever in constant table)
Primitives: No ref-counting needed (int, bool, double)
Stack Values: Retained when pushed, released when popped
Global Variables: Retained when stored, released on VM shutdown
Upvalues: Special handling for open vs closed states

String Interning

Strings are automatically interned to avoid duplicates:

Hash table maps string content → pool index
Duplicate strings reuse existing pool entry
String comparison becomes integer comparison
Significantly reduces memory usage

Small String Optimization: Strings ≤5 characters stored inline in the Value itself, avoiding pool allocation entirely.

Bytecode Format

Instruction Set

The VM executes a compact bytecode instruction set. Each instruction consists of:

Opcode: 1 byte identifying the operation
Operands: 0-4 bytes depending on instruction

Core Instructions

Stack Operations:

OP_Constant <index>      # Push constant from constant table
OP_LongConstant <i24>    # Push constant (24-bit index)
OP_Pop                   # Pop and discard top of stack
OP_PopN <count>          # Pop N values from stack

Arithmetic:

OP_Add                   # Pop b, pop a, push a+b
OP_Subtract              # Pop b, pop a, push a-b
OP_Multiply              # Pop b, pop a, push a*b
OP_Divide                # Pop b, pop a, push a/b
OP_Negate                # Pop a, push -a

Comparison:

OP_Equal                 # Pop b, pop a, push a==b
OP_NotEqual              # Pop b, pop a, push a!=b
OP_Greater               # Pop b, pop a, push a>b
OP_GreaterEqual          # Pop b, pop a, push a>=b
OP_Less                  # Pop b, pop a, push a<b
OP_LessEqual             # Pop b, pop a, push a<=b

Logical:

OP_Not                   # Pop a, push !a
OP_And                   # Logical AND (short-circuit)
OP_Or                    # Logical OR (short-circuit)

Variables:

OP_Get_Local <index>     # Push local variable
OP_Set_Local <index>     # Pop and store in local variable
OP_Get_Global <index>    # Push global variable
OP_Set_Global <index>    # Pop and store in global variable
OP_Define_Global <idx>   # Pop and define new global variable

Control Flow:

OP_Jump <offset>         # Unconditional jump (16-bit offset)
OP_Long_Jump <offset>    # Unconditional jump (32-bit offset)
OP_Jump_If_False <off>   # Jump if top of stack is false
OP_Loop <offset>         # Jump backward (for loops)

Functions:

OP_Call <argCount>       # Call function with N arguments
OP_Return                # Return from current function
OP_Closure <funcIdx>     # Create closure from function

Classes and Objects:

OP_Class <nameIdx>       # Define new class
OP_Get_Property <name>   # Get instance property
OP_Set_Property <name>   # Set instance property
OP_Method <nameIdx>      # Define method on class
OP_Invoke <name><argc>   # Optimized method call

Metamethods

The VM supports metamethods that allow customization of property access behavior. Metamethods are special methods that intercept property operations.

Supported Metamethods:

__get(self, propertyName): Called when accessing a non-existent property. Returns the value to use.
__set(self, propertyName, value): Called when setting a property. Should return the value that was set.

Metamethod Types

Metamethods can be defined in two ways:

Class Methods: Defined on the class, available to all instances
Dynamic Metamethods: Stored as instance fields, allows per-instance customization

Implementation Details

Property Access (`OP_Get_Property`)

When accessing a property (instance.property):

Check real fields first: If property exists in instance fields, return it immediately
Check for ``__get``: Look in class methods, then instance fields
Call metamethod: If found, invoke __get(instance, propertyName)
Check class methods: Try to bind a class method
Error: If nothing found, runtime error

Property Assignment (`OP_Set_Property`)

When setting a property (instance.property = value):

Internal field bypass: Fields starting with __ bypass metamethods
Internal access detection: Direct field access from bound methods bypasses metamethods
Check for ``__set``: Look in class methods, then instance fields
Call metamethod: If found, invoke __set(instance, propertyName, value)
Direct assignment: If no metamethod, store directly in instance fields

Double-Underscore Convention

Fields starting with __ (double underscore) are treated as “internal” and always bypass metamethods. This convention prevents infinite recursion in metamethod implementations.

Example:

class ProxyClass
{
    init(x, y)
    {
        this.__data_x = x  // Direct field access, no __set called
        this.__data_y = y
    }
}

fun createProxy()
{
    return fun(self, prop, value) {
        // Store data internally without triggering __set recursion
        if (prop == "x")
        {
            self.__data_x = value  // Bypasses __set
        }
    }
}

var obj = ProxyClass(10, 20)
obj.__set = createProxy()
obj.x = 100  // Calls __set, which stores to __data_x

Internal Access Detection

The VM detects when property access occurs from within a bound method of the same instance. In this case, metamethods are bypassed to allow direct field manipulation.

Detection criteria:

Current frame is a bound method call (stackBase == slots)
The accessed instance matches slots[0] (the receiver)
Frame count > 1 (not the global script)

This allows methods to directly access their own instance fields without metamethod interception:

class MyClass {
    init() {
        this.value = 0
    }

    increment() {
        this.value = this.value + 1  // Direct access, no __set called
    }
}

Stack Layout for Dynamic Metamethods

When calling a dynamic metamethod (stored in instance field), the VM sets up the stack as:

[closure] [instance] [arg1] [arg2] ...

For __get(self, propertyName):: Stack: [__get_closure] [instance] [propertyName]
For __set(self, propertyName, value):: Stack: [__set_closure] [instance] [propertyName] [value]

This differs from class method metamethods which use bound method calls.

Index Operations

Index operations (instance[key] and instance[key] = value) also support metamethods:

OP_Get_Index: Checks for __get metamethod if key not found
OP_Set_Index: Checks for __set metamethod before assignment
Same double-underscore bypass rules apply

Performance Considerations

Real fields are checked before metamethods (O(1) hash lookup)
Metamethod lookup requires two hash lookups (class methods, then instance fields)
Internal field access (__ prefix) is fastest (single hash lookup, no metamethod check)
Bound method internal access bypasses metamethod checks entirely

Tables:

OP_Build_Table <pairs>   # Create table from N key-value pairs
OP_Build_Vector <count>  # Create vector from N values
OP_Get_Index             # table[key] - pop key, pop table, push value
OP_Set_Index             # table[key]=val - pop val, pop key, pop table

Iterators:

OP_Get_Iterator          # Create iterator for table
OP_Iterator_Next         # Advance iterator, push next key (or nil)
OP_Table_Size            # Get table size
OP_Table_At              # Get key at specific index

Optimized Instructions:

OP_Incr_Local <idx>      # ++local (pre-increment)
OP_Post_Incr_Local <idx> # local++ (post-increment)
OP_Decr_Local <idx>      # --local (pre-decrement)
OP_Post_Decr_Local <idx> # local-- (post-decrement)
OP_AddLL <i1><i2>        # Add two local variables
OP_SubtractLL <i1><i2>   # Subtract two local variables
OP_Short_Int <value>     # Push small integer directly

Chunk Structure

A Chunk represents a compiled function’s bytecode:

struct Chunk
{
    std::vector<uint8_t> code;           # Bytecode instructions
    std::vector<Value> constants;         # Constant pool
    std::vector<int> lines;               # Line numbers for debugging
    std::vector<std::string> importedModules;  # Module dependencies
};

Function Objects

Compiled functions are stored as ObjFunction objects:

struct ObjFunction
{
    Chunk chunk;             # Function's bytecode
    int arity;               # Number of parameters
    std::string name;        # Function name (for debugging)
    int upvalueCount;        # Number of captured variables
};

Optimization System

The VM includes a multi-pass bytecode optimization system that transforms bytecode after compilation but before execution.

See Bytecode Optimizer for detailed information about the optimization passes and how they work.

Profiling and Debugging

VM Profiler

The VM includes a built-in profiler to measure bytecode execution performance:

Enable profiling:

vm->enableProfiling();
// ... run script ...
vm->printProfilingReport(true);  // Sort by time
vm->printProfilingBytecodeReport();

Profiler Tracks:

Execution count per opcode
Total time spent per opcode
Time per opcode occurrence
Hottest code paths

Debug Output

Enable debug output for optimization passes:

vm->enableOptimizationDebugging();

This prints:

Bytecode before each pass
Bytecode after each pass
Changes made by each pass
Pattern matches and rewrites

Disassembler

The compiler includes a bytecode disassembler for debugging:

disassembleChunk(vm, chunk, "FunctionName");

Output shows:

Bytecode offset
Source line number
Opcode name
Operands (with constant values)
Jump targets

Performance Considerations

Hot Path Optimizations

The VM is optimized for critical hot paths:

Stack Operations

Direct array access (no bounds checking in release builds)
Inline functions for push/pop
64-bit values passed in registers
Cache-aligned stack array

Instruction Dispatch

Function pointer table for O(1) dispatch
Cached chunk data pointer
Cached chunk end pointer for fast loop exit
No switch statement overhead

Value Operations

NaN-boxing for fast type checks (single bitwise AND)
Reference counting for heap objects only
Constant values never ref-counted
Small string optimization avoids pool lookups

Memory Access

Pool-based allocation reduces cache misses
Contiguous storage for same-type objects
Reference counting vectors parallel to object pools
String interning reduces memory and comparison costs

Best Practices for VM Extension

Adding Native Functions

Register native functions during VM initialization:

vm->registerNative("myFunction", [](VM* vm, int argCount, Value* args) -> Value {
    // Validate argument count
    if (argCount != 2)
    {
        return makeBoolValue(false);
    }

    // Extract arguments
    int a = AS_INT(args[0]);
    int b = AS_INT(args[1]);

    // Perform operation
    int result = a + b;

    // Return result
    return makeIntValue(result);
});

Creating Native Modules

Define native modules with exported functions and variables:

NativeModule mathModule;

mathModule.exportedFunctions["sqrt"] = [](VM* vm, int argCount, Value* args) {
    if (argCount != 1 || !IS_DOUBLE(args[0]))
    {
        return makeIntValue(0); // Error handling
    }
    double result = std::sqrt(AS_DOUBLE(args[0]));
    return makeDoubleValue(result);
};

mathModule.exportedVariables["PI"] = ElementType(3.14159265358979323846);

vm->addNativeModule("math", mathModule);

Module System

Import Resolution

When a script uses import "moduleName":

Check Native Modules: Look for registered native module
Check File System: Look for moduleName.pg or moduleName.pgc
Load and Execute: Import module and add exports to global scope
Cache: Module loaded once, subsequent imports reuse cached version

Bytecode Caching

Compiled bytecode can be saved and loaded:

Compilation:

vm->interpretFromFile("script.pg", false, "script.pgc");
// Creates script.pgc bytecode file

Execution:

vm->interpretFromBytecodeFile("script.pgc");
// Faster loading, no parsing/compilation

Cached bytecode includes:

All bytecode instructions
Constant table
Function definitions
Imported module list

Error Handling

The VM provides detailed error information:

Compile Errors

Syntax errors with line numbers
Type mismatches
Undefined variables

Runtime Errors

Full stack trace
Line numbers for each frame
Function names
Error message

Example error output:

[ERROR] VM: Undefined variable 'foo'
[ERROR] VM: [line 15] in myFunction
[ERROR] VM: [line 23] in main

Error Recovery

Runtime errors use setjmp/longjmp for fast unwinding:

Error detected during execution
longjmp to error handler
Stack automatically unwound
All reference counts decremented
VM state reset for next execution

Advanced Topics

JIT Considerations

While the current VM is an interpreter, the architecture supports future JIT compilation:

Bytecode designed for linear execution
Pool indices can be patched for direct pointers
Function pointer dispatch can be inlined
Stack-based model maps well to register allocation

Threading Model

Each VM instance is single-threaded:

No locks needed during execution
Multiple VMs can run in separate threads
Native functions must be thread-safe
Shared data requires external synchronization

Extensions

The VM can be extended with:

Custom native modules
Additional bytecode instructions
New optimization passes
Custom object types in pools
Specialized calling conventions