Virtual Machine (VM) Internals
This document provides detailed information about the internal architecture of the Columba Virtual Machine, including its execution model, memory management, and optimization strategies.
Overview
The Columba VM is a stack-based bytecode interpreter that executes compiled script code. It features:
Stack-based architecture: Operations manipulate values on a runtime stack
NaN-boxing: Efficient 64-bit value representation
Pool-based memory management: Object pools with reference counting
Bytecode optimization: Multi-pass optimization pipeline
Function pointer dispatch: Fast opcode execution
Profiling support: Built-in performance analysis tools
Architecture Components
Core VM Structure
The VM maintains several key components:
- Call Frames
Each function call creates a call frame that stores:
Closure reference (function + captured upvalues)
Instruction pointer (IP) - current bytecode position
Stack slots pointer - function’s local variable base
Maximum of 64 nested call frames (
FRAMES_MAX)
- Value Stack
A fixed-size stack (
FRAMES_MAX * 4096elements) storing all runtime values:Function arguments and return values
Local variables
Temporary computation results
Optimized for single 64-bit MOV instructions
- Global Variables
Hash map storing global variable names and their values
- Pools System
Separate memory pools for each object type (see Memory Management section)
Execution Model
Bytecode Interpretation
The VM executes bytecode through a function pointer dispatch system:
Instruction Fetch: Read opcode from current IP position
Dispatch: Call registered handler function for that opcode
Execute: Handler manipulates stack and updates VM state
Advance: Move IP to next instruction
Repeat: Continue until
OP_Returnor error
Example bytecode execution for addition:
OP_Get_Local 0 # Push local variable 0 onto stack
OP_Get_Local 1 # Push local variable 1 onto stack
OP_Add # Pop two values, add them, push result
OP_Set_Local 2 # Pop result and store in local variable 2
Function Calls
Function calls follow this sequence:
Argument Setup: Arguments pushed onto stack
Call Instruction:
OP_Callwith argument countFrame Creation: New call frame allocated
Closure Setup: Function closure and captured upvalues loaded
IP Update: Instruction pointer set to function start
Execution: Function body executes
Return:
OP_Returnpops frame and restores previous context
Closure Mechanics
Closures capture variables from their enclosing scope:
- Upvalues
Reference variables from parent scopes
Can be “open” (pointing to stack) or “closed” (moved to heap)
Automatically closed when stack frame exits
Capture Process:
fun outer()
{
var x = 10; # Local variable in outer's frame
fun inner()
{
return x; # Captures x as upvalue
}
return inner;
}
When inner is created:
OP_Closureinstruction creates closure objectUpvalue created pointing to
xon outer’s stack frameWhen outer returns, upvalue is “closed” -
xmoved to heapInner function maintains reference to closed upvalue
Value Representation (NaN-Boxing)
All values are represented as 64-bit integers using NaN-boxing, allowing efficient storage and type checking.
Encoding Scheme
Doubles: Standard IEEE 754 representation (any non-NaN bit pattern)
Tagged Values: Use IEEE 754 quiet NaN space:
[Sign][0x7FF][1][unused][TAG(3)][INDEX/PAYLOAD(47)] Sign=0: Primary types (int, bool, string, closure, function, upvalue, class, native) Sign=1: Extended types (instance, bound_method, vector, small_string) Bits [49:47]: 3-bit type tag Bits [46:0]: 47-bit index or payload
Primary Types (Sign bit = 0)
Type |
Tag |
Payload |
|---|---|---|
Integer |
0 |
47-bit signed integer (±70 trillion) |
Boolean |
1 |
0 (false) or 1 (true) |
String |
2 |
Index into string pool |
Closure |
3 |
Index into closure pool |
Function |
4 |
Index into function pool |
Upvalue |
5 |
Index into upvalue pool |
Class |
6 |
Index into class pool |
Native |
7 |
Index into native function pool |
Extended Types (Sign bit = 1)
Type |
Tag |
Payload |
|---|---|---|
Instance |
0 |
Index into instance pool |
BoundMethod |
1 |
Index into bound method pool |
Vector |
2 |
Index into vector pool |
SmallString |
3 |
Inline string (up to 5 characters) |
Benefits of NaN-Boxing
Space Efficient: All values fit in 64 bits (8 bytes)
Fast Type Checking: Single bit mask operation
Cache Friendly: Compact representation improves cache locality
Register Passing: Values fit in CPU registers
No Boxing Overhead: No separate heap allocation for primitives
Memory Management
Pool-Based Allocation
The VM uses separate memory pools for each object type:
String Pool:
ElementTypeobjects (variable-length strings)Closure Pool: Function + captured upvalues
Function Pool: Compiled function objects (
ObjFunction)Upvalue Pool: Upvalue objects
Class Pool: Class definitions (
Klass)Native Function Pool: Native C++ function wrappers
Instance Pool: Class instances (
ObjInstance)Bound Method Pool: Instance + method pairs
Vector Pool: Dynamic arrays (tables)
Pool Advantages
Fast Allocation: No system malloc calls for common objects
Better Cache Locality: Objects of same type stored contiguously
Reduced Fragmentation: Fixed-size blocks per pool
Efficient Iteration: Pools can be traversed linearly
Reference Counting
Heap objects use reference counting for automatic memory management:
Tracking:
Value trackNewValue(Value v)
{
pools.getRefCountVector(v)[index] = 1;
return v;
}
Retaining:
Value retainValue(Value v)
{
if (!requiresRefCount(v)) return v;
if (pools.isConstant(v)) return v;
pools.getRefCountVector(v)[index]++;
return v;
}
Releasing:
bool releaseValue(Value v)
{
if (!requiresRefCount(v)) return false;
if (pools.isConstant(v)) return false;
pools.getRefCountVector(v)[index]--;
return (refCount == 0); // Should delete
}
Reference Counting Rules
Constants: Never ref-counted (live forever in constant table)
Primitives: No ref-counting needed (int, bool, double)
Stack Values: Retained when pushed, released when popped
Global Variables: Retained when stored, released on VM shutdown
Upvalues: Special handling for open vs closed states
String Interning
Strings are automatically interned to avoid duplicates:
Hash table maps string content → pool index
Duplicate strings reuse existing pool entry
String comparison becomes integer comparison
Significantly reduces memory usage
Small String Optimization: Strings ≤5 characters stored inline in the Value itself, avoiding pool allocation entirely.
Bytecode Format
Instruction Set
The VM executes a compact bytecode instruction set. Each instruction consists of:
Opcode: 1 byte identifying the operation
Operands: 0-4 bytes depending on instruction
Core Instructions
Stack Operations:
OP_Constant <index> # Push constant from constant table
OP_LongConstant <i24> # Push constant (24-bit index)
OP_Pop # Pop and discard top of stack
OP_PopN <count> # Pop N values from stack
Arithmetic:
OP_Add # Pop b, pop a, push a+b
OP_Subtract # Pop b, pop a, push a-b
OP_Multiply # Pop b, pop a, push a*b
OP_Divide # Pop b, pop a, push a/b
OP_Negate # Pop a, push -a
Comparison:
OP_Equal # Pop b, pop a, push a==b
OP_NotEqual # Pop b, pop a, push a!=b
OP_Greater # Pop b, pop a, push a>b
OP_GreaterEqual # Pop b, pop a, push a>=b
OP_Less # Pop b, pop a, push a<b
OP_LessEqual # Pop b, pop a, push a<=b
Logical:
OP_Not # Pop a, push !a
OP_And # Logical AND (short-circuit)
OP_Or # Logical OR (short-circuit)
Variables:
OP_Get_Local <index> # Push local variable
OP_Set_Local <index> # Pop and store in local variable
OP_Get_Global <index> # Push global variable
OP_Set_Global <index> # Pop and store in global variable
OP_Define_Global <idx> # Pop and define new global variable
Control Flow:
OP_Jump <offset> # Unconditional jump (16-bit offset)
OP_Long_Jump <offset> # Unconditional jump (32-bit offset)
OP_Jump_If_False <off> # Jump if top of stack is false
OP_Loop <offset> # Jump backward (for loops)
Functions:
OP_Call <argCount> # Call function with N arguments
OP_Return # Return from current function
OP_Closure <funcIdx> # Create closure from function
Classes and Objects:
OP_Class <nameIdx> # Define new class
OP_Get_Property <name> # Get instance property
OP_Set_Property <name> # Set instance property
OP_Method <nameIdx> # Define method on class
OP_Invoke <name><argc> # Optimized method call
Metamethods
The VM supports metamethods that allow customization of property access behavior. Metamethods are special methods that intercept property operations.
Supported Metamethods:
__get(self, propertyName)Called when accessing a non-existent property. Returns the value to use.
__set(self, propertyName, value)Called when setting a property. Should return the value that was set.
Metamethod Types
Metamethods can be defined in two ways:
Class Methods: Defined on the class, available to all instances
Dynamic Metamethods: Stored as instance fields, allows per-instance customization
Implementation Details
Property Access (OP_Get_Property)
When accessing a property (instance.property):
Check real fields first: If property exists in instance fields, return it immediately
Check for ``__get``: Look in class methods, then instance fields
Call metamethod: If found, invoke
__get(instance, propertyName)Check class methods: Try to bind a class method
Error: If nothing found, runtime error
Property Assignment (OP_Set_Property)
When setting a property (instance.property = value):
Internal field bypass: Fields starting with
__bypass metamethodsInternal access detection: Direct field access from bound methods bypasses metamethods
Check for ``__set``: Look in class methods, then instance fields
Call metamethod: If found, invoke
__set(instance, propertyName, value)Direct assignment: If no metamethod, store directly in instance fields
Double-Underscore Convention
Fields starting with __ (double underscore) are treated as “internal” and always bypass metamethods. This convention prevents infinite recursion in metamethod implementations.
Example:
class ProxyClass
{
init(x, y)
{
this.__data_x = x // Direct field access, no __set called
this.__data_y = y
}
}
fun createProxy()
{
return fun(self, prop, value) {
// Store data internally without triggering __set recursion
if (prop == "x")
{
self.__data_x = value // Bypasses __set
}
}
}
var obj = ProxyClass(10, 20)
obj.__set = createProxy()
obj.x = 100 // Calls __set, which stores to __data_x
Internal Access Detection
The VM detects when property access occurs from within a bound method of the same instance. In this case, metamethods are bypassed to allow direct field manipulation.
Detection criteria:
Current frame is a bound method call (
stackBase == slots)The accessed instance matches
slots[0](the receiver)Frame count > 1 (not the global script)
This allows methods to directly access their own instance fields without metamethod interception:
class MyClass {
init() {
this.value = 0
}
increment() {
this.value = this.value + 1 // Direct access, no __set called
}
}
Stack Layout for Dynamic Metamethods
When calling a dynamic metamethod (stored in instance field), the VM sets up the stack as:
[closure] [instance] [arg1] [arg2] ...
- For
__get(self, propertyName): Stack:
[__get_closure] [instance] [propertyName]- For
__set(self, propertyName, value): Stack:
[__set_closure] [instance] [propertyName] [value]
This differs from class method metamethods which use bound method calls.
Index Operations
Index operations (instance[key] and instance[key] = value) also support metamethods:
OP_Get_Index: Checks for__getmetamethod if key not foundOP_Set_Index: Checks for__setmetamethod before assignmentSame double-underscore bypass rules apply
Performance Considerations
Real fields are checked before metamethods (O(1) hash lookup)
Metamethod lookup requires two hash lookups (class methods, then instance fields)
Internal field access (
__prefix) is fastest (single hash lookup, no metamethod check)Bound method internal access bypasses metamethod checks entirely
Tables:
OP_Build_Table <pairs> # Create table from N key-value pairs
OP_Build_Vector <count> # Create vector from N values
OP_Get_Index # table[key] - pop key, pop table, push value
OP_Set_Index # table[key]=val - pop val, pop key, pop table
Iterators:
OP_Get_Iterator # Create iterator for table
OP_Iterator_Next # Advance iterator, push next key (or nil)
OP_Table_Size # Get table size
OP_Table_At # Get key at specific index
Optimized Instructions:
OP_Incr_Local <idx> # ++local (pre-increment)
OP_Post_Incr_Local <idx> # local++ (post-increment)
OP_Decr_Local <idx> # --local (pre-decrement)
OP_Post_Decr_Local <idx> # local-- (post-decrement)
OP_AddLL <i1><i2> # Add two local variables
OP_SubtractLL <i1><i2> # Subtract two local variables
OP_Short_Int <value> # Push small integer directly
Chunk Structure
A Chunk represents a compiled function’s bytecode:
struct Chunk
{
std::vector<uint8_t> code; # Bytecode instructions
std::vector<Value> constants; # Constant pool
std::vector<int> lines; # Line numbers for debugging
std::vector<std::string> importedModules; # Module dependencies
};
Function Objects
Compiled functions are stored as ObjFunction objects:
struct ObjFunction
{
Chunk chunk; # Function's bytecode
int arity; # Number of parameters
std::string name; # Function name (for debugging)
int upvalueCount; # Number of captured variables
};
Optimization System
The VM includes a multi-pass bytecode optimization system that transforms bytecode after compilation but before execution.
See Bytecode Optimizer for detailed information about the optimization passes and how they work.
Profiling and Debugging
VM Profiler
The VM includes a built-in profiler to measure bytecode execution performance:
Enable profiling:
vm->enableProfiling();
// ... run script ...
vm->printProfilingReport(true); // Sort by time
vm->printProfilingBytecodeReport();
Profiler Tracks:
Execution count per opcode
Total time spent per opcode
Time per opcode occurrence
Hottest code paths
Debug Output
Enable debug output for optimization passes:
vm->enableOptimizationDebugging();
This prints:
Bytecode before each pass
Bytecode after each pass
Changes made by each pass
Pattern matches and rewrites
Disassembler
The compiler includes a bytecode disassembler for debugging:
disassembleChunk(vm, chunk, "FunctionName");
Output shows:
Bytecode offset
Source line number
Opcode name
Operands (with constant values)
Jump targets
Performance Considerations
Hot Path Optimizations
The VM is optimized for critical hot paths:
- Stack Operations
Direct array access (no bounds checking in release builds)
Inline functions for push/pop
64-bit values passed in registers
Cache-aligned stack array
- Instruction Dispatch
Function pointer table for O(1) dispatch
Cached chunk data pointer
Cached chunk end pointer for fast loop exit
No switch statement overhead
- Value Operations
NaN-boxing for fast type checks (single bitwise AND)
Reference counting for heap objects only
Constant values never ref-counted
Small string optimization avoids pool lookups
- Memory Access
Pool-based allocation reduces cache misses
Contiguous storage for same-type objects
Reference counting vectors parallel to object pools
String interning reduces memory and comparison costs
Best Practices for VM Extension
Adding Native Functions
Register native functions during VM initialization:
vm->registerNative("myFunction", [](VM* vm, int argCount, Value* args) -> Value {
// Validate argument count
if (argCount != 2)
{
return makeBoolValue(false);
}
// Extract arguments
int a = AS_INT(args[0]);
int b = AS_INT(args[1]);
// Perform operation
int result = a + b;
// Return result
return makeIntValue(result);
});
Creating Native Modules
Define native modules with exported functions and variables:
NativeModule mathModule;
mathModule.exportedFunctions["sqrt"] = [](VM* vm, int argCount, Value* args) {
if (argCount != 1 || !IS_DOUBLE(args[0]))
{
return makeIntValue(0); // Error handling
}
double result = std::sqrt(AS_DOUBLE(args[0]));
return makeDoubleValue(result);
};
mathModule.exportedVariables["PI"] = ElementType(3.14159265358979323846);
vm->addNativeModule("math", mathModule);
Module System
Import Resolution
When a script uses import "moduleName":
Check Native Modules: Look for registered native module
Check File System: Look for
moduleName.pgormoduleName.pgcLoad and Execute: Import module and add exports to global scope
Cache: Module loaded once, subsequent imports reuse cached version
Bytecode Caching
Compiled bytecode can be saved and loaded:
Compilation:
vm->interpretFromFile("script.pg", false, "script.pgc");
// Creates script.pgc bytecode file
Execution:
vm->interpretFromBytecodeFile("script.pgc");
// Faster loading, no parsing/compilation
Cached bytecode includes:
All bytecode instructions
Constant table
Function definitions
Imported module list
Error Handling
The VM provides detailed error information:
- Compile Errors
Syntax errors with line numbers
Type mismatches
Undefined variables
- Runtime Errors
Full stack trace
Line numbers for each frame
Function names
Error message
Example error output:
[ERROR] VM: Undefined variable 'foo'
[ERROR] VM: [line 15] in myFunction
[ERROR] VM: [line 23] in main
Error Recovery
Runtime errors use setjmp/longjmp for fast unwinding:
Error detected during execution
longjmpto error handlerStack automatically unwound
All reference counts decremented
VM state reset for next execution
Advanced Topics
JIT Considerations
While the current VM is an interpreter, the architecture supports future JIT compilation:
Bytecode designed for linear execution
Pool indices can be patched for direct pointers
Function pointer dispatch can be inlined
Stack-based model maps well to register allocation
Threading Model
Each VM instance is single-threaded:
No locks needed during execution
Multiple VMs can run in separate threads
Native functions must be thread-safe
Shared data requires external synchronization
Extensions
The VM can be extended with:
Custom native modules
Additional bytecode instructions
New optimization passes
Custom object types in pools
Specialized calling conventions
Further Reading
Script Language API Reference - Complete language API reference
Bytecode Optimizer - Bytecode optimization system
Script Language Overview - Language overview and features