Memory-Aware Serialization (Lazy Loading & Quantization)
Overview
This plan details the implementation of "Memory-Aware" serialization in lodum. The goal is to let data analysts work with large, quantized datasets (e.g., ML weights, high-frequency sensor data) in memory-constrained environments like Pyodide/WASM without triggering Out-Of-Memory (OOM) errors.
Problem Statement
Standard Python serialization (and many modern alternatives) expands data upon loading. For example, 1GB of 4-bit quantized data holds roughly two billion values, which expand to about 8GB as 32-bit floats and 16GB as 64-bit floats; boxed Python floats take more still. In environments like WASM with a 4GB memory ceiling, this makes many data science tasks impossible.
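The expansion factor can be checked with a few lines of arithmetic. This is a minimal sketch: the helper name expanded_size_bytes is made up for illustration, and per-block scale metadata is ignored.

```python
def expanded_size_bytes(packed_bytes: int, bits_per_value: int, target_bytes_per_value: int) -> int:
    """Size of the data after every packed value is widened to a fixed-width type."""
    value_count = packed_bytes * 8 // bits_per_value
    return value_count * target_bytes_per_value


GIB = 1024 ** 3
packed = 1 * GIB  # 1 GiB of 4-bit values (block scales ignored for simplicity)

as_float32 = expanded_size_bytes(packed, bits_per_value=4, target_bytes_per_value=4)
as_float64 = expanded_size_bytes(packed, bits_per_value=4, target_bytes_per_value=8)

print(as_float32 // GIB)  # 8
print(as_float64 // GIB)  # 16
```

Either result exceeds a 4GB ceiling before any analysis has started.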
Implementation Approach
1. Lazy Field & Partial Object Loading
Introduce a lazy=True option to field(). This provides two tiers of memory optimization:
- Lazy Tensor/Array Access: For large arrays (NumPy/Tensors), the loader captures the byte-offset and length, returning a proxy. Materialization occurs only when slices are accessed.
- Lazy Field Access (Complex Objects): For nested @lodum objects or large dictionaries, the field is skipped during initial parse. Accessing the attribute triggers a targeted "sub-parse" of only that section of the buffer.
    @lodum
    class LargeModel:
        metadata: Dict[str, Any]
        weights: np.ndarray = field(lazy=True)  # Tensor proxy
        extra_data: ComplexInfo = field(lazy=True)  # Object proxy
- Generated Bytecode: The compiler generates a "skip-and-record" instruction that stores the buffer reference and bounds for the lazy field.
- Seekable Requirements: This feature requires a seekable data source (e.g., BytesIO or mmap) to allow random access to lazy segments.
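The "skip-and-record" idea can be sketched in plain Python. Everything here is an assumption made for illustration: the length-prefixed layout, the LazySegment class, and the helper names are hypothetical and are not lodum's actual wire format or generated code.

```python
import io
import json
import struct


class LazySegment:
    """Remembers where a field lives and decodes it only on the first load()."""

    def __init__(self, source, offset, length, decode):
        self._source = source
        self._offset = offset
        self._length = length
        self._decode = decode
        self._value = None
        self.loaded = False

    def load(self):
        if not self.loaded:
            position = self._source.tell()  # keep the caller's position intact
            self._source.seek(self._offset)
            self._value = self._decode(self._source.read(self._length))
            self._source.seek(position)
            self.loaded = True
        return self._value


def write_field(out, payload: bytes) -> None:
    """Hypothetical layout: 4-byte little-endian length, then the payload."""
    out.write(struct.pack("<I", len(payload)))
    out.write(payload)


def read_eager(source, decode):
    (length,) = struct.unpack("<I", source.read(4))
    return decode(source.read(length))


def skip_and_record(source, decode) -> LazySegment:
    (length,) = struct.unpack("<I", source.read(4))
    offset = source.tell()
    source.seek(length, io.SEEK_CUR)  # skip the payload without reading it
    return LazySegment(source, offset, length, decode)


buf = io.BytesIO()
write_field(buf, json.dumps({"name": "demo"}).encode())
write_field(buf, bytes(range(16)))
write_field(buf, json.dumps({"notes": [1, 2, 3]}).encode())
buf.seek(0)

metadata = read_eager(buf, json.loads)         # parsed immediately
weights = skip_and_record(buf, bytes)          # bounds recorded, bytes not read
extra_data = skip_and_record(buf, json.loads)  # sub-parse deferred

print(weights.loaded)              # False
print(extra_data.load()["notes"])  # [1, 2, 3]
```

The sketch also shows why a seekable source is required: load() has to jump back to a recorded offset, which a socket-like stream cannot do.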
2. Quantization-Aware Handlers (lodum.ext.ml)
To maintain a lean core, advanced ML-specific logic (bit-packing, quantization scales) will reside in the lodum.ext.ml namespace. This will be an optional "extra" (pip install "lodum[ml]").
- Bit-Packed Dtypes: Support for q4_0, q4_k, int8, etc.
- AST Generation: The load_codegen engine will generate specialized bit-shifting loops ((val >> 4) & 0x0F) to unpack data directly into the target representation.
- Metadata Coupling: Support for block-level quantization parameters (scales and zero-points) that are read and applied during access.
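As an illustration of the bit-shifting and metadata coupling described above, the following sketch unpacks two 4-bit values per byte and applies a block scale and zero-point. The layout is a simplified assumption (low nibble first, one scale per block, zero-point of 8) and should not be read as the exact q4_0 or q4_k format; the function names are hypothetical.

```python
from typing import List


def unpack_nibbles(packed: bytes) -> List[int]:
    """Split each byte into two unsigned 4-bit values, low nibble first."""
    values = []
    for val in packed:
        values.append(val & 0x0F)         # low nibble
        values.append((val >> 4) & 0x0F)  # high nibble
    return values


def dequantize_block(packed: bytes, scale: float, zero_point: int = 8) -> List[float]:
    """Apply the block-level parameters: real = (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in unpack_nibbles(packed)]


block = bytes([0x80, 0xF7])
print(unpack_nibbles(block))               # [0, 8, 7, 15]
print(dequantize_block(block, scale=0.5))  # [-4.0, 0.0, -0.5, 3.5]
```

A generated loop would presumably inline the shift and mask rather than call helpers, but the arithmetic per value is the same.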
3. Zero-Copy Architecture & Memory Mapping
Leverage Python's memoryview and mmap to avoid memory expansion.
- Buffer Management: Use memoryview to slice the input buffer without copying.
- mmap Integration: When loading from a file, lodum can automatically use mmap to map the file into address space, allowing the OS to handle paging and keeping the Python heap footprint minimal.
- Interleaved Data: Support formats like GGUF and Safetensors, allowing the parser to jump between descriptors and large data blocks.
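The standard-library pieces involved can be shown in isolation. This is a minimal sketch using only mmap and memoryview; the 4-byte "HEAD" descriptor and the temporary file are invented for the example, and nothing here is lodum's own loader.

```python
import mmap
import os
import tempfile

# Write a small file: a 4-byte descriptor followed by a 32-byte data block.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEAD" + bytes(range(32)))
    path = tmp.name

with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
    with memoryview(mapped) as view:
        # Slicing a memoryview creates new views over the same mapping; no bytes are copied.
        with view[4:] as payload, payload[8:12] as chunk:
            payload_len = len(payload)
            chunk_len = chunk.nbytes
            first = bytes(chunk)  # the only copy: 4 bytes, made on demand

os.remove(path)

print(payload_len, chunk_len)  # 32 4
print(list(first))             # [8, 9, 10, 11]
```

The views are released before the mapping is closed because mmap refuses to close while exported buffers are still alive.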
Use Cases
- Browser-based LLMs: Managing model metadata and weights in Pyodide without crashing the tab.
- Edge Computing: Analyzing high-frequency sensor streams on resource-constrained hardware.
- Dequantize-on-Access: Providing a NumPy-compatible interface that only performs floating-point math on the specific slice being accessed.
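Dequantize-on-access can be sketched with a small proxy whose __getitem__ converts only the requested elements. The class name QuantizedView, the single scale and zero-point, and the use of plain lists instead of NumPy arrays are all assumptions made to keep the example dependency-free.

```python
class QuantizedView:
    """Hypothetical proxy over signed 8-bit data with one scale and zero-point."""

    def __init__(self, raw, scale, zero_point=0):
        self._raw = memoryview(raw).cast("b")  # reinterpret as signed bytes, no copy
        self._scale = scale
        self._zero_point = zero_point

    def __len__(self):
        return len(self._raw)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Floating-point math runs only over the requested slice.
            return [(q - self._zero_point) * self._scale for q in self._raw[index]]
        return (self._raw[index] - self._zero_point) * self._scale


raw = bytes([0, 1, 255, 128, 10])  # signed values: 0, 1, -1, -128, 10
weights = QuantizedView(raw, scale=0.5)

print(len(weights))  # 5
print(weights[1:3])  # [0.5, -0.5]
print(weights[3])    # -64.0
```

The underlying buffer stays in its 1-byte-per-value form for its whole lifetime; only the returned slice ever exists as floats.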
Relationship to Streaming Support
While both this plan and the Streaming Support Plan address memory constraints, they solve different problems:
- Streaming (load_stream): Focuses on horizontal scale (iterating over millions of objects). It works on sequential, non-seekable streams (like network sockets).
- Memory-Aware (lazy=True): Focuses on vertical depth (handling massive fields within a single object). It requires a seekable source (like a local file or BytesIO) to allow the proxy to "jump back" and read data on-demand.
Synergy: When combined, lodum can stream a large list of objects where each individual object is also lazily loaded, providing maximum memory efficiency for complex data science pipelines.
Milestones
- Phase 1: Prototype lazy field skipping in the AST compiler.
- Phase 2: Implement basic QuantizedArray proxy for 8-bit data.
- Phase 3: Add bit-packed (4-bit) AST generation logic.
- Phase 4: Full integration with NumPy/Polars for zero-copy views.