Protocol Buffers Support Implementation Plan

Overview

This plan details the implementation of native Protocol Buffers (Protobuf) support within lodum via its AST compiler, addressing issue #47. The goal is to enable lodum to efficiently serialize and deserialize objects that correspond to Protobuf message definitions, allowing for cross-format compatibility and leveraging lodum's performance optimizations.

Current State Analysis

Our research into the Python Protobuf library shows:

  • Protobuf messages are defined in .proto files.
  • The protoc compiler generates Python classes (_pb2.py files) from .proto definitions.
  • These generated classes provide methods such as SerializeToString() and ParseFromString() for binary serialization, and expose fields as attributes.
  • lodum currently operates on standard Python classes decorated with @lodum, relying on __init__ type hints for schema inference.
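The last point, schema inference from __init__ type hints, can be illustrated with the standard library alone. This is a minimal sketch and not lodum's actual implementation; infer_schema and the Person class are hypothetical names used only for illustration.

```python
import inspect
from typing import Dict, get_type_hints


def infer_schema(cls: type) -> Dict[str, type]:
    """Map each annotated __init__ parameter name to its type (hypothetical helper)."""
    hints = get_type_hints(cls.__init__)
    params = inspect.signature(cls.__init__).parameters
    # `self` has no annotation and a `return` hint is not a parameter, so both drop out.
    return {name: hints[name] for name in params if name in hints}


class Person:
    def __init__(self, name: str, id: int):
        self.name = name
        self.id = id


schema = infer_schema(Person)
print(schema)
```

A schema of this shape (field name to Python type) is the information that would have to line up with the fields of a generated Protobuf message.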

Implementation Approach

The approach will involve creating a ProtobufLoader and ProtobufDumper that interact with the Python classes generated by protoc. lodum's AST compiler will then generate optimized handlers to efficiently transfer data between @lodum objects and these Protobuf message instances. A key prerequisite will be a clear mapping from Protobuf message definitions to @lodum class definitions.
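One way to picture the mapping prerequisite is a lookup from Protobuf scalar type names to the Python types that generated classes expose, used to render a matching @lodum class. The sketch below is an assumption about how such a mapping could look; PROTO_SCALAR_TO_PYTHON and render_lodum_class are hypothetical names, and nested messages, repeated fields, and enums are deliberately left out.

```python
from typing import Dict, List, Tuple

# Protobuf scalar type names and the Python types used for them in generated classes.
PROTO_SCALAR_TO_PYTHON: Dict[str, type] = {
    "double": float, "float": float,
    "int32": int, "int64": int, "uint32": int, "uint64": int,
    "sint32": int, "sint64": int,
    "fixed32": int, "fixed64": int, "sfixed32": int, "sfixed64": int,
    "bool": bool, "string": str, "bytes": bytes,
}


def render_lodum_class(message_name: str, fields: List[Tuple[str, str]]) -> str:
    """Render source text for a @lodum class from (field_name, proto_type) pairs."""
    params = ", ".join(
        f"{name}: {PROTO_SCALAR_TO_PYTHON[proto_type].__name__}"
        for name, proto_type in fields
    )
    lines = ["@lodum", f"class {message_name}:", f"    def __init__(self, {params}):"]
    lines += [f"        self.{name} = {name}" for name, _ in fields]
    return "\n".join(lines)


# Fields of the Person message used later in the test plan.
source = render_lodum_class("Person", [("name", "string"), ("id", "int32")])
print(source)
```

The function only produces source text; whether lodum should generate such classes or require users to write them by hand is a design decision this plan leaves open.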


Phase 1: Protobuf Dumper and Loader for existing Protobuf objects

Overview

This phase will implement ProtobufDumper and ProtobufLoader classes that can serialize @lodum objects to existing _pb2 Protobuf message instances, and deserialize _pb2 Protobuf message instances into @lodum objects. The lodum compiler will generate handlers to optimize this data transfer.

Changes Required:

1. src/lodum/ext/protobuf.py: Create new module

Changes:

  • Create a new file src/lodum/ext/protobuf.py.
  • Define ProtobufDumper and ProtobufLoader classes that inherit from lodum.core.BaseDumper and lodum.core.BaseLoader respectively.
  • Adapt these classes to work with protoc-generated message objects (subclasses of google.protobuf.message.Message): ProtobufDumper populates a message instance, and ProtobufLoader reads from one.

# src/lodum/ext/protobuf.py

from typing import Any, Dict, Optional, Type

from google.protobuf.json_format import MessageToDict
from google.protobuf.message import Message as ProtobufMessage  # Protobuf base message class

from ..core import BaseDumper, BaseLoader, T
from ..internal import dump, load


class ProtobufDumper(BaseDumper):
    def __init__(self, message: ProtobufMessage):
        self._message = message

    def dump_int(self, value: int) -> Any:
        # Placeholder: codegen will eventually set the field on self._message directly
        return value

    # ... implement other dump_ methods to populate self._message ...

    def begin_struct(self, cls: Type) -> Any:
        # For nested messages, return the sub-message that subsequent recursive
        # dump calls will populate. Assumes the field is named after the
        # lowercased class name; reading a singular message field returns the
        # nested message instance, so no private descriptor API is needed.
        return getattr(self._message, cls.__name__.lower())

    def end_struct(self) -> Any:
        pass  # Protobuf messages don't have an explicit end_struct


# --- Public API: lodum.ext.protobuf.dumps_to_message ---
def dumps_to_message(obj: Any, message: ProtobufMessage) -> ProtobufMessage:
    dumper = ProtobufDumper(message)
    # The internal dump will use the dumper to populate the message
    dump(obj, dumper)
    return message


class ProtobufLoader(BaseLoader):
    def __init__(self, message: ProtobufMessage):
        self._data = message  # Treat the ProtobufMessage as the source data

    def load_int(self) -> int:
        # Placeholder: codegen will eventually read the field directly.
        # TODO: return the value of the current field, not the whole message.
        return self._data

    # ... implement other load_ methods to read from self._data (ProtobufMessage) ...

    def get_dict(self) -> Optional[Dict[str, Any]]:
        # Convert the message to a dict-like structure for lodum's codegen.
        # MessageToDict follows the proto3 JSON mapping: 64-bit integers become
        # strings and default-valued fields are omitted unless requested.
        return MessageToDict(self._data, preserving_proto_field_name=True)


# --- Public API: lodum.ext.protobuf.loads_from_message ---
def loads_from_message(cls: Type[T], message: ProtobufMessage) -> T:
    loader = ProtobufLoader(message)
    return load(cls, loader)
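The data flow that Phase 1 targets can be shown without protoc or lodum by copying fields generically between two plain objects. This is a sketch under stated assumptions: FakePersonMessage is a stand-in and not a real _pb2 class, and copy_to_message and build_from_message are hypothetical helpers, not part of lodum's API.

```python
from typing import Any, List, Type, TypeVar, get_type_hints

T = TypeVar("T")


class FakePersonMessage:
    """Stand-in for a protoc-generated message; NOT a real _pb2 class."""

    def __init__(self, name: str = "", id: int = 0):
        self.name = name
        self.id = id


class Person:
    def __init__(self, name: str, id: int):
        self.name = name
        self.id = id


def field_names(cls: type) -> List[str]:
    """List the annotated __init__ parameters of cls, in declaration order."""
    return [name for name in get_type_hints(cls.__init__) if name != "return"]


def copy_to_message(obj: Any, message: Any) -> Any:
    """Generic (reflection-based) dump: set each field on the message by name."""
    for name in field_names(type(obj)):
        setattr(message, name, getattr(obj, name))
    return message


def build_from_message(cls: Type[T], message: Any) -> T:
    """Generic (reflection-based) load: read each field by name and call __init__."""
    return cls(**{name: getattr(message, name) for name in field_names(cls)})


dumped = copy_to_message(Person("Alice", 123), FakePersonMessage())
loaded = build_from_message(Person, FakePersonMessage("Bob", 456))
```

The per-field getattr/setattr lookups in this generic path are the kind of overhead that the compiler work in Phase 2 aims to remove.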

2. tests/test_protobuf.py: Add new test file for Protobuf support

Changes:

  • Create a new test file tests/test_protobuf.py.
  • Define a sample .proto file and generate _pb2.py during test setup (or use a pre-generated one).
  • Add test cases covering:
    • Serialization of a @lodum object to a Protobuf message.
    • Deserialization of a Protobuf message to a @lodum object.
    • Handling of nested messages, repeated fields, and enums.

# tests/test_protobuf.py

import importlib
import os
import subprocess
import sys

import pytest

from lodum import lodum
from lodum.ext.protobuf import dumps_to_message, loads_from_message

PROTO_DIR = os.path.join(os.path.dirname(__file__), "proto_test")

PROTO_CONTENT = """\
syntax = "proto3";
package test_proto;
message Person {
  string name = 1;
  int32 id = 2;
}
"""


# Setup: generate person_pb2.py if it doesn't exist, then import it.
# The import happens inside the fixture because a module-level import
# would run before protoc has produced the file.
@pytest.fixture(scope="module")
def person_pb2():
    os.makedirs(PROTO_DIR, exist_ok=True)
    pb2_file_path = os.path.join(PROTO_DIR, "person_pb2.py")

    if not os.path.exists(pb2_file_path):
        with open(os.path.join(PROTO_DIR, "person.proto"), "w") as f:
            f.write(PROTO_CONTENT)

        # Run protoc from PROTO_DIR so person.proto resolves against the proto path
        subprocess.run(
            ["protoc", "--proto_path=.", "--python_out=.", "person.proto"],
            check=True,
            cwd=PROTO_DIR,
        )

    # Make the generated module importable for the duration of this test module
    sys.path.insert(0, PROTO_DIR)
    importlib.invalidate_caches()  # the file may have just been created
    try:
        yield importlib.import_module("person_pb2")
    finally:
        sys.path.remove(PROTO_DIR)
        # Clean up generated files (optional)
        # shutil.rmtree(PROTO_DIR)


@lodum
class LodumPerson:
    def __init__(self, name: str, id: int):
        self.name = name
        self.id = id


def test_dumps_to_protobuf_message(person_pb2):
    lodum_person = LodumPerson(name="Alice", id=123)
    proto_message = person_pb2.Person()
    dumps_to_message(lodum_person, proto_message)
    assert proto_message.name == "Alice"
    assert proto_message.id == 123


def test_loads_from_protobuf_message(person_pb2):
    proto_message = person_pb2.Person(name="Bob", id=456)
    lodum_person = loads_from_message(LodumPerson, proto_message)
    assert lodum_person.name == "Bob"
    assert lodum_person.id == 456

Success Criteria:

Automated:

  • [ ] PYTHONPATH=src pytest tests/test_protobuf.py passes.

Manual:

  • [ ] Verify that @lodum objects can be correctly converted to and from _pb2 Protobuf messages.

Phase 2: Compiler Optimizations for Protobuf Dumper/Loader

Overview

Integrate the ProtobufDumper and ProtobufLoader into lodum's AST compilation pipeline. This will generate specialized code paths that read and write Protobuf message attributes directly, bypassing generic reflection and intermediate dict conversion for maximum performance.

Changes Required:

  1. src/lodum/compiler/dump_codegen.py / load_codegen.py: Changes:
    • Modify the codegen to recognize google.protobuf.message.Message types.
    • For these types, generate AST that reads and writes the generated message's attributes directly (e.g., message.name = value) instead of calling the generic dumper.dump_str() or loader.load_str().
  2. src/lodum/ext/protobuf.py: Changes:
    • Update ProtobufDumper and ProtobufLoader to have their dump_/load_ methods optimized by the lodum compiler. This might involve custom logic for repeated fields and nested messages.
  3. tests/test_protobuf.py: Changes:
    • Add benchmarks comparing lodum-optimized Protobuf conversion performance against direct _pb2 message manipulation.
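The direct-attribute idea from item 1 can be sketched with Python's ast module: build one assignment node per field and compile the result into a function. This is an illustration under stated assumptions and not lodum's compiler; compile_direct_dumper is a hypothetical name, and SimpleNamespace objects stand in for both the @lodum object and the Protobuf message.

```python
import ast
from types import SimpleNamespace
from typing import Any, Callable, List


def compile_direct_dumper(field_names: List[str]) -> Callable[[Any, Any], Any]:
    """Compile `dump(obj, message)` with one direct assignment per field."""
    # Start from a parsed template so the function node is valid on any Python version.
    module = ast.parse("def dump(obj, message):\n    return message")
    func = module.body[0]

    # Build `message.<name> = obj.<name>` as AST nodes, one per field.
    assignments = [
        ast.Assign(
            targets=[
                ast.Attribute(
                    value=ast.Name(id="message", ctx=ast.Load()),
                    attr=name,
                    ctx=ast.Store(),
                )
            ],
            value=ast.Attribute(
                value=ast.Name(id="obj", ctx=ast.Load()),
                attr=name,
                ctx=ast.Load(),
            ),
        )
        for name in field_names
    ]
    func.body[:0] = assignments  # insert before the `return message` statement
    ast.fix_missing_locations(module)

    namespace: dict = {}
    exec(compile(module, "<protobuf-dump-sketch>", "exec"), namespace)
    return namespace["dump"]


# Stand-ins for a @lodum object and a Protobuf message.
dump_person = compile_direct_dumper(["name", "id"])
target = dump_person(SimpleNamespace(name="Alice", id=123), SimpleNamespace())
```

The compiled function contains no loop and no name lookups by string, which is the property the generated Protobuf handlers are meant to have. Repeated fields and nested messages would need more than plain assignment and are not covered here.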

Success Criteria:

Automated:

  • [ ] New benchmarks show that lodum-optimized Protobuf conversions are measurably faster than the reflection-based Phase 1 path, and report their overhead relative to direct _pb2 message manipulation.

Manual:

  • [ ] Verify that the generated lodum handlers for Protobuf types are more efficient than reflection-based approaches.
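A benchmark for these criteria could follow the shape below, which times a reflection-style copy against hand-written direct assignment using timeit. It is only a sketch of the measurement harness: the objects are SimpleNamespace stand-ins instead of @lodum objects and _pb2 messages, all function names are hypothetical, and no timing result is claimed here.

```python
import timeit
from types import SimpleNamespace
from typing import Any, Callable

FIELDS = ("name", "id")


def reflective_copy(obj: Any, message: Any) -> Any:
    """Generic path: look every field up by name at run time."""
    for name in FIELDS:
        setattr(message, name, getattr(obj, name))
    return message


def direct_copy(obj: Any, message: Any) -> Any:
    """Specialized path: the shape of code a compiler could generate."""
    message.name = obj.name
    message.id = obj.id
    return message


def bench(fn: Callable[[Any, Any], Any], number: int = 10_000) -> float:
    """Return the best-of-three wall time in seconds for `number` calls."""
    src = SimpleNamespace(name="Alice", id=123)
    dst = SimpleNamespace()
    return min(timeit.repeat(lambda: fn(src, dst), number=number, repeat=3))


timings = {fn.__name__: bench(fn) for fn in (reflective_copy, direct_copy)}
for label, seconds in timings.items():
    print(f"{label}: {seconds:.6f}s")
```

Taking the minimum over several repeats reduces noise from other processes; a real benchmark would substitute the lodum handlers and generated message classes for the stand-ins.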

Review Criteria (Self-Critique)

  • Specificity: High, providing explicit code modifications, test examples, and addressing the integration with _pb2 messages and lodum's compiler.
  • Verification: Includes both automated and manual success criteria.
  • Phasing: Logically separates basic functionality from compiler optimizations.