Protocol Buffers Support Implementation Plan
Overview
This plan details the implementation of native Protocol Buffers (Protobuf) support within lodum via its AST compiler, addressing issue #47. The goal is to enable lodum to efficiently serialize and deserialize objects that correspond to Protobuf message definitions, allowing for cross-format compatibility and leveraging lodum's performance optimizations.
Current State Analysis
Our research into the Python Protobuf library reveals:
* Protobuf messages are defined in .proto files.
* The protoc compiler generates Python classes (_pb2.py files) from .proto definitions.
* These generated Python classes provide methods like SerializeToString() and ParseFromString() for binary serialization, and allow direct attribute access for fields.
* lodum currently operates on standard Python classes decorated with @lodum, relying on __init__ type hints for schema inference (both sides are sketched below).
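A minimal sketch of both sides as they exist today, assuming a person_pb2 module generated by protoc from a simple Person message (the message definition and import path here are illustrative, not part of lodum):

```python
# Illustrative only: a protoc-generated message class next to an equivalent @lodum class.
from person_pb2 import Person  # hypothetical module generated by protoc
from lodum import lodum        # import path as used in the tests below

# Generated classes expose fields as attributes and handle the wire format.
msg = Person(name="Alice", id=123)
wire = msg.SerializeToString()
parsed = Person()
parsed.ParseFromString(wire)

# lodum's schema inference today relies on __init__ type hints.
@lodum
class LodumPerson:
    def __init__(self, name: str, id: int):
        self.name = name
        self.id = id
```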
Implementation Approach
The approach will involve creating a ProtobufLoader and ProtobufDumper that interact with the Python classes generated by protoc. lodum's AST compiler will then generate optimized handlers to efficiently transfer data between @lodum objects and these Protobuf message instances. A key prerequisite will be a clear mapping from Protobuf message definitions to @lodum class definitions.
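As a starting point for that mapping, a hedged sketch of how scalar proto3 field types could correspond to the Python type hints @lodum already infers (the name PROTO_TO_PYTHON_HINTS and the exact coverage are assumptions, not a settled design):

```python
# Hypothetical scalar mapping from proto3 field types to Python type hints.
# Message, repeated, map, and enum fields would need dedicated handling.
PROTO_TO_PYTHON_HINTS = {
    "double": float,
    "float": float,
    "int32": int,
    "int64": int,
    "uint32": int,
    "uint64": int,
    "bool": bool,
    "string": str,
    "bytes": bytes,
}
```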
Phase 1: Protobuf Dumper and Loader for existing Protobuf objects
Overview
This phase will implement ProtobufDumper and ProtobufLoader classes that can serialize @lodum objects to existing _pb2 Protobuf message instances, and deserialize _pb2 Protobuf message instances into @lodum objects. The lodum compiler will generate handlers to optimize this data transfer.
Changes Required:
1. src/lodum/ext/protobuf.py: Create new module
Changes:
- Create a new file src/lodum/ext/protobuf.py.
- Define ProtobufDumper and ProtobufLoader classes that inherit from lodum.core.BaseDumper and lodum.core.BaseLoader respectively.
- These classes will be adapted to interact with _pb2.Message objects. The ProtobufDumper would populate a _pb2.Message instance, and the ProtobufLoader would read from one.
```python
# src/lodum/ext/protobuf.py
from typing import Any, Dict, Optional, Type

from google.protobuf.json_format import MessageToDict
from google.protobuf.message import Message as ProtobufMessage  # Protobuf base message class

from ..core import BaseDumper, BaseLoader, T
from ..internal import dump, load


class ProtobufDumper(BaseDumper):
    def __init__(self, message: ProtobufMessage):
        self._message = message

    def dump_int(self, value: int) -> Any:
        # Placeholder: this will eventually be optimized by codegen to set the
        # target field directly on self._message.
        return value

    # ... implement other dump_ methods to populate self._message ...

    def begin_struct(self, cls: Type) -> Any:
        # For nested messages, create a new ProtobufMessage instance.
        # This will be passed to subsequent recursive dump calls.
        field = self._message.DESCRIPTOR.fields_by_name[cls.__name__.lower()]
        return field.message_type._concrete_class()

    def end_struct(self) -> Any:
        pass  # Protobuf messages don't need an explicit end_struct


# --- Public API for lodum.protobuf.dumps ---
def dumps_to_message(obj: Any, message: ProtobufMessage) -> ProtobufMessage:
    dumper = ProtobufDumper(message)
    # The internal dump will use the dumper to populate the message.
    dump(obj, dumper)
    return message


class ProtobufLoader(BaseLoader):
    def __init__(self, message: ProtobufMessage):
        self._data = message  # Treat the ProtobufMessage as the source data

    def load_int(self) -> int:
        # Placeholder: this will eventually be optimized by codegen to read the
        # correct field directly from self._data.
        return self._data

    # ... implement other load_ methods to read from self._data (ProtobufMessage) ...

    def get_dict(self) -> Optional[Dict[str, Any]]:
        # Convert the Protobuf message to a dict-like structure for lodum's
        # codegen (alternatively, iterate the message fields directly).
        return MessageToDict(self._data, preserving_proto_field_name=True)


# --- Public API for lodum.protobuf.loads ---
def loads_from_message(cls: Type[T], message: ProtobufMessage) -> T:
    loader = ProtobufLoader(message)
    return load(cls, loader)
```
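For context, once dumps_to_message has populated a message, the standard generated API handles the wire format. A short usage sketch, assuming the LodumPerson class and person_pb2 module defined in the tests below:

```python
# End to end: @lodum object -> populated _pb2 message -> Protobuf wire format and back.
# (LodumPerson and person_pb2 as defined in tests/test_protobuf.py below.)
lodum_person = LodumPerson(name="Alice", id=123)
proto_msg = person_pb2.Person()
dumps_to_message(lodum_person, proto_msg)
wire_bytes = proto_msg.SerializeToString()

incoming = person_pb2.Person()
incoming.ParseFromString(wire_bytes)
restored = loads_from_message(LodumPerson, incoming)
```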
2. tests/test_protobuf.py: Add new test file for Protobuf support
Changes:
- Create a new test file tests/test_protobuf.py.
- Define a sample .proto file and generate _pb2.py during test setup (or use a pre-generated one).
- Add test cases covering:
- Serialization of a @lodum object to a Protobuf message.
- Deserialization of a Protobuf message to a @lodum object.
- Handling of nested messages, repeated fields, and enums (an illustrative repeated-field case is sketched after the test code below).
```python
# tests/test_protobuf.py
import os
import subprocess
import sys

import pytest

from lodum import lodum
from lodum.ext.protobuf import dumps_to_message, loads_from_message


# Setup: generate the Protobuf code if it doesn't exist, then import it.
@pytest.fixture(scope="module")
def person_pb2():
    proto_dir = os.path.join(os.path.dirname(__file__), "proto_test")
    os.makedirs(proto_dir, exist_ok=True)
    proto_file_path = os.path.join(proto_dir, "person.proto")
    pb2_file_path = os.path.join(proto_dir, "person_pb2.py")
    if not os.path.exists(pb2_file_path):
        proto_content = """
syntax = "proto3";
package test_proto;

message Person {
  string name = 1;
  int32 id = 2;
}
"""
        with open(proto_file_path, "w") as f:
            f.write(proto_content)
        # Run protoc from inside proto_dir so the .proto resolves against the
        # default include path.
        subprocess.run(
            ["protoc", "--python_out=.", "person.proto"], check=True, cwd=proto_dir
        )
    # Make the generated module importable for the duration of the tests.
    sys.path.insert(0, proto_dir)
    import person_pb2 as pb2
    yield pb2
    sys.path.remove(proto_dir)
    # Clean up generated files (optional)
    # shutil.rmtree(proto_dir)


@lodum
class LodumPerson:
    def __init__(self, name: str, id: int):
        self.name = name
        self.id = id


def test_dumps_to_protobuf_message(person_pb2):
    lodum_person = LodumPerson(name="Alice", id=123)
    proto_message = person_pb2.Person()
    dumps_to_message(lodum_person, proto_message)
    assert proto_message.name == "Alice"
    assert proto_message.id == 123


def test_loads_from_protobuf_message(person_pb2):
    proto_message = person_pb2.Person(name="Bob", id=456)
    lodum_person = loads_from_message(LodumPerson, proto_message)
    assert lodum_person.name == "Bob"
    assert lodum_person.id == 456
```
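To cover the repeated-field bullet above, one possible additional test, assuming person.proto is extended with a `repeated string emails = 3;` field and that the dumper/loader gain repeated-field support in this phase (both are assumptions, not existing behavior):

```python
from typing import List

# Hypothetical: requires `repeated string emails = 3;` in person.proto and
# repeated-field handling in ProtobufDumper/ProtobufLoader.
@lodum
class LodumPersonWithEmails:
    def __init__(self, name: str, id: int, emails: List[str]):
        self.name = name
        self.id = id
        self.emails = emails


def test_repeated_field_round_trip(person_pb2):
    obj = LodumPersonWithEmails(name="Carol", id=789, emails=["carol@example.com"])
    msg = person_pb2.Person()
    dumps_to_message(obj, msg)
    assert list(msg.emails) == ["carol@example.com"]

    restored = loads_from_message(LodumPersonWithEmails, msg)
    assert restored.emails == ["carol@example.com"]
```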
Success Criteria:
Automated:
- [ ] PYTHONPATH=src pytest tests/test_protobuf.py passes.
Manual:
- [ ] Verify that @lodum objects can be correctly converted to and from _pb2 Protobuf messages.
Phase 2: Compiler Optimizations for Protobuf Dumper/Loader
Overview
Integrate the ProtobufDumper and ProtobufLoader into lodum's AST compilation pipeline. This will generate specialized code paths that read and write Protobuf message attributes directly, bypassing generic reflection and MessageToDict() conversion for maximum performance.
Changes Required:
1. src/lodum/compiler/dump_codegen.py and load_codegen.py: Modify codegen
Changes:
- Modify the codegen to recognize google.protobuf.message.Message types.
- For these types, generate AST that directly accesses _pb2.Message attributes (e.g., message.name = value) instead of calling generic dumper.dump_str() or loader.load_str() methods; a sketch of the kind of handler this could produce follows this list.
2. src/lodum/ext/protobuf.py: Optimize dumper/loader
Changes:
- Update ProtobufDumper and ProtobufLoader so that their dump_/load_ methods are optimized by the lodum compiler. This may involve custom logic for repeated fields and nested messages.
3. tests/test_protobuf.py: Add benchmarks
Changes:
- Add benchmarks comparing lodum-optimized Protobuf conversion performance against direct _pb2 message manipulation.
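To make the intended output concrete, a hedged sketch of the kind of specialized handlers the compiler might emit for the Person example (the function names and codegen interface here are assumptions, not the actual compiler output):

```python
# Illustrative only: direct attribute access in generated handlers, replacing
# generic dumper.dump_str()/dump_int() and loader.load_str()/load_int() dispatch.
def _generated_dump_lodum_person(obj, message):
    message.name = obj.name  # string field assigned directly
    message.id = obj.id      # int32 field assigned directly
    return message


def _generated_load_lodum_person(cls, message):
    return cls(name=message.name, id=message.id)  # fields read directly
```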
Success Criteria:
Automated:
- [ ] New benchmarks demonstrate a significant performance improvement for lodum-optimized Protobuf conversions.
Manual:
- [ ] Verify that the generated lodum handlers for Protobuf types are more efficient than the reflection-based approach.
Review Criteria (Self-Critique)
- Specificity: High, providing explicit code modifications, test examples, and addressing the integration with _pb2 messages and lodum's compiler.
- Verification: Includes both automated and manual success criteria.
- Phasing: Logically separates basic functionality from compiler optimizations.