Rust Port: Architecture Guide
Reference for key data structures, patterns, and constraints in the Rust compiler port. See rust-port-research.md for detailed per-pass analysis and rust-port-notes.md for the original design decisions.
Arenas and ID Types
All shared mutable data is stored in arenas on Environment, referenced by copyable ID types. This replaces JavaScript's shared object references.
| Arena | ID Type | Stored On | Replaces |
|---|---|---|---|
identifiers: Vec<Identifier> |
IdentifierId |
Environment |
Shared Identifier object references across Place values |
scopes: Vec<ReactiveScope> |
ScopeId |
Environment |
Shared ReactiveScope references across identifiers |
functions: Vec<HIRFunction> |
FunctionId |
Environment |
Inline HIRFunction on FunctionExpression/ObjectMethod |
types: Vec<Type> |
TypeId |
Environment |
Inline Type on Identifier |
All ID types are Copy + Clone + Hash + Eq + PartialEq newtypes wrapping u32.
Instructions and EvaluationOrder
HirFunction.instructions: Vec<Instruction>— flat instruction tableBasicBlock.instructions: Vec<InstructionId>— indices into the table above- The old TypeScript
InstructionIdis renamed toEvaluationOrder— it represents evaluation order and appears on both instructions and terminals - The new
InstructionIdis an index intoHirFunction.instructions, giving passes a single copyable ID to reference any instruction
Place is Clone, MutableRange is on Identifier/Scope
Place stores an IdentifierId (not a shared reference), making it small and cheap to clone. Mutation of mutable_range goes through the identifier arena:
env.identifiers[place.identifier].mutable_range.end = new_end;
After InferReactiveScopeVariables, an identifier's effective mutable range is its scope's range. Downstream passes access this through the scope arena:
let range = match env.identifiers[id].scope {
Some(scope_id) => env.scopes[scope_id].range,
None => env.identifiers[id].mutable_range,
};
Function Arena and FunctionId
FunctionExpression and ObjectMethod instruction values store a FunctionId instead of an inline HIRFunction. Inner functions are accessed via the arena:
let inner = &env.functions[function_id]; // read
let inner = &mut env.functions[function_id]; // write
This makes CreateFunction aliasing effects store FunctionId, and function signature caches key by FunctionId.
AliasingEffect
Effects own cloned Place values (cheap since Place contains IdentifierId). Key variants:
Apply— clones the argsVec<PlaceOrSpreadOrHole>from the instruction valueCreateFunction— storesFunctionId(not theFunctionExpressionitself), plus clonedcaptures: Vec<Place>
Effect interning uses content hashing. The interned EffectId serves as both dedup key and allocation-site identity for abstract interpretation in InferMutationAliasingEffects.
Environment: Separate from HirFunction
HirFunction does not store env. Passes receive env: &mut Environment as a separate parameter. Fields are flat (no sub-structs) to allow precise sliced borrows:
// Simultaneous borrow of different fields is fine:
let id = &env.identifiers[some_id];
let scope = &env.scopes[some_scope_id];
Ordered Maps
Use IndexMap/IndexSet (from the indexmap crate) wherever the TypeScript uses Map/Set and iteration order matters. The primary case is HIR.blocks: IndexMap<BlockId, BasicBlock> which maintains reverse postorder.
Side Maps
Side maps fall into four categories:
- ID-only maps —
HashMap<IdType, T>/HashSet<IdType>. No borrow issues. Most passes use this. - Reference-identity maps — TypeScript
Map<Identifier, T>becomesHashMap<IdentifierId, T>. SimilarlyDisjointSet<Identifier>becomesDisjointSet<IdentifierId>,DisjointSet<ReactiveScope>becomesDisjointSet<ScopeId>. - Instruction/value reference maps — Store
InstructionIdorFunctionIdinstead of references. Access the actual data through the instruction table or function arena when needed. - Scope reference sets with mutation — Store
ScopeIdin sets. Mutate through the arena:env.scopes[scope_id].range.start = new_start.
When a pass needs to both iterate over data and mutate the HIR, use two-phase collect/apply: collect IDs or updates into a Vec, then apply mutations in a second loop.
Error Handling
| TypeScript Pattern | Rust Approach |
|---|---|
Non-null assertion (!) |
.unwrap() (panic) |
CompilerError.invariant(), CompilerError.throwTodo(), throw ... |
Return Err(CompilerDiagnostic) via Result |
env.recordError() or pushDiagnostic() with an invariant error |
Return Err(CompilerDiagnostic) |
env.recordError() or pushDiagnostic() with a NON invariant error |
Keep as-is — accumulate on Environment |
Preserve full error details: reason, description, location, suggestions, category.
JS→Rust Boundary
The JS side serializes the Babel AST and Babel's scope information (scope tree, bindings, reference-to-binding map) to Rust. Keep this serialization thin: only send the core data structures that Babel already computed during parsing. Any derived analysis — identifier source locations, JSX classification, captured variables, etc. — should be computed on the Rust side by walking the AST. See scope.ts.
Pipeline and Pass Structure
fn compile(ast: Ast, scope: Scope, env: &mut Environment)
-> Result<CompileResult, CompilerDiagnostic>
{
let mut hir = lower(ast, scope, env)?;
some_pass(&mut hir, env)?;
// ...
let ast = codegen(...)?;
if env.has_errors() {
Ok(CompileResult::Failure(env.take_errors()))
} else {
Ok(CompileResult::Success(ast))
}
}
Pass signatures follow these patterns:
// Most passes: mutable HIR + mutable environment
fn pass(func: &mut HirFunction, env: &mut Environment) -> Result<(), CompilerDiagnostic>;
// Passes that don't need env
fn pass(func: &mut HirFunction);
// Validation passes: read-only HIR, env for error recording
fn validate(func: &HirFunction, env: &mut Environment) -> Result<(), CompilerDiagnostic>;
Use ? to propagate errors that would have thrown or short-circuited in TypeScript. Non-fatal errors are accumulated on env and checked at the end via env.has_errors().
Structural Similarity to TypeScript
Target ~85-95% structural correspondence. A developer should be able to view TypeScript and Rust side-by-side and trace the logic. The ported code should preserve:
- Same high-level data flow through the code. Only deviate where strictly necessary due to data model differences (arenas, borrow checker workarounds, etc.).
- Same grouping of types, functions, and "classes" (structs with methods) into files. A TypeScript file maps to a Rust file with the same logical contents.
- Similar filenames, type names, and identifier names, adjusted for Rust naming conventions (
camelCase->snake_casefor functions/variables,PascalCasepreserved for types). - Crate structure: The monolithic
babel-plugin-react-compilerpackage is split into crates, roughly 1:1 by top-level folder (e.g.,src/HIR/-> a crate,src/Inference/-> a crate, etc.). We split the lowering logic (BuildHIR and HIRBuilder) into react_compiler_lowering bc of its complexity.
Key mechanical translations:
| TypeScript | Rust |
|---|---|
switch (value.kind) |
match &value (exhaustive) |
Map<Identifier, T> |
HashMap<IdentifierId, T> |
for...of with Set.delete() |
set.retain(|x| ...) |
instr.value = { kind: 'X', ... } |
std::mem::replace + reconstruct |
{ ...place, effect: Effect.Read } |
Place { effect: Effect::Read, ..place.clone() } |
array.filter(x => ...) |
vec.retain(|x| ...) |
identifier.mutableRange.end = x |
env.identifiers[id].mutable_range.end = x |
| Builder closures setting outer variables | Return values from closures |