Relationships

AMSDAL has three relation types — all bidirectional and all version-aware on lakehouse. This page is the single source of truth for how they're declared, accessed, written, and queried.

Direction | Declared by | Auto-installed accessor | Returns
Forward FK | book.author: Author or ReferenceField | book.author | resolved Author (or coroutine in async)
Reverse FK | implicit on FK target | author.book_set (default name) | RelatedSet[Book]
Many-to-Many | tags: list[Tag] or ManyReferenceField | post.tags (instance) + Post.tags_through (class) | RelatedSet[Tag] (instance) / type[ThroughModel] (class)

See Field Types for the field declarations. This doc covers runtime behavior — accessing, writing, querying, and the lakehouse-specific historical semantics.


Forward References (FK)

Sync access

book.author triggers ReferenceLoader.load_reference() on first access; the result is cached on the instance under __fk_resolved_cache__. Repeated reads of the same FK on the same instance are free.

book = Book.objects.get(title='Dune').execute()
print(book.author.name)        # one extra SELECT
print(book.author.email)       # cache hit, no query
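
The resolve-once-then-cache behavior described above can be illustrated with a plain Python descriptor. This is a minimal sketch and not AMSDAL's implementation: LazyForeignKey, fake_loader, and the _fk_cache dictionary are hypothetical names, and the loader simply records each call so the number of simulated lookups is visible.

```python
class LazyForeignKey:
    """Toy descriptor: resolves a raw reference on first access, then caches it."""

    def __init__(self, loader):
        self.loader = loader  # hypothetical callable: raw reference -> resolved object

    def __set_name__(self, owner, name):
        self.name = name
        self.ref_slot = f'_{name}_ref'  # mirrors the _<fk>_ref naming used on this page

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        # Per-instance cache (hypothetical name, stands in for the real cache slot).
        cache = instance.__dict__.setdefault('_fk_cache', {})
        if self.name not in cache:
            raw = getattr(instance, self.ref_slot)
            cache[self.name] = self.loader(raw)  # the only place a "query" happens
        return cache[self.name]


load_calls = []


def fake_loader(raw_ref):
    """Stand-in for a database lookup; records every call."""
    load_calls.append(raw_ref)
    return {'name': 'Frank Herbert', 'email': 'frank@example.com'}


class Book:
    author = LazyForeignKey(fake_loader)

    def __init__(self, title, author_ref):
        self.title = title
        self._author_ref = author_ref


book = Book('Dune', author_ref='Author:1')
first = book.author['name']    # triggers the loader
second = book.author['email']  # served from the per-instance cache
```

After both reads, load_calls holds a single entry, which corresponds to the "one extra SELECT, then cache hit" pattern in the example above.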

Async access

In async mode book.author returns a coroutine — sync ReferenceLoader.load_reference() is forbidden in async context. Await it once and reuse:

book = await Book.objects.get(title='Dune').aexecute()
author = await book.author      # one await
print(author.name, author.email)

Raw reference access — _<fk>_ref

Each forward FK exposes two attributes: the public descriptor (resolves to a Model) and a private slot _<fk>_ref (holds the raw Reference without I/O).

Use cases where you'd touch _<fk>_ref directly:

from amsdal_utils.models.data_models.reference import Reference

# 1. Avoid lazy resolution in async hot paths (always sync, any mode)
raw = book._author_ref

# 2. Distinguish unresolved Reference vs resolved Model — no I/O
if isinstance(raw, Reference):
    schedule_batch_resolve(raw)
elif raw is not None:
    print(raw.name)   # already resolved (e.g. via prefetch_related)

# 3. Serialization without resolving — model_dump(by_alias=False) yields raw refs
data = book.model_dump(by_alias=False)
# {'_author_ref': {'ref': {...}}, 'title': 'Dune'}

Caveat — soft-deleted FK target

A forward FK whose target was soft-deleted behaves inconsistently across backends:

  • State backend: ReferenceLoader.load_reference() fails internally (object not visible in state). The descriptor swallows the failure and leaves book.author as a dangling Reference. Accessing a field on it (book.author.name) raises AttributeError.
  • Lakehouse backend: the historical row is loaded successfully; book.author returns a tombstone Model with _metadata.is_deleted == True. Attribute access does not raise.

Defensive workaround:

def safe_fk(instance, attr):
    raw = getattr(instance, attr, None)
    if raw is None:
        return None
    from amsdal_utils.models.data_models.reference import Reference
    if isinstance(raw, Reference):
        return None        # unresolved / dangling on state
    metadata = getattr(raw, '_metadata', None)
    if metadata is not None and getattr(metadata, 'is_deleted', False):
        return None        # tombstone on lakehouse
    return raw

loc = safe_fk(employee, 'location')

Lakehouse-only — filter at query level via a per-hop directive (see Field Lookups — Per-Hop Version Overrides):

Employee.objects.using('lakehouse').filter(
    first_name='E',
    location___metadata__is_deleted=False,
).execute()

Reverse References (Reverse FK)

Whenever you declare Book.author: Author = ReferenceField(...), the framework auto-installs author.book_set on Author, returning RelatedSet[Book]. Override the accessor name with related_name='custom', or pass related_name='+' to disable the reverse accessor (see Field Types — Reverse accessor naming).

author.book_set follows the RelatedSet contract (see below). Writes via add() / set() re-point the children's FK column to the parent; remove(), clear(), and a set() that shrinks the collection attempt to set the child's FK to None.

Non-nullable detach behavior

When the child's FK is non-nullable, remove() / clear() / shrinking set() silently buffer the detach; at parent.save() the flush attempts child.fk = None → Pydantic validation fails → pydantic.ValidationError is raised (with an error entry pointing to the FK field, input is None).

import pytest
from pydantic import ValidationError

# c and c2 are Company instances; e is an Employee whose company FK is non-nullable
c.employee_set.remove(e)            # buffered, no DB hit
with pytest.raises(ValidationError):
    c.save()                        # flush fails: company is required

# ✅ Reparent
c2.employee_set.add(e)
c2.save()

# ✅ Hard-delete the child
e.delete()

Many-to-Many

Auto-through vs custom-through

# Auto-through (no extras)
class Post(Model):
    tags: list[Tag]

# Custom-through (extra columns on the link table)
class PostTag(Model):
    post: 'Post' = ReferenceField(db_field='post_id')
    tag: Tag = ReferenceField(db_field='tag_id')
    weight: int = 0

class Post(Model):
    tags: list[Tag] = ManyReferenceField(through=PostTag, through_fields=('post', 'tag'))

Both expose post.tags (a RelatedSet[Tag]) and Post.tags_through (the through-model class itself — see below).

Constructor M2M assignment

post = Post(title='Hello', tags=[t1, t2, t3])         # pre-saved Models
post = Post(title='Hello', tags=[ref_t1, t2])         # mix of References / Models
post = Post(title='Hello', tags=[Tag(name='new')])    # unsaved Tag — auto-saved on parent.save()

Custom-through restriction

For a custom-through M2M, add / remove / set raise AmsdalError:

Cannot {op}() on M2M 'tags' with a custom through-model; create PostTag rows directly.

clear() is allowed — it operates on through rows generically and does not need to construct them.

Workaround — write through rows directly:

# Add: instantiate the through model and save it
PostTag(post=p, tag=t, weight=5).save()
# (or .asave() in async mode)

# Remove a specific link
links = PostTag.objects.filter(post=p, tag=t).execute()
for link in links:
    link.delete()

# Bulk-remove all of a parent's links — clear() is OK even on custom-through
p.tags.clear()
p.save()

add() idempotency

Dedup is by object_id (the target's identity), not by Python id(). Two Tag instances loaded twice from DB (same PK) collapse to one through-row. add → remove → add ends with the link present (set semantics).
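
The identity-based dedup described above can be sketched with a dictionary keyed by object_id. This is only an illustration under stated assumptions: Tag and PendingLinks here are hypothetical stand-ins, not AMSDAL classes, and the real write buffer tracks more state than this.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Tag:
    """Toy target model; object_id plays the role of the primary key."""
    object_id: str
    name: str


class PendingLinks:
    """Toy link buffer that deduplicates by object_id rather than Python id()."""

    def __init__(self):
        self._links = {}  # object_id -> Tag

    def add(self, *tags):
        for tag in tags:
            self._links[tag.object_id] = tag  # same object_id overwrites, never duplicates

    def remove(self, *tags):
        for tag in tags:
            self._links.pop(tag.object_id, None)

    def __len__(self):
        return len(self._links)

    def __contains__(self, tag):
        return tag.object_id in self._links


a = Tag('t1', 'python')
b = Tag('t1', 'python')  # a second Python object for the same stored row

links = PendingLinks()
links.add(a)
links.add(b)
count_after_double_add = len(links)  # both instances collapse to one link

links.remove(a)
links.add(b)
link_present_at_end = b in links     # add -> remove -> add leaves the link present
```

Keying on object_id is what makes repeated loads of the same row harmless: the second add() overwrites the first entry instead of creating a second through-row.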

add() non-latest reference guard

post.tags.add(ref) rejects an explicit non-LATEST Reference with ValueError('Cannot add a non-latest reference to M2M ...'). The same applies to a Model whose _metadata.is_latest is False. Rationale: an M2M link points at the target's identity, not at a specific historical version.

Workaround — link to the LATEST version:

from amsdal_utils.models.data_models.reference import Reference, ReferenceData
from amsdal_utils.models.enums import Versions

# Option 1 — build a LATEST-pinned Reference with the same object_id
latest_ref = Reference(ref=ReferenceData(
    resource='', class_name='Tag', class_version=Versions.LATEST,
    object_id=old_ref.ref.object_id,
    object_version=Versions.LATEST,
))
post.tags.add(latest_ref)

# Option 2 — re-fetch the model and add the instance
tag = Tag.objects.get(_address__object_id=old_ref.ref.object_id).execute()
post.tags.add(tag)

# Option 3 (custom-through) — write the through row yourself; you can carry
# the frozen target version on the through-row's target FK if needed
PostTag(post=p, tag=old_ref, weight=5).save()

Through-model class-level access — Post.tags_through

Post.tags_through is the through-model class (not a wrapper). It exposes .objects and can be queried like any model. Useful for active-vs-historical link queries, filtering by extra through columns, or working with explicit version scopes.

# Current active through-rows for a given post (state default)
links = Post.tags_through.objects.filter(post=p1).execute()

# All historical link versions (lakehouse)
from amsdal_utils.models.enums import Versions

historical = (
    Post.tags_through.objects
    .using('lakehouse')
    .filter(
        _address__object_version=Versions.ALL,
        _metadata__is_deleted=False,
    )
    .execute()
)

# Eagerly join both sides
joined = Post.tags_through.objects.select_related('post', 'tag').filter(tag=t1).execute()

RelatedSet — detailed reference

RelatedSet[T] is the runtime object returned by both reverse-FK (author.book_set) and M2M (post.tags) accessors. It's a list subclass with cache-aware deferred semantics and a unit-of-work write buffer.

Cache states

State | Cache loaded? | Pending writes? | Read behavior
Fresh | no | no | Reads issue minimal SQL on demand (SELECT COUNT(*), SELECT 1 LIMIT 1, etc.)
Prefetched | yes | no | Reads use cache, no DB hit
Dirty | maybe | yes | Reads use cache ± pending diffs
Cleared | yes (empty) | maybe | _pending_clear short-circuits to "empty" until flush

Read operations and emitted SQL

Operation | SQL when cache empty
len(rs) / rs.count() | SELECT COUNT(*)
bool(rs) / rs.exists() | SELECT 1 ... LIMIT 1
rs[i] / rs[i:j] | SELECT ... LIMIT/OFFSET
for x in rs / list(rs) | SELECT * (full materialization)

bool(rs) short-circuits to True if _pending_add is non-empty (no DB hit); to False if _pending_clear is set with no pending adds.
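
The short-circuit order above can be sketched as follows. This is a simplified model, not the real RelatedSet: RelatedSetSketch and exists_query are hypothetical, pending removes are left out, and the SQL string is only a label used to count simulated queries.

```python
class RelatedSetSketch:
    """Toy model of the bool() decision order; pending removes are omitted."""

    def __init__(self, exists_query):
        self._exists_query = exists_query  # hypothetical callable simulating SELECT 1 ... LIMIT 1
        self._cache = None                 # None means "cache not loaded"
        self._pending_add = []
        self._pending_clear = False

    def __bool__(self):
        if self._pending_add:
            return True                    # something will exist after flush, no DB hit
        if self._pending_clear:
            return False                   # cleared and nothing re-added, no DB hit
        if self._cache is not None:
            return bool(self._cache)       # prefetched: answer from cache
        return self._exists_query()        # fresh: minimal existence query


queries = []


def exists_query():
    queries.append('SELECT 1 ... LIMIT 1')
    return True


rs = RelatedSetSketch(exists_query)

rs._pending_add.append('tag')
with_pending_add = bool(rs)        # answered from the pending-add buffer

rs._pending_add.clear()
rs._pending_clear = True
with_pending_clear = bool(rs)      # answered from the clear flag

queries_before_fresh_read = len(queries)

rs._pending_clear = False
fresh_read = bool(rs)              # falls through to the simulated query
```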

Sync vs async — mode rules

  • Writes (add / remove / set / clear) are synchronous in any mode. They mutate in-memory only; no I/O.
  • Magic methods (len, iter, [i], in, bool) work in any mode when cache is populated; in async mode without cache they raise — call await rs (or async for x in rs) first to populate the cache.
  • Named reads are strict by mode: use count() / exists() in sync mode, acount() / aexists() in async mode.

set() semantics

rs.set([a, b, c]) is equivalent to clear() + add(a, b, c). The flush emits a DELETE-all-then-INSERT pattern — not an optimal diff. If your collection is large and the difference is small, prefer paired remove() / add() calls.
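
To make the cost difference concrete, the sketch below counts row-level through-table operations for the two approaches. It is a back-of-the-envelope model under assumptions: plan_set and plan_diff are hypothetical helpers, and the real flush may batch the deletes into fewer statements than the row count suggests.

```python
def plan_set(current_ids, new_ids):
    """Row-level operations for the clear-then-add pattern used by set()."""
    deletes = [('DELETE', oid) for oid in current_ids]
    inserts = [('INSERT', oid) for oid in new_ids]
    return deletes + inserts


def plan_diff(current_ids, new_ids):
    """Row-level operations when only the difference is removed and added."""
    current, new = set(current_ids), set(new_ids)
    removes = [('DELETE', oid) for oid in current_ids if oid not in new]
    adds = [('INSERT', oid) for oid in new_ids if oid not in current]
    return removes + adds


# 1000 linked targets; the desired state drops one and adds one.
current = [f't{i}' for i in range(1000)]
target = current[1:] + ['t_new']

full_rewrite = plan_set(current, target)
minimal_diff = plan_diff(current, target)
```

In this toy scenario the clear-then-add plan touches 2000 rows while the diff touches 2, which is the situation where paired remove() / add() calls are the better choice.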

filter / exclude / order_by chain

Calling these on a RelatedSet returns a target-model QuerySet scoped by parent membership (an EXISTS subquery is wired in automatically):

tags = post.tags.filter(name__icontains='py').order_by('name').execute()
count = post.tags.filter(deprecated=False).count().execute()

Atomicity of parent.save()

save() is wrapped in @transaction (sync) / @async_transaction (async). The entire body — parent insert/update, M2M flush, reverse-FK reparent — runs in a single DB transaction. If any step fails, the whole save is rolled back (no partial state).


Lakehouse historical semantics — divergence between accessor and root

This is the single most important section for anyone reading from using('lakehouse'). The instance accessor and the root filter use different defaults.

Accessor — "current active" by default

post.tags.filter(...) on lakehouse auto-injects two axes onto the target queryset:

  • _address__object_version=Versions.LATEST (latest tag version)
  • _metadata__is_deleted=False (exclude soft-deleted tags)

In addition, the queryset carries an M2M-membership EXISTS subquery pinned to the parent's frozen object_version (so you see "tags linked at this exact post version").

User-supplied predicates on either axis replace the default (do not AND). Axes are detected by segment match — _address__* for the version axis, _metadata__is_deleted* for the deletion axis.

from amsdal_utils.models.enums import Versions

# All historical versions of the linked target (keeps is_deleted=False default)
post.tags.filter(_address__object_version=Versions.ALL).execute()

# Soft-deleted targets only (keeps .latest() default)
post.tags.filter(_metadata__is_deleted=True).execute()

# Both axes overridden — everything historical + deleted
post.tags.filter(
    _address__object_version=Versions.ALL,
    _metadata__is_deleted=True,
).execute()
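
The replace-not-AND rule for the two default axes can be modeled with a small function over filter keyword arguments. This is a sketch under assumptions: effective_filters is a hypothetical helper, the strings 'LATEST' and 'ALL' stand in for the Versions enum members, and prefix matching approximates the segment match described above.

```python
def effective_filters(user_filters):
    """Return the filters after default axes are injected or replaced."""
    filters = dict(user_filters)

    # An axis counts as user-supplied if any key falls under its prefix.
    has_version_axis = any(key.startswith('_address__') for key in user_filters)
    has_deletion_axis = any(key.startswith('_metadata__is_deleted') for key in user_filters)

    if not has_version_axis:
        filters['_address__object_version'] = 'LATEST'  # default: latest target version
    if not has_deletion_axis:
        filters['_metadata__is_deleted'] = False        # default: hide soft-deleted targets
    return filters


plain = effective_filters({'name__icontains': 'py'})
all_versions = effective_filters({'_address__object_version': 'ALL'})
deleted_only = effective_filters({'_metadata__is_deleted': True})
```

Each axis is handled independently, so overriding the version axis keeps the deletion default and vice versa, matching the first two queries above.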

Root filter — "raw has-or-had"

Post.objects.filter(tags=t) (and tags__in, tags___address__object_id, tags__name='X') emits a raw through-table predicate. It matches every post that was ever linked to a tag with the given object_id, including links that were later removed or whose tag was soft-deleted. No version pin, no deletion clamp on the target.

Side-by-side

Query | Returns
post.tags | Tags currently linked to this Post version, not deleted
post.tags.filter(_address__object_version=Versions.ALL) | All historical Tag versions linked to this Post version, not deleted
post.tags.filter(_metadata__is_deleted=True) | Soft-deleted Tags currently linked
Post.objects.filter(tags=t) | Every Post that ever had this Tag linked
Post.tags_through.objects.filter(tag=t) | Every through-row ever recorded for this Tag

M2M version snapshotting on save

Every Author.save() / asave() on lakehouse re-snapshots the parent's M2M through-rows under the new parent version. Author.objects.using('lakehouse').latest().execute()[0].books always returns the books linked as of that author version.

INSERTs skip the copy (no previous version). pending_clear skips it too (the user explicitly wiped). UoW removes are honored via exclude_target_refs.
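
The copy rules above can be summarized as a pure function over link identifiers. This is a rough model only: snapshot_links and its parameters are hypothetical, and the way pending adds are merged after the copy is an assumption rather than documented behavior.

```python
def snapshot_links(previous_links, *, is_insert, pending_clear,
                   pending_add=(), pending_remove=()):
    """Compute the link ids recorded under the new parent version."""
    if is_insert or pending_clear:
        carried = []  # nothing to copy: no previous version, or explicitly wiped
    else:
        excluded = set(pending_remove)  # plays the role of exclude_target_refs
        carried = [oid for oid in previous_links if oid not in excluded]

    # Assumption: pending adds are appended after the copy, skipping duplicates.
    result = list(carried)
    for oid in pending_add:
        if oid not in result:
            result.append(oid)
    return result


updated = snapshot_links(
    ['b1', 'b2', 'b3'],
    is_insert=False,
    pending_clear=False,
    pending_remove=['b2'],
    pending_add=['b4'],
)
inserted = snapshot_links([], is_insert=True, pending_clear=False, pending_add=['b1'])
cleared = snapshot_links(['b1', 'b2'], is_insert=False, pending_clear=True)
```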

Per-hop version overrides

On historical queries you can pin / scope individual relation hops. See Field Lookups — Per-Hop Version Overrides for the full table.

Caveat: <hop>___address__class_version=Versions.ALL raises ValueError — the planner cannot collapse "all class versions" on a single hop. Workaround — query the through-model directly:

all_class_versions = (
    Post.tags_through.objects
    .using('lakehouse')
    .filter(_address__class_version=Versions.ALL)
    .execute()
)

(Root-level _address__class_version=Versions.ALL is allowed — the guard fires only on hop paths.)


Querying through relations

Need | Use
Filter by reverse-FK or M2M attributes | Author.objects.filter(book_set__title='Dune') (see Field Lookups)
Reverse-M2M traversal at root | Declare related_name='posts' on the M2M; then Tag.objects.filter(posts__title='X') (see below)
Eager-load forward-FK to avoid N+1 | select_related('author') (see Performance)
Eager-load reverse-FK / M2M children to avoid N+1 | prefetch_related('book_set') or Prefetch(...) (see Performance)
Sort by an M2M attribute | Post.objects.order_by('tags__name')
Sort by a reverse-FK attribute | Author.objects.order_by('book_set__year')

Without related_name no reverse accessor is installed on the target, and Tag.objects.filter(posts=p1) will silently emit a SQL predicate against a non-existent posts column. The correct fix is to add related_name='posts' to the ManyReferenceField. As a one-off workaround you can use the through table:

# Equivalent of Tag.objects.filter(posts__title__icontains='django')
tag_ids = (
    Post.tags_through.objects
    .filter(post__title__icontains='django')
    .values('tag')
)
tags = Tag.objects.filter(_address__object_id__in=tag_ids).execute()