Relationships¶
AMSDAL has three relation types — all bidirectional and all version-aware on lakehouse. This page is the single source of truth for how they're declared, accessed, written, and queried.
| Direction | Declared by | Auto-installed accessor | Returns |
|---|---|---|---|
| Forward FK | book.author: Author or ReferenceField | book.author | resolved Author (or coroutine in async) |
| Reverse FK | implicit on FK target | author.book_set (default name) | RelatedSet[Book] |
| Many-to-Many | tags: list[Tag] or ManyReferenceField | post.tags (instance) + Post.tags_through (class) | RelatedSet[Tag] (instance) / type[ThroughModel] (class) |
See Field Types for the field declarations. This doc covers runtime behavior — accessing, writing, querying, and the lakehouse-specific historical semantics.
Forward References (FK)¶
Sync access¶
book.author triggers ReferenceLoader.load_reference() on first access; the result is cached on the instance under __fk_resolved_cache__. Repeated reads of the same FK on the same instance are free.
book = Book.objects.get(title='Dune').execute()
print(book.author.name) # one extra SELECT
print(book.author.email) # cache hit, no query
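The per-instance caching described above can be modelled in plain Python. This is a toy sketch, not AMSDAL's implementation: LazyReference, load_author, and the _fk_cache attribute are hypothetical stand-ins for the FK descriptor, ReferenceLoader.load_reference(), and __fk_resolved_cache__.

```python
class LazyReference:
    """Toy descriptor: resolve once through a loader, then serve from a per-instance cache."""

    def __init__(self, loader):
        self._loader = loader

    def __set_name__(self, owner, name):
        self._name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        cache = instance.__dict__.setdefault('_fk_cache', {})
        if self._name not in cache:
            # First access on this instance: pay for the lookup once.
            cache[self._name] = self._loader(instance)
        return cache[self._name]


load_calls = []


def load_author(book):
    load_calls.append(book.title)  # stands in for the extra SELECT
    return {'name': 'Frank Herbert', 'email': 'frank@example.com'}


class Book:
    author = LazyReference(load_author)

    def __init__(self, title):
        self.title = title


book = Book('Dune')
first = book.author['name']    # triggers the loader
second = book.author['email']  # served from the cache
```

After both reads, load_calls holds a single entry, which mirrors one extra SELECT followed by a cache hit.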
Async access¶
In async mode book.author returns a coroutine — sync ReferenceLoader.load_reference() is forbidden in async context. Await it once and reuse:
book = await Book.objects.get(title='Dune').aexecute()
author = await book.author # one await
print(author.name, author.email)
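The "await once and reuse" advice can be illustrated with a self-contained toy. The Book class and fetch_author function below are hypothetical and use only asyncio; they are not AMSDAL code. In this toy every attribute access builds a fresh coroutine, so holding on to the awaited result avoids repeated lookups.

```python
import asyncio

fetches = []


async def fetch_author(author_id):
    fetches.append(author_id)  # stands in for one database round trip
    await asyncio.sleep(0)
    return {'name': 'Frank Herbert', 'email': 'frank@example.com'}


class Book:
    def __init__(self, author_id):
        self._author_id = author_id

    @property
    def author(self):
        # Returns a coroutine; nothing is fetched until it is awaited.
        return fetch_author(self._author_id)


async def main():
    book = Book('a1')
    author = await book.author  # one await
    return author['name'], author['email']  # reuse the resolved value


result = asyncio.run(main())
```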
Raw reference access — _<fk>_ref¶
Each forward FK exposes two attributes: the public descriptor (resolves to a Model) and a private slot _<fk>_ref (holds the raw Reference without I/O).
Use cases where you'd touch _<fk>_ref directly:
from amsdal_utils.models.data_models.reference import Reference
# 1. Avoid lazy resolution in async hot paths (always sync, any mode)
raw = book._author_ref
# 2. Distinguish unresolved Reference vs resolved Model — no I/O
if isinstance(raw, Reference):
    schedule_batch_resolve(raw)
elif raw is not None:
    print(raw.name)  # already resolved (e.g. via prefetch_related)
# 3. Serialization without resolving — model_dump(by_alias=False) yields raw refs
data = book.model_dump(by_alias=False)
# {'_author_ref': {'ref': {...}}, 'title': 'Dune'}
Caveat — soft-deleted FK target¶
A forward FK whose target was soft-deleted behaves inconsistently across backends:
- State backend: ReferenceLoader.load_reference() fails internally (object not visible in state). The descriptor swallows the failure and leaves book.author as a dangling Reference. Accessing a field on it (book.author.name) raises AttributeError.
- Lakehouse backend: the historical row is loaded successfully; book.author returns a tombstone Model with _metadata.is_deleted == True. Attribute access does not raise.
Defensive workaround:
from amsdal_utils.models.data_models.reference import Reference
def safe_fk(instance, attr):
    raw = getattr(instance, attr, None)
    if raw is None:
        return None
    if isinstance(raw, Reference):
        return None  # unresolved / dangling on state
    metadata = getattr(raw, '_metadata', None)
    if metadata is not None and getattr(metadata, 'is_deleted', False):
        return None  # tombstone on lakehouse
    return raw
loc = safe_fk(employee, 'location')
Lakehouse-only — filter at query level via a per-hop directive (see Field Lookups — Per-Hop Version Overrides):
Employee.objects.using('lakehouse').filter(
    first_name='E',
    location___metadata__is_deleted=False,
).execute()
Reverse References (Reverse FK)¶
Whenever you declare Book.author: Author = ReferenceField(...), the framework auto-installs author.book_set on Author, returning RelatedSet[Book]. Set related_name='custom' to rename the accessor, or related_name='+' to disable it (see Field Types — Reverse accessor naming).
author.book_set follows the RelatedSet contract (see below). Writes (add/set) re-point the children's FK column to the parent; remove, clear, and a shrinking set attempt to set the child's FK to None.
Non-nullable detach behavior¶
When the child's FK is non-nullable, remove() / clear() / shrinking set() silently buffer the detach; at parent.save() the flush attempts child.fk = None → Pydantic validation fails → pydantic.ValidationError is raised (with an error entry pointing to the FK field, input is None).
import pytest
from pydantic import ValidationError
c.employee_set.remove(e) # buffered, no DB hit
with pytest.raises(ValidationError):
    c.save()  # flush fails: company is required
# ✅ Reparent
c2.employee_set.add(e)
c2.save()
# ✅ Hard-delete the child
e.delete()
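The "buffer now, fail at flush" behavior can be sketched without the framework. In the toy below, Employee, EmployeeSet, set_company, and flush are hypothetical names, and a plain ValueError stands in for pydantic.ValidationError. It only illustrates why remove() succeeds and the error surfaces later, at save time.

```python
class Employee:
    def __init__(self, name, company):
        self.name = name
        self.company = company  # required FK in this toy

    def set_company(self, company):
        # Validate before assigning, loosely like a required model field.
        if company is None:
            raise ValueError('company: input is None but the field is required')
        self.company = company


class EmployeeSet:
    def __init__(self):
        self._pending_remove = []

    def remove(self, child):
        # Buffer only: nothing is validated or written here.
        self._pending_remove.append(child)

    def flush(self):
        # Runs at save time: try to detach every buffered child.
        for child in self._pending_remove:
            child.set_company(None)
        self._pending_remove.clear()


e = Employee('Ada', company='Acme')
employee_set = EmployeeSet()
employee_set.remove(e)  # succeeds silently

try:
    employee_set.flush()  # the failure surfaces only now
    flush_error = None
except ValueError as exc:
    flush_error = str(exc)
```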
Many-to-Many¶
Auto-through vs custom-through¶
# Auto-through (no extras)
class Post(Model):
    tags: list[Tag]
# Custom-through (extra columns on the link table)
class PostTag(Model):
    post: 'Post' = ReferenceField(db_field='post_id')
    tag: Tag = ReferenceField(db_field='tag_id')
    weight: int = 0
class Post(Model):
    tags: list[Tag] = ManyReferenceField(through=PostTag, through_fields=('post', 'tag'))
Both expose post.tags (a RelatedSet[Tag]) and Post.tags_through (the through-model class itself — see below).
Constructor M2M assignment¶
post = Post(title='Hello', tags=[t1, t2, t3]) # pre-saved Models
post = Post(title='Hello', tags=[ref_t1, t2]) # mix of References / Models
post = Post(title='Hello', tags=[Tag(name='new')]) # unsaved Tag — auto-saved on parent.save()
Custom-through restriction¶
For a custom-through M2M, add / remove / set raise AmsdalError:
Cannot {op}() on M2M 'tags' with a custom through-model; create PostTag rows directly.
clear() is allowed — it operates on through rows generically and does not need to construct them.
Workaround — write through rows directly:
# Add: instantiate the through model and save it
PostTag(post=p, tag=t, weight=5).save()
# (or .asave() in async mode)
# Remove a specific link
links = PostTag.objects.filter(post=p, tag=t).execute()
for link in links:
    link.delete()
# Bulk-remove all of a parent's links — clear() is OK even on custom-through
p.tags.clear()
p.save()
add() idempotency¶
Dedup is by object_id (the target's identity), not by Python id(). Two Tag instances loaded twice from DB (same PK) collapse to one through-row. add → remove → add ends with the link present (set semantics).
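The identity-based dedup can be pictured as a mapping keyed by object_id. The sketch below is a simplified model with hypothetical Tag and LinkBuffer classes, not the real write buffer. Two distinct Python objects that share an object_id collapse to one link, and add, remove, add leaves the link present.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    object_id: str
    name: str


class LinkBuffer:
    """Toy link store keyed by the target's object_id, not by Python id()."""

    def __init__(self):
        self._links = {}

    def add(self, *targets):
        for target in targets:
            self._links[target.object_id] = target

    def remove(self, *targets):
        for target in targets:
            self._links.pop(target.object_id, None)

    def object_ids(self):
        return sorted(self._links)


first_load = Tag('tag-1', 'python')
second_load = Tag('tag-1', 'python')  # same PK, different Python object

links = LinkBuffer()
links.add(first_load, second_load)
after_duplicate_add = links.object_ids()

links.remove(first_load)
links.add(second_load)
after_readd = links.object_ids()
```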
add() non-latest reference guard¶
post.tags.add(ref) rejects an explicit non-LATEST Reference with ValueError('Cannot add a non-latest reference to M2M ...'). Similarly for a Model whose _metadata.is_latest is False. Rationale: an M2M link points at the target's identity, not at a specific historical version.
Workaround — link to the LATEST version:
from amsdal_utils.models.data_models.reference import Reference, ReferenceData
from amsdal_utils.models.enums import Versions
# Option 1 — build a LATEST-pinned Reference with the same object_id
latest_ref = Reference(ref=ReferenceData(
    resource='', class_name='Tag', class_version=Versions.LATEST,
    object_id=old_ref.ref.object_id,
    object_version=Versions.LATEST,
))
post.tags.add(latest_ref)
# Option 2 — re-fetch the model and add the instance
tag = Tag.objects.get(_address__object_id=old_ref.ref.object_id).execute()
post.tags.add(tag)
# Option 3 (custom-through) — write the through row yourself; you can carry
# the frozen target version on the through-row's target FK if needed
PostTag(post=p, tag=old_ref, weight=5).save()
Through-model class-level access — Post.tags_through¶
Post.tags_through is the through-model class (not a wrapper). It exposes .objects and can be queried like any model. Useful for active-vs-historical link queries, filtering by extra through columns, or working with explicit version scopes.
# Current active through-rows for a given post (state default)
links = Post.tags_through.objects.filter(post=p1).execute()
# All historical link versions (lakehouse)
from amsdal_utils.models.enums import Versions
historical = (
    Post.tags_through.objects
    .using('lakehouse')
    .filter(
        _address__object_version=Versions.ALL,
        _metadata__is_deleted=False,
    )
    .execute()
)
# Eagerly join both sides
joined = Post.tags_through.objects.select_related('post', 'tag').filter(tag=t1).execute()
RelatedSet — detailed reference¶
RelatedSet[T] is the runtime object returned by both reverse-FK (author.book_set) and M2M (post.tags) accessors. It's a list subclass with cache-aware deferred semantics and a unit-of-work write buffer.
Cache states¶
| State | Cache loaded? | Pending writes? | Read behavior |
|---|---|---|---|
| Fresh | no | no | Reads issue minimal SQL on demand (SELECT COUNT(*), SELECT 1 LIMIT 1, etc.) |
| Prefetched | yes | no | Reads use cache, no DB hit |
| Dirty | maybe | yes | Reads use cache ± pending diffs |
| Cleared | yes (empty) | maybe | _pending_clear short-circuits to "empty" until flush |
Read operations and emitted SQL¶
| Operation | SQL when cache empty |
|---|---|
| len(rs) / rs.count() | SELECT COUNT(*) |
| bool(rs) / rs.exists() | SELECT 1 ... LIMIT 1 |
| rs[i] / rs[i:j] | SELECT ... LIMIT/OFFSET |
| for x in rs / list(rs) | SELECT * (full materialization) |
bool(rs) short-circuits to True if _pending_add is non-empty (no DB hit); to False if _pending_clear is set with no pending adds.
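The short-circuit order can be modelled as follows. PendingState is a hypothetical toy that tracks only the two pending flags named above; it ignores a loaded cache and pending removes, and its exists_in_db callable stands in for the SELECT 1 ... LIMIT 1 query.

```python
class PendingState:
    def __init__(self, exists_in_db):
        self._exists_in_db = exists_in_db
        self._pending_add = []
        self._pending_clear = False
        self.db_hits = 0

    def __bool__(self):
        if self._pending_add:
            return True   # something is queued to be linked, no query needed
        if self._pending_clear:
            return False  # cleared and nothing re-added, no query needed
        self.db_hits += 1
        return self._exists_in_db()


fresh = PendingState(lambda: False)
fresh_result = bool(fresh)

with_add = PendingState(lambda: False)
with_add._pending_add.append('tag-1')
with_add_result = bool(with_add)

cleared = PendingState(lambda: True)
cleared._pending_clear = True
cleared_result = bool(cleared)
```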
Sync vs async — mode rules¶
- Writes (add/remove/set/clear) are synchronous in any mode. They mutate in-memory only; no I/O.
- Magic methods (len, iter, [i], in, bool) work in any mode when cache is populated; in async mode without cache they raise, so call await rs (or async for x in rs) first to populate the cache.
- Named reads are strict by mode: use count()/exists() in sync mode, acount()/aexists() in async mode.
set() semantics¶
rs.set([a, b, c]) is equivalent to clear() + add(a, b, c). The flush emits a DELETE-all-then-INSERT pattern — not an optimal diff. If your collection is large and the difference is small, prefer paired remove() / add() calls.
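To make the cost difference concrete, the sketch below counts the link rows each approach would touch. It is back-of-the-envelope arithmetic under the assumption that clear-then-add rewrites every link; the two helper functions are hypothetical and count rows, not SQL statements.

```python
def rows_touched_by_set(current, desired):
    """Clear-then-add: every current link is deleted, every desired link is inserted."""
    return len(current) + len(desired)


def rows_touched_by_diff(current, desired):
    """Paired remove()/add(): only the links that actually differ are touched."""
    current, desired = set(current), set(desired)
    return len(current - desired) + len(desired - current)


current_links = [f'tag-{i}' for i in range(1000)]
desired_links = current_links[1:] + ['tag-new']  # drop one link, add one link

set_cost = rows_touched_by_set(current_links, desired_links)
diff_cost = rows_touched_by_diff(current_links, desired_links)
```

With 1000 existing links and a one-in, one-out change, the clear-then-add pattern touches 2000 rows while the paired calls touch 2.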
filter / exclude / order_by chain¶
Calling these on a RelatedSet returns a target-model QuerySet scoped by parent membership (an EXISTS subquery is wired in automatically):
tags = post.tags.filter(name__icontains='py').order_by('name').execute()
count = post.tags.filter(deprecated=False).count().execute()
Atomicity of parent.save()¶
save() is wrapped in @transaction (sync) / @async_transaction (async). The entire body — parent insert/update, M2M flush, reverse-FK reparent — runs in a single DB transaction. If any step fails, the whole save is rolled back (no partial state).
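The all-or-nothing guarantee is the standard database transaction contract. The sketch below demonstrates it with Python's built-in sqlite3 module rather than AMSDAL; the post and post_tag tables and the save_post_with_links function are hypothetical. A failing link insert rolls back the parent insert as well.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT NOT NULL)')
conn.execute('CREATE TABLE post_tag (post_id INTEGER NOT NULL, tag_id INTEGER NOT NULL)')
conn.commit()


def save_post_with_links(conn, title, tag_ids):
    # The connection context manager commits on success and rolls back on error.
    with conn:
        cur = conn.execute('INSERT INTO post (title) VALUES (?)', (title,))
        for tag_id in tag_ids:
            conn.execute(
                'INSERT INTO post_tag (post_id, tag_id) VALUES (?, ?)',
                (cur.lastrowid, tag_id),
            )


try:
    save_post_with_links(conn, 'Hello', [1, None])  # None violates NOT NULL
    save_failed = False
except sqlite3.IntegrityError:
    save_failed = True

post_count = conn.execute('SELECT COUNT(*) FROM post').fetchone()[0]
link_count = conn.execute('SELECT COUNT(*) FROM post_tag').fetchone()[0]
```

Both counts are zero afterwards: the parent row and the first link row were rolled back together with the failing insert.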
Lakehouse historical semantics — divergence between accessor and root¶
This is the single most important section for anyone reading from using('lakehouse'). The instance accessor and the root filter use different defaults.
Accessor — "current active" by default¶
post.tags.filter(...) on lakehouse auto-injects two axes onto the target queryset:
- _address__object_version=Versions.LATEST (latest tag version)
- _metadata__is_deleted=False (exclude soft-deleted tags)
It also adds an M2M-membership EXISTS predicate pinned to the parent's frozen object_version (so you see "tags linked at this exact post version").
User-supplied predicates on either axis replace the default (do not AND). Axes are detected by segment match — _address__* for the version axis, _metadata__is_deleted* for the deletion axis.
from amsdal_utils.models.enums import Versions
# All historical versions of the linked target (keeps is_deleted=False default)
post.tags.filter(_address__object_version=Versions.ALL).execute()
# Soft-deleted targets only (keeps .latest() default)
post.tags.filter(_metadata__is_deleted=True).execute()
# Both axes overridden — everything historical + deleted
post.tags.filter(
    _address__object_version=Versions.ALL,
    _metadata__is_deleted=True,
).execute()
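The "replace, do not AND" rule can be expressed as a small merge function. This is a simplified model, not the planner's code: effective_filters and axis_of are hypothetical, the strings 'LATEST' and 'ALL' stand in for Versions.LATEST and Versions.ALL, and axis detection is reduced to a prefix check.

```python
DEFAULTS = {
    'version': ('_address__object_version', 'LATEST'),
    'deletion': ('_metadata__is_deleted', False),
}


def axis_of(lookup):
    """Classify a lookup key as belonging to the version axis, the deletion axis, or neither."""
    if lookup.startswith('_address__'):
        return 'version'
    if lookup.startswith('_metadata__is_deleted'):
        return 'deletion'
    return None


def effective_filters(**user_filters):
    overridden = {axis_of(key) for key in user_filters}
    # Keep a default only when the user supplied nothing on that axis.
    merged = {
        key: value
        for axis, (key, value) in DEFAULTS.items()
        if axis not in overridden
    }
    merged.update(user_filters)
    return merged


plain = effective_filters(name='py')
all_versions = effective_filters(_address__object_version='ALL')
deleted_only = effective_filters(_metadata__is_deleted=True)
```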
Root filter — "raw has-or-had"¶
Post.objects.filter(tags=t) (and tags__in, tags___address__object_id, tags__name='X') emits a raw through-table predicate. It matches every post that was ever linked to a tag with the given object_id, including links that were later removed or whose tag was soft-deleted. No version pin, no deletion clamp on the target.
Side-by-side¶
| Query | Returns |
|---|---|
| post.tags | Tags currently linked to this Post version, not deleted |
| post.tags.filter(_address__object_version=Versions.ALL) | All historical Tag versions linked to this Post version, not deleted |
| post.tags.filter(_metadata__is_deleted=True) | Soft-deleted Tags currently linked |
| Post.objects.filter(tags=t) | Every Post that ever had this Tag linked |
| Post.tags_through.objects.filter(tag=t) | Every through-row ever recorded for this Tag |
M2M version snapshotting on save¶
Every Author.save() / asave() on lakehouse re-snapshots the parent's M2M through-rows under the new parent version. Author.objects.using('lakehouse').latest().execute()[0].books always returns the books linked as of that author version.
INSERTs skip the copy (no previous version). pending_clear skips it too (the user explicitly wiped). UoW removes are honored via exclude_target_refs.
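The three copy rules can be summarized in one function. The sketch below is a simplified model with hypothetical names: snapshot_links is not an AMSDAL function, and removed_ids plays the role the text assigns to exclude_target_refs. It returns the target ids that would be carried over to the new parent version.

```python
def snapshot_links(previous_links, *, is_insert, pending_clear, removed_ids=()):
    if is_insert:
        return []  # no previous version to copy from
    if pending_clear:
        return []  # the user explicitly wiped the links
    removed = set(removed_ids)
    # Carry over every previous link except the ones removed in this save.
    return [target_id for target_id in previous_links if target_id not in removed]


previous = ['book-1', 'book-2', 'book-3']

on_insert = snapshot_links(previous, is_insert=True, pending_clear=False)
on_clear = snapshot_links(previous, is_insert=False, pending_clear=True)
on_update = snapshot_links(
    previous, is_insert=False, pending_clear=False, removed_ids=['book-2']
)
```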
Per-hop version overrides¶
On historical queries you can pin / scope individual relation hops. See Field Lookups — Per-Hop Version Overrides for the full table.
Caveat: <hop>___address__class_version=Versions.ALL raises ValueError — the planner cannot collapse "all class versions" on a single hop. Workaround — query the through-model directly:
all_class_versions = (
    Post.tags_through.objects
    .using('lakehouse')
    .filter(_address__class_version=Versions.ALL)
    .execute()
)
(Root-level _address__class_version=Versions.ALL is allowed — the guard fires only on hop paths.)
Querying through relations¶
| Need | Use |
|---|---|
| Filter by reverse-FK or M2M attributes | Author.objects.filter(book_set__title='Dune') — see Field Lookups |
| Reverse-M2M traversal at root | Declare related_name='posts' on the M2M; then Tag.objects.filter(posts__title='X') (see below) |
| Eager-load forward-FK to avoid N+1 | select_related('author') — see Performance |
| Eager-load reverse-FK / M2M children to avoid N+1 | prefetch_related('book_set') or Prefetch(...) — see Performance |
| Sort by an M2M attribute | Post.objects.order_by('tags__name') |
| Sort by a reverse-FK attribute | Author.objects.order_by('book_set__year') |
Reverse-M2M without related_name — workaround¶
Without related_name no reverse accessor is installed on the target, and Tag.objects.filter(posts=p1) will silently emit a SQL predicate against a non-existent posts column. The correct fix is to add related_name='posts' to the ManyReferenceField. As a one-off workaround you can use the through table:
# Equivalent of Tag.objects.filter(posts__title__icontains='django')
tag_ids = (
    Post.tags_through.objects
    .filter(post__title__icontains='django')
    .values('tag')
)
tags = Tag.objects.filter(_address__object_id__in=tag_ids).execute()