tadata for routing, logging, or client responses. Override __init__ to accept structured parameters, but always delegate the message to the parent class.
class TransactionDeclinedError(PaymentDomainError):
    def __init__(self, transaction_id: str, decline_code: str, message: str = "Transaction declined"):
        # Structured fields stay on the instance for routing and logging.
        self.transaction_id = transaction_id
        self.decline_code = decline_code
        super().__init__(message)
Rationale: Calling super().__init__(message) preserves standard traceback rendering and str(err) behavior. Custom attributes remain accessible via err.transaction_id and err.decline_code, eliminating the need for string parsing. Type hints on __init__ improve IDE support and static analysis.
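As a rough illustration of the point above, a handler can read the structured fields directly instead of parsing the message. This is a minimal sketch: PaymentDomainError is assumed here to be a plain Exception subclass, and charge() and handle() are hypothetical names introduced only for the example.

```python
class PaymentDomainError(Exception):
    """Stand-in for the guide's base class (assumed to subclass Exception)."""


class TransactionDeclinedError(PaymentDomainError):
    def __init__(self, transaction_id: str, decline_code: str,
                 message: str = "Transaction declined"):
        self.transaction_id = transaction_id
        self.decline_code = decline_code
        super().__init__(message)


def charge(transaction_id: str) -> None:
    # Hypothetical charge function that always declines, for illustration only.
    raise TransactionDeclinedError(transaction_id, decline_code="51")


def handle(transaction_id: str) -> dict:
    try:
        charge(transaction_id)
    except TransactionDeclinedError as err:
        # Structured fields are read as attributes; str(err) is only for humans.
        return {
            "transaction_id": err.transaction_id,
            "decline_code": err.decline_code,
            "detail": str(err),
        }
    return {"transaction_id": transaction_id, "detail": "ok"}


result = handle("tx_123")
print(result)
```

The message passed to super().__init__() still drives str(err), so logs stay readable while callers branch on the attributes.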
Step 3: Build a Branching Hierarchy
Flat exception lists become unmanageable as failure modes multiply. Group errors by handling strategy rather than by source.
class RetryablePaymentError(PaymentDomainError):
    """Transient failures that may succeed on subsequent attempts."""

    def __init__(self, message: str, retry_after_seconds: int = 30):
        self.retry_after_seconds = retry_after_seconds
        super().__init__(message)


class PermanentPaymentError(PaymentDomainError):
    """Failures that will not resolve without external intervention."""
    pass


class InsufficientFundsError(PermanentPaymentError):
    def __init__(self, transaction_id: str, required_amount: float, available_balance: float):
        self.transaction_id = transaction_id
        self.required_amount = required_amount
        self.available_balance = available_balance
        super().__init__(
            f"Insufficient funds: required {required_amount}, available {available_balance}"
        )
Rationale: Grouping by handling strategy (Retryable vs Permanent) aligns exception types with infrastructure behavior. A background worker can catch RetryablePaymentError to schedule exponential backoff, catch PermanentPaymentError to route to a dead-letter queue, and let unknown exceptions bubble to a global error handler.
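The worker behavior described above might look roughly like the following. It is a sketch under assumptions: run_job(), flaky_job(), and broken_job() are hypothetical, the routing decision is returned as a string rather than sent to a real scheduler or queue, and the doubling backoff formula is one possible choice.

```python
from typing import Callable


class PaymentDomainError(Exception):
    """Stand-in for the guide's base class (assumed to subclass Exception)."""


class RetryablePaymentError(PaymentDomainError):
    def __init__(self, message: str, retry_after_seconds: int = 30):
        self.retry_after_seconds = retry_after_seconds
        super().__init__(message)


class PermanentPaymentError(PaymentDomainError):
    pass


def run_job(job: Callable[[], None], attempt: int = 0) -> str:
    """Hypothetical worker step: returns the routing decision as a string."""
    try:
        job()
    except RetryablePaymentError as err:
        # Exponential backoff seeded by the hint carried on the exception.
        delay = err.retry_after_seconds * (2 ** attempt)
        return f"retry in {delay}s"
    except PermanentPaymentError:
        return "dead-letter"
    # Any other exception type is not caught here and bubbles up.
    return "done"


def flaky_job() -> None:
    raise RetryablePaymentError("gateway busy", retry_after_seconds=10)


def broken_job() -> None:
    raise PermanentPaymentError("card closed")


decisions = [run_job(flaky_job, attempt=2), run_job(broken_job)]
print(decisions)
```

Because the two except clauses name the strategy classes rather than individual errors, new subclasses such as InsufficientFundsError are routed without touching the worker.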
Step 4: Implement Exception Translation with Cause Preservation
When translating low-level infrastructure errors into domain errors, use explicit chaining to preserve the root cause.
import requests


def process_payment(payload: dict) -> dict:
    try:
        response = requests.post("https://api.gateway.com/v1/charge", json=payload, timeout=5)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.Timeout as exc:
        # Transient: translate to a retryable domain error, keeping the cause.
        raise RetryablePaymentError(
            "Payment gateway timed out",
            retry_after_seconds=60
        ) from exc
    except requests.exceptions.HTTPError as exc:
        if exc.response.status_code == 402:
            raise InsufficientFundsError(
                transaction_id=payload["id"],
                required_amount=payload["amount"],
                available_balance=0.0
            ) from exc
        raise PermanentPaymentError(f"Gateway returned {exc.response.status_code}") from exc
Rationale: The from exc clause sets the __cause__ attribute on the new exception. Python's traceback printer then displays both the original Timeout and the translated RetryablePaymentError, separated by the line "The above exception was the direct cause of the following exception". This keeps the root cause visible while maintaining domain abstraction. PEP 3134 introduced this chaining mechanism so that an exception raised while translating another one retains a reference to the original instead of silently replacing it.
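To see the chaining without a network call, the sketch below swaps the requests timeout for the built-in TimeoutError. call_gateway() is a hypothetical stand-in, and PaymentDomainError is assumed to be a plain Exception subclass.

```python
import traceback


class PaymentDomainError(Exception):
    """Stand-in for the guide's base class (assumed to subclass Exception)."""


class RetryablePaymentError(PaymentDomainError):
    def __init__(self, message: str, retry_after_seconds: int = 30):
        self.retry_after_seconds = retry_after_seconds
        super().__init__(message)


def call_gateway() -> None:
    # Hypothetical low-level call; the built-in TimeoutError stands in
    # for requests.exceptions.Timeout so the example runs offline.
    raise TimeoutError("socket timed out")


def process_payment() -> None:
    try:
        call_gateway()
    except TimeoutError as exc:
        raise RetryablePaymentError(
            "Payment gateway timed out", retry_after_seconds=60
        ) from exc


try:
    process_payment()
except RetryablePaymentError as err:
    caught = err

# The original exception is reachable from the translated one.
print(type(caught.__cause__).__name__)

# The rendered traceback contains both exceptions and the "direct cause" line.
rendered = "".join(
    traceback.format_exception(type(caught), caught, caught.__traceback__)
)
print(rendered)
```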
Step 5: Customize String Representation for Observability
Default exception formatting often omits critical context. Override __str__ or format the message in super().__init__() to ensure tracebacks and logs contain actionable data.
class FraudDetectionError(PaymentDomainError):
    def __init__(self, transaction_id: str, risk_score: float, flagged_fields: list[str]):
        self.transaction_id = transaction_id
        self.risk_score = risk_score
        self.flagged_fields = flagged_fields
        # Build the message once so str(err), tracebacks, and logs all agree.
        formatted_msg = (
            f"Fraud check failed for {transaction_id!r} "
            f"(score: {risk_score:.2f}, flags: {flagged_fields})"
        )
        super().__init__(formatted_msg)
Rationale: Using !r in f-strings applies repr() to strings, adding quotes and clarifying empty values. Formatting the message before passing it to super().__init__() ensures consistency across str(err), traceback output, and logging frameworks. This eliminates the need for custom log formatters to extract context.
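A quick check of the rendering, using an empty transaction ID to show what !r buys you. The sample values are made up, and PaymentDomainError is assumed to be a plain Exception subclass.

```python
class PaymentDomainError(Exception):
    """Stand-in for the guide's base class (assumed to subclass Exception)."""


class FraudDetectionError(PaymentDomainError):
    def __init__(self, transaction_id: str, risk_score: float,
                 flagged_fields: list[str]):
        self.transaction_id = transaction_id
        self.risk_score = risk_score
        self.flagged_fields = flagged_fields
        formatted_msg = (
            f"Fraud check failed for {transaction_id!r} "
            f"(score: {risk_score:.2f}, flags: {flagged_fields})"
        )
        super().__init__(formatted_msg)


# An empty ID would vanish without !r; with it, the quotes make it visible.
err = FraudDetectionError("", 0.9137, ["ip", "email"])
message = str(err)
print(message)
# Fraud check failed for '' (score: 0.91, flags: ['ip', 'email'])
```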
Pitfall Guide
1. Inheriting from BaseException
Explanation: BaseException is the root of Python's entire exception tree, including SystemExit, KeyboardInterrupt, and GeneratorExit. Inheriting from it causes your custom error to bypass standard except Exception: handlers, breaking cleanup logic and signal handling.
Fix: Always inherit from Exception or one of its subclasses. Reserve BaseException only for interpreter-level signals.
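The difference is easy to demonstrate. In this sketch, SignalLikeError and OrdinaryError are hypothetical classes, and run_guarded() plays the role of any code that relies on except Exception.

```python
class SignalLikeError(BaseException):
    """Anti-pattern: a domain error that inherits from BaseException."""


class OrdinaryError(Exception):
    """Recommended: a domain error that inherits from Exception."""


def run_guarded(exc_type: type) -> str:
    try:
        raise exc_type("boom")
    except Exception:
        # Standard catch-all used by frameworks and cleanup code.
        return "handled"


ordinary = run_guarded(OrdinaryError)

try:
    escaped = run_guarded(SignalLikeError)
except BaseException:
    # The except Exception handler above never saw this error.
    escaped = "escaped the handler"

print(ordinary, "/", escaped)
```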
2. Embedding Structured Data in the Message String
Explanation: Writing "Record 4823 failed on field 'amount'" forces downstream consumers to parse strings using regex or split operations. This is fragile, slow, and breaks when message formatting changes.
Fix: Extract data into explicit attributes (err.record_id, err.field_name). Keep the message human-readable for logs, but keep data machine-readable for routing.
3. Omitting super().__init__()
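Explanation: If you define __init__ without calling the parent constructor, your formatted message never reaches the base class. In CPython, args then holds only whatever positional arguments were passed to the constructor. A keyword-only call leaves args empty, so str(err) returns an empty string and tracebacks display CustomError with no message; a positional call renders the raw arguments instead of a readable message.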
Fix: Always call super().__init__(message) as the first or last line in your custom __init__. Pass a formatted string if you need dynamic content.
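The effect of skipping the parent constructor can be checked directly. The sketch below reflects CPython behavior, where args is filled from the positional constructor arguments; NoSuperError and WithSuperError are hypothetical names.

```python
class NoSuperError(Exception):
    def __init__(self, transaction_id: str, decline_code: str):
        self.transaction_id = transaction_id
        self.decline_code = decline_code
        # super().__init__() is deliberately omitted to show the pitfall.


class WithSuperError(Exception):
    def __init__(self, transaction_id: str, decline_code: str):
        self.transaction_id = transaction_id
        self.decline_code = decline_code
        super().__init__(
            f"Transaction {transaction_id} declined (code {decline_code})"
        )


keyword_call = NoSuperError(transaction_id="tx_1", decline_code="51")
positional_call = NoSuperError("tx_1", "51")
fixed = WithSuperError(transaction_id="tx_1", decline_code="51")

print(repr(str(keyword_call)))     # empty string: no message at all
print(repr(str(positional_call)))  # raw argument tuple, not a sentence
print(repr(str(fixed)))            # the formatted message
```

Either way the attributes are set, so the bug tends to surface only when someone reads a log line or traceback.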
4. Over-Catching the Base Class Internally
Explanation: Catching PaymentDomainError inside service methods to suppress errors or return default values defeats the purpose of a hierarchy. It masks specific failure modes and prevents proper retry or escalation logic.
Fix: Catch specific subclasses at the point where handling logic exists. Let the base class propagate to the outer error boundary where broad fallback or monitoring occurs.
5. Using Exceptions for Expected Control Flow
Explanation: Raising custom exceptions for predictable business logic (e.g., UserAlreadyExistsError during registration) incurs stack trace generation overhead and obscures code paths. Exceptions should represent exceptional conditions, not branching logic.
Fix: Use explicit return types, Result/Either patterns, or boolean flags for expected outcomes. Reserve exceptions for genuine failures, invalid states, or infrastructure faults.
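One way to express an expected outcome as a return value is a small result object. This is only a sketch: RegistrationResult and register() are hypothetical, and an in-memory set stands in for the user store.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class RegistrationResult:
    ok: bool
    reason: Optional[str] = None  # machine-readable reason when ok is False


def register(email: str, existing: Set[str]) -> RegistrationResult:
    # A duplicate email is an expected outcome, so it is returned, not raised.
    if email in existing:
        return RegistrationResult(ok=False, reason="already_exists")
    existing.add(email)
    return RegistrationResult(ok=True)


users = {"a@example.com"}
first = register("b@example.com", users)
second = register("b@example.com", users)
print(first, second)
```

Callers branch on result.ok, and exceptions remain available for the cases where the user store itself is unreachable.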
6. Forgetting Exception Chaining During Translation
Explanation: Catching a low-level error and raising a new one without from original severs the diagnostic chain. Operators see only the translated error, making root cause analysis dependent on external logs.
Fix: Always use raise NewError(...) from original_error. If you intentionally want to hide implementation details, use raise NewError(...) from None, but document why.
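The two forms leave different traces on the raised exception, which the following sketch inspects. ConfigError, SETTINGS, and the helper functions are hypothetical and exist only for this illustration.

```python
from typing import Callable, Optional


class ConfigError(Exception):
    """Hypothetical domain error used only for this illustration."""


SETTINGS = {"currency": "EUR"}


def get_setting_chained(key: str) -> str:
    try:
        return SETTINGS[key]
    except KeyError as exc:
        # Preserve the root cause for operators.
        raise ConfigError(f"Unknown setting {key!r}") from exc


def get_setting_hidden(key: str) -> str:
    try:
        return SETTINGS[key]
    except KeyError:
        # Deliberately hide the storage detail (a dict lookup) from callers.
        raise ConfigError(f"Unknown setting {key!r}") from None


def capture(func: Callable[[str], str], key: str) -> Optional[ConfigError]:
    try:
        func(key)
    except ConfigError as err:
        return err
    return None


chained = capture(get_setting_chained, "locale")
hidden = capture(get_setting_hidden, "locale")

print(repr(chained.__cause__))  # the original KeyError
print(repr(hidden.__cause__))   # None: the cause is suppressed in tracebacks
```

With from None, the original error is still recorded on __context__, but the traceback printer no longer shows it.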
7. Ignoring Serialization Requirements
Explanation: Exceptions are Python objects. When building REST or GraphQL APIs, you cannot serialize an exception instance directly to JSON. Teams often resort to parsing str(err) or manually mapping attributes.
Fix: Add a to_dict() or model_dump() method to your base exception. Ensure all custom attributes are JSON-serializable primitives.
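A possible shape for such a method is sketched below. The to_dict() implementation shown here is an assumption rather than the guide's own base class: it copies instance attributes via vars(), and InsufficientFundsError inherits directly from the base for brevity.

```python
import json
from typing import Any, Dict


class PaymentDomainError(Exception):
    def to_dict(self) -> Dict[str, Any]:
        # Assumed design: exception type, message, then every instance attribute.
        payload: Dict[str, Any] = {
            "type": type(self).__name__,
            "message": str(self),
        }
        payload.update(vars(self))
        return payload


class InsufficientFundsError(PaymentDomainError):
    def __init__(self, transaction_id: str, required_amount: float,
                 available_balance: float):
        self.transaction_id = transaction_id
        self.required_amount = required_amount
        self.available_balance = available_balance
        super().__init__(
            f"Insufficient funds: required {required_amount}, "
            f"available {available_balance}"
        )


err = InsufficientFundsError("tx_9", 50.0, 12.5)
body = json.dumps(err.to_dict(), sort_keys=True)
print(body)
```

json.dumps() raises TypeError for attributes it cannot encode (for example a Decimal or datetime), which is why the fix above asks for JSON-serializable primitives.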
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Single microservice with <10 failure modes | Flat custom exceptions | Simpler to maintain, no hierarchy overhead needed | Low |
| Multi-service architecture with shared error contracts | Hierarchical exceptions with base class | Enables cross-service error routing and consistent API responses | Medium (initial design time) |
| High-frequency transaction processing | Structured attributes + to_dict() | Enables structured logging, metrics tagging, and dead-letter queue routing | Low (negligible CPU overhead) |
| External API error responses | Custom __str__ + serialization method | Guarantees consistent JSON error payloads without parsing tracebacks | Low |
| Library/Framework development | Minimal custom exceptions, prefer built-ins | Reduces coupling, allows consumers to define their own hierarchy | Low |
Configuration Template
Copy this template into your project's exceptions.py or errors.py module. It provides a production-ready base with logging hooks, serialization, and consistent formatting.
import logging
from typing import Any, Dict, Optional

logger = logging.getLogger(__name__)


class DomainError(Exception):
    """Base exception for domain-specific failures.

    Provides structured context, consistent string representation,
    and JSON serialization for API responses and structured logging.
    """

    def __init__(
        self,
        message: str,
        error_code: str = "DOMAIN_ERROR",
        context: Optional[Dict[str, Any]] = None
    ):
        self.error_code = error_code
        self.context = context or {}
        super().__init__(message)

    def __str__(self) -> str:
        ctx_str = f" | context: {self.context}" if self.context else ""
        return f"[{self.error_code}] {super().__str__()}{ctx_str}"

    def to_dict(self) -> Dict[str, Any]:
        """Serialize exception for API responses or structured logs."""
        return {
            "error_code": self.error_code,
            "message": str(self),
            "context": self.context
        }

    def log_error(self, level: int = logging.ERROR) -> None:
        """Emit structured log entry with exception context."""
        logger.log(level, str(self), extra={"error_code": self.error_code, "context": self.context})


class RetryableDomainError(DomainError):
    """Transient failures suitable for automatic retry."""

    def __init__(self, message: str, retry_after_seconds: int = 30, **kwargs):
        self.retry_after_seconds = retry_after_seconds
        kwargs.setdefault("error_code", "RETRYABLE_ERROR")
        super().__init__(message, **kwargs)


class PermanentDomainError(DomainError):
    """Failures requiring external intervention or manual review."""

    def __init__(self, message: str, **kwargs):
        kwargs.setdefault("error_code", "PERMANENT_ERROR")
        super().__init__(message, **kwargs)
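To show how a subclass might plug into the template, the sketch below repeats a condensed copy of DomainError and RetryableDomainError (log_error omitted) so that it runs on its own. GatewayTimeoutError and its parameters are hypothetical.

```python
from typing import Any, Dict, Optional


class DomainError(Exception):
    """Condensed copy of the template's base class."""

    def __init__(self, message: str, error_code: str = "DOMAIN_ERROR",
                 context: Optional[Dict[str, Any]] = None):
        self.error_code = error_code
        self.context = context or {}
        super().__init__(message)

    def __str__(self) -> str:
        ctx_str = f" | context: {self.context}" if self.context else ""
        return f"[{self.error_code}] {super().__str__()}{ctx_str}"

    def to_dict(self) -> Dict[str, Any]:
        return {
            "error_code": self.error_code,
            "message": str(self),
            "context": self.context,
        }


class RetryableDomainError(DomainError):
    def __init__(self, message: str, retry_after_seconds: int = 30, **kwargs):
        self.retry_after_seconds = retry_after_seconds
        kwargs.setdefault("error_code", "RETRYABLE_ERROR")
        super().__init__(message, **kwargs)


class GatewayTimeoutError(RetryableDomainError):
    """Hypothetical domain subclass built on the template."""

    def __init__(self, gateway: str, timeout_seconds: int):
        super().__init__(
            f"{gateway} did not respond within {timeout_seconds}s",
            retry_after_seconds=60,
            error_code="GATEWAY_TIMEOUT",  # overrides the RETRYABLE_ERROR default
            context={"gateway": gateway, "timeout_seconds": timeout_seconds},
        )


err = GatewayTimeoutError("acme", 5)
print(str(err))
print(err.to_dict())
```

Note that, as the template is written, to_dict() uses str(self) for the message field, so the serialized message includes the error code prefix and the context suffix.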
Quick Start Guide
- Create the base module: Add exceptions.py to your project root or core package. Paste the configuration template above.
- Define domain-specific subclasses: Create new classes inheriting from RetryableDomainError or PermanentDomainError. Add domain-specific attributes in __init__ and call super().__init__(message, context={"key": value}).
- Translate at service boundaries: In your infrastructure layer (HTTP clients, database drivers, message queues), catch low-level exceptions and raise domain exceptions using raise DomainError(...) from original_exc.
- Handle at the application boundary: In your entry points (FastAPI/Flask routes, Celery tasks, CLI commands), catch specific subclasses for routing logic. Catch the base DomainError for fallback handling and structured logging.
- Verify with tests: Use pytest.raises to assert exception types, check exc_info.value.context for metadata, and verify that str(err) contains expected fields. Mock infrastructure dependencies to trigger specific error paths.
Exception hierarchies are not about preventing crashes. They are about encoding failure semantics into the type system so that routing, logging, and recovery logic can operate deterministically. When designed correctly, they transform error handling from a defensive afterthought into a core architectural capability.