or processed twice. chunkById() mitigates this by using the primary key for pagination.
use Illuminate\Support\Collection;
use App\Models\Transaction;
public function reconcilePendingTransactions(): void
{
Transaction::query()
->where('status', 'pending')
->orderBy('id')
->chunkById(500, function (Collection $batch) {
foreach ($batch as $transaction) {
$transaction->reconcile();
$transaction->save();
}
});
}
Architecture Rationale:
- Batch Size: 500 is a conservative default. Adjust based on row size and available memory.
- Ordering: Explicit
orderBy('id') ensures deterministic pagination.
- Safety:
chunkById uses WHERE id > ? logic, making it immune to row modifications within the batch.
2. Streaming Exports with cursor()
For read-only operations like CSV generation, cursor() provides the lowest memory footprint by utilizing PHP generators. It executes a single query and yields models one by one.
use App\Models\Transaction;
public function exportToCsv(string $filePath): void
{
$handle = fopen($filePath, 'w');
// Write header
fputcsv($handle, ['ID', 'Amount', 'Currency', 'Date']);
foreach (Transaction::query()
->select(['id', 'amount', 'currency', 'created_at'])
->orderBy('id')
->cursor() as $transaction) {
fputcsv($handle, [
$transaction->id,
$transaction->amount,
$transaction->currency,
$transaction->created_at->toIso8601String()
]);
}
fclose($handle);
}
Architecture Rationale:
- Select Clause: Limiting columns reduces payload size per row.
- Generator:
cursor() returns a Generator, preventing the accumulation of models in memory.
- Constraint: Eager loading is unavailable. If relationships are needed,
lazy() must be used.
3. Balanced Processing with lazy()
lazy() offers a hybrid approach. It chunks results internally but yields items individually, allowing the use of with() for relationships while maintaining low memory usage.
use App\Models\Customer;
public function generateMonthlyStatements(): void
{
foreach (Customer::with(['orders' => function ($query) {
$query->where('status', 'completed')
->orderBy('created_at');
}])
->lazy(200) as $customer) {
// $customer has 'orders' loaded
// Memory remains low as chunks are processed sequentially
$this->statementGenerator->createFor($customer);
}
}
Architecture Rationale:
- Chunk Size: The argument to
lazy() defines the internal batch size.
- Relationship Handling: Relations are loaded per chunk. This is efficient but requires caution: if a single customer has thousands of orders,
with('orders') can still spike memory. In such cases, nested chunking or cursor-based relation processing is required.
- Syntax: Returns an iterable, enabling
foreach loops without callbacks.
Pitfall Guide
1. The Chunk Skip Trap
Explanation: Using chunk() while modifying or deleting rows can cause the database result set to shift. If a row is deleted, subsequent rows may move into the current chunk's range, causing them to be skipped in the next iteration.
Fix: Always use chunkById() for mutations. It relies on the primary key index, which is stable regardless of row content changes.
2. Cursor Connection Timeouts
Explanation: cursor() keeps the database connection open for the duration of the iteration. If the processing logic per row is slow, the connection may time out, or the database server may kill the idle connection.
Fix: Ensure row processing is lightweight. For heavy processing, switch to lazy() which opens and closes connections per chunk, or implement connection ping/reconnect logic.
3. N+1 Amplification in Chunks
Explanation: Forgetting to eager load relationships inside a chunk or lazy loop results in an N+1 query problem. Since chunks process many rows, this multiplies the query count significantly, degrading performance.
Fix: Always chain with() when relationships are accessed. Verify query counts using DB::listen() during development.
4. Missing Ordering Index
Explanation: Chunking methods rely on ordering columns to paginate. If the ordering column lacks an index, the database performs full table scans for each chunk query, causing severe performance degradation.
Fix: Ensure the column used in orderBy() (typically id) is indexed. For custom ordering columns, add a composite index if necessary.
5. Lazy Relation Explosion
Explanation: lazy() chunks the parent model, but eager-loaded relations are fetched entirely for each chunk. If a parent has a massive number of related records, memory usage can still spike.
Fix: For parents with large relation sets, avoid with() on the relation. Instead, process relations separately using their own chunking strategy or use cursor() on the relation table joined to the parent.
6. Callback vs. Iterable Confusion
Explanation: chunk() and chunkById() require a callback function, while lazy() and cursor() return iterables. Mixing these patterns leads to syntax errors or logic bugs.
Fix: Adopt a consistent mental model: Callbacks for chunk*, Iterables for lazy/cursor. Use linters to enforce usage patterns.
7. Soft Delete Interference
Explanation: Chunking queries may include soft-deleted rows if global scopes are not respected, or behavior may become unpredictable if rows are restored during processing.
Fix: Explicitly filter soft deletes using whereNull('deleted_at') or ensure global scopes are active. Test chunking behavior with soft-deleted data present.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Bulk Update/Delete Records | chunkById() | Safe against mutations; uses stable ID pagination. | Low |
| CSV/Excel Export (Read-Only) | cursor() | Minimal memory; single query; highest throughput. | Lowest |
| Processing with Relationships | lazy() | Supports with(); low memory; clean syntax. | Low |
| Email Campaign Batches | lazy() | Efficient memory; allows relation loading for personalization. | Low |
| Tiny Dataset (<1k rows) | all() | Simplicity; overhead of chunking not justified. | Negligible |
Configuration Template
A robust batch processor class template for production use.
namespace App\Jobs;
use Illuminate\Support\Collection;
use Illuminate\Support\Facades\DB;
use App\Models\Transaction;
class ProcessTransactionsJob
{
public function handle(): void
{
$this->processInBatches();
}
private function processInBatches(): void
{
Transaction::query()
->where('status', 'pending')
->orderBy('id')
->chunkById(500, function (Collection $batch) {
// Wrap batch in transaction for atomicity
DB::transaction(function () use ($batch) {
foreach ($batch as $transaction) {
try {
$this->processSingle($transaction);
} catch (\Exception $e) {
// Log error and continue or fail fast based on requirements
report($e);
}
}
});
});
}
private function processSingle(Transaction $transaction): void
{
// Business logic here
$transaction->status = 'processed';
$transaction->save();
}
}
Quick Start Guide
- Identify the Bottleneck: Locate the query causing memory issues. If it uses
all() or get(), proceed to step 2.
- Determine Requirements: Do you need to modify rows? Do you need relationships?
- Modify rows? → Use
chunkById().
- Read-only with relationships? → Use
lazy().
- Read-only, no relationships, max speed? → Use
cursor().
- Refactor the Query: Replace the retrieval method with the chosen iterator. Ensure
orderBy('id') is present for chunking methods.
- Validate: Run the code against a staging database with a representative data volume. Check memory usage and query logs.
- Deploy: Merge the changes and monitor the first production run for stability.