Runtime flags must be set before the CLR initializes to affect tiered compilation, dynamic profile-guided optimization (PGO), and GC behavior.
```xml
<!-- Directory.Build.props: MSBuild imports it automatically for every project in this directory tree -->
<Project>
  <PropertyGroup>
    <TargetFramework>net9.0</TargetFramework>
    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
  </PropertyGroup>
</Project>
```
Runtime configuration is applied through environment variables or through the app's runtimeconfig.json. For server workloads, enable Server GC and dynamic PGO, and (as a deliberate trade-off explained below) disable ReadyToRun code:
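```shell
export DOTNET_gcServer=1          # Server GC: one heap and one GC thread per logical processor
export DOTNET_TieredPGO=1         # dynamic PGO (already the default since .NET 8)
export DOTNET_ReadyToRun=0        # ignore precompiled ReadyToRun code; every method is JIT-compiled
export DOTNET_GCHeapHardLimit=0   # 0 = no explicit hard limit; the runtime default applies
```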
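Architecture rationale: Server GC (DOTNET_gcServer=1) creates a separate heap and a dedicated GC thread for each logical processor, so collections run in parallel across cores; it trades a larger memory footprint for higher throughput. Dynamic PGO (DOTNET_TieredPGO=1) instruments methods at the lower tiers, then uses the collected profile when recompiling hot paths at tier 1, improving inlining, devirtualization, and code layout. It has been enabled by default since .NET 8, so setting it explicitly mainly documents intent. Disabling ReadyToRun (DOTNET_ReadyToRun=0) is a deliberate trade-off for long-running server workloads: precompiled code, including the framework's own, is ignored, so every method goes through the JIT and can be profiled from the start, at the cost of noticeably slower startup. Because hot ReadyToRun methods are also recompiled by tiered compilation, benchmark this setting against your own traffic rather than assuming JIT+PGO always outperforms precompiled code.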
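To check that these flags actually reached the process, a small startup probe can log what the runtime is really doing. A minimal sketch, assuming you want the check in application code: RuntimeReport and Describe are hypothetical names, while GCSettings.IsServerGC, GCSettings.LatencyMode, and GC.GetGCMemoryInfo() are standard .NET APIs.

```csharp
using System;
using System.Runtime;

// Hypothetical helper: summarizes the effective GC configuration so a
// misconfigured deployment (for example, workstation GC in production) shows up in logs.
public static class RuntimeReport
{
    public static string Describe()
    {
        // True only if the process is actually running with Server GC.
        bool serverGc = GCSettings.IsServerGC;

        // The memory budget the GC works with (a configured hard limit or container limit, if any).
        long availableBytes = GC.GetGCMemoryInfo().TotalAvailableMemoryBytes;

        // Environment variables are plain strings; null means "not set".
        string tieredPgo = Environment.GetEnvironmentVariable("DOTNET_TieredPGO") ?? "(not set)";

        return $"ServerGC={serverGc}; LatencyMode={GCSettings.LatencyMode}; " +
               $"GCAvailableMemoryMB={availableBytes / (1024 * 1024)}; " +
               $"DOTNET_TieredPGO={tieredPgo}; ProcessorCount={Environment.ProcessorCount}";
    }
}
```

Calling Console.WriteLine(RuntimeReport.Describe()) once at startup makes a missing or overridden flag visible; logging ProcessorCount alongside it helps correlate the GC mode with container CPU limits.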
Phase 2: JSON Serialization Optimization
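System.Text.Json's source generator (available since .NET 6 and refined in later releases) produces serialization metadata at compile time instead of discovering it through runtime reflection. Migrate hot serialization paths to source generation, and reuse one JsonSerializerOptions instance instead of creating one per call.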
```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Old: runtime reflection, with a new options instance (and metadata cache) per call
var options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };
var json = JsonSerializer.Serialize(dto, options);

// New: serialization metadata generated at build time
[JsonSerializable(typeof(ApiResponse))]
[JsonSourceGenerationOptions(PropertyNamingPolicy = JsonKnownNamingPolicy.CamelCase)]
public partial class AppJsonContext : JsonSerializerContext { }

// Usage (dto is an ApiResponse instance)
var json = JsonSerializer.Serialize(dto, AppJsonContext.Default.ApiResponse);
```
Architecture rationale: Source generators eliminate runtime reflection, reduce initial serialization latency by 40-60%, and cut temporary allocations. Reusing a single JsonSerializerOptions instance, rather than constructing one per request, keeps its metadata cache warm. For high-throughput APIs, this change alone accounts for 15-20% of the total throughput gain.
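The two ideas combine naturally: one static JsonSerializerOptions instance whose TypeInfoResolver is the generated context, so neither options nor metadata are rebuilt per request, and APIs that only accept a JsonSerializerOptions still receive source-generated metadata. This is a sketch under stated assumptions: the ApiResponse shape and the JsonConfig class are illustrative, while TypeInfoResolver and JsonSerializerContext are standard System.Text.Json APIs (.NET 7 and later).

```csharp
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical DTO used only for this example.
public sealed class ApiResponse
{
    public int Id { get; set; }
    public string Message { get; set; } = "";
}

// Metadata for ApiResponse is generated at build time, not discovered via reflection.
[JsonSourceGenerationOptions(PropertyNamingPolicy = JsonKnownNamingPolicy.CamelCase)]
[JsonSerializable(typeof(ApiResponse))]
public partial class AppJsonContext : JsonSerializerContext { }

public static class JsonConfig
{
    // Created once and shared; a new instance per call would rebuild its metadata cache.
    public static readonly JsonSerializerOptions Options = new()
    {
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
        TypeInfoResolver = AppJsonContext.Default, // resolve metadata from the generated context
    };

    public static string Serialize(ApiResponse value) =>
        JsonSerializer.Serialize(value, Options);

    public static ApiResponse? Deserialize(string json) =>
        JsonSerializer.Deserialize<ApiResponse>(json, Options);
}
```

Serializing new ApiResponse { Id = 1, Message = "ok" } through JsonConfig produces {"id":1,"message":"ok"}.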
Phase 3: Memory & Span-Aware Processing
The .NET 9 BCL offers span-based overloads for parsing, formatting, and text processing (including a new MemoryExtensions.Split overload for spans), along with ArrayPool<T> for buffer reuse. Replace string.Substring, IEnumerable.ToList(), and per-call byte[] allocations with span-based or pooled equivalents.
```csharp
// Old: one string per line, plus the string[] that holds them
var lines = File.ReadAllText(path).Split('\n');

// New: span-based splitting (.NET 9); only the file contents are allocated
ReadOnlySpan<char> content = File.ReadAllText(path);
foreach (Range range in content.Split('\n'))
{
    ProcessLine(content[range]); // the slice is a ReadOnlySpan<char>; no substring is allocated
}
```
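The same approach replaces string.Substring: slicing a ReadOnlySpan<char> yields a view over existing characters rather than a new string, and BCL parsers such as int.TryParse accept spans directly. A minimal sketch, assuming a hypothetical "key=value" settings format; KeyValueParser and TryParseIntSetting are illustrative names.

```csharp
using System;

public static class KeyValueParser
{
    // Parses lines such as "timeout = 30" without allocating substrings.
    // Returns false if '=' is missing or the value is not an integer.
    public static bool TryParseIntSetting(ReadOnlySpan<char> line, out ReadOnlySpan<char> key, out int value)
    {
        int separator = line.IndexOf('=');
        if (separator < 0)
        {
            key = default;
            value = 0;
            return false;
        }

        key = line[..separator].Trim();                                // a view, not a copy
        return int.TryParse(line[(separator + 1)..].Trim(), out value); // span overload of TryParse
    }
}
```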
For numeric workloads, use hardware intrinsics from System.Runtime.Intrinsics (available since .NET Core 3.0), always paired with a scalar fallback:
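```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Pointer-based loads require <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the project file.
public static class VectorMath
{
    public static unsafe void ProcessFloats(float[] input, float[] output)
    {
        if (output.Length < input.Length)
            throw new ArgumentException("output must be at least as long as input", nameof(output));

        int i = 0;
        if (Avx.IsSupported)
        {
            int vecSize = Vector256<float>.Count; // 8 floats per 256-bit register
            Vector256<float> factor = Vector256.Create(2.0f);
            fixed (float* pIn = input, pOut = output) // pin the arrays so the GC cannot move them
            {
                for (; i <= input.Length - vecSize; i += vecSize)
                {
                    Vector256<float> vec = Avx.LoadVector256(pIn + i);
                    Avx.Store(pOut + i, Avx.Multiply(vec, factor));
                }
            }
        }

        // Scalar path: handles the leftover tail, or the whole array when AVX is unavailable
        for (; i < input.Length; i++)
        {
            output[i] = input[i] * 2.0f;
        }
    }
}
```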
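Architecture rationale: Span-based APIs eliminate intermediate strings and collections, which reduces Gen 0 pressure and therefore GC frequency. Explicit intrinsics guarantee SIMD execution on CPUs that support the instruction set; the JIT does not automatically vectorize ordinary loops, so without intrinsics (or the cross-platform Vector128/Vector256 APIs) this code would process one element at a time. Working through contiguous buffers in wide chunks also improves cache locality.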
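For the byte[] allocations mentioned at the start of this phase, ArrayPool<T>.Shared lets a hot path rent and return a buffer instead of allocating a new array on every call, which serves the same goal of lower Gen 0 pressure. A sketch, assuming a hypothetical byte-summing workload over a Stream (PooledReader and SumBytes are illustrative names). Rent can return an array larger than requested, so only the bytes actually read are processed.

```csharp
using System;
using System.Buffers;
using System.IO;

public static class PooledReader
{
    // Sums every byte in a stream using a pooled buffer instead of a fresh byte[] per call.
    public static long SumBytes(Stream stream, int bufferSize = 81920)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(bufferSize); // may be larger than bufferSize
        try
        {
            long sum = 0;
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Only the first 'read' bytes are valid; the rest may hold stale data.
                foreach (byte b in buffer.AsSpan(0, read))
                    sum += b;
            }
            return sum;
        }
        finally
        {
            // Always return the buffer, even if the read throws.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```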
Phase 4: Validation & Benchmarking
Never assume gains. Validate with controlled benchmarks before and after migration.
```shell
dotnet new console -n PerfValidation -f net9.0
cd PerfValidation
dotnet add package BenchmarkDotNet
```
Implement baseline and optimized benchmarks. Compare throughput, memory allocation, and GC counts. Iterate on runtime flags until metrics stabilize.
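A minimal baseline-versus-optimized pair for the line-splitting change from Phase 3 might look like this. It assumes the BenchmarkDotNet package added above and a net9.0 target (the span Split overload is new in .NET 9); LineSplitBenchmarks and the generated 10,000-line input are illustrative. [MemoryDiagnoser] adds allocated bytes and GC counts to the results table.

```csharp
using System;
using System.Linq;
using BenchmarkDotNet.Attributes;

// Entry point (Program.cs): BenchmarkDotNet.Running.BenchmarkRunner.Run<LineSplitBenchmarks>();
[MemoryDiagnoser] // reports allocated bytes and GC collections per operation
public class LineSplitBenchmarks
{
    private string _content = "";

    [GlobalSetup]
    public void Setup() =>
        _content = string.Join('\n', Enumerable.Range(0, 10_000).Select(i => $"line {i}"));

    [Benchmark(Baseline = true)]
    public int StringSplit()
    {
        int total = 0;
        foreach (string line in _content.Split('\n')) // allocates a string per line plus the array
            total += line.Length;
        return total;
    }

    [Benchmark]
    public int SpanSplit()
    {
        ReadOnlySpan<char> content = _content;
        int total = 0;
        foreach (Range range in content.Split('\n')) // yields ranges; slicing allocates nothing
            total += content[range].Length;
        return total;
    }
}
```

Both methods return the same total, which guards against the optimized path silently doing less work. Run the project with dotnet run -c Release, since BenchmarkDotNet expects an optimized build.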
Pitfall Guide
- Upgrading without profiling: Teams assume framework upgrades deliver automatic gains. Without baseline profiling, you cannot isolate which optimizations actually affect your workload. Always capture pre-migration metrics using dotnet-counters, dotnet-trace, or BenchmarkDotNet.
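- Ignoring GC mode selection: Workstation GC (DOTNET_gcServer=0) is tuned for client apps and small memory footprints rather than server throughput, and it is the default for plain console apps (ASP.NET Core apps default to Server GC). Leaving it in place for a throughput-oriented service in a Linux container means more frequent collections competing with request threads. Server GC performs collections in parallel on dedicated per-core GC threads over larger per-core heaps, which can reduce latency spikes by 40-60% at the cost of higher memory use.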
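- Misusing Span<T> and Memory<T>: Span<T> is a ref struct that lives only on the stack, so it cannot be stored in class fields, captured by lambdas, or kept alive across an await. The compiler rejects such code (since C# 13, span locals are allowed in async methods only if they are not used across an await). Use Memory<T> for buffers that must cross async boundaries, and take .Span only in synchronous code between awaits.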
- Skipping JSON source generator migration: Reflection-based System.Text.Json builds per-type metadata at runtime on first use and caches it on the JsonSerializerOptions instance, so code that creates new options per call pays that cost repeatedly. Source generators produce serialization logic at build time, eliminating reflection overhead and improving cold-start latency.
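- Disabling dynamic PGO in production: Dynamic PGO collects execution profiles during warmup and uses them when recompiling hot methods. Disabling it (DOTNET_TieredPGO=0) does not stop tiering; hot methods are still recompiled at tier 1, but without profile data, which forgoes optimizations such as guarded devirtualization and profile-driven block layout and leads to more instruction cache misses and branch mispredictions. Keep it enabled (the default since .NET 8) unless you run in constrained environments with strict memory limits.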
- Premature SIMD optimization: Hardware intrinsics require CPU feature detection and fallback paths. Calling Avx or Sse methods on a CPU that lacks the instruction set throws PlatformNotSupportedException. Guard every intrinsic path with the matching IsSupported check (for example, Avx.IsSupported) or Vector256.IsHardwareAccelerated, and always provide a scalar fallback.
- Treating Native AOT as a drop-in replacement: Native AOT removes the JIT (the app still runs with a GC and gets no dynamic PGO), and it restricts unbounded reflection, runtime code generation such as Reflection.Emit, dynamic assembly loading, and some third-party libraries. It's ideal for stateless microservices with static dependency graphs. For apps relying on runtime code generation or plugin architectures, JIT+PGO remains the correct choice.
Production best practices:
- Enable runtime flags via deployment manifests, not code.
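- Use dotnet-gcdump to see which object types occupy the managed heap, and watch the gc-fragmentation counter in dotnet-counters to track heap fragmentation.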
- Validate allocation hotspots with dotnet-trace (for example, --profile gc-verbose) or BenchmarkDotNet's MemoryDiagnoser before optimizing.
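- Dynamic PGO profiles live only in the running process and are lost on restart, so warm up new instances with simulated traffic during deployments before routing production load to them.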
- Isolate performance changes in feature branches; measure delta before merging.
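For a quick in-process sanity check alongside the profiling tools, the runtime exposes per-thread allocation counters and GC collection counts. The sketch below is a hypothetical helper (AllocationProbe is not a library type) that reports how many bytes a delegate allocated on the calling thread and how many Gen 0 collections occurred meanwhile; it complements, rather than replaces, dotnet-trace or BenchmarkDotNet.

```csharp
using System;

// Hypothetical helper (not a library type): measures what one call allocates on the
// current thread and how many Gen 0 collections happened while it ran.
public static class AllocationProbe
{
    public static (T Result, long AllocatedBytes, int Gen0Collections) Measure<T>(Func<T> work)
    {
        int gen0Before = GC.CollectionCount(0);                    // process-wide counter
        long bytesBefore = GC.GetAllocatedBytesForCurrentThread(); // this thread only

        T result = work();

        long allocated = GC.GetAllocatedBytesForCurrentThread() - bytesBefore;
        int gen0 = GC.CollectionCount(0) - gen0Before;
        return (result, allocated, gen0);
    }
}
```

Comparing AllocatedBytes for the old and new code paths gives a rough delta to record in the feature branch before merging. Allocations made on other threads (for example, inside work scheduled on the thread pool) are not counted.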
Production Bundle
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-throughput JSON API | JIT + Server GC + Source Gen JSON | Maximizes concurrency, minimizes allocation overhead | Reduces compute instances by 15-25% |
| Latency-sensitive microservice | Native AOT + trimmed runtime | Removes JIT warmup, reduces startup to <50ms | Increases binary size, reduces cold-start cost |
| Data processing pipeline | SIMD intrinsics + ArrayPool | Leverages CPU vectorization, reuses buffers | Lowers memory footprint, improves throughput |
| Legacy monolith migration | Incremental BCL updates + PGO | Minimizes risk, captures 60-70% gains safely | Near-zero infra cost, reduces refactoring debt |
| Containerized serverless | ReadyToRun + server GC | Balances startup speed and sustained performance | Optimizes cold-start billing, maintains throughput |
Configuration Template
runtimeconfig.template.json (in the project directory; the SDK merges its configProperties into the generated [appname].runtimeconfig.json at build time):
```json
{
  "configProperties": {
    "System.GC.Server": true,
    "System.GC.Concurrent": true,
    "System.GC.RetainVM": false,
    "System.Threading.ThreadPool.MinThreads": 16,
    "System.Threading.ThreadPool.MaxThreads": 256,
    "System.Globalization.Invariant": false
  }
}
```
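```dockerfile
# Dockerfile runtime configuration
ENV DOTNET_gcServer=1
ENV DOTNET_TieredPGO=1
# 0 means no explicit GC heap hard limit; the runtime default applies
ENV DOTNET_GCHeapHardLimit=0
ENV DOTNET_ReadyToRun=0
# W^X is already enabled by default since .NET 7; setting it documents intent
ENV DOTNET_EnableWriteXorExecute=1
```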
```xml
<!-- Directory.Build.props for consistent optimization flags -->
<Project>
  <PropertyGroup Condition="'$(Configuration)' == 'Release'">
    <PublishTrimmed>false</PublishTrimmed>
    <PublishAot>false</PublishAot>
    <TieredCompilation>true</TieredCompilation>
    <TieredPGO>true</TieredPGO>
  </PropertyGroup>
</Project>
```
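To confirm that values from the runtimeconfig template reached the running process, thread pool limits and configProperties entries can be read back at startup. A sketch with hypothetical names (ConfigCheck, Describe); ThreadPool.GetMinThreads, ThreadPool.GetMaxThreads, and AppContext.GetData are standard APIs, and configProperties entries are exposed through AppContext.GetData.

```csharp
using System;
using System.Threading;

// Hypothetical startup check: reads back thread pool limits and one GC switch
// so you can confirm the runtimeconfig values were applied to this process.
public static class ConfigCheck
{
    public static string Describe()
    {
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);

        // configProperties entries surface through AppContext; null means the key was not set.
        object? concurrent = AppContext.GetData("System.GC.Concurrent");

        return $"MinThreads={minWorker}/{minIo}; MaxThreads={maxWorker}/{maxIo}; " +
               $"System.GC.Concurrent={concurrent ?? "(not set)"}";
    }
}
```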
Quick Start Guide
- Update your project file: <TargetFramework>net9.0</TargetFramework>, then run dotnet restore.
- Set runtime flags in your deployment environment: DOTNET_gcServer=1, DOTNET_TieredPGO=1.
- Replace reflection-based JSON serialization with [JsonSerializable] source generators.
- Run dotnet run -c Release and validate throughput with dotnet-counters monitor --process-id <pid>.
- Commit runtime configuration to infrastructure-as-code and monitor GC pause metrics in production.