e=shortflag normalizes timestamps toYYYY-MM-DD, simplifying daily aggregation. The --name-only` output isolates file paths without diff noise, enabling accurate file-type distribution analysis.
Step 2: Aggregation Engine (TypeScript Implementation)
The following module demonstrates a production-ready aggregation pipeline. It replaces ad-hoc parsing with a structured, type-safe approach that handles large repositories efficiently.
import { execSync } from 'child_process';
import { createInterface } from 'readline';
import { Readable } from 'stream';
interface CommitRecord {
author: string;
date: string;
files: string[];
}
interface AggregatedMetrics {
totalCommits: number;
contributors: Map<string, number>;
activeDays: Set<string>;
fileTypes: Map<string, number>;
dailyActivity: Map<string, number>;
}
export class RepoTelemetryEngine {
private metrics: AggregatedMetrics;
constructor() {
this.metrics = {
totalCommits: 0,
contributors: new Map(),
activeDays: new Set(),
fileTypes: new Map(),
dailyActivity: new Map(),
};
}
async collect(): Promise<void> {
const rawLog = execSync(
'git log --format="%an|%ad" --date=short --all --no-merges',
{ encoding: 'utf-8' }
);
const lines = rawLog.trim().split('\n');
this.metrics.totalCommits = lines.length;
for (const line of lines) {
const [author, date] = line.split('|');
this.metrics.contributors.set(author, (this.metrics.contributors.get(author) || 0) + 1);
this.metrics.activeDays.add(date);
this.metrics.dailyActivity.set(date, (this.metrics.dailyActivity.get(date) || 0) + 1);
}
await this.collectFileTypes();
}
private async collectFileTypes(): Promise<void> {
const rawFiles = execSync(
'git log --format="" --name-only --all --no-merges',
{ encoding: 'utf-8' }
);
const extensions = rawFiles
.trim()
.split('\n')
.map(f => f.split('.').pop() || 'unknown')
.filter(ext => ext !== 'unknown' && !ext.includes('/'));
for (const ext of extensions) {
this.metrics.fileTypes.set(ext, (this.metrics.fileTypes.get(ext) || 0) + 1);
}
}
render(): string {
const maxCommits = Math.max(...this.metrics.contributors.values());
const maxDaily = Math.max(...this.metrics.dailyActivity.values());
const maxFiles = Math.max(...this.metrics.fileTypes.values());
const bar = (value: number, max: number, width: number = 20) => {
const filled = Math.round((value / max) * width);
return 'β'.repeat(filled) + 'β'.repeat(width - filled);
};
let output = `\nπ REPO TELEMETRY\n`;
output += `π Total commits: ${this.metrics.totalCommits}\n`;
output += `π€ Contributors: ${this.metrics.contributors.size}\n`;
output += `π
Active days: ${this.metrics.activeDays.size}\n`;
output += `π Files touched: ${this.metrics.fileTypes.size}\n\n`;
output += `π₯ Top Contributors\n`;
for (const [author, count] of [...this.metrics.contributors.entries()].sort((a, b) => b[1] - a[1]).slice(0, 5)) {
output += ` ${author.padEnd(15)} ${bar(count, maxCommits)} ${count}\n`;
}
output += `\nπ File Type Breakdown\n`;
for (const [ext, count] of [...this.metrics.fileTypes.entries()].sort((a, b) => b[1] - a[1]).slice(0, 5)) {
output += ` .${ext.padEnd(12)} ${bar(count, maxFiles)} ${count}\n`;
}
output += `\nπ₯ Recent Activity\n`;
const sortedDays = [...this.metrics.dailyActivity.entries()].sort((a, b) => b[1] - a[1]).slice(0, 7);
for (const [day, count] of sortedDays) {
output += ` ${day} ${bar(count, maxDaily)} ${count}\n`;
}
return output;
}
}
Step 3: Architecture Rationale
- Pure
git log Invocation: Avoids REST APIs, authentication tokens, and rate limits. Execution remains deterministic and reproducible across environments.
- In-Memory Aggregation:
Map and Set structures provide O(1) lookups for contributor counts, date tracking, and extension distribution. This prevents quadratic scaling on repositories with thousands of commits.
- Terminal-Native Rendering: Proportional bar charts scale dynamically based on maximum values. No external charting libraries are required, keeping the runtime footprint minimal.
- Stream-Compatible Design: The
collect() method can be refactored to use child_process.spawn() with line-by-line streaming for repositories exceeding 50,000 commits, preventing memory pressure.
Pitfall Guide
1. Merge Commit Inflation
Explanation: Default git log includes merge commits, artificially inflating contributor counts and daily activity metrics.
Fix: Always append --no-merges to isolate substantive changes. For projects where merge strategy matters, track merges separately using --merges.
2. Timezone Drift in Daily Aggregation
Explanation: --date=short renders dates in the local timezone, causing commits near midnight to shift between calendar days.
Fix: Export TZ=UTC before execution, or use --date=format-local:'%Y-%m-%d' with explicit timezone handling. Normalize all dates to UTC during aggregation.
3. Large Repository Latency
Explanation: Parsing 100,000+ commits synchronously blocks the event loop and degrades terminal responsiveness.
Fix: Implement streaming parsers using child_process.spawn(). Add --since filters for time-bounded reports. Cache results in .git/telemetry-cache.json and invalidate on HEAD changes.
4. Binary and Generated File Noise
Explanation: Build artifacts, lockfiles, and minified bundles skew file-type distributions, masking actual source code composition.
Fix: Filter extensions against a whitelist (.ts, .js, .tsx, .css, .json, .md). Exclude paths matching node_modules/, dist/, build/, or *.lock.
5. Shallow Clone Blind Spots
Explanation: git clone --depth=1 truncates history, causing telemetry tools to report zero or near-zero metrics.
Fix: Detect shallow clones via git rev-parse --is-shallow-repository. Warn users and recommend git fetch --unshallow before analysis.
6. Terminal Width Overflow
Explanation: Fixed-width bar charts break on narrow terminals or when piped to log aggregators.
Fix: Dynamically calculate bar width using process.stdout.columns || 80. Provide a --compact flag that outputs raw JSON for machine consumption.
7. Author Name Fragmentation
Explanation: Contributors often use multiple names or emails (John Doe <john@work.com> vs J. Doe <john@personal.net>), fragmenting contribution counts.
Fix: Parse .mailmap files to normalize identities. Fall back to email domain matching or heuristic name clustering for repositories without mailmap configuration.
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|
| Local development & quick checks | CLI Telemetry Tool (@wuchunjie/gitpulse) | Instant execution, zero config, terminal-native | $0 |
| CI/CD automated reporting | Custom TypeScript engine with --json output | Scriptable, integrates with GitHub Actions/GitLab CI | Engineering time only |
| Monorepo with 100k+ commits | Streaming parser + extension filtering | Prevents memory pressure, maintains sub-second latency | Moderate engineering overhead |
| Air-gapped or restricted networks | Local git log pipeline | No external dependencies, fully offline capable | $0 |
| Executive dashboards & stakeholder views | Web dashboard + exported JSON | Polished UI, shareable links, historical tracking | SaaS subscription or self-hosted infra |
Configuration Template
Copy this template into your project root to standardize telemetry execution across teams. It includes timezone normalization, extension filtering, and cache invalidation logic.
#!/usr/bin/env bash
# telemetry.sh - Standardized Git metrics collection
set -euo pipefail
export TZ=UTC
CACHE_DIR=".git/telemetry-cache"
CACHE_FILE="${CACHE_DIR}/metrics.json"
HEAD_HASH=$(git rev-parse HEAD)
# Invalidate cache if HEAD changed
if [[ -f "$CACHE_FILE" ]]; then
CACHED_HEAD=$(jq -r '.head' "$CACHE_FILE" 2>/dev/null || echo "")
if [[ "$CACHED_HEAD" == "$HEAD_HASH" ]]; then
echo "π Using cached telemetry data"
cat "$CACHE_FILE" | jq '.metrics'
exit 0
fi
fi
mkdir -p "$CACHE_DIR"
# Execute telemetry collection
npx @wuchunjie/gitpulse --json > "$CACHE_FILE.tmp"
jq --arg head "$HEAD_HASH" '.head = $head' "$CACHE_FILE.tmp" > "$CACHE_FILE"
rm "$CACHE_FILE.tmp"
echo "π Telemetry cache updated"
jq '.metrics' "$CACHE_FILE"
Quick Start Guide
- Verify prerequisites: Ensure Node.js
>=18 and Git >=2.30 are installed. Run node --version and git --version to confirm.
- Initialize telemetry: Navigate to any Git repository and execute
npx @wuchunjie/gitpulse. The tool downloads automatically, parses local history, and renders metrics in under one second.
- Customize output: Pipe results to
grep, awk, or jq for targeted filtering. Example: npx @wuchunjie/gitpulse | grep -A 5 "Top Contributors"
- Integrate with workflows: Add the tool to
package.json scripts ("scripts": {"telemetry": "npx @wuchunjie/gitpulse"}) or CI pipelines for automated standup and retro reporting.
- Scale for large repos: If execution exceeds 2 seconds, implement the streaming parser pattern from the Core Solution section or add
--since="30 days ago" to limit historical scope.