ers.
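#!/usr/bin/env bash
# diagnostic_check.sh - report HTTP status and total response time for a URL
TARGET_URL="${1:-https://production-domain.com}"
# --max-time caps the request so a hung connection reports 000 instead of blocking
RESPONSE=$(curl -s -o /dev/null --max-time 15 -w "%{http_code}:%{time_total}" "$TARGET_URL")
STATUS_CODE="${RESPONSE%%:*}"     # text before the colon
RESPONSE_TIME="${RESPONSE##*:}"   # text after the colon
echo "Status: $STATUS_CODE | Latency: ${RESPONSE_TIME}s"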
If the status code returns 000 or times out, the failure is likely DNS or network-level. If it returns a valid HTTP code, proceed to application triage. Always verify from a secondary network path (mobile data or different ISP) to rule out local resolver caching.
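The branching described above can be sketched as a small helper. This is a minimal illustration: the function name `classify_status` and the category labels are assumptions made for this example and are not part of curl or WordPress.

```shell
#!/usr/bin/env bash
# classify_status: map a curl %{http_code} value to a triage hint.
# The labels are illustrative only.
classify_status() {
  local code="$1"
  case "$code" in
    000)     echo "network" ;;      # curl received no HTTP response (DNS, TCP, TLS, or timeout)
    5??)     echo "application" ;;  # server answered, but with a server-side error
    [1-4]??) echo "responding" ;;   # a valid HTTP code; continue with application triage
    *)       echo "unknown" ;;
  esac
}
```

For example, `classify_status "$STATUS_CODE"` could be appended to diagnostic_check.sh to print the hint next to the raw status.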
Phase 2: Isolate the Failure Domain
WordPress failures typically originate in one of three layers: infrastructure, application, or data. Use WP-CLI to bypass the web server and test core functionality directly.
# Test database connectivity without loading the full application
wp --path=/var/www/production-env db check --allow-root
# Verify core file integrity against official WordPress checksums
wp --path=/var/www/production-env core verify-checksums --allow-root
If db check fails, the issue is data-layer related. If checksums report mismatches, core files have been altered. If both pass, the failure is isolated to plugins, themes, or server configuration.
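The routing logic in the paragraph above might be captured as follows. This sketch assumes both WP-CLI commands exit non-zero on failure; `route_incident` and its output labels are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# route_incident: pick a failure domain from two exit codes.
#   $1 = exit code of `wp db check`
#   $2 = exit code of `wp core verify-checksums`
route_incident() {
  local db_rc="$1" checksum_rc="$2"
  if (( db_rc != 0 )); then
    echo "data"
  elif (( checksum_rc != 0 )); then
    echo "core-files"
  else
    echo "plugins-themes-or-server"
  fi
}

# Example wiring (commented out so sourcing this file has no side effects):
# wp --path=/var/www/production-env db check --allow-root; db_rc=$?
# wp --path=/var/www/production-env core verify-checksums --allow-root; sum_rc=$?
# route_incident "$db_rc" "$sum_rc"
```

The database check is evaluated first because a data-layer failure makes the other results harder to interpret.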
Phase 3: Inspect Logs and Resources
Application logs and server metrics provide the ground truth for PHP-level failures. Enable structured debugging only when necessary, and never expose errors to end users.
// wp-config.php - Production-safe debug configuration
define( 'WP_DEBUG', true );
define( 'WP_DEBUG_LOG', true );
define( 'WP_DEBUG_DISPLAY', false );
define( 'SCRIPT_DEBUG', false );
With this configuration, PHP errors are written to wp-content/debug.log without rendering on-screen. Monitor the log in real time while reproducing the failure:
tail -f /var/www/production-env/wp-content/debug.log | grep -E "(Fatal|Parse|Allowed memory)"
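After the reproduction, a per-severity count can make a long log easier to scan. The helper below is a sketch that assumes the stock `PHP <Severity>:` prefix PHP writes to its error log; `summarize_php_errors` is a hypothetical name.

```shell
#!/usr/bin/env bash
# summarize_php_errors: count PHP error lines by severity, most frequent first.
# Reads log text on stdin.
summarize_php_errors() {
  grep -oE 'PHP (Fatal error|Parse error|Warning|Notice|Deprecated)' \
    | sort | uniq -c | sort -rn
}
```

Usage might look like `summarize_php_errors < /var/www/production-env/wp-content/debug.log`.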
Simultaneously, verify server resource constraints:
# Check disk utilization and inode exhaustion
df -h /var/www/production-env
df -i /var/www/production-env
# Monitor load average and memory pressure
uptime
free -m | grep -E "Mem|Swap"
Disk-full conditions or inode exhaustion will silently break PHP execution. On shared environments, a load average that stays above 4.0 (or, more generally, above the number of available CPU cores) usually indicates CPU contention or throttling, which often triggers 503 responses or PHP-FPM worker exhaustion.
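The resource checks above can be reduced to two small filters. This is a sketch under stated assumptions: the 90% default limit is an arbitrary example, the function names are hypothetical, and mount points containing spaces are not handled.

```shell
#!/usr/bin/env bash
# over_threshold: read `df -P` or `df -Pi` output on stdin and print every
# mount point whose use percentage (column 5) is at or above the limit.
over_threshold() {
  local limit="${1:-90}"
  awk -v limit="$limit" 'NR > 1 {
    gsub(/%/, "", $5)
    if ($5 + 0 >= limit) print $6 " " $5 "%"
  }'
}

# load_per_core: divide a load average by the CPU core count.
# A result that stays above 1.00 suggests the CPU is saturated.
load_per_core() {
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}
```

On a Linux host, usage might look like `df -P /var/www/production-env | over_threshold 90` and `load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"`.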
Phase 4: Apply Targeted Remediation
Remediation must match the isolated failure domain. Never apply blanket fixes.
Application Layer (500 / White Screen)
Rewrite rule corruption and plugin conflicts are the most frequent culprits. Isolate them sequentially:
# Backup and neutralize rewrite rules
cp /var/www/production-env/.htaccess /var/www/production-env/.htaccess.bak
echo "# RewriteEngine Off" > /var/www/production-env/.htaccess
# Deactivate all plugins via WP-CLI (bypasses the admin UI)
wp --path=/var/www/production-env plugin deactivate --all --allow-root
# Reactivate plugins in batches to identify the conflict
wp --path=/var/www/production-env plugin activate woocommerce --allow-root
wp --path=/var/www/production-env plugin activate advanced-custom-fields --allow-root
If the site recovers after .htaccess neutralization, regenerate rules via the admin dashboard. If plugins caused the crash, isolate the faulty extension by activating them one at a time while monitoring debug.log.
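The one-at-a-time activation can be scripted. The sketch below is illustrative: `first_failing_plugin`, `activate_plugin`, and `site_healthy` are hypothetical helper names, the 15-second timeout is an arbitrary choice, and a 2xx or 3xx response is assumed to mean the site is healthy.

```shell
#!/usr/bin/env bash
WP_PATH="${WP_PATH:-/var/www/production-env}"
SITE_URL="${SITE_URL:-https://production-domain.com}"

# Activate a single plugin through WP-CLI.
activate_plugin() {
  wp --path="$WP_PATH" plugin activate "$1" --allow-root >/dev/null 2>&1
}

# Treat any 2xx or 3xx response as healthy.
site_healthy() {
  local code
  code="$(curl -s -o /dev/null --max-time 15 -w '%{http_code}' "$SITE_URL")"
  [[ "$code" == 2?? || "$code" == 3?? ]]
}

# first_failing_plugin: activate plugins in the given order and print the
# first one after which activation or the health check fails.
# Returns 1 if every plugin activates cleanly.
first_failing_plugin() {
  local plugin
  for plugin in "$@"; do
    if ! activate_plugin "$plugin" || ! site_healthy; then
      echo "$plugin"
      return 0
    fi
  done
  return 1
}
```

A call such as `first_failing_plugin woocommerce advanced-custom-fields` stops at the first plugin that breaks the site, leaving it active so the matching entries in debug.log can be inspected.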
Data Layer (Database Connection Failure)
Verify credentials match the actual database instance, then test connectivity outside WordPress:
# Extract credentials from wp-config.php
grep -E "DB_NAME|DB_USER|DB_PASSWORD|DB_HOST" /var/www/production-env/wp-config.php
# Test raw MySQL connectivity
mysql -u production_user -p'complex_password_here' -h db.internal.cluster -e "SELECT 1;"
If credentials are correct but the connection fails, the database server may be unreachable, the connection pool may be exhausted, or firewall rules may be blocking port 3306. Contact infrastructure support immediately; do not attempt manual database repairs without a verified backup.
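Before escalating, it can help to confirm whether the database port is reachable at all. The sketch below assumes a bash build with `/dev/tcp` support and the coreutils `timeout` command; `split_db_host` and `port_open` are hypothetical helpers, and IPv6 literals are not handled.

```shell
#!/usr/bin/env bash
# split_db_host: print "<host> <port>" for a DB_HOST value.
# Falls back to 3306 when there is no numeric port suffix
# (for example, when DB_HOST carries a socket path instead).
split_db_host() {
  local value="$1" host port
  if [[ "$value" =~ ^([^:]+):([0-9]+)$ ]]; then
    host="${BASH_REMATCH[1]}"
    port="${BASH_REMATCH[2]}"
  else
    host="${value%%:*}"
    port=3306
  fi
  echo "$host $port"
}

# port_open: succeed if a TCP connection to host:port opens within 5 seconds.
port_open() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

For instance, `read -r host port < <(split_db_host "db.internal.cluster"); port_open "$host" "$port" || echo "port $port unreachable"` separates a blocked port from a credential or pool problem.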
Security Compromise (Unexpected Redirects)
Malware infections typically modify core files or inject payloads into wp-config.php and index.php. Never attempt manual cleanup; infections are rarely isolated.
# Identify recently modified PHP files
find /var/www/production-env -name "*.php" -mtime -1 -exec ls -la {} \;
# Inspect file headers for injected base64 or eval statements
head -n 20 /var/www/production-env/index.php
head -n 20 /var/www/production-env/wp-config.php
If core files are altered, restore from a verified, off-server backup. Manual patching leaves residual backdoors and breaks checksum verification.
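To gauge how far an infection has spread before restoring, a recursive pattern search can complement the header inspection. This is a rough sketch: `scan_suspicious_php` is a hypothetical name, and because legitimate code also calls `base64_decode`, matches are leads to review alongside checksum results rather than proof of compromise.

```shell
#!/usr/bin/env bash
# scan_suspicious_php: list PHP files under a directory that contain
# eval( or base64_decode( calls.
scan_suspicious_php() {
  local dir="$1"
  grep -rlE --include='*.php' '(eval|base64_decode)[[:space:]]*\(' "$dir" | sort
}
```

Usage might look like `scan_suspicious_php /var/www/production-env/wp-content`.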
Phase 5: Validate and Communicate
After applying fixes, verify recovery across multiple endpoints:
# Validate HTTP status and response time
curl -s -o /dev/null -w "%{http_code}:%{time_total}" https://production-domain.com
curl -s -o /dev/null -w "%{http_code}:%{time_total}" https://production-domain.com/wp-admin/
# Flush object cache and opcache
wp --path=/var/www/production-env cache flush --allow-root
# Binary and service names assume Debian/Ubuntu packaging of PHP 8.1
php-fpm8.1 -t && systemctl reload php8.1-fpm
Notify stakeholders immediately upon confirmation. Provide a concise status update, a root cause summary, and a timeline for the post-incident report. Never delay communication until the fix is fully deployed; uncertainty damages trust faster than downtime.
Architecture Decisions and Rationale
- Why bypass the admin UI during triage? The WordPress dashboard loads the entire plugin ecosystem, themes, and autoloaded options. If the application is failing, the UI will likely time out or crash, wasting time. WP-CLI runs through the PHP command-line binary, skipping the web server and HTTP routing while still providing direct database access.
- Why neutralize .htaccess before deactivating plugins? Rewrite rule corruption is a silent failure vector. Apache returns a 500 error on a malformed .htaccess before PHP even initializes. (Nginx does not read .htaccess, so this step applies only to Apache-compatible servers.) Testing .htaccess first eliminates a high-probability, low-effort fix.
- Why avoid manual malware cleanup? WordPress infections rarely exist in a single file. Attackers embed persistence mechanisms in cron jobs, database options, and theme functions. Restoration from a verified backup guarantees a clean state and preserves file integrity checksums.
- Why separate credential verification from connection testing? A typo in wp-config.php is trivial to fix. A database server outage requires infrastructure intervention. Testing both independently prevents misrouting the incident to the wrong support team.
Pitfall Guide
| Pitfall | Explanation | Fix |
|---|---|---|
| Clearing caches before isolating the error | Object cache and OPcache state is part of the failing environment. Clearing it prematurely removes diagnostic signals and can make the error temporarily disappear. | Reproduce the error and capture logs first. Flush caches (wp cache flush, PHP-FPM reload) only after the failure domain is isolated. |
| Editing core files during triage | Modifying wp-includes or wp-admin files breaks WordPress checksum verification and complicates rollback. | Use WP-CLI core verify-checksums to detect alterations. Restore core files from backup instead of patching them live. |
| Assuming 500 errors are always plugin-related | Rewrite rule syntax errors, PHP version mismatches, and missing extensions also trigger 500 responses. Plugin deactivation won't resolve infrastructure or configuration faults. | Check .htaccess syntax, verify PHP version compatibility, and inspect server error logs before touching plugins. |
| Restoring backups without verifying integrity | Restoring a corrupted or outdated backup propagates the failure state and wastes recovery time. | Validate backup timestamps, run wp db check post-restore, and verify file checksums before declaring recovery complete. |
| Ignoring server load during plugin deactivation | Deactivating plugins runs their deactivation hooks, which can trigger option updates and other database writes. On resource-constrained servers, this can spike CPU and prolong downtime. | Deactivate plugins in small batches, monitor uptime and iostat, and schedule heavy operations during low-traffic windows. |
| Communicating fixes without root cause analysis | Clients and stakeholders require context. Delivering a fix without explaining the failure domain leads to recurring incidents and eroded trust. | Document the diagnostic signal, failure domain, and remediation step. Include this in the post-incident report and update monitoring rules accordingly. |
| Leaving debug mode enabled in production | WP_DEBUG and WP_DEBUG_DISPLAY expose stack traces, file paths, and database queries to end users, creating security and performance risks. | Set WP_DEBUG_DISPLAY to false, route logs to debug.log, and disable debug constants immediately after recovery. |
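The last pitfall can be guarded against with a quick check before closing an incident. This sketch only matches an uncommented `define(...)` line at the start of a line; `debug_display_on` is a hypothetical helper, and values set through other mechanisms would not be detected.

```shell
#!/usr/bin/env bash
# debug_display_on: succeed if the given config file defines
# WP_DEBUG_DISPLAY as true on an uncommented line.
debug_display_on() {
  grep -Eiq "^[[:space:]]*define\([[:space:]]*['\"]WP_DEBUG_DISPLAY['\"][[:space:]]*,[[:space:]]*true[[:space:]]*\)" "$1"
}
```

For example, `debug_display_on /var/www/production-env/wp-config.php && echo "WP_DEBUG_DISPLAY is still enabled"` could run as part of the recovery checklist.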
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| 500 error with clean checksums | Deactivate plugins via WP-CLI, then reactivate in batches | Isolates plugin conflict without touching core files or database | Low (CPU/IO only) |
| Database connection failure | Verify wp-config.php credentials, test raw MySQL, contact hosting if unreachable | Separates configuration errors from infrastructure outages | Medium (Support ticket + potential failover) |
| Malware redirect detected | Restore from verified off-server backup immediately | Manual cleanup leaves residual backdoors and breaks integrity checks | High (Backup storage + restore window) |
| High load + 503 | Scale horizontally or enable maintenance mode, then investigate resource spikes | Prevents cascading failures while preserving database state | Medium (Infrastructure scaling costs) |
| Checkout/payment broken | Inspect JavaScript console, verify gateway API status, check webhook logs | Integration failures rarely affect core CMS functionality | Low (API rate limits + debugging time) |
Configuration Template
// wp-config.php - Production Debug & Safety Configuration
define( 'WP_DEBUG', true );
define( 'WP_DEBUG_LOG', true );
define( 'WP_DEBUG_DISPLAY', false );
define( 'SCRIPT_DEBUG', false );
define( 'WP_MEMORY_LIMIT', '256M' );
define( 'WP_MAX_MEMORY_LIMIT', '512M' );
// Disable automatic updates during active incident response
define( 'WP_AUTO_UPDATE_CORE', false );
define( 'AUTOMATIC_UPDATER_DISABLED', true );
#!/usr/bin/env bash
# uptime_monitor.sh - Lightweight production health check
TARGET_URL="${1:-https://production-domain.com}"
ALERT_EMAIL="ops@yourdomain.com"
MAX_LATENCY="3.0"
RESPONSE=$(curl -s -o /dev/null -w "%{http_code}:%{time_total}" "$TARGET_URL")
STATUS="${RESPONSE%%:*}"
LATENCY="${RESPONSE##*:}"
if [[ "$STATUS" -ne 200 ]] || (( $(echo "$LATENCY > $MAX_LATENCY" | bc -l) )); then
  echo "ALERT: $TARGET_URL returned $STATUS (${LATENCY}s)" | mail -s "WordPress Incident: $STATUS" "$ALERT_EMAIL"
fi
Quick Start Guide
- Deploy the diagnostic script: Save diagnostic_check.sh to your operations directory and make it executable (chmod +x diagnostic_check.sh). Run it against the target domain to capture baseline status and latency.
- Configure safe debugging: Add the wp-config.php template to your production environment. Ensure WP_DEBUG_DISPLAY remains false to prevent information leakage.
- Isolate the failure domain: Execute wp db check and wp core verify-checksums. Route the incident to application, data, or infrastructure based on the output.
- Apply targeted remediation: Use the symptom-to-remediation matrix to execute the correct fix. Avoid blanket plugin deactivation or full-site restores unless checksums or database integrity are compromised.
- Validate and monitor: Flush caches, restart PHP-FPM, and run the uptime monitor script. Confirm recovery across frontend and admin endpoints before closing the incident ticket.