Troubleshooting Guide¶
Solutions for common issues when running Tinkero.
Table of Contents¶
- Build Failures
- Deployment Accessibility Issues
- GitHub Webhook Problems
- SSL Certificate Issues
- Redis Connection Errors
- Build Timeouts
- Disk Space Issues
- GitHub Credential Validation
- Service Health Issues
- Sentry Issues
- Quick Diagnostics
Build Failures¶
npm install fails¶
Symptoms:
- Build fails during dependency installation
- Error message mentions npm ERR! or ENOENT
Common Causes and Solutions:
| Cause | Solution |
|---|---|
| Missing package-lock.json | Commit package-lock.json to repository |
| Incompatible Node version | Update nodeVersion in .tinkero.yml |
| Private npm registry | Configure registry in .npmrc |
| Memory issues | Check server has sufficient RAM |
Debugging:
# View build logs
docker compose logs -f webhook-handler
# Look for specific error
docker compose logs webhook-handler | grep -i "npm ERR"
Example Fix:
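As one illustration of the first row in the table, the sketch below checks a project directory for package.json and package-lock.json before you push. The function name check_lockfile is hypothetical and is not part of Tinkero.

```shell
# Hypothetical pre-push helper (not a Tinkero command).
# Usage: check_lockfile [project-dir]
check_lockfile() {
  local dir="${1:-.}"
  if [ ! -f "$dir/package.json" ]; then
    echo "no package.json in $dir"
    return 2
  fi
  if [ ! -f "$dir/package-lock.json" ]; then
    echo "package-lock.json is missing: run 'npm install' locally and commit it"
    return 1
  fi
  echo "package-lock.json found"
}
```

Running `check_lockfile .` in the repository root before pushing catches the missing lockfile case locally instead of in the build logs.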
Build command fails¶
Symptoms:
- Installation succeeds but build fails
- Error during npm run build
Common Causes:
- Missing environment variables
- Wrong output directory
- TypeScript errors
Debugging:
Output directory not found¶
Symptoms:
- Build completes but deployment fails
- Error: "output directory not found"
Solutions:
- Verify that the configured output directory matches the directory the build actually writes to
- Update the output directory in .tinkero.yml
- Check for conditional builds: some builds only create output in production mode
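The first check above can be run locally after a build. The following is a minimal sketch; check_output_dir is a hypothetical helper, and the directory name passed to it is whatever your build tool writes to.

```shell
# Hypothetical helper: verify that a build output directory exists
# and contains at least one regular file.
check_output_dir() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir does not exist"
    return 1
  fi
  # Stop at the first file found; an empty result means no files at all.
  if [ -z "$(find "$dir" -type f | head -n 1)" ]; then
    echo "empty: $dir contains no files"
    return 1
  fi
  echo "ok: $dir contains files"
}
```

For example, `npm run build && check_output_dir dist` (with dist as an assumed directory name) distinguishes a missing directory from an empty one.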
Build produces empty output¶
Symptoms:
- Build completes without errors
- Deployed site shows nothing or 404
Causes:
- Build output went to wrong directory
- Build requires specific environment variables
- Static export not configured (Next.js)
Solutions:
- Next.js: enable static export (on recent Next.js versions, output: 'export' in next.config.js)
- Check base path configuration:
# .tinkero.yml
env:
  BASE_URL: /
  PUBLIC_URL: /
Deployment Accessibility Issues¶
Site returns 404¶
Symptoms:
- Deployment shows success
- Site URL returns 404
Diagnostic Steps:
- Check if files exist
- Check symlink
- Check Caddy configuration
- Check Traefik routing
Solutions:
| Issue | Solution |
|---|---|
| Files missing | Re-trigger deployment |
| Symlink broken | Check release directory exists |
| Caddy not updated | Restart Caddy: docker compose restart caddy |
| Traefik routing | Check Traefik dashboard |
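The "Symlink broken" row can be checked with a small helper like the one sketched here. check_symlink is a hypothetical name, and the exact path of the release symlink under /srv/tinkero/sites/ is an assumption you should adapt to your layout.

```shell
# Hypothetical helper: report whether a symlink exists and
# whether its target is still present.
check_symlink() {
  local link="$1"
  if [ ! -L "$link" ]; then
    echo "not a symlink: $link"
    return 2
  fi
  # -e follows the link, so it is false when the target is gone.
  if [ ! -e "$link" ]; then
    echo "broken: $link -> $(readlink "$link")"
    return 1
  fi
  echo "ok: $link -> $(readlink "$link")"
}
```

A "broken" result means the release directory the link points to no longer exists, which matches the second row of the table.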
Site returns 502 Bad Gateway¶
Symptoms:
- Traefik returns 502 error
- Site was working before
Causes:
- Caddy container is down
- Caddy is overloaded
- Network connectivity issue
Solutions:
- Check Caddy status: docker compose ps caddy
- Restart Caddy: docker compose restart caddy
- Check Caddy logs: docker compose logs caddy
Site shows old content¶
Symptoms:
- Pushed new changes
- Site still shows old content
Causes:
- Deployment didn't trigger
- Browser caching
- CDN caching (if using Cloudflare)
Solutions:
- Verify deployment triggered
- Check current release
- Clear browser cache:
  - Hard refresh: Ctrl+Shift+R (Windows/Linux) or Cmd+Shift+R (Mac)
- Clear Cloudflare cache:
  - Dashboard > Caching > Configuration > Purge Everything
GitHub Webhook Problems¶
Webhooks not being received¶
Symptoms:
- Push to repository
- No deployment triggered
- No logs in webhook-handler
Diagnostic Steps:
- Check GitHub webhook deliveries:
  - Go to GitHub App settings
  - Click Advanced
  - Check Recent Deliveries
- Verify webhook URL:
  - Should be: https://lair.nntin.xyz/tinkero/webhook
- Test webhook URL
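One way to test the webhook URL is to request it with curl and interpret the status code. The sketch below is an assumption about how to read the result, not Tinkero's documented behavior; probe_webhook and classify_status are hypothetical names. An unsigned request is expected to be rejected, so a 4xx response still shows that the endpoint is reachable.

```shell
# Hypothetical helper: turn an HTTP status code into a rough diagnosis.
classify_status() {
  case "$1" in
    000) echo "unreachable (DNS, firewall, or TLS problem)" ;;
    404) echo "reachable, but nothing is routed at this path" ;;
    502|503|504) echo "proxy is up, but the backend is not responding" ;;
    5??) echo "webhook-handler reached, but it failed internally" ;;
    *) echo "endpoint reachable (HTTP $1)" ;;
  esac
}

# Request the URL and print the status code with its diagnosis.
# curl prints 000 when no HTTP response was received at all.
probe_webhook() {
  local url="$1" code
  code=$(curl -s -o /dev/null -m 10 -w '%{http_code}' "$url")
  echo "$code: $(classify_status "$code")"
}
```

For example: `probe_webhook https://lair.nntin.xyz/tinkero/webhook`.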
Common Issues:
| Issue | Solution |
|---|---|
| Wrong URL | Update in GitHub App settings |
| App not installed | Install on repository |
| Branch filter | Check .tinkero.yml branch setting |
| Firewall | Ensure port 443 is open |
Webhook signature validation failed¶
Symptoms:
- Webhooks received but rejected
- Error: "signature validation failed"
- GitHub shows 401 response
Solution:
- Verify that the webhook secret in the GitHub App settings matches the one configured for Tinkero
- Update the secret if they differ
- Check for whitespace: the secret might have leading or trailing spaces
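The whitespace check is easy to get wrong by eye. A minimal sketch follows; has_edge_whitespace is a hypothetical helper, and WEBHOOK_SECRET in the usage note is an assumed variable name, so substitute whatever your .env actually uses.

```shell
# Hypothetical helper: succeed (exit 0) when the value starts or
# ends with whitespace, fail (exit 1) otherwise.
has_edge_whitespace() {
  case "$1" in
    [[:space:]]*|*[[:space:]]) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `has_edge_whitespace "$WEBHOOK_SECRET" && echo "secret has leading or trailing whitespace"`. Whitespace inside the secret is not flagged, only at the edges.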
Webhook returns 500 error¶
Symptoms:
- GitHub shows 500 response
- webhook-handler is crashing
Debugging:
# Check webhook-handler logs
docker compose logs -f webhook-handler
# Check if container is running
docker compose ps webhook-handler
# Restart service
docker compose restart webhook-handler
Common Causes:
- Invalid private key
- Redis connection failed
- Out of memory
SSL Certificate Issues¶
Certificate not issued¶
Symptoms:
- Site shows SSL error
- Browser warns "connection not secure"
- Traefik logs show ACME errors
Diagnostic Steps:
- Check Traefik logs: docker compose logs traefik
- Check acme.json
- Verify DNS configuration
- Verify Cloudflare API token
Common Issues:
| Issue | Solution |
|---|---|
| Cloudflare token invalid | Create new API token with Zone:DNS:Edit permission |
| Domain not on Cloudflare | Transfer domain nameservers to Cloudflare |
| DNS not propagated | Wait up to 48 hours after nameserver change |
| Token missing permissions | Recreate token with Zone:DNS:Edit and Zone:Zone:Read |
| Wrong zone selected | Verify token has access to your specific domain |
| Rate limited | Wait 1 hour, check Let's Encrypt status page |
| Wrong domain in .env | Update DOMAIN in .env and restart |
| CF_DNS_API_TOKEN not set | Check CLOUDFLARE_DNS_API_TOKEN in .env |
Cloudflare DNS-01 Challenge Verification:
# Check if DNS-01 challenge is working
docker compose logs traefik | grep "dnschallenge"
# Should see logs like:
# - "Trying to solve DNS-01"
# - "Waiting for DNS propagation"
# - "The DNS challenge is complete"
Fix Steps:
- Verify .env configuration
- Restart Traefik to trigger a fresh attempt
- Verify Cloudflare API access
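For the first fix step, the following sketch checks that the keys named in the table above are present and non-empty in .env. env_value is a hypothetical helper, and it assumes plain KEY=value lines.

```shell
# Hypothetical helper: print the value of KEY from an env file.
# Fails (exit 1) when the key is absent or its value is empty.
env_value() {
  local file="$1" key="$2" value
  # Use the last assignment if the key appears more than once.
  value=$(grep -E "^${key}=" "$file" 2>/dev/null | tail -n 1 | cut -d= -f2-)
  [ -n "$value" ] && printf '%s\n' "$value"
}
```

For example, `env_value .env DOMAIN` prints the configured domain, and `env_value .env CLOUDFLARE_DNS_API_TOKEN >/dev/null || echo "token missing or empty"` flags an unset token without printing it.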
Certificate expired¶
Symptoms:
- Site was working, now shows SSL error
- Certificate expired warning
Solution:
- Check certificate
- Force renewal
- Check that Traefik can reach Let's Encrypt
Redis Connection Errors¶
Redis connection refused¶
Symptoms:
- tinkero health shows Redis disconnected
- Error: "connection refused"
Solution:
- Check Redis is running
- Restart Redis
- Check Redis logs
- Test connection
Redis out of memory¶
Symptoms:
- Redis commands fail
- Error: "OOM command not allowed"
Solution:
- Check memory usage
- Clear old data
- Increase memory limit
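Memory usage can be read from the output of the Redis INFO memory command. The sketch below summarizes that output; redis_memory_summary is a hypothetical helper, and the usage note assumes the Compose service is named redis and ships redis-cli.

```shell
# Hypothetical helper: read `INFO memory` output on stdin and
# print used and maximum memory in bytes.
redis_memory_summary() {
  # INFO output uses CRLF line endings, so strip the carriage returns.
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max == 0) print "used=" used " max=unlimited"
      else printf "used=%d max=%d (%d%%)\n", used, max, used * 100 / max
    }'
}
```

Usage, under the assumptions above: `docker compose exec -T redis redis-cli INFO memory | redis_memory_summary`. A maxmemory of 0 means Redis has no configured limit; the limit can be raised at runtime with `redis-cli CONFIG SET maxmemory <value>`.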
Build Timeouts¶
Build exceeds time limit¶
Symptoms:
- Build fails after timeout
- Error: "build timed out"
Solutions:
- Optimize build:
  - Use npm ci instead of npm install
  - Enable build caching
  - Remove unused dependencies
- Use pre-built sites: build locally or in CI/CD and commit built files
- Check for infinite loops:
  - Review build scripts
  - Check for circular dependencies
Clone timeout¶
Symptoms:
- Build fails during repository clone
- Error: "clone timed out"
Causes:
- Large repository
- Slow network
- GitHub rate limiting
Solutions:
- Use shallow clone (if supported): large repos take longer; consider splitting
- Check network
- Check rate limits
Disk Space Issues¶
Disk full¶
Symptoms:
- Builds fail
- Services crash
- Error: "no space left on device"
Immediate Actions:
# Check disk usage
df -h
# Find large directories
du -sh /srv/tinkero/sites/* | sort -hr | head -10
# Clean old releases
tinkero cleanup
Additional Cleanup:
# Docker cleanup
docker system prune -a -f
docker volume prune -f
# Clean logs
sudo journalctl --vacuum-time=3d
Prevention:
- Set up automated cleanup (see Operations Guide)
- Monitor disk usage in the central Grafana dashboard
- Configure log rotation
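As a lightweight complement to the prevention list, a threshold check like the one below could run from cron. It is only a sketch: disk_usage_percent and disk_check are hypothetical helpers, and the 85% default is an arbitrary example value.

```shell
# Hypothetical helper: print the usage percentage (integer, no % sign)
# of the filesystem that holds the given path.
disk_usage_percent() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: warn and fail when usage reaches the limit.
disk_check() {
  local path="${1:-/}" limit="${2:-85}" used
  used=$(disk_usage_percent "$path")
  if [ "$used" -ge "$limit" ]; then
    echo "WARN: $path is ${used}% full (limit ${limit}%)"
    return 1
  fi
  echo "OK: $path is ${used}% full"
}
```

For example, `disk_check /srv/tinkero/sites 85 || tinkero cleanup` runs the cleanup command only once the threshold is crossed.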
Sites directory permissions¶
Symptoms:
- Deployment fails
- Error: "permission denied"
Solution:
# Check permissions
ls -la /srv/tinkero/sites/
# Fix permissions
sudo chown -R root:docker /srv/tinkero/sites/
sudo chmod -R 775 /srv/tinkero/sites/
GitHub Credential Validation¶
Authentication failed (401)¶
Error:
Causes:
- Wrong App ID
- Private key doesn't match App ID
- Private key was regenerated
Solutions:
- Verify App ID:
  - Go to https://github.com/settings/apps
  - Check the App ID matches
- Regenerate and re-download key:
  - Go to App settings
  - Generate new private key
  - Download and update path
- Run config wizard
App not found (404)¶
Error:
Causes:
- App ID is incorrect
- App was deleted
Solution:
- Verify app exists:
  - Go to https://github.com/settings/apps
  - Find your app and note the correct ID
- Re-run configuration
Private key parse error¶
Error:
Causes:
- Corrupted key file
- Wrong file downloaded
- Key was overwritten
Solutions:
- Check key file
- Re-download key:
  - Go to GitHub App settings
  - Generate new private key
  - Download fresh copy
- Check permissions
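A quick first check of the key file is to confirm that it is readable and starts with a PEM private-key header. The sketch below does only that and does not validate the key itself; check_pem_key is a hypothetical helper name.

```shell
# Hypothetical helper: confirm a file is readable and begins with
# a PEM private-key header. This does not validate the key material.
check_pem_key() {
  local file="$1" first
  if [ ! -r "$file" ]; then
    echo "unreadable: $file"
    return 2
  fi
  first=$(head -n 1 "$file" | tr -d '\r')
  case "$first" in
    "-----BEGIN RSA PRIVATE KEY-----"|"-----BEGIN PRIVATE KEY-----")
      echo "looks like a PEM private key" ;;
    *)
      echo "unexpected first line: $first"
      return 1 ;;
  esac
}
```

An "unexpected first line" result usually points to a wrong or overwritten download. For a deeper check, `openssl rsa -in <key-file> -check -noout` validates the key, and `chmod 600 <key-file>` restricts it to its owner.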
Service Health Issues¶
Container keeps restarting¶
Symptoms:
- Container status shows "Restarting"
- Service unavailable intermittently
Debugging:
# Check container status
docker compose ps
# View restart count
docker inspect webhook-handler --format='{{.RestartCount}}'
# Check logs for errors
docker compose logs --tail 100 webhook-handler
Common Causes:
| Cause | Solution |
|---|---|
| Missing .env values | Check all required vars are set |
| Port conflict | Check no other service using same port |
| Memory limit | Increase container memory |
| Bad configuration | Review recent changes |
All services unhealthy¶
Symptoms:
- tinkero health shows all services unhealthy
- Nothing is working
Recovery Steps:
# 1. Check Docker
sudo systemctl status docker
# 2. Full restart
docker compose down
docker compose up -d
# 3. Check for resource issues
free -h
df -h
# 4. Review logs
docker compose logs --tail 50 | head -100
Sentry Issues¶
No events appear in Sentry¶
Symptoms:
- Sentry project shows no new events
- Errors are visible in logs but not in Sentry
Common Causes and Solutions:
| Cause | Solution |
|---|---|
| SENTRY_DSN not set | Add SENTRY_DSN to .env and restart |
| Sample rate set to 0.0 | Set SENTRY_TRACES_SAMPLE_RATE to a non-zero value for traces |
| Network egress blocked | Allow outbound HTTPS to *.ingest.sentry.io |
| Service not restarted | Restart webhook-handler after updating .env |
Debugging:
# Check env variables are loaded
docker compose exec webhook-handler env | grep SENTRY
# Check service logs for initialization
docker compose logs webhook-handler | grep -i sentry
Invalid DSN or authentication errors¶
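Symptoms:
- Logs show Sentry init errors
- No events appear despite DSN set
Solutions:
- Re-copy DSN from Sentry:
  - Project Settings → Client Keys (DSN)
- Verify .env formatting:
  - Ensure no quotes or trailing spaces
- Restart services: recreate the container with docker compose up -d --force-recreate webhook-handler, because docker compose restart alone does not pick up changed environment values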
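The .env formatting check can be automated with a small grep-based lint. This is a sketch under the assumption that .env contains plain KEY=value lines; lint_env_file is a hypothetical helper.

```shell
# Hypothetical helper: print (with line numbers) every assignment whose
# value starts with a quote or ends with whitespace.
# Exit status is 0 when problems are found and 1 when the file is clean.
lint_env_file() {
  grep -nE "^[A-Za-z_][A-Za-z0-9_]*=([\"'].*|.*[[:space:]]+)\$" "$1"
}
```

For example, `lint_env_file .env` lists suspicious lines, and `lint_env_file .env | grep SENTRY` narrows the output to the Sentry settings.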
Quick Diagnostics¶
Health Check Command¶
# Overall health of all services
tinkero health
Service Status¶
# List containers and their current state
docker compose ps
Recent Logs¶
# All services
docker compose logs --tail 50
# Specific service
docker compose logs --tail 50 webhook-handler
Network Check¶
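# Test internal connectivity
# Redis does not speak HTTP, so wget reports an error even when the port is reachable;
# "connection refused" or a name-resolution failure indicates a real network problem
docker exec webhook-handler wget -qO- http://redis:6379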
docker exec webhook-handler wget -qO- http://caddy:2019/config/
Resource Check¶
# Memory and disk usage
free -h
df -h
Full Diagnostic Script¶
#!/bin/bash
echo "=== Tinkero Diagnostic Report ==="
echo ""
echo "=== System Resources ==="
free -h
df -h /
echo ""
echo "=== Docker Status ==="
docker compose ps
echo ""
echo "=== Recent Errors ==="
docker compose logs --tail 20 2>&1 | grep -i "error\|fail\|fatal" | tail -10
echo ""
echo "=== Health Check ==="
tinkero health
Still stuck?
- Check the FAQ for common questions
- Open an issue on GitHub with diagnostic output