
Troubleshooting Guide

Solutions for common issues when running Tinkero.

Table of Contents

  1. Build Failures
  2. Deployment Accessibility Issues
  3. GitHub Webhook Problems
  4. SSL Certificate Issues
  5. Redis Connection Errors
  6. Build Timeouts
  7. Disk Space Issues
  8. GitHub Credential Validation
  9. Service Health Issues
  10. Sentry Issues
  11. Quick Diagnostics

Build Failures

npm install fails

Symptoms:

  - Build fails during dependency installation
  - Error message mentions npm ERR! or ENOENT

Common Causes and Solutions:

| Cause | Solution |
|---|---|
| Missing package-lock.json | Commit package-lock.json to repository |
| Incompatible Node version | Update nodeVersion in .tinkero.yml |
| Private npm registry | Configure registry in .npmrc |
| Memory issues | Check server has sufficient RAM |

Debugging:

# View build logs
docker compose logs -f webhook-handler

# Look for specific error
docker compose logs webhook-handler | grep -i "npm ERR"

Example Fix:

# .tinkero.yml - Use npm ci for cleaner installs
installCommand: npm ci
nodeVersion: "20"
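
The lockfile cause above can be checked before a push. This is a minimal sketch, assuming the project root is passed as an argument; the function name `pick_install_command` is hypothetical and not part of Tinkero. It relies on the fact that `npm ci` refuses to run without a lockfile.

```shell
# Suggest an install command for a project directory.
# npm ci needs package-lock.json or npm-shrinkwrap.json and fails without one.
pick_install_command() {
  local project_dir="${1:-.}"
  if [ -f "$project_dir/package-lock.json" ] || [ -f "$project_dir/npm-shrinkwrap.json" ]; then
    echo "npm ci"
  else
    echo "npm install"
  fi
}

# Example: print the suggestion for the current directory.
pick_install_command .
```

If this prints `npm install` for a repository whose .tinkero.yml sets `installCommand: npm ci`, committing the lockfile is the likely fix.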


Build command fails

Symptoms:

  - Installation succeeds but build fails
  - Error during npm run build

Common Causes:

  1. Missing environment variables:

    # .tinkero.yml
    env:
      NODE_ENV: production
      API_URL: https://api.example.com
    

  2. Wrong output directory:

    # Set outputDir to your framework's output directory (keep one line only)
    outputDir: dist    # Vite
    # outputDir: build # Create React App
    # outputDir: out   # Next.js static export
    

  3. TypeScript errors:

    # Build locally first to check for errors
    npm run build
    

Debugging:

# View detailed build output
docker compose logs webhook-handler | grep -A 50 "Running build"


Output directory not found

Symptoms:

  - Build completes but deployment fails
  - Error: "output directory not found"

Solutions:

  1. Verify output directory matches build output:

    # Check what your build creates
    npm run build
    ls -la dist/  # or build/, out/, public/
    

  2. Update .tinkero.yml:

    outputDir: dist  # Match your actual output directory
    

  3. Check for conditional builds: Some builds only create output in production mode:

    env:
      NODE_ENV: production
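
To see which directory a local build actually produced, the check in step 1 can be scripted. The sketch below assumes the candidate names listed in this guide (dist, build, out, public); `find_output_dir` is a hypothetical helper, not a Tinkero command.

```shell
# Print the first non-empty candidate output directory inside a project.
# Candidates are checked in order; public comes last because some frameworks
# also use it as a source directory for static assets.
find_output_dir() {
  local project_dir="${1:-.}" candidate
  for candidate in dist build out public; do
    # Require at least one entry so an empty directory does not count.
    if [ -d "$project_dir/$candidate" ] && [ -n "$(ls -A "$project_dir/$candidate")" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}

# Usage (after running the build locally):
#   find_output_dir . || echo "no build output found"
```

The printed name is the value to compare against `outputDir` in .tinkero.yml.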
    


Build produces empty output

Symptoms:

  - Build completes without errors
  - Deployed site shows nothing or 404

Causes:

  - Build output went to wrong directory
  - Build requires specific environment variables
  - Static export not configured (Next.js)

Solutions:

  1. Next.js - Enable static export:

    // next.config.js
    module.exports = {
      output: 'export',
    }
    

  2. Check base path configuration:

    # .tinkero.yml
    env:
      BASE_URL: /
      PUBLIC_URL: /

Deployment Accessibility Issues

Site returns 404

Symptoms:

  - Deployment shows success
  - Site URL returns 404

Diagnostic Steps:

  1. Check if files exist:

    ls -la /srv/tinkero/sites/my-app/current/
    

  2. Check symlink:

    readlink /srv/tinkero/sites/my-app/current
    

  3. Check Caddy configuration:

    docker exec caddy wget -qO- http://localhost:2019/config/ | jq
    

  4. Check Traefik routing:

    curl -I https://my-app.tkr.lair.nntin.xyz/
    

Solutions:

| Issue | Solution |
|---|---|
| Files missing | Re-trigger deployment |
| Symlink broken | Check release directory exists |
| Caddy not updated | Restart Caddy: docker compose restart caddy |
| Traefik routing | Check Traefik dashboard |
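
The first two diagnostic steps can be combined into one check. This is a sketch under the assumption that each site directory contains a `current` symlink pointing at a release directory, as the paths above suggest; `check_current_symlink` is a hypothetical helper name.

```shell
# Report whether <site_dir>/current is a symlink that resolves to a directory.
check_current_symlink() {
  local link="$1/current"
  if [ ! -L "$link" ]; then
    echo "missing: $link is not a symlink"
    return 1
  fi
  # -d follows the symlink, so this fails when the target release was deleted.
  if [ ! -d "$link" ]; then
    echo "broken: $link -> $(readlink "$link")"
    return 1
  fi
  echo "ok: $link -> $(readlink "$link")"
}

# Usage:
#   check_current_symlink /srv/tinkero/sites/my-app
```

A `broken` result corresponds to the "Symlink broken" row in the table; a `missing` result usually means the deployment never completed.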

Site returns 502 Bad Gateway

Symptoms:

  - Traefik returns 502 error
  - Site was working before

Causes:

  - Caddy container is down
  - Caddy is overloaded
  - Network connectivity issue

Solutions:

  1. Check Caddy status:

    docker compose ps caddy
    

  2. Restart Caddy:

    docker compose restart caddy
    

  3. Check Caddy logs:

    docker compose logs --tail 50 caddy
    


Site shows old content

Symptoms:

  - Pushed new changes
  - Site still shows old content

Causes:

  - Deployment didn't trigger
  - Browser caching
  - CDN caching (if using Cloudflare)

Solutions:

  1. Verify deployment triggered:

    # Check webhook-handler logs
    docker compose logs --tail 100 webhook-handler | grep "deployment"
    

  2. Check current release:

    ls -la /srv/tinkero/sites/my-app/releases/ | tail -5
    readlink /srv/tinkero/sites/my-app/current
    

  3. Clear browser cache: hard refresh with Ctrl+Shift+R (Windows/Linux) or Cmd+Shift+R (Mac)

  4. Clear Cloudflare cache: Dashboard > Caching > Configuration > Purge Everything

GitHub Webhook Problems

Webhooks not being received

Symptoms:

  - Push to repository
  - No deployment triggered
  - No logs in webhook-handler

Diagnostic Steps:

  1. Check GitHub webhook deliveries:
     - Go to GitHub App settings
     - Click Advanced
     - Check Recent Deliveries

  2. Verify webhook URL. It should be: https://lair.nntin.xyz/tinkero/webhook

  3. Test webhook URL:

    curl -I https://lair.nntin.xyz/tinkero/webhook
    # Should return 405 Method Not Allowed (curl -I sends a HEAD request; webhooks are delivered via POST)
    

Common Issues:

| Issue | Solution |
|---|---|
| Wrong URL | Update in GitHub App settings |
| App not installed | Install on repository |
| Branch filter | Check .tinkero.yml branch setting |
| Firewall | Ensure port 443 is open |

Webhook signature validation failed

Symptoms:

  - Webhooks received but rejected
  - Error: "signature validation failed"
  - GitHub shows 401 response

Solution:

  1. Verify webhook secret matches:

    # Check .env
    grep GITHUB_WEBHOOK_SECRET .env
    

  2. Update secret:

    tinkero config
    # Re-enter webhook secret
    

  3. Check for whitespace: The secret might have leading/trailing spaces.
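
Both the whitespace check and the signature itself can be verified by hand. The sketch below assumes the secret is stored as `GITHUB_WEBHOOK_SECRET=...` in .env and that openssl is installed; the function names are hypothetical. GitHub signs each delivery with HMAC-SHA256 over the raw request body and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex>`.

```shell
# Return 0 if the value starts or ends with whitespace (including a stray CR).
has_edge_whitespace() {
  case "$1" in
    [[:space:]]*|*[[:space:]]) return 0 ;;
  esac
  return 1
}

# Compute the signature GitHub would send for a payload file.
# Note: the secret is visible in the process list while openssl runs.
github_signature() {
  local secret="$1" payload_file="$2" digest
  digest=$(openssl dgst -sha256 -hmac "$secret" < "$payload_file" | awk '{print $NF}')
  printf 'sha256=%s\n' "$digest"
}

# Usage:
#   secret=$(grep '^GITHUB_WEBHOOK_SECRET=' .env | cut -d= -f2-)
#   has_edge_whitespace "$secret" && echo "secret has leading/trailing whitespace"
#   github_signature "$secret" payload.json
```

Save a payload from Recent Deliveries to a file without altering its bytes and compare the output with the delivery's `X-Hub-Signature-256` header. A mismatch points to a different secret on one side.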


Webhook returns 500 error

Symptoms:

  - GitHub shows 500 response
  - webhook-handler is crashing

Debugging:

# Check webhook-handler logs
docker compose logs -f webhook-handler

# Check if container is running
docker compose ps webhook-handler

# Restart service
docker compose restart webhook-handler

Common Causes:

  - Invalid private key
  - Redis connection failed
  - Out of memory

SSL Certificate Issues

Certificate not issued

Symptoms:

  - Site shows SSL error
  - Browser warns "connection not secure"
  - Traefik logs show ACME errors

Diagnostic Steps:

  1. Check Traefik logs:

    docker compose logs traefik | grep -i "acme\|certificate\|cloudflare"
    

  2. Check acme.json:

    cat data/acme.json | jq '.letsencrypt.Certificates'
    

  3. Verify DNS configuration:

    # Check DNS resolves
    dig +short yourdomain.com
    
    # Verify domain uses Cloudflare nameservers
    dig NS yourdomain.com
    # Should show *.ns.cloudflare.com
    

  4. Verify Cloudflare API token:

    # Test token validity
    curl -X GET "https://api.cloudflare.com/client/v4/user/tokens/verify" \
      -H "Authorization: Bearer YOUR_CLOUDFLARE_TOKEN" \
      -H "Content-Type: application/json"
    
    # Should return: "success":true
    

Common Issues:

| Issue | Solution |
|---|---|
| Cloudflare token invalid | Create new API token with Zone:DNS:Edit permission |
| Domain not on Cloudflare | Transfer domain nameservers to Cloudflare |
| DNS not propagated | Wait up to 48 hours after nameserver change |
| Token missing permissions | Recreate token with Zone:DNS:Edit and Zone:Zone:Read |
| Wrong zone selected | Verify token has access to your specific domain |
| Rate limited | Wait 1 hour, check Let's Encrypt status page |
| Wrong domain in .env | Update DOMAIN in .env and restart |
| CF_DNS_API_TOKEN not set | Check CLOUDFLARE_DNS_API_TOKEN in .env |

Cloudflare DNS-01 Challenge Verification:

# Check if DNS-01 challenge is working
docker compose logs traefik | grep "dnschallenge"

# Should see logs like:
# - "Trying to solve DNS-01"
# - "Waiting for DNS propagation"
# - "The DNS challenge is complete"

Fix Steps:

  1. Verify .env configuration:

    # Check .env has Cloudflare token
    grep CLOUDFLARE_DNS_API_TOKEN .env
    # Should show: CLOUDFLARE_DNS_API_TOKEN=your_token_here
    

  2. Restart Traefik with fresh attempt:

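    # Clear old certificates (adjust the path if your acme.json lives elsewhere,
    # for example data/acme.json as used in the diagnostic steps)
    sudo rm -f traefik-certs/acme.json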
    
    # Restart Traefik
    docker compose restart traefik
    
    # Watch logs for certificate acquisition
    docker compose logs -f traefik
    

  3. Verify Cloudflare API access:

    # List zones accessible with token
    curl -X GET "https://api.cloudflare.com/client/v4/zones" \
      -H "Authorization: Bearer YOUR_TOKEN" \
      -H "Content-Type: application/json"
    
    # Should list your domain
    


Certificate expired

Symptoms:

  - Site was working, now shows SSL error
  - Certificate expired warning

Solution:

  1. Check certificate:

    echo | openssl s_client -connect yourdomain.com:443 2>/dev/null | \
      openssl x509 -noout -dates
    

  2. Force renewal:

    # Remove acme.json (will re-request certificates)
    rm data/acme.json
    docker compose restart traefik
    

  3. Check Traefik can reach Let's Encrypt:

    docker exec traefik wget -qO- https://acme-v02.api.letsencrypt.org/directory
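
The dates printed in step 1 are easier to act on as a number of days. This is a rough sketch that assumes GNU date (the `-d` flag; BSD/macOS date differs); `days_until_expiry` is a hypothetical helper that takes the `notAfter=` line from `openssl x509 -noout -enddate`.

```shell
# Print whole days between now and a certificate's notAfter date.
# $1: a line such as "notAfter=Jun  1 12:00:00 2030 GMT"
# $2: optional current time as epoch seconds (defaults to now)
days_until_expiry() {
  local enddate_line="$1" now_epoch="${2:-$(date +%s)}" expiry_epoch
  # Strip the "notAfter=" prefix and let GNU date parse the rest.
  expiry_epoch=$(date -d "${enddate_line#notAfter=}" +%s) || return 1
  # A negative result means the certificate has already expired.
  echo $(( (expiry_epoch - now_epoch) / 86400 ))
}

# Usage:
#   line=$(echo | openssl s_client -connect yourdomain.com:443 2>/dev/null \
#     | openssl x509 -noout -enddate)
#   days_until_expiry "$line"
```

Running this from cron and alerting below some threshold would catch a stalled renewal before the certificate expires; the threshold is a choice for your setup.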
    

Redis Connection Errors

Redis connection refused

Symptoms:

  - tinkero health shows Redis disconnected
  - Error: "connection refused"

Solution:

  1. Check Redis is running:

    docker compose ps redis
    

  2. Restart Redis:

    docker compose restart redis
    

  3. Check Redis logs:

    docker compose logs --tail 50 redis
    

  4. Test connection:

    docker exec redis redis-cli ping
    # Should return: PONG
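
After a restart, Redis may need a moment before it answers. The following sketch wraps any command in a bounded retry loop; `retry` is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary examples.

```shell
# Run a command until it succeeds, up to max_attempts times.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts="$1" delay_seconds="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay_seconds"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Usage: try the ping from step 4 up to 5 times, 2 seconds apart.
#   retry 5 2 docker exec redis redis-cli ping
```

If the loop still fails, the Redis logs from step 3 are the next place to look.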
    


Redis out of memory

Symptoms:

  - Redis commands fail
  - Error: "OOM command not allowed"

Solution:

  1. Check memory usage:

    docker exec redis redis-cli INFO memory
    

  2. Clear data (warning: FLUSHDB deletes every key in the current database, not only old entries):

    docker exec redis redis-cli FLUSHDB
    

  3. Increase memory limit:

    # docker-compose.yml
    services:
      redis:
        command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
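
The output of `INFO memory` from step 1 is long; the two fields that matter here are `used_memory` and `maxmemory`. A small sketch that reduces them to one percentage, with `redis_memory_percent` as a hypothetical helper name:

```shell
# Read `redis-cli INFO memory` output on stdin and print used memory
# as a percentage of maxmemory. A maxmemory of 0 means no limit is set.
redis_memory_percent() {
  # INFO lines end in CRLF, so strip carriage returns before parsing.
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max + 0 == 0) { print "no maxmemory limit set"; exit }
      printf "%d%%\n", used * 100 / max
    }'
}

# Usage:
#   docker exec redis redis-cli INFO memory | redis_memory_percent
```

A value close to 100% together with a `noeviction` policy would explain the OOM errors; the `allkeys-lru` policy shown in step 3 lets Redis evict keys instead.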
    

Build Timeouts

Build exceeds time limit

Symptoms:

  - Build fails after timeout
  - Error: "build timed out"

Solutions:

  1. Optimize build:
     - Use npm ci instead of npm install
     - Enable build caching
     - Remove unused dependencies

  2. Use pre-built sites:

    # .tinkero.yml
    skipBuild: true
    outputDir: dist
    
    Build locally or in CI/CD and commit built files.

  3. Check for infinite loops:
     - Review build scripts
     - Check for circular dependencies

Clone timeout

Symptoms:

  - Build fails during repository clone
  - Error: "clone timed out"

Causes:

  - Large repository
  - Slow network
  - GitHub rate limiting

Solutions:

  1. Reduce clone size: use a shallow clone (if supported). Large repositories take longer to clone; consider splitting them.

  2. Check network:

    # Test GitHub connectivity
    docker exec webhook-handler wget -qO- https://api.github.com
    

  3. Check rate limits:

    curl -s https://api.github.com/rate_limit
    

Disk Space Issues

Disk full

Symptoms:

  - Builds fail
  - Services crash
  - Error: "no space left on device"

Immediate Actions:

# Check disk usage
df -h

# Find large directories
du -sh /srv/tinkero/sites/* | sort -hr | head -10

# Clean old releases
tinkero cleanup

Additional Cleanup:

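# Docker cleanup (removes stopped containers, all unused images, and unused volumes;
# confirm that no stopped service still needs its data before running)
docker system prune -a -f
docker volume prune -f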

# Clean logs
sudo journalctl --vacuum-time=3d

Prevention:

  - Set up automated cleanup (see Operations Guide)
  - Monitor disk usage in the central Grafana dashboard
  - Configure log rotation
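
As a lightweight complement to dashboard monitoring, a threshold check can run from cron. This is a sketch, not the automated cleanup described in the Operations Guide; the helper names and the default threshold of 85 are assumptions chosen for illustration.

```shell
# Print the used-space percentage (integer, no % sign) of the filesystem
# that holds the given path. df -P keeps each filesystem on one line.
disk_usage_percent() {
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report usage and return non-zero once it reaches the threshold.
check_disk() {
  local path="${1:-/}" threshold="${2:-85}" used
  used=$(disk_usage_percent "$path")
  if [ "$used" -ge "$threshold" ]; then
    echo "WARNING: $path is ${used}% full"
    return 1
  fi
  echo "OK: $path is ${used}% full"
}

# Usage:
#   check_disk /srv/tinkero/sites 85 || tinkero cleanup
```

Chaining `tinkero cleanup` on failure, as in the usage line, would clean old releases only when space is actually running low.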


Sites directory permissions

Symptoms:

  - Deployment fails
  - Error: "permission denied"

Solution:

# Check permissions
ls -la /srv/tinkero/sites/

# Fix permissions
sudo chown -R root:docker /srv/tinkero/sites/
sudo chmod -R 775 /srv/tinkero/sites/

GitHub Credential Validation

Authentication failed (401)

Error:

❌ Authentication failed (401 Unauthorized)

Causes:

  - Wrong App ID
  - Private key doesn't match App ID
  - Private key was regenerated

Solutions:

  1. Verify App ID:
     - Go to https://github.com/settings/apps
     - Check the App ID matches

  2. Regenerate and re-download key:
     - Go to App settings
     - Generate new private key
     - Download and update path

  3. Run config wizard:

    tinkero config
    


App not found (404)

Error:

❌ GitHub App not found (404 Not Found)

Causes:

  - App ID is incorrect
  - App was deleted

Solution:

  1. Verify app exists:
     - Go to https://github.com/settings/apps
     - Find your app and note the correct ID

  2. Re-run configuration:

    tinkero config
    


Private key parse error

Error:

❌ Failed to generate JWT token: failed to parse private key

Causes:

  - Corrupted key file
  - Wrong file downloaded
  - Key was overwritten

Solutions:

  1. Check key file:

    head -n 1 /srv/tinkero/github-app-key.pem
    # Should print: -----BEGIN RSA PRIVATE KEY-----
    # (head avoids printing the whole private key to the terminal)

  2. Re-download key:
     - Go to GitHub App settings
     - Generate new private key
     - Download fresh copy

  3. Check permissions:

    ls -la /srv/tinkero/github-app-key.pem
    # Should be readable (at least 400)
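
The header and readability checks can be combined without printing the key body. This sketch accepts only the header named in this guide; `check_pem_key` is a hypothetical helper, and it checks readability for the user running it, which may differ from the user inside the container.

```shell
# Verify that a PEM file is readable and starts with the expected header.
check_pem_key() {
  local key_file="$1" first_line
  if [ ! -r "$key_file" ]; then
    echo "unreadable: $key_file"
    return 1
  fi
  # Read only the first line and drop a trailing CR from Windows line endings.
  first_line=$(head -n 1 "$key_file" | tr -d '\r')
  if [ "$first_line" = "-----BEGIN RSA PRIVATE KEY-----" ]; then
    echo "ok: header looks like an RSA private key"
    return 0
  fi
  echo "unexpected header: $first_line"
  return 1
}

# Usage:
#   check_pem_key /srv/tinkero/github-app-key.pem
```

An unexpected header often means the wrong file was downloaded, for example an HTML error page or a public key.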
    

Service Health Issues

Container keeps restarting

Symptoms:

  - Container status shows "Restarting"
  - Service unavailable intermittently

Debugging:

# Check container status
docker compose ps

# View restart count
docker inspect webhook-handler --format='{{.RestartCount}}'

# Check logs for errors
docker compose logs --tail 100 webhook-handler

Common Causes:

| Cause | Solution |
|---|---|
| Missing .env values | Check all required vars are set |
| Port conflict | Check no other service using same port |
| Memory limit | Increase container memory |
| Bad configuration | Review recent changes |
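
For the first cause, a quick scan of .env can list what is missing. The variable names in the usage line are ones mentioned elsewhere in this guide; whether each is strictly required is an assumption, and `missing_env_vars` is a hypothetical helper.

```shell
# Print each named variable that is absent or empty in an env file.
# Returns non-zero if at least one variable is missing.
missing_env_vars() {
  local env_file="$1" name status=0
  shift
  for name in "$@"; do
    # Match NAME=<at least one character>; commented-out lines do not count.
    if ! grep -Eq "^${name}=.+" "$env_file"; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   missing_env_vars .env DOMAIN GITHUB_WEBHOOK_SECRET CLOUDFLARE_DNS_API_TOKEN
```

No output means every listed variable has a value; it does not confirm that the values are correct.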

All services unhealthy

Symptoms:

  - tinkero health shows all services unhealthy
  - Nothing is working

Recovery Steps:

# 1. Check Docker
sudo systemctl status docker

# 2. Full restart
docker compose down
docker compose up -d

# 3. Check for resource issues
free -h
df -h

# 4. Review logs
docker compose logs --tail 50 | head -100

Sentry Issues

No events appear in Sentry

Symptoms:

  - Sentry project shows no new events
  - Errors are visible in logs but not in Sentry

Common Causes and Solutions:

| Cause | Solution |
|---|---|
| SENTRY_DSN not set | Add SENTRY_DSN to .env and restart |
| Sample rate set to 0.0 | Set SENTRY_TRACES_SAMPLE_RATE to a non-zero value for traces |
| Network egress blocked | Allow outbound HTTPS to *.ingest.sentry.io |
| Service not restarted | Restart webhook-handler after updating .env |

Debugging:

# Check env variables are loaded
docker compose exec webhook-handler env | grep SENTRY

# Check service logs for initialization
docker compose logs webhook-handler | grep -i sentry

Invalid DSN or authentication errors

Symptoms:

  - Logs show Sentry init errors
  - No events appear despite DSN set

Solutions:

  1. Re-copy DSN from Sentry: Project Settings → Client Keys (DSN)
  2. Verify .env formatting: ensure no quotes or trailing spaces
  3. Restart services:
    docker compose restart webhook-handler
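
The formatting check in step 2 can be automated. This sketch assumes the usual DSN shape `https://<public-key>@<host>/<project-id>` with a numeric project ID; `validate_sentry_dsn` is a hypothetical helper and only checks the shape, not whether Sentry accepts the key.

```shell
# Check a DSN string for stray quotes, whitespace, and overall shape.
validate_sentry_dsn() {
  local dsn="$1"
  local pattern='^https?://[^:@/[:space:]]+(:[^@/[:space:]]+)?@[^/[:space:]]+/(.+/)?[0-9]+$'
  case "$dsn" in
    \"*|*\"|\'*|*\')
      echo "DSN is wrapped in quotes"
      return 1 ;;
    *[[:space:]]*)
      echo "DSN contains whitespace"
      return 1 ;;
  esac
  if ! printf '%s\n' "$dsn" | grep -Eq "$pattern"; then
    echo "DSN does not look like https://<key>@<host>/<project-id>"
    return 1
  fi
  echo "DSN format looks valid"
}

# Usage:
#   validate_sentry_dsn "$(grep '^SENTRY_DSN=' .env | cut -d= -f2-)"
```

A valid shape with events still missing points back to the network and sample-rate causes in the previous section.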
    

Quick Diagnostics

Health Check Command

tinkero health

Service Status

docker compose ps

Recent Logs

# All services
docker compose logs --tail 50

# Specific service
docker compose logs --tail 50 webhook-handler

Network Check

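# Test internal connectivity
# Redis does not speak HTTP, so wget cannot complete the first request.
# "Connection refused" or a name-resolution error means Redis is unreachable;
# a protocol error instead suggests the port answered.
docker exec webhook-handler wget -qO- http://redis:6379
docker exec webhook-handler wget -qO- http://caddy:2019/config/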

Resource Check

# Memory
free -h

# Disk
df -h

# Docker resources
docker system df

Full Diagnostic Script

#!/bin/bash
echo "=== Tinkero Diagnostic Report ==="
echo ""
echo "=== System Resources ==="
free -h
df -h /
echo ""
echo "=== Docker Status ==="
docker compose ps
echo ""
echo "=== Recent Errors ==="
docker compose logs --tail 20 2>&1 | grep -i "error\|fail\|fatal" | tail -10
echo ""
echo "=== Health Check ==="
tinkero health

Still stuck?

  - Check the FAQ for common questions
  - Open an issue on GitHub with diagnostic output