Troubleshooting Guide¶

Solutions for common issues when running Tinkero.

Table of Contents¶

Build Failures
Deployment Accessibility Issues
GitHub Webhook Problems
SSL Certificate Issues
Redis Connection Errors
Build Timeouts
Disk Space Issues
GitHub Credential Validation
Service Health Issues
Sentry Issues
Quick Diagnostics

Build Failures¶

npm install fails¶

Symptoms:

- Build fails during dependency installation
- Error message mentions `npm ERR!` or `ENOENT`

Common Causes and Solutions:

| Cause | Solution |
| --- | --- |
| Missing `package-lock.json` | Commit `package-lock.json` to repository |
| Incompatible Node version | Update `nodeVersion` in `.tinkero.yml` |
| Private npm registry | Configure registry in `.npmrc` |
| Memory issues | Check server has sufficient RAM |

Debugging:

# View build logs
docker compose logs -f webhook-handler

# Look for specific error
docker compose logs webhook-handler | grep -i "npm ERR"

Example Fix:

# .tinkero.yml - Use npm ci for cleaner installs
installCommand: npm ci
nodeVersion: "20"
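
`npm ci` refuses to run when the project has no lockfile, so switching `installCommand` to `npm ci` only helps once `package-lock.json` is committed. The following is a minimal sketch of a pre-flight check you could run locally before changing the setting. The function name `choose_install_command` is hypothetical and not part of Tinkero.

```bash
# Hypothetical helper (not part of Tinkero): print the install command that
# is safe for the given project directory.
# npm ci needs a lockfile; without one, fall back to npm install.
choose_install_command() {
  project_dir="${1:-.}"
  if [ -f "$project_dir/package-lock.json" ] || [ -f "$project_dir/npm-shrinkwrap.json" ]; then
    echo "npm ci"
  else
    echo "npm install"
  fi
}
```

Run it from the repository root, for example `choose_install_command .`, and use the printed command as `installCommand`.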

Build command fails¶

Symptoms:

- Installation succeeds but build fails
- Error during `npm run build`

Common Causes:

Missing environment variables:

# .tinkero.yml
env:
  NODE_ENV: production
  API_URL: https://api.example.com

Wrong output directory:

# Set a single outputDir that matches your framework's output directory
outputDir: dist    # Vite
# outputDir: build # Create React App
# outputDir: out   # Next.js static export

TypeScript errors:

# Build locally first to check for errors
npm run build

Debugging:

# View detailed build output
docker compose logs webhook-handler | grep -A 50 "Running build"

Output directory not found¶

Symptoms:

- Build completes but deployment fails
- Error: "output directory not found"

Solutions:

Verify output directory matches build output:

# Check what your build creates
npm run build
ls -la dist/  # or build/, out/, public/

Update .tinkero.yml:

outputDir: dist  # Match your actual output directory

Check for conditional builds: Some builds only create output in production mode:
```
env:
  NODE_ENV: production
```
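
To find out which directory a local build actually produced, a small function can probe the usual candidates. This is a sketch under two assumptions: the site is static and has an `index.html` at the root of its output, and the candidate names are the ones listed above. `find_output_dir` is a hypothetical name.

```bash
# Hypothetical helper: print the first candidate directory that contains an
# index.html, or fail if none does. Candidates mirror the list in this guide.
find_output_dir() {
  project_dir="${1:-.}"
  for candidate in dist build out public; do
    if [ -f "$project_dir/$candidate/index.html" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo "no output directory with index.html found" >&2
  return 1
}
```

After `npm run build`, call `find_output_dir .` and copy the printed name into `outputDir`.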

Build produces empty output¶

Symptoms:

- Build completes without errors
- Deployed site shows nothing or 404

Causes:

- Build output went to wrong directory
- Build requires specific environment variables
- Static export not configured (Next.js)

Solutions:

Next.js - Enable static export:

// next.config.js
module.exports = {
  output: 'export',
}

Check base path configuration:

# .tinkero.yml
env:
  BASE_URL: /
  PUBLIC_URL: /

Deployment Accessibility Issues¶

Site returns 404¶

Symptoms:

- Deployment shows success
- Site URL returns 404

Diagnostic Steps:

Check if files exist:

ls -la /srv/tinkero/sites/my-app/current/

Check symlink:

readlink /srv/tinkero/sites/my-app/current

Check Caddy configuration:

docker exec caddy wget -qO- http://localhost:2019/config/ | jq

Check Traefik routing:

curl -I https://my-app.tkr.lair.nntin.xyz/

Solutions:

| Issue | Solution |
| --- | --- |
| Files missing | Re-trigger deployment |
| Symlink broken | Check release directory exists |
| Caddy not updated | Restart Caddy: `docker compose restart caddy` |
| Traefik routing | Check Traefik dashboard |
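
The file and symlink checks from the diagnostic steps can be combined into one function that tells a missing symlink, a broken symlink, and an empty release apart. This is a sketch that assumes the `<site>/current` layout shown above. `check_current_release` is a hypothetical name.

```bash
# Hypothetical helper: verify that <site_dir>/current is a symlink that
# resolves to a non-empty directory, and report which check failed.
check_current_release() {
  site_dir="$1"
  link="$site_dir/current"
  if [ ! -L "$link" ]; then
    echo "missing symlink: $link"
    return 1
  fi
  # -d follows the symlink, so this fails when the target no longer exists
  if [ ! -d "$link" ]; then
    echo "broken symlink: $link -> $(readlink "$link")"
    return 1
  fi
  if [ -z "$(ls -A "$link")" ]; then
    echo "empty release: $link -> $(readlink "$link")"
    return 1
  fi
  echo "ok: $link -> $(readlink "$link")"
}
```

For example, `check_current_release /srv/tinkero/sites/my-app` prints one line describing the state of the current release.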

Site returns 502 Bad Gateway¶

Symptoms:

- Traefik returns 502 error
- Site was working before

Causes:

- Caddy container is down
- Caddy is overloaded
- Network connectivity issue

Solutions:

Check Caddy status:
```
docker compose ps caddy
```
Restart Caddy:
```
docker compose restart caddy
```
Check Caddy logs:
```
docker compose logs --tail 50 caddy
```

Site shows old content¶

Symptoms:

- Pushed new changes
- Site still shows old content

Causes:

- Deployment didn't trigger
- Browser caching
- CDN caching (if using Cloudflare)

Solutions:

Verify deployment triggered:

# Check webhook-handler logs
docker compose logs --tail 100 webhook-handler | grep "deployment"

Check current release:

ls -la /srv/tinkero/sites/my-app/releases/ | tail -5
readlink /srv/tinkero/sites/my-app/current

Clear browser cache:

- Hard refresh: Ctrl+Shift+R (Windows/Linux) or Cmd+Shift+R (Mac)

Clear Cloudflare cache:

- Dashboard > Caching > Configuration > Purge Everything

GitHub Webhook Problems¶

Webhooks not being received¶

Symptoms:

- Push to repository
- No deployment triggered
- No logs in webhook-handler

Diagnostic Steps:

Check GitHub webhook deliveries:

- Go to GitHub App settings
- Click Advanced
- Check Recent Deliveries

Verify webhook URL:

- Should be: https://lair.nntin.xyz/tinkero/webhook

Test webhook URL:

curl -I https://lair.nntin.xyz/tinkero/webhook
# Should return 405 Method Not Allowed (curl -I sends a HEAD request, which the webhook endpoint rejects)

Common Issues:

| Issue | Solution |
| --- | --- |
| Wrong URL | Update in GitHub App settings |
| App not installed | Install on repository |
| Branch filter | Check `.tinkero.yml` branch setting |
| Firewall | Ensure port 443 is open |

Webhook signature validation failed¶

Symptoms:

- Webhooks received but rejected
- Error: "signature validation failed"
- GitHub shows 401 response

Solution:

Verify webhook secret matches:

# Check .env
grep GITHUB_WEBHOOK_SECRET .env

Update secret:

tinkero config
# Re-enter webhook secret

Check for whitespace: The secret might have leading/trailing spaces.
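
GitHub signs each delivery with an HMAC-SHA256 of the raw request body, keyed with the webhook secret, and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The sketch below recomputes that value locally so you can compare it with the header shown under Recent Deliveries. It assumes `openssl` is installed and that Tinkero validates this header, which this guide does not state explicitly. `github_signature` is a hypothetical name.

```bash
# Hypothetical helper: compute the X-Hub-Signature-256 value for a payload file.
# Usage: github_signature <secret> <payload-file>
github_signature() {
  secret="$1"
  payload_file="$2"
  # openssl prints "<label>(stdin)= <hex>"; the digest is the last field
  digest=$(openssl dgst -sha256 -hmac "$secret" < "$payload_file" | awk '{print $NF}')
  echo "sha256=$digest"
}
```

Save the delivery payload exactly as GitHub sent it (no added newline) and run `github_signature "$GITHUB_WEBHOOK_SECRET" payload.json`. A single leading or trailing space in the secret produces a completely different digest, which is why stray whitespace causes validation to fail.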

Webhook returns 500 error¶

Symptoms:

- GitHub shows 500 response
- webhook-handler is crashing

Debugging:

# Check webhook-handler logs
docker compose logs -f webhook-handler

# Check if container is running
docker compose ps webhook-handler

# Restart service
docker compose restart webhook-handler

Common Causes:

- Invalid private key
- Redis connection failed
- Out of memory

SSL Certificate Issues¶

Certificate not issued¶

Symptoms:

- Site shows SSL error
- Browser warns "connection not secure"
- Traefik logs show ACME errors

Diagnostic Steps:

Check Traefik logs:

docker compose logs traefik | grep -i "acme\|certificate\|cloudflare"

Check acme.json:

cat data/acme.json | jq '.letsencrypt.Certificates'

Verify DNS configuration:

# Check DNS resolves
dig +short yourdomain.com

# Verify domain uses Cloudflare nameservers
dig NS yourdomain.com
# Should show *.ns.cloudflare.com

Verify Cloudflare API token:

# Test token validity
curl -X GET "https://api.cloudflare.com/client/v4/user/tokens/verify" \
  -H "Authorization: Bearer YOUR_CLOUDFLARE_TOKEN" \
  -H "Content-Type: application/json"

# Should return: "success":true

Common Issues:

| Issue | Solution |
| --- | --- |
| Cloudflare token invalid | Create new API token with Zone:DNS:Edit permission |
| Domain not on Cloudflare | Transfer domain nameservers to Cloudflare |
| DNS not propagated | Wait up to 48 hours after nameserver change |
| Token missing permissions | Recreate token with Zone:DNS:Edit and Zone:Zone:Read |
| Wrong zone selected | Verify token has access to your specific domain |
| Rate limited | Wait 1 hour, check Let's Encrypt status page |
| Wrong domain in `.env` | Update `DOMAIN` in `.env` and restart |
| `CF_DNS_API_TOKEN` not set | Check `CLOUDFLARE_DNS_API_TOKEN` in `.env` |

Cloudflare DNS-01 Challenge Verification:

# Check if DNS-01 challenge is working
docker compose logs traefik | grep "dnschallenge"

# Should see logs like:
# - "Trying to solve DNS-01"
# - "Waiting for DNS propagation"
# - "The DNS challenge is complete"

Fix Steps:

Verify .env configuration:

# Check .env has Cloudflare token
grep CLOUDFLARE_DNS_API_TOKEN .env
# Should show: CLOUDFLARE_DNS_API_TOKEN=your_token_here

Restart Traefik with fresh attempt:

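# Clear old certificates (adjust the path if your acme.json lives elsewhere;
# other steps in this guide use data/acme.json)
sudo rm -f traefik-certs/acme.json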

# Restart Traefik
docker compose restart traefik

# Watch logs for certificate acquisition
docker compose logs -f traefik

Verify Cloudflare API access:

# List zones accessible with token
curl -X GET "https://api.cloudflare.com/client/v4/zones" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json"

# Should list your domain

Certificate expired¶

Symptoms:

- Site was working, now shows SSL error
- Certificate expired warning

Solution:

Check certificate:

echo | openssl s_client -connect yourdomain.com:443 2>/dev/null | \
  openssl x509 -noout -dates

Force renewal:

# Remove acme.json (will re-request certificates)
rm data/acme.json
docker compose restart traefik

Check Traefik can reach Let's Encrypt:

docker exec traefik wget -qO- https://acme-v02.api.letsencrypt.org/directory
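
To catch an expiring certificate before browsers complain, `openssl x509 -checkend` can test whether a certificate stays valid for a given number of seconds. The following is a minimal sketch that assumes the certificate has been saved to a PEM file. `cert_valid_for_days` is a hypothetical name and the threshold is your choice.

```bash
# Hypothetical helper: succeed only if the PEM certificate in <cert_file>
# remains valid for at least <days> more days.
# Usage: cert_valid_for_days <cert_file> <days>
cert_valid_for_days() {
  cert_file="$1"
  days="$2"
  # -checkend exits non-zero when the certificate expires within the window
  openssl x509 -in "$cert_file" -noout -checkend "$((days * 86400))" >/dev/null
}
```

To use it against a live site, save the served certificate first with `echo | openssl s_client -connect yourdomain.com:443 2>/dev/null | openssl x509 > /tmp/cert.pem`, then run `cert_valid_for_days /tmp/cert.pem 14 || echo "renewal needed soon"`.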

Redis Connection Errors¶

Redis connection refused¶

Symptoms:

- `tinkero health` shows Redis disconnected
- Error: "connection refused"

Solution:

Check Redis is running:
```
docker compose ps redis
```
Restart Redis:
```
docker compose restart redis
```
Check Redis logs:
```
docker compose logs --tail 50 redis
```

Test connection:

docker exec redis redis-cli ping
# Should return: PONG

Redis out of memory¶

Symptoms:

- Redis commands fail
- Error: "OOM command not allowed"

Solution:

Check memory usage:

docker exec redis redis-cli INFO memory

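Clear old data (warning: `FLUSHDB` deletes every key in the currently selected database, so run it only when that data can be discarded):
```
docker exec redis redis-cli FLUSHDB
```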

Increase memory limit:

# docker-compose.yml
services:
  redis:
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
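
Before raising the limit, it helps to know how close Redis is to it. The sketch below reads the output of `redis-cli INFO memory` on stdin and prints used memory as a percentage of `maxmemory`. `redis_memory_percent` is a hypothetical name. The `used_memory` and `maxmemory` fields are standard parts of the INFO output.

```bash
# Hypothetical helper: read `redis-cli INFO memory` output on stdin and print
# used memory as a whole-number percentage of maxmemory, or "unlimited" when
# maxmemory is 0 (no limit configured).
redis_memory_percent() {
  # INFO output uses CRLF line endings, so strip the carriage returns first
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max + 0 == 0) { print "unlimited" }
      else { printf "%d\n", used * 100 / max }
    }'
}
```

Usage: `docker exec redis redis-cli INFO memory | redis_memory_percent`.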

Build Timeouts¶

Build exceeds time limit¶

Symptoms:

- Build fails after timeout
- Error: "build timed out"

Solutions:

Optimize build:

- Use `npm ci` instead of `npm install`
- Enable build caching
- Remove unused dependencies

Use pre-built sites:
```
# .tinkero.yml
skipBuild: true
outputDir: dist
```
Build locally or in CI/CD and commit built files.

Check for infinite loops:

- Review build scripts
- Check for circular dependencies

Clone timeout¶

Symptoms:

- Build fails during repository clone
- Error: "clone timed out"

Causes:

- Large repository
- Slow network
- GitHub rate limiting

Solutions:

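Use a shallow clone (if supported): fetching only recent history reduces the amount of data transferred. Very large repositories still take longer to clone, so consider splitting them.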

Check network:

# Test GitHub connectivity
docker exec webhook-handler wget -qO- https://api.github.com

Check rate limits:

curl -s https://api.github.com/rate_limit

Disk Space Issues¶

Disk full¶

Symptoms:

- Builds fail
- Services crash
- Error: "no space left on device"

Immediate Actions:

# Check disk usage
df -h

# Find large directories
du -sh /srv/tinkero/sites/* | sort -hr | head -10

# Clean old releases
tinkero cleanup

Additional Cleanup:

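# Docker cleanup (removes all stopped containers, unused images, and unused
# volumes, so confirm nothing you still need is stopped before running this)
docker system prune -a -f
docker volume prune -f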

# Clean logs
sudo journalctl --vacuum-time=3d

Prevention:

- Set up automated cleanup (see Operations Guide)
- Monitor disk usage in the central Grafana dashboard
- Configure log rotation
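
As a lightweight complement to dashboard monitoring, a threshold check can run from cron and warn before the disk fills. This is a sketch built on `df -P`. Both function names are hypothetical, and it assumes the filesystem name contains no spaces.

```bash
# Print the usage percentage (without the % sign) of the filesystem holding <path>.
disk_usage_percent() {
  # df -P prints a header line, then one line whose fifth column is e.g. "42%"
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: return non-zero when usage is at or above <threshold> percent.
# Usage: check_disk_usage <path> <threshold>
check_disk_usage() {
  path="$1"
  threshold="$2"
  used=$(disk_usage_percent "$path")
  if [ "$used" -ge "$threshold" ]; then
    echo "WARNING: $path is ${used}% full (threshold ${threshold}%)"
    return 1
  fi
  echo "ok: $path is ${used}% full"
}
```

For example, `check_disk_usage /srv/tinkero/sites 85 || tinkero cleanup` runs the cleanup only when usage reaches 85%. The threshold is an arbitrary example value.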

Sites directory permissions¶

Symptoms:

- Deployment fails
- Error: "permission denied"

Solution:

# Check permissions
ls -la /srv/tinkero/sites/

# Fix permissions
sudo chown -R root:docker /srv/tinkero/sites/
sudo chmod -R 775 /srv/tinkero/sites/

GitHub Credential Validation¶

Authentication failed (401)¶

Error:

❌ Authentication failed (401 Unauthorized)

Causes:

- Wrong App ID
- Private key doesn't match App ID
- Private key was regenerated

Solutions:

Verify App ID:

- Go to https://github.com/settings/apps
- Check the App ID matches

Regenerate and re-download key:

- Go to App settings
- Generate new private key
- Download and update path

Run config wizard:
```
tinkero config
```

App not found (404)¶

Error:

❌ GitHub App not found (404 Not Found)

Causes:

- App ID is incorrect
- App was deleted

Solution:

Verify app exists:

- Go to https://github.com/settings/apps
- Find your app and note the correct ID

Re-run configuration:
```
tinkero config
```

Private key parse error¶

Error:

❌ Failed to generate JWT token: failed to parse private key

Causes:

- Corrupted key file
- Wrong file downloaded
- Key was overwritten

Solutions:

Check key file:

# Print only the first line so the key itself is not shown in the terminal
head -n 1 /srv/tinkero/github-app-key.pem
# Should print: -----BEGIN RSA PRIVATE KEY-----

Re-download key:

- Go to GitHub App settings
- Generate new private key
- Download fresh copy

Check permissions:

ls -la /srv/tinkero/github-app-key.pem
# Should be readable by the user running the service (for example, mode 400 or 600)
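
The header and readability checks can be scripted so the key contents never appear on screen. The following is a sketch. `check_private_key_file` is a hypothetical name, and accepting the PKCS#8 header `-----BEGIN PRIVATE KEY-----` in addition to the RSA header is an assumption that covers keys converted after download.

```bash
# Hypothetical helper: check that a file is readable and that its first line
# is a PEM private key header, without printing the key itself.
check_private_key_file() {
  key_file="$1"
  if [ ! -r "$key_file" ]; then
    echo "not readable: $key_file"
    return 1
  fi
  # Strip a trailing carriage return in case the file was saved with CRLF endings
  first_line=$(head -n 1 "$key_file" | tr -d '\r')
  case "$first_line" in
    "-----BEGIN RSA PRIVATE KEY-----"|"-----BEGIN PRIVATE KEY-----")
      echo "ok: PEM header found"
      ;;
    *)
      echo "unexpected first line; expected a PEM private key header"
      return 1
      ;;
  esac
}
```

Usage: `check_private_key_file /srv/tinkero/github-app-key.pem`. A passing header check does not prove that the key matches the App ID, so a 401 can still occur with a well-formed file.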

Service Health Issues¶

Container keeps restarting¶

Symptoms:

- Container status shows "Restarting"
- Service unavailable intermittently

Debugging:

# Check container status
docker compose ps

# View restart count
docker inspect webhook-handler --format='{{.RestartCount}}'

# Check logs for errors
docker compose logs --tail 100 webhook-handler

Common Causes:

| Cause | Solution |
| --- | --- |
| Missing `.env` values | Check all required vars are set |
| Port conflict | Check no other service using same port |
| Memory limit | Increase container memory |
| Bad configuration | Review recent changes |

All services unhealthy¶

Symptoms:

- `tinkero health` shows all services unhealthy
- Nothing is working

Recovery Steps:

# 1. Check Docker
sudo systemctl status docker

# 2. Full restart
docker compose down
docker compose up -d

# 3. Check for resource issues
free -h
df -h

# 4. Review logs
docker compose logs --tail 50 | head -100

Sentry Issues¶

No events appear in Sentry¶

Symptoms:

- Sentry project shows no new events
- Errors are visible in logs but not in Sentry

Common Causes and Solutions:

| Cause | Solution |
| --- | --- |
| `SENTRY_DSN` not set | Add `SENTRY_DSN` to `.env` and restart |
| Sample rate set to `0.0` | Set `SENTRY_TRACES_SAMPLE_RATE` to a non-zero value for traces |
| Network egress blocked | Allow outbound HTTPS to `*.ingest.sentry.io` |
| Service not restarted | Restart `webhook-handler` after updating `.env` |

Debugging:

# Check env variables are loaded
docker compose exec webhook-handler env | grep SENTRY

# Check service logs for initialization
docker compose logs webhook-handler | grep -i sentry

Invalid DSN or authentication errors¶

Symptoms:

- Logs show Sentry init errors
- No events appear despite DSN set

Solutions:

Re-copy DSN from Sentry:

- Project Settings → Client Keys (DSN)

Verify .env formatting:

- Ensure no quotes or trailing spaces

Restart services:
```
docker compose restart webhook-handler
```
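
The formatting check can be automated with a pattern match on the `SENTRY_DSN` line. This sketch assumes the common DSN shape `https://<key>@<host>/<numeric project id>`. Self-hosted Sentry installations that use a path prefix would need a looser pattern. `check_sentry_dsn` is a hypothetical name.

```bash
# Hypothetical helper: validate the SENTRY_DSN line of an env file.
# Rejects quoted values, embedded or trailing whitespace, and a missing project ID.
check_sentry_dsn() {
  env_file="$1"
  line=$(grep '^SENTRY_DSN=' "$env_file" | tail -n 1)
  if [ -z "$line" ]; then
    echo "SENTRY_DSN is not set in $env_file"
    return 1
  fi
  if printf '%s\n' "$line" | grep -Eq '^SENTRY_DSN=https?://[^@[:space:]"]+@[^/[:space:]"]+/[0-9]+$'; then
    echo "ok: SENTRY_DSN looks well-formed"
  else
    echo "SENTRY_DSN looks malformed (check for quotes, spaces, or a missing project ID)"
    return 1
  fi
}
```

Usage: `check_sentry_dsn .env`. A well-formed DSN can still be rejected by Sentry if the key was revoked, so re-copy it from the project settings when in doubt.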

Quick Diagnostics¶

Health Check Command¶

tinkero health

Service Status¶

docker compose ps

Recent Logs¶

# All services
docker compose logs --tail 50

# Specific service
docker compose logs --tail 50 webhook-handler

Network Check¶

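# Test internal connectivity
# Redis does not speak HTTP, so the first command reports an error even when
# Redis is reachable; treat it as a rough reachability probe and confirm with
# `docker exec redis redis-cli ping`
docker exec webhook-handler wget -qO- http://redis:6379
docker exec webhook-handler wget -qO- http://caddy:2019/config/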

Resource Check¶

# Memory
free -h

# Disk
df -h

# Docker resources
docker system df

Full Diagnostic Script¶

#!/bin/bash
echo "=== Tinkero Diagnostic Report ==="
echo ""
echo "=== System Resources ==="
free -h
df -h /
echo ""
echo "=== Docker Status ==="
docker compose ps
echo ""
echo "=== Recent Errors ==="
docker compose logs --tail 20 2>&1 | grep -i "error\|fail\|fatal" | tail -10
echo ""
echo "=== Health Check ==="
tinkero health

Still stuck?

- Check the FAQ for common questions
- Open an issue on GitHub with diagnostic output