Why webhooks matter for monitoring

Email alerts reach you sooner or later. Slack alerts show up in a channel. Webhooks, by contrast, let you act automatically the moment an incident occurs.

Without webhooks:

Alert fires → You manually create a Jira ticket → You manually update the status page → You manually notify the on-call engineer
Response time: 5-10 minutes

With webhooks:

Alert fires → The webhook triggers → A Jira ticket is created automatically → The status page is updated → The on-call engineer is notified
Response time: 10 seconds

For critical infrastructure, cutting the response from minutes to seconds can keep an incident from affecting customers.

How webhooks work

When your site goes down and monitoring detects it:

1. Nova Uptime monitoring service detects failure
2. Nova Uptime calls your webhook URL with incident data
3. Your server receives HTTP POST with:
   - Domain
   - Status
   - Time detected
   - Response time
   - Previous check result
4. Your system decides what to do
   - Create Jira ticket?
   - Page on-call?
   - Update status page?
   - Post to Slack?
5. Actions execute automatically
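
As a rough illustration of step 3, the POST body might look like the object below. The field names `domain`, `status`, `detectedAt`, and `responseTime` match the receiver example in this article; `previousStatus`, the units, and the sample values are assumptions, so check the Nova Uptime API documentation for the actual payload. The hypothetical `parseIncident` helper rejects malformed payloads before any automation runs:

```javascript
// Sample payload (values are illustrative, not real Nova Uptime output).
const samplePayload = {
  domain: 'example.com',
  status: 'down',
  detectedAt: '2024-01-15T10:30:00Z',
  responseTime: 5000,     // milliseconds (assumed unit)
  previousStatus: 'up'    // assumed field name for "previous check result"
};

// Statuses used in this article's examples; extend if your events use others.
const KNOWN_STATUSES = new Set(['up', 'down']);

// Validate the fields the handlers rely on and normalize the timestamp.
function parseIncident(payload) {
  if (typeof payload !== 'object' || payload === null) {
    throw new Error('Payload must be a JSON object');
  }
  const { domain, status, detectedAt, responseTime } = payload;
  if (typeof domain !== 'string' || domain.length === 0) {
    throw new Error('Missing domain');
  }
  if (!KNOWN_STATUSES.has(status)) {
    throw new Error(`Unexpected status: ${status}`);
  }
  const detected = new Date(detectedAt);
  if (Number.isNaN(detected.getTime())) {
    throw new Error('Invalid detectedAt timestamp');
  }
  return { domain, status, detectedAt: detected, responseTime: Number(responseTime) };
}
```

Validating up front means a malformed or unexpected request fails loudly at the edge instead of creating a half-filled ticket later.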

Setting up webhooks

Step 1: create a webhook receiver

Your webhook receiver is a simple HTTP endpoint that receives the incident data.

Example: webhook receiver with Express.js

const express = require('express');
const app = express();

app.use(express.json());

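app.post('/webhooks/uptime-incident', async (req, res) => {
  const { domain, status, detectedAt, responseTime } = req.body;

  console.log(`Incident detected: ${domain} is ${status}`);

  try {
    // Handle the incident (handleIncident is defined in the patterns below)
    await handleIncident({
      domain,
      status,
      detectedAt,
      responseTime
    });

    // Respond with 200 OK to acknowledge receipt
    res.json({ success: true });
  } catch (err) {
    // Express 4 does not catch rejected promises in async handlers,
    // so report the failure explicitly instead of leaving the request hanging
    console.error('Failed to handle incident:', err);
    res.status(500).json({ success: false });
  }
});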

app.listen(3000, () => {
  console.log('Webhook receiver listening on port 3000');
});

Step 2: configure the webhook in Nova Uptime

1. Log in at go.novauptime.com
2. Go to Domain settings → Webhooks
3. Click "Add Webhook"
4. Enter your endpoint URL: https://yourdomain.com/webhooks/uptime-incident
5. Select the events that will trigger it:
   - ✅ Site down
   - ✅ Site recovered
   - ✅ Response time warning
6. Save

Step 3: test the webhook

Most monitoring tools provide a "Test Webhook" button:

1. Click "Test" in the webhook settings
2. Your endpoint receives the test data
3. Verify that your system responds with 200 OK

Real-world webhook patterns

Pattern 1: create incident tickets

When a site goes down, automatically create a Jira ticket.

async function handleIncident({ domain, status, detectedAt }) {
  if (status === 'down') {
    // Create Jira ticket
    const ticket = await createJiraTicket({
      project: 'OPS',
      issueType: 'Incident',
      summary: `Production Incident: ${domain} is down`,
      description: `
        Domain: ${domain}
        Detected: ${detectedAt}
        Status: DOWN

        Actions:
        1. Check server status
        2. Review recent deployments
        3. Check error logs
      `,
      priority: 'Critical',
      labels: ['incident', 'production']
    });

    console.log(`Created ticket: ${ticket.key}`);
  }
}

Pattern 2: update the status page

When an incident occurs, automatically update your public status page.

async function handleIncident({ domain, status, detectedAt }) {
  if (status === 'down') {
    // Create incident on status page
    await createStatusPageIncident({
      name: `${domain} is Down`,
      status: 'investigating',
      body: `We're investigating an issue with ${domain}. More info coming soon.`,
      affectedComponents: [domain]
    });
  } else if (status === 'up') {
    // Resolve incident on status page
    await updateStatusPageIncident({
      status: 'resolved',
      body: `${domain} is now back online. We apologize for the inconvenience.`
    });
  }
}

Pattern 3: notify the on-call engineer

For critical incidents, send an SMS to the on-call engineer immediately.

async function handleIncident({ domain, status, detectedAt }) {
  if (status === 'down') {
    // Get current on-call engineer from PagerDuty
    const oncall = await getOnCallEngineer();

    // Send SMS
    await sendSMS({
      to: oncall.phone,
      message: `CRITICAL: ${domain} is down. Incident ticket: JIRA-123`
    });

    // Also post to #incidents Slack channel
    await postToSlack({
      channel: '#incidents',
      text: `@${oncall.slackHandle}: ${domain} is down. See JIRA-123`
    });
  }
}

Pattern 4: store incident history

Record every incident in your database for later analysis.

async function handleIncident({ domain, status, detectedAt, responseTime }) {
  // Store in database
  const incident = await Incident.create({
    domain,
    status,
    detectedAt,
    responseTime,
    createdAt: new Date(),
    handledAt: new Date(),
    ticketCreated: false,
    statusPageUpdated: false
  });

  console.log(`Stored incident: ${incident._id}`);

  // Later: calculate MTTR, uptime %, etc.
  await updateIncidentMetrics(domain);
}
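
The MTTR calculation mentioned in the comment above could look roughly like this. It is a minimal sketch that assumes the stored events are available in chronological order as `{ status, detectedAt }` objects with `detectedAt` as a `Date`; `computeMttrMs` is a hypothetical helper, not part of any Nova Uptime API:

```javascript
// Mean time to recovery in milliseconds from a chronological event list.
// Each event is assumed to look like { status: 'down' | 'up', detectedAt: Date }.
function computeMttrMs(events) {
  let downSince = null;
  const durations = [];
  for (const event of events) {
    if (event.status === 'down' && downSince === null) {
      // Outage starts; repeated "down" events for the same outage are ignored
      downSince = event.detectedAt.getTime();
    } else if (event.status === 'up' && downSince !== null) {
      // Outage ends; record how long it lasted
      durations.push(event.detectedAt.getTime() - downSince);
      downSince = null;
    }
  }
  if (durations.length === 0) return null; // no completed outage yet
  const total = durations.reduce((sum, d) => sum + d, 0);
  return total / durations.length;
}
```

An outage that has not recovered yet is left out of the average, so the figure only reflects completed incidents.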

Pattern 5: multi-service orchestration

When a service fails, trigger actions across multiple platforms.

async function handleIncident({ domain, status, detectedAt }) {
  if (status === 'down') {
    // Run the actions concurrently instead of one after another.
    // Note: Promise.all rejects as soon as any single action fails.
    await Promise.all([
      createJiraTicket({ domain, status }),
      createStatusPageIncident({ domain }),
      pageOnCallEngineer({ domain }),
      postSlackAlert({ domain }),
      storeIncidentHistory({ domain, detectedAt }),
      triggerPostmortemWorkflow({ domain })
    ]);
  }
}
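
`Promise.all` rejects as soon as one action fails, so a Jira outage could hide whether the page or the Slack alert went through. One possible variation, sketched here with a hypothetical `runIncidentActions` helper, uses `Promise.allSettled` to let every action finish and then reports the failures:

```javascript
// Run every action concurrently and report which ones failed.
// `actions` maps a label to a function that returns a promise.
async function runIncidentActions(actions) {
  const names = Object.keys(actions);
  // The async wrapper turns synchronous throws into rejections as well
  const results = await Promise.allSettled(
    names.map(async (name) => actions[name]())
  );
  const failed = [];
  results.forEach((result, i) => {
    if (result.status === 'rejected') {
      const reason = result.reason instanceof Error
        ? result.reason.message
        : String(result.reason);
      failed.push({ name: names[i], reason });
    }
  });
  return { total: names.length, failed };
}
```

A caller could pass something like `{ jira: () => createJiraTicket({ domain, status }), slack: () => postSlackAlert({ domain }) }` and retry or log only the entries listed in `failed`.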

Advanced patterns

Pattern 6: conditional logic based on severity

Take different actions for different severity levels.

async function handleIncident({ domain, status, severity }) {
  if (severity === 'critical') {
    // Critical: Page everyone
    await pageOnCall({ priority: 'high' });
    await updateStatusPage({ status: 'major_outage' });
    await createJiraTicket({ priority: 'Critical' });
  } else if (severity === 'warning') {
    // Warning: Slack + Jira, no SMS
    await postSlackAlert({ channel: '#alerts' });
    await createJiraTicket({ priority: 'Medium' });
  } else if (severity === 'info') {
    // Info: Log only, no alert
    await storeIncidentHistory({ domain });
  }
}
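
The same routing can be expressed as data, which keeps the policy in one place as the number of severities grows. This is a sketch only: the action names mirror the helpers used above, and how `severity` is derived is left open, since it is not among the payload fields listed earlier in this article:

```javascript
// Map each severity to the ordered list of actions it should trigger.
const SEVERITY_ACTIONS = new Map([
  ['critical', ['pageOnCall', 'updateStatusPage', 'createJiraTicket']],
  ['warning', ['postSlackAlert', 'createJiraTicket']],
  ['info', ['storeIncidentHistory']]
]);

function actionsForSeverity(severity) {
  // Unknown severities fall back to logging only
  // (an assumption; pick the default that suits your team).
  return SEVERITY_ACTIONS.get(severity) ?? SEVERITY_ACTIONS.get('info');
}
```

With this shape, changing who gets paged for a warning is a one-line edit to the table rather than a change to control flow.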

Pattern 7: deduplication

Avoid duplicate tickets and alerts when the same domain fails several times.

async function handleIncident({ domain, status }) {
  if (status === 'down') {
    // Check if active incident already exists
    const activeIncident = await Incident.findOne({
      domain,
      status: 'active',
      createdAt: { $gte: new Date(Date.now() - 15 * 60 * 1000) } // Last 15 mins
    });

    if (activeIncident) {
      // Incident already reported, just update
      activeIncident.lastSeen = new Date();
      await activeIncident.save();
      console.log(`Updated existing incident: ${activeIncident._id}`);
    } else {
      // New incident, create everything
      await createJiraTicket({ domain });
      await pageOnCall({ domain });
      // ... etc
    }
  }
}
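
For a single-process receiver without a database, the same idea can be sketched in memory. `IncidentDeduplicator` is a hypothetical class, and the 15-minute default simply mirrors the window used above:

```javascript
// Tracks the last time each domain raised an alert (in-memory, single process).
class IncidentDeduplicator {
  constructor(windowMs = 15 * 60 * 1000) {
    this.windowMs = windowMs;
    this.lastAlertAt = new Map(); // domain -> timestamp in ms
  }

  // Returns true if this is a new incident that should alert,
  // false if the domain already alerted within the window.
  shouldAlert(domain, now = Date.now()) {
    const last = this.lastAlertAt.get(domain);
    if (last !== undefined && now - last < this.windowMs) {
      return false;
    }
    this.lastAlertAt.set(domain, now);
    return true;
  }

  // Call when the domain recovers so the next outage alerts immediately.
  resolve(domain) {
    this.lastAlertAt.delete(domain);
  }
}
```

State kept this way is lost on restart and is not shared between instances, so the database lookup remains the safer choice once you run more than one receiver.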

Pattern 8: retry failed webhooks

If your webhook receiver is down, Nova Uptime should retry the delivery.

Configuration in Nova Uptime:

1. Go to Domain settings → Webhooks
2. Click the webhook
3. Open Advanced settings → Retry policy
4. Enable: "Retry on failure"
5. Maximum retries: 3
6. Delay between retries: exponential (5s, 10s, 20s)
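
To make the schedule concrete, the delays above follow a doubling rule. The helper below only reproduces that arithmetic; it is an illustration, not Nova Uptime's implementation:

```javascript
// Delay before each retry: baseMs for the first, then doubled every attempt.
function retryDelaysMs(maxRetries, baseMs) {
  return Array.from({ length: maxRetries }, (_, attempt) => baseMs * 2 ** attempt);
}

console.log(retryDelaysMs(3, 5000)); // [ 5000, 10000, 20000 ]
```

With three retries the delays add up to 35 seconds, so under this schedule a receiver that stays down longer than that would miss the event entirely.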

Because retries can deliver the same event more than once, your webhook receiver must be idempotent (safe to call multiple times):

// Good: idempotent
// (written as the route handler so that `res` is in scope)
app.post('/webhooks/uptime-incident', async (req, res) => {
  const { domain, status, eventId } = req.body;

  // Check if already processed
  const processed = await ProcessedEvents.findOne({ eventId });
  if (processed) {
    return res.json({ success: true, cached: true });
  }

  // Process incident
  await handleIncident({ domain, status });

  // Record as processed
  await ProcessedEvents.create({ eventId, processedAt: new Date() });

  res.json({ success: true });
});

Webhook integration examples

Integration 1: Zapier

If you don't want to build custom webhook receivers, use Zapier:

Nova Uptime → Zapier → Slack/Jira/Email/etc.
No coding required
Limitations: less control, added latency

Integration 2: GitHub Actions

When an incident occurs, trigger a GitHub Actions workflow (for example, auto-scaling or a rollback):

async function handleIncident({ domain, status }) {
  if (status === 'down') {
    // Trigger GitHub Actions workflow
    await triggerGitHubAction({
      repo: 'mycompany/infrastructure',
      workflow: 'incident-response.yml',
      inputs: {
        domain,
        action: 'scale-up'
      }
    });
  }
}
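
`triggerGitHubAction` is not defined above. One way to sketch it is with GitHub's REST endpoint for creating a workflow dispatch event, which requires a `ref` (branch or tag) and a workflow that declares the `workflow_dispatch` trigger. The `token` parameter and the use of the `fetch` built into Node 18+ are assumptions of this sketch:

```javascript
// Build the request for GitHub's "create a workflow dispatch event" endpoint.
function buildWorkflowDispatch({ repo, workflow, ref, inputs, token }) {
  return {
    url: `https://api.github.com/repos/${repo}/actions/workflows/${workflow}/dispatches`,
    options: {
      method: 'POST',
      headers: {
        Accept: 'application/vnd.github+json',
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ ref, inputs })
    }
  };
}

// Send the request and fail loudly if GitHub rejects it.
async function triggerGitHubAction(params) {
  const { url, options } = buildWorkflowDispatch(params);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`GitHub dispatch failed with status ${response.status}`);
  }
}
```

Keeping the request construction separate from the network call makes it easy to unit-test the URL and body without contacting GitHub.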

Integration 3: AWS Lambda

Use Lambda to handle webhooks without managing servers:

# AWS Lambda function
import json
import boto3

def lambda_handler(event, context):
    body = json.loads(event['body'])
    domain = body['domain']
    status = body['status']

    if status == 'down':
        # Start a stopped standby instance (the instance ID is a placeholder)
        ec2 = boto3.client('ec2')
        ec2.start_instances(InstanceIds=['i-1234567890abcdef0'])

    return {
        'statusCode': 200,
        'body': json.dumps({'success': True})
    }

Webhook security

Verify the webhook signature

Nova Uptime signs every webhook with HMAC-SHA256. Verify the signature before processing the request:

const crypto = require('crypto');

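app.post('/webhooks/uptime-incident', (req, res) => {
  const signature = req.headers['x-gum-signature'] || '';
  // Caveat: re-serializing the parsed body only matches the signed bytes if the
  // sender serializes JSON identically; prefer the raw request body if you capture it
  const body = JSON.stringify(req.body);
  const secret = process.env.NOVAUPTIME_WEBHOOK_SECRET;

  // Compute expected signature
  const expected = crypto
    .createHmac('sha256', secret)
    .update(body)
    .digest('hex');

  // Compare in constant time so the check does not leak timing information
  const received = Buffer.from(signature);
  const wanted = Buffer.from(expected);
  if (received.length !== wanted.length || !crypto.timingSafeEqual(received, wanted)) {
    console.error('Invalid webhook signature');
    return res.status(401).json({ error: 'Unauthorized' });
  }

  // Process webhook; log failures because nothing awaits this promise
  handleIncident(req.body).catch(console.error);
  res.json({ success: true });
});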
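
The check is easier to test when it is pulled out into a function that works on the raw request bytes. The sketch below assumes the signature is hex-encoded, as in the handler above; `verifySignature` is a hypothetical helper name:

```javascript
const crypto = require('node:crypto');

// Returns true when signatureHex is the HMAC-SHA256 of rawBody under secret.
function verifySignature(rawBody, signatureHex, secret) {
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(String(signatureHex), 'hex');
  // timingSafeEqual throws on a length mismatch, so check the length first
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected);
}

// With Express, the raw bytes can be captured through the `verify` option:
//   app.use(express.json({ verify: (req, res, buf) => { req.rawBody = buf; } }));
// and then checked with verifySignature(req.rawBody, signature, secret).
```

Hashing the raw bytes avoids false mismatches caused by key order or whitespace differences between the sender's JSON and a re-serialized copy.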

Rate limiting

Your webhook receiver should limit how many calls it accepts:

const rateLimit = require('express-rate-limit');

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100 // max 100 requests per minute
});

app.post('/webhooks/uptime-incident', limiter, (req, res) => {
  // Handle webhook
});
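
To show the idea behind a limit such as "100 requests per minute", here is a dependency-free fixed-window counter. It is a simplified sketch with a hypothetical `createRateLimiter` helper, not a description of how `express-rate-limit` works internally:

```javascript
// Fixed-window counter: at most `max` requests per `windowMs` for each key.
function createRateLimiter({ windowMs, max }) {
  const windows = new Map(); // key -> { start, count }
  return function allow(key, now = Date.now()) {
    const current = windows.get(key);
    if (!current || now - current.start >= windowMs) {
      // First request for this key, or the previous window has expired
      windows.set(key, { start: now, count: 1 });
      return true;
    }
    current.count += 1;
    return current.count <= max;
  };
}
```

A typical key is the caller's IP address. This version never evicts old keys, so a production receiver would need cleanup or a maintained library.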

Timeout handling

The webhook receiver must respond quickly:

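app.post('/webhooks/uptime-incident', (req, res) => {
  // Respond immediately
  res.json({ success: true });

  // Do real work in background; catch errors so a failure is logged
  // instead of becoming an unhandled promise rejection
  setImmediate(() => {
    handleIncident(req.body).catch((err) => {
      console.error('Incident handling failed:', err);
    });
  });
});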

Testing your webhooks

Testing method 1: local testing with ngrok

1. Start the local webhook receiver on localhost:3000
2. Run ngrok: ngrok http 3000
3. Get the public URL: https://abc123.ngrok.io
4. Configure it in Nova Uptime: https://abc123.ngrok.io/webhooks/uptime-incident
5. Click "Test" in Nova Uptime → you see the request in your local console

Testing method 2: webhook tester

Use webhook.site to test for free:

1. Go to webhook.site
2. Copy your unique URL
3. Configure it in Nova Uptime as the webhook receiver
4. Run a test → you see the request in the webhook.site dashboard

Monitoring your webhooks

Track the health of your webhooks:

async function monitorWebhookHealth() {
  const stats = await WebhookEvent.aggregate([
    {
      $group: {
        _id: null,
        totalEvents: { $sum: 1 },
        successCount: { $sum: { $cond: ['$success', 1, 0] } },
        failureCount: { $sum: { $cond: ['$success', 0, 1] } },
        avgResponseTime: { $avg: '$responseTime' }
      }
    }
  ]);

  // With no recorded events the aggregation returns an empty array
  if (stats.length === 0) return;

  const successRate = stats[0].successCount / stats[0].totalEvents;

  if (successRate < 0.95) {
    // Alert: Webhook success rate below 95%
    await alertSlack(`
      Webhook health: ${(successRate * 100).toFixed(1)}% success rate
      Failed events: ${stats[0].failureCount}
    `);
  }
}
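
The threshold logic can also be kept in a small pure function, which makes it testable without a database. This sketch assumes events shaped like `{ success: boolean }`; `summarizeWebhookHealth` is a hypothetical helper:

```javascript
// Summarize delivery health from a list of events shaped like { success: boolean }.
function summarizeWebhookHealth(events, threshold = 0.95) {
  const total = events.length;
  if (total === 0) {
    // No data yet: report as healthy rather than dividing by zero
    return { total: 0, failures: 0, successRate: null, healthy: true };
  }
  const failures = events.filter((event) => !event.success).length;
  const successRate = (total - failures) / total;
  return { total, failures, successRate, healthy: successRate >= threshold };
}
```

Treating an empty history as healthy is a design choice; a stricter setup might alert when no events arrive at all, since that can also indicate a broken integration.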

Summary: webhook integration checklist

✅ Create the webhook receiver endpoint
✅ Configure the webhook in the Nova Uptime settings
✅ Test the webhook with sample data
✅ Verify the webhook signature (HMAC-SHA256)
✅ Implement retry logic and idempotency
✅ Add rate limiting to the webhook endpoint
✅ Handle timeouts by responding before doing slow work
✅ Build the incident workflow (Jira + status page + Slack)
✅ Test with a real incident or a forced failure
✅ Monitor webhook health and success rate
✅ Document the webhook endpoints for your team

Get started today

Webhooks transform monitoring: you go from alerts alone to a fully automated incident response.

If you use Nova Uptime, go to the domain settings and add your first webhook. Start simple: just record incidents in your database. Then add integrations one at a time.

Webhook documentation: Nova Uptime API documentation

Uptime monitoring webhooks and integrations: build custom workflows