Why webhooks matter for monitoring

Email alerts arrive eventually. Slack alerts show up in a channel. Webhooks, by contrast, let you run actions the moment an incident is detected.

Without webhooks:

Alert fires → You manually click to create a Jira ticket → Manually update the status page → Manually page the on-call engineer
Response time: 5 to 10 minutes

With webhooks:

Alert fires → Webhook is triggered → Jira ticket is created automatically → Status page is updated → On-call engineer is paged
Response time: 10 seconds

For critical infrastructure, saving those 5 to 10 minutes can keep an incident from reaching customers.

How webhooks work

When your site goes down and monitoring detects it:

1. Nova Uptime monitoring service detects failure
2. Nova Uptime calls your webhook URL with incident data
3. Your server receives HTTP POST with:
   - Domain
   - Status
   - Time detected
   - Response time
   - Previous check result
4. Your system decides what to do
   - Create Jira ticket?
   - Page on-call?
   - Update status page?
   - Post to Slack?
5. Actions execute automatically
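
Before step 4 can decide anything, the receiver has to trust the shape of what arrived in step 3. The sketch below validates an incoming body. The field names `domain`, `status`, `detectedAt` and `responseTime` follow the receiver example in this article; `previousStatus`, the sample values, and the assumption that `status` is either `'down'` or `'up'` are illustrative, so check the real payload against the provider's documentation.

```javascript
// Hypothetical sample payload. `previousStatus` is an assumed name for the
// "previous check result" field; the other names follow this article.
const samplePayload = {
  domain: 'example.com',
  status: 'down',
  detectedAt: '2024-01-15T10:30:00Z',
  responseTime: 0,
  previousStatus: 'up'
};

// Validate an incoming body before acting on it.
// Returns { ok: true, incident } or { ok: false, error }.
function parseIncidentPayload(body) {
  if (body === null || typeof body !== 'object') {
    return { ok: false, error: 'body must be a JSON object' };
  }
  const { domain, status, detectedAt, responseTime } = body;
  if (typeof domain !== 'string' || domain.length === 0) {
    return { ok: false, error: 'missing domain' };
  }
  // Assumption: only these two status values are sent
  if (status !== 'down' && status !== 'up') {
    return { ok: false, error: 'unexpected status' };
  }
  const detected = new Date(detectedAt);
  if (Number.isNaN(detected.getTime())) {
    return { ok: false, error: 'invalid detectedAt' };
  }
  return {
    ok: true,
    incident: {
      domain,
      status,
      detectedAt: detected,
      responseTime: Number(responseTime) || 0
    }
  };
}
```

A receiver could call `parseIncidentPayload(req.body)` first and answer with HTTP 400 when `ok` is false, so malformed requests never reach the ticketing or paging code.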

Setting up webhooks

Step 1: Create a webhook receiver

Your webhook receiver is a simple HTTP endpoint that receives the incident data.

Example: webhook receiver in Express.js

const express = require('express');
const app = express();

app.use(express.json());

app.post('/webhooks/uptime-incident', async (req, res) => {
  const { domain, status, detectedAt, responseTime } = req.body;

  console.log(`Incident detected: ${domain} is ${status}`);

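  // Handle the incident; report failures so the sender can retry
  try {
    await handleIncident({
      domain,
      status,
      detectedAt,
      responseTime
    });
  } catch (err) {
    console.error('Failed to handle incident:', err);
    return res.status(500).json({ success: false });
  }

  // Respond with 200 OK to acknowledge receipt
  res.json({ success: true });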
});

app.listen(3000, () => {
  console.log('Webhook receiver listening on port 3000');
});

Step 2: Configure the webhook in Nova Uptime

Log in at go.novauptime.com
Domain settings → Webhooks
Click "Add Webhook"
Enter your endpoint URL: https://yourdomain.com/webhooks/uptime-incident
Select the events that should trigger it:
- ✅ Site down
- ✅ Site recovered
- ✅ Response time warning
Save

Step 3: Test the webhook

Most tools have a "Test Webhook" button:

Click "Test" in the webhook settings
Your endpoint receives the test data
Check that your system responds with 200 OK
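
When no test button is available, or to check the receiver before it is exposed publicly, you can send a test request yourself. This is a minimal sketch assuming Node 18 or later (for the global `fetch`) and a receiver on `localhost:3000`. The helper names and payload values are hypothetical.

```javascript
// Build the arguments for fetch() so the request can be inspected
// (or logged) before it is sent.
function buildTestRequest(url, payload) {
  return {
    url,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload)
    }
  };
}

// Send the test payload and report whether the receiver answered 200 OK.
async function sendTestWebhook(url, payload) {
  const { init } = buildTestRequest(url, payload);
  const response = await fetch(url, init); // global fetch, Node 18+
  return response.status === 200;
}

// Example usage (uncomment with the receiver running):
// sendTestWebhook('http://localhost:3000/webhooks/uptime-incident', {
//   domain: 'example.com',
//   status: 'down',
//   detectedAt: new Date().toISOString(),
//   responseTime: 0
// }).then((ok) => console.log(ok ? 'Receiver OK' : 'Receiver did not return 200'));
```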

Real-world webhook patterns

Pattern 1: Create incident tickets

When the site goes down, create a Jira ticket automatically.

async function handleIncident({ domain, status, detectedAt }) {
  if (status === 'down') {
    // Create Jira ticket
    const ticket = await createJiraTicket({
      project: 'OPS',
      issueType: 'Incident',
      summary: `Production Incident: ${domain} is down`,
      description: `
        Domain: ${domain}
        Detected: ${detectedAt}
        Status: DOWN

        Actions:
        1. Check server status
        2. Review recent deployments
        3. Check error logs
      `,
      priority: 'Critical',
      labels: ['incident', 'production']
    });

    console.log(`Created ticket: ${ticket.key}`);
  }
}
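
The `createJiraTicket` helper is not defined in the example above. One possible shape, sketched here, maps the options onto the body of Jira's "create issue" REST endpoint (`POST /rest/api/2/issue`). The environment variable names `JIRA_BASE_URL`, `JIRA_EMAIL` and `JIRA_API_TOKEN` are assumptions. The issue type and priority names must also exist in your Jira project.

```javascript
// Translate the ticket options used in this article into the request body
// for Jira's create-issue endpoint.
function buildJiraIssueBody({ project, issueType, summary, description, priority, labels = [] }) {
  const fields = {
    project: { key: project },
    issuetype: { name: issueType },
    summary,
    description,
    labels
  };
  // Only send a priority when one was given
  if (priority) {
    fields.priority = { name: priority };
  }
  return { fields };
}

// Hypothetical helper. The environment variable names are assumptions,
// not something Jira mandates. Requires Node 18+ for the global fetch.
async function createJiraTicket(options) {
  const auth = Buffer.from(
    `${process.env.JIRA_EMAIL}:${process.env.JIRA_API_TOKEN}`
  ).toString('base64');

  const response = await fetch(`${process.env.JIRA_BASE_URL}/rest/api/2/issue`, {
    method: 'POST',
    headers: {
      Authorization: `Basic ${auth}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(buildJiraIssueBody(options))
  });
  if (!response.ok) {
    throw new Error(`Jira responded with HTTP ${response.status}`);
  }
  return response.json(); // the response includes the new issue's `key`
}
```

Keeping the body construction in a separate pure function makes it easy to unit test without calling Jira.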

Pattern 2: Update the status page

When an incident happens, update your public status page automatically.

async function handleIncident({ domain, status, detectedAt }) {
  if (status === 'down') {
    // Create incident on status page
    await createStatusPageIncident({
      name: `${domain} is Down`,
      status: 'investigating',
      body: `We're investigating an issue with ${domain}. More info coming soon.`,
      affectedComponents: [domain]
    });
  } else if (status === 'up') {
    // Resolve incident on status page
    await updateStatusPageIncident({
      status: 'resolved',
      body: `${domain} is now back online. We apologize for the inconvenience.`
    });
  }
}

Pattern 3: Page the on-call engineer

Send an SMS to the on-call engineer immediately for critical incidents.

async function handleIncident({ domain, status, detectedAt }) {
  if (status === 'down') {
    // Get current on-call engineer from PagerDuty
    const oncall = await getOnCallEngineer();

    // Send SMS (JIRA-123 below is a placeholder for the real ticket key)
    await sendSMS({
      to: oncall.phone,
      message: `CRITICAL: ${domain} is down. Incident ticket: JIRA-123`
    });

    // Also post to #incidents Slack channel
    await postToSlack({
      channel: '#incidents',
      text: `@${oncall.slackHandle}: ${domain} is down. See JIRA-123`
    });
  }
}

Pattern 4: Store incident history

Record every incident in your database for later analysis.

async function handleIncident({ domain, status, detectedAt, responseTime }) {
  // Store in database
  const incident = await Incident.create({
    domain,
    status,
    detectedAt,
    responseTime,
    createdAt: new Date(),
    handledAt: new Date(),
    ticketCreated: false,
    statusPageUpdated: false
  });

  console.log(`Stored incident: ${incident._id}`);

  // Later: calculate MTTR, uptime %, etc.
  await updateIncidentMetrics(domain);
}
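
The MTTR and uptime calculation mentioned in the comment above can be sketched as a pure function over the stored records. It assumes the records for one domain are sorted chronologically and carry the `status` and `detectedAt` fields used in this article. The function name and the reporting window parameters are hypothetical.

```javascript
// Compute mean time to recovery (MTTR) and uptime percentage for one domain
// from a chronological list of { status, detectedAt } records.
// An outage still open at the end of the list is not counted.
function computeIncidentMetrics(events, windowStart, windowEnd) {
  let downSince = null;
  let totalDowntimeMs = 0;
  let recoveries = 0;

  for (const event of events) {
    const at = new Date(event.detectedAt).getTime();
    if (event.status === 'down' && downSince === null) {
      downSince = at; // outage begins; repeated "down" events are ignored
    } else if (event.status === 'up' && downSince !== null) {
      totalDowntimeMs += at - downSince; // outage ends
      recoveries += 1;
      downSince = null;
    }
  }

  const windowMs = new Date(windowEnd).getTime() - new Date(windowStart).getTime();
  return {
    mttrMinutes: recoveries === 0 ? null : totalDowntimeMs / recoveries / 60000,
    uptimePercent: 100 * (1 - totalDowntimeMs / windowMs)
  };
}
```

For example, two outages of 10 and 30 minutes within a 24-hour window give an MTTR of 20 minutes and 40 minutes of downtime out of 1,440.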

Pattern 5: Multi-service orchestration

When a service fails, trigger actions across several platforms.

async function handleIncident({ domain, status, detectedAt }) {
  if (status === 'down') {
    // Run all actions concurrently instead of one after another.
    // Note: Promise.all rejects as soon as any single action fails.
    await Promise.all([
      createJiraTicket({ domain, status }),
      createStatusPageIncident({ domain }),
      pageOnCallEngineer({ domain }),
      postSlackAlert({ domain }),
      storeIncidentHistory({ domain, detectedAt }),
      triggerPostmortemWorkflow({ domain })
    ]);
  }
}
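
With `Promise.all`, one failing integration rejects the whole call, and the remaining outcomes are never inspected. If a Slack outage should not hide the result of the Jira or paging calls, `Promise.allSettled` is an alternative. The sketch below uses a hypothetical `runIncidentActions` helper that takes a map of labels to functions.

```javascript
// Run every action concurrently and report failures instead of stopping at
// the first rejection. `actions` maps a label to a function that performs
// the action (sync or async).
async function runIncidentActions(actions) {
  const labels = Object.keys(actions);
  // The async arrow turns synchronous throws into rejections as well
  const results = await Promise.allSettled(
    labels.map(async (label) => actions[label]())
  );

  const failed = [];
  results.forEach((result, index) => {
    if (result.status === 'rejected') {
      failed.push({ action: labels[index], reason: String(result.reason) });
    }
  });
  return { succeeded: labels.length - failed.length, failed };
}
```

A caller could pass `{ jira: () => createJiraTicket({ domain, status }), slack: () => postSlackAlert({ domain }) }` and log or retry whatever ends up in `failed`.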

Advanced patterns

Pattern 6: Conditional logic based on severity

Take different actions for different severity levels.

async function handleIncident({ domain, status, severity }) {
  if (severity === 'critical') {
    // Critical: Page everyone
    await pageOnCall({ priority: 'high' });
    await updateStatusPage({ status: 'major_outage' });
    await createJiraTicket({ priority: 'Critical' });
  } else if (severity === 'warning') {
    // Warning: Slack + Jira, no SMS
    await postSlackAlert({ channel: '#alerts' });
    await createJiraTicket({ priority: 'Medium' });
  } else if (severity === 'info') {
    // Info: Log only, no alert
    await storeIncidentHistory({ domain });
  }
}
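
The payload fields listed earlier in this article do not include a `severity`, so the receiver may have to derive one. The sketch below is one possible mapping. The 2000 ms threshold and the rule that any `'down'` status is critical are illustrative assumptions to adjust per service.

```javascript
// Illustrative threshold: responses slower than this count as a warning.
const SLOW_RESPONSE_MS = 2000;

// Map the raw payload onto the severity levels used by the handler.
function deriveSeverity({ status, responseTime }) {
  if (status === 'down') {
    return 'critical';
  }
  if (typeof responseTime === 'number' && responseTime > SLOW_RESPONSE_MS) {
    return 'warning';
  }
  return 'info';
}
```

The result can be passed straight into the severity-based handler, for example `handleIncident({ ...payload, severity: deriveSeverity(payload) })`.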

Pattern 7: Deduplication

Avoid duplicate tickets and alerts when the same domain fails several times.

async function handleIncident({ domain, status }) {
  if (status === 'down') {
    // Check if active incident already exists
    const activeIncident = await Incident.findOne({
      domain,
      status: 'active',
      createdAt: { $gte: new Date(Date.now() - 15 * 60 * 1000) } // Last 15 mins
    });

    if (activeIncident) {
      // Incident already reported, just update
      activeIncident.lastSeen = new Date();
      await activeIncident.save();
      console.log(`Updated existing incident: ${activeIncident._id}`);
    } else {
      // New incident, create everything
      await createJiraTicket({ domain });
      await pageOnCall({ domain });
      // ... etc
    }
  }
}
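
For a single-process receiver without a database, the same 15-minute window can be kept in memory. This is a minimal sketch with a hypothetical `IncidentDeduplicator` class. State is lost on restart and is not shared between instances, so it complements rather than replaces the database check.

```javascript
// In-memory deduplication: remember when each domain last triggered the
// full workflow and suppress repeats inside the window.
class IncidentDeduplicator {
  constructor(windowMs = 15 * 60 * 1000) {
    this.windowMs = windowMs;
    this.lastTriggered = new Map(); // domain -> timestamp in ms
  }

  // Returns true when the caller should create tickets and page on-call.
  // `now` is injectable so the logic can be tested without waiting.
  shouldTrigger(domain, now = Date.now()) {
    const last = this.lastTriggered.get(domain);
    if (last !== undefined && now - last < this.windowMs) {
      return false; // still inside the window: treat as a duplicate
    }
    this.lastTriggered.set(domain, now);
    return true;
  }

  // Call on recovery so the next outage is treated as a new incident.
  clear(domain) {
    this.lastTriggered.delete(domain);
  }
}
```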

Pattern 8: Retrying failed webhooks

If your webhook receiver is down, Nova Uptime should retry the delivery.

Configuration in Nova Uptime:

Domain settings → Webhooks
Click the webhook
Advanced settings → Retry policy
Enable: "Retry on failure"
Maximum attempts: 3
Interval: Exponential (5s, 10s, 20s)
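
The exponential interval (5s, 10s, 20s) doubles the wait after each failed attempt. The arithmetic can be sketched as follows. `retryDelays` is a hypothetical helper for reasoning about the schedule, not part of any Nova Uptime API.

```javascript
// Delay before each retry, doubling from a base value.
function retryDelays(baseMs, maxRetries) {
  return Array.from({ length: maxRetries }, (_, attempt) => baseMs * 2 ** attempt);
}

// Total time covered by the whole retry schedule.
function totalRetryWindowMs(baseMs, maxRetries) {
  return retryDelays(baseMs, maxRetries).reduce((sum, delay) => sum + delay, 0);
}
```

With a 5-second base and 3 retries, the delays are 5, 10 and 20 seconds, so under this schedule the last retry is sent about 35 seconds after the first failure. A receiver that stays down longer than that would miss the event.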

Your webhook receiver should be idempotent (safe to call multiple times):

// Good: Idempotent
// Assumes the payload carries an eventId that is unique per event.
// Returns the response body; the route handler sends it with res.json().
async function handleIncident({ domain, status, eventId }) {
  // Check if already processed
  const processed = await ProcessedEvents.findOne({ eventId });
  if (processed) {
    return { success: true, cached: true };
  }

  // Process incident
  await doActualWork();

  // Record as processed
  await ProcessedEvents.create({ eventId, processedAt: new Date() });

  return { success: true };
}

Webhook integration examples

Integration 1: Zapier

If you don't want to build custom webhooks, use Zapier:

Nova Uptime → Zapier → Slack/Jira/Email/etc.
No code required
Limitations: Less control, adds latency

Integration 2: GitHub Actions

On an incident, trigger a GitHub Action (for example, auto-scaling or a rollback):

async function handleIncident({ domain, status }) {
  if (status === 'down') {
    // Trigger GitHub Actions workflow
    await triggerGitHubAction({
      repo: 'mycompany/infrastructure',
      workflow: 'incident-response.yml',
      inputs: {
        domain,
        action: 'scale-up'
      }
    });
  }
}

Integration 3: AWS Lambda

Use Lambda for serverless webhook handling:

# AWS Lambda function
import json
import boto3

def lambda_handler(event, context):
    body = json.loads(event['body'])
    domain = body['domain']
    status = body['status']

    if status == 'down':
        # Add capacity by starting a stopped standby instance (placeholder ID)
        ec2 = boto3.client('ec2')
        ec2.start_instances(InstanceIds=['i-1234567890abcdef0'])

    return {
        'statusCode': 200,
        'body': json.dumps({'success': True})
    }

Webhook security

Verify the webhook signature

Nova Uptime signs each webhook with HMAC-SHA256. Verify the signature before processing:

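const crypto = require('crypto');

// Keep the raw request bytes: the HMAC must be computed over exactly what
// was sent, not over a re-serialized copy of the parsed JSON.
// Use this in place of the plain app.use(express.json()) call.
app.use(express.json({
  verify: (req, res, buf) => { req.rawBody = buf; }
}));

app.post('/webhooks/uptime-incident', async (req, res) => {
  const signature = req.headers['x-gum-signature'] || '';
  const secret = process.env.NOVAUPTIME_WEBHOOK_SECRET;

  // Compute expected signature
  const expected = crypto
    .createHmac('sha256', secret)
    .update(req.rawBody || '')
    .digest('hex');

  // Compare in constant time (timingSafeEqual requires equal lengths)
  const received = Buffer.from(signature);
  const wanted = Buffer.from(expected);
  if (received.length !== wanted.length || !crypto.timingSafeEqual(received, wanted)) {
    console.error('Invalid webhook signature');
    return res.status(401).json({ error: 'Unauthorized' });
  }

  // Process webhook
  await handleIncident(req.body);
  res.json({ success: true });
});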
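
To exercise signature checking locally, it helps to be able to produce a valid signature yourself. The sketch below assumes, as the example above does, that the signature is the hex-encoded HMAC-SHA256 of the raw request body. `signPayload` and `isValidSignature` are hypothetical helper names, and the exact signing scheme should be confirmed in the provider's documentation.

```javascript
const crypto = require('crypto');

// Compute the hex HMAC-SHA256 of the raw request body.
function signPayload(secret, rawBody) {
  return crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
}

// Constant-time check of a received signature against the expected one.
function isValidSignature(secret, rawBody, receivedSignature) {
  const expected = Buffer.from(signPayload(secret, rawBody));
  const received = Buffer.from(String(receivedSignature || ''));
  // timingSafeEqual throws on length mismatch, so compare lengths first
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
}
```

In a test, sign a JSON string with the shared secret, send it with the signature header, and confirm that changing a single character of the body makes the receiver answer 401.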

Rate limiting

Your webhook receiver should rate limit incoming calls:

const rateLimit = require('express-rate-limit');

const limiter = rateLimit({
  windowMs: 60 * 1000, // 1 minute
  max: 100 // max 100 requests per minute
});

app.post('/webhooks/uptime-incident', limiter, (req, res) => {
  // Handle webhook
});
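
To make the `windowMs` and `max` options concrete, here is a minimal fixed-window counter written from scratch. It is a sketch of the general technique, not the actual implementation of `express-rate-limit`, and `createRateLimiter` is a hypothetical name.

```javascript
// Fixed-window counter: allow at most `max` calls per `windowMs` per key
// (for example, per client IP address).
function createRateLimiter({ windowMs, max }) {
  const windows = new Map(); // key -> { start, count }

  // `now` is injectable so the logic can be tested without waiting.
  return function isAllowed(key, now = Date.now()) {
    const current = windows.get(key);
    if (!current || now - current.start >= windowMs) {
      windows.set(key, { start: now, count: 1 }); // open a new window
      return true;
    }
    current.count += 1;
    return current.count <= max;
  };
}
```

With `windowMs: 60 * 1000` and `max: 100`, the 101st call from the same key within a minute returns `false`, which a receiver would translate into an HTTP 429 response.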

Timeout handling

The webhook receiver should respond quickly:

app.post('/webhooks/uptime-incident', (req, res) => {
  // Respond immediately
  res.json({ success: true });

  // Do the real work in the background; catch errors so a failure
  // does not become an unhandled promise rejection
  setImmediate(() => {
    handleIncident(req.body).catch((err) => {
      console.error('Background incident handling failed:', err);
    });
  });
});

Testing your webhooks

Test method 1: Local testing with ngrok

Start the local webhook receiver on localhost:3000
Run ngrok: ngrok http 3000
Grab the public URL: https://abc123.ngrok.io
Configure it in Nova Uptime: https://abc123.ngrok.io/webhooks/uptime-incident
Click "Test" in Nova Uptime → See the request in your local console

Test method 2: Webhook Tester

Use webhook.site to test for free:

Go to webhook.site
Copy your unique URL
Configure it in Nova Uptime as the webhook receiver
Test → See the request in the webhook.site dashboard

Monitoring your webhooks

Track webhook health:

async function monitorWebhookHealth() {
  const stats = await WebhookEvent.aggregate([
    {
      $group: {
        _id: null,
        totalEvents: { $sum: 1 },
        successCount: { $sum: { $cond: ['$success', 1, 0] } },
        failureCount: { $sum: { $cond: ['$success', 0, 1] } },
        avgResponseTime: { $avg: '$responseTime' }
      }
    }
  ]);

  // No events recorded yet: nothing to evaluate
  if (stats.length === 0) return;

  const successRate = stats[0].successCount / stats[0].totalEvents;

  if (successRate < 0.95) {
    // Alert: Webhook success rate below 95%
    await alertSlack(`
      Webhook health: ${(successRate * 100).toFixed(1)}% success rate
      Failed events: ${stats[0].failureCount}
    `);
  }
}

Summary: Webhook integration checklist

✅ Create the webhook receiver endpoint
✅ Configure the webhook in the Nova Uptime settings
✅ Test the webhook with sample data
✅ Verify the webhook signature (HMAC-SHA256)
✅ Implement retry logic and idempotency
✅ Add rate limiting to the webhook endpoint
✅ Set up response timeout handling
✅ Build the incident workflow (Jira + status page + Slack)
✅ Test with a real incident or a forced failure
✅ Monitor webhook health and success rate
✅ Document the webhook endpoints for the team

Get started today

Webhooks turn monitoring from alerts alone into fully automated incident response.

If you use Nova Uptime, go to the domain settings and add your first webhook. Start simple: just log incidents to your database. Then add integrations one at a time.

Webhook documentation: Nova Uptime API documentation

Uptime Monitoring Webhooks and Integrations: Build Custom Workflows