Prometheus Issue Log 1: ArgocdClusterConnectionDown False Alerts

2025-08-25 Operations Prometheus, Monitoring

Background

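Recently our integrated Zabbix/Prometheus setup kept reporting an Argo CD cluster connection problem, even though the clusters looked healthy on the Argo CD page.
The original expression was: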

- alert: "Argocd Cluster Connection Down"
  expr: argocd_cluster_connection_status{status="down"} > 0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Argocd Cluster Connection Unhealthy (instance {{ $labels.instance }})
    description: "Service {{ $labels.name }} Argocd Cluster Connection is down.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

Solution

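Reference PR: https://github.com/argoproj/argo-cd/pull/7419
Explanation: argocd_cluster_connection_status used to return 0 when the connection was fine. The meaning was later inverted, so that 1 now means the connection is healthy. After upgrading Argo CD, a healthy cluster therefore reports 1, which the old rule (> 0) treats as a failure. The Prometheus rule was not updated along with the upgrade, and that is what caused the false alerts.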

After the change

- alert: "Argocd Cluster Connection Down"
  expr: argocd_cluster_connection_status < 1
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: Argocd Cluster Connection Unhealthy (instance {{ $labels.instance }})
    description: "Service {{ $labels.name }} Argocd Cluster Connection is down.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
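
To double-check the direction of the comparison before rolling the rule out, a promtool unit test can help. The following is only a minimal sketch: the file name, the cluster names, and the sample values are made up for illustration, and the old comparison is shown without its status label matcher, since the labels your Argo CD version actually exposes may differ.

```yaml
# Hypothetical test file, e.g. argocd_connection_test.yml.
# All series and label values here are illustrative, not real data.
evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Healthy cluster: reports 1 under the newer semantics.
      - series: 'argocd_cluster_connection_status{name="healthy-cluster"}'
        values: '1+0x5'
      # Unreachable cluster: reports 0.
      - series: 'argocd_cluster_connection_status{name="broken-cluster"}'
        values: '0+0x5'

    promql_expr_test:
      # Updated expression: only the unreachable cluster is returned.
      - expr: argocd_cluster_connection_status < 1
        eval_time: 5m
        exp_samples:
          - labels: 'argocd_cluster_connection_status{name="broken-cluster"}'
            value: 0
      # Old-style comparison (label matcher omitted for illustration):
      # it returns the healthy cluster, which is the false positive.
      - expr: argocd_cluster_connection_status > 0
        eval_time: 5m
        exp_samples:
          - labels: 'argocd_cluster_connection_status{name="healthy-cluster"}'
            value: 1
```

Assuming promtool is installed, this can be run with: promtool test rules argocd_connection_test.yml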

Problem solved.

Permalink: https://maydaychen.github.io/2025/08/25/Prometheus问题记录1-ArgocdClusterConnectionDown误报/

Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY 4.0 CN. Please credit the source when reposting!

Maydaychen
Developer & Infra engineer

Former Android/Vue developer, now working in infrastructure, focusing on monitoring and AWS