Prometheus Up & Running

Alerting

Last updated 6 years ago

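Alerting has two parts. First, you add alerting rules to Prometheus, which define the logic and threshold values that constitute an alert. Second, the Alertmanager converts firing alerts into notifications such as emails, pages, and chat messages.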

Start by adding an alerting rule to Prometheus; we will trigger it by hand later. Create a file named rules.yml in the same directory as prometheus.yml:

groups:
- name: example
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m

Then add the following to prometheus.yml so that Prometheus loads the rules file and knows which Alertmanager to send alerts to:

rule_files:
  - rules.yml
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['localhost:9093']

Restart Prometheus, then stop the node_exporter. Querying up == 0 now returns the Node Exporter target, which shows that it is down.
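
The same check can be scripted against the Prometheus HTTP API instead of typed into the web page. The following is a minimal sketch rather than a definitive client: it assumes Prometheus listens on its default address localhost:9090, and the helper names (build_query_url, down_targets, fetch_down_targets) are invented for this illustration.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: default Prometheus address


def build_query_url(base_url, expr):
    """Return the instant-query URL for a PromQL expression."""
    return base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def down_targets(response):
    """Extract (job, instance) pairs from a decoded instant-query response."""
    if response.get("status") != "success":
        raise ValueError("query failed: %r" % (response,))
    return [
        (series["metric"].get("job", ""), series["metric"].get("instance", ""))
        for series in response["data"]["result"]
    ]


def fetch_down_targets(base_url=PROMETHEUS_URL):
    """Run `up == 0` against a live Prometheus and list the targets that are down."""
    url = build_query_url(base_url, "up == 0")
    with urllib.request.urlopen(url, timeout=5) as resp:
        return down_targets(json.load(resp))
```

Calling fetch_down_targets() while the Node Exporter is stopped should list that target; an empty list means every scraped target is currently up.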

The Alerts page of the Prometheus web interface shows that an alert has been generated.

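Until its condition has held for the full for duration, an alert stays in the pending state. Once that duration has elapsed it switches to firing and is sent to the Alertmanager. Next we deploy an Alertmanager to handle the alert: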
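
The pending-to-firing transition can be pictured as a small state machine. The sketch below is a simplified model for illustration only, not Prometheus's actual implementation; the AlertState class and its fields are hypothetical names, and the timestamps are plain seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    """Tracks one alert instance across rule evaluations (illustrative model)."""

    for_seconds: float                    # the rule's `for` duration, in seconds
    active_since: Optional[float] = None  # when the condition first became true

    def evaluate(self, condition_true: bool, now: float) -> str:
        """Return the alert state after one rule evaluation at time `now`."""
        if not condition_true:
            # The expression returned nothing: forget any pending timer.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


# Walk through the InstanceDown rule (for: 1m), evaluated every 15 seconds.
state = AlertState(for_seconds=60)
for t, is_down in [(0, False), (15, True), (30, True), (60, True), (75, True)]:
    print(t, state.evaluate(is_down, t))
```

In this model a single evaluation where the condition is false resets the timer, so the target has to be down continuously for the whole minute before the alert fires.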

$ wget https://github.com/prometheus/alertmanager/releases/download/v0.16.0/alertmanager-0.16.0.linux-amd64.tar.gz
$ tar -xzf alertmanager-*.linux-amd64.tar.gz
$ cd alertmanager-*.linux-amd64/

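Configure the Alertmanager by creating a file named alertmanager.yml with the following content. For the mail account you can register a 163 mailbox, enable POP3 and SMTP on it, and then turn on the client authorization password.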

global:
  resolve_timeout: 5m                               # declare an alert resolved if it is not updated within this time (default 5m)
  smtp_smarthost: 'smtp.163.com:25'                 # SMTP server used to send mail
  smtp_from: '177xxxx7xx6@163.com'                  # sender address
  smtp_auth_username: '177xxxx7xx6@163.com'         # mailbox account
  smtp_auth_password: 'xxxxxxxxx'                   # mailbox password or authorization code
route:
  receiver: example-email
receivers:
  - name: example-email                             # receiver name referenced by the route
    email_configs:                                  # email settings
      - send_resolved: true                         # also send an email when the alert resolves
        to: 'youraddress@xxx.com'
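
The receiver named under route has to match the name of an entry under receivers. As a rough illustration of that link, the sketch below checks a configuration that has already been parsed into a Python dict. The function undefined_receivers is a hypothetical helper, not part of any Alertmanager tooling, and the dict is written out by hand to avoid depending on a YAML library.

```python
def undefined_receivers(config):
    """Return receiver names used in the routing tree but missing from `receivers`."""
    defined = {receiver["name"] for receiver in config.get("receivers", [])}
    missing = []

    def walk(route):
        name = route.get("receiver")
        if name is not None and name not in defined and name not in missing:
            missing.append(name)
        # Child routes live under the `routes` key and may name other receivers.
        for child in route.get("routes", []):
            walk(child)

    walk(config.get("route", {}))
    return missing


# The relevant parts of alertmanager.yml as a YAML parser would return them.
config = {
    "route": {"receiver": "example-email"},
    "receivers": [
        {
            "name": "example-email",
            "email_configs": [{"send_resolved": True, "to": "youraddress@xxx.com"}],
        },
    ],
}

print(undefined_receivers(config))
```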

Then run the Alertmanager:

$ ./alertmanager
level=info ts=2019-01-18T03:23:25.618242308Z caller=main.go:174 msg="Starting Alertmanager" version="(version=0.15.3, branch=HEAD, revision=d4a7697cc90f8bce62efe7c44b63b542578ec0a1)"
level=info ts=2019-01-18T03:23:25.618299711Z caller=main.go:175 build_context="(go=go1.11.2, user=root@4ecc17c53d26, date=20181109-15:40:48)"
level=info ts=2019-01-18T03:23:25.620962633Z caller=cluster.go:155 component=cluster msg="setting advertise address explicitly" addr=172.25.0.4 port=9094
level=info ts=2019-01-18T03:23:25.622780056Z caller=main.go:322 msg="Loading configuration file" file=/etc/alertmanager/config.yml
level=info ts=2019-01-18T03:23:25.62283361Z caller=cluster.go:570 component=cluster msg="Waiting for gossip to settle..." interval=2s
level=info ts=2019-01-18T03:23:25.625547849Z caller=main.go:398 msg=Listening address=:9093
level=info ts=2019-01-18T03:23:27.623182959Z caller=cluster.go:595 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000272s
level=info ts=2019-01-18T03:23:35.624264613Z caller=cluster.go:587 component=cluster msg="gossip settled; proceeding" elapsed=10.001369295s
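
Once the Alertmanager is listening, one way to exercise the email path without waiting for Prometheus is to hand it an alert directly. This is a minimal sketch under stated assumptions: it assumes the running Alertmanager serves the v2 HTTP API at /api/v2/alerts (older releases may only offer a v1 API), and the label values and the helper names build_test_alert and send_alerts are made up for this example.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

ALERTMANAGER_URL = "http://localhost:9093"  # same address as in prometheus.yml


def build_test_alert(now, duration_minutes=5):
    """Build a one-element alert list in the JSON shape the alerts endpoint accepts."""
    return [
        {
            # Hypothetical label values; adjust them to match your own targets.
            "labels": {
                "alertname": "InstanceDown",
                "instance": "localhost:9100",
                "job": "node",
            },
            "annotations": {"summary": "Hand-made test alert"},
            "startsAt": now.isoformat(),
            "endsAt": (now + timedelta(minutes=duration_minutes)).isoformat(),
        }
    ]


def send_alerts(alerts, base_url=ALERTMANAGER_URL):
    """POST the alerts to a running Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        base_url + "/api/v2/alerts",  # assumption: v2 API is available
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as resp:
        return resp.status
```

For example, send_alerts(build_test_alert(datetime.now(timezone.utc))) submits an alert that expires after five minutes. Because send_resolved is true in the receiver, a second email should follow once it resolves.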

If the configuration is correct, an alert email similar to the one in the figure below will arrive in your inbox.

This basic setup gives you an idea of what Prometheus can do. You can add more targets to prometheus.yml, and your alert will automatically cover them as well.

In the next chapter I will focus on using the Prometheus client libraries to add instrumentation to your own applications.

The Alertmanager web interface is available at:

http://localhost:9093/