Prometheus Up & Running

What Is Monitoring?

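Back in secondary school, one of my teachers told us that if you asked ten economists what economics means, you would get eleven answers. Monitoring suffers from a similar lack of consensus about what exactly it means. When I tell people what I do, they assume my job covers everything from keeping an eye on temperatures in factories, to employee monitoring (where I am the one finding out who was on Facebook during working hours), to detecting intruders on networks.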

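Prometheus was not built to do any of these things. It was built to help software developers and administrators operate production computer systems, such as the applications, tools, databases, and networks that back popular websites. So what does monitoring mean in this context? I like to narrow this kind of operational monitoring of computer systems down to four areas: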

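  • Alerting: Knowing when something is going wrong is usually the most important thing you want from monitoring. When it happens, you want the monitoring system to call in a human to take a look.

  • Debugging: Once a human has been called in, they need to investigate to determine the root cause and ultimately resolve whatever the issue is.

  • Trending: Alerting and debugging usually happen on time scales of minutes to hours. While less urgent, being able to see how your systems are used and how they change over time is also useful. Trends can feed into design decisions and processes such as capacity planning.

  • Plumbing: When all you have is a hammer, everything starts to look like a nail. At the end of the day, all monitoring systems are data processing pipelines. Sometimes it is more convenient to use part of your monitoring system for some other purpose than to build a custom solution. This is not strictly monitoring, but it is common in practice, so I like to include it.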

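Depending on who you talk to and their background, they may consider only some of these to be monitoring. This leads to many discussions about monitoring going around in circles, leaving everyone frustrated. To help you understand where others are coming from, I will take a brief look at the history of monitoring.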

Last updated 6 years ago
