Cédric Villemain b6efd330f5 [FIX] hostname was used to build the metrics filename
Changed to use the hashed hostname instead.
If the infosec class of get_host_name_hashed() is changed to restricted
then name is left empty.

This is a subtle breakage of function layers: we do not expect to use
get_ functions outside of collect_ functions but it's very convenient
here and is accepted as an exception.
2024-07-10 17:27:35 +02:00
docs Update parameter name from --change-policy to --update-policy 2024-07-10 17:27:21 +02:00
scenarios.d/foundation_public Adding initial code for pg_benchmark collect 2024-06-27 14:38:26 +02:00
test [FIX] hostname was used to build the metrics filename 2024-07-10 17:27:35 +02:00
Dockerfile Adding initial code for pg_benchmark collect 2024-06-27 14:38:26 +02:00
INSTALL.md Reduce complexity around storage 2024-07-10 17:27:07 +02:00
LICENSE Adding initial code for pg_benchmark collect 2024-06-27 14:38:26 +02:00
Makefile Adding initial code for pg_benchmark collect 2024-06-27 14:38:26 +02:00
pg_benchmark.sh [FIX] hostname was used to build the metrics filename 2024-07-10 17:27:35 +02:00
README.md Reduce complexity around storage 2024-07-10 17:27:07 +02:00

pg_benchmark

pg_benchmark is designed to help in executing benchmarks.

A benchmark is a set of information that is collected and analyzed in order to report on the conformance or state of a system.

Benchmarks and their processing are registered in pg_benchmark as "scenarios".

pg_benchmark is released under the PostgreSQL License.

The main pg_benchmark repository is located at https://git.data-bene.io/PostgreSQL/pg_benchmark.

Purpose

The purpose of pg_benchmark is to provide a convenient tool for running user-defined scenarios, focusing on PostgreSQL and associated technologies.

Scenarios MUST respect one fundamental rule: privacy first. By default, user data MUST NOT be inspected. pg_benchmark provides helpers to define precisely what scenarios do and what they are allowed to do.

This makes it possible to run pg_benchmark even in restricted environments (those holding health or finance data, for example).

Install

See INSTALL.md for installation instructions and guidance.

Collecting data

pg_benchmark collects only information of technical interest and keeps away from private or sensitive information as much as possible.

The collected information may be written to one or more files. We call the result a collection, whatever the number of files.

At the end of the collecting process, an archive (compressed tar) of the collection is generated.
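
To review an archive before sharing it, its contents can be listed without extracting anything. The following is a minimal sketch, assuming a tar implementation that detects the compression format when reading (GNU tar and bsdtar both do). The function name list_collection is hypothetical and not part of pg_benchmark, and the archive's exact name and extension are not specified here.

```shell
# List the files stored in a collection archive without extracting them.
# Assumption: tar detects the compression format on read (GNU tar, bsdtar).
list_collection() {
    archive=$1
    if [ ! -f "$archive" ]; then
        echo "no such archive: $archive" >&2
        return 1
    fi
    # -t: list the contents, -f: read from the given archive file
    tar -tf "$archive"
}
```

Calling list_collection with the path of the generated archive prints one stored path per line.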

Collected data

A collection is composed of 2 parts:

  • benchmark metadata,
  • benchmark data.

You can evaluate what is collected:

  • commands executed using verbosity and dry-run mode: ./pg_benchmark.sh --dry-run -vv
  • information collected using verbosity: ./pg_benchmark.sh -vv

Benchmark metadata

  • Identification of the collection

    • collection time
    • host, ip, system-uuid
    • PostgreSQL cluster name
  • Basic metrics to get an idea of what the server is:

    • Physical or Virtual machine
    • CPU, RAM, Disk Sizes
    • PostgreSQL cluster size, shared buffers, max_connections

Note: the connection string conninfo is never included in the collection metadata.

Benchmark data

pg_benchmark is able to collect the following items.

OS (Linux)

Currently, we collect at most:

  • OS name and release
  • /proc/cpuinfo
  • /proc/meminfo, transparent huge pages
  • sysctl

Note: this information is collected as is (i.e., without being redacted).
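
Because these files are not redacted, it may be worth scanning a collection for strings you consider sensitive (an internal domain name, for example) before sharing it. The following is a minimal sketch, assuming the collection is available as a directory and that grep supports recursive search (GNU and BSD grep do). The function name scan_collection is hypothetical and not part of pg_benchmark.

```shell
# Print the files under a collection directory that contain any of the given
# fixed strings (case-insensitive).
# Returns 0 if at least one match was found, 1 otherwise.
scan_collection() {
    dir=$1
    shift
    found=1
    for needle in "$@"; do
        # -r: recurse, -l: print matching file names only,
        # -i: ignore case, -F: treat the pattern as a fixed string
        if grep -rliF -e "$needle" "$dir"; then
            found=0
        fi
    done
    return "$found"
}
```

For example, scan_collection ./some.collection internal.example prints every file that mentions that string, so it can be reviewed by hand.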

PostgreSQL

Currently, we collect at most:

  • Server version
  • Grand Unified Configuration (GUC)
  • Tablespaces
  • Overridden GUCs for roles in databases
  • Statistics tables (pg_stat_* and pg_statio_*), excluding pg_stats and pg_statistic

Running the tool

We encourage you to run man -l docs/pg_benchmark.1 to see the available arguments.

We advise running this tool on the PostgreSQL server itself, because OS-level metrics are collected.

Scenarios

Foundation (default)

The goal is to collect data to ensure PostgreSQL is installed and configured in a way that does not put data durability at risk. Backup plans and procedures are not checked here.

It is expected to be run only once, but a new run can be executed after a hardware, system, or software upgrade or update.

pg_benchmark

pg_benchmark collect

pg_benchmark collect --scenario=foundation

Storage

The collected data will be written to:

hashedhostname_scenario_timestamp.collection (and inside a folder of the same name if needed).

The default value for the store (--store) is the current directory.
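
The naming scheme above can be sketched as follows. This is only an illustration: the hash function (SHA-256 here), the timestamp format, and the function name collection_name are assumptions, and the value produced by pg_benchmark's own get_host_name_hashed() may differ.

```shell
# Build a name of the form hashedhostname_scenario_timestamp.collection.
# Assumptions (not taken from pg_benchmark): SHA-256 as the hash function,
# and a UTC timestamp formatted as YYYYmmddTHHMMSSZ.
collection_name() {
    host=$1
    scenario=$2
    # Use the third argument as the timestamp if given, else the current time.
    timestamp=${3:-$(date -u +%Y%m%dT%H%M%SZ)}
    # Hash the hostname so the clear-text name never appears in the file name.
    hashed=$(printf '%s' "$host" | sha256sum | cut -d' ' -f1)
    printf '%s_%s_%s.collection\n' "$hashed" "$scenario" "$timestamp"
}
```

For instance, collection_name "$(hostname)" foundation prints a name whose first component is a 64-character hexadecimal digest rather than the hostname itself.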

Connection string

The «--conninfo» option lets you provide a connection string used to access PostgreSQL. Please read the PostgreSQL documentation at https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING for more information.
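
For illustration, a connection string in the libpq keyword/value format can be assembled as shown below. The helpers quote_conn_value and build_conninfo are hypothetical and not part of pg_benchmark, and the host, port, database, and user values are placeholders.

```shell
# Quote one value for a libpq keyword/value connection string: wrap it in
# single quotes and escape backslashes and single quotes with a backslash.
quote_conn_value() {
    printf "'%s'" "$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g")"
}

# Assemble "host=... port=... dbname=... user=..." from four arguments.
build_conninfo() {
    printf 'host=%s port=%s dbname=%s user=%s\n' \
        "$(quote_conn_value "$1")" "$(quote_conn_value "$2")" \
        "$(quote_conn_value "$3")" "$(quote_conn_value "$4")"
}
```

The result could then be passed as --conninfo="$(build_conninfo localhost 5432 postgres postgres)", assuming the option accepts the same --option=value form shown for --scenario.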

Collecting

The simplest way to run the «foundation» scenario is:

pg_benchmark

The help can also be useful:

pg_benchmark -h
pg_benchmark --help