Configuration

Repairman has two scopes of configuration, internally it’s called a policy. Application policy is a default policy for each container, and a Regular Policy is a per-container policy that mixes Application policy + container specific modifications.

Example:

  • Application global policy has time between restarts equal to 180 and 3 maximum restarts
  • The container can modify some values, ex. will want to have 2 maximum restarts instead of 3 restarts

Reference

Parameters
in shell as docker env variable as a docker label description
–debug NONE NONE Console debugging mode
–interval CHECK_INTERVAL NONE How often in seconds to check all containers
–namespace NAMESPACE NONE Containers prefix (ex. compose env name)
–seconds-between-restarts DEFAULT_SECONDS_BETWEEN_RESTARTS org.riotkit.repairman.seconds_between_restarts Seconds to wait until next try
–frame-size-in-seconds DEFAULT_FRAME_SIZE org.riotkit.repairman.frame_size_in_seconds Frame size (time frame in which max restarts can occur)
–max-restarts-in-frame DEFAULT_MAX_RESTARTS_IN_FRAME org.riotkit.repairman.max_restarts_in_frame Maximum restarts in given time (frame)
–seconds-between-next-frame DEFAULT_SECONDS_BETWEEN_NEXT_FRAME org.riotkit.repairman.seconds_between_next_frame Time between frames (for longer wait)
–max-checks-to-give-up DEFAULT_MAX_CHECKS_TO_GIVE_UP org.riotkit.repairman.max_checks_to_give_up After this number, the service will not be monitored
–max-historic-entries DEFAULT_MAX_HISTORIC_ENTRIES org.riotkit.repairman.max_historic_entries Technically, how many events to remember
–enable-cleaning-duplicated-services ENABLE_CLEANING_DUPLICATED_SERVICES org.riotkit.repairman.enable_cleaning_duplicated_services Remove services with hash prefix created by compose
–enable-autoheal DEFAULT_ENABLE_AUTO_HEAL org.riotkit.repairman.enable_autoheal Enable healing of unhealthy and exited containers
–http-address HTTP_ADDRESS NONE Web server address ex. 0.0.0.0 or 127.0.0.1
–http-port HTTP_PORT NONE Web server port ex. 80 or 8080
–http-prefix HTTP_PREFIX NONE Web server path prefix ex. /something or /SgbaCaVyewq
–notify-url DEFAULT_NOTIFY_URL org.riotkit.repairman.notify_url Slack/Mattermost notification url
–notify-level DEFAULT_NOTIFY_LEVEL org.riotkit.repairman.notify_level Notify level ex. DEBUG, INFO, WARNING
–db-path DB_PATH NONE Path to sqlite3 database or “:memory:”
NONE TZ NONE Docker container timezone ex. Europe/Warsaw
NONE DOCKER_HOST NONE Docker host address or socket
NONE DOCKER_TLS_VERIFY NONE Verify the host against a CA certificate.
NONE DOCKER_CERT_PATH NONE Path to directory with certificates

Concept of frames and timing

Frame is a time defined by –frame-size-in-seconds, ex. 5 minutes. In this time given service can be restarted only –max-restarts-in-frame, if it still fails, then it needs to wait –seconds-between-next-frame to next restart try.

Cleaning up duplicated services

When a v2tec/watchtower container is updating a service its starting a container with new image version. After compose up, the container is created twice. The –enable-cleaning-duplicated-services resolves this problem by stopping and removing a container with hash prefix.

Changes between restarts

Repairman uses SQLite3, by default a in-memory database is used - :memory:, but it is not a problem to use a persistent database by changing the –db-path

Notifications

Notifications can be sent to Slack/Mattermost. There are three levels of verbosity. Do not confuse with –debug

Verbosity levels:

  • DEBUG: Each container restart info, maximum restarts limit reached in frame, multiple restart failure info, configuration error
  • INFO: Multiple restart failure info, configuration error, maximum restarts limit reached in frame
  • WARNING: Configuration error, maximum restarts limit reached in frame