sysconfig/nagios3.git
Alex Dehnert [Sat, 25 Nov 2023 00:02:57 +0000 (00:02 +0000)]
vault: Check for recovery of the salt/vault integration more often

Alex Dehnert [Mon, 9 Oct 2023 04:32:03 +0000 (04:32 +0000)]
Add new hosts to hostgroups, remove shut down ones

Alex Dehnert [Mon, 9 Oct 2023 04:30:58 +0000 (04:30 +0000)]
masada: Start monitoring it

Long past time -- not sure why I missed it. Really we should make sure the KDC
is operational, but pingable is a start.

Alex Dehnert [Mon, 9 Oct 2023 04:30:20 +0000 (04:30 +0000)]
olinda: olinda is shut down, so disable most notifications

Alex Dehnert [Mon, 9 Oct 2023 04:27:48 +0000 (04:27 +0000)]
bots: Tweak what's monitored

I reinstalled bots, which prompted me to look more closely at how it is
monitored. Some old services no longer work (in some cases I think because
Hangouts is more or less shut down, in others perhaps because the migration
didn't go well), so disable notifications for those. I've also added some new
services, so monitor them.

Alex Dehnert [Mon, 9 Oct 2023 04:26:33 +0000 (04:26 +0000)]
Prefer hostnames over IPs

I feel like I flip-flop on this every couple years, and I'm not entirely sure
what prompted this one (maybe the question of how to reach bots?), but probably
I had a reason.

Alex Dehnert [Mon, 9 Oct 2023 04:33:30 +0000 (04:33 +0000)]
Add monitoring of new hosts

Monitoring of augsburg is very basic, but chankillo is more reasonable (and
heavily based off olinda)

Alex Dehnert [Mon, 9 Oct 2023 04:23:57 +0000 (04:23 +0000)]
Allow some services to have public notifications and others private

Alex Dehnert [Tue, 12 Sep 2023 18:24:00 +0000 (18:24 +0000)]
Replication is being checked locally, so apply to localhost

Otherwise standing up a monitoring service on chankillo led to misleading
outage messages claiming that olinda was broken. Citing "localhost" leaves you
to figure out which host it is, but at least it's obviously unclear.

Alex Dehnert [Sat, 5 Aug 2023 20:24:56 +0000 (20:24 +0000)]
Switch to instanced personals, not classed, for notifications

Zulip doesn't support classed personals, and these days I mostly read on Zulip.

(ref: adehnert-test-d)
Alex Dehnert [Sat, 15 Jul 2023 02:29:55 +0000 (02:29 +0000)]
Ignore (empty) htdigest.users for now

(ref: nagios4)
Alex Dehnert [Sun, 9 Jul 2023 05:05:03 +0000 (05:05 +0000)]
Merge remote-tracking branch 'origin/master' into nagios4

Alex Dehnert [Sun, 9 Jul 2023 05:03:42 +0000 (05:03 +0000)]
nagios4: Update Apache config

Alex Dehnert [Sun, 11 Jun 2023 19:06:51 +0000 (15:06 -0400)]
Add monitoring of backups

Alex Dehnert [Sat, 27 May 2023 18:33:20 +0000 (18:33 +0000)]
Add new stylesheets

Alex Dehnert [Sat, 27 May 2023 06:35:52 +0000 (06:35 +0000)]
Use dehnerts.com hostnames, not mit.edu

We trust the SSH CA for dehnerts.com, not mit.edu, so this avoids host
key verification failed errors.

I think the motivation not to do this was DNS downtime, but hopefully we
can solve that with redundant DNS.

Alex Dehnert [Sat, 27 May 2023 06:35:33 +0000 (06:35 +0000)]
More nagios4 updates

Alex Dehnert [Fri, 26 May 2023 07:58:11 +0000 (07:58 +0000)]
nagios4: Add new objects config, remove old extinfo

Alex Dehnert [Mon, 4 Oct 2021 15:19:42 +0000 (11:19 -0400)]
salt: Update Vault check to run elsewhere

Also, make it run less frequently (we're looking for expiry more than downtime)
and have a higher timeout (it seems to frequently take more than ten seconds).

Alex Dehnert [Sun, 3 Oct 2021 02:41:34 +0000 (22:41 -0400)]
wieliczka: Add a check that Salt can talk to Vault

Alex Dehnert [Mon, 27 Sep 2021 02:59:44 +0000 (22:59 -0400)]
roost-api: Add zephyr/zulip bridge monitoring

Alex Dehnert [Mon, 27 Sep 2021 02:59:27 +0000 (22:59 -0400)]
xidi: Add xidi (adehnert-pi4) monitoring

Alex Dehnert [Sun, 11 Jul 2021 21:40:38 +0000 (17:40 -0400)]
vault: Add a check for seal status

Alex Dehnert [Fri, 9 Jul 2021 00:23:49 +0000 (20:23 -0400)]
vault: Check that the vault server is responding with good cert

Alex Dehnert [Thu, 29 Apr 2021 00:21:34 +0000 (20:21 -0400)]
roost-api: Add check for HTTPS service

Alex Dehnert [Sat, 17 Apr 2021 07:46:00 +0000 (03:46 -0400)]
Send personal zephyrs, now that there's more content

Alex Dehnert [Sat, 17 Apr 2021 04:57:49 +0000 (00:57 -0400)]
Send long output by zephyrs

Alex Dehnert [Sat, 17 Apr 2021 04:56:02 +0000 (00:56 -0400)]
Check for ssh signing less often

We only try to re-sign every three days, so checking every two minutes is
really pretty excessive.
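
As a rough illustration of what a slower schedule can look like in Nagios object
configuration (a sketch, not this repository's actual definition): the host
name, service description, template, and command name below are hypothetical,
and the twelve-hour interval is only an example value. With the default
interval_length of 60 seconds, check_interval is measured in minutes.

```cfg
define service {
    use                     generic-service       ; assumed template name
    host_name               example-host          ; hypothetical host
    service_description     SSH cert signing      ; hypothetical description
    check_command           check_ssh_signing     ; hypothetical command
    check_interval          720                   ; 12 hours, assuming interval_length=60
    retry_interval          60                    ; recheck hourly while in a soft problem state
    max_check_attempts      3
}
```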

Alex Dehnert [Sun, 4 Apr 2021 04:03:14 +0000 (00:03 -0400)]
Add some more checks on salt minions

Among other things, this fixes sysconfig/salt#16.

Alex Dehnert [Sun, 28 Mar 2021 16:44:02 +0000 (12:44 -0400)]
Configure monitoring of salt minions

Alex Dehnert [Mon, 26 Aug 2019 06:37:36 +0000 (02:37 -0400)]
Add monitoring of dovecot replication

Alex Dehnert [Thu, 27 Jun 2019 05:31:49 +0000 (01:31 -0400)]
ESP has their own monitoring now

Alex Dehnert [Thu, 27 Jun 2019 05:27:32 +0000 (01:27 -0400)]
Switch to new IPs

Alex Dehnert [Sun, 5 May 2019 09:21:59 +0000 (05:21 -0400)]
New Apache/nagios config for xenial (16.04)

Alex Dehnert [Sun, 5 May 2019 03:14:10 +0000 (23:14 -0400)]
nagios config tweaks from upgrading to 16.04

Alex Dehnert [Sat, 20 May 2017 19:07:35 +0000 (15:07 -0400)]
New post-renumbering IP addrs at ET

Alex Dehnert [Sat, 21 Jan 2017 18:47:40 +0000 (13:47 -0500)]
Use novgorod's post-renumbering IP

Alex Dehnert [Tue, 19 Jan 2016 05:48:12 +0000 (00:48 -0500)]
Fix nagios checks

- HTTPS checks should use check_https_hostname, so that we use the hostname's
  vhost, not the IP address
- explicitly use .my.cnf for olinda's mysql check, rather than just setting the
  home dir (I don't know why that seems to have broken, but it has)
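
The two fixes above might look roughly like the command definitions below. This
is a sketch under assumptions, not the repository's config: the plugin path is
the usual Debian/Ubuntu location, the packaged check_https_hostname definition
may differ in detail, check_mysql_optfile is a hypothetical name, and the -f
option assumes a check_mysql version that can read a client options file.

```cfg
# HTTPS check that sends the host's name (so the right vhost answers) while
# still connecting to the configured address. This only helps if the host
# object's host_name is the name the vhost is served under.
define command {
    command_name    check_https_hostname
    command_line    /usr/lib/nagios/plugins/check_http --ssl -H '$HOSTNAME$' -I '$HOSTADDRESS$'
}

# MySQL check that reads credentials from an explicit options file passed as
# the first argument, e.g. check_mysql_optfile!/path/to/.my.cnf
define command {
    command_name    check_mysql_optfile           ; hypothetical name
    command_line    /usr/lib/nagios/plugins/check_mysql -H '$HOSTADDRESS$' -f '$ARG1$'
}
```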

Alex Dehnert [Tue, 19 Jan 2016 01:40:33 +0000 (20:40 -0500)]
Linerva has been dead for ages, so delete the config

Alex Dehnert [Tue, 19 Jan 2016 01:34:25 +0000 (20:34 -0500)]
New config options with Ubuntu 14.04's nagios

Alex Dehnert [Tue, 19 Jan 2016 01:33:50 +0000 (20:33 -0500)]
New stylesheets with Ubuntu 14.04's nagios

Alex Dehnert [Tue, 13 Oct 2015 03:55:09 +0000 (23:55 -0400)]
ESP: add RAID check

Alex Dehnert [Sat, 15 Feb 2014 20:31:29 +0000 (15:31 -0500)]
Set up repeat notifications for my outages

Alex Dehnert [Sat, 15 Feb 2014 18:49:24 +0000 (13:49 -0500)]
Lunatique is no more

Alex Dehnert [Wed, 31 Jul 2013 06:26:18 +0000 (02:26 -0400)]
Check the jabber server

Alex Dehnert [Sat, 6 Jul 2013 18:37:04 +0000 (14:37 -0400)]
Warn on olinda cert expiry only 10 days early

StartSSL doesn't want to renew my cert 14 days early, apparently. :(
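
For illustration, a certificate-expiry warning of this kind can be expressed
with check_http's -C option, which takes the minimum number of days the
certificate must remain valid. The command name, service description, and
template below are hypothetical, and the plugin path is an assumption; only the
host name and the 10-day figure come from the commit.

```cfg
define command {
    command_name    check_cert_days               ; hypothetical name
    command_line    /usr/lib/nagios/plugins/check_http --ssl -H '$HOSTADDRESS$' -C '$ARG1$'
}

define service {
    use                     generic-service       ; assumed template name
    host_name               olinda
    service_description     SSL certificate       ; hypothetical description
    check_command           check_cert_days!10    ; warn when under 10 days remain
}
```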

Alex Dehnert [Wed, 10 Apr 2013 17:49:41 +0000 (13:49 -0400)]
Bump the large queue threshold

I'm tuning out the 20/40 triggers, which means that I might as well not have
them.
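
A sketch of how queue thresholds are typically passed to check_mailq, assuming
the check runs on the mail host itself (locally or through NRPE) and that the
mailer is Postfix. The command name, host, and the new threshold values are
placeholders; the commit does not say what the thresholds were raised to.

```cfg
define command {
    command_name    check_mailq_thresholds        ; hypothetical name
    command_line    /usr/lib/nagios/plugins/check_mailq -w '$ARG1$' -c '$ARG2$' -M postfix
}

define service {
    use                     generic-service       ; assumed template name
    host_name               localhost             ; placeholder host
    service_description     Mail queue
    check_command           check_mailq_thresholds!100!200   ; placeholder values, up from 20/40
}
```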

Alex Dehnert [Thu, 28 Mar 2013 03:07:20 +0000 (23:07 -0400)]
Disable notifications on s-b

Ops has switched to a new set of dialups, so s-b will be down for a longish
while yet.

Alex Dehnert [Mon, 26 Nov 2012 07:44:12 +0000 (02:44 -0500)]
Bump the olinda mailq thresholds

Alex Dehnert [Wed, 14 Nov 2012 10:22:18 +0000 (05:22 -0500)]
CGI-related changes?

I think this may be the result of some package upgrade

Alex Dehnert [Wed, 14 Nov 2012 10:21:57 +0000 (05:21 -0500)]
Update ESP config for web access

Alex Dehnert [Sun, 26 Aug 2012 21:02:14 +0000 (17:02 -0400)]
Upstream updates (Lucid->Precise upgrade)

Alex Dehnert [Sat, 28 Jul 2012 19:58:50 +0000 (15:58 -0400)]
Push esp.mit.edu's cert expiry warning to 30 days

This uses "fake" default arguments for check commands. See
http://tracker.nagios.org/print_bug_page.php?bug_id=174 for the bug asking for
a good way to do them, and a workaround for how to do them with current nagios.
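
One general way to get default-like behaviour for a check argument is a custom
object variable set on a template and overridden per service. This is not
necessarily the workaround described in the tracker entry or the one used in
this commit; it is a sketch of a related technique, and every name in it
(template, command, variable, host object) is hypothetical.

```cfg
# The template carries the default; the command reads it through the
# $_SERVICE<NAME>$ custom-variable macro.
define service {
    name                    cert-expiry-service   ; hypothetical template
    use                     generic-service       ; assumed base template
    service_description     Certificate expiry
    check_command           check_cert_default
    _CERT_DAYS              14                    ; default warning window, in days
    register                0
}

define command {
    command_name    check_cert_default            ; hypothetical name
    command_line    /usr/lib/nagios/plugins/check_http --ssl -H '$HOSTADDRESS$' -C '$_SERVICECERT_DAYS$'
}

# A service that needs a longer warning window overrides only the variable.
define service {
    use                     cert-expiry-service
    host_name               esp                   ; hypothetical host object name
    _CERT_DAYS              30
}
```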

Alex Dehnert [Sat, 28 Jul 2012 19:41:08 +0000 (15:41 -0400)]
Notify ESP by email, too

Alex Dehnert [Thu, 26 Jul 2012 05:17:28 +0000 (01:17 -0400)]
Monitor SSL on olinda

Alex Dehnert [Thu, 26 Jul 2012 05:16:58 +0000 (01:16 -0400)]
Monitor lunatique pingability

Alex Dehnert [Sat, 31 Mar 2012 07:57:00 +0000 (03:57 -0400)]
Ignore brief spikes in disk due to backups(?)

Alex Dehnert [Fri, 9 Mar 2012 17:44:42 +0000 (12:44 -0500)]
Monitor linerva (especially disk)

Alex Dehnert [Sat, 1 Oct 2011 18:55:27 +0000 (14:55 -0400)]
Require more failed checks before alerting

Alex Dehnert [Sat, 1 Oct 2011 18:55:09 +0000 (14:55 -0400)]
Send zephyrs about esp to -c esp-auto, not -c esp

Alex Dehnert [Fri, 10 Jun 2011 01:30:34 +0000 (21:30 -0400)]
Increase the threshold for postfix queue alerts

Alex Dehnert [Mon, 25 Apr 2011 07:47:33 +0000 (03:47 -0400)]
Add Exim queue checks

Unfortunately, Exim doesn't allow non-admins to see the queues,
which makes it hard to actually use this check.

Alex Dehnert [Mon, 25 Apr 2011 07:25:42 +0000 (03:25 -0400)]
Check that we can connect to services on esp.mit.edu

Alex Dehnert [Sat, 23 Apr 2011 07:27:56 +0000 (03:27 -0400)]
Fix MySQL check

Alex Dehnert [Sat, 23 Apr 2011 07:08:07 +0000 (03:08 -0400)]
Expand monitoring of olinda

Add monitoring of:
* Postfix (SMTP connections, queue size)
* Dovecot (IMAPS connections)
* BIND (DNS port, I think)
* MySQL
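
The connection-level checks in the list above could be sketched with a single
generic TCP command, as below. This is an assumption-laden illustration rather
than the committed config: the command name and template are hypothetical, the
ports are the conventional defaults, and it only tests that each port accepts
connections. Protocol-aware plugins such as check_smtp, check_dns, and
check_mysql verify more, and queue size needs check_mailq running on the mail
host itself.

```cfg
define command {
    command_name    check_tcp_port                ; hypothetical name
    command_line    /usr/lib/nagios/plugins/check_tcp -H '$HOSTADDRESS$' -p '$ARG1$'
}

define service {
    use                     generic-service       ; assumed template name
    host_name               olinda
    service_description     SMTP
    check_command           check_tcp_port!25
}

define service {
    use                     generic-service
    host_name               olinda
    service_description     IMAPS
    check_command           check_tcp_port!993
}

define service {
    use                     generic-service
    host_name               olinda
    service_description     DNS port
    check_command           check_tcp_port!53
}

define service {
    use                     generic-service
    host_name               olinda
    service_description     MySQL
    check_command           check_tcp_port!3306
}
```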

Alex Dehnert [Wed, 20 Apr 2011 19:33:53 +0000 (15:33 -0400)]
Reconfigure esp so notifications go to -c esp

Alex Dehnert [Wed, 20 Apr 2011 19:22:38 +0000 (15:22 -0400)]
Add NRPE config

Alex Dehnert [Wed, 20 Apr 2011 19:21:53 +0000 (15:21 -0400)]
Monitor esp.mit.edu

Alex Dehnert [Wed, 20 Apr 2011 17:33:25 +0000 (13:33 -0400)]
Check every 2 minutes

Alex Dehnert [Wed, 20 Apr 2011 17:32:41 +0000 (13:32 -0400)]
Ignore the htpasswd file

Alex Dehnert [Wed, 20 Apr 2011 17:32:13 +0000 (13:32 -0400)]
Watch my machines --- novgorod, olinda

Alex Dehnert [Wed, 20 Apr 2011 17:31:47 +0000 (13:31 -0400)]
Watch scrubbing-bubbles

Alex Dehnert [Wed, 20 Apr 2011 17:31:18 +0000 (13:31 -0400)]
Add contacts (zephyr)

Alex Dehnert [Wed, 20 Apr 2011 17:30:19 +0000 (13:30 -0400)]
Add directory for site-specific config

This won't separate it all out, but it'll make me feel a bit better.

Alex Dehnert [Wed, 20 Apr 2011 17:29:25 +0000 (13:29 -0400)]
Fix line break in zephyr command

Alex Dehnert [Wed, 20 Apr 2011 15:46:43 +0000 (11:46 -0400)]
Add zephyr config

Alex Dehnert [Wed, 20 Apr 2011 15:44:46 +0000 (11:44 -0400)]
Add upstream nagios config