tiki-check.php: make all these 50+ values available to Nagios/Icinga/Shinken
- Status
- Open
- Subject
- tiki-check.php: make all these 50+ values available to Nagios/Icinga/Shinken
- Version
- 12.x
- Category
- Dogfood on a *.tiki.org site
- Feature request
- Feature
- Monitoring
- Resolution status
- New
- Submitted by
- Marc Laporte
- Volunteered to solve
- Marc Laporte
- Lastmod by
- Nelson Ko
- Rating
- Related-to
- Description
Use case 1:
- Customer has his own hosting (ex: dedicated or shared hosting)
- There is not just Tiki on that hosting
- Tiki consultant doesn't really control the hosting. The hosting provider may change the config to suit another app or during an upgrade.
- Tiki consultant sets up everything just nice thanks to Tiki Check
- Several months go by, all is well. Tiki Consultant is a hero.
- Hosting company (or someone else working on another app on the same server) performs an upgrade/change without telling anyone
- Several months go by, problems appear, there is dissatisfaction
- Customer considers this as within the warranty and expects the Consultant to fix without extra charge
- Tiki consultant feels: "hey, it worked when I left it"
- Tiki consultant doesn't remember the previous config and has no data on it, so can't explain the cause of the issues.
- Customer thinks Tiki was perhaps not such a good idea, as it's not handling the data load and there are all kinds of quirks
- Customer gets told, "why didn't you use system XYZ instead?"
In an alternate reality, Tiki Consultant sets up a Nagios/Icinga/Shinken instance to track all Tiki sites he has been associated with. Data is logged quietly in the background. When an issue is reported, he can look at the historical data, see what changed, and have a clue about the cause. As a bonus, he can show the customer that the hosting company made changes to the server without advising anyone. Tiki Consultant is a hero (and can bill that time), and the hosting company is not. If Nagios/Icinga/Shinken could also alert the Tiki Consultant when something changes, he could review each change and evaluate whether it is likely to cause issues.
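The "see what changed" step could be sketched roughly as follows. This is only an illustration: it assumes the values from tiki-check.php have already been stored somewhere as simple key/value snapshots, and the function name `diff_snapshots` and the sample settings are hypothetical, not real tiki-check.php output.

```python
def diff_snapshots(old, new):
    """Return {key: (old_value, new_value)} for every setting that differs."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        before = old.get(key)  # None means the key was absent in that snapshot
        after = new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


# Hypothetical snapshots, taken when the site was delivered and when the issue was reported.
last_month = {"php_version": "5.3.27", "memory_limit": "128M", "max_execution_time": "30"}
today = {"php_version": "5.5.3", "memory_limit": "64M", "max_execution_time": "30"}

for key, (before, after) in diff_snapshots(last_month, today).items():
    print(f"{key}: {before} -> {after}")
```

With snapshots kept per day, the same comparison between consecutive days would point at the date on which the server changed.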
Use case 2: run on all *.tiki.org sites to help reliability.
Use case 3: run on pre-dogfood servers and if we notice something went awry (ex: requires more RAM), we have a clear indication of which day the commit came in.
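For all three use cases, the results would have to reach the monitoring system in the form it expects. Nagios-compatible plugins report a state through their exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus one line of status text. A minimal sketch of that mapping is below; the result labels `pass`/`warn`/`fail`, the check names, and the function `summarize` are assumptions made for illustration and do not reflect how tiki-check.php actually reports its results.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Assumed ordering from least to most severe, used to pick the overall state.
SEVERITY = [OK, UNKNOWN, WARNING, CRITICAL]

# Hypothetical per-check result labels mapped to Nagios states.
STATE_FOR_RESULT = {"pass": OK, "warn": WARNING, "fail": CRITICAL}


def summarize(results):
    """Return (exit_code, status_line) for a dict of check name -> result label."""
    if not results:
        return UNKNOWN, "UNKNOWN - no check results received"
    # Any label we do not recognize is reported as UNKNOWN rather than guessed.
    states = {name: STATE_FOR_RESULT.get(res, UNKNOWN) for name, res in results.items()}
    worst = max(states.values(), key=SEVERITY.index)
    problems = sorted(name for name, state in states.items() if state != OK)
    detail = ", ".join(problems) if problems else f"all {len(states)} checks passed"
    return worst, f"{LABELS[worst]} - {detail}"


code, line = summarize({"memory_limit": "warn", "pdo_mysql": "pass", "gd": "fail"})
print(code, line)
```

A real plugin would print the status line and then exit with the returned code, so that Nagios/Icinga/Shinken can record the state and notify on changes.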
We have 50+ beautiful checks in tiki-check.php. Surely it can't be hard to make them accessible to an outside monitoring system?
- Files
- Solution
OK this task is kind of broad and unactionable, so I have broken it down into actionable sub-tasks. It is more of an ongoing thing anyway.
1) Document the state of testing on the tiki.org sites, collect historical information, and produce a monthly report on pass/fail for the various checks. (Some of these checks might fail a lot of the time without indicating a problem, but collecting the historical info is useful for tuning the tests in the future; notification can be disabled for most of these checks.)
2) If possible, provide easy access to the monitoring dashboard for tiki.org sites.
3) Increase the number of checks on the Tiki.org sites. There is a scalability issue here: it might require more machines, and changi's shinken setup might scale better than amette's icinga setup. Anyway, the more checks the better, even if only to better understand how Tiki sites behave in various situations and across the measured parameters.
4) Organize a TMIT webinar on "how we set up a Tiki monitoring infra". Will invite all sysadmins to describe/present their setup. The idea is not to do an academic study of the umpteen ways to do such a thing, but just to present what has been done.
I will create separate tasks for each of these.
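As a rough illustration of sub-task 1, the monthly pass/fail report could be produced by aggregating the collected history per month and per check. The sketch below assumes the history is available as `(date, check name, passed)` records; that record layout, the function name `monthly_report`, and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_report(records):
    """Aggregate (day, check_name, passed) records into per-month pass/fail counts."""
    counts = defaultdict(lambda: {"pass": 0, "fail": 0})
    for day, check, passed in records:
        month = day.strftime("%Y-%m")
        counts[(month, check)]["pass" if passed else "fail"] += 1
    return dict(counts)


# Hypothetical history for a single check.
records = [
    (date(2013, 8, 21), "memory_limit", True),
    (date(2013, 8, 22), "memory_limit", False),
    (date(2013, 9, 1), "memory_limit", True),
]

for (month, check), tally in sorted(monthly_report(records).items()):
    print(month, check, tally["pass"], "passed,", tally["fail"], "failed")
```

Checks that fail often without any real problem would stand out in such a report, which is the information needed to tune them or disable their notifications.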
- Importance
- 9
- Easy to solve?
- 7
- Priority
- 63
- Demonstrate Bug on Tiki 19+
-
- Demonstrate Bug (older Tiki versions)
-
- Ticket ID
- 4680
- Created
- Wednesday 21 August, 2013 07:21:13 UTC by Marc Laporte
- LastModif
- Saturday 06 July, 2024 10:21:44 UTC