From #MonitoringSucks to #MonitoringLove (and back) @KrisBuytaert T-Dose 2015, Eindhoven, .nl
Kris Buytaert
● I used to be a Dev,
● Then Became an Op
● Chief Trolling Officer and Open Source Consultant @inuits.eu
● Everything is an effing DNS Problem
● Building Clouds since before the bookstore
● Organising Conferences
● Evangelizing devops
An opinionated talk about the Open Source Monitoring tooling landscape, in which I hope to learn from YOU
#devops=~C(L)AMS
● Culture
● (Lean)
● Automation
● Monitoring and Measurement
● Sharing
Damon Edwards and John Willis, Gene Kim
Monitoring is usually an afterthought: ENOBUDGET, ENOTIME
A 2008 OLS Paper
● We have bloated Java tools
● Some open core stuff
● DIY folks want traditional Nagios
● DBA Required
#monitoringsucks
● John Vincent (@lusis), June 2011
● A sub #devops movement
● https://github.com/monitoringsucks/
Why #monitoringsucks
● Manual config (GUI)
● Not in sync with reality
● Hosts only
● Services sometimes
● Application never
● Chaos or out of sync with reality
● Alert Fatigue
Let's forget about
● Tools with no (stable) API
● Tools with strong focus on GUI
● Unless you are an SME with < 100 nodes
● Zenoss, Hyperic, GroundWork, ....
● P.S.: don't even mention proprietary software to me
What we want
● Small, well suited components
  • Collect
  • Transport / Mangle
  • Store
  • Analyse
  • Act / Alert
  • Visualize
#monitoringlove
• Ulf Mansson #devopsdays Rome 2011
• A new era of tooling
• #monitoringlove hacksessions @inuits
• #monitorama
Icinga
• 2009 Fork
• I consider Nagios dead
• Vibrant Community (or they stalk me)
• Throw great parties in Nurnberg
• Nobody can pronounce it anyhow
• https://github.com/Inuits/puppet-icinga/
Automation
#monitoringlove But the love was about:
Sensu
● Awesome for non static environments
● Scaling a clustered RabbitMQ ?
● This is Europe, U no do cloud
Automation of #monitoring brought back the #love
Monitoring a service vs Monitoring a Service
definition of done: monitored and in production
A software project is not done until your last end user is dead
Culture, Automation, Measurement: measure all the things, Sharing
Deploy Statistics
● Time To Deploy
● Deploy Frequency
● Lifecycle frequency
● Map to other metrics
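Time to deploy and deploy frequency can be derived from nothing more than start and end timestamps per deploy. A minimal sketch, assuming a hypothetical record shape of `(started, finished)` datetime pairs; the function name and output keys are illustrative, not from any tool mentioned in this talk:

```python
from datetime import datetime, timedelta


def deploy_stats(deploys):
    """Summarise a list of (started, finished) datetime pairs (hypothetical shape)."""
    if not deploys:
        return {"count": 0, "mean_time_to_deploy_s": None, "deploys_per_day": None}
    # Time to deploy: how long each deploy took, in seconds.
    durations = [(end - start).total_seconds() for start, end in deploys]
    # Deploy frequency: deploys divided by the observed period, at least one day.
    first = min(start for start, _ in deploys)
    last = max(end for _, end in deploys)
    span_days = max((last - first).total_seconds() / 86400.0, 1.0)
    return {
        "count": len(deploys),
        "mean_time_to_deploy_s": sum(durations) / len(durations),
        "deploys_per_day": len(deploys) / span_days,
    }


if __name__ == "__main__":
    day0 = datetime(2015, 1, 1)
    sample = [
        (day0, day0 + timedelta(minutes=10)),
        (day0 + timedelta(days=2, minutes=-10), day0 + timedelta(days=2)),
    ]
    print(deploy_stats(sample))
```

Pushing these numbers into the same store as the system metrics is what makes the "map to other metrics" step possible.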
CollectD all the metrics, at high intervals
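What travels from a collector to Graphite is tiny. Carbon's plaintext protocol takes one `path value timestamp` line per datapoint, by default on TCP port 2003. A minimal sketch of that wire format; the helper names, the metric path and the `localhost` default are assumptions for illustration, and this is not how collectd itself is implemented:

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one datapoint in Carbon's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return "%s %s %d\n" % (path, value, ts)


def send_metrics(lines, host="localhost", port=2003):
    """Send pre-formatted lines to a Carbon plaintext listener (host is an assumption)."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Only prints the line; call send_metrics() once a Carbon listener is reachable.
    print(graphite_line("servers.web01.load.shortterm", 0.42), end="")
```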
Oldschool graphite
Self Service: Gdash based pipelines, Puppetized Templates (wip)
Gdash
Grafana
Graphite++
● Dashboards
  • Grafana
● Engines:
  • InfluxDB
  • Cyanite
Triggers on Graphs
● Export Java Metrics
● JMXTrans
● Export JMXConfigs
● Configure NRPE Check
● Export NagiosCheck
● Collect JMX Exports on JMXTransNode
● Graph 'em
● Collect Icinga Configs on Icinga
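The check at the end of that chain can be very small. A minimal sketch of a Nagios/Icinga-style threshold check over graphed values, assuming the datapoints were already fetched as a plain list in which `None` marks a missing sample; the function name and message format are illustrative. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the Nagios plugin convention:

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_latest(datapoints, warn, crit):
    """Return (exit_code, message) for the most recent non-missing value."""
    values = [v for v in datapoints if v is not None]
    if not values:
        # No data is not the same as healthy: report UNKNOWN.
        return UNKNOWN, "UNKNOWN - no datapoints"
    latest = values[-1]
    if latest >= crit:
        return CRITICAL, "CRITICAL - value %s >= %s" % (latest, crit)
    if latest >= warn:
        return WARNING, "WARNING - value %s >= %s" % (latest, warn)
    return OK, "OK - value %s" % latest


if __name__ == "__main__":
    import sys

    code, message = check_latest([120, 180, None], warn=150, crit=300)
    print(message)
    sys.exit(code)
```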
Aggregation
● Alert on streams
● Alert on aggregated metrics
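Alerting on an aggregate rather than on every raw sample is one way to cut noise. A minimal sketch, assuming a stream of numeric samples and a fixed-size sliding window; the class and function names are hypothetical and not taken from any tool named in this talk:

```python
from collections import deque


class WindowedMean:
    """Keep the last `size` samples and report their mean."""

    def __init__(self, size):
        self.values = deque(maxlen=size)

    def add(self, value):
        self.values.append(value)
        return sum(self.values) / len(self.values)


def alerts(stream, size, threshold):
    """Return the indexes at which the windowed mean exceeds the threshold."""
    window = WindowedMean(size)
    return [i for i, value in enumerate(stream) if window.add(value) > threshold]


if __name__ == "__main__":
    # A single spike of 10 among 1s does not trip a 3-sample window at threshold 5.
    print(alerts([1, 1, 10, 1, 1], size=3, threshold=5))
```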
Riemann
● I still don't get it ?
● Distributed Top
● Do you like Clojure ?
● Riemann Health plugin ?
● s/riemann-health/collectd/g;
● Output to graphite
Graphs to Knowledge
• Skyline
• Oculus
• Creating Information out of this data
• Big data
• Machine Learning
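Turning graphs into information starts with asking whether the latest point looks like the ones before it. A minimal sketch of one such test, a three-sigma check on the last datapoint. This is an illustration of the idea only, not Skyline's actual code, and the function name and default are assumptions:

```python
import statistics


def is_anomalous(series, sigmas=3.0):
    """Flag the last value if it is more than `sigmas` deviations from the earlier mean."""
    if len(series) < 3:
        # Too little history to say anything useful.
        return False
    history, latest = series[:-1], series[-1]
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # Perfectly flat history: any change at all stands out.
        return latest != mean
    return abs(latest - mean) > sigmas * stdev


if __name__ == "__main__":
    print(is_anomalous([10, 11, 9, 10, 11, 9, 50]))
    print(is_anomalous([10, 11, 9, 10, 11, 9, 10]))
```

A single test like this is noisy on its own; combining several simple tests and alerting only when most agree is one common way to reduce false positives.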
But I have log files..
Logs and Metrics
● Graylog2
● ELSA (Enterprise Log Search and Archive)
● ELK Stack
● Collect from anywhere
● Filter
● Send anywhere
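The collect, filter, send shape can be sketched as three chained generators. This is a toy illustration under stated assumptions: the `LEVEL message` line format, the regular expression and the in-memory list used as a sink are all hypothetical, and no real log shipper's API is used:

```python
import json
import re

# Hypothetical input format: an upper-case level, a space, then the message.
LINE_RE = re.compile(r"^(?P<level>[A-Z]+) (?P<message>.*)$")


def collect(lines):
    """Collect: yield raw lines from any iterable (file, socket wrapper, list)."""
    for line in lines:
        yield line.rstrip("\n")


def filter_events(raw_lines):
    """Filter: parse raw lines into structured events, dropping what does not match."""
    for raw in raw_lines:
        match = LINE_RE.match(raw)
        if match:
            yield {"level": match.group("level"), "message": match.group("message")}


def send(events, sink):
    """Send: serialise each event as JSON and append it to the sink."""
    for event in events:
        sink.append(json.dumps(event, sort_keys=True))


if __name__ == "__main__":
    out = []
    send(filter_events(collect(["ERROR disk full\n", "garbage line\n"])), out)
    print(out)
```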
APM: But what about my apps ? Half the world cheers about SaaS tools :(
Packetbeat
● Traffic Flow through network
● Transactions causing errors
● SQL per HTTP
● API call usage
PacketBeat
So your DC fails. Whom to alert when ?
'New' kids on the block
● Flapjack (flapjack.io): monitoring notification routing + event processing system
● OpenDuty (github.com/szechuen/OpenDuty): Duty management
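Notification routing boils down to matching an event against rules and collecting who should hear about it. A minimal sketch of that idea; the event and rule dictionaries, tag names and contact names are all hypothetical, and this is not the Flapjack or OpenDuty API:

```python
def route(event, rules):
    """Return the contacts whose rule matches the event's severity and tags, in rule order."""
    recipients = []
    for rule in rules:
        if event["severity"] in rule["severities"] and rule["tag"] in event["tags"]:
            for contact in rule["contacts"]:
                # De-duplicate so one person is not notified twice for the same event.
                if contact not in recipients:
                    recipients.append(contact)
    return recipients


if __name__ == "__main__":
    rules = [
        {"tag": "database", "severities": {"critical"}, "contacts": ["dba-oncall"]},
        {"tag": "production", "severities": {"warning", "critical"}, "contacts": ["ops-oncall"]},
    ]
    event = {"severity": "critical", "tags": {"database", "production"}}
    print(route(event, rules))
```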
My Alerting Strategy is still in beta
And back :(
In 2014 I'm still running the same check for
- service registration (consul)
- high availability (pacemaker/corosync)
- monitoring (icinga)
But I love where Monitoring is heading. We have far fewer false positives, and we have a Maintainable Monitoring Infra. Kinda.
Contact
Kris.Buytaert@inuits.eu
@krisbuytaert
Further Reading
http://www.krisbuytaert.be/blog/
http://www.inuits.eu/
Inuits
Duboistraat 50
2060 Antwerpen
Belgium
891.514.231
+32 475 961221

Open Source Monitoring in 2015