AWS Meetup Chicago
Who am I
Asaf Yigal, Co-Founder and VP Product @logz.io
Email: asaf@logz.io | Twitter: @asafyigal
Agenda • Why do we need Log analytics? • Intro to ELK • What is Logz.io • Installing ELK on your own • Our Architecture • EC2 machine comparison
Why do we need Log analytics?
Werner Vogels, AWS CTO: “Log Analytics is Fundamental for Building Cloud Applications”
Multiple Use Cases: Product Management, Business Analysis, Customer Success, BI, Monitoring, DevOps, IoT, Troubleshooting, Support, QA, IT Ops / ITOA, Compliance, SecOps, SIEM
Log-driven development • Errors, warnings, and exceptions • Metrics • Alerts • Dashboards
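The errors/metrics/alerts bullets become much easier downstream if logs are structured at the source. A minimal sketch (the helper name and fields are made up for illustration) of emitting JSON lines that Logstash can ship without any parsing:

```python
import json
import time

# Hypothetical helper: emit one JSON object per log line so that
# Logstash can ingest it without any grok parsing.
def json_log_line(level, message, **fields):
    record = {
        "@timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "level": level,
        "message": message,
    }
    record.update(fields)  # metrics, request ids, etc.
    return json.dumps(record, sort_keys=True)

print(json_log_line("ERROR", "payment failed", order_id=42, latency_ms=830))
```

Each line is a self-describing document, so dashboards and alerts can key directly on fields like `latency_ms` instead of regex-matching free text.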
Why Open Source
The Market Is Dominated by Open Source Solutions: over the past three years, the market has shifted attention from proprietary solutions to open source. ELK Stack: 400,000+ companies. Splunk, Sumo Logic, Loggly: ~20,000 companies. Graphite: 1M+ companies. *Based on Logz.io research
ELK Popularity
Intro to ELK
Logstash: streaming data ingestion, time normalization, field extraction
Elasticsearch: schema-less search DB, highly scalable
Kibana: visualization
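To make Logstash's "field extraction" concrete, here is a rough Python stand-in for what a grok filter does to an Apache access log line (the regex is a simplified sketch, not the real grok pattern):

```python
import re

# Simplified stand-in for grok's %{COMBINEDAPACHELOG}: pull named
# fields out of an unstructured Apache access log line.
APACHE_RE = re.compile(
    r'(?P<client_ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)'
)

def extract_fields(line):
    m = APACHE_RE.match(line)
    return m.groupdict() if m else None  # None = line didn't parse

line = '127.0.0.1 - - [01/Mar/2016:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 2326'
print(extract_fields(line))
```

Once fields like `status` and `path` exist, Elasticsearch can index them and Kibana can chart them; that transformation is the whole value of the Logstash stage.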
Open source ELK: pros and cons
+ Simple and beautiful: it’s simple to get started and play with ELK, and the UI is just beautiful.
+ Open source: the largest user base, with a vibrant community that supports and improves the product.
+ Fast. Very fast: built on the Elasticsearch search engine, ELK provides blazing-fast responses even when searching through millions of documents.
– Hard to scale: data piles up and organizations experience usage bursts; building elastic ELK deployments that scale up and down is super-complex.
– Poor security: logs include sensitive data, and open source ELK offers no real security solution, from authentication to role-based access.
– Not production-ready: building a production-ready ELK deployment is a great challenge; with hundreds of configurations and a large support matrix, keeping it always up is difficult.
Logz.io: Enterprise ELK Cloud Service
• Up and running in minutes: sign up and get insights into your data in minutes.
• Production-ready: predefined, community-designed dashboards, visualizations, and alerts are bundled and ready to provide insights.
• Infinitely scalable: ship as much data as you want, whenever you want.
• Alerts: a proprietary alerting system built on top of open source ELK turns ELK into a proactive system.
• Highly available: the data and the entire ingestion pipeline can sustain a full-datacenter outage without losing data or service.
• Advanced security: 360-degree security with role-based access and multiple security layers.
Installing ELK on your own
Prototype • Installing the ELK stack on a single server – 1 hr • Shipping one type of log – 1 hr • Log parsing – 2 hrs • Building a Kibana dashboard – 2 hrs • Total: 6 hours to a simple prototype
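For the "shipping one type of log" step, a minimal logstash.conf sketch, assuming an Apache access log at a hypothetical path and a local Elasticsearch (option names from the Logstash 2.x era; older versions used slightly different elasticsearch output settings):

```conf
input {
  file {
    path => "/var/log/apache2/access.log"   # hypothetical path
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]   # time normalization
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
```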
Turning ELK Production ready
OS-level optimization: Elasticsearch requires a lot of OS-level tuning in order to run properly.
Shard allocation: optimizing insert and query times can be tricky and requires a lot of attention.
Index management: because deletion is an expensive operation, index management is required for log analytics solutions.
Zone awareness: specific to AWS, and required to achieve high availability.
Cluster topology: Elasticsearch clusters require 3 master nodes, plus data nodes and client nodes.
Bulk insert optimization: optimizing insert throughput and latency.
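To illustrate the cluster-topology and zone-awareness points, a hedged elasticsearch.yml sketch for a dedicated master node (setting names from the Elasticsearch 1.x/2.x era; the AWS zone attribute assumes the cloud-aws plugin is installed):

```yaml
# Dedicated master node: coordinates the cluster but holds no shards
node.master: true
node.data: false

# Lock the JVM heap in RAM so the OS never swaps it out
bootstrap.mlockall: true

# Spread primaries and replicas across AWS availability zones
cluster.routing.allocation.awareness.attributes: aws_availability_zone
```

Data nodes flip the two `node.*` flags; client nodes set both to false and just route queries.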
Elasticsearch (2)
Capacity provisioning: need to account for log bursts and be able to provision enough capacity.
Archive (DR): snapshot the data to a separate repository for disaster recovery.
Mapping management: mapping conflicts and sync issues need to be detected and addressed.
Monitoring: Marvel does a good job but requires constant DevOps attention.
Curator: remove or optimize old indices.
Alias management: for better cluster control you need to define and use aliases.
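The Curator point boils down to selecting indices by the date embedded in their names. A simplified Python sketch of that selection logic (the function name is made up; the real Curator tool does much more):

```python
from datetime import datetime, timedelta

# Sketch of Curator-style retention: parse the date out of daily
# index names like "logstash-2016.03.01" and flag the old ones.
def indices_older_than(indices, days, today):
    cutoff = today - timedelta(days=days)
    old = []
    for name in indices:
        try:
            day = datetime.strptime(name, "logstash-%Y.%m.%d")
        except ValueError:
            continue  # skip indices that don't follow the daily pattern
        if day < cutoff:
            old.append(name)
    return old

today = datetime(2016, 3, 10)
names = ["logstash-2016.03.01", "logstash-2016.03.09", ".kibana"]
print(indices_older_than(names, 7, today))  # → ['logstash-2016.03.01']
```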
Data parsing: extracting values from text messages and enriching them with geo, user-agent, and other fields.
Logstash high availability: running Logstash in a cluster is not trivial.
Scalability: dealing with increased load on the Logstash servers.
Burst protection: logs tend to be bursty; a buffer such as Redis or Kafka is required in front of Logstash.
Rejection from Elasticsearch: Elasticsearch rejects about 1% of messages due to mapping issues, and this needs to be handled.
Configuration management: special infrastructure needs to be in place to allow config changes with no data loss.
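The "rejection from Elasticsearch" point deserves a sketch: rather than silently dropping the ~1% of messages that fail mapping, route them to a dead-letter queue for later reprocessing. A toy Python illustration (the index function and error type are stand-ins for the real bulk API and its mapping errors):

```python
# Route a document: index it if Elasticsearch accepts it, otherwise
# park it in a dead-letter queue instead of losing it.
def route(doc, index_fn, dead_letter_queue):
    try:
        index_fn(doc)          # stand-in for the bulk-index call
        return "indexed"
    except ValueError:          # stand-in for a mapping conflict error
        dead_letter_queue.append(doc)
        return "dead-lettered"

dlq = []
def fake_index(doc):
    # Simulated mapping: latency_ms must be numeric, as a mapping would demand
    if not isinstance(doc.get("latency_ms"), int):
        raise ValueError("mapper_parsing_exception")

print(route({"latency_ms": 12}, fake_index, dlq))      # → indexed
print(route({"latency_ms": "slow"}, fake_index, dlq))  # → dead-lettered
```

The dead-lettered documents can then be fixed (e.g. the field renamed or coerced) and replayed, which is why the architecture slide below includes a DLQ component.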
Security: Kibana has no protection by default; user authentication needs to be implemented.
Kibana high availability: running Kibana in a cluster for upgrades and high availability.
Role-based access: if you want to restrict access to certain information, this capability needs to be developed.
Alerts: alerting is not part of the open source stack.
Anomaly detection: basic anomaly detection is missing from Kibana.
Pre-canned dashboards: building dashboards and visualizations in Kibana is tricky and requires special knowledge.
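A common workaround for Kibana's missing authentication is to front it with a reverse proxy doing basic auth. A sketch (the hostname and htpasswd path are hypothetical; 5601 is Kibana's default port):

```nginx
# nginx in front of Kibana: adds the auth layer Kibana itself lacks
server {
    listen 80;
    server_name kibana.example.com;           # hypothetical hostname

    auth_basic           "Restricted";
    auth_basic_user_file /etc/nginx/htpasswd; # created with htpasswd(1)

    location / {
        proxy_pass http://127.0.0.1:5601;     # Kibana's default port
    }
}
```

This gives authentication but not role-based access: every authenticated user still sees every dashboard, which is why finer-grained access has to be built separately.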
Turning ELK production-ready: ~4–6 weeks of work
Maintenance
Upgrades: challenging to upgrade; need to be aware of backward compatibility.
Overall cluster health: monitor the health of the environment.
AWS issues: dealing with AWS stability issues.
Mapping conflicts: deal with mapping conflicts as they arise.
Personnel redundancy: need multiple people with deep knowledge of the stack.
Capacity increases: provision additional capacity and grow the cluster.
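Monitoring overall cluster health usually starts from the "status" field returned by GET /_cluster/health. A toy Python mapping from that status to an alerting decision (the action names are illustrative):

```python
# Map the cluster health status to an alerting decision.
def health_action(cluster_health):
    status = cluster_health.get("status")
    if status == "green":
        return "ok"
    if status == "yellow":   # replicas unassigned: degraded, not down
        return "warn"
    return "page"            # red (or unknown): primary shards missing

print(health_action({"status": "yellow", "unassigned_shards": 3}))  # → warn
```

The yellow/red distinction matters operationally: yellow means data is still fully available but resilience is reduced, while red means some data cannot be served at all.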
Our Architecture
[Architecture diagram] Components: HAProxy, listeners, Kafka, log engines (Logstash), S3, Elasticsearch, Play server, Curator, hot/cold migration, DLQ, alert engine, Kibana, API gateway, shard optimizer, cluster protection. Monitoring: ELK, Graphite, Nagios, etc.
Demo
AWS Server Comparison

Machine              | Number | TB/Day
m1.xlarge            | 4      | 0.6
i2.xlarge            | 4      | 1
c3.8xlarge           | 6      | 1.5
c4.2xlarge + 1TB EBS | 3      | 1.3
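The table works out to per-instance throughput, which is the number that matters when comparing instance types. A few lines of Python reproducing the arithmetic:

```python
# Per-instance throughput implied by the comparison table:
# total TB/day divided by the number of machines in each setup.
benchmarks = {
    "m1.xlarge": (4, 0.6),
    "i2.xlarge": (4, 1.0),
    "c3.8xlarge": (6, 1.5),
    "c4.2xlarge + 1TB EBS": (3, 1.3),
}
for machine, (count, tb_per_day) in benchmarks.items():
    print(f"{machine}: {tb_per_day / count:.2f} TB/day per instance")
```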
We’re Hiring • Technical Evangelist • Business Development • Marketing. Contact: jobs@logz.io
Questions?

Lessons Learned in Deploying the ELK Stack (Elasticsearch, Logstash, and Kibana)