NOC Analyst | Observability Engineer | SRE Enthusiast
📍 Nairobi County, Kenya
Welcome to my GitHub! I'm a Network Operations Center (NOC) Analyst with 5+ years in financial services, specializing in observability, monitoring, and reliability engineering. I transform complex system data into actionable insights using modern monitoring stacks, automation, and SRE principles to ensure high-availability infrastructure.
- System Observability: Building comprehensive monitoring solutions with ELK, Zabbix, and Grafana
- Site Reliability Engineering: Implementing SLOs, error budgets, and automation for infrastructure resilience
- Security Operations: Integrating security monitoring into observability pipelines
- Infrastructure as Code: Automating deployment and monitoring with Python and Bash scripting
- Docker roadmap for RHEL environments: secure installs, SELinux considerations, and hardened defaults
- Covers images, networking, storage, logging, and baseline CI/CD patterns
- Practical checklists and commands for ops-ready RHEL container setups
- Trade analytics and import recommendation insights for the Kenyan market
- Data exploration, visualization, and signal extraction for decision support
- Modular notebooks/pipelines for reproducible analysis
- DevOps/SRE interview prep kit with curated Q&A, scenarios, and checklists
- Focus on observability, reliability, automation, and incident playbooks
- AI-augmented notes for faster review and retention
- Python-based detector for unused/orphaned files across 20+ file types
- CI-friendly JSON output, configurable rules, and heuristics for suspicious files
- Streamlit uptime/SLA analytics with Z-score + MAD spike detection
- Pushes SLA reports back to Zabbix; smart caching and history/trends auto-switching
- 99.8% average uptime across monitored services (tracked via custom Zabbix analytics)
- 12 minutes Mean Time to Detection (MTTD) using adaptive spike detection algorithms
- 22 minutes Mean Time to Resolution (MTTR) with automated correlation analysis
- 65% reduction in manual operational tasks through Python automation
- Zero false-positive alerts through intelligent filtering and MAD-based anomaly detection
- Production-grade SLA monitoring with automated daily uptime reporting to stakeholders
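
The arithmetic behind uptime and error-budget figures like the ones above can be sketched in a few lines. This is a minimal illustration, not the production tooling: the names `SlaReport` and `sla_report`, the 30-day window, and the 99.5% SLO target are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class SlaReport:
    uptime_pct: float         # observed availability over the window
    budget_minutes: float     # downtime the SLO target allows in the window
    remaining_minutes: float  # budget left (negative means the SLO is breached)


def sla_report(window_minutes: float, downtime_minutes: float,
               slo_target_pct: float) -> SlaReport:
    """Compute uptime percentage and error budget for one reporting window."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    if not 0 <= downtime_minutes <= window_minutes:
        raise ValueError("downtime_minutes must be between 0 and window_minutes")
    if not 0 < slo_target_pct <= 100:
        raise ValueError("slo_target_pct must be in (0, 100]")

    uptime_pct = 100.0 * (window_minutes - downtime_minutes) / window_minutes
    budget = window_minutes * (100.0 - slo_target_pct) / 100.0
    return SlaReport(uptime_pct, budget, budget - downtime_minutes)


if __name__ == "__main__":
    # Hypothetical inputs: a 30-day window with 86.4 minutes of downtime,
    # measured against an assumed 99.5% SLO target.
    report = sla_report(window_minutes=30 * 24 * 60,
                        downtime_minutes=86.4,
                        slo_target_pct=99.5)
    print(f"uptime: {report.uptime_pct:.2f}%")
    print(f"error budget: {report.budget_minutes:.1f} min, "
          f"remaining: {report.remaining_minutes:.1f} min")
```

With these example inputs the window is 43,200 minutes, so 86.4 minutes of downtime corresponds to 99.8% uptime, and a 99.5% target allows 216 minutes of downtime in total.
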
| Certification | Status |
|---|---|
| ISC2 Certified in Cybersecurity (CC) | ✅ Active |
| CompTIA Linux+ | ✅ Active |
| Cisco Cybersecurity Essentials | ✅ Active |
| AWS Solutions Architect | 🎯 In Progress |
- Advanced Analytics: Implementing Z-score and MAD algorithms for infrastructure anomaly detection
- SLA Engineering: Building comprehensive uptime calculation engines with automated reporting
- API Integration: Developing robust API clients with OAuth2, caching, and rate limiting
- Statistical Analysis: Cross-host correlation matrices for infrastructure pattern identification
- Performance Optimization: Auto-switching between Zabbix history/trends for optimal query performance
- Production Systems: Deploying enterprise-grade monitoring dashboards with 14-day log retention
- Code Quality Automation: Building dead code detection tools for automated repository maintenance
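
To make the Z-score and MAD approach mentioned above concrete, here is a minimal sketch using only the Python standard library. The function names, the thresholds (3.0 and 3.5), and the sample latency values are assumptions for illustration; they are not taken from the actual dashboards. The 0.6745 factor is the usual constant that makes the MAD-based score comparable to a standard Z-score for normally distributed data.

```python
import statistics
from typing import List, Sequence


def zscore_outliers(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Return indices whose classic Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values)
            if abs(v - mean) / stdev > threshold]


def mad_outliers(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return indices whose modified Z-score (median/MAD based) exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # more than half the values equal the median
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - median) / mad > threshold]


if __name__ == "__main__":
    # Hypothetical response times in milliseconds with one obvious spike.
    samples = [10, 11, 10, 12, 11, 10, 11, 95, 10, 11]
    print("z-score flags:", zscore_outliers(samples))
    print("MAD flags:    ", mad_outliers(samples))
```

On this small sample the spike inflates the standard deviation enough that its Z-score stays just under 3.0, so the classic check flags nothing, while the median-based check flags index 7. That robustness to the outlier itself is the usual reason to pair MAD with a Z-score.
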
I'm passionate about advanced observability engineering, statistical monitoring algorithms, and production-grade SLA systems. Whether you're interested in anomaly detection, automated uptime reporting, or building enterprise monitoring dashboards, I'd love to share insights and collaborate!
🔗 LinkedIn: josephkibaki
📧 Email: kibaki.joseph1@gmail.com
💬 Open to: Mentoring, Knowledge Sharing, SRE Discussions
"Observability is not about collecting data; it's about understanding your systems well enough to ask the right questions when things go wrong."
⚡ Building reliable systems, one metric at a time ⚡

