Up-to-date public log-based datasets with labels for new attacks are hard to find. However, there are some older log-based datasets for known attacks (e.g., SQL injection, XSS) within web logs or HTTP requests, in the context of Web-server Log Anomaly Detection (WLAD), if that fits your needs.
Please see Table II in this paper:
Majd, Mehryar, et al. "A Comprehensive Review of Anomaly Detection in Web Logs." 2022 IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT). IEEE, 2022.
Context: Web-server Log Anomaly Detection (WLAD)
Here the authors collected recent approaches from the literature they reviewed in the cybersecurity domain, including the datasets of web logs or HTTPS requests that each one used. As you can see in this table, one of the most recent papers, from Amazon, used HTTP CSIC 2010 and ISCX IDS 2012 in its approach, which, as I mentioned, are old public datasets.
I would also like to share some older conversations on RG that you might look at:
- How can I get data set of SQL injections for research purposes?
- Can I find benchmark dataset that contain Web Access log server infected by HTTP Flooding attacks?
- What are the tools for detecting SQL Injection Attack (SQLiA) in Enterprise Web Application?
There are also old posts at https://security.stackexchange.com/:
- Datasets dedicated for SIEM systems [closed]
- Looking for large marked datasets in cyber security [closed]
- Public Availability of a good Dataset in PCAP (TCPDUMP) format for IDS/IPS testing [closed]
Some related GitHub repositories:
Recent surveys:
- Landauer, Max, et al. "Deep learning for anomaly detection in log data: A survey." Machine Learning with Applications 12 (2023): 100470.
- Le, V. H., and H. Zhang. "Log-based anomaly detection with deep learning: How far are we?" Proceedings of the 44th International Conference on Software Engineering. 2022, pp. 1356-1367.