
I would like to extract data continuously from some systemd log files. I have a regular expression that extracts what I'm interested in. What I'm not sure about is how to continuously digest that file. I'm using tokio to keep things asynchronous.

I'm wondering whether I should regularly (e.g. once per second) open the file and read into a buffer until the previously last line is matched, whether I can keep the file open (from my Python experience I don't think that's a good idea, but I'm not sure whether it's fine in Rust), or whether there is a more elegant way to achieve this.

Thanks a lot in advance!

You could look at notify, but it doesn't appear to support async, and the API looks tightly coupled to std::sync::mpsc, so it might be hard to make it work nicely with tokio. Commented Mar 27, 2022 at 1:21

2 Answers


Based on the comment from piojo, I found this to be a viable solution:

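```rust
use std::io::SeekFrom;
use std::time::Duration;
use tokio::fs::File;
use tokio::io::{AsyncReadExt, AsyncSeekExt};
use tokio::time;

// `follow` is just a wrapper name so the snippet compiles on its own.
async fn follow(path: &str) {
    let mut file = File::open(path).await.unwrap();
    // Wake up once per second; the first tick completes immediately.
    let mut interval = time::interval(Duration::from_millis(1000));
    let mut contents = vec![];
    let mut position: u64 = 0;
    loop {
        // Reuse the buffer instead of allocating a new one on every pass.
        contents.truncate(0);
        // Resume where the previous read stopped.
        file.seek(SeekFrom::Start(position)).await.unwrap();
        position += file.read_to_end(&mut contents).await.unwrap() as u64;
        // do_process(&contents)
        interval.tick().await;
    }
}
```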
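One thing the do_process placeholder has to cope with: a read may stop in the middle of a line the logger is still writing, so applying the regex to raw chunks can miss or split matches. A minimal sketch of a hypothetical helper (complete_lines is not part of the original answer) that hands back only finished lines and keeps an unfinished tail buffered for the next poll:

```rust
/// Hypothetical helper for the do_process step: appends freshly read bytes to
/// `pending`, returns every complete line, and leaves any unfinished trailing
/// line in `pending` so it can be completed on the next poll.
fn complete_lines(pending: &mut Vec<u8>, new_bytes: &[u8]) -> Vec<String> {
    pending.extend_from_slice(new_bytes);
    // Only bytes up to the last '\n' form complete lines.
    let Some(last_nl) = pending.iter().rposition(|&b| b == b'\n') else {
        return Vec::new(); // no complete line yet; keep everything buffered
    };
    let lines: Vec<String> = pending[..last_nl]
        .split(|&b| b == b'\n')
        .map(|line| String::from_utf8_lossy(line).into_owned())
        .collect();
    // Drop the consumed lines (and their newlines), keep the partial tail.
    pending.drain(..=last_nl);
    lines
}
```

Inside the loop you would keep one pending buffer alive across iterations and run your regular expression over each string returned by complete_lines(&mut pending, &contents).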

6 Comments

It's nice that you're posting your solution. BTW, instead of allocating a new vec in each loop iteration, you can call contents.truncate(0). And you should be able to set last_position += read_to_end() and not need an extra call.
And lastly, a common pattern is to make most or all of your functions return Result<_>, and within them call foo? instead of foo.unwrap(). That cascades any errors upwards, but if you decide you want to handle them, you can. (The functions may return different error types, though, so people use a helper library for error handling. I use anyhow and return anyhow::Result<_>.) But your code is fine! These are just tips.
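To illustrate the ? tip without pulling in a crate, here is a small sketch that swaps anyhow for Box<dyn Error> from the standard library, which plays a similar role: ? converts each different error type into the boxed one. The function read_number is a hypothetical example, not something from the thread.

```rust
use std::error::Error;
use std::fs;
use std::path::Path;

// Hypothetical helper: reads a file that holds a single number.
// `?` converts both std::io::Error and ParseIntError into Box<dyn Error>,
// so the caller decides whether to handle or propagate them.
fn read_number(path: &Path) -> Result<u64, Box<dyn Error>> {
    let text = fs::read_to_string(path)?; // may fail with std::io::Error
    let n = text.trim().parse::<u64>()?; // may fail with ParseIntError
    Ok(n)
}
```

With anyhow the signature would become anyhow::Result<u64> and the body could stay the same.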
Last comment (I promise): you should check your CPU usage. It's likely that if you don't use notify (or just let the thread sleep), the loop will peg one core at 100%.
Regarding the loop: it actually looks fine, since interval.tick() makes it wait asynchronously between reads instead of spinning. It's less elegant than notify, though.
What about this for eliminating the need for two variables? play.rust-lang.org/…
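The linked playground is cut off, so what it proposed isn't recoverable. One possible way to avoid tracking a separate position, assuming the file is only ever appended to, is to keep it open and let the file's own cursor do the bookkeeping: once read_to_end reaches end-of-file the cursor stays there, so the next call returns only newly appended bytes. This also suggests that keeping the file open is fine in Rust. The sketch uses std::fs::File for brevity; the same idea should carry over to tokio::fs::File with AsyncReadExt::read_to_end, but treat that as an assumption. read_appended is a hypothetical name.

```rust
use std::fs::File;
use std::io::{self, Read};

// Reads whatever was appended since the previous call. The File's cursor
// remembers where the last read stopped, so no separate position is needed.
// Assumes the file is only appended to (no rotation or truncation).
fn read_appended(file: &mut File, buf: &mut Vec<u8>) -> io::Result<usize> {
    buf.clear(); // reuse the allocation
    file.read_to_end(buf)
}
```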

Don't read into a buffer until the previously last line is matched. Instead, keep track of your position in the file, then seek to that position. You can check that the seek succeeds and that the text around that position still matches what you read last time, to make sure the file hasn't been rotated or truncated.
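A minimal sketch of that check, written synchronously with std for brevity. It assumes you keep a copy of the last bytes read before the saved position (called tail here, a hypothetical name, e.g. the last complete line). On Unix-like systems an already-open handle keeps pointing at the old file after a rename-based rotation, so the check is most useful on a handle freshly opened from the path. If it returns false, reopening and starting from offset 0 is one reasonable policy, but that choice is an assumption.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

// Returns Ok(true) if the bytes just before `position` still equal `tail`,
// which suggests the file was not truncated or replaced since the last read.
// Whenever the bytes are compared, the cursor ends up at `position`.
fn still_same_file(file: &mut File, position: u64, tail: &[u8]) -> io::Result<bool> {
    let len = tail.len() as u64;
    // A file shorter than our saved position has been truncated or replaced.
    if position < len || file.metadata()?.len() < position {
        return Ok(false);
    }
    file.seek(SeekFrom::Start(position - len))?;
    let mut buf = vec![0u8; tail.len()];
    file.read_exact(&mut buf)?;
    Ok(buf == tail)
}
```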

As for when to read, you can poll or you can use an API that notifies you of changes. I haven't used it, but I found the notify crate with a quick search. In addition to the crate documentation, the page here looks useful.
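For the polling option, a cheap variant is to check the file's length and only read once it has grown past the saved position, sleeping between checks so the loop doesn't peg a core. This is a synchronous sketch with a hypothetical function name and a bounded number of attempts so it terminates; in a tokio program you would await tokio::time::sleep or an interval tick instead of calling thread::sleep.

```rust
use std::fs::File;
use std::io;
use std::thread;
use std::time::Duration;

// Checks up to `attempts` times whether `file` has grown beyond `position`,
// sleeping `interval` between checks instead of busy-looping.
// Returns the new length, or None if the file did not grow.
fn wait_for_growth(
    file: &File,
    position: u64,
    attempts: u32,
    interval: Duration,
) -> io::Result<Option<u64>> {
    for attempt in 0..attempts {
        let len = file.metadata()?.len();
        if len > position {
            return Ok(Some(len));
        }
        if attempt + 1 < attempts {
            thread::sleep(interval);
        }
    }
    Ok(None)
}
```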

1 Comment

Thanks that worked! For the sake of completeness I'll post my solution based on your input. I'm happy to take any feedback :)
