11

When building my Haskell project locally using stack build, only the changed source files are re-compiled. Unfortunately, I am not able to make Stack behave like this on GitHub Actions. Any suggestions please?

Example

I created a simple example with Lib.hs and Fib.hs. I even checked that the cached .stack-work folder is updated between builds, but it always compiles both files even when just one is changed.

Here is the example:

  1. (no cache used, builds both Lib.hs and Fib.hs + dependencies): https://github.com/MarekSuchanek/stack-test/runs/542163994
  2. (only Lib.hs changes, builds both Lib.hs and Fib.hs): https://github.com/MarekSuchanek/stack-test/runs/542174351

I can observe from the logs (verbose Stack) that something in the cache is being updated, but it is not at all clear to me what and why. It correctly finds out that only Lib.hs has changed: "stack-test-0.1.0.0: unregistering (local file changes: src/Lib.hs)", so I can't understand why everything gets compiled. I noticed that in run 2, Fib.hi is not updated in .stack-work but the others (Fib.o, Fib.dyn_hi, and Fib.dyn_o) are.

Note

Caching of ~/.stack works fine, and nothing is built when no source file is changed. Of course, this is a dummy example, but we have different projects with many more source files where it would significantly speed up the build. When a non-source file is changed (e.g. the README file), nothing is built, as expected.

2
  • 1
    As I see nobody knows how Stack actually "works" 😁 Commented Apr 12, 2020 at 13:09
  • 1
    See the answer I provided ;) I guess some people have an idea on how it works. ;P Commented Apr 13, 2020 at 21:03

2 Answers

7

The culprit is that stack uses timestamps (as many other tools do) to figure out whether a source file has changed. When you restore the cache on CI correctly, none of the dependencies get rebuilt. The problem with the source files is that when the CI provider clones the repo for you, the timestamps of all files in the repo are set to the date and time of the clone.
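The timestamp effect can be observed without stack at all. The following is a minimal sketch, assuming git and GNU stat are available; the throwaway repository, the demo identity, and the backdated commit are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: a fresh clone stamps files with the clone time, not the commit time.
set -eu

work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT

# Hypothetical throwaway repository with one commit backdated to
# 2020-01-01 00:00:00 UTC (1577836800 in git's internal date format).
git init -q "$work/origin"
echo 'module Lib where' > "$work/origin/Lib.hs"
git -C "$work/origin" add Lib.hs
GIT_AUTHOR_DATE='1577836800 +0000' GIT_COMMITTER_DATE='1577836800 +0000' \
  git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com \
  commit -q -m 'Add Lib.hs'

# A fresh clone, roughly what a CI checkout does.
git clone -q "$work/origin" "$work/clone"

# Committer time of the last commit touching the file vs. mtime on disk.
commit_time="$(git -C "$work/clone" log -1 --format=%ct -- Lib.hs)"
file_mtime="$(stat -c %Y "$work/clone/Lib.hs")"  # GNU stat; BSD/macOS needs `stat -f %m`

echo "commit time: $commit_time"
echo "file mtime:  $file_mtime"
```

The file's modification time reflects when the clone happened, so any tool that compares it against a cached build sees every file as newer than the cache.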

Hopefully the cause for recompilation of unchanged source files makes sense now. How do we work around this problem? The only real way is to restore, for each file, the timestamp of the last git commit that changed it. I noticed this quite a while ago, and a bit of googling gave me some answers on SO; here is one of them, I think: Restore a file's modification time in Git

I modified it a bit to suit my needs, and this is what I ended up with:

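    git ls-tree -r --name-only HEAD | while read -r filename; do
      # Committer time (Unix epoch) of the last commit that touched this file
      TS="$(git log -1 --format="%ct" -- "${filename}")"
      # GNU touch/date syntax: set the modification time to that commit time
      touch "${filename}" -mt "$(date --date="@$TS" "+%Y%m%d%H%M.%S")"
    done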

That worked great for me for a while on Ubuntu CI, but solving this problem in an OS-agnostic manner with bash is not something I wanted to do when I needed to set up Azure CI. For that reason I wrote a Haskell script that works with GHC 8.2 and newer without requiring any non-core dependencies. I use it for all of my projects. I'll embed the juice of it here, and also provide a link to a permanent gist:

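    import Data.Time.Format (defaultTimeLocale, iso8601DateFormat, parseTimeM)
    import System.Directory (setModificationTime)
    import System.Environment (getArgs)
    import System.Process (readProcess)

    main = do
      args <- getArgs
      -- Revision to read timestamps from; defaults to HEAD
      let rev = case args of
            [] -> "HEAD"
            (x:_) -> x
      -- All tracked paths (including directories, because of -t) at that revision
      fs <- readProcess "git" ["ls-tree", "-r", "-t", "--full-name", "--name-only", rev] ""
      let iso8601 = iso8601DateFormat (Just "%H:%M:%S%z")
          restoreFileModtime fp = do
            -- Committer date of the last commit that touched this path
            modTimeStr <- readProcess "git" ["log", "--pretty=format:%cI", "-1", rev, "--", fp] ""
            modTime <- parseTimeM True defaultTimeLocale iso8601 modTimeStr
            setModificationTime fp modTime
            putStrLn $ "[" ++ modTimeStr ++ "] " ++ fp
      putStrLn "Restoring modification time for all these files:"
      mapM_ restoreFileModtime $ lines fs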

How would you go about using it without much overhead? The trick is to:

  • use stack itself to run the script
  • use exactly the same resolver as the one for the project.

These two points ensure that no redundant dependencies or GHC versions get installed. All in all, the only two things needed are stack and something like curl or wget, and it will work cross-platform:

    # Script for restoring source files modification time from commit to avoid recompilation.
    curl -sSkL https://gist.githubusercontent.com/lehins/fd36a8cc8bf853173437b17f6b6426ad/raw/4702d0252731ad8b21317375e917124c590819ce/git-modtime.hs -o git-modtime.hs
    # Restore mod time and setup ghc, if it wasn't restored from cache
    stack script --resolver ${RESOLVER} git-modtime.hs --package base --package time --package directory --package process
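One caveat, raised by a commenter on this answer: restoring timestamps from commits needs the commit history, and a shallow CI checkout does not have it. In a depth-1 clone every file appears to have been last changed by the tip commit, so all restored timestamps would move with every push. Below is a minimal sketch of a guard to run before the restore step; the helper name ensure_full_history and the throwaway demo repository are hypothetical, and `git rev-parse --is-shallow-repository` assumes Git 2.15 or newer.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: fetch the full history if the checkout is shallow.
ensure_full_history() {
  local repo="${1:-.}"
  if [ "$(git -C "$repo" rev-parse --is-shallow-repository)" = "true" ]; then
    git -C "$repo" fetch --quiet --unshallow
  fi
}

# Demo on a throwaway repository with two commits (made up for illustration).
work="$(mktemp -d)"
git init -q "$work/origin"
for f in Lib.hs Fib.hs; do
  echo "module ${f%.hs} where" > "$work/origin/$f"
  git -C "$work/origin" add "$f"
  git -C "$work/origin" -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "Add $f"
done

# file:// is needed so that --depth is honoured for a local clone.
git clone -q --depth 1 "file://$work/origin" "$work/clone"

before="$(git -C "$work/clone" rev-list --count HEAD)"
ensure_full_history "$work/clone"
after="$(git -C "$work/clone" rev-list --count HEAD)"

echo "commits visible before: $before, after: $after"
rm -rf "$work"
```

On GitHub Actions the same effect can usually be had by setting `fetch-depth: 0` on the actions/checkout step, which avoids the extra fetch.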

Here is a real project that uses this approach and you can dig through it to see how it works: massiv-io

Edit: @Simon Michael mentioned in the comments that he can't reproduce this issue locally. The reason is that not everything on CI is the same as it is locally. Quite often the absolute path is different, for example, and possibly other things that I can't think of right now. Those things, together with the source file timestamps, cause the recompilation of the source files.

For example, follow these steps and you will find that your project gets rebuilt:

    ~/tmp$ git clone git@github.com:fpco/safe-decimal.git
    ~/tmp$ cd safe-decimal
    ~/tmp/safe-decimal$ stack build
    safe-decimal> configure (lib)
    [1 of 2] Compiling Main ...
    Configuring safe-decimal-0.2.0.0...
    safe-decimal> build (lib)
    Preprocessing library for safe-decimal-0.2.0.0..
    Building library for safe-decimal-0.2.0.0..
    [1 of 3] Compiling Numeric.Decimal.BoundedArithmetic
    [2 of 3] Compiling Numeric.Decimal.Internal
    [3 of 3] Compiling Numeric.Decimal ...
    ~/tmp/safe-decimal$ cd ../
    ~/tmp$ mv safe-decimal safe-decimal-moved
    ~/tmp$ cd safe-decimal-moved/
    ~/tmp/safe-decimal-moved$ stack build
    safe-decimal-0.2.0.0: unregistering (old configure information not found)
    safe-decimal> configure (lib)
    [1 of 2] Compiling Main ...

You'll see that changing the location of the project triggered a rebuild of the project. Although the project itself was rebuilt, you will notice that none of the source files were recompiled. Now, if you combine that procedure with a touch of a source file, that source file will get recompiled.

To sum it up:

  • The environment can cause the project to be rebuilt
  • A change in the contents of a source file can cause that source file (and others that depend on it) to be recompiled
  • The environment together with a change in a source file's contents or timestamp can cause the project, together with that source file, to be recompiled

11 Comments

I'm confused by this, because I don't seem to see timestamp affecting my local stack builds. Eg if I touch a source file, it's not rebuilt.
Likewise if I touch the .{dyn_hi,dyn_o,hi,o} files.
@SimonMichael I added an example to the answer. In short, you need to trigger the rebuild of a project in order for the timestamp to trigger recompilation.
Thank you for the detailed info, very helpful. I saw it, as you say: changed paths (eg from renaming the folder) causes a rebuild of (a) Setup.hs and (b) any other modules whose timestamp has changed. Do you know of any issue for this in github.com/commercialhaskell/stack/issues ?
Thanks! Timestamps really solved this, but additionally, GitHub Actions by default uses only a very limited fetch without any history, so it had to be adjusted to fetch all history in order to recover the timestamps correctly.
1

I have provided a PR fix for this so modified time is no longer relied on!

4 Comments

This is merged now in stack 2.5.1 - thank you @Andres S. Unfortunately even with stack 2.5.1 I continued to see the error Trouble loading CompilerPaths cache that brought me to this thread. For me it was the caching key, which was not correctly identified: key: ${{ runner.os }}-${{ matrix.ghc }} did not work, key: ${{ runner.os }}-${{ matrix.ghc }}-stack did.
@nevrome with stack (and cabal) now caching by content correctly, unfortunately ghc itself is not. I have spent too much time inside of the ghc build code to realize this. If I ever get some extra time I'll see about writing a proposal/PR to fix this but it will be an undertaking. If you compile a simple codebase with ghc, change the modified time of a file, and recompile the project with ghc, you'll notice that the file is recompiled.
I see - so maybe my issue about dependency caching is entirely unrelated to this thread after all. I'll leave the comment here anyway, because maybe somebody comes across it just like I did. Keep up the good work!
Good news! A WIP PR was just opened against GHC gitlab.haskell.org/ghc/ghc/-/merge_requests/5130
