Given a git repo, I need to generate a dictionary mapping each version-controlled file's path to its last modified date as a Unix timestamp. I need the last modified date as far as git is concerned, not the file system's.

In order to do this, I'd like git to output a list of all files under version control along with each file's author date. The output of git ls-files or git ls-tree -r master would be perfect if each line also included a timestamp.

Is there a way to get this output from git?

Update for more context: my current implementation is a Python script that iterates through every file under source control and runs git log on each one, but it doesn't scale well: the more files in the repo, the more git log calls I have to make. So I'm looking for a way to gather this info from git with fewer calls (ideally just one).
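For reference, the per-file approach described here looks roughly like the following minimal Python sketch. The function names are hypothetical and it assumes git is on the PATH. It gives the right answer but starts one git log process per tracked file, which is exactly the cost that grows with the size of the repository.

```python
import subprocess


def tracked_files(repo="."):
    """Return every path listed by `git ls-files`."""
    out = subprocess.run(
        ["git", "ls-files", "-z"], cwd=repo,
        capture_output=True, text=True, check=True,
    ).stdout
    # -z separates paths with NUL bytes, so unusual file names survive intact.
    return [p for p in out.split("\0") if p]


def last_author_time(path, repo="."):
    """Author date (Unix seconds) of the newest commit touching `path`, or None."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%at", "--", path], cwd=repo,
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return int(out) if out else None


def per_file_times(repo="."):
    # One `git log` process per file: this is the part that scales linearly.
    return {p: last_author_time(p, repo) for p in tracked_files(repo)}
```

per_file_times(".") returns the {path: timestamp} dictionary the question asks for.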

  • @derekvanvivliet What do you mean by file author? There can be multiple people who have made commits to a file. Commented Oct 3, 2013 at 18:40
  • Unfortunately git lacks a "tree blame" (i.e. "git blame <directory>") that would give output like GitHub's tree view. You can get the data for each individual file with git log -1 --format=... --follow -- <filename> using a custom format, iterating over git ls-tree --name-only -r HEAD, but it won't be fast. Commented Oct 3, 2013 at 18:45
  • @ansh0l Not file author, author date. I mean the date the file was last modified in source control. Commented Oct 3, 2013 at 18:45
  • @JakubNarębski I tried that method, but as you mentioned, it wasn't fast. I need something more performant than that. Commented Oct 3, 2013 at 18:47

4 Answers

a list of all files under version control along with each file's author date

Scaling isn't a problem with this one: it runs one git log and one git diff-tree, however many files the repo has.

```sh
#!/bin/sh
temp="${TMPDIR:-/tmp}/@@@commit-at@@@$$"
trap "rm '$temp'" 0 1 2 3 15
git log --pretty=format:"%H%x09%at" --topo-order --reverse "$@" >"$temp"
cut -f1 "$temp" \
| git diff-tree -r --root --name-status --stdin \
| awk '
        BEGIN {FS="\t"; OFS="\t"}
        FNR==1{++f}
        f==1 {at[$1]=$2; next}
        NF==1 {commit=$1; next}
        $1=="D"{$1=""; delete last[$0]; next} # comment to also show deleted files
        {did=$1;$1=""; last[$0]=at[commit]"\t"did}
        END {for (f in last) print last[f]f}
' "$temp" - \
| sort -t"`printf '\t'`" -k3
```

2 Comments

That works really well. Can you explain what you're doing with git log and git diff-tree there? I'd like to implement it in Python if it would perform anywhere near as well.
Essentially all the efficiency here comes from handling diff-tree in bulk; if Python maintains an object cache, it could do as well. The log dumps each commit's SHA and author date (as a Unix timestamp) from the root forward, so the %at of the last commit to touch a file is the one that gets registered. Then cut | diff-tree emits, for each commit, a SHA line followed by status/name pairs. awk first loads the SHA/timestamp pairs, then builds a table mapping each name to its timestamp (looked up via the commit SHA) plus the given status, and dumps that table at the end. The best way to see what's going on is to run the commands yourself on a toy repo.
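The pipeline can be ported to Python roughly as follows. This is a minimal sketch, not the answerer's code: it makes the same two git calls (git log for SHA/timestamp pairs, then a single git diff-tree --stdin fed every SHA) and uses a dictionary in place of the awk table. The function names are hypothetical, and it assumes paths that git prints unquoted (no tabs, newlines, or other characters that core.quotePath escapes).

```python
import subprocess


def parse_last_changes(log_text, diff_tree_text):
    """Build {path: (author_unix_time, status)} the way the awk stage does.

    log_text: output of `git log --pretty=format:%H%x09%at --topo-order --reverse`
    diff_tree_text: output of `git diff-tree -r --root --name-status --stdin`
    fed the SHAs from log_text in the same (oldest-first) order.
    """
    at = {}
    for line in log_text.splitlines():
        sha, ts = line.split("\t")
        at[sha] = int(ts)
    last = {}
    commit = None
    for line in diff_tree_text.splitlines():
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) == 1:          # a bare SHA starts the next commit's block
            commit = fields[0]
        elif fields[0] == "D":        # deleted: forget the path
            last.pop(fields[-1], None)
        else:                         # later commits overwrite earlier entries
            last[fields[-1]] = (at[commit], fields[0])
    return last


def last_changes(repo=".", *revs):
    """Two git processes in total, however many files the repository has."""
    log_text = subprocess.run(
        ["git", "log", "--pretty=format:%H%x09%at", "--topo-order", "--reverse", *revs],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    # Equivalent of `cut -f1`: one SHA per line for diff-tree's stdin.
    shas = "".join(line.split("\t")[0] + "\n" for line in log_text.splitlines())
    diff_tree_text = subprocess.run(
        ["git", "diff-tree", "-r", "--root", "--name-status", "--stdin"],
        cwd=repo, input=shas, capture_output=True, text=True, check=True,
    ).stdout
    return parse_last_changes(log_text, diff_tree_text)
```

Keeping the parsing in its own function makes it easy to check against hand-written git output before pointing it at a large repository.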

What I would do is run git ls-files and add all of the paths to an array. Then run git log $date_args --name-only and parse its output: for each file name that is still in the array, remove it and add its date to a dictionary. Stop processing as soon as the array is empty.
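A minimal Python sketch of this approach, using a set for the remaining paths and assuming paths without newlines. The @@ts@@ sentinel in the --format string is this sketch's own hypothetical choice, not the answer's $date_args: it marks timestamp lines so they can't be mistaken for file names, and it assumes no tracked path starts with that string. git log lists commits newest first by default, so the first time a path appears is its latest change, and the git process is killed once every tracked file has been seen.

```python
import subprocess

# Hypothetical sentinel marking timestamp lines (assumes no tracked path starts with it).
MARK = "@@ts@@"


def first_seen_times(log_lines, tracked):
    """Walk `git log --name-only` output newest-first. The first time a tracked
    path appears is its most recent change; stop once every path is found."""
    remaining = set(tracked)
    result = {}
    ts = None
    for raw in log_lines:
        line = raw.rstrip("\n")
        if not line:
            continue                       # blank separator lines
        if line.startswith(MARK):
            ts = int(line[len(MARK):])     # header line of the next commit
        elif line in remaining:
            remaining.discard(line)
            result[line] = ts
            if not remaining:
                break                      # nothing left to look for
    return result


def latest_times(repo="."):
    tracked = subprocess.run(
        ["git", "ls-files"], cwd=repo, capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    with subprocess.Popen(
        ["git", "log", f"--format={MARK}%at", "--name-only"],
        cwd=repo, stdout=subprocess.PIPE, text=True,
    ) as proc:
        result = first_seen_times(proc.stdout, tracked)
        proc.kill()                        # stop git log if it is still running
    return result
```

Streaming the output through Popen is what makes the early exit pay off: on a repository where every file was touched recently, git log never has to walk the old history.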

I wrote the following script to output the path, short hash and date for each file.

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# Author: R.F. Smith
# $Date: 2013-03-23 01:09:59 +0100 $
#
# To the extent possible under law, Roland Smith has waived all
# copyright and related or neighboring rights to gitdates.py. This
# work is published from the Netherlands. See
# http://creativecommons.org/publicdomain/zero/1.0/

"""For each file in a directory managed by git, get the short hash and
date of the most recent commit of that file."""

import os
import sys
import subprocess
import time
from multiprocessing import Pool

# Suppress terminal windows on MS Windows.
startupinfo = None
if os.name == 'nt':
    startupinfo = subprocess.STARTUPINFO()
    startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW


def filecheck(fname):
    """Start a git process to get file info.

    Return a tuple containing the filename (without the leading './'),
    the abbreviated commit hash and the author date as a UTC
    time.struct_time.

    Arguments:
    fname -- Name of the file to check.
    """
    args = ['git', '--no-pager', 'log', '-1', '--format=%h|%at', '--', fname]
    try:
        b = subprocess.check_output(args, startupinfo=startupinfo)
        data = b.decode()[:-1]
        h, t = data.split('|')
        out = (fname[2:], h, time.gmtime(float(t)))
    except (subprocess.CalledProcessError, ValueError):
        return (fname[2:], '', time.gmtime(0.0))
    return out


def main():
    """Main program."""
    if '.git' not in os.listdir('.'):
        print('This directory is not managed by git.')
        sys.exit(0)
    # Get the set of excluded files; git prints them relative to the
    # current directory with '/' separators, one per line.
    exargs = ['git', 'ls-files', '-i', '-o', '--exclude-standard']
    exc = set(subprocess.check_output(exargs).decode().splitlines())
    # Get a list of all files that are not excluded.
    allfiles = []
    for root, dirs, files in os.walk('.'):
        if '.git' in dirs:
            dirs.remove('.git')
        for f in files:
            path = os.path.join(root, f)
            if os.path.relpath(path).replace(os.sep, '/') not in exc:
                allfiles.append(path)
    # Gather the files' data using a Pool.
    with Pool() as p:
        filedata = list(p.imap_unordered(filecheck, allfiles))
    # Sort the data (latest modified first) and print it.
    filedata.sort(key=lambda a: a[2], reverse=True)
    dfmt = '%Y-%m-%d %H:%M:%S %Z'
    for name, tag, date in filedata:
        print('{}|{}|{}'.format(name, tag, time.strftime(dfmt, date)))


if __name__ == '__main__':
    main()
```

1 Comment

Thanks @RolandSmith. In truth, I'm currently using a script that is similar to this one. I was hoping there was a way to get that information (file names associated with their last modified date) out of git with fewer calls to git log or ideally one call.

Here you go:

```sh
git ls-files -z | xargs -0 -I{} -- git log -1 --format='%at {}' -- {}
```

This works on bash and probably sh.
