37

How can I search my git logs to see which files have had the most activity?

5
  • Related: stackoverflow.com/questions/1265040/… Commented Apr 14, 2011 at 21:03
  • You can use git diff --stat revA revB to get the sum of all additions and removals (but it won't tell you the absolute number of commits that actually touched the file). Commented Apr 14, 2011 at 21:05
  • That link is for a particular author. The git log --numstat command seems to be a step in the right direction, but it just prints the stats for every file in no particular order, and we have thousands of files. Commented Apr 14, 2011 at 21:06
  • @jason, thanks, the problem is that we need to look over all the commits ever made and see which files either have had the most commits or the most additions/removals total. Commented Apr 14, 2011 at 21:07
  • Possible duplicate of Finding most changed files in Git Commented Jan 27, 2016 at 15:11

5 Answers

58

This is one of those things that turns out to be very easy, almost by accident:

git rev-list --objects --all | awk '$2' | sort -k2 | uniq -cf1 | sort -rn | head 
  1. give me all objects from all revisions in all branches
  2. ignore any results without a path
  3. sort them by path
  4. make them unique (ignoring the blob hash), prefix lines with duplication count
  5. sort descending on duplication count
  6. show topmost lines

Output similar to

 1058 fffcba193374a85fd6a3490f800c6901218a950b src
  715 ffffe0f08798e95b66cc4ad4ff22cf10734d045e src/lib
  450 ffcfe596031a5985664e35937fff4ac9ff38dcca src/zfs-fuse
  367 ffc5d5340f95360fc9f7b739c5593dd3f92fced0 src/lib/libzpool
  202 ff92db000792044d45eec21c57a3cd21618631e7 src/lib/libsolkerncompat
  183 ff1a44edae3fd121ddd86864b589e5ab2f9ff99b src/lib/libzfscommon
  178 fec6b3a789e578983c2242b3aa5adf217cb8b887 src/lib/libzfs
  168 ffeefc9e81222d7c471bdb0911d8b98f23cff050 src/cmd
  167 fbd60bd3430765863648c52db7ceb3ffa15d5e50 src/lib/libzfscommon/include
  155 ff225f6b41f9557d683079c5f9276f497bcb06bd src/lib/libzfscommon/include/sys

You can take it from here.
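
For anyone who prefers a script to a pipeline, the six steps above can be sketched in Python. This is only a rough sketch and not part of the original answer: the names count_paths and rev_list_lines are made up for illustration, and it assumes each line of git rev-list --objects output is either a bare hash or a hash followed by a space and a path.

```python
import subprocess
from collections import Counter


def count_paths(lines):
    """Count how often each path appears in `git rev-list --objects` output.

    Each line is "<hash>" (commits) or "<hash> <path>" (trees and blobs).
    """
    counts = Counter()
    for line in lines:
        _obj_hash, _, path = line.rstrip("\n").partition(" ")
        if path:  # step 2: ignore results without a path
            counts[path] += 1  # steps 3-4: group by path and count
    return counts


def rev_list_lines(*rev_args):
    """Hypothetical helper: run git in the current repository (step 1)."""
    cmd = ["git", "rev-list", "--objects", *(rev_args or ["--all"])]
    return subprocess.run(
        cmd, capture_output=True, text=True, check=True
    ).stdout.splitlines()
```

Inside a repository, count_paths(rev_list_lines()).most_common(10) would then cover steps 5 and 6.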

E.g. if you wanted to see only file blobs:

git rev-list --objects --all | awk '$2' | sort -k2 | uniq -cf1 | sort -rn |
    while read frequency sample file
    do
        # keep only paths whose sample object is a blob (a file, not a tree)
        [ "blob" = "$(git cat-file -t "$sample")" ] && printf '%s\t%s\n' "$frequency" "$file"
    done

output:

135 src/zfs-fuse/zfs_operations.c
 84 src/zfs-fuse/zfs_ioctl.c
 79 src/zfs-fuse/zfs_vnops.c
 73 src/lib/libzfs/libzfs_dataset.c
 67 src/lib/libzpool/spa.c
 66 src/zfs-fuse/zfs_vfsops.c
 62 src/cmd/zdb/zdb.c
 62 CHANGES
 60 src/cmd/ztest/ztest.c
 60 src/lib/libzpool/arc.c
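
The loop above runs git cat-file -t once per path, which can be slow on a large repository. One possible alternative, sketched here under assumptions, is to feed all hashes to git cat-file --batch-check (which prints "<hash> <type> <size>" per object) and do the filtering in Python. The function names parse_batch_check and count_blob_paths are hypothetical, and unlike the shell version this checks every object rather than one sample hash per path.

```python
from collections import Counter


def parse_batch_check(output_lines):
    """Map object hash -> type from `git cat-file --batch-check` output."""
    types = {}
    for line in output_lines:
        parts = line.split()
        # Normal lines are "<hash> <type> <size>"; anything else
        # (e.g. "<name> missing") is skipped.
        if len(parts) == 3:
            types[parts[0]] = parts[1]
    return types


def count_blob_paths(rev_list_lines, types):
    """Count paths from `git rev-list --objects` lines, blobs only."""
    counts = Counter()
    for line in rev_list_lines:
        obj_hash, _, path = line.rstrip("\n").partition(" ")
        if path and types.get(obj_hash) == "blob":
            counts[path] += 1
    return counts
```

The result is a Counter, so .most_common() gives the same descending order as sort -rn.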

If you want to see only a specific range of revisions:

You can have a ball with the rev-list part:

git rev-list --after=2011-01-01 --until='two weeks ago' \
    tag1...remote/hotfix ^master

This uses only revisions in the specified date range that are in the symmetric difference of tag1 and remote/hotfix and are not reachable from master.


2 Comments

Cheers. I had fun writing that down :) Laaaaarge kudos to the gentlemen who designed git in the UNIX philosophy
A great answer, thanks! I'll leave an edit to make it compatible with ZSH, in which using path as a variable can lead to troubles
7

You can use git effort [--above <value>] (from the git-extras package) to list all files and the number of commits that touched each one.

You can also restrict it to a path.


5

I needed something similar recently in a project whose source code consisted entirely of Java files. I used sehe's answer as the base and expanded on it, as I wanted to do it in one line without loops. My question was: which 5 files have changed the most?

git rev-list --objects --all | awk '$2 ~ /\.java/' | awk '{print $2}' | sort | uniq -c | sort -rn | head -n 5

To break it down:

  1. git rev-list --objects --all: give me all objects from all branches
  2. awk '$2 ~ /\.java/': keep only lines whose second field ($2) matches the regex \.java, i.e. contains the text .java
  3. awk '{print $2}': print the second field (the path)
  4. sort: sort by path
  5. uniq -c: make them unique and count the number of times each file appears
  6. sort -rn: sort numerically in descending order
  7. head -n 5: limit the result to the top 5

Output is

130 richtextfx/src/main/java/org/fxmisc/richtext/GenericStyledArea.java
126 richtextfx/src/main/java/org/fxmisc/richtext/StyledTextArea.java
 58 richtextfx/src/main/java/org/fxmisc/richtext/ParagraphText.java
 47 richtextfx/src/main/java/org/fxmisc/richtext/EditableStyledDocument.java
 43 richtextfx/src/main/java/org/fxmisc/richtext/skin/StyledTextAreaVisual.java
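
If the filter and the limit need to vary, the breakdown above could also be written as a small Python function. This is a sketch under assumptions: top_changed, suffix and n are names introduced here for illustration, and it matches the end of the path instead of any occurrence of .java, which is slightly stricter than the awk regex.

```python
from collections import Counter


def top_changed(lines, suffix=".java", n=5):
    """Return the n most frequent paths ending in `suffix`.

    `lines` are lines as printed by `git rev-list --objects --all`,
    i.e. "<hash>" or "<hash> <path>".
    """
    counts = Counter()
    for line in lines:
        _obj_hash, _, path = line.rstrip("\n").partition(" ")
        if path.endswith(suffix):
            counts[path] += 1
    # most_common sorts by count, highest first, and truncates to n
    return counts.most_common(n)
```

Because the whole remainder of the line is taken as the path, file names containing spaces are kept intact here.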


1

Here's a Python script that you can pipe the git log --numstat output through to get the results:

import re
import sys

res = {}
for line in sys.stdin:
    # numstat lines look like "<added> <removed> <path>";
    # binary files show "-" instead of numbers and are skipped
    m = re.match(r"([0-9]+)[ \t]+([0-9]+)[ \t]+(.*)", line)
    if m is not None:
        f = m.group(3)
        if f not in res:
            res[f] = {'add': 0, 'rem': 0, 'commits': 0}
        res[f]['commits'] += 1
        res[f]['add'] += int(m.group(1))
        res[f]['rem'] += int(m.group(2))

# one line per file: commits, additions, removals, path
for f in res:
    r = res[f]
    print("%s %s %s %s" % (r['commits'], r['add'], r['rem'], f))

You can modify it as needed to sort/filter how you want.
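
As one possible way to do that sorting, here is a small sketch that ranks the res dictionary built by the script. The function name rank, its parameters, and the "churn" key (additions plus removals) are assumptions introduced for illustration, not part of the original script.

```python
def rank(res, key="commits", limit=10):
    """Sort per-file stats by one counter, highest first.

    `res` maps path -> {'add': int, 'rem': int, 'commits': int}.
    `key` may be 'commits', 'add', 'rem', or 'churn' (add + rem).
    """
    def score(item):
        stats = item[1]
        if key == "churn":
            return stats["add"] + stats["rem"]
        return stats[key]

    return sorted(res.items(), key=score, reverse=True)[:limit]
```

For example, rank(res, "churn", 20) would list the 20 files with the most added and removed lines in total.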


0

Assuming the range of revisions you want to select is <range>, the command:

git log --format=%n --name-only <range>|sort|uniq -c|tail -n +2 

will output, for each file in your repository, the number of times it appears in commit diffs, i.e. the number of changes, counting file creation as a change. Leave <range> empty to get statistics from the initial commit to your branch HEAD.
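
The same count can be sketched in Python if the output of git log --format=%n --name-only is read line by line. This is a minimal sketch, and count_changes is a hypothetical name; it assumes every non-blank line of that output is a file name.

```python
from collections import Counter


def count_changes(log_lines):
    """Count file names in `git log --format=%n --name-only` output.

    Blank separator lines are skipped, so no `tail -n +2` step is needed.
    """
    return Counter(
        line.rstrip("\n") for line in log_lines if line.strip()
    )
```

Calling .most_common() on the result orders the files from most to least changed, which the shell pipeline would need an extra sort -rn for.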
