This is from my personal project codecount, which is very similar to cloc.exe or SLOCCount. The part I am unsure about is how I track whether I am inside a comment block, which leads to deep nesting; I am basically replacing the commented sections with blanks. I would like to handle fixed-format files at some point (RPGLE, CLP, etc.), where the comment characters must appear in a specific position.

Any review, comments, or suggestions would be appreciated.

    def scanfile(self, filename):
        """
        The heart of the codecounter, Scans a file to identify and collect the
        metrics based on the classification.
        """
        strblock = None
        endblock = None
        inblock = False
        endcode = False
        sha256 = hashlib.sha256()

        if(filename.size == 0):
            logging.info("Skipping File : " + filename.name)
            return

        # Identify language
        for l in self.langs:
            if(filename.extension in self.langs[l]["ext"]):
                filename.lang = l
                break

        # Unknown files don't need processed
        if(filename.lang is None):
            logging.info("Skipping File : " + filename.name)
            return

        # Using the with file opening in order to ensure no GC issues.
        with open(filename.path + "/" + filename.name,
                  encoding="utf-8", errors='ignore') as fp:
            for line in fp:
                sha256.update(line.encode("utf-8"))
                filename.lines += 1
                line = line.strip()
                identified = False

                if(line == ""):
                    logging.info(" blak " + str(filename.lines))
                    filename.blanks += 1
                    continue

                if(endcode):
                    filename.comments += 1
                    continue

                # Check to see if it is a block or was an opening block
                # ex1 = "/* */ if x;" = Code, not inblock
                # ex2 = "*/ if x; /*" = Code, inblock
                # ex3 = " if x; /*" = Code, inblock
                # ex4 = "/* */ if x; /* */ .." = Code, not inblock
                # ex4 = "*/" = Comment, not inblock
                # ex5 = "/* */" = Comment, not inblock
                # ex6 = "/*" = Comment, inblock
                # Two scenarios,
                # 1 - comments removed, code remains
                # 2 - Comments removed but block is open
                if(not inblock):
                    for token in self.langs[filename.lang]["bcomm"]:
                        strblock = token
                        endblock = self.langs[filename.lang]["bcomm"][token]

                        while token in line:
                            # If a block has started then check for an exit
                            if(endblock in line):
                                line = line.replace(
                                    line[line.find(strblock):
                                         line.find(endblock) + len(endblock)],
                                    "", 1)
                            else:
                                line = line.replace(
                                    line[line.find(strblock):], "", 1)
                                inblock = True  # left open
                else:
                    # Continue until the block ends... when left open
                    if(endblock in line):
                        inblock = False  # End the block
                        line = line.replace(
                            line[:line.find(endblock) + len(endblock)],
                            "").strip()
                    else:
                        line = ""

                # From the block but no hidden code made it out the back....
                if(line is ""):
                    logging.info(" bloc " + str(filename.lines) + line)
                    filename.comments += 1
                    continue

                # Check line comment designators
                for token in self.langs[filename.lang]["comment"]:
                    if(line.startswith(token)):
                        logging.info(" line " + str(filename.lines) + line)
                        filename.comments += 1
                        identified = True
                        break

                if(identified):
                    continue

                # If not a blank or comment it must be code
                logging.info(" code " + str(filename.lines) + line)
                filename.code += 1

                # Check for the ending of code statements
                for end in self.langs[filename.lang]["endcode"]:
                    if(line == end):
                        endcode = True

        # Store the hash of this file for comparison to others
        logging.info("Total " + " " + str(filename.blanks) + " "
                     + str(filename.comments) + " " + str(filename.code))
        filename.sha256 = sha256.digest()
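One way to approach the block-comment bookkeeping described in the question is to move it into a small helper that scans the line once and returns both the surviving code and the updated state. This is only a sketch under stated assumptions: the name `strip_block_comments` and its parameters are hypothetical (they are not part of codecount), it handles a single start/end token pair, and it does not recognise comment tokens that appear inside string literals.

```python
def strip_block_comments(line, in_block, start="/*", end="*/"):
    """Remove block-comment text from one line.

    Hypothetical helper, not part of codecount. Returns a tuple of
    (remaining code with surrounding whitespace stripped, whether a
    block comment is still open at the end of the line).
    """
    kept = []  # pieces of the line that lie outside any comment
    pos = 0
    while pos < len(line):
        if in_block:
            close = line.find(end, pos)
            if close == -1:
                # The comment runs past the end of this line.
                return "".join(kept).strip(), True
            pos = close + len(end)
            in_block = False
        else:
            opened = line.find(start, pos)
            if opened == -1:
                # No further comment starts; keep the rest of the line.
                kept.append(line[pos:])
                break
            kept.append(line[pos:opened])
            pos = opened + len(start)
            in_block = True
    return "".join(kept).strip(), in_block


# The cases listed in the question's comments:
print(strip_block_comments("/* */ if x;", False))  # ('if x;', False)
print(strip_block_comments("*/ if x; /*", True))   # ('if x;', True)
print(strip_block_comments("/*", False))           # ('', True)
```

A caller would count the line as a comment when the returned text is empty and as code otherwise, carrying the returned flag into the next line.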

2 Answers

A couple of points:

In Python there is no need to wrap the condition of an if statement in parentheses. For example,

    if(filename.size == 0):

can be replaced with

    if filename.size == 0:

In fact, the latter is the preferred style.

Also, using format is preferred to concatenating strings with +. For example,

    logging.info("Total " + " " + str(filename.blanks) + " "
                 + str(filename.comments) + " " + str(filename.code))

can be replaced with:

    log_message = 'Total {blanks} {comments} {code}'.format(
        blanks=filename.blanks,
        comments=filename.comments,
        code=filename.code)
    logging.info(log_message)

This has a few advantages: it is more readable, it is easier to internationalize (a single string token instead of several), it has descriptive named placeholders, and it allows easier formatting if you decide to add it at some point. For example, you can change {comments} to {comments:05d} to format the value as a fixed-width number of width 5, zero-padded on the left if necessary.
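The effect of the `05d` format specification can be checked with a short, self-contained sketch. The `SimpleNamespace` object below is only a stand-in for the question's `filename` object, and the counts are made-up illustrative values.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the question's `filename` object.
filename = SimpleNamespace(blanks=3, comments=12, code=240)

# Named placeholders without any format specification.
plain = 'Total {blanks} {comments} {code}'.format(
    blanks=filename.blanks, comments=filename.comments, code=filename.code)

# The same message with every count zero-padded to width 5.
padded = 'Total {blanks:05d} {comments:05d} {code:05d}'.format(
    blanks=filename.blanks, comments=filename.comments, code=filename.code)

print(plain)   # Total 3 12 240
print(padded)  # Total 00003 00012 00240
```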
  • FWIW, you can also use 'Total {fname.blanks} {fname.comments} {fname.code}'.format(fname=filename). Commented Apr 16, 2014 at 4:42
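The attribute-lookup form from this comment can be tried on its own. In this sketch `stats` is a hypothetical stand-in for the `filename` object, with made-up counts; it also shows that a format specification can follow an attribute lookup.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the question's `filename` object.
stats = SimpleNamespace(blanks=3, comments=12, code=240)

# A single argument is passed; the placeholders read its attributes.
message = 'Total {fname.blanks} {fname.comments:05d} {fname.code}'.format(
    fname=stats)

print(message)  # Total 3 00012 240
```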
Instead of updating the line count on every iteration through the file, you could use the enumerate function and assign the line count only once:

    with open('test_file.txt', 'r') as f:
        for line_num, line in enumerate(f):
            # Insert parsing code here
            pass

    # Assign line amount here. A NameError error will be thrown
    # if this code is run on a file with no lines. If this errors,
    # assign 0.
    try:
        filename.lines = line_num + 1
    except NameError:
        filename.lines = 0

Also, when creating file paths, look into using os.path.join. It joins the path components with the correct separator for the current OS (the output below is from Windows):

    >>> import os
    >>> os.path.join('some_path', 'to_my', 'file.txt')
    'some_path\\to_my\\file.txt'
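Applied to the question's `open(filename.path + "/" + filename.name, ...)` call, the suggestion might look like the sketch below. The helper name `build_path` and the sample directory and file names are hypothetical; only `os.path.join` and `os.sep` come from the standard library.

```python
import os


def build_path(directory, name):
    """Join a directory and a file name with the separator of the current OS."""
    return os.path.join(directory, name)


path = build_path("src", "main.py")

# A directory that already ends with a separator does not get a second one,
# unlike plain string concatenation with "/".
path_trailing = build_path("src" + os.sep, "main.py")

print(path)
print(path_trailing)
```

Both calls produce the same path, whereas concatenating with `"/"` would yield a doubled separator in the second case.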
  • On the enumerate, I thought about that but figured it would be wrong if the file happens to have no lines. I think you're right about os.path.join; I should switch to that, as I got bitten by it once before. Thanks. Commented Apr 22, 2014 at 15:02
  • If the code is run on a file with no lines, line_num will be undefined. I will update accordingly. Commented Apr 22, 2014 at 15:22
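The empty-file concern raised in these comments can also be handled without catching NameError, by initialising the counter before the loop and letting enumerate start at 1. This is a minimal sketch; `count_lines` is a hypothetical helper name, and `io.StringIO` stands in for a real file.

```python
import io


def count_lines(fp):
    """Count the lines of an open text file, returning 0 for an empty file."""
    line_count = 0  # keeps this value if the loop body never runs
    for line_count, line in enumerate(fp, start=1):
        pass  # per-line parsing would go here
    return line_count


print(count_lines(io.StringIO("")))           # 0
print(count_lines(io.StringIO("a\nb\nc\n")))  # 3
```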
