
I'm trying to count the characters in anhCrawler, with and without spaces, find the position of "DEATH STAR", and return all of that in theReport. I can't get the numbers to count correctly. Please help!

    anhCrawler = """Episode IV, A NEW HOPE. It is a period of civil war. \
    Rebel spaceships, striking from a hidden base, have won their first \
    victory against the evil Galactic Empire. During the battle, Rebel \
    spies managed to steal secret plans to the Empire's ultimate weapon, \
    the DEATH STAR, an armored space station with enough power to destroy \
    an entire planet. Pursued by the Empire's sinister agents, Princess Leia\
    races home aboard her starship, custodian of the stolen plans that can \
    save her people and restore freedom to the galaxy."""

    theReport = """
    This text contains {0} characters ({1} if you ignore spaces).
    There are approximately {2} words in the text. The phrase
    DEATH STAR occurs and starts at position {3}.
    """

    def analyzeCrawler(thetext):
        numchars = 0
        nospacechars = 0
        numspacechars = 0
        anhCrawler = thetext
        word = anhCrawler.split()
        for char in word:
            numchars = word[numchars]
            if numchars == " ":
                numspacechars += 1
        anhCrawler = re.split(" ", anhCrawler)
        for char in anhCrawler:
            nospacechars += 1
        numwords = len(anhCrawler)
        pos = thetext.find("DEATH STAR")
        char_len = len("DEATH STAR")
        ds = thetext[261:271]
        dspos = "[261:271]"
        return theReport.format(numchars, nospacechars, numwords, dspos)

    print analyzeCrawler(theReport)
  • "I can't get the numbers to count correctly either." So what are the expected results, and what are you getting now? Commented Feb 9, 2015 at 2:57
  • The correct char count is 519 and word count is 86. I don't know the position number. @Marcin Commented Feb 9, 2015 at 2:59
  • Have you tried with some smaller text? Commented Feb 9, 2015 at 3:00
  • There's a missing space after "Leia", forming an unwanted single word "Leiaraces". That may be meant to confuse us agents of the Empire but we're much too clever and debug-eyed to be fooled!-) Commented Feb 9, 2015 at 3:06
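
The joined word follows from how the string is written: inside a regular (non-raw) string literal, a backslash immediately before a newline removes that newline, so the only separator left is whatever space precedes the backslash. A minimal sketch on a shortened two-line sample (the names `no_space` and `with_space` are illustrative, not from the question):

```python
# A backslash at the very end of a line inside a string literal
# suppresses the newline, gluing the two source lines together.
no_space = """Princess Leia\
races home"""

# Putting a space before the backslash keeps the words apart.
with_space = """Princess Leia \
races home"""

print(no_space.split())    # ['Princess', 'Leiaraces', 'home']
print(with_space.split())  # ['Princess', 'Leia', 'races', 'home']
```

So adding one space after "Leia" in the question's string would raise both the character count and the word count by one.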

3 Answers


You're overthinking this problem.

Number of characters in the string (returns 520):

    len(anhCrawler)

Number of non-whitespace characters in the string (returns 434). split() with no arguments breaks the text on any run of whitespace and discards it, and join() glues the pieces back together into a string with no whitespace:

    len(''.join(anhCrawler.split()))

Position of "DEATH STAR" as a 0-based character index (returns 261):

    anhCrawler.find("DEATH STAR")
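
As suggested in the comments, it can help to try the same three calls on a text small enough to count by hand. A minimal sketch, assuming a made-up sample string (`sample` and the result names are not from the question):

```python
sample = "the DEATH STAR is big"

total_chars = len(sample)                        # every character, spaces included
non_space_chars = len("".join(sample.split()))   # whitespace dropped before counting
phrase_start = sample.find("DEATH STAR")         # 0-based index, or -1 if not found

print("%d %d %d" % (total_chars, non_space_chars, phrase_start))  # 21 17 4
```

The sample has 17 letters and 4 spaces, and "DEATH" begins right after "the " at index 4, which matches the printed values.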

2 Comments

len(anhCrawler.split()) returns the number of words, not the total number of non-whitespace characters.
Interestingly enough, bypassing the construction of a new string before counting characters is about twice as slow (sum(len(word) for word in anhCrawler.split())).
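
One way to check the timing claim on your own machine is a quick timeit comparison. This is only a sketch: the helper names `count_join` and `count_sum` and the repeated sample text are assumptions made for illustration, and the measured ratio will vary with the Python version and the input.

```python
import timeit

# Repeated sample sentence so each call has a measurable amount of work.
text = "It is a period of civil war. " * 200

def count_join(s):
    # Build one whitespace-free string, then take its length.
    return len("".join(s.split()))

def count_sum(s):
    # Add up the word lengths without building a new string.
    return sum(len(word) for word in s.split())

# Both approaches must agree before their speed is worth comparing.
assert count_join(text) == count_sum(text)

join_time = timeit.timeit(lambda: count_join(text), number=1000)
sum_time = timeit.timeit(lambda: count_sum(text), number=1000)
print("join: %.4fs  sum: %.4fs" % (join_time, sum_time))
```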

Here is a simplified version of your function:

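    import re

    def analyzeCrawler2(thetext, text_to_search="DEATH STAR"):
        # Total length, including spaces.
        numchars = len(thetext)
        # Strip every run of whitespace, then count what is left.
        nospacechars = len(re.sub(r"\s+", "", thetext))
        # split() with no arguments splits on any whitespace.
        numwords = len(thetext.split())
        # 0-based index of the first match, or -1 if absent.
        dspos = thetext.find(text_to_search)
        return theReport.format(numchars, nospacechars, numwords, dspos)

    # Pass the text to analyze, not the report template.
    print(analyzeCrawler2(anhCrawler))

Output:

    This text contains 520 characters (434 if you ignore spaces).
    There are approximately 87 words in the text. The phrase
    DEATH STAR occurs and starts at position 261.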

I think the tricky part is removing the whitespace from the string so that the non-space characters can be counted. This can be done simply with a regular expression. The rest should be self-explanatory.

1 Comment

Thanks, Marcin. This worked perfectly! But can you tell me the significance of "\s+"?
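
On the "\s+" question: in Python's re module, \s matches a single whitespace character (space, tab, newline and so on), and + means "one or more of the preceding item", so \s+ matches each whole run of whitespace at once. A small sketch on a made-up sample string (`sample`, `stripped` and `runs` are illustrative names):

```python
import re

sample = "DEATH  STAR,\tan armored\nspace station"

# Replace every run of whitespace with nothing.
stripped = re.sub(r"\s+", "", sample)
print(stripped)  # DEATHSTAR,anarmoredspacestation

# The runs that \s+ matched: a double space, a tab, a space, a newline, a space.
runs = re.findall(r"\s+", sample)
print(runs)  # ['  ', '\t', ' ', '\n', ' ']
```

When the replacement is an empty string, plain \s gives the same result; the + just lets each run be removed in a single substitution.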

First off, you need to indent the code that's inside a function. Second... your code can be simplified to the following:

    theReport = """
    This text contains {0} characters ({1} if you ignore spaces).
    There are approximately {2} words in the text. The phrase
    DEATH STAR is the {3}th word and starts at the {4}th character.
    """

    def analyzeCrawler(thetext):
        numchars = len(thetext)
        nospacechars = len(thetext.replace(' ', ''))
        numwords = len(thetext.split())

        word = 'DEATH STAR'
        charPosition = thetext.find(word)
        # split() never yields an item containing a space, so
        # thetext.split().index(word) would raise ValueError here.
        # Count the words that precede the phrase instead (0-based).
        wordPosition = len(thetext[:charPosition].split())

        return theReport.format(
            numchars, nospacechars, numwords, wordPosition, charPosition
        )

I modified the last two format arguments because it wasn't clear what you meant by dspos, although maybe it's obvious and I'm not seeing it. In any case, I included the word and char position instead. You can determine which one you really meant to include.
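
One caveat about `replace(' ', '')`: it removes only the space character, while the split/join and `\s+` approaches from the other answers remove tabs and newlines as well. For the crawl text in the question the results should agree, since the line continuations leave no newlines in the string, but they differ on text that contains other whitespace. A minimal sketch on a made-up sample (the names are illustrative):

```python
sample = "the DEATH STAR,\nan armored\tspace station"

# Removes the four spaces only; the newline and the tab remain.
spaces_removed = sample.replace(" ", "")

# Removes every kind of whitespace.
all_whitespace_removed = "".join(sample.split())

print(len(sample))                  # 40
print(len(spaces_removed))          # 36
print(len(all_whitespace_removed))  # 34
```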
