-1

I have a TXT file and a CSV file containing login attempts (the usernames tried, plus other information). I want to count how many times each username was tried. More generally, I would like to count how many times each word occurs, for example: <hostname> = 12, ssh2 = 6, etc.

A Python script would be perfect.

Example (sensitive information such as IP addresses has been changed):

sshd|XXX.XX.XX.XXX|1587574870|{"matches": ["Apr 22 18:53:46 <hostname> sshd[****]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=XXX.XX.XX.XXX", "Apr 22 18:53:48 <hostname> sshd[****]: Failed password for invalid user pengjing from XXX.XX.XX.XXX port **** ssh2", "Apr 22 18:55:14 <hostname> sshd[****]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=XXX.XX.XX.XXX", "Apr 22 18:55:15 <hostname> sshd[****]: Failed password for invalid user git from XXX.XX.XX.XXX port **** ssh2", "Apr 22 18:56:42 <hostname> sshd[****]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=XXX.XX.XX.XXX", "Apr 22 18:56:44 <hostname> sshd[****]: Failed password for invalid user test from XXX.XX.XX.XXX port **** ssh2", "Apr 22 18:58:14 <hostname> sshd[****]: Failed password for root from XXX.XX.XX.XXX port **** ssh2", "Apr 22 18:59:44 <hostname> sshd[****]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=XXX.XX.XX.XXX", "Apr 22 18:59:46 <hostname> sshd[****]: Failed password for invalid user za from XXX.XX.XX.XXX port **** ssh2", "Apr 22 19:01:09 <hostname> sshd[****]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=XXX.XX.XX.XXX", "Apr 22 19:01:10 <hostname> sshd[****]: Failed password for invalid user yw from XXX.XX.XX.XXX port **** ssh2"], "failures": 18, "mlfid": " <hostname> sshd[****]: ", "user": "root", "ip4": "XXX.XX.XX.XXX"}

2 Answers

0

Append this logic to your code; it runs once the file has been read. Replace the text variable with whatever holds your data. I also had to process the text to remove unnecessary characters such as double quotes, closing square brackets and commas; you can add more. Note that the counting step is a substring check against the raw tokens, so a short key such as 18 also counts tokens like 18:53:46.

    with open('input_file.txt', 'r') as file:
        text = file.read()

    # Build a dictionary keyed by every cleaned-up word, each count starting at 0.
    word_dict = {}
    for k in text.split(" "):
        word_dict[k.replace('"', '').replace("]", "").replace(",", "")] = 0

    print(word_dict)
    # Output for the sample line from the question:
    # {'sshd|XXX.XX.XX.XXX|1587574870|{matches:': 0, '[Apr': 0, '22': 0, '18:53:46': 0,
    #  '<hostname>': 0, 'sshd[****:': 0, 'pam_unix(sshd:auth):': 0, 'authentication': 0,
    #  'failure;': 0, 'logname=': 0, 'uid=0': 0, 'euid=0': 0, 'tty=ssh': 0, 'ruser=': 0,
    #  'rhost=XXX.XX.XX.XXX': 0, 'Apr': 0, '18:53:48': 0, 'Failed': 0, 'password': 0,
    #  'for': 0, 'invalid': 0, 'user': 0, 'pengjing': 0, 'from': 0, 'XXX.XX.XX.XXX': 0,
    #  'port': 0, '****': 0, 'ssh2': 0, '18:55:14': 0, '18:55:15': 0, 'git': 0,
    #  '18:56:42': 0, '18:56:44': 0, 'test': 0, '18:58:14': 0, 'root': 0, '18:59:44': 0,
    #  '18:59:46': 0, 'za': 0, '19:01:09': 0, '19:01:10': 0, 'yw': 0, 'failures:': 0,
    #  '18': 0, 'mlfid:': 0, '': 0, 'user:': 0, 'ip4:': 0, 'XXX.XX.XX.XXX}': 0}

    # For each key, count the raw tokens that contain it.
    tokens = text.split(" ")
    for i in word_dict:
        counter = 0
        for j in tokens:
            if i in j:
                counter += 1
        word_dict[i] = counter

    print(word_dict["ssh2"])        # 6
    print(word_dict["<hostname>"])  # 12

    for k, v in word_dict.items():
        print("Word : ", k, " Occurrences : ", v)
    # Word : sshd|XXX.XX.XX.XXX|1587574870|{matches: Occurrences : 0
    # Word : [Apr Occurrences : 0
    # Word : 22 Occurrences : 11
    # Word : 18:53:46 Occurrences : 1
    # Word : <hostname> Occurrences : 12
    # Word : sshd[****: Occurrences : 0
    # Word : pam_unix(sshd:auth): Occurrences : 5
    # Word : authentication Occurrences : 5
    # .
    # .
    # .
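For a file with over 112k lines, the same idea can be sketched with collections.Counter, reading one line at a time and counting whole cleaned tokens rather than substrings. This is only a minimal sketch: the names count_words, count_words_in_file and STRIP_CHARS are hypothetical, and the set of characters to strip is an assumption based on the sample line in the question.

```python
from collections import Counter

# Characters trimmed from both ends of each token (an assumed set; extend as needed).
STRIP_CHARS = "\"',[]{}:;"


def count_words(lines):
    """Return a Counter of whole cleaned tokens across an iterable of lines."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            word = token.strip(STRIP_CHARS)
            if word:
                counts[word] += 1
    return counts


def count_words_in_file(path):
    """Stream a file line by line so a very large log never sits in memory at once."""
    with open(path, "r") as handle:
        return count_words(handle)


# Two entries shaped like the sample in the question.
sample_lines = [
    '"Apr 22 18:55:15 <hostname> sshd[****]: Failed password for invalid user git from XXX.XX.XX.XXX port **** ssh2",',
    '"Apr 22 18:58:14 <hostname> sshd[****]: Failed password for root from XXX.XX.XX.XXX port **** ssh2"],',
]
counts = count_words(sample_lines)
print(counts["ssh2"], counts["<hostname>"], counts["18"])  # 2 2 0
```

Calling count_words_in_file('input_file.txt').most_common(20) would then list the twenty most frequent words. Because tokens are compared whole, 18 is no longer counted inside timestamps such as 18:55:15.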

5 Comments

The words "ssh2" and "<hostname>" were just examples. I don't know in advance which words need counting, because my txt file has over 112k lines and the sample I gave is only one line of it.
@Crazycrafter227 The dictionary variable will hold the count for every word. I only printed those two words.
Can you make it read the text from a txt file?
And have it report how many times each word occurs in that file?
@Crazycrafter227 I have updated the code to read from a file. Some additional text processing may also be required. You should append this logic to your own code.
0

Here is how you can use the str.count() method:

    s = """sshd|XXX.XX.XX.XXX|1587574870|{"matches": ["Apr 22 18:53:46 <hostname> sshd[****]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=XXX.XX.XX.XXX", "Apr 22 18:53:48 <hostname> sshd[****]: Failed password for invalid user pengjing from XXX.XX.XX.XXX port **** ssh2", "Apr 22 18:55:14 <hostname> sshd[****]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=XXX.XX.XX.XXX", "Apr 22 18:55:15 <hostname> sshd[****]: Failed password for invalid user git from XXX.XX.XX.XXX port **** ssh2", "Apr 22 18:56:42 <hostname> sshd[****]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=XXX.XX.XX.XXX", "Apr 22 18:56:44 <hostname> sshd[****]: Failed password for invalid user test from XXX.XX.XX.XXX port **** ssh2", "Apr 22 18:58:14 <hostname> sshd[****]: Failed password for root from XXX.XX.XX.XXX port **** ssh2", "Apr 22 18:59:44 <hostname> sshd[****]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=XXX.XX.XX.XXX", "Apr 22 18:59:46 <hostname> sshd[****]: Failed password for invalid user za from XXX.XX.XX.XXX port **** ssh2", "Apr 22 19:01:09 <hostname> sshd[****]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=XXX.XX.XX.XXX", "Apr 22 19:01:10 <hostname> sshd[****]: Failed password for invalid user yw from XXX.XX.XX.XXX port **** ssh2"], "failures": 18, "mlfid": " <hostname> sshd[****]: ", "user": "root", "ip4": "XXX.XX.XX.XXX"}"""

    print(s.count('ssh2'))
    print(s.count('<hostname>'))

Output:

    6
    12
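One thing to keep in mind is that str.count() counts substrings, so a short word is also counted wherever it appears inside a longer one. If whole-word counts are needed, a regular expression with lookarounds is one possible approach. The helper name count_whole_word below is hypothetical, and treating anything other than letters, digits and underscores as a word boundary is an assumption that may not suit every log format.

```python
import re


def count_whole_word(text, word):
    """Count occurrences of `word` that are not glued to other word characters."""
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    return len(re.findall(pattern, text))


line = 'Failed password for root from host rootkit port **** ssh2",'
print(line.count("root"))              # substring count: also sees "rootkit"
print(count_whole_word(line, "root"))  # whole-word count
print(count_whole_word(line, "ssh2"))  # trailing quote and comma do not block the match
```

re.escape() keeps characters such as < and > in a search term like <hostname> from being interpreted as regex syntax.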


UPDATE:

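    from collections import Counter
    from re import findall

    # Raw string so the backslashes reach the regex engine unchanged.
    # \S+ captures a single username; a greedy .* would run on to the last
    # matching entry when several entries share one line.
    pattern = r'(?<=Failed password for invalid user )\S+(?= from XXX\.XX\.XX\.XXX port \*\*\*\* ssh2)'

    with open('file.txt', 'r') as f:
        print(Counter(findall(pattern, f.read())))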

Output:

Counter({'pengjing': 1, 'git': 1, 'test': 1, 'za': 1, 'yw': 1}) 
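The pattern above only matches "invalid user" entries, so the failed attempt for root in the sample is not counted, and it is tied to one masked IP address. A possible variation, sketched below, makes the "invalid user " prefix optional and reads the file line by line. The names FAILED_LOGIN, count_usernames and count_usernames_in_file are hypothetical, and the pattern assumes every entry follows the "Failed password for ... from" wording shown in the question.

```python
import re
from collections import Counter

# Optional "invalid user " prefix, then the username up to the next space.
FAILED_LOGIN = re.compile(r"Failed password for (?:invalid user )?(\S+) from ")


def count_usernames(lines):
    """Return a Counter of usernames seen in failed-password entries."""
    counts = Counter()
    for line in lines:
        counts.update(FAILED_LOGIN.findall(line))
    return counts


def count_usernames_in_file(path):
    """Stream the file so a log with 100k+ lines is never loaded in one piece."""
    with open(path, "r") as handle:
        return count_usernames(handle)


# Entries shaped like the sample in the question.
sample = [
    "Apr 22 18:55:14 <hostname> sshd[****]: pam_unix(sshd:auth): authentication failure; rhost=XXX.XX.XX.XXX",
    "Apr 22 18:55:15 <hostname> sshd[****]: Failed password for invalid user git from XXX.XX.XX.XXX port **** ssh2",
    "Apr 22 18:58:14 <hostname> sshd[****]: Failed password for root from XXX.XX.XX.XXX port **** ssh2",
]
usernames = count_usernames(sample)
print(usernames["git"], usernames["root"])  # 1 1
```

Since findall() returns every match in a line, this also works when many entries sit on a single line, as in the sample record.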

7 Comments

The problem is that this is only a small part of the data, and I can't do manual work here: there are over 112k lines of that length and over 112k username attempts.
@Crazycrafter227 So the TXT file stores all the words that need to be counted?
Yes, the txt file stores every bit of data that has to be counted.
@Crazycrafter227 See my update.
Can you make it read an external file so the text doesn't need to be copied in?