
I am trying to do a reverse DNS lookup for every internal IP address to validate the inventory that I have, and I want to do this in Python. My plan is to generate a CSV file with all the internal IP addresses using the following code:

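    import ipaddress as ip
    import time

    import pandas as pd

    start = time.time()  # timer start; needed for the elapsed time printed below
    file_name = '10Dot.csv'

    # Collect every usable host address in 10.0.0.0/8
    a = ip.ip_network('10.0.0.0/8')
    ip_list = []
    for x in a.hosts():
        ip_list.append(x.compressed)

    # Write the addresses to a single-column CSV
    df = pd.DataFrame({'IP_Address': ip_list})
    df.to_csv(file_name, encoding='utf-8', index=False)

    end = time.time()
    print(end - start)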

Similarly, I want to generate files for the other internal networks. I then go through each line of the generated file and do a reverse lookup with the following function:

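    import socket

    def reverse_lookup(host):
        # Return the hostname for the address, or "NA" if the lookup fails
        try:
            lookup = socket.gethostbyaddr(str(host))[0]
        except (socket.herror, socket.gaierror):
            lookup = "NA"
        return lookup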

If I read the CSV file line by line, it is very slow to get through all the IP addresses. I am trying to use multiple threads that each pick a chunk of the CSV file and run the above function on it line by line. The 10.0.0.0/8 network gives me 16,777,214 rows in the file, so I am thinking of dividing it into 100 parts and generating a final file with each host and its looked-up value. How do I read the CSV file in parallel across the threads and then combine the results into a single file?
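One possible shape for this, as a minimal sketch rather than a tested solution: instead of splitting the file into 100 parts up front, stream it in batches through a thread pool and write each batch to the single output file as it completes. The names `lookup_file` and `batches`, the `Hostname` column, and the defaults of 100 workers and 10,000 rows per batch are assumptions made for illustration. The `lookup` parameter is there only so the function can be exercised without touching DNS.

```python
import csv
import socket
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def reverse_lookup(host):
    """Return the PTR name for host, or "NA" if the lookup fails."""
    try:
        return socket.gethostbyaddr(str(host))[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return "NA"


def batches(iterable, size):
    """Yield lists of up to `size` items so only one batch is held at a time."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def lookup_file(in_path, out_path, lookup=reverse_lookup,
                workers=100, batch_size=10_000):
    """Read IPs from in_path, resolve them in a thread pool, write one CSV."""
    with open(in_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        next(reader, None)  # skip the IP_Address header row
        writer = csv.writer(dst)
        writer.writerow(["IP_Address", "Hostname"])
        addresses = (row[0] for row in reader if row)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for batch in batches(addresses, batch_size):
                # pool.map returns results in input order, so each
                # address stays paired with its own lookup result
                writer.writerows(zip(batch, pool.map(lookup, batch)))
```

A call such as `lookup_file('10Dot.csv', '10Dot_resolved.csv')` (the output name is hypothetical) would then produce the combined file directly, with no merge step. Only the main thread reads and writes, so the worker threads share no file handles.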

Also, if you have a better approach to solving this problem, please do let me know.

  • You could've found it easily stackoverflow.com/questions/8424771/… Commented Jun 6, 2018 at 4:01
  • Be careful with the DNS bandwidth, you can DOS your local resolver if you have too many parallel threads. I would look into aiodns if you want to optimize the wall clock time. Commented Jun 6, 2018 at 4:08
  • @tripleee - That's a good point about DOS. Since aiodns does the processing asynchronously, how would the program work if multiple threads are calling the same function? Commented Jun 6, 2018 at 4:29
  • You don't need threads really if you use async. Threads can call the same function in parallel just fine, though you need to think about the integrity of shared data structures between threads. How exactly you manage the parallelism is unimportant as far as DNS is concerned anyway. Commented Jun 6, 2018 at 4:39
  • As a benchmark, I don't think 100 parallel requests are going to be a problem. I once managed to DOS the company DNS with uncontrolled parallelism (this was with multiprocessing, basically simply xargs -P10000 -n 1 dig <hosts.txt) but with a max of 256 it was fine, though somewhat slow for everyone for the duration of the task. Commented Jun 6, 2018 at 4:41
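
The cap on parallel requests discussed in these comments can be sketched with asyncio alone, without aiodns. This is an illustration under assumptions, not the approach any commenter specified: `lookup_all` and `max_in_flight` are hypothetical names, the blocking `socket.gethostbyaddr` call is pushed to worker threads with `asyncio.to_thread` (Python 3.9+), and a semaphore keeps the number of outstanding lookups below the chosen limit.

```python
import asyncio
import socket


def reverse_lookup(host):
    """Return the PTR name for host, or "NA" if the lookup fails."""
    try:
        return socket.gethostbyaddr(str(host))[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return "NA"


async def lookup_all(hosts, lookup=reverse_lookup, max_in_flight=100):
    """Resolve hosts concurrently, never more than max_in_flight at once."""
    sem = asyncio.Semaphore(max_in_flight)

    async def one(host):
        async with sem:
            # to_thread runs the blocking call in a worker thread
            return host, await asyncio.to_thread(lookup, host)

    # gather returns results in the same order as the input hosts
    return await asyncio.gather(*(one(h) for h in hosts))
```

It would be run as `asyncio.run(lookup_all(batch))` on one batch of addresses at a time rather than on all 16,777,214 at once, since `gather` creates one task per address. Note that the default thread pool behind `asyncio.to_thread` has its own worker limit, so the effective concurrency is the smaller of that limit and `max_in_flight`.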
