I would like to replace a sequence of numbers in a file with another sequence of numbers. For example, I want the code to find:

5723 5724 5725 . . 

in the file and replace it with

1 2 3 . . 

The format of the file is like this:

    5723 1 4 0.0530 40.8469574826 23.6497161096 71.2721134368 # hc
    5724 1 4 0.0530 41.2184192051 22.0657965663 70.7655969235 # hc
    5725 1 4 0.0530 40.1209834536 22.2320441560 72.1100610464 # hc
    5726 1 2 0.0390 38.2072673529 21.5636299564 70.4226801302 # ni
    5727 1 3 0.0080 39.1491515464 22.7414447024 70.1836001683 # c1
    5728 1 4 0.0530 38.6092690356 23.6286807105 70.4379331882 # hc
    5729 1 5 -0.1060 39.4744610200 22.9631667398 68.7099315672 # c
    5730 1 4 0.0530 39.7733681662 22.0164196098 68.2561710623 # hc
    5731 1 4 0.0530 40.3997078786 23.5957910115 68.6602988667 # hc
    5732 1 6 -0.1768 37.4127695738 20.7445960448 69.5033013922 # c5
    5733 1 7 0.1268 37.5907142 20.8480311755 68.4090824525 # h

I've written this code to do this, but it only replaces the first number. How can I correct it?

    import os
    import sys
    import fileinput

    masir = os.curdir + '\\test\\'
    input = open('poly-IL9.data', 'r')
    output = open('out.data', 'w')
    range1 = range(5722,13193)
    range2 = range(1,7472)
    for i in range(len(x1)):
        for j in range(len(y1)):
            x = str(range1[i])
            y = str(range2[j])
            clean = input.read().replace(x,y)
            output.write(clean)
3 Answers

1

First of all, open your file using a with statement so that it is closed automatically, instead of opening it and never closing it.

The with statement is used to wrap the execution of a block with methods defined by a context manager.

Read more about the with statement and its advantages.

All you need to do here is loop over the file, split each line, and replace the first element with the line number:

    with open('poly-IL9.data', 'r') as inp, open('out.data', 'w') as out:
        for i, line in enumerate(inp, 1):
            out.write(' '.join([str(i)] + line.split()[1:]) + '\n')

enumerate(inp, 1) loops over the file object while keeping a running count that starts at 1, so i is the line number of each line.
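As a small illustration of what the loop does, the sketch below runs the same logic on an in-memory sample instead of the real file. The renumber helper is a hypothetical name introduced only for this example, io.StringIO stands in for poly-IL9.data, and the two sample lines are copied from the question.

```python
import io

def renumber(lines):
    """Yield each line with its first field replaced by a counter starting at 1."""
    for i, line in enumerate(lines, 1):
        # line.split() splits on any whitespace; [1:] drops the old number
        yield ' '.join([str(i)] + line.split()[1:])

# In-memory stand-in for the input file (assumption: one record per line)
sample = io.StringIO(
    "5723 1 4 0.0530 40.8469574826 23.6497161096 71.2721134368 # hc\n"
    "5724 1 4 0.0530 41.2184192051 22.0657965663 70.7655969235 # hc\n"
)

result = list(renumber(sample))
for row in result:
    print(row)
```

Because the counter comes from the line position rather than from the old number, this only works as intended when every line of the file is a record to be renumbered.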

Alternatively, you can use the csv module to read the file, which avoids splitting the lines yourself:

    import csv

    with open('poly-IL9.data', 'r', newline='') as inp, open('out.data', 'w') as out:
        # assumes the fields are separated by exactly one space
        reader = csv.reader(inp, delimiter=' ')
        for i, row in enumerate(reader, 1):
            out.write(' '.join([str(i)] + row[1:]) + '\n')

Note that if your file is separated by other whitespace characters, or a mix of them, you can use the re.split() function to split each line with a regex:

    import re

    with open('poly-IL9.data', 'r') as inp, open('out.data', 'w') as out:
        for i, line in enumerate(inp, 1):
            # strip() first so the trailing newline does not produce an empty field
            out.write(' '.join([str(i)] + re.split(r'\s+', line.strip())[1:]) + '\n')
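To make the splitting behaviour concrete, here is a minimal sketch on a single made-up line that mixes a tab, a double space and a trailing newline. The sample line is hypothetical, loosely based on the question's data. It shows that re.split(r'\s+', ...) leaves an empty string at the end unless the line is stripped first, and that str.split() with no argument also treats any run of whitespace as one separator.

```python
import re

# Hypothetical line mixing a tab, a double space and a trailing newline
line = "5723\t1  4 0.0530\n"

plain = line.split()                              # any whitespace run, no empty fields
raw_regex = re.split(r'\s+', line)                # trailing newline leaves a final ''
stripped_regex = re.split(r'\s+', line.strip())   # strip first, then split

print(plain)
print(raw_regex)
print(stripped_regex)
```

Either plain or stripped_regex can be used for the renumbering. The unstripped regex result would add a stray trailing space to every output line.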

0

The read() call in clean = input.read().replace(x,y) reads the entire file at once. After that first call the file is exhausted and every later read() returns an empty string, so it makes sense that only one replacement is made. Try readline(), or preferably for line in file:, to process the file line by line.
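A quick way to see this is with an in-memory file. In the sketch below, io.StringIO stands in for the question's input file and the sample text is made up. The first read() returns everything, and the second returns an empty string because the file position is already at the end.

```python
import io

# Stand-in for open('poly-IL9.data'); the contents are illustrative only
f = io.StringIO("5723 1 4\n5724 1 4\n")

first = f.read()    # consumes the whole file
second = f.read()   # nothing left to read

print(repr(first))
print(repr(second))

# Iterating line by line visits every line exactly once
f.seek(0)
lines = [line.rstrip('\n') for line in f]
print(lines)
```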

0

If you want to work with the data, consider using the pandas library.

Here's one way to do it in pandas.

Read the csv file using pd.read_csv

    In [4]: df = pd.read_csv('temp.csv')

    In [5]: df
    Out[5]:
          b  c        d          e          f          g
    5723  1  4   0.0530  40.846957  23.649716  71.272113
    5724  1  4   0.0530  41.218419  22.065797  70.765597
    5725  1  4   0.0530  40.120983  22.232044  72.110061
    5726  1  2   0.0390  38.207267  21.563630  70.422680
    5727  1  3   0.0080  39.149152  22.741445  70.183600
    5728  1  4   0.0530  38.609269  23.628681  70.437933
    5729  1  5  -0.1060  39.474461  22.963167  68.709932
    5730  1  4   0.0530  39.773368  22.016420  68.256171
    5731  1  4   0.0530  40.399708  23.595791  68.660299
    5732  1  6  -0.1768  37.412770  20.744596  69.503301
    5733  1  7   0.1268  37.590714  20.848031  68.409082

Use reset_index(drop=True) to reset the index order. Here the index starts from 0

    In [6]: df.reset_index(drop=True)
    Out[6]:
        b  c        d          e          f          g
    0   1  4   0.0530  40.846957  23.649716  71.272113
    1   1  4   0.0530  41.218419  22.065797  70.765597
    2   1  4   0.0530  40.120983  22.232044  72.110061
    3   1  2   0.0390  38.207267  21.563630  70.422680
    4   1  3   0.0080  39.149152  22.741445  70.183600
    5   1  4   0.0530  38.609269  23.628681  70.437933
    6   1  5  -0.1060  39.474461  22.963167  68.709932
    7   1  4   0.0530  39.773368  22.016420  68.256171
    8   1  4   0.0530  40.399708  23.595791  68.660299
    9   1  6  -0.1768  37.412770  20.744596  69.503301
    10  1  7   0.1268  37.590714  20.848031  68.409082

You could also construct your own unique index starting from 1, like this:

    In [7]: df.set_index(np.arange(1, len(df)+1))
    Out[7]:
        b  c        d          e          f          g
    1   1  4   0.0530  40.846957  23.649716  71.272113
    2   1  4   0.0530  41.218419  22.065797  70.765597
    3   1  4   0.0530  40.120983  22.232044  72.110061
    4   1  2   0.0390  38.207267  21.563630  70.422680
    5   1  3   0.0080  39.149152  22.741445  70.183600
    6   1  4   0.0530  38.609269  23.628681  70.437933
    7   1  5  -0.1060  39.474461  22.963167  68.709932
    8   1  4   0.0530  39.773368  22.016420  68.256171
    9   1  4   0.0530  40.399708  23.595791  68.660299
    10  1  6  -0.1768  37.412770  20.744596  69.503301
    11  1  7   0.1268  37.590714  20.848031  68.409082

Note: There are simpler ways to just modify the file. However, if you want to process or analyze the data, using pandas will make your life easier.
