2

I have a catalog I opened in python, which has about 70,000 rows of data (ra, dec coordinates and object name) for various objects. I also have another list of about 15,000 objects of interest, which also appear in the previously mentioned catalog. For each of these 15,000 objects, I would like to see if any other objects in the large 70,000 list have ra, dec coordinates within 10 arcseconds of the object. If this is found to be true, I'd just like to flag the object and move on to the next one. However, this process takes a long time, since the distances are computed between the current object of interest (out of 15,000) 70,000 different times. This would take days! How could I accomplish the same task more efficiently? Below is my current code, where all_objects is a list of all the 15,000 object names of interest and catalog is the previously mentioned table data for 70,000 objects.

from astropy.coordinates import SkyCoord from astropy import units as u for obj_name in all_objects: obj_ind = list(catalog['NAME']).index(obj_name) c1 = SkyCoord(ra=catalog['RA'][obj_ind]*u.deg, dec=catalog['DEC'][obj_ind]*u.deg, frame='fk5') for i in range(len(catalog['NAME'])): if i != obj_ind: # Compute distance between object and other source c2 = SkyCoord(ra=catalog['RA'][i]*u.deg, dec=catalog['DEC'][i]*u.deg, frame='fk5') sep = c1.separation(c2) contamination_flag = False if sep.arcsecond <= 10: contamination_flag = True print('CONTAMINATION FOUND') break 
2

1 Answer 1

2

1 Create your own separation function

This step is really easy once you look at the implementation and ask yourself: "how can I make this faster"

def separation(self, other): from . import Angle from .angle_utilities import angular_separation # I've put that in the code bellow so it is clearer if not self.is_equivalent_frame(other): try: other = other.transform_to(self, merge_attributes=False) except TypeError: raise TypeError('Can only get separation to another SkyCoord ' 'or a coordinate frame with data') lon1 = self.spherical.lon lat1 = self.spherical.lat lon2 = other.spherical.lon lat2 = other.spherical.lat sdlon = np.sin(lon2 - lon1) cdlon = np.cos(lon2 - lon1) slat1 = np.sin(lat1) slat2 = np.sin(lat2) clat1 = np.cos(lat1) clat2 = np.cos(lat2) num1 = clat2 * sdlon num2 = clat1 * slat2 - slat1 * clat2 * cdlon denominator = slat1 * slat2 + clat1 * clat2 * cdlon return Angle(np.arctan2(np.hypot(num1, num2), denominator), unit=u.degree) 

It calculates a lot of cosines and sines, then creates an instance of Angle and converts to degrees then you convert to arc seconds.

You might not want to use Angle, nor do the tests and conversions at the beginning, nor doing the import in the function, nor doing so much variable assignment if you need performance.

The separation function feels a bit heavy to me, it should just take numbers and return a number.

2 Use a quad tree (requires a complete rewrite of your code)

That said, let's look at the complexity of your algorithm, it checks every element against every other element, complexity is O(n**2) (Big O notation). Can we do better...

YES You could use a Quad-tree, worst case complexity of Quad tree is O(N). What that basically means if you're not familiar with Big O is that for 15 000 element, the lookup will be 15 000 times what it is for 1 element instead of 225 000 000 times (15 000 squared)... quite an improvement right... Scipy has a great Quad tree library (I've always used my own).

Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.