4
\$\begingroup\$

I have created a text-based program that finds the nearest school using your coordinates. The CSV DataFrame is available through LimeWire.

I'd like to know how I can improve it.

import geopy # used to get location from geopy.geocoders import Nominatim from geopy import distance import pandas as pd from pyproj import Transformer import numpy as np try: geolocator = Nominatim(user_agent="Everywhere") # name of app user_input = input("Enter number and name of street/road ") location = geolocator.geocode(user_input) except AttributeError: # skips print('Invalid location') print(user_input) your_location = (location.latitude, location.longitude) try : your_location except NameError: input("Enter number and name of street/road ") except AttributeError: print('Location could not be found') df = pd.read_csv('longitude_and_latitude.csv', encoding= 'latin1') # encoding makes file readable t = Transformer.from_crs(crs_from="27700",crs_to="4326", always_xy=True) # instance of transformer class df['longitude'], df['latitude'] = t.transform((df['Easting'].values), (df['Northing'].values)) # new def FindDistance(): Distance = [] for lon,lat in zip(df['latitude'],df['longitude']): school_cordinates = lon, lat distance_apart = distance.distance(school_cordinates, your_location).miles Distance.append(distance_apart) return Distance df.replace([np.inf, -np.inf], np.nan, inplace=True) # converts infinite vales to Nan df.dropna(subset=["latitude", "longitude"], how="all", inplace=False) # removes the rows/colums missing values from dataframe df = df.dropna() # new dataframe Distance = FindDistance() df['Distance'] = Distance schools = df[['EstablishmentName','latitude','longitude','Distance']] New_order = schools.sort_values(by=["Distance"]) # ascending order print(New_order) 
New contributor
Litcoder is a new contributor to this site. Take care in asking for clarification, commenting, and answering. Check out our Code of Conduct.
\$\endgroup\$
1
  • \$\begingroup\$ The LimeWire link to the .CSV data will expire in a few days. Is that data © Crown copyright and available under the terms of the Open Government 3.0 licence? Please tell us how we may search / download the 24,919 education records you obtained. \$\endgroup\$ Commented 12 hours ago

2 Answers 2

1
\$\begingroup\$

Layout

Move the function to the top after the import lines. Having it in the middle of the code interrupts the natural flow of the code (from a human readability standpoint).

There is some minor inconsistent whitespace usage that can automatically be adjusted using the black program.

Naming

The PEP 8 style guide recommends snake_case for function and variable names.

FindDistance would be find_distance.

Distance would be distance.

Documentation

The PEP 8 style guide recommends adding docstrings for functions and at the top of the code. For example:

def FindDistance(): """ Find the distance to a school. Returns an array of ... """ 

A docstring at the top of the could would:

  • Summarize the purpose of the code
  • Specify the input file name, path, type and expected contents
  • Describe the output

UX

When I run the code, I get a helpful prompt message, instructing me to enter a street address. However, the code dies badly with a very long stack trace, and I have no idea what the problem is. I think the code makes a lot of assumptions that I have things installed or permissions to access things. Perhaps there could be more checking up front.

\$\endgroup\$
1
\$\begingroup\$

CLI

 user_input = input("Enter number and name of street/road ") 

The input() function has its place. But consider accepting an address as an optional command line arg, and only doing an interactive prompt as a fallback.

locale

Though the Review Context and source comments don't mention it, this script and its data appears to be focused on the U.K. Consider explaining that to the user, especially in cases where we error out or find the nearest school is quite far away.

Including a geographic place name hint in the .CSV filename wouldn't hurt.

I see

>>> pp(sorted(df.Region.unique())) ['East Midlands', 'East of England', 'London', 'North East', 'North West', 'South East', 'South West', 'West Midlands', 'Yorkshire and the Humber'] 

notebook confusion

your_location = (location.latitude, location.longitude) try : your_location except NameError: input("Enter number and name of street/road ") 

What?!? Maybe this used to be within a notebook cell?

Assume that both location attributes exist. Unconditionally assigning to your_location is guaranteed to define that name. You then evaluate it for side effects, but NameError cannot possibly trigger here. And even if it did, it seems cruel to interact with the user and then callously discard the input() result.

In this situation, merely mentioning the variable's name won't have the effect of displaying its contents.

comments

Many of your comments just say what the code already said well, and so they may be discarded.

ClassName

def FindDistance(): 

There's two typos in that. First, PEP 8 asks that you put a snake_case underscore in the middle of the name. But worse, that capital F is telling the reader we have a class here, though clearly you're defining a function. Please don't do that. It only leads to some confused maintainer down the road introducing a code defect or taking too long to debug something.

Similarly for each Distance -- you meant to write distance. Also, new_order.

(y, x) vs (x, y)

 for lon,lat in zip(df['latitude'], df['longitude']): 

Are you sure that's what you wanted? I have trouble believing that the rest of the code computes what we think it should compute.

S.I. units

 distance_apart = distance.distance( ... ).miles 

We computed geodesic distance on the WGS84 ellipsoid which has well known SI measurements, and then we convert to... miles? OK. But give us a hint about that, please. Adjust the label, mention "miles" in the narrative output, or something. It was only when I started entering an address of "1 James Holt Ave" that I realized it couldn't possibly be "meters", but at that point I was still guessing "km".

discarded result

df.dropna(subset=["latitude", "longitude"], how="all", inplace=False) 

I don't know what you expected that would accomplish, given that you didn't use df = df.drop... to assign the result back.

head()

print(New_order) 

Rather than display the first and last few rows, consider sticking to just the first few. Those are probably the ones of greatest interest to your user.

\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.