Effective geocoding in Python free of charge

In this short text, I would like to share my approach to a simple Python geocoding tool. The tool is completely free, and I’ve added a few elements that should make it more effective to use. Its most important feature is that it requires no API key at all. As mentioned in the title, this way of geocoding is free of charge.

We begin with a small list of addresses, as presented below (Pic. 1).

Pic. 1 List of addresses in Excel (.csv file).

Let’s assume that our file is named Addresses4.csv. Next, we can prepare our simple code, which could look like this:

import pandas as pd
from geopy.geocoders import Nominatim

# Nominatim requires a user_agent that identifies your application
geolocator = Nominatim(timeout=10, user_agent="Krukarius")

def find_location(row):
    place = row['Address']
    location = geolocator.geocode(place)

    if location is not None:
        return location.latitude, location.longitude
    else:
        # Address not found: return 0 for both coordinates
        return 0, 0


points = pd.read_csv("Addresses4.csv")

# result_type="expand" splits the returned pair into the Lat and Lng columns
points[['Lat','Lng']] = points.apply(find_location, axis="columns", result_type="expand")

print(points)

Now let’s explain it a bit.

First, we define our geolocator with its own timeout and user agent. The timeout is the time (in seconds) to wait for the geocoding service to respond; geopy’s default is 1 second. If the time is too short, the geocoder might raise a timeout error, so the value provided in the code overrides the default. The user_agent option is required by Nominatim and should be set to your application name. It lets the service identify each application and enforce its limit of at most 1 request per second.
Next, we define a function that works on a single row of our .csv file, in which the address list (Pic. 1) is stored. It picks up the address from that row and geocodes it. If the address cannot be found, it returns 0 for the coordinates.
The basic problem with Nominatim is that its geocoded addresses come from OpenStreetMap. Because this service is built by its contributors, it doesn’t cover every address around the globe the way Google Maps does. As a result, the Nominatim geocoder returns a lot of “empty” locations for addresses that do exist. We will deal with this problem later, as it’s the primary goal of this article. For now, the code works, and its main product is the address list with additional Lat and Lng columns created for geolocation purposes (Pic. 2).
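As a side note on the request limit mentioned above, here is a minimal sketch of a wrapper that keeps consecutive calls at least one second apart. The throttled helper and its clock and sleep parameters are my own illustration and not part of geopy; geopy also ships a ready-made RateLimiter class in geopy.extra.rate_limiter that serves the same purpose.

```python
import time


def throttled(func, min_delay_seconds=1.0, clock=time.monotonic, sleep=time.sleep):
    """Wrap func so that consecutive calls are at least min_delay_seconds apart."""
    last_call = None

    def wrapper(*args, **kwargs):
        nonlocal last_call
        if last_call is not None:
            # Time still missing until the minimum delay has passed
            remaining = min_delay_seconds - (clock() - last_call)
            if remaining > 0:
                sleep(remaining)
        last_call = clock()
        return func(*args, **kwargs)

    return wrapper
```

With the geolocator defined earlier, it could be used as geocode = throttled(geolocator.geocode), and then geocode(place) would replace geolocator.geocode(place) inside find_location.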

Pic. 2 List of geocoded addresses by the Nominatim tool in Python.

Now the result has been printed to the console. We can also save it to a .csv file by adding one line to our code:

points.to_csv('NewAddresses4B.csv')

The file will be created in the current working directory (usually the folder the script is run from) and will include exactly the same content as shown in the console (Pic. 3).

Pic. 3 Geocoded addresses in the newly created .csv file.

The next step can be displaying all these locations on a map. For this purpose, another library is required: Folium.

import folium
import pandas as pd
from geopy.geocoders import Nominatim

Next, after the point where our new .csv file is exported, we can set up another function, which will place all our geolocated content on the map. Before we do that, we must set up the map view like this:

points_map = folium.Map(location=[51,-1],zoom_start=8)

where we define both the location of the center of our map and the initial zoom level. Once the map has been set up, we can take care of our content.

def create_map_markers(row, map_name):
    # Put a basic marker at the row's coordinates
    folium.Marker(location=[row['Lat'],row['Lng']]).add_to(map_name)

# Rows that were not found keep Lat/Lng of 0 and still pass this filter
point_locations = points[points['Address'] != "Not Found"]

point_locations.apply(create_map_markers, map_name=points_map, axis='columns')

points_map.save(outfile='Geolocation_sample_map.html')

In the create_map_markers function, we define the appearance of our marker, the primary element depicting our results on the map. It must exist at least in a very basic version. Just as important are the locations of our points, which must always be valid; otherwise, they won’t be shown.
When these two elements are ok, we can save our map as an .html file.
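The map center in this article is typed in by hand. As a hedged alternative, the center could be computed from the geocoded points themselves. The map_center helper below is hypothetical (it isn’t part of Folium); it simply averages the coordinates and skips the (0, 0) placeholders of addresses that weren’t found.

```python
def map_center(coordinates, default=(51.0, -1.0)):
    """Average the (lat, lng) pairs, ignoring (0, 0) placeholders."""
    valid = [(lat, lng) for lat, lng in coordinates if (lat, lng) != (0, 0)]
    if not valid:
        # Nothing usable: fall back to the hand-picked center
        return default
    mean_lat = sum(lat for lat, _ in valid) / len(valid)
    mean_lng = sum(lng for _, lng in valid) / len(valid)
    return mean_lat, mean_lng


print(map_center([(50.0, -2.0), (52.0, 0.0), (0, 0)]))  # (51.0, -1.0)
```

The result could then be passed to the map, for example as folium.Map(location=list(map_center(zip(points['Lat'], points['Lng']))), zoom_start=8).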
Optionally, if we want the map to open automatically, we can import one more module: webbrowser, which comes with Python’s standard library.

import folium
import webbrowser
import pandas as pd
from geopy.geocoders import Nominatim

Next, we pass the saved .html file to it.

points_map.save(outfile='Geolocation_sample_map.html')
webbrowser.open('Geolocation_sample_map.html')

Afterward, just wait a moment until your map opens in a browser tab, here MS Edge (Pic. 3).
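Python’s documentation notes that passing a bare file name to webbrowser.open is not portable, because on some platforms it may start whatever program is associated with the file instead of the browser. A small sketch of a safer variant, assuming the map was saved in the current working directory; map_file_uri is a name I made up for this example.

```python
from pathlib import Path


def map_file_uri(filename):
    """Turn a file name into an absolute file:// address."""
    # resolve() makes the path absolute, as_uri() adds the file:// scheme
    return Path(filename).resolve().as_uri()


uri = map_file_uri('Geolocation_sample_map.html')
print(uri)
```

The address can then be opened with webbrowser.open(uri) in place of the bare file name.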

Pic. 3 New Folium map with geolocated features opened by webbrowser.

Above we can see our map opened with the parameters defined in the code (location of the map center and initial zoom level). Some geocoded addresses are already marked there.

Pic. 4 Some addresses weren’t geocoded correctly and can be found at the origin of the WGS84 coordinate system.

Unfortunately, when zoomed out, some of the points sit at the origin of the coordinate system (Lat 0, Lon 0). This is the result of the address missing from OpenStreetMap, as discussed earlier. Because the code returns 0 for them, they land off the coast of West Africa, in the Gulf of Guinea.
This is admittedly the whole essence of standard geocoding by Nominatim with Python. As was said at the very beginning, no API key is required, so we shouldn’t expect anything impressive.
Looking at it from a different angle, we can find a workaround for it, and that’s what this text is for!

The first thing we can do is find at least a piece of the string that would still allow us to locate the address. The postcode is a good candidate, since finding it in OpenStreetMap isn’t a problem. This piece of information should therefore be used separately for addresses that can’t be geocoded in the standard way.
A good tool for this purpose is string slicing, which extracts a given number of characters from our string (the result is stored below in a variable named newstr). Because the postcode usually has a roughly fixed number of characters, this approach can be fairly workable for us. Let’s see our amended code:

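def find_location(row):
    place = row['Address']
    # Slice off the last 8 characters, where the postcode is expected
    place_data = newstr = place[-8:]
    location = geolocator.geocode(place)

    if location is not None:
        return location.latitude, location.longitude
    else:
        # Full address not found: fall back to the postcode alone
        location_overall = geolocator.geocode(place_data)
        if location_overall is None:
            return 0, 0  # neither the address nor the postcode was found
        return location_overall.latitude, location_overall.longitude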

Here, slicing (the newstr assignment) extracts the last 8 characters from the string. UK postcodes are 6 to 8 characters long including the space, so this covers the longest format; for shorter postcodes the slice also picks up a character or two of the preceding text. Next, another variable, location_overall, is built in the same way as location. It supersedes the previous fallback value of 0 and returns the estimated location for the address (Pic. 5).
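Slicing a fixed number of characters works only when the postcode sits at the very end of the string. As a hedged alternative, a regular expression can look for something shaped like a UK postcode anywhere in the address. The pattern below is a simplified assumption of mine, not an official validation rule, and extract_postcode is a hypothetical helper.

```python
import re

# Simplified UK postcode shape: outward code (e.g. SW1A), optional space,
# inward code (one digit followed by two letters). Not a full validator.
POSTCODE_RE = re.compile(r"\b([A-Z]{1,2}\d[A-Z\d]?)\s*(\d[A-Z]{2})\b", re.IGNORECASE)


def extract_postcode(address):
    """Return the last postcode-like fragment of address, or None."""
    matches = POSTCODE_RE.findall(address)
    if not matches:
        return None
    outward, inward = matches[-1]
    # Normalize to upper case with a single space in between
    return f"{outward.upper()} {inward.upper()}"


print(extract_postcode("10 Downing Street, London SW1A 2AA"))  # SW1A 2AA
print(extract_postcode("No postcode here"))                    # None
```

In find_location, the value returned here could be used in place of the fixed 8-character slice, with the slice kept as a last resort when nothing postcode-like is found.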

Pic. 5 List of addresses after alteration. Because some of the inputs are now geocoded by their postcode alone, you can no longer tell which ones failed previously.

The situation has changed significantly, because no address is geocoded at lat 0, lon 0 anymore. All of them are in their correct or at least estimated locations. The estimated locations still need more work, as the issue cannot be left like this. Our next task is a clear distinction between the correct addresses and the estimated ones. First, we need to distinguish them at the stage of geocoding. This is a prerequisite for categorizing them later on the map.
The easiest approach is using the round function, which cuts down the decimals of the locations generated just by the postcode. The digits beyond the chosen decimal place are dropped, so these coordinates end up with fewer decimals than the ones usually returned for a full address.
In fact, changing our function this way isn’t a big deal.

    if location is not None:
        return location.latitude, location.longitude
    else:
        return round(location_overall.latitude,5), round(location_overall.longitude,5)
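Rounding encodes the origin of a location in its number of decimals, which can be ambiguous when a rounded value happens to end in zero. A more explicit variant, sketched here under my own assumptions, returns a third value naming the source. The function name geocode_with_source, the labels, and the idea of passing the geocoder in as an argument are all illustrative; none of them comes from geopy.

```python
def geocode_with_source(address, geocode, postcode_length=8):
    """Return (lat, lng, source); source is 'address', 'postcode' or 'not_found'."""
    location = geocode(address)
    if location is not None:
        return location.latitude, location.longitude, "address"

    # Fall back to the trailing characters, where the postcode is expected
    location = geocode(address[-postcode_length:])
    if location is not None:
        return location.latitude, location.longitude, "postcode"

    # Neither the full address nor the postcode could be geocoded
    return 0, 0, "not_found"
```

With pandas it could fill three columns at once, for example points[['Lat','Lng','Source']] = points.apply(lambda row: geocode_with_source(row['Address'], geolocator.geocode), axis="columns", result_type="expand"), and the Source column could later decide the marker color.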

 

Pic. 6 Geocoded addresses by Nominatim tool in Python – there is no visual difference between true and just estimated locations.

However, things become trickier when we want to categorize them on the Folium map. We can’t use simple markers anymore, as a conditional statement is required. Before we do that, new variables are needed. Their aim is to get the number of digits after the decimal point. It can be done by reversing the number’s string form and finding the position of the first dot.

def create_map_markers(row, map_name):
    # Count the digits after the decimal point: reverse the string and find the dot
    latitude_decimal = str(row['Lat'])[::-1].find('.')
    longitude_decimal = str(row['Lng'])[::-1].find('.')

    icon_wrong = folium.Icon(color="red",icon="info-sign")
    icon_correct = folium.Icon(color="green", icon="ok")

    if latitude_decimal == 5 and longitude_decimal == 5:
        folium.Marker(location=[row['Lat'],row['Lng']],
                      icon = icon_wrong).add_to(map_name)
    else:
        folium.Marker(location=[row['Lat'],row['Lng']],
                      icon = icon_correct).add_to(map_name)

By the way, custom icons can be set for our markers. Here we use the green tick for a correct location and the red information alert for an estimated one. Next, we can include everything in conditional Folium markers as shown in the code. The result should be a clear distinction between the markers on our Folium map (Pic. 7).
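To see what the reversed-string trick actually returns, here is a small standalone sketch. decimal_places is a helper name I chose for illustration, and the sample coordinates are made up. Note the third case: when a rounded value ends in zero, Python drops that zero from the string, so the count is lower than 5 and such a point would not be flagged as estimated.

```python
def decimal_places(value):
    """Count the digits after the decimal point in the string form of value."""
    # Reversing the string puts the decimals first, so the index of the
    # dot equals the number of decimal digits; -1 means there is no dot.
    return str(value)[::-1].find('.')


print(decimal_places(51.5073509))            # 7
print(decimal_places(round(51.5073509, 5)))  # 5
print(decimal_places(round(51.123401, 5)))   # 4, the trailing zero is dropped
print(decimal_places(0))                     # -1
```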

Pic. 7 Geocoded addresses by Nominatim tool in Python – there is no visual difference between true and just estimated locations.

Pic. 8 Distinguished addresses visible on the Folium map, where the green icons show the addresses geolocated correctly but the red icons flag up the locations, which should be validated.

Basically, we could say that the task is completed. On the other hand, there are still things that could make this geocoder more useful. The primary one would be the validation of these addresses, e.g. with Google Maps.
Before we reach this point, it is worth telling the user that this particular set of addresses must be validated. We can prepare a tooltip, which will appear when the mouse hovers over the location.

    if latitude_decimal == 5 and longitude_decimal == 5:
        folium.Marker(location=[row['Lat'],row['Lng']],
                      tooltip="Check address with Google Maps!",
                      icon = icon_wrong).add_to(map_name)

Pic. 9 An example of a tooltip pointing out the necessity of further address validation.

This is an additional alert for everyone that this particular geolocated address requires further validation, as it has been processed only by its postcode. In the UK, unlike in many other countries, the postcode quite often coincides with the full address.
Let’s validate our addresses with Google Maps. For this purpose, we have to create a popup that includes a link. Before completing this task, let’s focus on a popup with some basic content, as follows:

    if latitude_decimal == 5 and longitude_decimal == 5:
        folium.Marker(location=[row['Lat'],row['Lng']],
                      popup="<a href='#' target='_blank'>"+row['Address']+"</a>",
                      tooltip="Check address with Google Maps!",
                      icon = icon_wrong).add_to(map_name)
    else:
        folium.Marker(location=[row['Lat'],row['Lng']],
                      popup="<a href='#' target='_blank'>"+row['Address']+"</a>",
                      icon = icon_correct).add_to(map_name)

Here the popup is created with hyperlinked content inside. This content is the same Address value that the find_location function uses and represents our address cell in the .csv file (Pic. 10). The link doesn’t lead anywhere yet, but it opens in a separate tab.

Pic. 10 Geocoded addresses on Folium map with popup, which contains the hyperlinked address already. It redirects to nowhere right now but opens in the new tab.

Next, we can prepare the correct Google Maps link.
The core link is https://www.google.com/maps/search/ to which our address should be appended. In this event…

    if latitude_decimal == 5 and longitude_decimal == 5:
        folium.Marker(location=[row['Lat'],row['Lng']],
                      popup="<a href='https://www.google.com/maps/search/"+row['Address']+"' target='_blank'>"+row['Address']+"</a>",
                      tooltip="Check address with Google Maps!",
                      icon = icon_wrong).add_to(map_name)
    else:
        folium.Marker(location=[row['Lat'],row['Lng']],
                      popup="<a href='https://www.google.com/maps/search/"+row['Address']+"' target='_blank'>"+row['Address']+"</a>",
                      icon = icon_correct).add_to(map_name)

…we only need to expand the link with the relevant concatenation. To the generic Google Maps link shown above we append the address of each individual location. This makes the validation of our address possible (Pic. 11-12).
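Addresses often contain spaces, commas, slashes or ampersands, which are safer to percent-encode before they go into a link. Here is a minimal sketch using only the standard library. google_maps_popup is a hypothetical helper, and the ?api=1&query= form follows Google’s documented Maps URLs scheme as I understand it, rather than the plain concatenation used in this article.

```python
from html import escape
from urllib.parse import urlencode


def google_maps_popup(address):
    """Build the popup HTML: the address linked to a Google Maps search."""
    # urlencode percent-encodes special characters such as "/" and ","
    query = urlencode({"api": 1, "query": address})
    url = "https://www.google.com/maps/search/?" + query
    # escape() keeps characters like & or < from breaking the HTML
    return f'<a href="{escape(url)}" target="_blank">{escape(address)}</a>'


print(google_maps_popup("Flat 2/3, High St"))
```

The returned string could be passed directly as the popup argument of folium.Marker.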

Pic. 11 Validation of address geolocated by Nominatim in Python by using Google Maps. The black rounded shape shows the address indications as per Nominatim geocoding for the provided address.

Pic. 12 Validation of correctly geocoded address by Nominatim in Python. As we can see, the address is shown on the nearest street, just outside the target building.

The inaccuracy of the Nominatim geocoder usually shows up as the address being located not at the building but next to it, on the nearby street. We can still treat such an address as correct, although small discrepancies might occur.
Sometimes, especially when the address string is too complicated (e.g. it includes more than one dwelling under a single unit), Google Maps won’t show it correctly after redirecting you from the geolocated place (Pic. 13).

Pic. 13 Too detailed an address cannot be found by Google Maps later.

There is also a solution for it. We must replace the slash “/” with the dash “-” and everything will be alright. Remember that, because of the count of 1 passed to replace, this applies just to the first slash in our string.

    # Replace only the first slash, which Google Maps would misread in the link
    address_correct = row['Address'].replace("/","-",1)

    if latitude_decimal == 5 and longitude_decimal == 5:
        folium.Marker(location=[row['Lat'],row['Lng']],
                      popup="<a href='https://www.google.com/maps/search/"+address_correct+"' target='_blank'>"+row['Address']+"</a>",
                      tooltip="Check address with Google Maps!",
                      icon = icon_wrong).add_to(map_name)
    else:
        folium.Marker(location=[row['Lat'],row['Lng']],
                      popup="<a href='https://www.google.com/maps/search/"+address_correct+"' target='_blank'>"+row['Address']+"</a>",
                      icon = icon_correct).add_to(map_name)

The result is good: Google Maps now finds our address correctly (Pic. 14).

Pic. 14 After replacing the slash symbol with the dash symbol in our string the address can be found via Google Maps easily.

In conclusion, the Nominatim geocoding tool in Python can be useful despite its difficulties with quite a lot of addresses. The approach shown here is a nice workaround, which allows you to geolocate addresses that don’t exist in OpenStreetMap yet. The geolocation process isn’t ideal, but we must remember that no API key is required for this tool, so it’s free. I think this approach gets the best out of the Nominatim geocoding functionality and can make it competitive with geocoding tools that are not free of charge.

Mariusz Krukar


