HIGH-PERFORMANCE SEARCHES WITH MAPS IN DRUPAL
Today I want to talk about my side project, Vabiso.com, which turns one year old next month, and about how its backend is implemented to handle a massive volume of data and show it all on a map.
I have split this article into the sections I think are most relevant when setting up a search-focused project.
First, a short summary of the project to give you some context and explain what the map is for.
Vabiso.com is a job search engine. It automatically indexes job offers from other websites, geolocates them and saves them in the Drupal database.
At the time of indexing, categories are also automatically detected, including remote jobs. Remote jobs are not shown on the map but in a separate listing/search engine.
Old job offers are removed from time to time, as it does not make sense to display expired job offers from months ago.
The trick of this project is knowing how to manage multiple automated imports, how to handle the large volume of data, and how to offer a geolocated search with a high-performance map.
If you have visited Vabiso.com you will have seen that it has 60k items on the map and 17k items that are not on the map. That's 77k items that are displayed and filtered from different search pages.
That 77k counts only the visible jobs: many more are queued for indexing, and others have been unpublished and later removed.
At one time this site had over 200k visible items on the map and about 30k remote jobs.
The maximum volume of data indexed was close to 2 million jobs, many of which were old. At that point the site's code was improved to automatically remove old jobs that should no longer be displayed.
Limits of each technology
In a nutshell, there are several ways to send data to a map and perform proximity calculations. The right choice depends largely on what exactly you do with the data and the map, but it helps to know which technologies cope better than others.
Below is a simplified list of how many map items each one can handle. The lower limit is roughly where you start to notice slowness; the upper limit is where the map becomes basically unusable.
- JSON (or another format) with the calculation done in JS: between 100 and 1,000 points on the map.
- PHP: between 500 and 1,000 points.
- MySQL: it starts to get slow above 10k points and is unfeasible beyond 100k.
- SOLR or ElasticSearch: honestly, they have no practical limit. The higher the volume, the slower the response, but they are still much faster than any of the others.
A couple of things to clarify:
Near the upper limits mentioned above, the slowdown grows sharply rather than gradually: queries take at least 1 second and can easily shoot up to 10 seconds or more.
All of this depends a lot on what data you show on the map, how you show it and what kind of calculations you do. Calculating proximity is not the same as filtering by zones.
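To make the difference between proximity and zone filtering concrete, here is a minimal Python sketch. It is not code from Vabiso: the function names and the sample bounding box are assumptions chosen for illustration. A proximity check needs trigonometry for every candidate point (here the haversine formula), while a zone filter on a rectangular area is just four comparisons.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_bounding_box(lat, lon, south, west, north, east):
    """Zone filter: plain comparisons, no trigonometry.

    Simplification: does not handle boxes that cross the antimeridian.
    """
    return south <= lat <= north and west <= lon <= east


madrid = (40.4168, -3.7038)
barcelona = (41.3874, 2.1686)

# Proximity: one trigonometric computation per candidate point.
distance = haversine_km(*madrid, *barcelona)

# Zone: is Madrid inside a rough (hypothetical) box around mainland Spain?
inside = in_bounding_box(*madrid, south=36.0, west=-9.5, north=44.0, east=3.5)
```

Run naively over every stored point, either check grows with the size of the dataset, which is why the storage engine's spatial indexing matters so much once you reach tens of thousands of points.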
Drupal and MySQL performance
Drupal, as you probably know, works with MySQL.
One of its best modules is Views, which lets you build lists with filters without needing to know how to program them.
By default Views works with the Drupal database, which means it works with MySQL.
The first thing you should know is that geospatial searches in MySQL do not perform well.
As I said in the previous section, calculating proximity from spatial coordinates or filtering by map area is slow, very slow, in MySQL, especially once you have between 10k and 100k results.
Search API Module
In Drupal, we have the Search API module, which lets Drupal use search backends outside its own database. I have used this module in most high-volume projects that needed search.
We can have several types of search engines:
- MySQL: it doesn't make sense for geospatial searches, but it can make sense if you only want text searches and faceted filters.
- SOLR: one of the longest-running engines. It supports geospatial searches, text searches and faceted filters.
- ElasticSearch: much newer than SOLR. It can do the same as SOLR but with different technology.
For Vabiso.com I decided to use ElasticSearch, but it would be feasible to use SOLR. My reasons were:
- I know Elastic better than SOLR.
- Elastic is more modern and I believe it offers higher performance (smaller server = lower cost).
- Several code examples I found on the web used Elastic and not SOLR.
But there was a major problem. A year ago, when I created Vabiso, the Drupal module "ElasticSearch Connector" (for Drupal to send data to an ElasticSearch index) had no support for Drupal "geolocation" type fields.
It was my opportunity to learn more about Elastic and to create a contributed module that solves the problem: https://www.drupal.org/project/search_api_geolocation.
The module is still in BETA, but it can index geolocation fields and perform geospatial searches.
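For context on what "indexing a geolocation field" means on the Elasticsearch side, here is a minimal sketch in Python. It is not the mapping this module generates: the field names `title`, `category` and `location` and the helper `to_document` are hypothetical. The only Elasticsearch-specific piece is the built-in `geo_point` field type, which is what makes bounding-box and distance queries possible on a field.

```python
import json

# Hypothetical index mapping. Only the "geo_point" type is an Elasticsearch
# built-in; the field names are invented for this example.
INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "category": {"type": "keyword"},
            "location": {"type": "geo_point"},
        }
    }
}


def to_document(title, category, lat, lon):
    """Shape one job as a document; geo_point accepts a {lat, lon} object."""
    return {
        "title": title,
        "category": category,
        "location": {"lat": lat, "lon": lon},
    }


doc = to_document("Drupal developer", "development", 40.4168, -3.7038)
mapping_json = json.dumps(INDEX_MAPPING)  # body you would send when creating the index
```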
Tips for maps
There are several things to keep in mind that can improve the performance of a map:
Don't load all points.
It's a no-brainer, but few people think about it.
In Vabiso I send Elastic 5 values that act as filters:
- The 4 corners of the map.
- The zoom level of the map.
With these, Elastic returns only the points inside the area the user is viewing, and thanks to the zoom level it can return approximate, grouped locations at a precision that matches the zoom.
The zoom grouping is very important. In Vabiso you can see that some clusters (groups of points) are larger than others, and that when you zoom the clusters are recalculated. These calculations are done by Elastic. So we can be at a very low zoom level and have a cluster with 10k points, but what we actually receive from Elastic are the coordinates of 1 cluster, not 10k points. This saves the frontend a lot of work when rendering the clusters!
Keep in mind that with this configuration a request is made to ElasticSearch every time the user moves around the map or zooms. That is, there are a great many requests.
If the user is viewing the city of Madrid, it doesn't make sense to load points from cities like Barcelona, Paris or New York. Only points that are visible on the current map should be loaded.
If the user has the map at a high zoom level (street level), the point has to be shown with high precision so the exact location is visible. But at a low zoom level (all of Spain or Europe) high precision makes no sense: the user will see the same thing as long as the point is within a few km of its real position.
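The viewport-plus-zoom idea can be sketched as an Elasticsearch request body. This is a minimal Python sketch, not the actual query that Vabiso or the search_api_geolocation module sends: the field name `location`, the function names and the zoom-to-precision mapping are all assumptions made for illustration. It combines a `geo_bounding_box` filter for the visible area with a `geohash_grid` aggregation and a `geo_centroid` sub-aggregation, so that each grid cell comes back as one cluster.

```python
def zoom_to_precision(zoom):
    """Map a Leaflet-style zoom level to a geohash precision (1-12).

    Illustrative mapping only: low zoom gives coarse cells, high zoom fine cells.
    """
    return max(1, min(12, zoom // 2 + 1))


def build_map_query(north, south, east, west, zoom, field="location"):
    """Build a request body from the 5 values: 4 map edges plus the zoom."""
    return {
        "size": 0,  # no individual hits, only aggregated clusters
        "query": {
            "bool": {
                "filter": {
                    "geo_bounding_box": {
                        field: {
                            "top_left": {"lat": north, "lon": west},
                            "bottom_right": {"lat": south, "lon": east},
                        }
                    }
                }
            }
        },
        "aggs": {
            "clusters": {
                "geohash_grid": {
                    "field": field,
                    "precision": zoom_to_precision(zoom),
                },
                # One representative coordinate per cluster.
                "aggs": {"center": {"geo_centroid": {"field": field}}},
            }
        },
    }


# Example: a viewport roughly around Madrid at zoom level 6.
query = build_map_query(north=41.0, south=40.0, east=-3.0, west=-4.0, zoom=6)
```

With `size` set to 0, the response carries only the aggregation buckets, each with a document count and a centroid, so the amount of data sent to the browser depends on the grid precision rather than on how many jobs are in view.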
Do not load all the information at once
It is very common to have a normal MySQL view (for example using the OpenLayers or Leaflet modules) that loads all the information from the Drupal entity for every point.
If you want a map with clickable points that open a popup with information, that should be an AJAX call requesting the info for the single point the user clicked.
It doesn't make sense to make Drupal load the info for 1000 nodes if the user is only going to click on two points and see only those 2 nodes.
In Vabiso you can see that I have the results in a sidebar or bottom bar (depending on whether you are on desktop or mobile). Only the first 5 results are shown; the rest are loaded by AJAX. This is independent of the map and creates the illusion that everything works smoothly and fast.
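The split between "light data for the map" and "full data on demand" can be sketched like this. It is a minimal in-memory Python illustration, not Vabiso's code: the `JOBS` data and the three function names are hypothetical stand-ins for what would be separate endpoints in a real site.

```python
# Hypothetical in-memory stand-in for the indexed jobs.
JOBS = {
    i: {
        "lat": 40.0 + i * 0.01,
        "lon": -3.7 + i * 0.01,
        "title": f"Job {i}",
        "body": f"Full description of job {i}",
    }
    for i in range(1, 13)
}


def marker_payload(jobs):
    """Data sent with the map: only id and coordinates, no entity content."""
    return [
        {"id": job_id, "lat": job["lat"], "lon": job["lon"]}
        for job_id, job in jobs.items()
    ]


def job_detail(jobs, job_id):
    """What a per-click AJAX call would return: one full record."""
    return jobs[job_id]


def results_page(jobs, offset=0, limit=5):
    """Sidebar listing: the first page on load, later pages on request."""
    ids = sorted(jobs)[offset:offset + limit]
    return [{"id": job_id, "title": jobs[job_id]["title"]} for job_id in ids]


markers = marker_payload(JOBS)       # sent once with the map
first_page = results_page(JOBS)      # first 5 results in the sidebar
popup = job_detail(JOBS, 3)          # loaded only when point 3 is clicked
```

The point of the design is that the expensive part, loading full entities, happens only for the items the user actually asks to see.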
Not using Google Maps
There is no performance problem in using Google Maps in your Drupal projects.
But you have to be aware that Google Maps is a paid service. If you have a lot of visitors, it will cost you a lot of money.
I have worked on projects that were paying hundreds of euros a month for the use of Google Maps.
The solution is simple: you can use Leaflet. And if you want a visual style similar to Google's, you can use a Google layer without an API key, which is free and unlimited.
Using the Google layer on top of Leaflet is not the same as using a map built with the Google Maps libraries.
Vabiso uses just Leaflet + a free Google Layer and visually it doesn't look bad, right?
Many things have to be taken into account when you work with maps.
And making a map that shows 10 points is not the same as making one that shows 10k points. Performance matters a lot!
If you have any questions about Drupal, search and maps, you can contact me and we can talk.