Update (2013-12-19): I have found a great technique that allows custom queries to be written and accessed through the API, and without having to customise the API at all. It involves creating a simple module then switching its main table for a view. I’ll write this up in a later article, but what it does mean, is that an efficient query on contacts by email address can be done through the API with very little setup. I’ll release an installable module to expose the contacts and their emails as a simple Entry early 2014.
I quite often find myself Googling for the answer to a common problem, then end up finding one of my old posts somewhere with the solution. Here is one example:
The problem is: you want to search for a contact by email, using the SugarCRM SOAP or “REST” API. There are a number of ways to do it, and which you choose will depend on the size of your CRM and the speed of your API connection. If your database is big – lots of contacts – then you will want to avoid sub-queries. If you have a slow connection, then you will want to avoid too many API queries.
So the first method – several queries, but less load on the database:-
The contacts you want to find are all in the Contacts module, and you would fetch them as Contact Entries. An Entry in SugarCRM speak is a single record of something in the database. When you fetch a Contact Entry, it comes with email addresses (email1 being the primary address, and email2 being the second email address). However, the email addresses are not stored on the Contact entity; they are returned as a convenience.
Now, when you request an entity from SugarCRM through the API, you can inject some SQL into the WHERE-clause of the query in order to modify the selection. That SQL gets added to the main query only, that is the query on the Contacts table in this case. What does not happen in the main query is the selection of the email addresses; that happens in a separate query that SugarCRM performs before returning your data. So what can you do, if you cannot inject
WHERE email1 = 'email@example.com'
One solution is to realise that email addresses are stored in a module of their own (EmailAddresses, so can be queried as entries too. You can first select all email addresses that match the one you are looking for, and an injected WHERE-clause works find for that. When these entries are returned, they each come with the Contact ID of the contact they are linked to. Using those contact IDs, is then simple to select all the contacts that those emails link to, using “[WHERE] contact.id IN (<contact id 1>, <contact id 2>, etc)”.
That should only involve two requests to the API, so is not too heavy on resources, and it makes good use of indexes, so is fast.
Update: However…I have not managed to get this to work. You can fetch contacts through the API and ask the API to follow the contacts/email addresses relationship to fetch all email addresses for those contacts. That works. If you try to follow the same relationship in the opposite direction, then it does not work. You can fetch email addresses on their own, but not be able to follow any of the relationships those email addresses have with any other modules. That is a bug in SugarCRM, so far as I can see, and has been there forever. The next solution involving the injection of a sub-query is far from ideal on MySQL. With a CRM of some 30,000 contacts, I am finding the subquery makes a simple mS query stretch to over five seconds, and that is really unacceptable. The only real solution will be to make those relationships from email addresses to contacts followable through the API, or create a new API function that specifically fetches entries that have a matching email address. None of this is trivial, and there does not appear to be any drive from SugarCRM to fix this, otherwise they would have fixed this by now.
The second method can be done in one API call, but can be slow depending on how well MySQL optimises the query. Up to MySQL 5.5 at least, it is not optimised well, but if your dataset is small – you do not have too many contacts – then it is not bad. If you are not using MySQL, then try it out and see how it goes.
Basically, it uses a sub-select to query the email table and select all linking and matching emails to get a list of contact IDs. The outer select on Contacts then has to match any of that list of contact IDs. MySQL can be bad at this, and end up running the sub-select for every row found in the outer select. Other relational database systems are cleverer (and have been for well over a decade) and realise the sub-select only needs to be executed once if it does not reference any column in the outer select. But we have to go with the tools we have, I suppose.
The example below, I use in PHP applications that need the sub-select. Here $filter_by_email has the sub-select that is injected into the WHERE-clause.
$email_escaped = strtoupper(addslashes($email));
$filter_by_email = "
contacts.id IN (
FROM email_addr_bean_rel AS eabr_scauth
INNER JOIN email_addresses AS ea_scauth
ON ea_scauth.deleted = 0
AND eabr_scauth.email_address_id = ea_scauth.id
AND ea_scauth.email_address_caps = '$email_escaped'
WHERE eabr_scauth.deleted = 0
AND eabr_scauth.bean_module = 'Contacts'
AND eabr_scauth.primary_address = 1
The features of this query are:
- I am only looking at primary emails.
- I am only looking at non-deleted emails.
- I am comparing the CAPS version of the email address. This is unlikely to be noticed, but will depend on the way your database is set up (queries may or may not be case-sensitive).
- I’m escaping the email address for quotes.
- I’ve tried to name-space the table aliases a little more, so they are a little less likely to clash with the outer query in future SugarCRM versions.
Now, there are some issues with this, and issues that are not easily solved. They are mainly a consequence of the way SugarCRM has developed over the years. If starting again today, I doubt anyone would take this approach today. The problem is the SQL injection.
The obvious problem is that the remote application needs intimate knowledge of the database structure. That is not good at all. A remote application should be totally isolated from that. However, it is the ONLY way to modify what gets selected. Phil Sturgeon makes it very clear that database results should not be directly exposed to the results of API calls. I would add that database query details should not be modifiable directly through the API. I suspect Phil considered even that to be such a ridiculous approach to even have to mention, but here we are working with what we have been given.
The other main problem is how strings are escaped. We should be using mysql_real_escape_string() to escape any special characters in a string, so it can be quoted to go into the database query. mysql_real_escape_string() looks at the database to check character encoding, so it can make sure things are escaped correctly (for good security reasons). But in this case, where is the database? It is somewhere else, on the other side of the API, so mysql_real_escape_string() has no access to it. I am going to have to find another way around this some day, perhaps writing a version of mysql_real_escape_string() that can be told what encoding is being used locally and on the database.
Of course, the string binding should not be done in the application using the API. That should be done on SugarCRM. We should not have to write SQL, and nothing we send should ever go into an SQL statement without escaping and parsing to make it is not doing any damage. But that is just the way it is, unfortunately.
What I intended to do here, was save my sub-query so I had easy access to it, in case the original post on the SugarCRM forums went missing. As usual, I’ve rambled on far too long. I hope it helps in a deeper understanding of the framework and of the issues. Oh, and how not to write an API.
Update: I have been reading up on query optimisation of sub-queries in MySQL. It seems it has always been poor. The optimiser generally will treat a sub-query as an EXISTS() statement, which means being executed many times. From MySQL 5.6.7 a new optimisation feature can be turned on: materialization. By setting this option “on”, MySQL will recognise independent sub-queries and “materialize” them into temporary tables. This should improve performance enormously, but I have not got a 5.6 instance to test this out yet. Some details of the materialization optimisation can be found here: http://guilhembichot.blogspot.ru/2012/10/cost-based-choice-between-subquery.html