SpotRM Web Data Security

There can’t be an absolute guarantee that a query posted to a computer system remains secret. However the query is encrypted or encoded, at some point within the system it has to be decrypted so that the software can process it, and a system administrator with sufficient skills and sufficiently deep access could in principle intercept or read the query at that point. In the case of SpotRM Web, however, that would be quite difficult, and the following sections explain why we believe that to be the case.

Data transmission to Google

The SpotRM website is available only over HTTPS, so queries are encrypted between the user and the Google data centre in Europe. Responses are also encrypted, so the results obtained, including the images of query structures highlighted with the reactive metabolite alerts, are also secure in transit. Note, however, that the drug monographs and MiCs are currently served over HTTP, and hence a third party could see which monographs and MiCs your alerts led you to consult.

Transmission within Google

The query is likely to be routed internally within Google in one or more stages, from the front-end web server, which maintains the encrypted HTTPS connection, to the server running the SpotRM application. These stages are potentially visible to an administrator within Google but not to Awametox’s administrators of SpotRM Web.

SpotRM application

The SpotRM application runs as a Docker container on the Google servers. Docker containers don’t have any saved state, so no information about usage or query history can be stored persistently. Container instances in Cloud Run are started by Google in response to requests, so the query and response exist in the container only while it is actively processing requests and are lost when the container exits. See Google’s container contract for more details: https://cloud.google.com/run/docs/reference/container-contract.

Queries can be submitted in several ways and the processing differs between file-based and other queries:

  1. The query is submitted as a single structure by sketching or as SMILES or InChI, or is a text-based query. In this case the query is assigned to a variable and held in memory only. This variable goes out of scope, and the query information is lost, as soon as query processing completes.
  2. The query is uploaded as an SD file or a file of SMILES. In this case the query file is first saved to a temporary area in the container storage and read from there for processing. This information remains stored until the temporary area is next cleared.

However, part of the response for all structure-based queries is a set of images of the query molecule(s) highlighted with the alerts found, and these are generated and saved to the same temporary directory that is used to store query files. Hence for all structure-based queries there is stored information, either query files or image files, in the temporary directory until the directory is cleared. The temporary directory cannot be cleared as soon as query processing finishes because a grace period is required to allow the user to download the result. Instead, at the beginning of query processing the temporary area is cleared of any query and image related files that are more than 10 minutes old. That results in two possibilities:

  1. There is a steady stream of queries, so the container stays alive and the files relating to your query are deleted at the start of processing of the first query submitted more than 10 minutes after yours.
  2. There are no further queries, and after a grace period the container is terminated and deleted, at which point any stored query information is lost because the container’s storage directories are virtual, held in RAM only, and not on any form of long-term storage device. Google are unspecific about the length of the grace period allowed, but it is likely to be substantially less than 10 minutes.

So the outcome is that any data relating to a query would be available only for a short window of a little over 10 minutes following the query, and in many cases the period is likely to be substantially less.
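
The age-based clean-up described above can be sketched in Python as follows. This is an illustration only and not the SpotRM implementation: the function name clear_stale_files, the use of file modification time to measure age, and the treatment of every regular file in the directory as query related are all assumptions made for the example.

```python
import os
import tempfile
import time

MAX_AGE_SECONDS = 10 * 60  # the 10-minute threshold described in the text


def clear_stale_files(directory, max_age_seconds=MAX_AGE_SECONDS, now=None):
    """Delete regular files in `directory` that are older than the threshold.

    Age is measured from the file's modification time (an assumption).
    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    # Snapshot the directory listing first so files are not removed mid-scan.
    with os.scandir(directory) as it:
        entries = list(it)
    removed = []
    for entry in entries:
        if not entry.is_file():
            continue  # leave subdirectories and other entries alone
        age = now - entry.stat().st_mtime
        if age > max_age_seconds:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)


if __name__ == "__main__":
    # Hypothetical usage: one file aged 11 minutes and one fresh file.
    workdir = tempfile.mkdtemp()
    stale = os.path.join(workdir, "query.sdf")
    fresh = os.path.join(workdir, "result.png")
    for path in (stale, fresh):
        open(path, "w").close()
    eleven_minutes_ago = time.time() - 11 * 60
    os.utime(stale, (eleven_minutes_ago, eleven_minutes_ago))
    print(clear_stale_files(workdir))
```

Calling such a function at the start of each query, rather than at the end, is what preserves the grace period for the previous user to download results while still bounding how long files can linger in a busy container.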

Although it is possible in general to log into a running Docker container, Google doesn’t allow this for containers run in the Cloud Run environment used for SpotRM. Awametox’s SpotRM administrators therefore cannot see into the container and hence cannot access the unencrypted queries being actively processed, even if they hit the short window in which the information is available. This may, however, be possible for an internal Google system administrator.

Finally, although Awametox’s SpotRM administrators can see the logs of queries submitted, which show the time, success / failure, size, response time, web browser and route, these logs don’t include details of the query. A successful structure search query gets logged like this:

2020-09-17T07:56:13.219471Z POST 200 11.69 KB 229 ms Chrome 85 https://spotrm.com/structure_search

So we can see that a query has been submitted and which web browser was used, but we are unable to access any information about the query itself.
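
To make the contents of such a log entry concrete, the sketch below splits the example line into named fields. It is a minimal illustration, assuming the field layout can be inferred from the single example shown above; the pattern, the field names and the function parse_log_line are hypothetical and are not part of SpotRM or of Google’s logging tools.

```python
import re

# Field layout inferred from the one example line in the text (an assumption).
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\S+) (?P<method>[A-Z]+) (?P<status>\d{3}) "
    r"(?P<size>[\d.]+ \S+) (?P<response_time>[\d.]+ \S+) "
    r"(?P<browser>.+) (?P<url>https?://\S+)$"
)


def parse_log_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    example = (
        "2020-09-17T07:56:13.219471Z POST 200 11.69 KB 229 ms "
        "Chrome 85 https://spotrm.com/structure_search"
    )
    for name, value in parse_log_line(example).items():
        print(f"{name}: {value}")
```

Every field recovered this way describes the request (when, how large, how fast, which browser, which route); none carries the structure, SMILES, InChI or text that was submitted.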