Overview
The LRS can accelerate its dashboard and statement viewer by using Elasticsearch in addition to MongoDB. When configured with the optional Elasticsearch connection, the LRS synchronizes xAPI statements between the two databases in real time. The system detects analytics queries that can be fulfilled by Elasticsearch, then translates and dispatches them automatically. This process happens under the hood and is invisible to both the user interface and the API. The effects, however, are obvious: on larger data sets, many queries can be hundreds of times faster when accelerated in this way.
To synchronize data with Elasticsearch, the LRS must be supplied with a connection string to an Elasticsearch server. We currently support Elasticsearch version 7.x. The LRS will also function when connected to OpenSearch in Elasticsearch 7.17 compatibility mode.
The Elasticsearch integration is required for users who wish to analyze large data sets. If the number of xAPI statements in an LRS is greater than the maxVQLWindowSize setting value, then some analytics queries will not consider the entire data set. The interaction between maxVQLWindowSize and the given query is complex, and in many cases the results will be correct. However, in this state the server may return incomplete results. If you wish to use MongoDB alone to analyze large data sets, then set maxVQLWindowSize to an arbitrarily large number. Also be aware of the globalMongoTimeout setting, which controls the maximum amount of time a query may run.
Configuration
There are several settings relevant to the Elasticsearch connection. As with all settings, these can be supplied as environment variables, command line parameters, or defined in the .env file.
| Setting Name | Discussion |
| --- | --- |
| elasticSearchServer | The connection string to Elasticsearch. This is always a web address, for example http://192.168.1.2:9200. It should include the port number on which Elasticsearch is listening, unless that port is the default port for the given protocol (80 for HTTP and 443 for HTTPS). It should not include a path. If your Elasticsearch deployment is configured to use authentication, then supply the name and password as HTTP basic authentication parameters in URL form. For example, https://username:password@myserver.org:9200. |
| elasticReconnectTimeout | A human interval like 30 seconds. The time after which the server will attempt to reconnect to Elasticsearch when a connection fails to be established. Default: 30 seconds. |
| esIndexExtensions | Whether xAPI extensions are included in the Elasticsearch mapping. Default: true. Because xAPI allows arbitrary data to be included in extensions, it is possible to generate data that Elasticsearch cannot ingest. This can happen when extensions change the data type at a particular path, such as using a string in some statements but an integer in others. In these cases, Elasticsearch will drop the data. This can lead to large elasticMissingRatioError values. Eventually, if too much data fails to be stored in this way, the LRS will assume the Elasticsearch index is "unhealthy" and refuse to use it. |
| elasticTimeout | A human interval like 30 seconds. The maximum amount of time the LRS will wait for a response from Elasticsearch. Requests that take longer than this will be canceled. Default: 20 seconds. When a request times out, downstream effects follow: dashboard graphs may disappear or show the "No data to display" error message. |
| elasticMissingRatioError | A number between 0 and 1. Defines the ratio of the count of statements between Elasticsearch and MongoDB that is considered normal. Default: 0.99. This means that by default, an Elasticsearch index containing 99% of the MongoDB statements is healthy enough to use. While it may be tempting to set this value to 1.0 for maximum consistency, statements are always stored in MongoDB before Elasticsearch. Under load, the ongoing traffic can mean that Elasticsearch never has the same count as MongoDB. Lower this ratio if, under high load, you see a notification describing the index as unhealthy. |
| esDynamicExtensions | How xAPI extensions are treated when the LRS encounters them and esIndexExtensions is true. Default: true. Use true for regular Elasticsearch dynamic mapping, or runtime for runtime mapping. A discussion of the ramifications of each selection is beyond the scope of this document. |
| esKeywordMaxLength | A number. Strings beyond this length will be truncated in the mapping. Default: 256. Increase this value if you have identifiers that are very long. Too short a value can cause different xAPI objects, agents, or activities to be considered the same even though their IDs differ. |
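The rules for the elasticSearchServer value can be checked before the LRS is started. The following is a minimal sketch using only the Python standard library; the function names build_elastic_url and check_elastic_url are hypothetical helpers and are not part of the LRS. It also percent-encodes credentials, on the assumption that the LRS decodes them as a standard URL parser would, which this document does not confirm.

```python
from urllib.parse import quote, urlsplit

# Default ports implied by each protocol, per the elasticSearchServer description.
DEFAULT_PORTS = {"http": 80, "https": 443}


def build_elastic_url(scheme, host, port=None, username=None, password=None):
    """Assemble a connection string with optional HTTP basic auth credentials."""
    if scheme not in DEFAULT_PORTS:
        raise ValueError("scheme must be http or https")
    auth = ""
    if username is not None and password is not None:
        # Assumption: percent-encoding keeps characters like '@' or ':' unambiguous.
        auth = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    # The port may be omitted only when it is the default for the protocol.
    port_part = "" if port in (None, DEFAULT_PORTS[scheme]) else f":{port}"
    return f"{scheme}://{auth}{host}{port_part}"


def check_elastic_url(url):
    """Return a list of problems found in a candidate connection string."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        problems.append("not an http(s) web address")
    if not parts.hostname:
        problems.append("missing host name")
    if parts.path not in ("", "/"):
        problems.append("should not include a path")
    return problems


print(build_elastic_url("https", "myserver.org", 9200, "username", "password"))
print(check_elastic_url("http://192.168.1.2:9200"))
print(check_elastic_url("http://192.168.1.2:9200/my-index"))
```

The first call reproduces the authenticated example from the table, and the last call reports the path as a problem. The resulting string would then be supplied as elasticSearchServer through an environment variable, a command line parameter, or the .env file.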
Permissions
The LRS will create and destroy Elasticsearch indexes as LRS tenants are created and destroyed. This means that the user account the LRS uses to access Elasticsearch should have permission to create and delete indexes. The LRS will also query, update, and change the mapping of the indexes it controls.
Connectivity
As of LRS version 1.12.6, the LRS will attempt to reconnect to Elasticsearch if connectivity is interrupted. Normally, this should never occur, and you should endeavor to make sure Elasticsearch is always available. The reconnection feature is intended to mitigate only transient connection problems. If the Elasticsearch server must be shut down and you cannot stop the LRS from accepting new data, then the index will be out of date when connectivity is restored. Once this occurs, the index will have to be resynchronized from MongoDB.
Possible Error States
Issues with the Elasticsearch connection and index will be displayed in warning banners on the LRS home page, or on the home page for a particular tenant.
It looks like this server does not have an Elasticsearch connection. While optional, an Elasticsearch connection is highly recommended.
This message means that the server is not connected to Elasticsearch. It does not necessarily mean that no connection is configured; it only means that none is active. Either no connection is configured, or one is configured but the configuration is incorrect or the connection could not be established. Check the server configuration and logs for more information.
The analytics index for this LRS seems to be missing data. If this persists for more than a few minutes, please use the database management page to rebuild the analytics index.
The analytics index for the LRS is unhealthy. This can mean that one does not exist, or that one exists but contains an unexpected amount of data. The index will need to be rebuilt or resynced. In this state, new data will be written to Elasticsearch, but the analytics system will not accelerate queries.
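The count comparison behind this warning can be sketched in a few lines. This is an illustration only: it assumes the health check compares the Elasticsearch statement count to the MongoDB statement count against the elasticMissingRatioError setting. The function names and the use of >= at the boundary are hypothetical and are not taken from the LRS itself.

```python
def index_ratio(es_count, mongo_count):
    """Fraction of MongoDB statements that are also present in Elasticsearch."""
    if mongo_count == 0:
        return 1.0  # nothing to index, so nothing can be missing
    return es_count / mongo_count


def is_index_healthy(es_count, mongo_count, missing_ratio_error=0.99):
    """True when the index holds at least the configured share of statements."""
    # Assumption: the boundary value itself counts as healthy.
    return index_ratio(es_count, mongo_count) >= missing_ratio_error


# With the default of 0.99, an index may lag by up to 1% and still be used.
print(is_index_healthy(995_000, 1_000_000))  # lagging by 0.5%
print(is_index_healthy(980_000, 1_000_000))  # lagging by 2%
# A stricter setting of 1.0 flags any lag at all, which is why it is discouraged under load.
print(is_index_healthy(999_999, 1_000_000, missing_ratio_error=1.0))
```

Under these assumptions, only the first call reports a healthy index.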
The analytics index was built with a previous version of the LRS and needs to be rebuilt. Please use the database management page to rebuild the analytics index.
The LRS can detect differences between the expected and actual configuration of the Elasticsearch index for a given tenant. This message means that the LRS software was updated or downgraded, or the Elasticsearch index settings were manually changed. Either way, the Elasticsearch index is not in the expected state. It will need to be rebuilt.
Rebuilding the Index
Because the LRS stores data primarily in MongoDB, we can rebuild the Elasticsearch index from the information stored in MongoDB. Various error conditions may require you to rebuild the index or resync the data. Each of the two options below launches a background processing job on the LRS that completes asynchronously. These jobs can last from seconds to hours depending on the amount of data. As a general rule, expect the process to take about one second for every thousand xAPI statements. You can capture new data while this process runs, but note that significant server resources will be required. Plan to run this process during times of low load. Each of these processes can be launched from All Management Tools > Database Management in each tenant.
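To plan a maintenance window, the rule of thumb of one second per thousand statements can be turned into a rough estimate. The sketch below is simple arithmetic, not a measurement; the function name estimate_rebuild_time is hypothetical, and real durations will vary with hardware and load.

```python
from datetime import timedelta

# Rule of thumb from the text: about one second per thousand xAPI statements.
STATEMENTS_PER_SECOND = 1000


def estimate_rebuild_time(statement_count):
    """Rough duration of a rebuild or resync job for the given statement count."""
    return timedelta(seconds=statement_count / STATEMENTS_PER_SECOND)


for count in (50_000, 5_000_000, 40_000_000):
    print(f"{count:>12,} statements -> about {estimate_rebuild_time(count)}")
```

By this estimate, a tenant with 40 million statements needs roughly eleven hours, which is a good reason to schedule the job for a period of low load.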
Rebuild Analytics Index
This process will completely rebuild the analytics index for the given tenant. If an index exists, then it will be destroyed and recreated. If no index exists, then a new one will be created and configured. Use this when connecting to Elasticsearch for the first time, or if the existing index is built with an incompatible version of the LRS.
Resync Analytics Index
The LRS keeps track in MongoDB of whether each statement was successfully inserted into Elasticsearch. This process will insert into Elasticsearch only statements that have not been successfully inserted before. Note that we do not compare actual IDs: if you manually remove data from Elasticsearch, then the LRS will be unaware of exactly which statements are missing, and a full rebuild will be required. Use this process to fix the index after an intermittent connection issue.
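The difference between the two jobs, and the reason a resync cannot repair manually deleted data, can be illustrated with a toy model. In this sketch, plain Python lists and dicts stand in for MongoDB and Elasticsearch, and the "indexed" flag is a hypothetical name for whatever marker the LRS actually stores. It shows the general idea only, not the LRS implementation.

```python
def resync(statements, es_index):
    """Insert only statements not yet flagged as indexed; return how many were sent."""
    sent = 0
    for stmt in statements:
        if stmt.get("indexed"):
            continue  # trusted flag: the index itself is never consulted
        es_index[stmt["id"]] = stmt["body"]
        stmt["indexed"] = True
        sent += 1
    return sent


def rebuild(statements, es_index):
    """Destroy the index and reinsert every statement from the primary store."""
    es_index.clear()
    for stmt in statements:
        es_index[stmt["id"]] = stmt["body"]
        stmt["indexed"] = True
    return len(statements)


# Statement "c" was stored during a connection outage, so it was never indexed.
statements = [
    {"id": "a", "body": "statement a", "indexed": True},
    {"id": "b", "body": "statement b", "indexed": True},
    {"id": "c", "body": "statement c", "indexed": False},
]
es_index = {"a": "statement a", "b": "statement b"}

del es_index["b"]  # manual removal that the flags know nothing about

print(resync(statements, es_index), sorted(es_index))   # "b" is still missing
print(rebuild(statements, es_index), sorted(es_index))  # everything restored
```

In this model the resync sends only statement "c" and leaves "b" missing, because its flag still says it was indexed. Only the full rebuild restores all three statements.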