Advanced Search and Aggregation

Advanced Search and Aggregation

Veracity exposes a few extensions to our interfaces that allow you additional options for querying and analyzing data.

Advanced Search

Our query builder interface lets you filter on stacks of almost any property in the official xAPI specification, but it can still be somewhat… limiting.
Find statements where the device-type is “Mobile” and the actor's name is “Harriet Chapman”.
{
    "context.extensions.https://w3id.org/device-type": "Mobile",
    "actor.name": "Harriet Chapman"
}
Our advanced search interface — available as filter commands in VQL and as HTTP requests via API — gives you a more powerful language to express complex queries. For example, if you use the $and operator, then you can have multiple conditions on the same field.
Find statements where the scaled score is between 0.75 and 0.80.
{
    "$and": [
        {
            "result.score.scaled": {
                "$gt": 0.75
            }
        },
        {
            "result.score.scaled": {
                "$lt": 0.80
            }
        }
    ]
}
There's a long list of special query operators you can use, like $in, $gt (greater than), $not… Read the MongoDB Manual for additional information. Using these special query operators, your search can look something like this:
Find statements where the actor's name is either “Harriet Chapman” or “Pearl Drake”, or where the scaled score is greater than 0.75.
{
    "$or": [
        {
            "actor.name": {
                $in: [
                    "Harriet Chapman",
                    "Pearl Drake"
                ]
            }
        },
        {
            "result.score.scaled": {
                "$gt": 0.75
            }
        }
    ]
}
Using Advanced Search, Default Mode (Deprecated)
Warning
This documentation is deprecated! Check the Using Advanced Search, V2 Mode section below for more information. The API described here will work but expect it to change.
The advanced search endpoint is an extension to the xAPI request. Use the normal xAPI endpoint for your LRS and the typical basic authentication keys. The key that you use needs a special permission before it can access the advanced query API. Check the box called Advanced Queries in the Key Create or Edit form. If my LRS were called test lrs, then I would issue a GET request to this address:
https://test-lrs.lrs.io/xapi/statements/search?query= 
The value of the query should be the URL-encoded JSON representation of the query. Here's a query object, and the associated URL:
{
    "$or": [ 
        {
            "actor.name": {
                $in: [
                    "Harriet Chapman",
                    "Pearl Drake"
                ]
            }
        },
        {
            "result.score.scaled": {
                "$gt": 0.75
            }
        }
    ]
}

https://test-lrs.lrs.io/xapi/statements/search?qu
ery=%7B%22%24or%22%3A%5B%7B%22actor.name%22%3A%7B
%22%24in%22%3A%5B%22Harriet%20Chapman%22%2C%22Pea
rl%20Drake%22%5D%7D%7D%2C%7B%22result.score.scale
d%22%3A%7B%22%24gt%22%3A0.75%7D%7D%5D%7D
If you've disabled the Strict API Mode flag in Edit LRS Settings, you can omit the quotations and the URL component encoding.

Using Advanced Search, V2 Mode

The deprecated default search mode uses an expansion algorithm to make forming queries simpler. You'll notice in the section below on aggregation that our internal representation stores the statements as a child of the root document, where the key name includes statement. Thus, an actual query should be:
{
    "statement.actor.name": {
        "$in": [
            "Harriet Chapman",
            "Pearl Drake"
        ]
    }
}
The default search mode does this for you automatically. However, there are some MongoDB queries that cannot be corrected automatically. For this reason, you should use the V2 mode of this API.

You activate V2 mode search by supplying an additional mode parameter in the query string and setting the value to v2:
https://test-lrs.lrs.io/xapi/statements/search?mode=v2&query=
Notes
Search in V2 mode does not support automatically turning strings into regular expressions.
Search in V2 mode supports automatic escaping of extensions. See the section below for more information.

Regex

In default mode, the search API automatically supports regex. That is, when you use a string as the value to test against an object path, we automatically treat that string as a regex. However, this is only the case where the string is a direct comparison.
Find statements with an actor whose name includes “Harriet” or “Pearl”.
{
    "$or": [
        {
            "actor.name": "Harriet"
        },
        {
            "actor.name": "Pearl"
        },
    ]
}
Harriet and Pearl are treated as regexes in this pattern, so you'll get back statements for Tommy Pearlson and Harriet Reid, as well as Harriet Chapman and Pearl Drake.

If you try to make the syntax more compact by changing the $or operator to $in
{
    "actor.name": {
        "$in": [
            "Harriet",
            "Pearl"
        ]
    }
}
Then this pattern won't return any statements! Since it's matching an array — instead of a string — as the value, it won't use regex; so, the strings in the array must match exactly. In this example, include full names to get statements.

You can use regexes to get clever with your queries. The following query matches Harriet, but with any number of r's. It'll return statements where the actor's name is Haiet, Hariet, Harriet, Harrriet, and so on.
{
    "actor.name": "Har*iet"
}

You can read up on the syntax in this JavaScript Regex guide.
Notes
If you use the search API in V2 mode, then you must use the $regex operator to create a regex. See the MongoDB Manual for more information.

Escaping Extensions

The xAPI specification allows for custom fields in several places. Because the specification requires that the names of these extensions be fully qualified URLs, the format can look somewhat odd. The underlying database does not allow key names to include the dot symbol (.), but this symbol is required in a URL. Therefore, statements are “escaped” before they are stored. The dot in an extension is replaced by the character sequence *`*. You may have seen some queries written as:
{
    "statement.context.extensions.http://activitystream*`*ms/attempt": "matchvalue"
}
You can manually escape values in this way if you prefer, but Veracity automatically escapes such queries as necessary. You can write the below query which will behave exactly as the one above but may be easier to understand.
{
    "statement.context.extensions.http://activitystream.ms/attempt": "matchvalue"
}
Note that this applies only to result, context, and object definition extensions. Extension values for context activity definitions are not currently escaped automatically.


Aggregation

The aggregate API is the nuclear option for data queries. Using this feature, you can perform complex analysis over the entire database of xAPI statements without retrieving them from the server. You represent the algorithm you wish to run over the statements and POST it to the server. The analysis is computed within the Veracity Learning database, returning to you only the results.

Because of the possibility for massively complex computations, to ensure fair access for all users, accounts using the shared infrastructure options may be limited in the number of queries and total used processing time. Upgrade to a dedicated infrastructure or an on-premises install for unlimited data aggregation.

The aggregate endpoint is an extension to the xAPI request. Use the normal xAPI endpoint for your LRS, and the typical basic authentication keys. The key that you use needs a special permission before it can access the aggregate API. Check the Advanced Queries box in the Create an Access Key form. If my LRS were called test lrs, then I would issue a POST request to this address:
https://test-lrs.lrs.io/xapi/statements/aggregate
The content-type header of the post must be application/json, and the post body must include the JSON serialization of an aggregation pipeline. An aggregation pipeline is a set of operations for transforming the data into a result. If the Strict API Mode flag in Edit LRS Settings is disabled, then the payload may be the more permissive JSON5 encoding of the pipeline.

Here's an example of an aggregation pipeline that counts the number of statements by verb.id within a date range:
[
    {
        "$match": {
            "$and": [
                {
                   "statement.timestamp": {
                        "$gt": {
                            "$parseDate": {
                                "date": "Tue Mar 20 2017 16:25:40 GMT-0400 (Eastern Daylight Time)"
                            }
                        }
                    }
                },
                {
                    "statement.timestamp": {
                        "$lt": {
                            "$parseDate": {
                                "date": "Tue Mar 27 2018 16:25:40 GMT-0400 (Eastern Daylight Time)"
                            }
                        }
                    }
                }
            ]
        }
    },
    {
        "$group": {
            "_id": "$statement.verb.id",
            "count": {
                "$sum": 1
            }
        }
    }
]

In this (relatively simple) example, we first select all the statements whose timestamp is greater than one date and less than another. That data is then processed, summing up the number of statements with each verb.id. The results of an aggregation call are always JSON arrays. In this case, the result will be:
[
    {
        "_id": "http://adlnet.gov/expapi/verbs/failed",
        "count": 524
    },
    {
        "_id": "http://adlnet.gov/expapi/verbs/responded",
        "count": 12798
    },
    {
        "_id": "http://adlnet.gov/expapi/verbs/terminated",
        "count": 3588
    },
    {
        "_id": "http://adlnet.gov/expapi/verbs/passed",
        "count": 535
    },
    {
        "_id": "http://adlnet.gov/expapi/verbs/completed",
        "count": 3588
    },
    {
        "_id": "http://adlnet.gov/expapi/verbs/initialized",
        "count": 3588
    }
]

Special Operators

Our aggregate API differs only slightly from MongoDB. Because we expose the API over a web service, it can be tricky to input certain data types that don't parse into JSON nicely. To overcome this, we've added a few additional operators.
  1. $parseDate
  2. $parseNumber
  3. $parseRegex
Each of these commands accepts a child as an object with a special key/value pair. The value will be parsed into a date, number, or regular expression before the aggregation is run. It's important to understand that these are not part of the MongoDB aggregation pipeline — we parse the input using these conventions before sending them to the database.
{
    "$match": {
        "someKey": {
            "$parseDate": {
                "date": "This should be a date string"
            }
        },
        "someKey": {
            "$parseNum": {
                "num": "This should be a number string"
            }
        },
        "someKey": {
            "$parseRegex": {
                "regex": "This should be a regexp string"
            }
        }
    }
}

Correlation

Veracity exposes a special $correlate aggregation pipeline stage that can compute correlations between sets of documents.
{
    "$group": {
        "_id": {
            "course": "$statement.context.contextActivities.grouping.id",
            "actor": "$statement.actor.id"
        },
        "courseActivity": {
            "$sum": 1
        }
    }
},
...
{
    "$correlate": {
        "join": {
            "actor": "$_id.actor"
        },
        "xval": "$courseActivity",
        "yval": "$averageScore",
        "groupId": "$_id.course",
        "pipeline": [
            {
                "$group": {
                    "_id": 1,
                    "averageScore": {
                        "$avg": "$statement.result.score.scaled"
                    }
                }
            }
        ]
    }
}
This $correlate pipeline stage will return a stream of documents where, for each course, the correlation between $courseActivity and $averageScore is given as rho. The result stream would look something like:
[
    {
        "_id": "Course ID 1",
        "rho": 0.1
    },
    {
        "_id": "Course ID 2",
        "rho": 0.9
    }
]
An English description of the algorithm:
For each unique pair of context-grouping ID and actor’s mailbox, count the number of statements, and then call that count “course activity.” For each course activity value, look up all statements where the actor in the statement is the actor for whom we computed the course activity. Sum those statements for each actor and then compute the average score. Call that value “average score”. For each unique course for which we computed course activity, find the correlation between the average score and the course activity.
In other words:
For all courses, what is the correlation between the number of statements for each actor, and that actors average score?
Note that for simplicity, we are actually taking the actors average score over all statements, not just over that course.

The general pattern of the $correlate stage is that it will compute pairs of documents, where the value is joined by the join field. Before the correlation is performed, you can optionally transform the documents with a sub-pipeline. These pairs are grouped up by the value of the groupId field, and the correlation between the xval and yval fields is returned. Thus, you'll get an output document for each unique value of the groupId. The pipeline field allows you to compute a complex query, the result of which is the input to the correlation computations.
  1. join — You can join on either the actor, verb, or object. The fields here represent which value in the input stream identifies the actor ID. If the inputs to this stage are raw xAPI statements, then the value for join.actor should be the object path $statement.actor.id. The same pattern applies for object and verb, so join.verb would be $statement.verb.id and join.object would be $statement.object.id.
  2. xval — The object path in the incoming document stream to correlate against the yval.
  3. yval — The object path in the sub-pipeline results documents to correlate against the xval.
  4. groupId — After pairs are generated by joining the sub-pipeline with the incoming stream, how should the pairs be grouped for correlation? If you expect to output only one correlation, then you can pass in a 1.
  5. pipeline — A query to transform the paired documents. If you wish to correlate against statements directly, then just pass []. This is used so that you can correlate against computed values like averages, instead of simply against statement values.

Restricted Pipeline Stages

While the data in each LRS is completely siloed, and cannot be modified by other LRSs, we still worry that exposing the aggregation pipeline could lead users to break their account, or leak information about internal system configuration. We therefore only accept a limited subset of the MongoDB aggregation pipeline stages. These stages are safe, in that they don't reveal information about the database system internals or modify the state of the data in the system. We only allow the following:
  1. $addFields
  2. $bucketAuto
  3. $bucket
  4. $count
  5. $facet
  6. $geoNear
  7. $graphLookup
  1. $group
  2. $limit
  3. $lookup
  4. $match
  5. $project
  6. $redact
  7. $replaceRoot
  1. $sample
  2. $skip
  3. $sort
  4. $sortByCount
  5. $unwind
For users with on-premises installs or dedicated cloud hosting, contact us to remove these restrictions.

Understanding the Schema

To write aggregation queries, you'll need to understand the format of our representation in the database. Each document in our database includes the original posted statement (modified slightly to make uniform according to the spec). This statement value is exactly what you see in the Statement Viewer page, with a few exceptions. Actors and authority have an additional ID field. This allows us to aggregate over agents that have the same IFI but otherwise have different names. The statement.actor.id is the IFI of the actor, or a hash of the object when that object is a group. Keys that have a dot in the key name (usually extensions), will have the dot characters replaced by the string *`*.

We also keep a set of indexes on the root document to aid in searching. These indexes are:
  1. agent — A string array of the IFIs of all agents in this statement. Usually one entry, but multiple are possible when the agent is a group.
  2. verb — A string array of the verb ID only.
  3. activity — A string array of the IDs of activities that should match this statement.
  4. registration — A string array of the registrations.
  5. relatedAgents — A string array of the list of all agent IFIs that match this statement when the xAPI query includes relatedAgents=true.
  6. relatedActivities — A string array of the list of all activity IDs that match this statement when the xAPI query includes relatedActivities=true.
  7. voided — Is this statement voided (Boolean)?
  8. voids — If so, then this is a string of the ID the statement that this statement voids.
  9. statement — The entire statement as posted, plus the modifications described above.
You can use these fields to access data that can be hard to compute during queries and therefore is generated on-store.


    • Related Articles

    • Veracity Learning Basics

      Veracity Learning is an easy to use, cloud or self-hosted Learning Record Store (LRS). The most essential feature of an LRS is that it receives and processes eXperience API (xAPI) data transactions. xAPI is a new standard for moving around ...
    • Plugins

      Veracity Learning 1.6 and later support plugins! Plugins are collections of JavaScript code that can extend the system to add new capabilities or modify how the system works. Overview A plugin is a collection of JavaScript code that adds or modifies ...
    • LRS Management

      Veracity Learning LRS gives you many tools to manage your LRS and the data in it. These include backup and restore features, import and export features, as well as tools to update, migrate, and modify the data in the LRS. LRS Management Basics Now ...
    • Versions of Veracity Learning

      Veracity Learning LRS comes in a few versions. The main version, and the one you're most likely looking at right now, is LRS.io. LRS.io (SaaS Hosted Plans) This is our cloud hosted SaaS service, where anyone can create an LRS. You can use the free ...
    • Power BI Integration

      Integration with Power BI Integration is accomplished by pulling a saved statement viewer report into PowerBI. Before you can do this, you'll need an API key. Navigate to "All Mangement Tools" then find "Security" Click "API Keys" Click "Create New ...