Queries Documentation
Multiple schema querying
Elasticsearch and SEL support to store and query multiple schema. You can even query several indexes in the same time with different schemas, basically it’s how works wildcard foo_*
.
By default SEL will take last inserted schema to build the query. It will only be an issue if you have the same field with two different types or more into schemas.
Queries’ structure
Query string
Queries string are composted of three parts: filters, aggregations and sorts, ordered as bellow:
{filters} {aggregations} {sorts}
Filters limit the number of hits and sort allowed to order returned hits.
Aggregations does not have any impact on returned documents, but are only made on matched documents. You can do several aggregations in a single query, it will not impact each other and a key allow to differentiate them. Aggregations can also contains a specific filter with the where
keyword, which will not impact the main query neither other aggregations. Of course sort does not have any impact on aggregations.
Sort query order matter and it’s apply in the same order into Elasticsearch. The first sort is applied, and, on equals values, according on this sort, the second sort is applied, etc.
Auto sort
By default there is an auto sort system in SEL, see conf.ini
to disable it.
It take all your filters in the same order to make a sort on the inner DefaultObjectSortField
field (conf.ini
), with the original query as filter where
.
Those generated sorts are put after sorts you have set into your query.
Object
{
"query": (Optional) String or Object,
"aggregations": (Optional) Object - For object query only,
"sort": (Optional) Object List - For object query only,
"meta": (Optinal) Object - Limit hits,
"extended": (Optional) Object - Additional ES query keys
}
Meta
Used for paging, it does not impact aggregations, such:
{
"size": (Optional) Int - default 20
"from": (Optional) Int - default 0
}
Extended
Allowed keys: _source
, fields
, script_fields
, explain
, version
, indices_boost
, min_score
.
See ES Search request body
Examples
Query string
{
"query": "label = bag aggreg: label.model sort: like",
"meta": {"from": 100, "size": 20}
}
Query object
{
"query": {"field": "label", "value": "bag"},
"aggregations": {"myaggreg": {"field": "label.model"}},
"sort": [{"field": "like"}],
"meta": {"from": 100, "size": 20}
}
Field path
To simplify queries, SEL allow to use short path for field path, if it’s ambiguous or if it does not found the field, it will raise an error.
If you have this field path into your schema for eg. media.label.id
, can use directly id
.
But if there are other fields id
into your schema, it will raise an error and you will need to specify which one you would like to use.
If you have for exemple media.label.id
and id
at root, you can match them respectivly with label.id
and .id
.
As you can see it’s possible to use dot .
at the beginning to specify that it’s at the root level.
Filters
🗒️ Note: All examples bellow refer to Schema example
.
String comparison
Equals
🗒️ Note: The default comparator is =
label.color = blue
{"field": "label.color", "value": "blue"}
{"field": "label.color", "comparator": "=", "value": "blue"}
Not Equals
label.color != blue
{"field": "label.color", "comparator": "!=", "value": "blue"}
Spaces and Special characters
label.brand = "foo bar"
{"field": "label.brand", "value": "foo bar"}
label.entity = 'bg:model'
{"field": "label.entity", "value": "bg:model"}
Query String
Query string use ES query string format
label ~ "*pant*"
{"field": "label", "comparator": "~", "value": "*pant*"}
Not match query string
Query string use ES query string format
label !~ "*pant*"
{"field": "label", "comparator": "!~", "value": "*pant*"}
Prefix
label prefix h
{"field": "label", "comparator": "prefix", "value": "h"}
Not prefix
label nprefix h
label not prefix h
{"field": "label", "comparator": "nprefix", "value": "h"}
In
label in [human, person]
{"field": "label", "comparator": "in", "value": ["human", "person"]}
Not in
label nin [human, person]
label not in [human, person]
{"field": "label", "comparator": "nin", "value": ["human", "person"]}
Numerical comparison
Equals
🗒️ Note: The default comparator is =
label.model.score = 0.8
{"field": "label.model.score", "value": 0.8}
{"field": "label.model.score", "comparator": "=", "value": 0.8}
Not Equals
label.model.score != 0.8
{"field": "label.model.score", "comparator": "!=", "value": 0.8}
Greater than or Equals
label.model.score >= 0.8
{"field": "label.model.score", "comparator": ">=", "value": 0.8}
Greater than
label.model.score > 0.8
{"field": "label.model.score", "comparator": ">", "value": 0.8}
Less than or Equals
label.model.score <= 0.8
{"field": "label.model.score", "comparator": "<=", "value": 0.8}
Less than
label.model.score < 0.8
{"field": "label.model.score", "comparator": "<", "value": 0.8}
Range
label.model.score range (> 0.2, < 0.8)
0.2 < label.model.score < 0.8
{"field": "label.model.score", "comparator": "range", "value": {">": 0.2, "<": 0.8}}
Not in range
label.model.score nrange (> 0.2, < 0.8)
label.model.score not range (> 0.2, < 0.8)
{"field": "label.model.score", "comparator": "nrange", "value": {">": 0.2, "<": 0.8}}
not 0.2 < label.model.score < 0.8
{"not": {"field": "label.model.score", "comparator": "range", "value": {">": 0.2, "<": 0.8}}}
Date comparison
Date formats
date >= 2014
{"field": "date", "comparator": ">=", "value": "2014"}
date >= 2014-05
{"field": "date", "comparator": ">=", "value": "2014-05"}
date >= 2014-10-04
{"field": "date", "comparator": ">=", "value": "2014-10-04"}
date >= "2014-10-04 15"
{"field": "date", "comparator": ">=", "value": "2014-10-04 15"}
date >= "2014-10-04 15:42"
{"field": "date", "comparator": ">=", "value": "2014-10-04 15:42"}
date >= "2014-10-04 15:42:10"
{"field": "date", "comparator": ">=", "value": "2014-10-04 15:42:10"}
Range
date range (>= 2018, <= 2019)
2018 <= date <= 2019
{"field": "date", "comparator": "range", "value": {">=": "2018", "<=": "2019"}}
Not in range
date nrange (>= 2018, <= 2019)
date not range (>= 2018, <= 2019)
{"field": "date", "comparator": "nrange", "value": {">=": "2018", "<=": "2019"}}
not 2018 <= date <= 2019
{"not": {"field": "date", "comparator": "nrange", "value": {">=": "2018", "<=": "2019"}}}
Query string
Query string will match the DefaultQueryStringFieldPath
(from conf.ini
) with ES query string format.
"foam cage"
{"query_string": "foam cage"}
The field path can be set with the following syntax:
mycontent ~ "foam cage"
{"field": "mycontent", "comparator": "~", "value": "foam cage"}
AND / OR
By default, query string use OR operator
"foam OR cage"
{"query_string": "foam OR cage"}
AND operator is possible as well
"foam AND cage"
{"query_string": "foam AND cage"}
Grouping
"(foam AND cage) OR (all AND blue)"
{"query_string": "foam OR cage"}
Wildcards
Use ?
to replace a single character, *
to replace zero or more characters.
"foa?"
{"query_string": "foa?"}
"foa*"
{"query_string": "foa*"}
Regular expressions
Regular expression patterns can be embedded in the query string by wrapping them in forward-slashes ("/"
)
"/joh?n(ath[oa]n)/"
{"query_string": "/joh?n(ath[oa]n)/"}
Fuzziness
We can search for terms that are similar to, but not exactly like our search terms, using the “fuzzy” operator
"quikc~"
{"query_string": "quikc~"}
The default edit distance is 2
, but you can edit the distance
"quikc~1"
{"query_string": "quikc~1"}
Exact match
It’s also possible to do exact match query with double quote.
'"foam cage"'
{"query_string": "\"foam cage\""}
Proximity search
While a phrase query (eg "john smith"
) expects all of the terms in exactly the same order, a proximity query allows the specified words to be further apart or in a different order. In the same way that fuzzy queries can specify a maximum edit distance for characters in a word, a proximity search allows us to specify a maximum edit distance of words in a phrase:
'"foam cage"~3'
{"query_string": "\"foam cage\"~3"}
Boosting
Use the boost operator ^
to make one term more relevant than another. For instance, if we want to find all documents about foxes, but we are especially interested in quick foxes:
"quick^2 fox"
{"query_string": "quick^2 fox"}
The default boost
value is 1, but can be any positive floating point number. Boosts between 0 and 1 reduce relevance.
Boosts can also be applied to phrases or to groups:
'"john smith"^2'
{"query_string": "\"john smith\"^2"}
"(foo bar)^4"
{"query_string": "(foo bar)^4"}
Boolean operators
By default, all terms are optional, as long as one term matches. A search for foo bar baz
will find any document that contains one or more of foo
or bar
or baz
. There are also boolean operators which can be used in the query string itself to provide more control.
The preferred operators are +
(this term must be present) and -
(this term must not be present). All other terms are optional. For example, this query:
"quick brown +fox -news"
{"query_string": "quick brown +fox -news"}
states that:
fox
must be presentnews
must not be presentquick
andbrown
are optional — their presence increases the relevance
The familiar operators AND
, OR
and NOT
(also written &&
, ||
and !
) are also supported. However, the effects of these operators can be more complicated than is obvious at first glance. NOT
takes precedence over AND
, which takes precedence over OR
. While the +
and -
only affect the term to the right of the operator, AND
and OR
can affect the terms to the left and right.
Rewriting the above query using AND
, OR
and NOT
demonstrates the complexity:
Functions
Exists
Returns documents that contain at least one value for author.id
.
author.id.exists = true
{"field": "author.id.exists", "value": true}
Missing
Returns documents that only have null values or not any value for this field.
label.color.missing = true
{"field": "label.color.missing", "value": true}
Not
We can use not
keyword to not match a query.
It can be very useful on complexe queries.
not label = bag
{"not": {"field": "label", "value": "bag"}}
not label = bag where color = blue
{"not": {"field": "label", "value": "bag", "where": {"field": "color", "value": "blue"}}}
not (label = bag where color = blue or label = hat where color = red)
{"not": {"operator": "or", "items": [
{"field": "label", "value": "bag", "where": {"field": "color", "value": "blue"}},
{"field": "label", "value": "hat", "where": {"field": "color", "value": "red"}}
]}}
Combinations
Filters can not be just separated by spaces, it need a combination instruction.
And combination
⚠️ Warning: This query return an image containing bag and something in red.
label = bag and label.color = red
{"operator": "and", "items": [
{"field": "label", "value": "bag"},
{"field": "label.color", "value": "red"}
]
Or combination
🗒️ Note: And combination is priority, thus brackets is not necessary.
(label.color = blue and label.color = green) or label.color = yellow
label.color = blue and label.color = green or label.color = yellow
{"operator": "or", "items": [
{"operator": "and", "items": [
{"field": "label.color", "value": "blue"},
{"field": "label.color", "value": "green"}
]},
{"field": "label.color", "value": "yellow"}
]
Where combination
The where
combination is different of the and
combination.
The where syntax allow to make a filter on a specific item of a list, instead of the whole document.
For example the following query return all documents with a bag and something in red, which is not necessary the bag.
label = bag and label.color = red
💡 Note: You can do an infinite number of where
inside a where
, there is no depth limit.
This query will return all red bag
label = bag where color = red
{"field": "label", "value": "bag", "where": {"field": "color", "value": "red"}}
This one will get all red foo bag.
label = bag where (color = red and model = foo)
{"field": "label", "value": "bag", "where": {"operator": "and", "items": [
{"field": "color", "value": "red"},
{"field": "model", "value": "foo"},
]}
But the where syntax in not perfect, because it can only be applied on nested
type object, and not simple object
.
Thus the where syntax automatically selection the deeper nested field to apply the filter, then you can not control it, and the where order can sometimes just not work.
For example the first query will not work but the second will.
attribute = foo where label = bag
label = bag where attribute = foo
Here label and attribute are nested object, and attribute are under label, but label are not under attribute, this is why it does not work.
On not nested object, the both order will work.
label = pants where texture = denim
texture = denim where label = pants
Here texture is a simple object
, then the deeper nested field is label, then the both query will apply on label level.
Context syntax
The context
syntax works in the same way than where
syntax but allow a better control on the nested level. Basically you can chose the nested level instead of let SEL decide for you.
💡 Note: You can do an infinite number of where
inside a where
, there is no depth limit.
For example if we want to select bag or leather of red color.
label where ((label = bag or texture = leather) and color = red)
{"field": "label", "where": {"operator": "and", "items": [
{"operator": "or", "items": [
{"field": "label", "value": "bag"},
{"field": "texture", "value": "leather"}
]}
{"field": "color", "value": "red"}
]}}
Just for the example, even if this is not optimal, we can do this:
label where (label = top and label where (gender = female and color where (color = multicolor)))
{"field": "label", "where": {"operator": "and", "items": [
{"field": "label", "value": "top"},
{"field": "label", "where": {"operator": "and", "items": [
{"field": "gender", "value": "female"},
{"field": "color", "where": {"field": "color", "value": "multicolor"}
]}}
]}}
Sorts
🗒️ Note: All examples bellow refer to Schema example
.
Must be placed at the end of the query string. Numerous sorts can be put, the order is important, from left to right.
For query object, use the key "sort"
as a list.
{"sort": [
{"field": "label"},
{"field": "label.model"},
]}
Auto sort
By default there is an auto sort system in SEL, see conf.ini
to disable it.
Auto sort are made for each fields containing DefaultObjectSortField
, such as:
label = bag # generate following sorts
sort: label desc where label = bag # In order to return best bag first.
🗒️ Note: Your own sorts are put before auto sorts.
Special values
Disabled auto sort: sort: null
Enable auto sort: sort: auto
Random sort: sort: random
💡 Note: A special integer parameter “seed” can control the random: sort: random seed 1
Order
Sort in descending order.
🗒️ Note: It’s the default order, you don’t have to specify it.
sort: label
sort: label desc
{"field": "label"}
{"field": "label", "order": "desc"}
Sort in ascending order
sort: label asc
{"field": "label", "order": "asc"}
Mode
The mode allow to control the manner to build the score for a single document / if there is numerous score present in a single document.
Sort by average of values
🗒️ Note: It’s the default order, you don’t have to specify it.
sort: label mode avg
{"field": "label", "mode": "avg"}
Sort by the minimum of all values in the document.
sort: label mode min
{"field": "label", "mode": "min"}
Sort by the maximum of all values in the document.
sort: label mode max
{"field": "label", "mode": "max"}
Sort by the sum of all values the document.
sort: label mode sum
{"field": "label.model", "mode": "sum"}
Sort by the median of all values in the document.
sort: label mode median
{"field": "label.model", "mode": "median"}
Where filters
Apply filter on sort.
💡 Note: The whole filter syntax is available here.
Get best bag first.
sort: label where (label = bag)
{"field": "label", "where": {"field": "label", "value": "bag"}}
Under
Under syntax is used to apply given nested context for the where
filter.
Get best bags’ attribute first.
sort: label.attribute under label where (label = bag)
{
"field": "label.attribute",
"under": "label",
"where": {"field": "label", "value": "bag"}
}
This sort can not works without under
, because label
field is not contained inside the attribute nested object. See schema.
Aggregations
🗒️ Note: All examples bellow refer to Schema example
.
Aggregations must be at the end of the query string.
Query object
In query object keys are used to find back your aggregations in returned results, it’s useful if you are doing numerous aggregations in a single query.
{"aggregations": {
"my_first_aggreg": {"field": "label"},
"my_second_aggreg": {"type": "histogram", "field": "date"}
}}
Return example
{
"results": {
"aggregations": {
"my_first_aggreg": {
"buckets": [ ... ]
},
"my_second_aggreg": {
"buckets": [ ... ]
}
}
}
}
Aggregation types
Aggreg type is how you want to aggregate values.
aggreg: Return all existing values, and it’s number of occurrence.
histogram: Count occurrence of range of values, for numerical and date. Default
size
is0
.count: Count the number of occurrence of the field, whatever it’s value.
distinct: Count the number of unique values for the field.
min: Return the minimum value of the field, for numerical and date only.
max: Return the maximum value of the field, for numerical and date only.
sum: Return the sum of values of the field, for numerical and date only.
average: Return the average of values of the field, for numerical and date only.
stats: Return numerous stats on the field: min, max, average, count, …, for numerical only.
Examples
Get all values of labels and it’s number of occurrence.
🗒️ Note: In query object type
is aggreg
by default.
aggreg: label
{"field": "date"}
{"type": "aggreg", "field": "date"}
Count posts per day.
💡 Note: aggreg
type on date type field is actually doing an histogram aggregations.
💡 Note: By default histogram size
is set to 0
.
aggreg: date
histogram: date
{"type": "histogram", "field": "date"}
Get all occurrences of labels, whatever it’s value.
count: label
{"type": "count", "field": "label"}
Get number of different labels’ value.
distinct: label
{"type": "distinct", "field": "label"}
Named aggregations
You can named to your aggregations in queries, which is useful when you are doing numerous aggregations in the same query.
aggreg bag_texture: label.texture where label = bag
aggreg shoes_texture: label.texture where label = shoes
{"aggregations": {
"bag_texture": {
"field": "label.texture",
"where": {"field": "label", "value": "bag"}
},
"shoes_texture": {
"field": "label.texture",
"where": {"field": "label", "value": "shoes"}
},
}}
Where filters
Filter aggregations with a query.
💡 Note: The whole filter syntax is available here.
⚠️ Warning: Apply the filter on the nested context of the aggregated field, use under
to change the nested field.
Examples
Limit aggregation results
aggreg: label.color where (color in [red, blue])
{
"field": "label.color",
"where": {"field": "color", "comparator": "in", "value": ["red", "blue"]}
}
Get all labels, excepted dress.
aggreg: label where label != dress
{
"field": "label",
"where": {"field": "label", "comparator": "!=", "value": "dress"}
}
Under
Specify the nested level on which the filter (where keyword) will be applied.
Get attributes of skirt.
aggreg: attribute under label where label = skirt
{
"field": "attribute",
"under": "label",
"where": {"field": "label", "value": "dress"}
}
Limit size
Select the number of top results.
🗒️ Note: Default is Aggregations.DefaultSize
in conf.ini
. A warning will be send if there is more values.
🗒️ Note: For histogram the default size is 0.
aggreg: label size 40
{"field": "label", "size": 40}
Return all results
aggreg: author size 0
{"field": "author", "size": 0}
Histogram interval
Select the date aggregation interval for numerical or date type.
🗒️ Note: Default is Aggregations.DefaultDateInterval
in conf.ini
.
Available expressions for interval: year, quarter, month, week, day, hour, minute, second
aggreg: date interval month
{"field": "date", "interval": "month"}
histogram: date interval year
{"type": "histogram", "field": "date", "interval": "year"}
Advance interval exists, such as:
<number><interval_letter>
2d # 2 days
3h # 3 hours
See Time units
Sub aggregations
Will proceed an aggregation under each bucket of the parent aggregation.
Sub aggregations are named and several sub aggregations can be done on the same parent aggregation.
For example to get all color values of each labels:
aggreg: label subaggreg my_aggreg (aggreg: color size 2)
{
"field": "label",
"subaggreg": {"my_aggreg": {"field": "color"}}
}
To get all colors and textures values for each labels.
aggreg: label subaggreg col (aggreg: color size 2) subaggreg tex (aggreg: texture size 2)
{
"field": "label",
"subaggreg": {
"col": {"field": "color"},
"tex": {"field": "texture"}
}
}
It will return something like:
{
"buckets": [
{
"key": "dress",
"doc_count": 203,
"col": {
"buckets": [
{"key": "black", "doc_count": 88},
{"key": "brown", "doc_count": 80}
]
},
"tex": {
"buckets": [
{"key": "printed", "doc_count": 52},
{"key": "floral", "doc_count": 29}
]
}
},
{
"key": "shoes",
"doc_count": 162,
"col": {"buckets": []},
"tex": {
"buckets": [
{"key": "leather", "doc_count": 3},
{"key": "printed", "doc_count": 2}
]
}
},
...
}
Big Examples
🗒️ Note: All examples bellow refer to Schema example
.
Trend moodboard
trend.entity = "pants:women:denim:straight"
AND .author.geozone = eu
AND 2021-07 <= date <= 2021-12
SORT: date
Fetch all trend signals
label WHERE (
gender = female
AND label.trend.entity = "pattern:plain"
AND label IN ["tops", "outerwear", "pants", "shorts", "dresses", "skirts"]
)
AND 2015-08-10 <= date <= 2021-02-14
AGGREG labels: label WHERE (gender = female AND label.trend.entity = "pattern:plain" AND label IN ["tops", "outerwear", "pants", "shorts", "dresses", "skirts"])
SUBAGGREG raw_signal (AGGREG: date INTERVAL week WHERE author.geozone = eu)
SUBAGGREG mainstream (AGGREG: date INTERVAL week WHERE (follower < 12000 AND author.geozone = eu))
SUBAGGREG trendy (AGGREG: date INTERVAL week WHERE (12000 <= follower < 40000 AND author.geozone = eu))
SUBAGGREG edgy (AGGREG: date INTERVAL week WHERE (follower >= 40000 AND author.geozone = eu))
SUBAGGREG followers_low (AGGREG: date INTERVAL week WHERE (follower < 1350 AND author.geozone = eu))
SUBAGGREG followers_mid (AGGREG: date INTERVAL week WHERE (1350 <= follower < 7000 AND author.geozone = eu))
SUBAGGREG followers_high (AGGREG: date INTERVAL week WHERE (follower >= 7000 AND author.geozone = eu))
SUBAGGREG fashion_forwards (AGGREG: date INTERVAL week WHERE panel_name = fashion_forward)
The heaviest query ever used. First part of the query filter documents and second part return timeseries’ histogram for each type of label.
The label query must be repeat inside the first aggregation level to be sure the aggregation is apply on the trend. For eg. tops
can be in labels but not on the expected trend.
Finally each subaggreg will return a specific timeserie used for trend metrics and forecasting and will be applied on each labels.
This query can be too heavy to apply at once if there is lot of data in ES or if the cluster has not enought computation power and results to timeout. In that case the best is to apply each subaggreg query one by one.
Key Metrics
label.model.entity = "bag:30montaigneleatherflap"
AND panel_category IN ["fashion", "sport"]
AND tag != commercial
AND tag != owned_content WHERE (tag.brand.entity = "br:750")
AND 2020-06-19 <= date <= 2020-07-19
COUNT post: .id
SUM follower: follower
SUM engagement: engagement
Evolution chart
panel_category IN ["fashion", "sport"]
AND tag != commercial
AND tag != owned_content WHERE (tag.brand.entity = "br:750")
AND 2020-06-19 <= date <= 2020-07-19
AGGREG norm: date WHERE (label.model.exists = true)
AGGREG models: date INTERVAL day WHERE (label.model.entity = "bag:30montaigneleatherflap")
Evolution of audience breakdown
label.model.entity = "bag:30montaigneleatherflap"
AND panel_category IN ["fashion", "sport"]
AND tag != commercial
AND tag != owned_content WHERE (tag.brand.entity = "br:750")
AND 2020-06-19 <= date <= 2020-07-19
AGGREG voice: date INTERVAL day WHERE (follower < 1000 OR follower.exists = false)
AGGREG micro: date INTERVAL day WHERE 1000 <= follower < 10000
AGGREG macro: date INTERVAL day WHERE 10000 <= follower < 500000
AGGREG superstar: date INTERVAL day WHERE follower >= 500000
Geographical breakdown
label.model.entity = "bag:30montaigneleatherflap"
AND panel_category IN ["fashion", "sport"]
AND tag != commercial
AND tag != owned_content WHERE (tag.brand.entity = "br:750")
AND 2020-06-19 <= date <= 2020-07-19
COUNT norm: .id
AGGREG models: continent SIZE 0
Commercial content
label.model.entity = "bag:30montaigneleatherflap"
AND panel_category IN ["fashion", "sport"]
AND tag != owned_content
AND 2020-06-19 <= date <= 2020-07-19
COUNT norm: .id
COUNT commercial: .id WHERE tag = commercial
COUNT not_commercial: .id WHERE tag != commercial
Brand mentions
label.model.entity = "bag:30montaigneleatherflap"
AND tag != commercial
AND tag != owned_content WHERE (tag.brand.entity = "br:750")
AND panel_category IN ["fashion", "sport"]
AND 2020-06-19 <= date <= 2020-07-19
COUNT norm: .id
COUNT brand_mention: .id WHERE (tag = brand_mention WHERE (brand.entity = "br:750"))