sel package

sel.sel module

class sel.sel.SEL(elastic: elasticsearch.client.Elasticsearch, conf: configparser.ConfigParser = <configparser.ConfigParser object>, log_level=20)

Bases: object

Simple Elastic Language (SEL) makes ES queries easier

Parameters:

elastic – Elasticsearch connection
conf – Configuration of the query system, default: conf.ini
log_level – Log level to use, default: logging.INFO

clear_scroll(scroll_id: str) → None

Clear a scroll before its cache time expires, to free ES memory

Parameters: scroll_id – Scroll id to clear
Returns: None

> sel.clear_scroll("cXVlc...")

delete_documents(index: str, query: dict, undelete: bool = False, deleted_info: Any = None) → dict

Delete documents of index(es) based on a SEL query. Unlike really_delete_documents, this only flags the documents as deleted.

Parameters:

index – Index(es) to delete documents from, e.g. “foo” or “foo,bar”
query – SEL query (string or object) matching the documents to delete. Can also contain “ids” to simplify the query
undelete – True to unflag documents instead, default: False
deleted_info – Any information you want stored in the deleted documents

Returns:

Dictionary action_id, count

> query = {"query": ".id = 1435886281564398679"}                       # Query String
> query = {"query": {"field": ".id", "value": "1435886281564398679"}}  # Query Object
> query = {"ids": ["1435886281564398679"]}                             # Ids format

> sel.delete_documents("foo", query)
{'action': 'delete', 'count': 1}
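
The flag and unflag round trip can be sketched as follows. This is a minimal sketch, not the library's own code: `flag_then_restore` is a hypothetical helper, `sel` is assumed to be an already constructed SEL instance, and the dictionary passed as `deleted_info` is only an illustration (the parameter accepts any value).

```python
def flag_then_restore(sel, index, doc_ids, reason):
    """Flag documents as deleted, then unflag them again.

    Hypothetical helper: it only chains two documented calls on a
    ready SEL instance passed in as `sel`.
    """
    query = {"ids": list(doc_ids)}  # the "ids" query format
    # deleted_info shape is an assumption; any value is accepted.
    flagged = sel.delete_documents(index, query, deleted_info={"reason": reason})
    restored = sel.delete_documents(index, query, undelete=True)
    return flagged, restored
```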

download_aggreg(index: str, base_aggreg: dict, query: dict) → Generator[dict, None, None]

Return all buckets of one aggregation

Order is not guaranteed.
A key can be returned multiple times; sum its doc_count values to get the total.

It partitions the index(es) with base_aggreg to process the query aggregation.

Parameters:

index – Index(es) to process the aggregation on, e.g. “foo” or “foo,bar”
base_aggreg – Base SEL aggregation used to partition the index(es)
query – SEL query (string or object) with one aggregation

Returns:

Generator of buckets

> base_aggreg = {"field": "date", "interval": "week"}
> query = {"aggregations": {"my_aggreg": {"field": ".id"}}}
> list(sel.download_aggreg("foo", base_aggreg, query))
[{'key': '1446587002614128796', 'doc_count': 1}, ...]
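
Because the same key may come back in several buckets, callers usually need to merge them. A minimal sketch of that merge, using sample data rather than real results (`merge_buckets` is a hypothetical helper, not part of SEL):

```python
from collections import Counter

def merge_buckets(buckets):
    """Sum doc_count per key, since a key may appear in several buckets."""
    totals = Counter()
    for bucket in buckets:
        totals[bucket["key"]] += bucket["doc_count"]
    return dict(totals)

# Sample data only, shaped like the buckets returned by download_aggreg.
sample = [
    {"key": "a", "doc_count": 1},
    {"key": "b", "doc_count": 2},
    {"key": "a", "doc_count": 3},
]
totals = merge_buckets(sample)
```

With a live connection, the generator can be passed in directly, e.g. `merge_buckets(sel.download_aggreg("foo", base_aggreg, query))`.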

generate_query(query: dict, schema: dict = None, index: str = None, no_deleted: bool = True) → dict

Generate an Elasticsearch query from a SEL query

Parameters:

query – SEL query (string or object)
schema – Index schema; fetched if not given (pass it to avoid repeated requests)
index – Index(es), e.g. “foo” or “foo,bar”; can be None if schema is given
no_deleted – True to filter out deleted documents (if configured to), default: True

Returns:

Dictionary with keys warns, elastic_query, internal_query, query_data

> query = {"query": ".id = 93428yr9"}                       # Query String
> query = {"query": {"field": ".id", "value": "93428yr9"}}  # Query Object

> sel.generate_query(query, index="foo", no_deleted=False)
{
   'warns': [],
   'elastic_query': {'query': {'term': {'id': '93428yr9'}}, 'sort': [{'id': {'order': 'desc', 'mode': 'avg'}}]},
   'internal_query': {'query': {'field': '.id', 'value': '93428yr9'}},
   'query_data': {}
}
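
When several queries target the same index, the schema can be fetched once and reused. The sketch below assumes that the value returned by `get_schema` is what the `schema` parameter expects, which the documentation implies but does not state outright; `generate_queries` is a hypothetical helper and `sel` is assumed to be a ready SEL instance.

```python
def generate_queries(sel, index, queries, no_deleted=True):
    """Generate several Elasticsearch queries with a single schema lookup."""
    # Assumption: get_schema() output is accepted as the `schema` argument.
    schema = sel.get_schema(index)
    return [
        sel.generate_query(q, schema=schema, index=index, no_deleted=no_deleted)
        for q in queries
    ]
```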

get_one_document(index: str, doc_id: str) → dict

Get one document from an index

Parameters:

index – Index(es) to get the document from, e.g. “foo” or “foo,bar”
doc_id – The document id

Returns:

The whole document

> sel.get_one_document("foo", "1435886281564398679")
{
   '_index': 'test_index',
   '_type': 'document',
   '_id': '1435886281564398679',
   '_score': None,
   '_source': {...},
   'sort': []
}

get_schema(index: str) → dict

Get the most recent schema of the given index(es)

Parameters: index – Index(es) to get schema(s) from, e.g. “foo” or “foo,bar”
Returns: Most recent mapping

> sel.get_schema("foo")
{mapping ... }

list_fields(index: str) → List[dict]

List all fields of an index

Parameters: index – Index(es), e.g. “foo” or “foo,bar”
Returns: Information about every field found

> sel.list_fields("foo")
[
   {
       'field': 'author',
       'element': {
          'type': 'object',
          'properties': {
             'follower': {'type': 'integer'},
             'id': {'type': 'string', 'index': 'not_analyzed'},
             'name': {'type': 'string', 'index': 'not_analyzed'}
          }
       },
       'path': ['author'],
       'str_path': 'author',
       'pretty_str_path': '.author',
       'nested': None,
       'str_nested': None,
       'format': None
   },
   ...
]

list_index(index: str = None) → List[dict]

List Elasticsearch indexes

Parameters: index – Optional index pattern(s) to limit the listing, e.g. “foo_*” or “foo_*,bar_*”
Returns: List of indexes with metadata (if set at creation)

> sel.list_index()
[
   {'index': 'myindex', 'meta': None, 'creation_date': datetime.datetime(2023, 9, 13, 13, 26, 42, 251000)},
   ...
]

really_delete_documents(index: str, query: dict) → int

Really delete documents (not just flag them) matching a SEL query

Parameters:

index – Index(es) to delete documents from, e.g. “foo” or “foo,bar”
query – SEL query (string or object) matching the documents to delete. Can also contain “ids” to simplify the query

Returns:

Number of deleted documents

> query = {"query": ".id = 1435886281564398679"}                       # Query String
> query = {"query": {"field": ".id", "value": "1435886281564398679"}}  # Query Object
> query = {"ids": ["1435886281564398679"]}                             # Ids format

> sel.really_delete_documents("foo", query)
1
> sel.really_delete_documents("foo", query)
0

scroll(index: str, query: dict, cash_time: str, scroll_id: str = None) → dict

Scroll over the documents matching a query; this can retrieve all documents of the index(es). The first call, made without scroll_id, returns a scroll_id to pass to subsequent calls.

Warning: Don’t forget to clear the scroll_id after use

Parameters:

index – Index(es) to scroll on, e.g. “foo” or “foo,bar”
query – SEL query (string or object) to filter documents
cash_time – How long the scroll is kept cached between calls, e.g. “1m”
scroll_id – Scroll id to continue scrolling

Returns:

Dictionary with scroll_id and documents

> sel.scroll("foo", None, "1m")
{'scroll_id': 'cXVlc...', 'documents': [{...}, ...]}

> sel.scroll("foo", None, "1m", scroll_id="cXVlc...")
{'scroll_id': 'cXVlc...', 'documents': [{...}, ...]}

> sel.clear_scroll("cXVlc...")
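
The scroll, scroll, clear sequence above can be wrapped in a generator so the scroll is always cleared. This is a sketch under one assumption that the documentation does not confirm: an empty "documents" list is taken to mark the end of the scroll. `iter_documents` is a hypothetical helper and `sel` is assumed to be a ready SEL instance.

```python
def iter_documents(sel, index, query, cash_time="1m"):
    """Yield every document matched by `query`, then clear the scroll.

    Assumption: an empty "documents" list marks the end of the scroll.
    """
    scroll_id = None
    try:
        while True:
            page = sel.scroll(index, query, cash_time, scroll_id=scroll_id)
            scroll_id = page["scroll_id"]  # may change between calls
            if not page["documents"]:
                break
            yield from page["documents"]
    finally:
        # Runs on normal completion, on error, and when the generator is closed.
        if scroll_id is not None:
            sel.clear_scroll(scroll_id)
```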

search(index: str, query: dict, no_deleted: bool = True) → dict

Search with a SEL query

Parameters:

index – Index(es) to search on, e.g. “foo” or “foo,bar”
query – SEL query (string or object)
no_deleted – True to filter out deleted documents (if configured to), default: True

Returns:

Dictionary ‘results’ as ES results, ‘warns’ for query system warnings

> query = {"query": ".id = 1435886281564398679"}                       # Query String
> query = {"query": {"field": ".id", "value": "1435886281564398679"}}  # Query Object

> sel.search("foo", query)
{
   'results': {
      'took': 1,
      'timed_out': False,
      '_shards': {...},
      'hits': {
         'total': 1,
         'max_score': None,
         'hits': [{
            '_index': 'foo',
            '_type': 'document',
            '_id': '1435886281564398679',
            '_score': None,
            '_source': {...},
            'sort': []
         }]
      },
      'aggregations': {}
   },
   'warnings': []
}
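
Most callers only need the document sources and the warnings. A minimal sketch of unpacking a response shaped like the one above, using sample data (`unpack_search` is a hypothetical helper). The warnings key is described as ‘warns’ but appears as 'warnings' in the example output, so the sketch accepts either rather than guessing.

```python
def unpack_search(response):
    """Return (sources, warnings) from a search() style response."""
    hits = response["results"]["hits"]["hits"]
    sources = [hit["_source"] for hit in hits]
    # Accept both spellings of the warnings key.
    warnings = response.get("warns") or response.get("warnings") or []
    return sources, warnings

# Sample data only, trimmed to the keys the helper reads.
sample = {
    "results": {"hits": {"total": 1, "hits": [{"_id": "1", "_source": {"title": "x"}}]}},
    "warnings": [],
}
sources, warnings = unpack_search(sample)
```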

search_field(index: str, field_path: str) → List[dict]

Search for a field in an index

Parameters:

index – Index(es), e.g. “foo” or “foo,bar”
field_path – The field path to search for

Returns:

Potential fields

> sel.search_field("foo", "id")
[
   {
      'field': 'id',
      'element': {'type': 'string', 'index': 'not_analyzed'},
      'path': ['author', 'id'],
      'str_path': 'author.id',
      'pretty_str_path': '.author.id',
      'nested': None,
      'str_nested': None,
      'format': None,
      'score': 1.0,
      'short_path': ['author', 'id'],
      'str_short_path': '.author.id',
      'accept_function': ['exists']
   },
   ...
]
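
To turn the candidate list into a single field path, one option is to keep the highest scoring entry. This sketch assumes that a higher 'score' means a better match, which the documentation does not state; `best_field` is a hypothetical helper and the sample entries are illustrative.

```python
def best_field(candidates):
    """Return the pretty path of the highest scoring candidate, or None."""
    if not candidates:
        return None
    # Assumption: a higher score is a better match.
    return max(candidates, key=lambda c: c["score"])["pretty_str_path"]

# Sample data only, trimmed to the keys the helper reads.
sample = [
    {"pretty_str_path": ".media.id", "score": 0.5},
    {"pretty_str_path": ".author.id", "score": 1.0},
]
best = best_field(sample)
```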

subfields(index: str, fields_path: List[str], no_empty=True) → Generator[dict, None, None]

Get subfields of fields of an index

Parameters:

index – Index(es), e.g. “foo” or “foo,bar”
fields_path – The field paths to get subfields of
no_empty – Filter out empty subfields, default: True

Returns:

All subfields of each given field

> list(sel.subfields("foo", ["media.label"]))
[
   {
      'field': 'media.label',
      'subfields': ['attribute', 'color', 'model', 'style', 'texture', 'type']
   }
]