sel package
sel.sel module
- class sel.sel.SEL(elastic: elasticsearch.client.Elasticsearch, conf: configparser.ConfigParser = <configparser.ConfigParser object>, log_level=20)
Bases: object
Simple Elastic Language (SEL) makes Elasticsearch queries easier
- Parameters:
elastic – Elasticsearch connection
conf – Configuration of the query system, default: conf.ini
log_level – Log level to use, default: logging.INFO
- clear_scroll(scroll_id: str) -> None
Clear a scroll before its cache time (cash_time) expires, to free Elasticsearch memory
- Parameters:
scroll_id – Scroll id to clear
- Returns:
None
> sel.clear_scroll("cXVlc...")
- delete_documents(index: str, query: dict, undelete: bool = False, deleted_info: Any = None) -> dict
Delete documents from index(es) based on a SEL query
- Parameters:
index – Index(es) to delete documents from, e.g. “foo” or “foo,bar”
query – SEL query (string or object) matching the documents to delete. Can also contain “ids” to simplify the query
undelete – True to unflag documents instead, default: False
deleted_info – Any information you want stored in the deleted documents
- Returns:
Dictionary action_id, count
> query = {"query": ".id = 1435886281564398679"}  # Query String
> query = {"query": {"field": ".id", "value": "1435886281564398679"}}  # Query Object
> query = {"ids": ["1435886281564398679"]}  # Ids format
> sel.delete_documents("foo", query)
{'action': 'delete', 'count': 1}
- download_aggreg(index: str, base_aggreg: dict, query: dict) -> Generator[dict, None, None]
Return all buckets of one aggregation
Order is not guaranteed
Keys can be returned multiple times; sum their doc_count values to get the total
It partitions the index(es) with base_aggreg to process the query aggregation.
- Parameters:
index – Index(es) to process the aggregation on, e.g. “foo” or “foo,bar”
base_aggreg – Base SEL aggregation used to partition the index(es)
query – SEL query (string or object) with one aggregation
- Returns:
Generator of buckets
> base_aggreg = {"field": "date", "interval": "week"}
> query = {"aggregations": {"my_aggreg": {"field": ".id"}}}
> list(sel.download_aggreg("foo", base_aggreg, query))
[{'key': '1446587002614128796', 'doc_count': 1}, ...]
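Since the same key can come back in several buckets, the caller has to add up the counts. A minimal sketch, assuming buckets have the `key` / `doc_count` layout shown above; `merge_buckets` is a hypothetical helper (not part of SEL) and the sample buckets are made up for illustration:

```python
from collections import Counter
from typing import Dict, Iterable


def merge_buckets(buckets: Iterable[dict]) -> Dict[str, int]:
    """Sum doc_count per key across buckets that may repeat a key."""
    totals: Counter = Counter()
    for bucket in buckets:
        totals[bucket["key"]] += bucket["doc_count"]
    return dict(totals)


# Made-up buckets standing in for sel.download_aggreg("foo", base_aggreg, query)
sample_buckets = [
    {"key": "a", "doc_count": 1},
    {"key": "b", "doc_count": 2},
    {"key": "a", "doc_count": 3},
]
totals = merge_buckets(sample_buckets)
```

With a real instance, the generator returned by `sel.download_aggreg(...)` could be passed to `merge_buckets` directly, so the buckets never need to be held in memory all at once.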
- generate_query(query: dict, schema: dict = None, index: str = None, no_deleted: bool = True) -> dict
Generate Elasticsearch query from SEL query
- Parameters:
query – SEL query (string or object)
schema – Schema to use; fetched automatically if not given (pass it to avoid multiple requests)
index – Index(es), can be None if schema is given, otherwise e.g. “foo” or “foo,bar”
no_deleted – True to filter out deleted documents (if configured to), default: True
- Returns:
Dictionary with warns, elastic_query, internal_query, query_data
> query = {"query": ".id = 93428yr9"}  # Query String
> query = {"query": {"field": ".id", "value": "93428yr9"}}  # Query Object
> sel.generate_query(query, index="foo", no_deleted=False)
{
    'warns': [],
    'elastic_query': {
        'query': {'term': {'id': '93428yr9'}},
        'sort': [{'id': {'order': 'desc', 'mode': 'avg'}}]
    },
    'internal_query': {'query': {'field': '.id', 'value': '93428yr9'}},
    'query_data': {}
}
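The schema parameter exists to avoid repeated schema requests when several queries target the same index. A minimal sketch of that pattern, assuming the value returned by get_schema is what generate_query accepts as schema (the documentation above suggests this but does not state it outright). `generate_queries` and `FakeSchemaSEL` are hypothetical names; the fake only stands in for a real SEL instance so the sketch runs without Elasticsearch:

```python
from typing import Iterable, List


def generate_queries(sel, index: str, queries: Iterable[dict]) -> List[dict]:
    """Generate several Elasticsearch queries while fetching the schema once."""
    schema = sel.get_schema(index)  # one schema request instead of one per query
    return [sel.generate_query(query, schema=schema) for query in queries]


class FakeSchemaSEL:
    """Stand-in for a SEL instance (assumption: same method names and arguments)."""

    def __init__(self):
        self.schema_requests = 0

    def get_schema(self, index):
        self.schema_requests += 1
        return {"id": {"type": "string"}}  # made-up schema

    def generate_query(self, query, schema=None, index=None, no_deleted=True):
        return {"warns": [], "internal_query": query}


fake_sel = FakeSchemaSEL()
generated = generate_queries(fake_sel, "foo", [{"query": ".id = 1"}, {"query": ".id = 2"}])
```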
- get_one_document(index: str, doc_id: str) -> dict
Get one document from an index
- Parameters:
index – Index(es) to get the document from, e.g. “foo” or “foo,bar”
doc_id – The document id
- Returns:
The whole document
> sel.get_one_document("foo", "1435886281564398679")
{
    '_index': 'test_index',
    '_type': 'document',
    '_id': '1435886281564398679',
    '_score': None,
    '_source': {...},
    'sort': []
}
- get_schema(index: str) -> dict
Get the most recent schema of the given index(es)
- Parameters:
index – Index(es) to get schema(s) from, e.g. “foo” or “foo,bar”
- Returns:
Most recent mapping
> sel.get_schema("foo")
{mapping ... }
- list_fields(index: str) -> List[dict]
List all fields of an index
- Parameters:
index – Index(es), eg. “foo” or “foo,bar”
- Returns:
All found fields’ information
> sel.list_fields("foo")
[
    {
        'field': 'author',
        'element': {
            'type': 'object',
            'properties': {
                'follower': {'type': 'integer'},
                'id': {'type': 'string', 'index': 'not_analyzed'},
                'name': {'type': 'string', 'index': 'not_analyzed'}
            }
        },
        'path': ['author'],
        'str_path': 'author',
        'pretty_str_path': '.author',
        'nested': None,
        'str_nested': None,
        'format': None
    },
    ...
]
- list_index(index: str = None) -> List[dict]
List Elasticsearch indexes
- Parameters:
index – Optional Index(es) to limit, eg. “foo_*” or “foo_*,bar_*”
- Returns:
List of indexes with metadata (if set at creation)
> sel.list_index()
[
    {'index': 'myindex', 'meta': None, 'creation_date': datetime.datetime(2023, 9, 13, 13, 26, 42, 251000)},
    ...
]
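Each entry carries a creation_date as a datetime, so the list can be sorted or filtered with the standard library. A small sketch that picks the most recently created index; `newest_index` is a hypothetical helper and the sample entries are made up, following the layout shown above:

```python
from datetime import datetime


def newest_index(indexes):
    """Return the name of the most recently created index, or None if empty."""
    if not indexes:
        return None
    return max(indexes, key=lambda entry: entry["creation_date"])["index"]


# Made-up entries standing in for sel.list_index("foo_*")
sample_indexes = [
    {"index": "foo_2022", "meta": None, "creation_date": datetime(2022, 1, 5, 8, 0, 0)},
    {"index": "foo_2023", "meta": None, "creation_date": datetime(2023, 9, 13, 13, 26, 42, 251000)},
]
latest = newest_index(sample_indexes)
```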
- really_delete_documents(index: str, query: dict) -> int
Really delete documents (not just flag them) from a SEL query
- Parameters:
index – Index(es) to delete documents from, e.g. “foo” or “foo,bar”
query – SEL query (string or object) matching the documents to delete. Can also contain “ids” to simplify the query
- Returns:
Number of deleted documents
> query = {"query": ".id = 1435886281564398679"}  # Query String
> query = {"query": {"field": ".id", "value": "1435886281564398679"}}  # Query Object
> query = {"ids": ["1435886281564398679"]}  # Ids format
> sel.really_delete_documents("foo", query)
1
> sel.really_delete_documents("foo", query)
0
- scroll(index: str, query: dict, cash_time: str, scroll_id: str = None) -> dict
Scroll over the documents matching a query; this can retrieve all documents of the index(es). The first call, made without scroll_id, returns a scroll_id to use for the following requests.
Warning: Don’t forget to clear the scroll_id after use
- Parameters:
index – Index(es) to scroll on, e.g. “foo” or “foo,bar”
query – SEL query (string or object) to filter documents
cash_time – How long the scroll cache is kept between calls, e.g. “1m”
scroll_id – Scroll id to continue scrolling
- Returns:
Dictionary with scroll_id and documents
> sel.scroll("foo", None, "1m")
{'scroll_id': 'cXVlc...', 'documents': [{...}, ...]}
> sel.scroll("foo", None, "1m", scroll_id="cXVlc...")
{'scroll_id': 'cXVlc...', 'documents': [{...}, ...]}
> sel.clear_scroll("cXVlc...")
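The scroll / clear_scroll pair can be wrapped in a generator so the scroll is cleared even if iteration stops early. A minimal sketch, assuming an empty documents list marks the end of the scroll (the documentation above does not say how the end is signalled). `iter_scroll` and `FakeScrollSEL` are hypothetical names; the fake only stands in for a real SEL instance so the sketch runs without Elasticsearch:

```python
from typing import Iterator, Optional


def iter_scroll(sel, index: str, query: Optional[dict], cash_time: str = "1m") -> Iterator[dict]:
    """Yield every document returned by sel.scroll, then clear the scroll."""
    scroll_id = None
    try:
        while True:
            page = sel.scroll(index, query, cash_time, scroll_id=scroll_id)
            scroll_id = page["scroll_id"]
            if not page["documents"]:  # assumption: an empty page means the end
                break
            yield from page["documents"]
    finally:
        # Runs on normal exhaustion, on errors, and when the generator is closed
        if scroll_id is not None:
            sel.clear_scroll(scroll_id)


class FakeScrollSEL:
    """Stand-in for a SEL instance (assumption: same method names and arguments)."""

    def __init__(self, pages):
        self.pages = list(pages)
        self.cleared = []

    def scroll(self, index, query, cash_time, scroll_id=None):
        documents = self.pages.pop(0) if self.pages else []
        return {"scroll_id": "fake-scroll-id", "documents": documents}

    def clear_scroll(self, scroll_id):
        self.cleared.append(scroll_id)


fake_scroll_sel = FakeScrollSEL([[{"id": 1}, {"id": 2}], [{"id": 3}]])
documents = list(iter_scroll(fake_scroll_sel, "foo", None))
```

Putting the cleanup in `finally` keeps the warning above from depending on the caller remembering it.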
- search(index: str, query: dict, no_deleted: bool = True) -> dict
Search with SEL query
- Parameters:
index – Index(es) to search on, e.g. “foo” or “foo,bar”
query – SEL query (string or object)
no_deleted – True to filter out deleted documents (if configured to), default: True
- Returns:
Dictionary with ‘results’ (the Elasticsearch results) and ‘warns’ (query system warnings)
> query = {"query": ".id = 1435886281564398679"}  # Query String
> query = {"query": {"field": ".id", "value": "1435886281564398679"}}  # Query Object
> sel.search("foo", query)
{
    'results': {
        'took': 1,
        'timed_out': False,
        '_shards': {...},
        'hits': {
            'total': 1,
            'max_score': None,
            'hits': [{
                '_index': 'foo',
                '_type': 'document',
                '_id': '1435886281564398679',
                '_score': None,
                '_source': {...},
                'sort': []
            }]
        },
        'aggregations': {}
    },
    'warnings': []
}
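The documents themselves sit three levels deep in the response. A short sketch for pulling them out, assuming the response follows the layout shown above; `hit_sources` is a hypothetical helper and the sample response is made up:

```python
from typing import List


def hit_sources(response: dict) -> List[dict]:
    """Return the _source of every hit in a search response."""
    return [hit["_source"] for hit in response["results"]["hits"]["hits"]]


# Made-up response standing in for sel.search("foo", query)
sample_response = {
    "results": {
        "hits": {
            "total": 1,
            "hits": [{"_id": "1435886281564398679", "_source": {"title": "example"}}],
        }
    },
    "warnings": [],
}
sources = hit_sources(sample_response)
```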
- search_field(index: str, field_path: str) -> List[dict]
Search for a field in an index
- Parameters:
index – Index(es), eg. “foo” or “foo,bar”
field_path – The field path to search
- Returns:
Potential fields
> sel.search_field("foo", "id")
[
    {
        'field': 'id',
        'element': {'type': 'string', 'index': 'not_analyzed'},
        'path': ['author', 'id'],
        'str_path': 'author.id',
        'pretty_str_path': '.author.id',
        'nested': None,
        'str_nested': None,
        'format': None,
        'score': 1.0,
        'short_path': ['author', 'id'],
        'str_short_path': '.author.id',
        'accept_function': ['exists']
    },
    ...
]
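When several candidate fields come back, one way to choose is by the score key. A small sketch, assuming a higher score means a closer match (the documentation above does not define the score); `best_field` is a hypothetical helper and the sample candidates are made up:

```python
from typing import List, Optional


def best_field(candidates: List[dict]) -> Optional[str]:
    """Return the pretty path of the highest-scoring candidate, or None."""
    if not candidates:
        return None
    # Assumption: a higher 'score' means a better match
    return max(candidates, key=lambda candidate: candidate["score"])["pretty_str_path"]


# Made-up candidates standing in for sel.search_field("foo", "id")
sample_candidates = [
    {"pretty_str_path": ".id", "score": 0.5},
    {"pretty_str_path": ".author.id", "score": 1.0},
]
chosen = best_field(sample_candidates)
```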
- subfields(index: str, fields_path: List[str], no_empty=True) -> Generator[dict, None, None]
Get subfields of fields of an index
- Parameters:
index – Index(es), eg. “foo” or “foo,bar”
fields_path – The fields path to get subfields
no_empty – Filter out empty subfields, default: True
- Returns:
All subfields of each given field
> list(sel.subfields("foo", "media.label"))
[
    {
        'field': 'media.label',
        'subfields': ['attribute', 'color', 'model', 'style', 'texture', 'type']
    }
]