.. _search_dsl:

Search DSL
==========

The ``Search`` object
---------------------

The ``Search`` object represents the entire search request:

* queries
* filters
* aggregations
* k-nearest neighbor searches
* sort
* pagination
* highlighting
* suggestions
* collapsing
* additional parameters
* associated client

The API is designed to be chainable. With the exception of the aggregations
functionality, this means that the ``Search`` object is immutable: every change
to the object results in a shallow copy that contains the change. You can
therefore safely pass the ``Search`` object to foreign code without fear of it
modifying your objects, as long as that code sticks to the ``Search`` object
APIs.

You can pass an instance of the low-level ``elasticsearch`` client when
instantiating the ``Search`` object:

.. code:: python

    from elasticsearch import Elasticsearch
    from elasticsearch_dsl import Search

    # point the client at your cluster
    client = Elasticsearch("http://localhost:9200")

    s = Search(using=client)

You can also define the client at a later time (for more options see the
:ref:`configuration` chapter):

.. code:: python

    s = s.using(client)

.. note::

    All methods return a *copy* of the object, making it safe to pass to
    outside code.

The API is chainable, allowing you to combine multiple method calls in one
statement:

.. code:: python

    s = Search().using(client).query("match", title="python")

To send the request to Elasticsearch:

.. code:: python

    response = s.execute()

If you just want to iterate over the hits returned by your search, you can
iterate over the ``Search`` object:

.. code:: python

    for hit in s:
        print(hit.title)

Search results are cached. Subsequent calls to ``execute``, or iterating over
an already executed ``Search`` object, will not send additional requests to
Elasticsearch. To force a request, specify ``ignore_cache=True`` when calling
``execute``.

For debugging purposes you can serialize the ``Search`` object to a ``dict``
explicitly:

.. code:: python

    print(s.to_dict())

Delete By Query
~~~~~~~~~~~~~~~

You can delete the documents matching a search by calling ``delete`` on the
``Search`` object instead of ``execute``:

.. code:: python

    s = Search(index='i').query("match", title="python")
    response = s.delete()

Queries
~~~~~~~

The library provides classes for all Elasticsearch query types. Pass all the
parameters as keyword arguments. The classes accept any keyword arguments; the
DSL takes every argument passed to the constructor and serializes it as a
top-level key in the resulting dictionary (and thus in the resulting JSON sent
to Elasticsearch). This gives a clear one-to-one mapping between the raw query
and its equivalent in the DSL:

.. code:: python

    from elasticsearch_dsl.query import MultiMatch, Match

    # {"multi_match": {"query": "python django", "fields": ["title", "body"]}}
    MultiMatch(query='python django', fields=['title', 'body'])

    # {"match": {"title": {"query": "web framework", "operator": "and"}}}
    Match(title={"query": "web framework", "operator": "and"})

.. note::

    In some cases this approach is not possible due to Python's restrictions on
    identifiers, for example if your field is called ``@timestamp``. In that
    case you have to fall back to unpacking a dictionary:
    ``Range(**{'@timestamp': {'lt': 'now'}})``

You can use the ``Q`` shortcut to construct the instance using a name with
parameters or the raw ``dict``:

.. code:: python

    from elasticsearch_dsl import Q

    Q("multi_match", query='python django', fields=['title', 'body'])
    Q({"multi_match": {"query": "python django", "fields": ["title", "body"]}})

To add the query to the ``Search`` object, use the ``.query()`` method:

.. code:: python

    q = Q("multi_match", query='python django', fields=['title', 'body'])
    s = s.query(q)

The method also accepts the same parameters as the ``Q`` shortcut:

.. code:: python

    s = s.query("multi_match", query='python django', fields=['title', 'body'])

If you already have a query object, or a ``dict`` representing one, you can
simply override the query used in the ``Search`` object:

.. code:: python

    s.query = Q('bool', must=[Q('match', title='python'), Q('match', body='best')])

Dotted fields
^^^^^^^^^^^^^

Sometimes you want to refer to a field within another field, either as a
multi-field (``title.keyword``) or in a structured ``json`` document like
``address.city``. To make this easier, the ``Q`` shortcut (as well as the
``query``, ``filter``, and ``exclude`` methods on the ``Search`` class) allows
you to use ``__`` (double underscore) in place of a dot in a keyword argument:

.. code:: python

    s = Search()
    s = s.filter('term', category__keyword='Python')
    s = s.query('match', address__city='prague')

Alternatively, you can always fall back to Python's kwarg unpacking if you
prefer:

.. code:: python

    s = Search()
    s = s.filter('term', **{'category.keyword': 'Python'})
    s = s.query('match', **{'address.city': 'prague'})

Query combination
^^^^^^^^^^^^^^^^^

Query objects can be combined using logical operators:

.. code:: python

    Q("match", title='python') | Q("match", title='django')
    # {"bool": {"should": [...]}}

    Q("match", title='python') & Q("match", title='django')
    # {"bool": {"must": [...]}}

    ~Q("match", title="python")
    # {"bool": {"must_not": [...]}}

When you call the ``.query()`` method multiple times, the ``&`` operator is
used internally:

.. code:: python

    s = Search().query("match", title="python").query("match", title="django")
    print(s.to_dict())
    # {"query": {"bool": {"must": [...]}}}

If you want precise control over the query form, use the ``Q`` shortcut to
construct the combined query directly:

.. code:: python

    q = Q('bool',
        must=[Q('match', title='python')],
        should=[Q('match', title='django'), Q('match', body='framework')],
        minimum_should_match=1
    )
    s = Search().query(q)

Filters
~~~~~~~

If you want to add a query in a filter context, you can use the ``filter()``
method to make things easier:

.. code:: python

    s = Search()
    s = s.filter('terms', tags=['search', 'python'])

Behind the scenes this produces a ``Bool`` query and places the specified
``terms`` query into its ``filter`` branch, making it equivalent to:

.. code:: python

    s = Search()
    s = s.query('bool', filter=[Q('terms', tags=['search', 'python'])])

If you want to use the ``post_filter`` element for faceted navigation, use the
``.post_filter()`` method.

You can also ``exclude()`` items from your query like this:

.. code:: python

    s = Search()
    s = s.exclude('terms', tags=['search', 'python'])

which is shorthand for:
``s = s.query('bool', filter=[~Q('terms', tags=['search', 'python'])])``

Aggregations
~~~~~~~~~~~~

To define an aggregation, you can use the ``A`` shortcut:

.. code:: python

    from elasticsearch_dsl import A

    A('terms', field='tags')
    # {"terms": {"field": "tags"}}

To nest aggregations, you can use the ``.bucket()``, ``.metric()`` and
``.pipeline()`` methods:

.. code:: python

    a = A('terms', field='category')
    # {'terms': {'field': 'category'}}

    a.metric('clicks_per_category', 'sum', field='clicks')\
        .bucket('tags_per_category', 'terms', field='tags')
    # {
    #   'terms': {'field': 'category'},
    #   'aggs': {
    #     'clicks_per_category': {'sum': {'field': 'clicks'}},
    #     'tags_per_category': {'terms': {'field': 'tags'}}
    #   }
    # }

To add aggregations to the ``Search`` object, use the ``.aggs`` property, which
acts as a top-level aggregation:

.. code:: python

    s = Search()
    a = A('terms', field='category')
    s.aggs.bucket('category_terms', a)
    # {
    #   'aggs': {
    #     'category_terms': {
    #       'terms': {
    #         'field': 'category'
    #       }
    #     }
    #   }
    # }

or

.. code:: python

    s = Search()
    s.aggs.bucket('articles_per_day', 'date_histogram', field='publish_date', calendar_interval='day')\
        .metric('clicks_per_day', 'sum', field='clicks')\
        .pipeline('moving_click_average', 'moving_fn', buckets_path='clicks_per_day',
                  window=3, script='MovingFunctions.unweightedAvg(values)')\
        .bucket('tags_per_day', 'terms', field='tags')

    s.to_dict()
    # {
    #   "aggs": {
    #     "articles_per_day": {
    #       "date_histogram": { "calendar_interval": "day", "field": "publish_date" },
    #       "aggs": {
    #         "clicks_per_day": { "sum": { "field": "clicks" } },
    #         "moving_click_average": { "moving_fn": {
    #           "buckets_path": "clicks_per_day", "window": 3,
    #           "script": "MovingFunctions.unweightedAvg(values)" } },
    #         "tags_per_day": { "terms": { "field": "tags" } }
    #       }
    #     }
    #   }
    # }

You can access an existing bucket by its name:

.. code:: python

    s = Search()

    s.aggs.bucket('per_category', 'terms', field='category')
    s.aggs['per_category'].metric('clicks_per_category', 'sum', field='clicks')
    s.aggs['per_category'].bucket('tags_per_category', 'terms', field='tags')

.. note::

    When chaining multiple aggregations, ``.bucket()`` and ``.metric()``
    return different objects: ``.bucket()`` returns the newly defined bucket,
    while ``.metric()`` returns its parent bucket to allow further chaining.

Unlike the other methods on the ``Search`` object, defining aggregations is
done in-place (it does not return a copy).

K-Nearest Neighbor Searches
~~~~~~~~~~~~~~~~~~~~~~~~~~~

To issue a kNN search, use the ``.knn()`` method:

.. code:: python

    s = Search()
    # get_embedding() is a placeholder for your own text-to-vector function
    vector = get_embedding("search text")

    s = s.knn(
        field="embedding",
        k=5,
        num_candidates=10,
        query_vector=vector
    )

The ``field``, ``k`` and ``num_candidates`` arguments can be given as
positional or keyword arguments and are required. In addition, either
``query_vector`` or ``query_vector_builder`` must be given.

The ``.knn()`` method can be invoked multiple times to include multiple kNN
searches in the request.

Sorting
~~~~~~~

To specify the sort order, use the ``.sort()`` method:

.. code:: python

    s = Search().sort(
        'category',
        '-title',
        {"lines" : {"order" : "asc", "mode" : "avg"}}
    )

It accepts positional arguments, which can be either strings or dictionaries.
A string value is a field name, optionally prefixed by the ``-`` sign to
specify descending order.

To reset the sorting, call the method with no arguments:

.. code:: python

    s = s.sort()

Pagination
~~~~~~~~~~

To specify the from/size parameters, use the Python slicing API:

.. code:: python

    s = s[10:20]
    # {"from": 10, "size": 10}

    s = s[:20]
    # {"size": 20}

    s = s[10:]
    # {"from": 10}

    s = s[10:20][2:]
    # {"from": 12, "size": 8}

If you want to access all the documents matched by your query, you can use the
``scan`` method, which uses the scan/scroll Elasticsearch API:

.. code:: python

    for hit in s.scan():
        print(hit.title)

Note that in this case the results won't be sorted.

Highlighting
~~~~~~~~~~~~

To set common attributes for highlighting, use the ``highlight_options``
method:

.. code:: python

    s = s.highlight_options(order='score')

Enabling highlighting for individual fields is done using the ``highlight``
method:

.. code:: python

    s = s.highlight('title')
    # or, including parameters:
    s = s.highlight('title', fragment_size=50)

The fragments in the response will then be available on each ``Result`` object
as ``.meta.highlight.FIELD``, which contains the list of fragments:

.. code:: python

    response = s.execute()
    for hit in response:
        for fragment in hit.meta.highlight.title:
            print(fragment)

Suggestions
~~~~~~~~~~~

To specify a suggest request on your ``Search`` object, use the ``suggest``
method:

.. code:: python

    # check for correct spelling
    s = s.suggest('my_suggestion', 'pyhton', term={'field': 'title'})

The first argument is the name of the suggestion (the name under which it will
be returned), the second is the actual text you wish the suggester to work on,
and the keyword arguments are added to the suggest JSON as-is. This means the
keyword should be one of ``term``, ``phrase`` or ``completion`` to indicate
which type of suggester should be used.

Collapsing
~~~~~~~~~~

To collapse search results, use the ``collapse`` method on your ``Search``
object:

.. code:: python

    s = Search().query("match", message="GET /search")
    # collapse results by user_id
    s = s.collapse("user_id")

The top hits will only include one result per ``user_id``. You can also expand
each collapsed top hit with the ``inner_hits`` parameter;
``max_concurrent_group_searches`` is the number of concurrent requests allowed
to retrieve the inner hits per group:

.. code:: python

    inner_hits = {"name": "recent_search", "size": 5, "sort": [{"@timestamp": "desc"}]}
    s = s.collapse("user_id", inner_hits=inner_hits, max_concurrent_group_searches=4)

More Like This Query
~~~~~~~~~~~~~~~~~~~~

To use Elasticsearch's ``more_like_this`` functionality, you can use the
``MoreLikeThis`` query type. A simple example:

.. code:: python

    from elasticsearch_dsl.query import MoreLikeThis
    from elasticsearch_dsl import Search

    my_text = 'I want to find something similar'

    s = Search()
    # We're going to match based only on two fields, in this case text and title
    s = s.query(MoreLikeThis(like=my_text, fields=['text', 'title']))
    # You can also exclude fields from the result to make the response quicker in the normal way
    s = s.source(excludes=["text"])
    response = s.execute()

    for hit in response:
        print(hit.title)

Extra properties and parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To set extra properties of the search request, use the ``.extra()`` method.
This can be used to define keys in the body that cannot be defined via a
specific API method, like ``explain`` or ``search_after``:

.. code:: python

    s = s.extra(explain=True)

To set query parameters, use the ``.params()`` method:

.. code:: python

    s = s.params(routing="42")

If you need to limit the fields returned by Elasticsearch, use the
``source()`` method:

.. code:: python

    # only return the selected fields
    s = s.source(['title', 'body'])
    # don't return any fields, just the metadata
    s = s.source(False)
    # explicitly include/exclude fields
    s = s.source(includes=["title"], excludes=["user.*"])
    # reset the field selection
    s = s.source(None)

Serialization and Deserialization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The search object can be serialized into a dictionary by using the
``.to_dict()`` method.

You can also create a ``Search`` object from a ``dict`` using the ``from_dict``
class method. This creates a new ``Search`` object and populates it with the
data from the dict:

.. code:: python

    s = Search.from_dict({"query": {"match": {"title": "python"}}})

If you wish to modify an existing ``Search`` object, overriding its
properties, use the ``update_from_dict`` method instead, which alters the
instance **in-place**:

.. code:: python

    s = Search(index='i')
    s.update_from_dict({"query": {"match": {"title": "python"}}, "size": 42})

Response
--------

You can execute your search by calling the ``.execute()`` method, which
returns a ``Response`` object. The ``Response`` object lets you access any key
from the response dictionary via attribute access. It also provides some
convenient helpers:

.. code:: python

    response = s.execute()

    print(response.success())
    # True

    print(response.took)
    # 12

    print(response.hits.total.relation)
    # eq
    print(response.hits.total.value)
    # 142

    print(response.suggest.my_suggestion)

If you want to inspect the contents of the ``response`` object, use its
``to_dict`` method to get access to the raw data for pretty printing.
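Because ``Response`` wraps the raw response body, you can also build one from a
plain ``dict`` to try attribute access and the ``to_dict`` round trip without a
running cluster. This is a minimal sketch under two assumptions. First, the
response body below is a made-up, hypothetical example. Second,
``elasticsearch_dsl.response.Response`` is constructed directly from a
``Search`` and that dict. This mirrors what ``execute()`` does internally, but
it is not how you would normally obtain a response.

.. code:: python

    import json

    from elasticsearch_dsl import Search
    from elasticsearch_dsl.response import Response

    # Hypothetical raw body shaped like an Elasticsearch search response.
    raw = {
        "took": 12,
        "timed_out": False,
        "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
        "hits": {
            "total": {"value": 1, "relation": "eq"},
            "max_score": 1.0,
            "hits": [
                {"_index": "blogs", "_id": "1", "_score": 1.0,
                 "_source": {"title": "Python tips"}},
            ],
        },
    }

    # Assumption: wrap the dict directly, as execute() does internally.
    response = Response(Search(index="blogs"), raw)

    print(response.success())            # True: all shards succeeded, no timeout
    print(response.took)                 # 12
    print(response.hits.total.relation)  # eq
    print(response.hits[0].title)        # Python tips
    print(response.hits[0].meta.id)      # 1

    # to_dict() hands back the raw data, handy for pretty printing
    print(json.dumps(response.to_dict(), indent=2))

In real code, ``response`` comes from ``s.execute()``, and the same attribute
access and ``to_dict`` calls apply.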
Hits
~~~~

To access the hits returned by the search, use the ``hits`` property or just
iterate over the ``Response`` object:

.. code:: python

    response = s.execute()
    print('Total %d hits found.' % response.hits.total.value)
    for h in response:
        print(h.title, h.body)

.. note::

    By default Elasticsearch counts hits accurately only up to 10,000, in which
    case ``response.hits.total.relation`` is ``gte``. If you need an exact
    count, use ``s.extra(track_total_hits=True)``. This only affects the
    reported total; the number of hits actually returned is controlled by
    pagination (10 by default).

Result
~~~~~~

The individual hits are wrapped in a convenience class that allows attribute
access to the keys in the returned dictionary. All the metadata for the results
is accessible via ``meta`` (without the leading ``_``):

.. code:: python

    response = s.execute()
    h = response.hits[0]
    print('/%s/%s returned with score %f' % (
        h.meta.index, h.meta.id, h.meta.score))

.. note::

    If your document has a field called ``meta``, you have to access it using
    the get item syntax: ``hit['meta']``.

Aggregations
~~~~~~~~~~~~

Aggregations are available through the ``aggregations`` property:

.. code:: python

    # assumes the search defined a 'per_tag' terms aggregation
    # with a 'max_lines' max metric nested under it
    for tag in response.aggregations.per_tag.buckets:
        print(tag.key, tag.max_lines.value)

``MultiSearch``
---------------

If you need to execute multiple searches at the same time, you can use the
``MultiSearch`` class, which uses the ``_msearch`` API:

.. code:: python

    from elasticsearch_dsl import MultiSearch, Search

    ms = MultiSearch(index='blogs')

    ms = ms.add(Search().filter('term', tags='python'))
    ms = ms.add(Search().filter('term', tags='elasticsearch'))

    responses = ms.execute()

    for response in responses:
        print("Results for query %r." % response._search.query)
        for hit in response:
            print(hit.title)

``EmptySearch``
---------------

The ``EmptySearch`` class can be used as a fully compatible version of
``Search`` that returns no results, regardless of any queries configured.
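A typical use is returning an ``EmptySearch`` when there is nothing to search
for, so that calling code can keep chaining methods and iterating over results
without a round trip to the cluster. The sketch below assumes a hypothetical
``build_search`` helper and a hypothetical ``blogs`` index. It imports
``EmptySearch`` from ``elasticsearch_dsl.search``, the module that defines
``Search``.

.. code:: python

    from elasticsearch_dsl import Search
    from elasticsearch_dsl.search import EmptySearch


    def build_search(terms):
        """Hypothetical helper: search the ``title`` field for ``terms``.

        With no terms, return an EmptySearch so callers can still chain
        methods and iterate over results without contacting Elasticsearch.
        """
        if not terms:
            return EmptySearch(index="blogs")
        return Search(index="blogs").query("match", title=terms)


    # Callers treat both kinds of objects the same way.
    empty = build_search("").filter("term", tags="python")
    print(empty.count())              # 0, no request is sent
    print(len(empty.execute().hits))  # 0

    print(build_search("python").to_dict())
    # {'query': {'match': {'title': 'python'}}}

Because ``EmptySearch`` subclasses ``Search``, chained calls such as
``filter()`` still return an ``EmptySearch``. No client has to be configured
for the empty branch.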