Search DSL¶
The Search object¶
The Search object represents the entire search request:
- queries
- filters
- aggregations
- sort
- pagination
- additional parameters
- associated client
The API is designed to be chainable. With the exception of the
aggregations functionality, this means that the Search object is immutable:
all changes to the object will result in a copy being created which contains
the changes. This means you can safely pass the Search object to foreign
code without fear of it modifying your objects.
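The copy-on-change behaviour described above can be illustrated with a minimal sketch in plain Python. The MiniSearch class and its methods are hypothetical stand-ins written for this illustration; they are not the library's implementation and only mimic the idea that every call returns a modified copy:

```python
import copy


class MiniSearch:
    """Hypothetical stand-in for a chainable, copy-on-change request builder."""

    def __init__(self):
        self._body = {}

    def _clone(self):
        # Deep-copy the state so the new object shares nothing with the old one.
        new = MiniSearch()
        new._body = copy.deepcopy(self._body)
        return new

    def query(self, name, **params):
        new = self._clone()
        new._body["query"] = {name: params}
        return new

    def extra(self, **kwargs):
        new = self._clone()
        new._body.update(kwargs)
        return new

    def to_dict(self):
        return copy.deepcopy(self._body)


base = MiniSearch()
derived = base.query("match", title="python").extra(explain=True)

print(base.to_dict())     # the original object is untouched
print(derived.to_dict())  # the copy carries both changes
```

Because each method works on a clone, code that receives base cannot alter it by chaining further calls.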
You can pass an instance of the low-level elasticsearch client when
instantiating the Search object:
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search
client = Elasticsearch()
s = Search(using=client)
You can also define the client at a later time (for more options see the connections chapter):
s = s.using(client)
Note
All methods return a copy of the object, making it safe to pass to outside code.
The API is chainable, allowing you to combine multiple method calls in one statement:
s = Search().using(client).query("match", title="python")
To send the request to Elasticsearch:
response = s.execute()
If you just want to iterate over the hits returned by your search you can
iterate over the Search object:
for hit in s:
    print(hit.title)
Search results will be cached. Subsequent calls to execute or trying to
iterate over an already executed Search object will not trigger additional
requests being sent to Elasticsearch. To force a request specify
ignore_cache=True when calling execute.
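The caching rule above can be sketched as follows. CachedRequest and fake_send are hypothetical names invented for this illustration; the sketch only models the described behaviour (reuse the stored response unless ignore_cache=True is passed) and makes no claim about how the library stores its results:

```python
class CachedRequest:
    """Hypothetical sketch of execute() caching; not the library's code."""

    def __init__(self, send):
        self._send = send        # callable that performs the real request
        self._response = None    # cached response, if any

    def execute(self, ignore_cache=False):
        # Send a request only on the first call or when explicitly forced.
        if ignore_cache or self._response is None:
            self._response = self._send()
        return self._response


calls = []


def fake_send():
    # Stand-in for a network round trip; records each invocation.
    calls.append(1)
    return {"hits": []}


req = CachedRequest(fake_send)
req.execute()
req.execute()                   # served from the cache
req.execute(ignore_cache=True)  # forces a second request
print(len(calls))
```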
For debugging purposes you can serialize the Search object to a dict
explicitly:
print(s.to_dict())
Delete By Query¶
You can delete the documents matching a search by calling delete on the
Search object instead of execute like this:
s = Search().query("match", title="python")
response = s.delete()
Queries¶
The library provides classes for all Elasticsearch query types. Pass all the parameters as keyword arguments. The classes accept any keyword arguments; the DSL then takes all arguments passed to the constructor and serializes them as top-level keys in the resulting dictionary (and thus in the resulting JSON sent to Elasticsearch). This means that there is a clear one-to-one mapping between the raw query and its equivalent in the DSL:
from elasticsearch_dsl.query import MultiMatch, Match
# {"multi_match": {"query": "python django", "fields": ["title", "body"]}}
MultiMatch(query='python django', fields=['title', 'body'])
# {"match": {"title": {"query": "web framework", "type": "phrase"}}}
Match(title={"query": "web framework", "type": "phrase"})
Note
In some cases this approach is not possible due to Python’s restriction on
identifiers - for example if your field is called @timestamp. In that
case you have to fall back to unpacking a dictionary:
Range(**{'@timestamp': {'lt': 'now'}})
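The keyword-to-key mapping and the dictionary-unpacking fallback can be tried out without the library. The build_query helper below is hypothetical and exists only for this illustration; it shows why a field name such as @timestamp has to be passed through ** unpacking:

```python
def build_query(name, **params):
    """Hypothetical helper: wrap keyword arguments under the query name."""
    return {name: params}


plain = build_query("match", title="python")

# "@timestamp" is not a valid Python identifier, so it cannot be written as
# a keyword argument directly; unpacking a dict sidesteps the restriction.
ranged = build_query("range", **{"@timestamp": {"lt": "now"}})

print(plain)
print(ranged)
```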
You can use the Q shortcut to construct the instance using a name with
parameters or the raw dict:
from elasticsearch_dsl import Q
Q("multi_match", query='python django', fields=['title', 'body'])
Q({"multi_match": {"query": "python django", "fields": ["title", "body"]}})
To add the query to the Search object, use the .query() method:
q = Q("multi_match", query='python django', fields=['title', 'body'])
s = s.query(q)
The method also accepts the same parameters as the Q shortcut:
s = s.query("multi_match", query='python django', fields=['title', 'body'])
If you already have a query object, or a dict representing one, you can
just override the query used in the Search object:
s.query = Q('bool', must=[Q('match', title='python'), Q('match', body='best')])
Query combination¶
Query objects can be combined using logical operators:
Q("match", title='python') | Q("match", title='django')
# {"bool": {"should": [...]}}
Q("match", title='python') & Q("match", title='django')
# {"bool": {"must": [...]}}
~Q("match", title="python")
# {"bool": {"must_not": [...]}}
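How the three operators could map onto bool queries is sketched below in plain Python. MiniQuery is a hypothetical class made up for this illustration; the real library may merge or simplify nested bool queries, which this sketch does not attempt:

```python
class MiniQuery:
    """Hypothetical query wrapper showing how operators could build bool queries."""

    def __init__(self, body):
        self.body = body

    def __or__(self, other):
        # a | b: at least one of the queries should match
        return MiniQuery({"bool": {"should": [self.body, other.body]}})

    def __and__(self, other):
        # a & b: both queries must match
        return MiniQuery({"bool": {"must": [self.body, other.body]}})

    def __invert__(self):
        # ~a: the query must not match
        return MiniQuery({"bool": {"must_not": [self.body]}})


python = MiniQuery({"match": {"title": "python"}})
django = MiniQuery({"match": {"title": "django"}})

either = python | django
both = python & django
negated = ~python

print(either.body)
print(both.body)
print(negated.body)
```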
When you call the .query() method multiple times, the & operator will
be used internally:
s = s.query().query()
print(s.to_dict())
# {"query": {"bool": {...}}}
If you want to have precise control over the query form, use the Q shortcut
to directly construct the combined query:
q = Q('bool',
    must=[Q('match', title='python')],
    should=[Q(...), Q(...)],
    minimum_should_match=1
)
s = Search().query(q)
Filters¶
If you want to add a query in a filter context you can use the filter()
method to make things easier:
s = Search()
s = s.filter('terms', tags=['search', 'python'])
Behind the scenes this will produce a Bool query and place the specified
terms query into its filter branch, making it equivalent to:
s = Search()
s = s.query('bool', filter=[Q('terms', tags=['search', 'python'])])
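The effect on the request body can be sketched with plain dictionaries. The add_filter helper is hypothetical and only approximates the described behaviour: it wraps whatever query is present in a bool query and appends the clause to its filter branch. The range clause on a lines field is an invented second filter for the example:

```python
def add_filter(body, clause):
    """Hypothetical helper: return a new body with clause in the bool filter branch."""
    query = body.get("query")
    if query is not None and set(query) == {"bool"}:
        bool_query = dict(query["bool"])   # reuse the existing bool query
    elif query is not None:
        bool_query = {"must": [query]}     # keep the existing query as a must clause
    else:
        bool_query = {}
    bool_query["filter"] = list(bool_query.get("filter", [])) + [clause]
    return {**body, "query": {"bool": bool_query}}


body = add_filter({}, {"terms": {"tags": ["search", "python"]}})
body = add_filter(body, {"range": {"lines": {"gte": 10}}})  # invented second filter
print(body)

scored = add_filter({"query": {"match": {"title": "python"}}},
                    {"terms": {"tags": ["search"]}})
print(scored)
```

Clauses in the filter branch restrict the result set without contributing to the score, which is why the match query in the second example stays under must.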
If you want to use the post_filter element for faceted navigation, use the
.post_filter() method.
You can also exclude() items from your query like this:
s = Search()
s = s.exclude('terms', tags=['search', 'python'])
which is shorthand for: s = s.query('bool', filter=[~Q('terms', tags=['search', 'python'])])
Aggregations¶
To define an aggregation, you can use the A shortcut:
A('terms', field='tags')
# {"terms": {"field": "tags"}}
To nest aggregations, you can use the .bucket(), .metric() and
.pipeline() methods:
a = A('terms', field='category')
# {'terms': {'field': 'category'}}
a.metric('clicks_per_category', 'sum', field='clicks')\
    .bucket('tags_per_category', 'terms', field='tags')
# {
# 'terms': {'field': 'category'},
# 'aggs': {
# 'clicks_per_category': {'sum': {'field': 'clicks'}},
# 'tags_per_category': {'terms': {'field': 'tags'}}
# }
# }
To add aggregations to the Search object, use the .aggs property, which
acts as a top-level aggregation:
s = Search()
a = A('terms', field='category')
s.aggs.bucket('category_terms', a)
# {
# 'aggs': {
# 'category_terms': {
# 'terms': {
# 'field': 'category'
# }
# }
# }
# }
or
s = Search()
s.aggs.bucket('articles_per_day', 'date_histogram', field='publish_date', interval='day')\
    .metric('clicks_per_day', 'sum', field='clicks')\
    .pipeline('moving_click_average', 'moving_avg', buckets_path='clicks_per_day')\
    .bucket('tags_per_day', 'terms', field='tags')
s.to_dict()
# {
# "aggs": {
# "articles_per_day": {
# "date_histogram": { "interval": "day", "field": "publish_date" },
# "aggs": {
# "clicks_per_day": { "sum": { "field": "clicks" } },
# "moving_click_average": { "moving_avg": { "buckets_path": "clicks_per_day" } },
# "tags_per_day": { "terms": { "field": "tags" } }
# }
# }
# }
# }
You can access an existing bucket by its name:
s = Search()
s.aggs.bucket('per_category', 'terms', field='category')
s.aggs['per_category'].metric('clicks_per_category', 'sum', field='clicks')
s.aggs['per_category'].bucket('tags_per_category', 'terms', field='tags')
Note
When chaining multiple aggregations, there is a difference between what
the .bucket() and .metric() methods return - .bucket() returns the
newly defined bucket while .metric() returns its parent bucket to allow
further chaining.
As opposed to other methods on the Search objects, defining aggregations is
done in-place (does not return a copy).
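The difference in return values, and the in-place modification, can be seen in a small plain-Python model. MiniAgg is a hypothetical class written for this illustration; it reproduces only the chaining rules stated in the note, not the library's aggregation classes:

```python
class MiniAgg:
    """Hypothetical aggregation node; only models the chaining behaviour."""

    def __init__(self, agg_type=None, **params):
        self.agg_type = agg_type
        self.params = params
        self.children = {}

    def bucket(self, name, agg_type, **params):
        child = MiniAgg(agg_type, **params)
        self.children[name] = child
        return child   # chaining continues inside the new bucket

    def metric(self, name, agg_type, **params):
        self.children[name] = MiniAgg(agg_type, **params)
        return self    # chaining continues on the parent bucket

    def to_dict(self):
        d = {self.agg_type: dict(self.params)} if self.agg_type else {}
        if self.children:
            d["aggs"] = {n: c.to_dict() for n, c in self.children.items()}
        return d


root = MiniAgg()  # plays the role of the top-level .aggs container
root.bucket("per_category", "terms", field="category") \
    .metric("clicks_per_category", "sum", field="clicks") \
    .bucket("tags_per_category", "terms", field="tags")

# root was modified in place; no reassignment was needed.
print(root.to_dict())
```

Both the metric and the second bucket end up nested under per_category, because .metric() handed the chain back to that bucket.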
Sorting¶
To specify sorting order, use the .sort() method:
s = Search().sort(
    'category',
    '-title',
    {"lines" : {"order" : "asc", "mode" : "avg"}}
)
It accepts positional arguments which can be either strings or dictionaries.
A string value is a field name, optionally prefixed by the - sign to specify
a descending order.
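One way to picture the handling of the - prefix is the sketch below. The normalize_sort function is a hypothetical helper for illustration only; it expands prefixed strings into explicit descending clauses and passes everything else through unchanged:

```python
def normalize_sort(*keys):
    """Hypothetical helper: expand '-field' strings into descending sort clauses."""
    result = []
    for key in keys:
        if isinstance(key, str) and key.startswith("-"):
            # "-title" becomes an explicit descending clause on "title"
            result.append({key[1:]: {"order": "desc"}})
        else:
            # plain field names and ready-made dicts are kept as they are
            result.append(key)
    return result


sort = normalize_sort(
    "category",
    "-title",
    {"lines": {"order": "asc", "mode": "avg"}},
)
print(sort)
```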
To reset the sorting, just call the method with no arguments:
s = s.sort()
Pagination¶
To specify the from/size parameters, use the Python slicing API:
s = s[10:20]
# {"from": 10, "size": 10}
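The translation from slice syntax to from/size values can be sketched with a small class that implements __getitem__. Pager is a hypothetical name used only here, and the handling of steps and negative indices is an assumption of this sketch rather than a statement about the library:

```python
class Pager:
    """Hypothetical object that turns slice syntax into from/size values."""

    def __getitem__(self, key):
        if isinstance(key, slice):
            if key.step not in (None, 1):
                raise ValueError("this sketch does not support a slice step")
            start = key.start or 0
            stop = key.stop
            if start < 0 or (stop is not None and stop < 0):
                raise ValueError("this sketch does not support negative indices")
            params = {"from": start}
            if stop is not None:
                # size is the number of hits between start and stop
                params["size"] = max(0, stop - start)
            return params
        if key < 0:
            raise ValueError("this sketch does not support negative indices")
        # a single index selects exactly one hit
        return {"from": key, "size": 1}


page = Pager()[10:20]
print(page)
```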
If you want to access all the documents matched by your query you can use the
scan method which uses the scan/scroll elasticsearch API:
for hit in s.scan():
    print(hit.title)
Note that in this case the results won’t be sorted.
Highlighting¶
To set common attributes for highlighting use the highlight_options method:
s = s.highlight_options(order='score')
Enabling highlighting for individual fields is done using the highlight
method:
s = s.highlight('title')
# or, including parameters:
s = s.highlight('title', fragment_size=50)
The fragments in the response will then be available on each Result object
as .meta.highlight.FIELD which will contain the list of fragments:
response = s.execute()
for hit in response:
    for fragment in hit.meta.highlight.title:
        print(fragment)
Suggestions¶
To specify a suggest request on your Search object use the suggest method:
s = s.suggest('my_suggestion', 'pyhton', term={'field': 'title'})
The first argument is the name of the suggestion (the name under which it
will be returned), the second is the actual text you wish the suggester to
work on, and the keyword arguments will be added to the suggest’s JSON as-is.
This means that one of them should be term, phrase or completion to indicate
which type of suggester should be used.
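The resulting suggest section of the request body can be sketched with a plain function. The build_suggest helper is hypothetical and only illustrates how the name, the text and the pass-through keyword arguments could fit together:

```python
def build_suggest(name, text, **kwargs):
    """Hypothetical helper: build the suggest section of a request body."""
    entry = {"text": text}
    entry.update(kwargs)  # e.g. term=..., phrase=... or completion=...
    return {"suggest": {name: entry}}


body = build_suggest("my_suggestion", "pyhton", term={"field": "title"})
print(body)
```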
If you only wish to run the suggestion part of the search (via the _suggest
endpoint) you can do so via execute_suggest:
s = s.suggest('my_suggestion', 'pyhton', term={'field': 'title'})
suggestions = s.execute_suggest()
print(suggestions.my_suggestion)
Extra properties and parameters¶
To set extra properties of the search request, use the .extra() method.
This can be used to define keys in the body that cannot be defined via a
specific API method like explain or search_after:
s = s.extra(explain=True)
To set query parameters, use the .params() method:
s = s.params(search_type="count")
If you need to limit the fields being returned by elasticsearch, use the
source() method:
# only return the selected fields
s = s.source(['title', 'body'])
# don't return any fields, just the metadata
s = s.source(False)
# explicitly include/exclude fields
s = s.source(include=["title"], exclude=["user.*"])
# reset the field selection
s = s.source(None)
Serialization and Deserialization¶
The search object can be serialized into a dictionary by using the
.to_dict() method.
You can also create a Search object from a dict using the from_dict
class method. This will create a new Search object and populate it using
the data from the dict:
s = Search.from_dict({"query": {"match": {"title": "python"}}})
If you wish to modify an existing Search object, overriding its
properties, instead use the update_from_dict method that alters an instance
in-place:
s = Search(index='i')
s.update_from_dict({"query": {"match": {"title": "python"}}, "size": 42})
Response¶
You can execute your search by calling the .execute() method that will return
a Response object. The Response object allows you access to any key
from the response dictionary via attribute access. It also provides some
convenient helpers:
response = s.execute()
print(response.success())
# True
print(response.took)
# 12
print(response.hits.total)
print(response.suggest.my_suggestions)
If you want to inspect the contents of the response object, just use its
to_dict method to get access to the raw data for pretty printing.
Hits¶
To access the hits returned by the search, access the hits property or
just iterate over the Response object:
response = s.execute()
print('Total %d hits found.' % response.hits.total)
for h in response:
    print(h.title, h.body)
Result¶
Each individual hit is wrapped in a convenience class that allows attribute
access to the keys in the returned dictionary. All the metadata for the results
is accessible via meta (without the leading _):
response = s.execute()
h = response.hits[0]
print('/%s/%s/%s returned with score %f' % (
    h.meta.index, h.meta.doc_type, h.meta.id, h.meta.score))
Note
If your document has a field called meta you have to access it using
the get item syntax: hit['meta'].
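Why a document field called meta needs the get item syntax becomes clearer with a small model of such a wrapper. AttrView and MiniHit are hypothetical classes written for this illustration, and the raw hit is invented sample data; the library's own result class may differ in detail:

```python
class AttrView:
    """Hypothetical read-only view exposing dict keys as attributes."""

    def __init__(self, data):
        self._data = dict(data)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self.__dict__["_data"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, key):
        return self._data[key]


class MiniHit(AttrView):
    """Hypothetical wrapper around one raw hit from a response."""

    def __init__(self, raw):
        super().__init__(raw.get("_source", {}))
        # Metadata keys such as "_index" lose their leading underscore.
        self.meta = AttrView({k[1:]: v for k, v in raw.items()
                              if k.startswith("_") and k != "_source"})


raw_hit = {  # invented sample data
    "_index": "blogs",
    "_id": "42",
    "_score": 1.5,
    "_source": {"title": "Python tips", "meta": "a document field"},
}
hit = MiniHit(raw_hit)

print(hit.title)       # document field via attribute access
print(hit.meta.index)  # metadata, without the leading underscore
print(hit["meta"])     # document field named "meta" via get item syntax
```

In this model the meta attribute is taken by the metadata view, so the document field of the same name is only reachable through hit['meta'].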
Aggregations¶
Aggregations are available through the aggregations property:
for tag in response.aggregations.per_tag.buckets:
    print(tag.key, tag.max_lines.value)
MultiSearch¶
If you need to execute multiple searches at the same time you can use the
MultiSearch class which will use the _msearch API:
from elasticsearch_dsl import MultiSearch, Search
ms = MultiSearch(index='blogs')
ms = ms.add(Search().filter('term', tags='python'))
ms = ms.add(Search().filter('term', tags='elasticsearch'))
responses = ms.execute()
for response in responses:
    print("Results for query %r." % response.search.query)
    for hit in response:
        print(hit.title)
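For background, the _msearch API takes a newline-delimited JSON payload in which each search contributes a header line followed by a body line. The sketch below builds such a payload by hand; build_msearch_body is a hypothetical helper, and placing the index in each header is an assumption of this sketch, not necessarily how MultiSearch transmits it:

```python
import json


def build_msearch_body(searches, default_index=None):
    """Hypothetical helper: serialize (header, body) pairs as newline-delimited JSON."""
    lines = []
    for header, body in searches:
        header = dict(header)
        if default_index is not None:
            header.setdefault("index", default_index)
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    # The multi search API expects the payload to end with a newline.
    return "\n".join(lines) + "\n"


payload = build_msearch_body(
    [
        ({}, {"query": {"bool": {"filter": [{"term": {"tags": "python"}}]}}}),
        ({}, {"query": {"bool": {"filter": [{"term": {"tags": "elasticsearch"}}]}}}),
    ],
    default_index="blogs",
)
print(payload)
```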