Faceted Search
The library comes with a simple abstraction aimed at helping you develop faceted navigation for your data.
Note
This API is experimental and will be subject to change. Any feedback is welcome.
Configuration
You can provide several configuration options (as class attributes) when declaring a FacetedSearch subclass:
index
the name of the index (as a string) to search through; defaults to '_all'.
doc_types
a list of Document subclasses or strings to be used; defaults to ['_all'].
fields
a list of fields on the document type to search through. The list is passed to the MultiMatch query, so it can contain boost values (for example 'title^5'); defaults to ['*'].
facets
a dictionary of facets to display and filter on. The key is the name displayed and each value should be an instance of a Facet subclass, for example: {'tags': TermsFacet(field='tags')}
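To illustrate the boost syntax accepted by fields, the following sketch builds the kind of multi_match request body that such a field list corresponds to in the Elasticsearch query DSL. The helper name build_multi_match is hypothetical and not part of the library; it only shows the shape of the query, not how the library constructs it internally.

```python
import json


def build_multi_match(text, fields):
    """Return a multi_match query body as a plain dict (illustrative helper)."""
    return {"multi_match": {"query": text, "fields": list(fields)}}


# 'title^5' asks Elasticsearch to boost matches in the title field by 5
body = build_multi_match("python web", ["title^5", "body"])
print(json.dumps(body, indent=2))
```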
Facets
There are several different facets available:
TermsFacet
splits documents into groups based on the value of a field, for example: TermsFacet(field='category')
DateHistogramFacet
splits documents into time intervals, for example: DateHistogramFacet(field="published_date", calendar_interval="day")
HistogramFacet
similar to DateHistogramFacet but for numerical values: HistogramFacet(field="rating", interval=2)
RangeFacet
allows you to define your own ranges for a numerical field: RangeFacet(field="comment_count", ranges=[("few", (None, 2)), ("lots", (2, None))])
NestedFacet
a simple facet that wraps another to provide access to nested documents: NestedFacet('variants', TermsFacet(field='variants.color'))
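The RangeFacet example above uses None for an open-ended bound. As a rough illustration of how such ranges partition values, the sketch below assigns numbers to named ranges in plain Python. It assumes the lower bound is inclusive and the upper bound exclusive, as in the Elasticsearch range aggregation; bucket_for is a hypothetical helper and not a library function.

```python
# same range definition as in the RangeFacet example
ranges = [("few", (None, 2)), ("lots", (2, None))]


def bucket_for(value, ranges):
    """Return the name of the first range containing value, or None."""
    for name, (lower, upper) in ranges:
        above_lower = lower is None or value >= lower  # inclusive lower bound
        below_upper = upper is None or value < upper   # exclusive upper bound
        if above_lower and below_upper:
            return name
    return None


for comment_count in (0, 1, 2, 10):
    print(comment_count, bucket_for(comment_count, ranges))
```

With these assumptions a document with exactly 2 comments falls into "lots", not "few".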
By default, facet results only calculate the document count. If you want a different metric, pass any single-value metric aggregation as the metric kwarg, for example TermsFacet(field='tags', metric=A('max', field='timestamp')). When metric is specified, the results are sorted in descending order by that metric by default. To sort in ascending order, specify metric_sort="asc"; to sort by document count instead, use metric_sort=False.
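The three orderings can be pictured with a small plain-Python sketch. This is only an illustration of the resulting order, not the library's code: the bucket data and the order_buckets helper are hypothetical, and it assumes that sorting by document count means largest first.

```python
# hypothetical buckets: (key, document count, metric value)
buckets = [("python", 12, 30), ("web", 40, 10), ("search", 5, 20)]


def order_buckets(buckets, metric_sort="desc"):
    """Order buckets the way the metric_sort option is described to behave."""
    if metric_sort is False:
        # ignore the metric and order by document count, largest first
        return sorted(buckets, key=lambda b: b[1], reverse=True)
    # order by the metric value, descending unless "asc" is requested
    return sorted(buckets, key=lambda b: b[2], reverse=(metric_sort == "desc"))


print([b[0] for b in order_buckets(buckets)])         # by metric, descending
print([b[0] for b in order_buckets(buckets, "asc")])  # by metric, ascending
print([b[0] for b in order_buckets(buckets, False)])  # by document count
```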
Advanced
If you require any custom behavior or modifications, simply override one or more of the methods responsible for the class's functions:
search(self)
is responsible for constructing the Search object used. Override this if you want to customize the search object (for example by adding a global filter for published articles only).
query(self, search)
adds the query portion of the search (if search input is specified), by default using a MultiMatch query. Override this if you want to modify the query type used.
highlight(self, search)
defines the highlighting on the Search object and returns a new one. The default behavior is to highlight on all fields specified for search.
Usage
The custom subclass can be instantiated empty to provide an empty search (matching everything) or with query, filters and sort.
query
is used to pass in the text of the query to be performed, for example 'python web'. If None is passed in (the default), a MatchAll query will be used.
filters
is a dictionary containing all the facet filters that you wish to apply. Use the name of the facet (from the facets attribute) as the key and one of the possible values as the value, for example {'tags': 'python'}.
sort
is a tuple or list of fields on which the results should be sorted. The format of the individual fields is the same as for those passed to sort().
Response
The response returned from the FacetedSearch object (by calling .execute()) is a subclass of the standard Response class that adds a property called facets. It contains a dictionary with lists of buckets, each represented by a tuple of key, document count and a flag indicating whether this value has been filtered on.
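Because each bucket carries a flag for whether it has been filtered on, the currently active filters can be recovered from the facet data, for instance to rebuild navigation links. The sketch below works on hypothetical sample data in the shape described above, held in a plain dict rather than a real response; selected_filters is an illustrative helper, and returning a list of values per facet is an assumption, since the text here only documents a single value per filter.

```python
# hypothetical facet data: facet name -> list of (key, document count, selected)
facets = {
    "tags": [("python", 12, True), ("web", 7, False)],
    "category": [("tutorial", 4, False), ("news", 9, False)],
}


def selected_filters(facets):
    """Collect the keys flagged as selected, grouped by facet name."""
    filters = {}
    for name, buckets in facets.items():
        chosen = [key for key, _count, selected in buckets if selected]
        if chosen:  # skip facets with nothing selected
            filters[name] = chosen
    return filters


print(selected_filters(facets))
```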
Example
from datetime import date

from elasticsearch_dsl import FacetedSearch, TermsFacet, DateHistogramFacet

# Article is assumed to be a Document subclass defined elsewhere

class BlogSearch(FacetedSearch):
    doc_types = [Article, ]
    # fields that should be searched
    fields = ['tags', 'title', 'body']

    facets = {
        # use bucket aggregations to define facets
        'tags': TermsFacet(field='tags'),
        'publishing_frequency': DateHistogramFacet(field='published_from', calendar_interval='month')
    }

    def search(self):
        # override methods to add custom pieces
        s = super().search()
        return s.filter('range', published_from={'lte': 'now/h'})

# filter on the month bucket that starts on 1 June 2015
bs = BlogSearch('python web', {'publishing_frequency': date(2015, 6, 1)})
response = bs.execute()

# access hits and other attributes as usual
total = response.hits.total
print('total hits', total.relation, total.value)
for hit in response:
    print(hit.meta.score, hit.title)

for (tag, count, selected) in response.facets.tags:
    print(tag, ' (SELECTED):' if selected else ':', count)

for (month, count, selected) in response.facets.publishing_frequency:
    print(month.strftime('%B %Y'), ' (SELECTED):' if selected else ':', count)