Getting started with Elasticsearch: a basic overview
Published By Suraj Sharma on 1-1-2020
In this post, we will cover some basic concepts of Elasticsearch: mapping and indexing data, and searching that data.
What is Elasticsearch?
Elasticsearch is an open source, distributed NoSQL database and full-text search engine based on the Lucene library. It is written in Java. It stores schema-free JSON documents and provides a REST API to interact with it.
Elasticsearch scales horizontally and is well suited to large data sets, up to petabytes of structured and unstructured data.
Before we start
Here are some key concepts of Elasticsearch that are useful to know.
Document
A document is expressed as a JSON object, that is, a collection of keys and values. It is stored in an index, and we search for documents in indices.
Indices
An index is an optimized collection of documents. We index documents so that we can later search the index for them.
By default, Elasticsearch indexes all data in every field, and each indexed field has a dedicated, optimized data structure. Fields can be mapped dynamically, or we can define mappings explicitly to take full control of how fields are stored and indexed.
For text fields, an index contains inverted indices, which map each term to the documents that contain it and let you search across everything.
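To make the idea of an inverted index concrete, here is a toy sketch in Python. It is not how Lucene stores its data; the function name build_inverted_index and the simple whitespace tokenizer are assumptions made for illustration only.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Very naive analysis step: lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    1: "xyz school",
    2: "jkl school",
}
inverted = build_inverted_index(docs)
print(inverted["school"])  # [1, 2]
print(inverted["xyz"])     # [1]
```

Looking up a term is then a single dictionary access instead of a scan over every document, which is the reason full-text search on an inverted index is fast.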
Node
Any time that you start an instance of Elasticsearch, you are starting a node.
Cluster
A collection of connected nodes is called a cluster. If you are running a single node of Elasticsearch, then you have a cluster of one node. The cluster provides aggregated indexing and search capabilities across all of its nodes.
Shard
A shard is a single Lucene instance. An index is a logical grouping of one or more physical shards. Breaking an index into shards makes the pieces independent, so each shard can be stored on any node in the cluster.
Replicas
Elasticsearch allows us to create replicas of an index's shards. Replicas keep the data available if a node fails and also improve search performance, because searches can run in parallel on the replicas.
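As a rough sketch, the number of shards and replicas can be set when an index is created, through the number_of_shards and number_of_replicas index settings. The helper names below are hypothetical, and the values 3 and 1 are only example numbers.

```python
import json


def index_settings(primary_shards, replicas_per_shard):
    """Build the "settings" part of a create-index request body."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas_per_shard,
        }
    }


def total_shard_copies(primary_shards, replicas_per_shard):
    """Each primary shard exists once, plus one copy per configured replica."""
    return primary_shards * (1 + replicas_per_shard)


body = index_settings(3, 1)
print(json.dumps(body, indent=2))
print(total_shard_copies(3, 1))  # 6
```

With 3 primary shards and 1 replica each, the cluster holds 6 shard copies in total, which it can spread across its nodes.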
Installation
You can download and follow the setup guide from here.
To interact with the RESTful APIs, we are going to use Postman.
We are going to use the default URL and port, http://localhost:9200, for this post.
Mapping
Mapping is a schema definition in which we define the data type of each field in a document. The mapping is stored with the index.
Create an Index with a Manual Mapping.
PUT http://localhost:9200/index_name
Content-Type: application/json
PUT /students
{
"mappings": {
"properties": {
"age": {
"type": "integer"
},
"email": {
"type": "keyword"
},
"name": {
"type": "text"
},
"school": {
"type": "text"
}
}
}
}
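If you prefer a script over Postman, the same request could be sketched with Python's standard library. The function name build_create_index_request is an assumption for illustration, and the request is only built here, not sent, so the snippet runs without a live node.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # default URL and port used in this post


def build_create_index_request(index_name, properties):
    """Build a PUT request that creates an index with an explicit mapping."""
    body = json.dumps({"mappings": {"properties": properties}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{index_name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_create_index_request(
    "students",
    {
        "age": {"type": "integer"},
        "email": {"type": "keyword"},
        "name": {"type": "text"},
        "school": {"type": "text"},
    },
)
print(request.get_method(), request.full_url)

# To send it to a running node, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```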
View Mapping.
We can view the mapping of an index.
GET http://localhost:9200/index_name/_mapping
GET /students/_mapping
Update Mapping.
We can add new fields to the existing mapping of an index. We can't change the type of an existing field; for that we need to delete the index and create a new one.
PUT http://localhost:9200/students/_mapping
Content-Type: application/json
PUT /students/_mapping
{
"properties": {
"city": {
"type": "text",
"index": false
}
}
}
Delete Mapping
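We can delete the mapping of an index by deleting the index itself. Note that this removes the mapping together with all the documents in the index.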
DELETE http://localhost:9200/index_name/
DELETE /students/
Create Document
We can create (index) a document with a simple PUT or POST request, providing a unique ID at the end of the URL.
POST /students/_doc/id
{
"age":10,
"email":"rajesh@gmail.com",
"name":"Rajesh",
"school":"xyz school",
"city":"New Delhi"
}
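Create Document without providing ID.
If we send a POST request to /students/_doc without an ID at the end of the URL, Elasticsearch generates a unique ID for the document.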
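Both ways of creating a document, with and without an ID, could be sketched like this in Python. The helper build_index_document_request is hypothetical; it uses PUT when an ID is given and POST to /_doc when Elasticsearch should generate one. The request is only built, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # default URL and port used in this post


def build_index_document_request(index_name, document, doc_id=None):
    """PUT to /_doc/<id> when an ID is given, otherwise POST to /_doc."""
    if doc_id is None:
        url, method = f"{BASE_URL}/{index_name}/_doc", "POST"
    else:
        url, method = f"{BASE_URL}/{index_name}/_doc/{doc_id}", "PUT"
    return urllib.request.Request(
        url=url,
        data=json.dumps(document).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


student = {
    "age": 10,
    "email": "rajesh@gmail.com",
    "name": "Rajesh",
    "school": "xyz school",
    "city": "New Delhi",
}
with_id = build_index_document_request("students", student, doc_id=1)
without_id = build_index_document_request("students", student)
print(with_id.get_method(), with_id.full_url)
print(without_id.get_method(), without_id.full_url)
```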
Get All Documents
We can get all the documents in our students index. Note that a search returns only the first 10 hits by default.
GET http://localhost:9200/students/_search
GET /students/_search
Search or Query in Elasticsearch
We can search or query our data in two ways:
- URI Search
- Request Body Search
URI Search
In a URI search, we put the index name in the path and pass a q parameter that contains a field name and a value separated by a colon. URI searches do not support the full Elasticsearch Query DSL but are handy for testing.
GET http://localhost:9200/students/_search?q=name:ramesh
GET /students/_search?q=field_name:value
Spaces, special characters, etc. need to be URL encoded, which can be a little tricky to write. That's why I recommend Request Body Search.
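To illustrate the encoding problem, here is a small Python sketch that builds a URI search URL with the standard library. The function name build_uri_search_url is an assumption; urllib.parse.quote does the percent-encoding.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:9200"  # default URL and port used in this post


def build_uri_search_url(index_name, field, value):
    """Build a URI search URL with a percent-encoded q parameter."""
    query = f"{field}:{value}"
    # safe="" forces every reserved character (colon, space, quotes) to be encoded.
    return f"{BASE_URL}/{index_name}/_search?q={quote(query, safe='')}"


print(build_uri_search_url("students", "name", "ramesh"))
# http://localhost:9200/students/_search?q=name%3Aramesh

# Double quotes around the value make it a phrase in the query string syntax.
print(build_uri_search_url("students", "school", '"xyz school"'))
# http://localhost:9200/students/_search?q=school%3A%22xyz%20school%22
```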
Request Body Search
URI search is not the best way to query Elasticsearch, so it is better to use Request Body Search. The search request is executed with a search DSL, which includes the Query DSL, in its body.
Think of the Query DSL as an AST (Abstract Syntax Tree) of queries, consisting of two types of clauses:
- Leaf query clauses look for a particular value in a particular field, such as the match, term or range queries. These queries can be used by themselves.
- Compound query clauses wrap other leaf or compound queries and are used to combine multiple queries in a logical fashion (such as the bool or dis_max query), or to alter their behavior (such as the constant_score query).
When we query documents in Elasticsearch, it finds the matching documents and sorts them by relevance score, which is returned in the _score metadata field of each hit.
Whether a query clause contributes to the relevance score depends on whether it is used in query context or filter context.
Query Context
In query context, a clause answers how well the document matches, and it contributes to the relevance score in the _score metadata field. Query context is in effect whenever a query clause is passed to a query parameter, such as the query parameter in the search API.
Filter Context
In filter context, a clause only answers whether the document matches. The answer is a simple yes or no, and no scores are calculated.
Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in the bool query, the filter parameter in the constant_score query, or the filter aggregation.
GET http://localhost:9200/students/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"school": "xyz"
}
}
],
"filter": [
{
"range": {
"age": {
"gt": 10
}
}
}
]
}
}
}
In the above query, the query parameter establishes the query context. The bool, must and match clauses are used in query context and score how well each document matches school: “xyz”.
The filter parameter establishes the filter context. The range clause inside it is used in filter context and only answers yes or no to whether the document's age is greater than 10.
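When such queries are assembled in code, it can help to keep scoring clauses and filtering clauses apart. A minimal Python sketch, where build_bool_query is a hypothetical helper that only produces the request body:

```python
import json


def build_bool_query(must_clauses, filter_clauses):
    """Combine scoring clauses (must) and yes/no clauses (filter) in a bool query."""
    return {
        "query": {
            "bool": {
                "must": list(must_clauses),      # query context: affects the score
                "filter": list(filter_clauses),  # filter context: no scoring
            }
        }
    }


body = build_bool_query(
    must_clauses=[{"match": {"school": "xyz"}}],
    filter_clauses=[{"range": {"age": {"gt": 10}}}],
)
print(json.dumps(body, indent=2))
```

The printed JSON has the same structure as the request body above and can be pasted into Postman as is.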
Match Query
It finds all documents whose field contains at least one of the terms in the query's input value. The order of the terms doesn't matter.
GET http://localhost:9200/students/_search
{
"query":{
"match":{
"school":"zy school"
}
}
}
Match Phrase Query
It finds all documents whose field contains all the terms of the query's input value, next to each other and in the same order.
GET http://localhost:9200/students/_search
{
"query":{
"match_phrase":{
"school":"zy school"
}
}
}
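The difference between the two queries can be illustrated with a toy model in Python. This is only a sketch of the matching rules on whitespace-separated terms, not Elasticsearch's actual analysis or scoring; the function names are made up for this example.

```python
def tokenize(text):
    """Naive analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def matches(field_value, query):
    """Toy `match`: true if the field shares at least one term with the query."""
    field_terms = set(tokenize(field_value))
    return any(term in field_terms for term in tokenize(query))


def matches_phrase(field_value, query):
    """Toy `match_phrase` with slop 0: query terms appear consecutively, in order."""
    field_terms = tokenize(field_value)
    query_terms = tokenize(query)
    n = len(query_terms)
    if n == 0:
        return False
    return any(
        field_terms[i:i + n] == query_terms
        for i in range(len(field_terms) - n + 1)
    )


print(matches("xyz school", "school xyz"))         # True
print(matches_phrase("xyz school", "school xyz"))  # False
print(matches_phrase("xyz school", "xyz school"))  # True
```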
Proximity Query
Slop determines how far apart the phrase terms may be in a document. The higher the slop value, the more words can appear between the phrase terms.
With a high slop, documents match as long as they contain the words of the phrase, and the words may even appear in a different order if the slop is large enough.
With Low slop
GET http://localhost:9200/students/_search
{
"query": {
"match_phrase": {
"school": {
"query": "school jkl",
"slop": 0
}
}
}
}
With High slop
GET http://localhost:9200/students/_search
{
"query": {
"match_phrase": {
"school": {
"query": "school jkl",
"slop": 100
}
}
}
}
Pagination
We can paginate our query results by sending the from and size parameters in our query, where from defines the offset from the first result you want to return and size defines the maximum number of results you want to return.
GET http://localhost:9200/students/_search
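As a sketch of how from and size relate to page numbers, the helper below turns a 1-based page number into a search request body. The name build_paged_search and the default match_all query are assumptions made for illustration.

```python
import json


def build_paged_search(page, page_size, query=None):
    """Translate a 1-based page number into the from/size search parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    return {
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,               # maximum number of hits to return
        "query": query if query is not None else {"match_all": {}},
    }


body = build_paged_search(page=3, page_size=10)
print(json.dumps(body))
print(body["from"], body["size"])  # 20 10
```

Page 3 with 10 results per page skips the first 20 hits and returns at most the next 10.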
Conclusion
We have now covered some basic concepts of Elasticsearch: creating, viewing, updating and deleting mappings, creating documents, querying or searching documents, and pagination. You can learn more from the Elasticsearch documentation. Thanks for reading!