Getting Started with Elasticsearch: A basic overview

In this post, we will cover some basic concepts of Elasticsearch: mapping and indexing data, and searching that data.

What is Elasticsearch?

Elasticsearch is an open source, distributed NoSQL document store and full-text search engine built on the Apache Lucene library. It is written in Java. It stores schema-free JSON documents and provides a REST API for interacting with them.

Elasticsearch scales horizontally and can handle large volumes of structured and unstructured data, up to petabytes.

Before we start

Here are some key Elasticsearch concepts that are useful to know.

Document

A document is expressed as a JSON object, that is, a collection of keys and values. Documents are stored in an index, and we search for documents in indices.

Indices

An index is an optimized collection of documents. We index documents so that we can later search the index for them.

By default, Elasticsearch indexes all data in every field, and each indexed field has a dedicated, optimized data structure. Fields can be mapped dynamically, or you can define mappings explicitly to take full control of how fields are stored and indexed.

An index contains inverted indices, which map each term to the documents that contain it and let you search across everything.
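As a rough illustration of what an inverted index is, here is a minimal Python sketch. It is not how Lucene implements it; the sample documents and the whitespace tokenizer are assumptions made only for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical sample data, not taken from a real index.
docs = {1: "xyz school", 2: "jkl school", 3: "xyz academy"}
inverted = build_inverted_index(docs)

print(inverted["school"])
print(inverted["xyz"])
```

Looking up a term is a single dictionary access, which is why searching by term stays fast even when there are many documents.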

Node

Any time that you start an instance of Elasticsearch, you are starting a node.

Cluster

A collection of connected nodes is called a cluster. If you are running a single node of Elasticsearch, then you have a cluster of one node. The cluster provides indexing and search capabilities across all of its nodes.

Shard

A shard is a single Lucene instance. An index is a logical grouping of one or more physical shards. Breaking an index into shards makes the pieces independent, so each shard can be stored on any node in the cluster.
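To make the idea of spreading documents over shards concrete, here is a simplified sketch of hash-based routing. It assumes a plain "hash of the routing value modulo the number of primary shards" rule and uses zlib.crc32 as a stand-in hash; Elasticsearch uses its own hash function and a more detailed formula, so the shard numbers here will not match a real cluster.

```python
import zlib


def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Return the shard number a document would land on in this simplified model."""
    # crc32 is a stand-in (assumption); it is deterministic across runs.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % num_primary_shards


# Hypothetical document IDs used as routing values.
for doc_id in ["1", "2", "3", "4"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 3))
```

Because the result depends only on the routing value and the shard count, the same document always maps to the same shard, which is what lets any node find it again.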

Replicas

Elasticsearch allows us to create replicas of an index's shards. Replicas keep the data available if a node fails and can also improve search performance, because searches can run in parallel on the replicas.
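As a small sketch of how shards and replicas relate, the snippet below builds the settings body that can be sent when creating an index and counts the resulting shard copies. number_of_shards and number_of_replicas are index settings; the helper names are made up for this example.

```python
import json


def index_settings(primary_shards: int, replicas: int) -> dict:
    """Build a settings body for index creation (helper name is hypothetical)."""
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Each primary shard has `replicas` extra copies, so count primaries plus replicas."""
    return primary_shards * (1 + replicas)


print(json.dumps(index_settings(3, 1), indent=4))
print(total_shard_copies(3, 1))
```

With 3 primary shards and 1 replica each, the cluster holds 6 shard copies in total.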

Installation

You can download Elasticsearch and follow the setup guide on the official Elasticsearch website.

To interact with the RESTful APIs, we are going to use Postman.

We are going to use the default URL and port http://localhost:9200 for this post.

Mapping

A mapping is a schema definition in which we define the data type of every field in a document. The mapping is stored with the index.

Create an Index with a Manual Mapping.

PUT http://localhost:9200/index_name

Content-Type: application/json

PUT /students

{
    "mappings": {
        "properties": {
            "age": {
                "type": "integer"
            },
            "email": {
                "type": "keyword"
            },
            "name": {
                "type": "text"
            },
            "school": {
                "type": "text"
            }
        }
    }
}
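If you prefer a script over Postman, the same request can be sketched in Python with only the standard library. The snippet builds the request without sending it; the helper name create_index_request is an assumption for this example, and the URL is the default local one used in this post.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # default local node, as used in this post

mapping = {
    "mappings": {
        "properties": {
            "age": {"type": "integer"},
            "email": {"type": "keyword"},
            "name": {"type": "text"},
            "school": {"type": "text"},
        }
    }
}


def create_index_request(index_name: str, body: dict) -> Request:
    """Build (but do not send) the PUT request that creates an index."""
    return Request(
        url=f"{BASE_URL}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = create_index_request("students", mapping)
print(request.get_method(), request.full_url)
```

To run it against a live node, pass the request to urllib.request.urlopen.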

View Mapping.

We can view the mapping of an index.

GET http://localhost:9200/index_name/_mapping

GET /students/_mapping

Update Mapping.

We can add new fields to the existing mapping of an index. We can't change the mapping of an existing field; for that, we need to delete the index, create a new one with the correct mapping, and index the data again.

PUT http://localhost:9200/students/_mapping

Content-Type: application/json

PUT /students/_mapping
{
    "properties": {
        "city": {
            "type": "text",
            "index": false
        }
    }
}

Delete Mapping

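A mapping cannot be deleted on its own. To remove it, we delete the index, which removes the mapping together with all of the documents in the index.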

DELETE http://localhost:9200/index_name/

DELETE /students/

Create Document

We can create (index) a document with a simple PUT or POST request, providing a unique ID at the end of the URL.

POST /students/_doc/id

{
    "age": 10,
    "email": "rajesh@gmail.com",
    "name": "Rajesh",
    "school": "xyz school",
    "city": "New Delhi"
}

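Create Document without providing ID.

If we omit the ID and send a POST request, Elasticsearch generates a unique ID for the document automatically.

POST /students/_doc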
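Both ways of creating a document, with and without an ID, can be sketched with one small helper. The function name index_document_request is hypothetical; it only builds the method, path and body and does not talk to Elasticsearch.

```python
import json


def index_document_request(index_name, document, doc_id=None):
    """Return (method, path, body) for indexing a document."""
    if doc_id is None:
        # No ID given: POST to /_doc and let Elasticsearch generate one.
        method, path = "POST", f"/{index_name}/_doc"
    else:
        # Explicit ID: it goes at the end of the URL.
        method, path = "PUT", f"/{index_name}/_doc/{doc_id}"
    return method, path, json.dumps(document)


student = {
    "age": 10,
    "email": "rajesh@gmail.com",
    "name": "Rajesh",
    "school": "xyz school",
    "city": "New Delhi",
}

print(index_document_request("students", student, doc_id=1)[:2])
print(index_document_request("students", student)[:2])
```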

Get All Documents

We can get all the documents in our students index.

GET http://localhost:9200/students/_search

GET /students/_search

Search or Query in Elasticsearch

We can search or query our data in two ways:

URI Search
Request Body Search

URI Search

In a URI search, we put the index name in the path and pass a q parameter containing a field name and a value separated by a colon. URI searches do not support the full Elasticsearch Query DSL, but they are handy for testing.

GET http://localhost:9200/students/_search?q=name:ramesh

GET /students/_search?q=field_name:value

Spaces, special characters, and so on need to be URL encoded, which can be tricky to write by hand. That is why I recommend Request Body Search.
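If you do want to use URI search from a script, the encoding can be left to the standard library. This is a minimal sketch; the helper name uri_search_url is an assumption, and the phrase value is only an example.

```python
from urllib.parse import quote, urlencode


def uri_search_url(index_name, field, value, base_url="http://localhost:9200"):
    """Build a URI search URL with the q parameter percent-encoded."""
    # quote_via=quote encodes spaces as %20 instead of "+".
    query_string = urlencode({"q": f"{field}:{value}"}, quote_via=quote)
    return f"{base_url}/{index_name}/_search?{query_string}"


print(uri_search_url("students", "name", "ramesh"))
print(uri_search_url("students", "school", '"xyz school"'))
```

The colon, the quotes and the space in the second call are all encoded automatically, which is exactly the part that is easy to get wrong by hand.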

Request Body Search

URI search is not the best way to query Elasticsearch, so it is better to use Request Body Search. The search request is executed with a search DSL, which includes the Query DSL, in its body.

Think of the Query DSL as an AST (Abstract Syntax Tree) of queries, consisting of two types of clauses:

Leaf query clauses: Leaf query clauses look for a particular value in a particular field, such as the match, term or range queries. These queries can be used by themselves.

Compound query clauses: Compound query clauses wrap other leaf or compound queries and are used to combine multiple queries in a logical fashion (such as the bool or dis_max query), or to alter their behavior (such as the constant_score query).

When we query documents in Elasticsearch, it returns the matching documents sorted by relevance score, which is returned in the _score metadata field of each hit.

Each query type can calculate relevance scores differently, and whether a score is calculated at all depends on whether the query clause is used in query context or filter context.

Query Context

In query context, a clause answers the question of how well the document matches, and the answer is returned as a relevance score in the _score metadata field. Query context is in effect whenever a query clause is passed to a query parameter, such as the query parameter in the search API.

Filter Context

In filter context, a clause only answers whether the document matches. The answer is a simple yes or no; no scores are calculated.

Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in the bool query, the filter parameter in the constant_score query, or the filter aggregation.

GET http://localhost:9200/students/_search

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "school": "xyz"
                    }
                }
            ],
            "filter": [
                {
                    "range": {
                        "age": {
                            "gt": 10
                        }
                    }
                }
            ]
        }
    }
}

In the above query, the query parameter establishes the query context. The bool, must and match clauses are in query context and score how well each document matches school: "xyz".

The filter parameter establishes the filter context. The filter and range clauses are in filter context and only answer yes or no to whether the document's age is greater than 10.
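When such bodies are built in code, it can help to keep the scoring clauses and the yes/no clauses in separate arguments. The helper below is a hypothetical sketch that only assembles the request body shown above; it does not send anything.

```python
import json


def bool_query(must=None, filters=None):
    """Build a bool query body: `must` clauses are scored, `filters` are yes/no only."""
    bool_body = {}
    if must:
        bool_body["must"] = list(must)  # query context: contributes to _score
    if filters:
        bool_body["filter"] = list(filters)  # filter context: no scoring
    return {"query": {"bool": bool_body}}


body = bool_query(
    must=[{"match": {"school": "xyz"}}],
    filters=[{"range": {"age": {"gt": 10}}}],
)
print(json.dumps(body, indent=4))
```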

Match Query

It finds all documents in which the field contains one or more of the words in the query's input value. The order of the words doesn't matter.

GET http://localhost:9200/students/_search
{
	"query":{
		"match":{
			"school":"zy school"
		}
	}
}

Match Phrase Query

It finds all documents in which the field contains the words of the query's input value as a phrase, that is, next to each other and in the same order.

GET http://localhost:9200/students/_search
{
	"query":{
		"match_phrase":{
			"school":"zy school"
		}
	}
}
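The difference between the two queries can be sketched in a few lines of Python. This is only a toy model: it assumes a lowercase, whitespace-splitting tokenizer and ignores scoring, whereas Elasticsearch runs the field's analyzer and ranks the hits.

```python
def tokenize(text):
    """Toy analyzer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def match(query, field_value):
    """True if the field shares at least one term with the query, in any order."""
    return bool(set(tokenize(query)) & set(tokenize(field_value)))


def match_phrase(query, field_value):
    """True if the query terms appear next to each other and in the same order."""
    q, d = tokenize(query), tokenize(field_value)
    if not q:
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


print(match("zy school", "xyz school"))
print(match_phrase("zy school", "xyz school"))
print(match_phrase("xyz school", "xyz school"))
```

In this model, "zy school" matches a document with "xyz school" under match because both contain "school", but not under match_phrase because the exact phrase is absent.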

Proximity Query

A proximity query is a match_phrase query with a slop value. With a high slop, documents can match simply because they contain the words of the phrase; depending on the slop value, the words may even appear in a different order.

Slop determines how many words may appear between the phrase terms. The higher the slop, the more words can appear in between.

With Low slop

GET http://localhost:9200/students/_search
{
    "query": {
        "match_phrase": {
            "school": {
                "query": "school jkl",
                "slop": 0
            }
        }
    }
}

With High slop

GET http://localhost:9200/students/_search

{
    "query": {
        "match_phrase": {
            "school": {
                "query": "school jkl",
                "slop": 100
            }
        }
    }
}
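To get a feel for how much slop a document needs, here is a simplified sketch. It assumes a whitespace tokenizer and measures, for each choice of term positions, how far the terms are shifted relative to their places in the phrase; it is an approximation of the idea and not Lucene's actual implementation.

```python
from itertools import product


def min_slop(phrase, text):
    """Smallest slop at which `phrase` would match `text` in this toy model, or None."""
    q_terms = phrase.lower().split()
    d_terms = text.lower().split()
    positions = []
    for term in q_terms:
        occurrences = [i for i, t in enumerate(d_terms) if t == term]
        if not occurrences:
            return None  # a phrase term is missing entirely
        positions.append(occurrences)
    best = None
    for combo in product(*positions):
        # Shift each position back by the term's offset in the phrase;
        # an exact phrase gives identical values, so the spread is 0.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        if best is None or spread < best:
            best = spread
    return best


print(min_slop("school jkl", "school jkl"))
print(min_slop("school jkl", "school of jkl"))
print(min_slop("school jkl", "jkl school"))
```

In this model an exact phrase needs slop 0, one extra word in between needs slop 1, and two swapped words need slop 2, which is why a value like 100 matches almost any document containing both words.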

Pagination

We can paginate query results by sending the from and size parameters in our query: from defines the offset from the first result you want to return, and size defines the maximum number of results to return.

GET http://localhost:9200/students/_search
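As a sketch of a request body for this, the helper below turns a page number and page size into from and size. The helper name and the 1-based page numbering are assumptions for this example; match_all is used as the default query so that every document is a candidate.

```python
import json


def paginated_search_body(page, size, query=None):
    """Build a search body for the given 1-based page number and page size."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {
        "from": (page - 1) * size,  # number of results to skip
        "size": size,               # maximum number of results to return
        "query": query or {"match_all": {}},
    }


print(json.dumps(paginated_search_body(page=3, size=10), indent=4))
```

For page 3 with 10 results per page, from is 20, so the response starts at the 21st hit.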

Conclusion

We have now covered some basic concepts of Elasticsearch: CRUD operations on mappings, CRUD operations on documents, querying documents, and pagination. You can learn more from the Elasticsearch documentation. Thanks for reading!

Published By Suraj Sharma on 1-1-2020
