Type for search...
codete Everything You Must Know About Searching Algorithms 1 main 62d7995d99
Codete Blog

Everything You Must Know About Searching Algorithms

Robert Wojciechowicz d599e5e8f3

03/11/2021 |

18 min read

Robert Wojciechowicz

When you search for data using an application, you might notice a difference between a well-designed and fast search functionality and a slow one. This boils down to the search algorithm used there. 

A searching algorithm is a basic and fundamental step in computing. It defines a step-by-step method for locating specific data in a data set. Every search algorithm uses a search key in order to complete the process and returns a success or failure status. 

When building search functionalities, you can choose from different types of searching algorithms. How you use the algorithm of your choice will have an impact on the performance and efficiency of the search process. 

In this article, we cover all the fundamentals of search algorithms to show you what they are, how they work, and how you can use them to your advantage. Keep on reading to learn everything that you need to know about search algorithms. 

 

Table of contents:

  1. What is a search algorithm?
  2. Search algorithm categories
  3. Most popular types of searching algorithms
    1. Linear search algorithm
    2. Binary search algorithm
    3. Jump search algorithm
    4. Interpolation search algorithm
    5. Exponential search algorithm
    6. Fibonacci search algorithm
  4. And the best searching algorithm is...?

What is a search algorithm? 

A searching algorithm is basically an algorithm that solves the search problem – it’s able to retrieve information stored within some type of data structure or calculated in the search space of a problem domain (with either continuous or discrete values).

The purpose of search algorithms is to check or retrieve an element from a data structure where it’s stored. It’s that simple. This type of algorithm searches for a target (called the key) in the search space – which can be anything from a shortlist of numbers to a massive database with customer data. 

By running an operation like that, you get two possible outcomes: success or failure. Success is when the algorithm manages to find a target. And failure is when the target isn’t found. 

Search algorithm categories

You can basically divide these algorithms into two different categories based on the type of search operations they perform: sequential search algorithms and interval search algorithms. 

Sequential search algorithms

This type of algorithm runs through a list or array sequentially and checks every element. Linear search is a good example of a sequential search algorithm. 

Interval search algorithms 

This type of searching algorithm is specifically designed for searching in assorted data structures. These are far more efficient than linear search methods and repeatedly target the center of the search structure to divide the search space into halves. The binary search algorithm is an example of an interval search algorithm. 

Most popular types of searching algorithms

Here are the most common types of search algorithms in use today:

  • linear search,
  • binary search,
  • jump search,
  • interpolation search,
  • exponential search,
  • Fibonacci search.

Naturally, the algorithms listed above are just a sample of a large selection of searching algorithms developers, and data scientists can use today. But covering all of them in one article is impossible. In the next part, we will cover these commonly used search algorithms.

1. Linear search algorithm

A linear search algorithm finds an element within the list by sequentially checking every single element on the list until it finds a match or finishes searching through the entire list. 

This type of algorithm runs in worst-case linear time and makes as many comparisons as there are items on the list. As you can imagine, if each element is equally likely to get searched, the process might take some time if the target is located at the end of the list. 

Linear search isn’t practical because of its slow pace. However it might be a good choice for small data sets because it doesn’t require sorting of the data. Algorithms and schemes like binary search or hash tables are much faster. That’s why linear search is considered one of the most basic of all search algorithms.

Python already ships with linear search. The “sequence” data types, for example, expose a method “index” that will return the index of an element or raise an exception in case the element was not found.

 

Here is an example of a linear search algorithm in Python and its output:

#!/usr/bin/env python3

def linear_search(iterable, search_item):
    """
    Linear search algorithm
    Return the lowest index in `iterable` where `search_item` is found.
    Return -1 on failure.
    """
    for index, item in enumerate(iterable):
        if item == search_item:
            return index
    return -1

def linear_search_builtin(iterable, search_item):
    """
    Linear search using built-in `index` method
    Return the lowest index in `iterable` where `search_item` is found.
    Return -1 on failure.
    """
    try:
        return iterable.index(search_item)
    except ValueError:
        return -1

def _test():
    assert linear_search([8, 2, 6], 2) == 1
    assert linear_search([8, 2, 6, 2], 2) == 1
    assert linear_search('zxcbcdefg', 'g') == 8
    assert linear_search('zxcbcdefg', 'y') == -1
    assert linear_search((123, 456, 989), 255) == -1

    assert linear_search_builtin([8, 2, 6], 2) == 1
    assert linear_search_builtin([8, 2, 6, 2], 2) == 1
    assert linear_search_builtin('zxcbcdefg', 'g') == 8
    assert linear_search_builtin('zxcbcdefg', 'y') == -1
    assert linear_search_builtin((123, 456, 989), 255) == -1

if __name__ == '__main__':
    _test()

2. Binary search algorithm

This type of searching algorithm comes in handy for finding the position of a specific value in a sorted array. The algorithm uses the principle of divide and conquer to do the job. 

And it works really well – today, binary search is considered as one of the most efficient searching algorithms thanks to its incredible speed. 

 

How does it work? 

The binary search algorithm starts by searching in the middle of the array and then goes down the lower or upper half of a given sequence. If the median value is lower than the target value, the search goes higher. If it’s the opposite, the algorithm looks next into the descending portion of the array. 

The Binary Search Tree is a node-based tree data structure that comes with the following properties:

  • The left subtree of a node includes only nodes with keys smaller than the node’s key.
  • On the right, you can find a subtree with nodes with keys that are greater than the node’s key.
  • The idea is that left, and right subtrees are also binary search trees. No duplicate notes are allowed there.

Binary search is a fast and efficient method for finding a specific target value from a set of ordered items. Thanks to starting in the middle of the sorted list, the algorithm can efficiently decrease the search space by half and determine whether it’s worth to ascend or descend the list based on the values it encounters compared to the target value. 

 

Here’s an example of how the binary search algorithm works:

  1. Let’s imagine that our target value is 8 and the search space we have ranges from 1 to 11.
  2. The algorithm finds the median value – in our case, it’s 6.
  3. Now the algorithm compares the target of 8 to the current value (6). 6 is smaller than 8, so it means that the target needs to be in the upper half of the array.
  4. The algorithm finds the median value in the new range (upper half of the array) - in our case, it’s 9.
  5. Now the algorithm compares the target of 8 to the current value (9). 9 is greater than 8, so it means that the target needs to be in the lower half of the array.
  6. The algorithm again finds the median value in the new range (lower half of the array from the previous step) – in our case, it’s 7.
  7. Now it compares the target of 8 to the current value (7). 7 is smaller than 8, so it means that the target needs to be in the upper half of the array from the previous step.
  8. The algorithm again finds the median value in the new range (upper half of the array from the previous step) – in our case, it’s 8.
  9. The algorithm checks the next value (8), compares it to the target and finds that it’s an exact match.

It gives you a success notice saying that the target has been found. 

This example shows why binary search is so efficient. In our scenario, the target had only to be compared to four values. If we used linear search, the algorithm would have to start from the very first value and move up – comparing the target to eight values instead of three. 

But binary search is only an option if you have an ordered set of data. If your data is arranged randomly, then you need to use linear search – it’s the only one that will give you the result you need. 

Binary search can be implemented in Python in three ways:

 

Here’s an example of a binary search algorithm in Python and its output:

#!/usr/bin/env python3

import bisect

def binary_search(sorted_iterable, search_item, left=None, right=None):
    """
    Binary search algorithm implemented iteratively
    Return the lowest index in `sorted_iterable` where `search_item` is found.
    Return -1 on failure.
    """
    if left is None:
        left = 0
    if right is None:
        right = len(sorted_iterable) - 1

    while left <= right:
        middle = (left + right) // 2

        if sorted_iterable[middle] == search_item:
            return middle

        if sorted_iterable[middle] < search_item:
            left = middle + 1
        elif sorted_iterable[middle] > search_item:
            right = middle - 1
    return -1

def binary_search_contains(sorted_iterable, search_item):
    """
    Binary search algorithm implemented recursively
    Return True if `search_item` is found in `sorted_iterable`.
    Return False otherwise.
    """
    left, right = 0, len(sorted_iterable) - 1

    if left <= right:
        middle = (left + right) // 2

        if sorted_iterable[middle] == search_item:
            return True

        if sorted_iterable[middle] < search_item:
            return binary_search_contains(sorted_iterable[middle+1:], search_item)
        elif sorted_iterable[middle] > search_item:
            return binary_search_contains(sorted_iterable[:middle], search_item)
    return False

def binary_search_builtin(sorted_iterable, search_item):
    """
    Binary search algorithm implemented using built-in `bisect` module
    Return the lowest index in `sorted_iterable` where `search_item` is found.
    Return -1 on failure.
    """
    index = bisect.bisect_left(sorted_iterable, search_item)
    if index < len(sorted_iterable) and sorted_iterable[index] == search_item:
        return index
    return -1

def _test():
    assert binary_search(sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]), 8) == 7
    assert binary_search(sorted([8, 2, 6]), 2) == 0
    assert binary_search(sorted([8, 2, 6, 1]), 2) == 1
    assert binary_search(sorted('zxbcdefg'), 'g') == 5
    assert binary_search(sorted('zxbcdefg'), 'y') == -1
    assert binary_search(sorted((123, 456, 989)), 255) == -1

    assert binary_search_contains(sorted([8, 2, 6]), 2) == True
    assert binary_search_contains(sorted([8, 2, 6, 1]), 2) == True
    assert binary_search_contains(sorted('zxbcdefg'), 'g') == True
    assert binary_search_contains(sorted('zxbcdefg'), 'y') == False
    assert binary_search_contains(sorted((123, 456, 989)), 255) == False

    assert binary_search_builtin(sorted([8, 2, 6]), 2) == 0
    assert binary_search_builtin(sorted([8, 2, 6, 1]), 2) == 1
    assert binary_search_builtin(sorted('zxbcdefg'), 'g') == 5
    assert binary_search_builtin(sorted('zxbcdefg'), 'y') == -1
    assert binary_search_builtin(sorted((123, 456, 989)), 255) == -1

if __name__ == '__main__':
    _test()

3. Jump search algorithm

Just like binary search, jump search (also known as block search) is a perfect match for a sorted array of data. The idea here is to check fewer elements than linear search would by jumping ahead using fixed steps or skipping some elements instead of searching all of them. 

Imagine that you have an array arr[] of size n and block to be jumped of size m. Then you search at the indexes arr[0], arr[m], arr[2m]…

Once we find the interval (arr[km] < x < arr[(k+1)m]), we perform a linear search operation from the index km to find the element x.

 

Here is an example of a jump search algorithm:

Consider the following array of data: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610). The length of our array in this case is 16. It has been mathematically proven that the optimal jump is the square root of the input data set length – in our case it’s 4.

Jump search can find the value of 55 by following these steps, assuming that the block size it can jump to is 4. 

  1. The algorithm jumps from index 0 to index 4.
  2. The algorithm jumps from index 4 to index 8.
  3. The algorithm jumps from index 8 to index 12.
  4. Since the element at index 12 is greater than 55, it’s time to jump back a step. The algorithm does that and arrives at index 8.
  5. Now it’s time to perform a linear search from index 8 to finally get to our target element, 55. It will be found at index 10.

 

Here is an example of a jump search algorithm in Python and its output:

#!/usr/bin/env python3

import math

def jump_search(sorted_iterable, search_item):
    """
    Jump search algorithm
    Return the lowest index in `sorted_iterable` where `search_item` is found.
    Return -1 on failure.
    """
    n = len(sorted_iterable)
    jump = int(math.sqrt(n))
    prev, next = 0, jump

    # find the interval where item is present
    while sorted_iterable[min(next, n) - 1] < search_item:
        prev = next
        next += jump
        if prev >= n:
            return -1

    # perform linear search in the found interval
    while sorted_iterable[prev] < search_item:
        prev += 1
        if prev == min(next, n):
            return -1

    if sorted_iterable[prev] == search_item:
        return prev

    return -1


def _test():
    assert jump_search(sorted([0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]), 55) == 10
    assert jump_search(sorted([8, 2, 6]), 2) == 0
    assert jump_search(sorted([8, 2, 6, 1]), 2) == 1
    assert jump_search(sorted('zxbcdefg'), 'g') == 5
    assert jump_search(sorted('zxbcdefg'), 'y') == -1
    assert jump_search(sorted((123, 456, 989)), 255) == -1

if __name__ == '__main__':
    _test()

4. Interpolation search algorithm

Interpolation search is very similar to binary search. First described in 1957, this search algorithm works by probing the position of the required value. 

In a binary search, we always start searching from the middle of the list, whereas in the interpolation search we determine the starting position depending on the item to be searched. In the interpolation search algorithm, the starting search position is most likely to be the closest to the start or end of the list depending on the search item. If the search item is near to the first element in the list, then the starting search position is likely to be near the start of the list.

For the algorithm to work properly, the data collection needs to be in a sorted form and distributed equally. Interpolation search is a type of searching algorithm used most often for searching for a key in an array ordered by numerical values assigned to the keys (called key values).

 

The formula for finding the starting position looks as follows:

start_position = first_index + \

    (last_index - first_index) * \

    (search_item - input_list[first_index]) // \

    (input_list[last_index] - input_list[first_index])

first_index - index of the smallest value in the input_list

last_index - index of the greatest value in the input_list

input_list[first_index] - the lowest value in the input_list

input_list[last_index] - the greatest value in the input_list

search_item - the value of the item that is to be searched


Here is an example of interpolation search in Python:

#!/usr/bin/env python3


def find_starting_point(sorted_iterable, first_index, last_index, search_item):
    """
    Find starting point for the interpolation search
    """
    return first_index + \
        (last_index - first_index) * \
        (search_item - sorted_iterable[first_index]) // \
        (sorted_iterable[last_index] - sorted_iterable[first_index])


def interpolation_search(sorted_iterable, search_item):
    """
    Interpolation search algorithm implemented iteratively
    Return the lowest index in `sorted_iterable` where `search_item` is found.
    Return -1 on failure.
    """
    left, right = 0, len(sorted_iterable) - 1

    while left <= right:
        start = find_starting_point(sorted_iterable, left, right, search_item)

        if start > right or start < left:
            return -1

        if sorted_iterable[start] == search_item:
            return start

        if sorted_iterable[start] < search_item:
            left = start + 1
        elif sorted_iterable[start] > search_item:
            right = start - 1
    return -1

def interpolation_search_contains(sorted_iterable, search_item):
    """
    Interpolation search algorithm implemented recursively
    Return True if `search_item` is found in `sorted_iterable`.
    Return False otherwise.
    """
    left, right = 0, len(sorted_iterable) - 1

    if left <= right:
        start = find_starting_point(sorted_iterable, left, right, search_item)

        if start > right or start < left:
            return False

        if sorted_iterable[start] == search_item:
            return True

        if sorted_iterable[start] < search_item:
            return interpolation_search_contains(sorted_iterable[start+1:], search_item)
        elif sorted_iterable[start] > search_item:
            return interpolation_search_contains(sorted_iterable[:start], search_item)
    return False

def _test():
    assert interpolation_search(sorted([44, 60, 75, 100, 120, 230, 250]), 120) == 4
    assert interpolation_search(sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]), 8) == 7
    assert interpolation_search(sorted([8, 2, 6]), 2) == 0
    assert interpolation_search(sorted([8, 2, 6, 1]), 2) == 1
    assert interpolation_search(sorted((123, 456, 989)), 255) == -1

    assert interpolation_search_contains(sorted([8, 2, 6]), 2) == True
    assert interpolation_search_contains(sorted([8, 2, 6, 1]), 2) == True
    assert interpolation_search_contains(sorted((123, 456, 989)), 255) == False

if __name__ == '__main__':
    _test()

5. Exponential search algorithm

Also called doubling or galloping search or Struzik search, the exponential search algorithm is suitable for searching sorted, unbounded/infinite lists. This algorithm comes in handy for finding the range where the search key may be present. 

The name of this algorithm refers to the way it searches for an element – an exponential search involves two fundamental steps. First of all, it finds the range where the element is present, and second, it performs binary search in the found range to find the target for the key value. 

 

Here’s how it works:

In the first stage, assuming that the list is sorted in ascending order, the algorithm looks for the first exponent, j, where the value 2j is greater than the search key. This value, 2j becomes the upper bound for the binary search, 2j - 1 being the lower bound for the binary search.

Exponential search is particularly useful if you’re dealing with unbounded searches where the size of the array is infinite. It works much better than binary search for bounded arrays or when the element that needs to be searched for is closer to the first element. 

 

Here is an example of an exponential search algorithm implemented in Python:

#!/usr/bin/env python3

from binary_search import binary_search

def exponential_search(sorted_iterable, search_item):
    """
    Exponential search algorithm
    Return the lowest index in `sorted_iterable` where `search_item` is found.
    Return -1 on failure.
    """
    n = len(sorted_iterable)
    i = 1
    # determine a range in which a `search_item` would reside
    while i < n and sorted_iterable[i] < search_item:
        i *= 2

    # perform binary search on the found range
    return binary_search(sorted_iterable, search_item, i / 2, min(i + 1, n))


def _test():
    assert exponential_search(sorted([0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]), 55) == 10
    assert exponential_search(sorted([8, 2, 6]), 2) == 0
    assert exponential_search(sorted([8, 2, 6, 1]), 2) == 1
    assert exponential_search(sorted('zxbcdefg'), 'g') == 5
    assert exponential_search(sorted('zxbcdefg'), 'y') == -1
    assert exponential_search(sorted((123, 456, 989)), 255) == -1

if __name__ == '__main__':
    _test()

6. Fibonacci search algorithm

In the Fibonacci searching algorithm, a sorted array uses a divide and conquer algorithm to narrow down possible locations with the help of Fibonacci numbers. 

Let’s compare the Fibonacci search to the binary search algorithm. In binary search, the sorted array is divided into two equal-size parts, and only one is examined further. In Fibonacci search, on the other hand, the array is divided into two parts that have sizes of consecutive Fibonacci numbers. 

Another difference with the binary search is that the Fibonacci search doesn’t use a division operator to divide the range, but + and -. The division operator may be costly on some CPUs.

Fibonacci search examines relatively closer elements in subsequent steps. So when the input array is big and cannot fit in the CPU cache or even in RAM, Fibonacci Search can be useful.

 

Here is an example of the Fibonacci search search algorithm implemented in Python:

#!/usr/bin/env python3

def fibonacci_search(sorted_iterable, search_item):
    """
    Fibonacci search algorithm
    Return the lowest index in `sorted_iterable` where `search_item` is found.
    Return -1 on failure.
    """
    n = len(sorted_iterable)
    start = -1
    # init the first three Fibonacci numbers
    fibM_prev_2 = 0
    fibM_prev_1 = 1
    fibM = 1
    # find the smallest Fibonacci number greater than or equal to the size of the input list
    while(fibM < n):
        fibM_prev_2 = fibM_prev_1
        fibM_prev_1 = fibM
        fibM = fibM_prev_1 + fibM_prev_2

    while(fibM > 1):
        index = min(start + fibM_prev_2, n - 1)
        if sorted_iterable[index] < search_item:
            # move the values one step down in the Fibonacci sequence
            # and reset the index to the index of the element
            fibM = fibM_prev_1
            fibM_prev_1 = fibM_prev_2
            fibM_prev_2 = fibM - fibM_prev_1
            start = index
        elif sorted_iterable[index] > search_item:
            # move the values two steps down in the Fibonacci sequence
            fibM = fibM_prev_2
            fibM_prev_1 = fibM_prev_1 - fibM_prev_2
            fibM_prev_2 = fibM - fibM_prev_1
        else:
            return index
    if fibM_prev_1 and (sorted_iterable[n - 1] == search_item):
        return n - 1
    return -1


def _test():
    assert fibonacci_search(sorted([1, 2, 3, 4, 5, 6, 7]), 6) == 5
    assert fibonacci_search(sorted([0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]), 55) == 10
    assert fibonacci_search(sorted([8, 2, 6]), 2) == 0
    assert fibonacci_search(sorted([8, 2, 6, 1]), 2) == 1
    assert fibonacci_search(sorted('zxbcdefg'), 'g') == 5
    assert fibonacci_search(sorted('zxbcdefg'), 'y') == -1
    assert fibonacci_search(sorted((123, 456, 989)), 255) == -1

if __name__ == '__main__':
    _test()

And the best searching algorithm is...?

This question is hard to answer because it all depends on your use case! But knowing what you can choose from is the first step to building a solid search functionality that finds the target as fast as possible.

We hope that this review of searching algorithms helps you appreciate the sheer variety of algorithms available for this particular function. 

Are you using any of the algorithms we mentioned in your project? Which ones are you planning to use in the future? Give us a shout-out in the comments section to share your expertise with junior developers and data scientists looking to learn more about searching algorithms.

Rated: 5.0 / 1 opinions
Robert Wojciechowicz d599e5e8f3

Robert Wojciechowicz

Software Developer. Passionate about creating readable and testable code. Python enthusiast since version 2.0.

Our mission is to accelerate your growth through technology

Contact us

Codete Przystalski Olechowski Śmiałek
Spółka Komandytowa

Na Zjeździe 11
30-527 Kraków

NIP (VAT-ID): PL6762460401
REGON: 122745429
KRS: 0000696869

Offices
  • Kraków

    Na Zjeździe 11
    30-527 Kraków

  • Lublin

    Wojciechowska 7E
    20-704 Lublin

  • Berlin

    Wattstraße 11
    13355 Berlin

Copyright 2022 Codete