Unlocking Efficiency with LinkedLists in Python

Harshit Singhai

April 12, 2023

Imagine this: You're facing a massive list of items, and your mission is to remove certain ones based on specific conditions. Your first instinct might be to reach for Python's lists, iterate through the elements, and remove them as needed. But what if I told you there's a more efficient way?

Let's dive into a scenario to illustrate this. Picture a function that's tasked with cleaning up a list of email addresses by removing any with invalid formats. Whether you use a list or a linked list, both approaches require iterating through each address and performing the validation check. That's the common ground. But the difference emerges when it comes to deletion.

Lists

Here's the catch with lists: each time you remove an invalid email, the remaining elements need to be shifted to fill the gap. This shifting process, often hidden behind the scenes, can significantly impact performance, especially for large lists.

Imagine a list of 1,000 email addresses with a 10% rate of invalid ones. That means about 100 deletions. While reading all addresses takes 1,000 steps, the deletion process can add up to 100,000 extra steps due to the shifting of elements!

LinkedLists

This is where LinkedLists step in to save the day. In a LinkedList, each element (or "node") holds a reference to the next one, forming a chain-like structure. The secret sauce? When you delete a node, you simply update the link to point to the appropriate next node, without any shifting involved.

In our email example, using a LinkedList would mean a total of approximately 1,100 steps: 1,000 for reading and only 100 for deletion. That's a significant difference!

Ready for Action

Let's get hands-on and explore code examples to witness this efficiency in action. We'll put both lists and LinkedLists to the test and measure their performance to see the real-world impact of choosing the right data structure for the job.

Setting up a list

Instead of email validator, we're going to generate random names. We will iterate through the generated names list and remove the name which starts with "Adam".

Create first_last_name.json file and the contents.

[
  { "first_name": "Sibel", "last_name": "McGrory" },
  { "first_name": "Reeba", "last_name": "Consterdine" },
  { "first_name": "Ford", "last_name": "Rushmer" },
  { "first_name": "Hildy", "last_name": "Pilsworth" },
  { "first_name": "Julissa", "last_name": "Jorn" },
  { "first_name": "Jason", "last_name": "Paish" },
  { "first_name": "Rorie", "last_name": "Tarbox" },
  { "first_name": "Goldie", "last_name": "Whitters" },
  { "first_name": "Sandy", "last_name": "Kennion" },
  { "first_name": "Sallyanne", "last_name": "Full" },
  { "first_name": "Haskell", "last_name": "Negro" },
  { "first_name": "Sharona", "last_name": "Eberst" },
  { "first_name": "Hayley", "last_name": "Caron" },
  { "first_name": "Gladi", "last_name": "Bullimore" },
  { "first_name": "Ernestus", "last_name": "Dallison" },
  { "first_name": "Nolana", "last_name": "Diplock" },
  { "first_name": "Heida", "last_name": "Suggey" },
  { "first_name": "Darin", "last_name": "Ouchterlony" },
  { "first_name": "Jania", "last_name": "Knappitt" },
  { "first_name": "Tomlin", "last_name": "Bispham" },
  { "first_name": "Chrissy", "last_name": "Everil" },
  { "first_name": "Ozzy", "last_name": "Carolan" },
  { "first_name": "Grantham", "last_name": "Hawyes" },
  { "first_name": "Filippo", "last_name": "Robertis" },
  { "first_name": "Hewe", "last_name": "Butson" },
  { "first_name": "Alden", "last_name": "Gager" },
  { "first_name": "Farlee", "last_name": "Siemianowicz" },
  { "first_name": "Gigi", "last_name": "McGeachy" },
  { "first_name": "Lynn", "last_name": "Crone" },
  { "first_name": "Tobey", "last_name": "Alflatt" },
  { "first_name": "Jerry", "last_name": "Sprankling" },
  { "first_name": "Major", "last_name": "Skeath" },
  { "first_name": "Clint", "last_name": "Hender" },
  { "first_name": "Fleurette", "last_name": "Halpeine" },
  { "first_name": "Rayshell", "last_name": "Pieter" },
  { "first_name": "Gaylord", "last_name": "Clendennen" },
  { "first_name": "Ronalda", "last_name": "Peoples" },
  { "first_name": "Ollie", "last_name": "Trainer" },
  { "first_name": "Opalina", "last_name": "Bootes" },
  { "first_name": "Evelyn", "last_name": "Cowlard" },
  { "first_name": "Nada", "last_name": "Wimpeney" },
  { "first_name": "Maisey", "last_name": "Brik" },
  { "first_name": "Gavra", "last_name": "Pennuzzi" },
  { "first_name": "Risa", "last_name": "Petrescu" },
  { "first_name": "Norton", "last_name": "Ortiga" },
  { "first_name": "Maryann", "last_name": "Hanscom" },
  { "first_name": "Kirsteni", "last_name": "Andreou" },
  { "first_name": "Hannie", "last_name": "Hurren" },
  { "first_name": "Tilda", "last_name": "Lumbers" },
  { "first_name": "Carlyn", "last_name": "Wilbud" }
]

Generate Random Names

from time import perf_counter
import random
import json

def get_dummy_data():
    with open("first_last_name.json", "r") as f:
        return json.load(f)


def generate_last_name():
    return random.choice(last_names)

data = get_dummy_data()
first_names = [x['first_name'] for x in data]
last_names = [x['last_name'] for x in data]


def generate_name():
    return "".join(random.choice(first_names)+" "+random.choice(last_names))

Once every 10 iteration we will explicitly insert a name starting with Adam

k = 5
generated_name = []
total_generated_name = 1000000
for i in range(total_generated_name):
    if i+1 == k:
        generated_name.append("Adam " + generate_last_name())
        k += 10
    generated_name.append(generate_name())

print('generated_name: ', len(generated_name))

Total number of items in the list.

generated_name:  1100000

Implementing LinkedList

Node class

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

LinkedList

class LinkedList:
    def __init__(self, nodes=None):
        self.head = None
        if nodes:
            node = Node(nodes[0])
            self.head = node
            for elem in nodes[1:]:
                new_node = Node(elem)
                node.next = new_node
                node = new_node

    def delete_node_starting_with_adam(self):
        curr = self.head
        prev = None
        while curr:
            if curr.data.startswith("Adam"):
                prev.next = curr.next
            else:
                prev = curr
            curr = curr.next

    def __iter__(self):
        current = self.head
        while current:
            yield current.data
            current = current.next

Testing with Simple Python List

start_time = perf_counter()
for name in generated_name:
    if name.startswith("Adam"):
        generated_name.remove(name)
stop_time = perf_counter()
print("Elapsed time  during the whole program in seconds:", stop_time-start_time)

Total time taken by Python List

Elapsed time  during the whole program in seconds: 1414.1675999180006

That is close to 24 min.

Testing with LinkedList

linked_list = LinkedList(generated_name)
t1_start = perf_counter()
linked_list.delete_node_starting_with_adam()
t1_stop = perf_counter()
print("Elapsed time during the whole program in seconds:", t1_stop-t1_start)

Time take by LinkedList

Elapsed time during the whole program in seconds: 0.1479311709990725

This is significantly less than 24 min took by the List.

Validate if all the Adam has been removed in Linkedlist.

for name in linked_list:
    if name.startswith("Adam"):
        print("Adam Exists", name)

Conclusion

Linked lists are an amazing data structure for moving through an entire list while making insertions or deletions, as we never have to worry about shifting other data as we make an insertion or deletion.

The operation with Python List took 24 min, whereas LinkedList took only 0.42 seconds. That's percentage drop of approximately 99.99%.

That’s it for today, see you soon. :)

Fictionally Irrelevant.