Data structures for effective Python applications

Because computers rely on data to execute instructions, computing will always include data interaction. The amount of data can be overwhelming in real world applications, so developers must consistently devise methods to access it quickly and efficiently in a programmatic way.

A solid understanding of data structures is a great advantage for teams that specialize in developing tools and systems. Organizing data optimally maximizes efficiency and makes data processing easy and seamless. In this tutorial, you will learn about data structures in Python, and how you can use them to build efficient and highly performant applications. You will also learn how to automate tests for Python applications using continuous integration.

Prerequisites

The following items are required to complete this tutorial:

Python 3+ installed on your system
A CircleCI account
A GitHub account and knowledge of using Git and GitHub
An HTTP client, such as Postman or Insomnia
Knowledge of Python and Flask
Understanding of how APIs work

What are data structures?

A data structure is a method of organizing and managing data in memory so that operations on that data can be performed efficiently. There are best practices for building data structures using different data types and for defining variables that hold data.

Why do you need Python data structures?

As systems grow more complex, so does the data they handle. Working with large volumes of data often leads to problems such as slow processing, inefficient searching and sorting, and difficulty handling many user requests at once. These issues directly affect performance, so they must be addressed for any system to run efficiently.

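The data structures you choose determine how efficiently a program or system operates. For example, searching an array (in Python, a list) sequentially means checking items one at a time, which becomes time-consuming and resource-intensive as the data grows. A hash table, which Python provides as the dictionary, resolves this by locating values directly by key, in constant time on average.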
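To make the difference concrete, here is a minimal sketch comparing a sequential search through a list with a dictionary lookup. The sample records and the helper name find_in_list are hypothetical, chosen only for illustration.

```python
# Sequential search: check each (id, name) pair until a match is found, O(n)
def find_in_list(records, user_id):
    for record_id, name in records:
        if record_id == user_id:
            return name
    return None


# Hypothetical sample data stored two ways
records_list = [(1, "John"), (2, "Mike"), (3, "Ann")]
records_dict = {1: "John", 2: "Mike", 3: "Ann"}

# The list must be scanned item by item
print(find_in_list(records_list, 3))  # Ann

# The dictionary hashes the key and goes straight to the value, O(1) on average
print(records_dict.get(3))  # Ann
```

Both calls return the same name, but the cost of the list scan grows with the number of records, while the average cost of the dictionary lookup stays roughly constant.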

Data abstraction hides the complex details of a data structure so that a client program does not need to know how it is implemented. This is accomplished through abstract data types (ADTs), which define the operations a data structure supports without exposing how those operations are carried out.
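As an illustration, here is a minimal sketch of a stack ADT. The Stack class and its method names are hypothetical choices for this example; client code calls push() and pop() without knowing that a Python list stores the items, so the internal list could be replaced without changing any caller.

```python
class Stack:
    """A last-in, first-out stack. Callers never touch the underlying list."""

    def __init__(self):
        self._items = []  # implementation detail hidden behind the methods below

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self):
        return not self._items

    def __len__(self):
        return len(self._items)


stack = Stack()
stack.push("a")
stack.push("b")
print(stack.pop())   # b
print(stack.peek())  # a
print(len(stack))    # 1
```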

Data structures overview in Python

Python provides built-in data structures such as lists, dictionaries, sets, and tuples. Python users can create their own data structures and ultimately control how they are implemented. Stacks, queues, trees, linked lists, and graphs are examples of user-defined data structures.
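For instance, a queue is often built on collections.deque from the standard library, because removing an item from the front of a plain list takes O(n) time, while deque.popleft() takes O(1). This is a minimal sketch; the request names are placeholders.

```python
from collections import deque

# A first-in, first-out queue built on collections.deque
queue = deque()
queue.append("request-1")  # enqueue at the right end
queue.append("request-2")

first = queue.popleft()    # dequeue from the left end
print(first)               # request-1
print(list(queue))         # ['request-2']
```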

This tutorial will focus on lists and dictionaries and how developers can use them for optimizing data storage and retrieval within an application.

The focus of the next section is on using optimized operations of lists and dictionaries to store, process, and retrieve data from the data structures.

Lists

A list is an ordered collection of elements. Because lists are mutable, their values can change. Items are the values contained within a list.

Note: The type of Python data structure determines its mutability. Mutable objects can change their state or content, while immutable objects cannot.
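For example, this short sketch contrasts a list, which is mutable, with a tuple, which is immutable. The variable names are illustrative only.

```python
# Lists are mutable: an item can be replaced in place
colors_list = ["red", "green"]
colors_list[0] = "blue"
print(colors_list)  # ['blue', 'green']

# Tuples are immutable: item assignment raises a TypeError
colors_tuple = ("red", "green")
try:
    colors_tuple[0] = "blue"
except TypeError as error:
    print(f"TypeError: {error}")
```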

You denote Python lists by using square brackets. Here is an example of an empty list:

categories = []

A comma (,) is used to separate items in a list:

categories = ["science", "math", "physics", "religion"]

Lists can also contain items that are lists:

scores = [[23, 45, 60], [67, 69, 90]]

You access elements in a list by their index, which is simply the item's position in the list. Indexes start at 0. Here is an example of how to access various items in a list:

categories = ["science", "math", "physics", "religion"]

categories[0]  # "science"
categories[1]  # "math"
categories[2]  # "physics"

You can also access items starting at the end of a list using the negative index. For example, to get to the last item in the preceding list:

categories[-1]  # "religion"
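Indexing also works on nested lists. This sketch reuses the categories and scores lists from above; chaining two indexes reaches an item inside an inner list.

```python
categories = ["science", "math", "physics", "religion"]
scores = [[23, 45, 60], [67, 69, 90]]

print(categories[-2])  # physics

# For a list of lists, the first index selects the inner list
# and the second index selects an item inside it
print(scores[1])       # [67, 69, 90]
print(scores[1][2])    # 90
print(scores[-1][-1])  # 90
```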

You can add, delete, and modify items in a list because lists are mutable.

To change the value of an item in a list, reference the item’s position and then use the assignment operator:

categories[0] = "geography"  # modifies the list, replacing "science" with "geography"

To add new items to a list, use the append() method, which adds items to the end of a list:

categories.append("linguistics")

Another method you can use on a list is insert(), which adds an item at a specified index, shifting later items to the right. Other commonly used list operations include the pop(), remove(), clear(), and sort() methods, as well as the del statement.
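Here is a short sketch of these operations on the categories list. Note that insert() takes the target index as its first argument, and that del is a Python statement rather than a list method.

```python
categories = ["science", "math", "physics", "religion"]

categories.insert(1, "history")  # insert at index 1, shifting later items right
print(categories)  # ['science', 'history', 'math', 'physics', 'religion']

last = categories.pop()          # remove and return the last item
print(last)        # religion

del categories[0]                # del is a statement, not a method
print(categories)  # ['history', 'math', 'physics']

categories.sort(reverse=True)    # sort in place, here in reverse alphabetical order
print(categories)  # ['physics', 'math', 'history']

categories.clear()               # remove every item
print(categories)  # []
```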

Dictionaries

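A dictionary is a built-in Python data type that stores data as key-value pairs. Unlike lists, which are indexed by position, dictionaries are indexed by keys, which can be strings, numbers, or tuples. In general, a dictionary key can be of any immutable type; strictly speaking, keys must be hashable, so a tuple can be a key only if all of its elements are immutable.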

A dictionary’s keys must be distinct. Curly brackets {} are used to denote dictionaries.
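The following sketch shows which values can serve as keys. The grades dictionary is a hypothetical example: strings, numbers, and tuples of immutable values all work, while a list, being mutable, is rejected with a TypeError.

```python
# Strings, numbers, and tuples of immutable values all work as keys
grades = {"math": "A", 7: "seven", (2024, 1): "first term"}
print(grades[(2024, 1)])  # first term

# A list is mutable (unhashable), so it cannot be used as a key
try:
    grades[["math", "physics"]] = "B"
except TypeError as error:
    print(error)
```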

Keys make it simple to work with dictionaries. Values can be of any type, including lists or even other dictionaries, and you use the keys to access, update, delete, and perform other operations on them. One important thing to remember is that storing data under a key that already exists overwrites the value previously associated with that key.

Here is an example using dictionaries:

student = {"name": "Mike", "age": 24, "grade": "A"}

To access items inside the dictionary:

student["name"]  # "Mike"

Adding data to a dictionary is as simple as Dict[key] = value:

student["subjects"] = 7
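The following sketch, which reuses the student dictionary, shows both cases: assigning to a new key adds a pair, while assigning to an existing key overwrites its value. It also shows that reading a key that does not exist, using square brackets, raises a KeyError.

```python
student = {"name": "Mike", "age": 24, "grade": "A"}

student["subjects"] = 7  # new key: a pair is added
student["grade"] = "B"   # existing key: the old value "A" is overwritten

print(student)  # {'name': 'Mike', 'age': 24, 'grade': 'B', 'subjects': 7}

# Accessing a missing key with square brackets raises a KeyError
try:
    student["email"]
except KeyError as error:
    print(f"Missing key: {error}")  # Missing key: 'email'
```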

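Commonly used dictionary operations include the built-in len() function, which returns the number of key-value pairs, and the pop() and popitem() methods, which remove items from the dictionary.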

These commonly used methods return data from a dictionary:

dict.items()     # returns a view of the dictionary's key-value pairs as tuples
dict.keys()      # returns a view of the dictionary's keys
dict.get(key)    # returns the value for the specified key, or None if the key cannot be found
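Here is how these methods behave on the student dictionary. Because keys() and items() return view objects, the sketch wraps them in list() to print them. The get() method also accepts an optional second argument to return instead of None.

```python
student = {"name": "Mike", "age": 24, "grade": "A"}

print(list(student.keys()))   # ['name', 'age', 'grade']
print(list(student.items()))  # [('name', 'Mike'), ('age', 24), ('grade', 'A')]

print(student.get("name"))          # Mike
print(student.get("email"))         # None
print(student.get("email", "n/a"))  # n/a (optional default value)

# items() is handy for looping over keys and values together
for key, value in student.items():
    print(f"{key}: {value}")
```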

Here is a diagram that shows the different types of Python data structures both built-in and user-defined.

Python data structures

In the next section of this tutorial, you will use what you have learned so far to create a simple API that will allow you to store, manipulate, and retrieve data from data structures.

API data flow diagram and storage

Now that you know what lists and dictionaries are, you can use them to create API endpoints, including login functionality, that store data only in data structures. The following diagram shows how data will flow through the application and where the data structures are used in your API.

Python API data flows

This API diagram shows a data store that is a Python dictionary. It is initialized with sample user data, which allows you to fully explore the capabilities of the data structures. Steps are labeled 1 through 4 to show the flow of data through the API.

Step one lets users create accounts by entering their first name, last name, username, and date of birth. These details are saved in the users dictionary.

Step two retrieves all of the users in the system. Before sending back the data, it is transformed from a nested dictionary to a sorted list.

Step three authenticates a user with an ID and a username.

Step four is where a data structure does the actual processing before sending back a response to the client.

Now that you know how your API will work, you can put the data structures into an actual application.

Implementing an API with data structures

Implementing with data structures consists of these steps:

Setting up an API skeleton
Initializing users
Creating users
Retrieving users

Setting up an API skeleton

To proceed with this tutorial, I encourage you to clone the application. That way you can go through the application and understand parts that are not fully documented as part of the tutorial.

git clone https://github.com/CIRCLECI-GWP/data-structure-python-applications

cd data-structure-python-applications

To install Python dependencies, you will need to set up a virtual environment using these commands:

Windows OS

py -3 -m venv venv

venv\Scripts\activate

Linux/macOS

python3 -m venv venv

source venv/bin/activate

Install the requirements from the requirements.txt file:

pip install -r requirements.txt

To start the API, run:

python main.py

Excellent job setting up and starting the API skeleton! The next step is to add routes that use dictionaries and lists to handle user creation, retrieval, and authentication.

Initializing users

Because your application state lasts only while the server is running, create a users dictionary initialized with sample data. To do this, add the data manually to a users dictionary in the main.py file, just after the Flask app configuration.

This is how the modified dictionary should look:

# main.py

users = {
   1: {"fname": "John", "lname": "Doe", "username": "John96", "dob": "08/12/2000"},
   2: {
       "fname": "Mike",
       "lname": "Spencer",
       "username": "miker5",
       "dob": "01/08/2004",
   },
}

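Because this dictionary is defined in main.py, it is recreated with the same sample data every time the server starts, so you always have records to refer to as you test your endpoints or create new application data. Any users you add while the server is running are lost when it stops.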

Creating users

With your users dictionary initialized, write a create_user function that adds new users. User details will arrive in the body of a POST request, so use Flask's request object to read them.

Call the request object's get_json() method, data = request.get_json(), to parse the incoming JSON request data and store it in a variable. No system should allow duplicate records, and your API is no exception. Therefore, when creating a new user, make sure that the new user's ID does not match any existing record. If it does, notify the client and stop processing the request. Copy this snippet and paste it into the main.py file:

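# main.py
# Assumes app, request, and jsonify are already available in main.py
@app.route("/user", methods=["POST"])
def create_user():
    data = request.get_json()

    # Reject the request if a user with this ID is already stored
    if data["id"] in users:
        return jsonify({"message": "user already exists"}), 401

    # Store the new user's details under their unique ID (a nested dictionary)
    users[data["id"]] = {
        "fname": data["fname"],
        "lname": data["lname"],
        "username": data["username"],
        "dob": data["dob"],
    }

    return jsonify({"message": "user created"}), 201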

This code first checks whether the user's ID is already a key in the users dictionary. Checking for a duplicate ID keeps this tutorial simple; in production, you would more likely check for a unique field such as the user's email address.

If that check passes, the new user information is entered into the dictionary, using the unique user id as a key. When a dictionary is stored against a user id this pattern results in a nested dictionary.

Flask includes a function called jsonify that allows you to serialize data to JSON format, which you will use to format the message that is sent back to the client.

Retrieving users

Fetching users could be as simple as returning the users dictionary, but there is a better approach. Instead, why not return all the users in descending order, with the most recently created user at the top?

Dictionaries have no sort() method, so you cannot sort a dictionary in place. Instead, copy the users into a list and sort the list, as in this snippet:

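# main.py
@app.route("/users", methods=["GET"])
def get_users():
    all_users = []

    for key, user in users.items():
        # Copy each record and attach its ID, leaving the stored data unchanged
        all_users.append({**user, "id": key})

    # Sort by ID, highest first, so the most recently added user is at the top
    all_users.sort(key=lambda user: user["id"], reverse=True)

    # Return the sorted list, not the original dictionary
    return jsonify(all_users), 200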

This code creates an empty list, then loops through the users dictionary and adds each user's record to the list, including the user's ID taken from the dictionary key. Because clients need a way to tell users apart, including the ID in each record is a good idea.

After the loop, you have a list of dictionaries. Python cannot compare one dictionary with another, so calling sort() on this list without a key raises a TypeError. Passing a lambda function as the key argument tells sort() to compare users by their id values, and reverse=True puts the highest ID first. The result is a list of dictionaries sorted in descending order by user ID.
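The following standalone sketch reproduces the sorting step outside Flask, using a trimmed-down, hypothetical copy of the users data. It shows the TypeError raised when sorting dictionaries directly and the fix of passing a key function. It uses sorted(), which returns a new list, whereas list.sort() sorts in place.

```python
users = {
    1: {"fname": "John", "username": "John96"},
    2: {"fname": "Mike", "username": "miker5"},
    4: {"fname": "James", "username": "Maxy"},
}

# Build a list of copies, each tagged with its ID, without mutating users
all_users = [{**user, "id": user_id} for user_id, user in users.items()]

# Dictionaries cannot be compared with <, so sorting without a key fails
try:
    sorted(all_users)
except TypeError as error:
    print(f"TypeError: {error}")

# A key function tells sorted() which value to compare
newest_first = sorted(all_users, key=lambda user: user["id"], reverse=True)
print([user["id"] for user in newest_first])  # [4, 2, 1]
```

The standard library's operator.itemgetter("id") is an equivalent key function if you prefer to avoid a lambda.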

Finally, after creating users and retrieving them in order, add authentication with a /users/login endpoint:

# main.py
@app.route("/users/login", methods=["POST"])
def login_user():
    data = request.get_json()

    user_id = data["id"]
    username = data["username"]

    # The ID must exist before its stored username can be compared
    if user_id in users and users[user_id]["username"] == username:
        return jsonify(f"Welcome, you are logged in as {username}"), 200

    return jsonify("Invalid login credentials"), 401

The function first checks that the submitted ID exists in the users dictionary. Only if it does is the stored username compared with the submitted one. With valid credentials, the endpoint returns a welcome message that includes the username; otherwise, it returns a 401 response with a message saying the login attempt failed.
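The same check can be written with dict.get(), which returns None for a missing key instead of raising a KeyError. This is a minimal sketch: the helper name is_valid_login is hypothetical and not part of the tutorial's main.py, and it uses the sample users data from earlier.

```python
users = {
    1: {"fname": "John", "lname": "Doe", "username": "John96", "dob": "08/12/2000"},
    2: {"fname": "Mike", "lname": "Spencer", "username": "miker5", "dob": "01/08/2004"},
}


def is_valid_login(users, user_id, username):
    """Return True only if user_id exists and its stored username matches."""
    user = users.get(user_id)  # None when the ID is not a key in users
    return user is not None and user["username"] == username


print(is_valid_login(users, 1, "John96"))  # True
print(is_valid_login(users, 1, "miker5"))  # False
print(is_valid_login(users, 9, "John96"))  # False
```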

Begin testing the three endpoints that you just created: create a user, log them in, and retrieve all users added. If anything goes wrong you can always refer to the main.py file located in the cloned repository.

API call to create a user

Creating a user

API call to log in a user

User login

API call to retrieve all users

Retrieving all users

These responses confirm that the API works as expected, even though all of its data is stored in lists and dictionaries.

Writing tests for your API

It may seem tedious and time-consuming, but adding tests to an application is never wasted effort. This section of the tutorial includes tests for user creation, multiple user creation, login, and user retrieval for the API endpoints that you just created. I will guide you through testing your endpoints using pytest, a Python testing framework. The first test you will write is for creating a user:

# test_app.py
def test_create_user(client):

    response = client.post(
        "/user",
        json={
            "id": 4,
            "fname": "James",
            "lname": "Max",
            "username": "Maxy",
            "dob": "08/12/2000",
        },
    )

    assert response.headers["Content-Type"] == "application/json"
    assert response.status_code == 201

The code in this snippet creates a new user with the id of 4, first name James, and the last name Max. It then asserts that the response’s content type is JSON and that the status code is 201 for a created resource.

Next, create a test to verify that the API can fetch users:


def test_fetch_users(client):

    response = client.get("/users")

    assert response.headers["Content-Type"] == "application/json"
    assert response.status_code == 200

This test verifies that the endpoint returns a JSON response with a status code of 200 for a successful request. These two tests are just a start; you can find more in the test_app.py file in the root directory of the repository. Run your tests by executing pytest from the command line.

Successful PyTest execution

Passing tests confirm that the endpoints built on Python data structures return the expected status codes and JSON responses, just as they would if the data were stored in a database.

Now that your tests pass locally, integrate them with your continuous integration (CI) environment to ensure that changes deployed to your GitHub repository do not break the application. For this part of the tutorial, we will use CircleCI as the CI environment.

Integrating with CircleCI

To add CircleCI configuration to your project, create a new directory in the root of your project folder named .circleci. In that directory, create a file named config.yml. Add this configuration to the .circleci/config.yml file:

version: 2.1
orbs:
  python: circleci/python@2.1.1
jobs:
  build-and-test:
    docker:
      - image: cimg/python:3.12.1
    steps:
      - checkout
      - python/install-packages:
          pkg-manager: pip
      - run:
          name: Run tests
          command: pytest
workflows:
  sample:
    jobs:
      - build-and-test

This CircleCI configuration is a simple example of how to configure CircleCI to run your tests. It specifies that you are using a Python Docker image and installs the Python packages using the pip package manager. It then runs your tests using the pytest command.

Commit all your changed files using Git and push your changes to an existing GitHub repository.

Setting up CircleCI

Now that you have code on the remote main GitHub branch, you can set up CircleCI to run your tests. Go to the CircleCI dashboard and select the Projects tab. Find your repository in the list. For this tutorial it is the data-structure-python-applications repository.

Project repository

Select the option to Set up Project. Because you already pushed your CircleCI configuration to the remote repository, you can just type the name of the branch containing the configuration and click Set up Project.

Configuring CircleCI

Relax as your tests execute in CircleCI.

Successful CI test execution

Your tests passed successfully which can only mean one thing: it is time to celebrate!

Conclusion

By following along with this tutorial, you have gained a solid understanding of Python data structures, why you need them, and specifically, how to use the list and dictionary data structures in Python. You have also learned to write API endpoints that store their data only in data structures. You wrote tests for your API endpoints to catch changes that break existing functionality. Finally, you integrated your project with CircleCI and watched CircleCI run your tests on the CI platform.

As always, I enjoyed creating this tutorial for you, and I hope you found it valuable. Until the next one, keep learning and keep building!

Waweru Mwaura is a software engineer and a life-long learner who specializes in quality engineering. He is an author at Packt and enjoys reading about engineering, finance, and technology. You can read more about him on his web profile.