Data structures for effective Python applications
Software Engineer
Because computers rely on data to execute instructions, computing will always involve interacting with data. The amount of data can be overwhelming in real-world applications, so developers must continually devise ways to access it quickly and efficiently from code.
A solid understanding of data structures is a great advantage for teams that specialize in developing tools and systems. Organizing data optimally maximizes efficiency and makes data processing easy and seamless. In this tutorial, you will learn about data structures in Python, and how you can use them to build efficient and highly performant applications. You will also learn how to automate tests for Python applications using continuous integration.
Prerequisites
The following items are required to complete this tutorial:
- Python 3+ installed on your system
- A CircleCI account
- A GitHub account and knowledge of using Git and GitHub
- An HTTP client such as Postman or Insomnia
- Knowledge of Python and Flask
- Understanding of how APIs work
What are data structures?
A data structure is a method of organizing and managing data in memory so that operations on that data can be performed efficiently. There are best practices for building data structures from different data types and for defining the variables that hold data.
Why do you need Python data structures?
As system complexity grows, so does the data. Handling massive amounts of data often leads to problems: slow processing, inefficient searching and sorting, and difficulty serving many user requests at once. These issues are critical to performance, and any system must address them to run efficiently.
The choice of data structure largely determines how efficiently a program or system operates. Searching an array sequentially is time-consuming and resource-intensive, because in the worst case every element must be checked. A hash table avoids this by locating items directly by key, in constant time on average.
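To make the difference concrete, here is a small sketch that finds a record by id twice: once with a sequential search over a list, and once with a dictionary, which is Python's built-in hash table. The record layout and the function names find_in_list and find_in_dict are hypothetical, chosen only for illustration.

```python
def find_in_list(records, user_id):
    """Sequential search: inspect each record until the id matches (O(n))."""
    for record in records:
        if record["id"] == user_id:
            return record
    return None


def find_in_dict(index, user_id):
    """Hash-table lookup: go straight to the entry for the key (O(1) on average)."""
    return index.get(user_id)


# Hypothetical sample data: 100,000 user records.
records = [{"id": i, "username": f"user{i}"} for i in range(100_000)]
# The same records, indexed by id in a dictionary.
index = {record["id"]: record for record in records}

print(find_in_list(records, 99_999))  # checks all 100,000 records to reach the last one
print(find_in_dict(index, 99_999))    # a single hashed lookup
```

Both calls return the same record, but the list version has to walk the entire list to reach the last entry, while the dictionary hashes the key and jumps directly to it.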
Data abstraction hides the complex details of a data structure so that a client program does not need to know about its implementation. This is accomplished through abstract data types (ADTs), which define the operations a structure supports without exposing how those operations are carried out.
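As an illustration, the hypothetical Stack class below is an abstract data type: client code works only with push(), pop(), peek(), and is_empty(), while the Python list that actually stores the items stays behind those methods (hidden by the leading-underscore naming convention, which Python does not enforce).

```python
class Stack:
    """A last-in, first-out (LIFO) stack abstract data type.

    Callers use push(), pop(), peek(), and is_empty() and never touch
    the underlying list directly.
    """

    def __init__(self):
        self._items = []  # implementation detail hidden behind the methods

    def push(self, item):
        """Add an item to the top of the stack."""
        self._items.append(item)

    def pop(self):
        """Remove and return the top item."""
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        """Return the top item without removing it."""
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def is_empty(self):
        return not self._items


history = Stack()
history.push("page1")
history.push("page2")
print(history.pop())   # page2
print(history.peek())  # page1
```

Because callers never index into _items themselves, the storage could later be swapped, for example for collections.deque, without changing any code that uses the stack.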
Data structures overview in Python
Python provides built-in data structures such as lists, dictionaries, sets, and tuples. Python users can create their own data structures and ultimately control how they are implemented. Stacks, queues, trees, linked lists, and graphs are examples of user-defined data structures.
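For example, a minimal singly linked list (one of the user-defined structures just mentioned) can be sketched with two small classes. The Node and LinkedList names and the methods shown are illustrative choices, not a standard library API.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass
class Node:
    """One element of the list: a value plus a link to the next node."""
    value: Any
    next: Optional["Node"] = None


class LinkedList:
    """A minimal singly linked list built from Node objects."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None

    def prepend(self, value: Any) -> None:
        # O(1): the new node points at the old head and becomes the head.
        self.head = Node(value, self.head)

    def append(self, value: Any) -> None:
        # O(n): walk to the last node, then link the new node after it.
        new_node = Node(value)
        if self.head is None:
            self.head = new_node
            return
        current = self.head
        while current.next is not None:
            current = current.next
        current.next = new_node

    def __iter__(self) -> Iterator[Any]:
        # Follow the links from the head, yielding each value in order.
        current = self.head
        while current is not None:
            yield current.value
            current = current.next


names = LinkedList()
names.append("miker5")
names.append("Maxy")
names.prepend("John96")
print(list(names))  # ['John96', 'miker5', 'Maxy']
```

Prepending takes constant time because only the head changes, while appending walks the whole chain. Python's built-in list makes the opposite trade-off: appending to the end is fast, but inserting at the front shifts every element.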
This tutorial will focus on lists and dictionaries and how developers can use them for optimizing data storage and retrieval within an application.
The next section covers the list and dictionary operations you will use to store, process, and retrieve data.
Lists
A list is an ordered collection of elements. Because lists are mutable, their values can change. Items are the values contained within a list.
Note: The type of Python data structure determines its mutability. Mutable objects can change their state or content, while immutable objects cannot.
You denote Python lists by using square brackets. Here is an example of an empty list:
categories = [ ]
A comma (,) separates items in a list:
categories = ["science", "math", "physics", "religion"]
Lists can also contain items that are lists:
scores = [ [23, 45, 60] , [67, 69, 90] ]
Elements in a list are accessed by their index, which is simply the position of a value in the list, counting from 0. Here is an example of how to access various items in a list:
categories = ["science", "math", "physics", "religion"]
categories[0]  # "science"
categories[1]  # "math"
categories[2]  # "physics"
You can also access items starting at the end of a list using the negative index. For example, to get to the last item in the preceding list:
categories[-1]  # "religion"
You can add, delete, and modify items in a list because lists are mutable.
To change the value of an item in a list, reference the item’s position and then use the assignment operator:
categories[0] = "geography"  # modifies the list, replacing "science" with "geography"
To add new items to a list, use the append() method, which adds items to the end of a list:
categories.append("linguistics")
Another list method is insert(), which adds an item at a specified index. Other common list operations include the del statement and the pop(), clear(), and sort() methods.
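Continuing with the categories list from the previous examples (after "science" was replaced with "geography" and "linguistics" was appended), the sketch below shows insert(), pop(), the del statement, sort(), and clear() in action. The inserted value "art" is just sample data.

```python
# The categories list as modified by the examples above.
categories = ["geography", "math", "physics", "religion", "linguistics"]

categories.insert(1, "art")  # insert "art" at index 1
print(categories)  # ['geography', 'art', 'math', 'physics', 'religion', 'linguistics']

removed = categories.pop()  # remove and return the last item
print(removed)  # linguistics

del categories[2]  # del is a statement, not a method: removes the item at index 2 ("math")
print(categories)  # ['geography', 'art', 'physics', 'religion']

categories.sort()  # sort in place; strings sort alphabetically
print(categories)  # ['art', 'geography', 'physics', 'religion']

categories.clear()  # remove every item
print(categories)  # []
```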
Dictionaries
A dictionary is a built-in Python data type that stores a collection of key-value pairs. Unlike lists, which are indexed by position, dictionaries are indexed by keys, which can be strings, numbers, or tuples. In general, a dictionary key can be of any immutable (more precisely, hashable) type.
A dictionary’s keys must be distinct. Curly brackets, {}, are used to denote dictionaries.
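The sketch below shows which objects can serve as keys; the locations dictionary and its values are made up for illustration. Strings, integers, and tuples of immutable values are hashable and work as keys, while a list is mutable and unhashable, so using one as a key raises a TypeError.

```python
# Hypothetical example: different hashable key types in one dictionary.
locations = {
    "home": "Nairobi",             # string key
    42: "answer",                  # integer key
    (51.5074, -0.1278): "London",  # tuple key
}
print(locations[(51.5074, -0.1278)])  # London

# Lists are mutable and unhashable, so they cannot be used as keys.
list_key_rejected = False
try:
    locations[[51.5074, -0.1278]] = "London"
except TypeError as error:
    list_key_rejected = True
    print(f"TypeError: {error}")
```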
Keys make dictionaries easy to work with: you use them to access, delete, and perform other operations on entries, and the values stored under them can be of any type, including lists or even other dictionaries. One important thing to remember is that storing a value under a key that already exists overwrites the value previously associated with that key.
Here is an example using dictionaries:
student = {"name": "Mike", "age": 24, "grade": "A"}
To access items inside the dictionary:
student["name"]  # "Mike"
Adding data to a dictionary is as simple as dictionary[key] = value:
student["subjects"] = 7
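The following sketch continues the student example to show two behaviors described earlier: assigning to an existing key replaces its value, and values can themselves be lists or dictionaries. The grade change and the scores and guardian entries are made up for illustration.

```python
student = {"name": "Mike", "age": 24, "grade": "A"}
student["subjects"] = 7  # new key: adds an entry

student["grade"] = "B"   # existing key: the old value "A" is overwritten
print(student["grade"])  # B

# Values can be any type, including lists and other dictionaries.
student["scores"] = [67, 69, 90]
student["guardian"] = {"name": "Jane", "phone": "555-0100"}
print(student["scores"][-1])        # 90
print(student["guardian"]["name"])  # Jane
print(len(student))                 # 6
```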
Useful dictionary operations include the built-in len() function, which returns the number of key-value pairs, and the pop() and popitem() methods, which remove entries.
These commonly used methods return data from a dictionary:
dict.items()  # returns a view of the key-value pairs, each as a (key, value) tuple
dict.keys()  # returns a view of the dictionary's keys
dict.get(key)  # returns the value for the specified key, or None if the key cannot be found
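Here is a short sketch of these methods, together with len(), pop(), and popitem(), applied to the student dictionary from earlier. Note that popitem() removes the most recently inserted pair, an ordering that is guaranteed since Python 3.7.

```python
student = {"name": "Mike", "age": 24, "grade": "A", "subjects": 7}

print(len(student))              # 4 (len() is a built-in function, not a method)
print(list(student.keys()))      # ['name', 'age', 'grade', 'subjects']
print(list(student.items())[0])  # ('name', 'Mike')

print(student.get("email"))         # None: the key is missing, but no KeyError is raised
print(student.get("email", "n/a"))  # n/a: get() also accepts a default value

subjects = student.pop("subjects")  # remove a key and return its value
print(subjects)                     # 7

last_pair = student.popitem()  # remove and return the most recently inserted pair
print(last_pair)               # ('grade', 'A')
print(student)                 # {'name': 'Mike', 'age': 24}
```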
Here is a diagram that shows the different types of Python data structures both built-in and user-defined.
In the next section of this tutorial, you will use what you have learned so far to create a simple API that will allow you to store, manipulate, and retrieve data from data structures.
API data flow diagram and storage
Now that you know what lists and dictionaries are, you can use them to create API endpoints, including login functionality, that store data only in these data structures. First, look at how data will flow through the application and how the API will use the data structures.
This API diagram shows a data store that is a Python dictionary. It is initialized with sample user data, which lets you fully explore the capabilities of the data structures. Steps are labeled 1 through 4 to show the flow of data through the API.
Step one lets users create accounts by entering their first name, last name, username, and date of birth. These details are saved in the users dictionary.
Step two retrieves all of the users in the system. Before sending back the data, it is transformed from a nested dictionary to a sorted list.
Step three authenticates a user with an id and a username.
Step four is where a data structure does the actual processing before sending back a response to the client.
Now that you know how your API will work, you can put the data structures into an actual application.
Implementing an API with data structures
Implementing the API with data structures consists of these steps:
- Setting up an API skeleton
- Initializing users
- Creating users
- Retrieving users
Setting up an API skeleton
To proceed with this tutorial, I encourage you to clone the application. That way you can go through the application and understand parts that are not fully documented as part of the tutorial.
git clone https://github.com/CIRCLECI-GWP/data-structure-python-applications
cd data-structure-python-applications
To install Python dependencies, you will need to set up a virtual environment using these commands:
Windows OS
py -3 -m venv venv
venv\Scripts\activate
Linux/macOS
python3 -m venv venv
source venv/bin/activate
Install the requirements from the requirements.txt file:
pip install -r requirements.txt
To start the API, run:
python main.py
Excellent job setting up and starting the API skeleton! The next step is to add routes that use dictionaries and lists to create, retrieve, and authenticate users.
Initializing users
Because your application state lasts only while the server is running, create a users dictionary that is initialized with sample data. Add it to the main.py file, just after the Flask app configuration.
This is how the modified dictionary should look:
# main.py
users = {
    1: {"fname": "John", "lname": "Doe", "username": "John96", "dob": "08/12/2000"},
    2: {
        "fname": "Mike",
        "lname": "Spencer",
        "username": "miker5",
        "dob": "01/08/2004",
    },
}
Because this data lives in memory, it is lost when the server stops, but it is recreated every time main.py starts. You will therefore always have sample data to refer to as you test your endpoints or create new application data.
Creating users
With your users dictionary initialized, write a create_user function that adds new users. User details arrive in the body of an API request, so use Flask's request object to read them.
Call its get_json() method, data = request.get_json(), to parse the incoming JSON data and store it in a variable. No system should allow duplicate records, and your API is no exception. When creating a new user, make sure the new user's id does not match any existing record; if it does, notify the client and halt the process. Copy this snippet and paste it into the main.py file:
# main.py
# Assumes request and jsonify are imported from flask at the top of main.py
@app.route("/user", methods=["POST"])
def create_user():
    data = request.get_json()
    if data["id"] not in users.keys():
        users[data["id"]] = {
            "fname": data["fname"],
            "lname": data["lname"],
            "username": data["username"],
            "dob": data["dob"],
        }
    else:
        return jsonify({"message": "user already exists"}), 401
    return jsonify({"message": "user created"}), 201
This block of code first checks whether the user id has already been stored, by searching for it among the keys of the users dictionary. Checking for a duplicate id is a simplification for this tutorial; in production, you would more likely check a unique attribute such as the user's email address.
If the id is new, the user's details are added to the dictionary with the id as the key. Because each stored value is itself a dictionary, this pattern results in a nested dictionary.
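To see the dictionary logic on its own, without Flask or HTTP, here is a hypothetical add_user helper that performs the same duplicate-id check and nested-dictionary insert as create_user, returning a (message, status code) tuple instead of a response. It is a sketch for experimentation, not part of the repository.

```python
users = {
    1: {"fname": "John", "lname": "Doe", "username": "John96", "dob": "08/12/2000"},
}


def add_user(store, data):
    """Hypothetical helper mirroring create_user() without Flask."""
    if data["id"] in store:  # same check as `in store.keys()`: an average O(1) hash lookup
        return "user already exists", 401
    # The value stored under the id is itself a dict, producing a nested dictionary.
    store[data["id"]] = {field: data[field] for field in ("fname", "lname", "username", "dob")}
    return "user created", 201


new_user = {"id": 4, "fname": "James", "lname": "Max", "username": "Maxy", "dob": "08/12/2000"}
print(add_user(users, new_user))  # ('user created', 201)
print(add_user(users, new_user))  # ('user already exists', 401)
print(users[4]["username"])       # Maxy
```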
Flask includes a function called jsonify that serializes data to JSON format; you use it to format the message that is sent back to the client.
Retrieving users
Fetching users could be as simple as returning the users dictionary, but there is a better approach: return all the users sorted by id in descending order, so that (as long as ids increase over time) the most recently created user is at the top.
Dictionaries do not have a sort() method. Since Python 3.7 they preserve insertion order, but they cannot be sorted in place, so collect the users into a list and sort the list instead:
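# main.py
@app.route("/users", methods=["GET"])
def get_users():
    all_users = []
    for key in users:
        all_users.append(users[key])
        users[key]["id"] = key  # store the key on each record so the client sees its id
    # Sort by id, highest first
    all_users.sort(key=lambda x: x["id"], reverse=True)
    return jsonify(all_users), 200  # return the sorted list, not the raw dictionary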
This creates an empty list and then loops through the users dictionary, appending each user's record to the list. Each user also needs a unique identifier in the response, so the loop adds the dictionary key to each record as an id field.
After the loop, you have a list of dictionaries. Python cannot compare one dictionary with another, so sort() needs a key function that tells it what to compare: the lambda function returns each dictionary's id, and reverse=True puts the highest id first. The result is a list of dictionaries sorted in descending order by id.
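The sketch below isolates the sorting step with made-up sample users. Unlike the tutorial's route, it copies each record with {**record, "id": user_id} instead of adding the id to the stored dictionaries, a variation that leaves the users store unchanged.

```python
users = {
    1: {"fname": "John", "username": "John96"},
    2: {"fname": "Mike", "username": "miker5"},
    4: {"fname": "James", "username": "Maxy"},
}

# Copy each record and attach its key as "id".
all_users = [{**record, "id": user_id} for user_id, record in users.items()]

# Dictionaries cannot be compared with <, so sorting them directly fails.
try:
    sorted(all_users)
except TypeError:
    print("cannot sort dictionaries without a key function")

# A key function tells sort() which value to compare: here, each user's id.
all_users.sort(key=lambda user: user["id"], reverse=True)
print([user["id"] for user in all_users])  # [4, 2, 1]
```

operator.itemgetter("id") is an equivalent key function if you prefer to avoid the lambda.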
With users created and retrievable in order, the final step is to add authentication through a /users/login endpoint:
# main.py
@app.route("/users/login", methods=["POST"])
def login_user():
    data = request.get_json()
    user_id = data["id"]  # named user_id to avoid shadowing the built-in id()
    username = data["username"]
    if user_id in users.keys():
        if users[user_id]["username"] == username:
            return jsonify(f"Welcome, you are logged in as {username}"), 200
    return jsonify("Invalid login credentials"), 401
Before checking the username, make sure the user exists by looking up the submitted id among the dictionary's keys. If a match exists, compare the stored username with the one submitted. Valid credentials return a welcome message that includes the username; a failed login returns a message saying that the login credentials are invalid.
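The same login check can be expressed with dict.get(), which returns None for a missing key and so combines the existence check and the lookup in one step. check_login is a hypothetical, Flask-free helper for illustration, using the sample users from earlier.

```python
users = {
    1: {"fname": "John", "lname": "Doe", "username": "John96", "dob": "08/12/2000"},
    2: {"fname": "Mike", "lname": "Spencer", "username": "miker5", "dob": "01/08/2004"},
}


def check_login(store, user_id, username):
    """Hypothetical helper mirroring login_user() without Flask."""
    record = store.get(user_id)  # None when the id is unknown
    if record is not None and record["username"] == username:
        return f"Welcome, you are logged in as {username}", 200
    return "Invalid login credentials", 401


print(check_login(users, 2, "miker5"))   # ('Welcome, you are logged in as miker5', 200)
print(check_login(users, 2, "John96"))   # ('Invalid login credentials', 401)
print(check_login(users, 99, "miker5"))  # ('Invalid login credentials', 401)
```

One thing to watch: the keys in users are integers, so an id sent as the JSON string "2" will not match the key 2. Clients should send the id as a JSON number.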
Begin testing the three endpoints that you just created: create a user, log them in, and retrieve all users added. If anything goes wrong, you can always refer to the main.py file in the cloned repository.
API call to create a user
API call to log in a user
API call to retrieve all users
These calls confirm that the API works as expected while storing all of its data in lists and dictionaries.
Writing tests for your API
It may seem tedious and time-consuming, but adding tests to an application is never wasted effort. This section covers tests for creating a user, creating multiple users, logging in, and retrieving users through the endpoints you just created. I will guide you through testing your endpoints using pytest, a Python testing framework. The first test you will write is for creating a user:
# test_app.py
# `client` is a pytest fixture providing a Flask test client (assumed to be defined in the repository's test setup)
def test_create_user(client):
    response = client.post(
        "/user",
        json={
            "id": 4,
            "fname": "James",
            "lname": "Max",
            "username": "Maxy",
            "dob": "08/12/2000",
        },
    )
    assert response.headers["Content-Type"] == "application/json"
    assert response.status_code == 201
This test creates a new user with the id 4, first name James, and last name Max. It then asserts that the response’s content type is JSON and that the status code is 201, which indicates a created resource.
Next, create a test to verify that the API can fetch the created users:
def test_fetch_users(client):
    response = client.get("/users")
    assert response.headers["Content-Type"] == "application/json"
    assert response.status_code == 200
This test verifies that the endpoint returns a JSON response and that the status code is 200 for a successful request. These two tests are just a start; you can find more in the test_app.py file in the repository's root directory. Execute your tests by running pytest from the command line.
Passing tests show that endpoints backed only by Python data structures respond to clients the same way database-backed endpoints would, although the data lasts only as long as the server runs.
Now that your tests pass locally, integrate them with your continuous integration (CI) environment to ensure that changes deployed to your GitHub repository do not break the application. For this part of the tutorial, we will use CircleCI as the CI environment.
Integrating with CircleCI
To add CircleCI configuration to your project, create a new directory named .circleci in the root of your project folder. In that directory, create a file named config.yml and add this configuration to it:
version: 2.1
orbs:
  python: circleci/python@2.1.1
jobs:
  build-and-test:
    docker:
      - image: cimg/python:3.12.1
    steps:
      - checkout
      - python/install-packages:
          pkg-manager: pip
      - run:
          name: Run tests
          command: pytest
workflows:
  sample:
    jobs:
      - build-and-test
This CircleCI configuration is a simple example of how to set up CircleCI to run your tests. It specifies a Python Docker image, installs the Python packages using the pip package manager, and then runs your tests with the pytest command.
Commit all your changed files using Git and push your changes to an existing GitHub repository.
Setting up CircleCI
Now that your code is on the main branch of your remote GitHub repository, you can set up CircleCI to run your tests. Go to the CircleCI dashboard and select the Projects tab. Find your repository in the list; for this tutorial, it is the data-structure-python-applications repository.
Select the option to Set up Project. Because you already pushed your CircleCI configuration to the remote repository, you can just type the name of the branch containing the configuration and click Set up Project.
Relax as your tests execute in CircleCI.
Your tests passed successfully, which can only mean one thing: it is time to celebrate!
Conclusion
By following along with this tutorial, you have gained a solid understanding of Python data structures, why you need them, and specifically, how to use lists and dictionaries in Python. You have also learned to write API endpoints that store data using only data structures. You wrote tests for your API endpoints to catch changes that break existing functionality, integrated your project with CircleCI, and watched CircleCI execute your tests on the CI platform.
As always, I enjoyed creating this tutorial for you, and I hope you found it valuable. Until the next one, keep learning and keep building!