Object validation and conversion with Marshmallow in Python
Marshmallow is a Python library that converts complex data types, such as objects, to and from native Python data types. It is a powerful tool for validating and converting data.
In this tutorial, you will learn how to use marshmallow to validate a bookmarking API where users can save their favorite URLs along with a simple description. Then, you will learn how to set up tests to ensure the correctness of your validations and automate these tests using continuous integration (CI).
By the end of this tutorial, you will have a robust bookmarking API with automated validation and testing processes, enabling you to make updates faster and more reliably.
Prerequisites
Before you start, make sure these requirements are met:
- Python version >= 3.5 installed on your machine.
- A GitHub account.
- A CircleCI account.
You need some working knowledge of these technologies:
- Basic understanding of SQLite databases.
- Basic understanding of the Flask framework.
First, clone the repository from this GitHub link. Once you have cloned the repository, create a virtual environment so that your Python packages are not installed globally. Run this command:
virtualenv api-venv
Note: Python 3 also bundles the venv module, which provides the same isolation without installing virtualenv. Either tool allows easy management of dependencies in Python projects.
You need to activate the environment after it’s created. Learn more about Python virtual environments.
With the environment active, install the project's Python packages using this command:
pip3 install -r requirements.txt
Additional technologies
In addition to marshmallow, you will use these technologies in this tutorial:
- marshmallow-sqlalchemy is a marshmallow extension for SQLAlchemy.
- flask-marshmallow is a Flask extension that makes it easy to generate URLs and hyperlinks for marshmallow objects.
Why marshmallow?
When working with data, you will often need to convert it from one data structure to another. Marshmallow is a Python library that lets you convert complex data types to and from native Python data types.
The Python interpreter supports multiple built-in datatypes, including integers, floats, Booleans, strings, tuples, lists, dictionaries, and sets. These datatypes are essential for creating complex programs that can handle different types of operations.
One advantage is that marshmallow is framework agnostic: it is not tied to a particular ORM or web framework, so it will work with any database technology. Always a win for developers!
To start working with marshmallow, you need to understand marshmallow schemas. Schemas define the structure and validation of the data.
Let’s assume you already have a model you are validating, which has similar fields to the ones in the schema.
Here’s an example of a schema for the bookmark API:
class BookMarkSchema(ma.Schema):
    title = fields.Str(required=True, allow_none=False)
    url = fields.URL(
        relative=True, require_tld=True, error="invalid url representation"
    )
    description = fields.String(required=False, allow_none=True)
    created_at = fields.DateTime(required=False, allow_none=True)
    updated_at = fields.DateTime(required=False, allow_none=True)
In this schema:
- title is a string field that is required and cannot be None.
- url is a URL field that allows relative URLs and requires a top-level domain. It provides a custom error message for invalid URLs.
- description is an optional string field that can be None.
- created_at and updated_at are optional DateTime fields that can be None.
This schema ensures that the bookmark data has the correct structure and validates that required fields are present. You can use this schema to create validations and define data types for your API. Your next step is serializing and deserializing your data.
Implementing marshmallow in a Flask application
You will be building a bookmark API where users can store links to their favorite websites. You will first build a BookMarkModel class, which describes the structure of your tables, relationships, and fields to the database engine. You will also add a BookMarkSchema class, which will serialize and deserialize data from your model. You can get these classes from the cloned repository (in the /src/app.py file).
Note: This tutorial uses SQLAlchemy (an SQL object-relational mapper) to show how marshmallow parses data from Python types to serialized objects that can be stored in the database. Later on, you can deserialize the objects from the database to acceptable Python data types.
Create a structure for both your model and schema definition classes:
# src/app.py
# Adding SQLAlchemy
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///' + os.path.join(BASE_DIR, 'db.sqlite3')
db = SQLAlchemy(app)

# Add Marshmallow
ma = Marshmallow(app)

# Create the API model (SQLAlchemy)
class BookMarkModel(db.Model):
    pass

# Create schema (marshmallow)
class BookMarkSchema(ma.Schema):
    class Meta:
        pass

# One schema instance for single bookmarks, one for lists of bookmarks
bookMarkSchema = BookMarkSchema()
bookMarksSchema = BookMarkSchema(many=True)
This code snippet connects SQLAlchemy to your application, using the SQLite database file configured in SQLALCHEMY_DATABASE_URI. You can point this URI at another SQL database instead. Then it instantiates marshmallow to serialize and deserialize data as you send and receive it from your models.
Note: bookMarkSchema = BookMarkSchema() is responsible for handling a single bookmark. It is used in the POST, READ, and UPDATE routes, where you interact with one bookmark at a time. bookMarksSchema = BookMarkSchema(many=True) is used to handle a list of items in your dataset (for example, when all bookmarks are requested).
Serializing and deserializing data in marshmallow
In the previous code snippet, you created a marshmallow schema based on your BookMarkModel. In this section, you will serialize data when saving to the database and deserialize data when retrieving from the database using marshmallow.
Serializing Data
Serialization refers to the process of converting a Python object into a format that can be transmitted or stored. In this API, you need to convert the SQLAlchemy objects to JSON data that you can return to clients. SQLAlchemy model instances are not JSON serializable on their own, so marshmallow is a great tool for the conversion.
In this section, you will use marshmallow to return a JSON object once you create a bookmark.
Add a new bookmark to your SQLite database:
# src/app.py
# CREATE a bookmark
@app.route("/bookmark/", methods=["POST"])
def create_bookmark():
    title = request.json["title"]
    description = request.json["description"]
    url = request.json["url"]
    book_mark = BookMarkModel(
        title=title,
        description=description,
        url=url,
        created_at=datetime.datetime.now(),
        updated_at=datetime.datetime.now(),
    )
    result = bookMarkSchema.dump(book_mark)
    db.session.add(book_mark)
    db.session.commit()
    return result, 201
This code snippet creates a new bookmark using the BookMarkModel class. It uses the db.session.add and db.session.commit methods to add the bookmark to the session and then save it to the database. To serialize the object, it uses the dump method of the BookMarkSchema class, which returns a dictionary that Flask sends back as a JSON object.
You can validate that this works by adding a bookmark to your database with Postman and then retrieving it. First run the Flask app and make an API request to create a new bookmark.
To run the application, use this command:
FLASK_APP=src/app.py flask run
Once the application is running, you can make a request to the API to create a new bookmark using Postman and the POST route /bookmark/.
The request is successful and the returned response is a JSON object. Hurray! Now that you have a bookmark created and serialized with marshmallow, retrieve it from your database and deserialize it.
Deserializing data
You know how to serialize data; deserialization is the exact opposite. In this case it is the process of converting JSON data to SQLAlchemy objects.
When deserializing, marshmallow automatically converts the serialized data to Python objects and validates it along the way. You will use the schema's load() method:
book_mark = BookMarkModel(
    title=title,
    description=description,
    url=url,
    created_at=datetime.datetime.now(),
    updated_at=datetime.datetime.now(),
)
try:
    json_input = request.get_json()
    result = bookMarkSchema.load(json_input)
except ValidationError as err:
    return {"errors": err.messages}, 422
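This snippet deserializes the JSON payload sent to your API. The load() method validates the input and returns the deserialized data, which with a plain ma.Schema is a Python dictionary. If the payload is invalid, load() raises a ValidationError and the API responds with the error messages and a 422 status code.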
Now that you have some data serialized and deserialized, you can write tests to make sure that your endpoints are returning the correct data. In a later section, you’ll learn how to automate these tests in CircleCI so that they run on every code change, giving you more confidence in your updates.
Testing serialization
Software testing is a great way to make sure that your code is working as expected, and to inspire confidence in your applications. In this section, you will create a test to ensure that serialization is working as expected.
# Test if one can add data to the database
def test_add_bookmark():
    my_data = {
        "title": 'a unique title',
        "description": 'a bookmark description',
        "url": 'unique bookmark url',
    }
    res = app.test_client().post(
        "/bookmark/",
        data=json.dumps(my_data),
        content_type="application/json",
    )
    assert res.status_code == 201
This test shows whether you can successfully create a new bookmark. You are also testing that the response is the 201 status code that you defined when you created your method. For the final step in verifying success, add the test to your CircleCI pipeline.
Setting up Git and pushing to CircleCI
To set up CircleCI, initialize a Git repository in the project by running:
git init
Create a .gitignore file in the root directory. Inside the file add any modules you want to ignore, to prevent them from being added to your remote repository. The next step will be to add a commit and then push your project to GitHub.
Log into CircleCI and go to Projects. You should find all the GitHub repositories associated with your GitHub username or your organization. The specific repository that you want to set up in CircleCI is object-validation-and-conversion-with-marshmallow. On the Projects dashboard, select the option to set up the selected project and use the option for an existing configuration.
Note: After initiating the build, you should expect your pipeline to fail, because you haven’t added your customized .circleci/config.yml configuration file to GitHub. We’ll take care of that next.
Setting up CircleCI
Create a .circleci directory in your root directory and add a config.yml file. The config file carries the CircleCI configuration for every project.
Copy the following configuration code into your config.yml file. Note that we use the CircleCI Python orb to simplify setup and to execute our tests using Pytest:
version: 2.1
orbs:
  python: circleci/python@1.2
workflows:
  sample:
    jobs:
      - build-and-test
jobs:
  build-and-test:
    docker:
      - image: cimg/python:3.8
    steps:
      - checkout
      - python/install-packages:
          pkg-manager: pip
      - run:
          name: Run tests
          command: python -m pytest
Note: CircleCI orbs are reusable packages of YAML configuration that condense multiple lines of code into a single line: python: circleci/python@1.2. You might need to enable the use of orbs in your organization settings if you are the administrator. Or, you may need to request permission from your organization’s admin to allow the use of third party orbs.
After setting up the configuration, you can now push your configuration to GitHub. CircleCI will automatically start building your project.
Voila! On checking the CircleCI dashboard and expanding the build details, you can verify that your marshmallow test ran successfully and was integrated into CircleCI.
Now that you have successfully set up your CI, you can start validating data using marshmallow.
Object validation
Marshmallow provides a simple way to validate objects before sending them to the database. Marshmallow schemas are in charge of validations, which you run with the validate() method. In the schema for creating a bookmark, you will add a validation to make sure that only string values are allowed for the title of the bookmark.
# src/app.py
class BookMarkSchema(ma.Schema):
    title = fields.String(required=True, allow_none=False)
    ...
Once you have passed on rules to the schema, you can then use the validate() method to validate the data:
# src/app.py
def create_bookmark():
    title = request.json["title"]
    description = request.json["description"]
    url = request.json["url"]
    # Validate the data from request before serialization
    error = bookMarkSchema.validate({"title": title, "description": description, "url": url})
    if error:
        return jsonify(error)
This code snippet uses the validate() method to check that the request data matches the schema's validation rules. If validation fails, it returns the errors to the user.
To verify that this is working, use Postman to make a POST request with an integer value for the title. Your API should return an error.
You know that validations are working properly, because there’s an error when an invalid title is sent with the request.
Adding more tests
While this tutorial has not been able to cover all the endpoints used for the cloned repository, you can add tests for more endpoints like fetching all bookmarks or a single bookmark:
# tests/test_bookmark.py
# Test if all bookmarks are returned
def test_get_all_bookmarks_route():
    res = app.test_client().get("/bookmarks/")
    assert res.headers["Content-Type"] == "application/json"
    assert res.status_code == 200

# Test if a single bookmark is returned
def test_get_one_bookmark_route():
    res = app.test_client().get("/bookmark/1/")
    assert res.headers["Content-Type"] == "application/json"
    assert res.status_code == 200
These tests verify that you can retrieve your created bookmarks, whether you want to retrieve all of them or just one of them. The tests also verify that the data received is a JSON object, further validating the serialization process of marshmallow.
Before you celebrate, you need to save and commit your tests and push them to GitHub. A successful pipeline run signifies that everything went well.
Conclusion
In this tutorial, you have explored the power of using marshmallow to deserialize and serialize data and to carry out validation. You have gone through the processes of creating models, creating schemas, and connecting your marshmallow schemas to your models. You also learned how to test marshmallow responses with Pytest and how to use validations to allow only specific types of responses. Finally, you automated your tests using a continuous integration pipeline set up in CircleCI.
I hope this tutorial was helpful for understanding how serialization and deserialization work using marshmallow.