Marshmallow is a Python library that converts complex data types to and from Python data types. It is a powerful tool for both validating and converting data. In this tutorial, we will be using Marshmallow to validate a simple bookmarks API where users can save their favorite URLs along with a short description of each site.

Prerequisites

To get the most out of the tutorial you will need:

  1. Python version >= 3.10 installed on your machine
  2. A GitHub account
  3. A CircleCI account
  4. Basic understanding of SQLite databases
  5. Basic understanding of the Flask framework

Our tutorials are platform-agnostic, but use CircleCI as an example. If you don’t have a CircleCI account, sign up for a free one here.

Cloning the repository and creating a virtual environment

Begin by cloning the repository from this GitHub link.

git clone https://github.com/CIRCLECI-GWP/object-validation-and-conversion-marshmallow.git

Once you have cloned the repository the next step is to create a virtual environment and activate it to install our Python packages. Use these commands:

cd object-validation-and-conversion-marshmallow

python3 -m venv .venv

source .venv/bin/activate

pip3 install -r requirements.txt

Note: Use the following commands for Windows

cd object-validation-and-conversion-marshmallow

py -3 -m venv .venv

.venv\Scripts\activate

pip3 install -r requirements.txt

Why Marshmallow?

Often when working with data, there is a need to convert it from one data structure to another. Marshmallow is a Python library that converts complex data types to native Python data types and vice versa.

The Python interpreter supports built-in data types including integers, floats, booleans, strings, tuples, lists, dictionaries, and sets. These native types are what Marshmallow converts more complex objects to and from, and they are essential for developers who want to create programs that handle different types of operations.

One advantage to Marshmallow is that it will work with any database technology. It is platform-agnostic, which is always a win for developers.

To extend Marshmallow even further, we will be using these technologies:

  • Marshmallow-sqlalchemy integrates Marshmallow with SQLAlchemy, which is an SQL toolkit and Object Relational Mapper.
  • Flask-marshmallow is a Flask extension for Marshmallow that makes it easy to use Marshmallow with Flask. It also adds fields that generate URLs and hyperlinks in serialized output.

Understanding Marshmallow schemas

Understanding how Marshmallow schemas work is essential for working with the library. Schemas are the core of Marshmallow: each schema declares the fields of a data structure, the type of each field, and the validation rules applied to it.

An example of a schema for our bookmarks app would be:

class BookMarkSchema(ma.Schema):
    # Validation options are field arguments; values placed in metadata are only stored, not enforced
    title = fields.String(
        required=True,
        allow_none=False,
        validate=must_not_be_blank,  # validator function defined elsewhere in the app
    )
    url = fields.URL(
        relative=True,
        require_tld=True,
        error_messages={"invalid": "invalid url representation"},
    )
    description = fields.String(required=False, allow_none=True)
    created_at = fields.DateTime(required=False, allow_none=True)
    updated_at = fields.DateTime(required=False, allow_none=True)

This schema creates validations and also defines data types for the fields in our schema. With schemas out of the way, it is time to serialize and deserialize your data.

Data serialization and deserialization in Marshmallow

Implementing Marshmallow in a Flask application

To build our bookmark API, we will first build a BookMarkModel class. This class tells the database engine about the structure of our tables, relationships, and fields. We will also add a BookMarkSchema class to serialize and deserialize data from our model. These classes are available in the cloned repository in the /src/app.py file.

To show how Marshmallow parses data from Python types to serialized objects, we are using SQLAlchemy. The serialized objects can be stored in the database and can later be deserialized from the database to acceptable Python data types.

Start by creating a structure for both the model and schema definition classes.

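import os

from flask import Flask
from flask_marshmallow import Marshmallow
from flask_sqlalchemy import SQLAlchemy

# Assumed application setup; see src/app.py in the repository for the actual code
app = Flask(__name__)
BASE_DIR = os.path.abspath(os.path.dirname(__file__))

# Adding SQLAlchemy
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///' + os.path.join(BASE_DIR, 'db.sqlite3')
db = SQLAlchemy(app)

# Add Marshmallow
ma = Marshmallow(app)

app.app_context().push()

# Create the API model (SQLAlchemy)
class BookMarkModel(db.Model):
    pass

# Create schema (marshmallow)
class BookMarkSchema(ma.Schema):
    class Meta:
        pass

# One schema instance for a single bookmark, one for lists of bookmarks
bookMarkSchema = BookMarkSchema()
bookMarksSchema = BookMarkSchema(many=True)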

This code snippet first connects SQLAlchemy to our application, pointing it at the SQLite database named in SQLALCHEMY_DATABASE_URI. If you set that URI to the URL of a different database, SQLAlchemy connects to that database instead. The snippet then instantiates Marshmallow to serialize and deserialize the data as it is sent and received from our models.

The bookMarkSchema = BookMarkSchema() instance is responsible for serializing and deserializing a single item, as in the routes that create, read, or update one bookmark. In contrast, bookMarksSchema = BookMarkSchema(many=True) handles a list of items, for example to return all requested bookmarks.

Serializing and deserializing data in Marshmallow

In the previous code snippet, we created a Marshmallow schema based on our BookMarkModel. In this section, we will use Marshmallow to serialize Python objects into JSON for our API responses and to deserialize incoming JSON data before saving it to the database.

Serializing Python data

Serialization is the process of converting a Python object into a format that can be stored in a database or transmitted. In Flask we use SQLAlchemy to connect to our database. We need to convert the SQLAlchemy objects to JSON data that can then interact with our API. Marshmallow is a great tool to use for this process. In this section, we will use Marshmallow to return a JSON object once we create a bookmark. We will do this by adding a new bookmark to our SQLite database.

# CREATE a bookmark
@app.route("/bookmark/", methods=["POST"])
def create_bookmark():
    title = request.json["title"]
    description = request.json["description"]
    url = request.json["url"]

    book_mark = BookMarkModel(
        title=title,
        description=description,
        url=url,
        created_at=datetime.datetime.now(),
        updated_at=datetime.datetime.now(),
    )

    db.session.add(book_mark)
    db.session.commit()

    # Serialize the saved model into a dictionary that Flask returns as JSON
    result = bookMarkSchema.dump(book_mark)
    return result, 201

This code snippet creates a new bookmark using the BookMarkModel class. It uses the db.session.add and db.session.commit methods to add the bookmark to the session and save it to the database, respectively. To serialize the object, the snippet uses the dump method of the BookMarkSchema class, which returns a dictionary that Flask sends back as a JSON response.

To validate that this works, we can add a bookmark to the database with Postman and retrieve it. First run the Flask app using this command:

FLASK_APP=src/app.py flask run

Once the application is running, we can make a request to our API to create a new bookmark using Postman and the POST route /bookmark/.

Using Postman to make an API request to create a bookmark

The request returns a response that is a JSON object. Success! Now that a bookmark has been created and serialized with Marshmallow, you can retrieve it from the database and deserialize it.

Deserializing JSON data back to SQLite

Deserialization is the opposite of serialization. To serialize, we converted data from Python objects to JSON. To deserialize, we convert the JSON data sent to our API back into Python data that can be stored as SQLAlchemy objects in the SQLite database. Marshmallow uses the load() method for this, and it validates the data against the schema as it is loaded.

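@app.route("/bookmark/", methods=["POST"])
def create_bookmark():
    # Deserialize and validate the request body before building the model
    # (ValidationError is imported from marshmallow)
    try:
        json_input = request.get_json()
        result = bookMarkSchema.load(json_input)
    except ValidationError as err:
        return {"errors": err.messages}, 422

    book_mark = BookMarkModel(
        title=result.get("title"),
        description=result.get("description"),
        url=result.get("url"),
        created_at=datetime.datetime.now(),
        updated_at=datetime.datetime.now(),
    )
    # ... add to the session, commit, and return the serialized bookmark as shown earlier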

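For deserialization, load() validates the JSON sent to our API and returns the deserialized data as Python types, which can then be used to build the SQLAlchemy object. If the data is invalid, Marshmallow raises a ValidationError and the route returns the error messages with a 422 status code.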

Now that some data has been serialized and deserialized, the next step is to write tests. The tests will make sure that the endpoints are returning the correct data. To make completely sure that everything is okay, we will also run these tests on CircleCI.

Testing serialization

Testing inspires confidence in your applications by verifying your code is working as expected. In this section, we will create a test to make sure that our serialization is working as expected.

import json

from src.app import app  # assumed import path; adjust to match your project layout

# Test if one can add data to the database
def test_add_bookmark():
    my_data = {
        "title": "a unique title",
        "description": "a bookmark description",
        "url": "https://example.com/a-unique-bookmark",  # a well-formed URL passes URL validation
    }
    res = app.test_client().post(
        "/bookmark/",
        data=json.dumps(my_data),
        content_type="application/json",
    )
    assert res.status_code == 201

This test verifies that we can successfully create a new bookmark. It also tests that the response is the 201 status code we defined when we created our method. Now we can further verify success by adding the test to our CircleCI pipeline.

Setting up Git and pushing to CircleCI

To set up CircleCI, initialize a Git repository in the project by running this command:

git init

Then, create a .gitignore file in the root directory. Inside the file, list any files and directories (such as the .venv virtual environment) that you want to keep out of your remote repository. The next step will be to add a commit, and then push your project to GitHub.

Log in to CircleCI and go to Projects, where you should see all the GitHub repositories associated with your GitHub username, or your organization. The specific repository that you want to set up for this tutorial is object-validation-and-conversion-marshmallow. On the Projects dashboard, select the option to set up the selected project, then use the option for an existing configuration.

Note: After initiating the build, expect your pipeline to fail. You still need to add the customized .circleci/config.yml configuration file to GitHub for the project to build properly. We’ll do that next.

Setting up CircleCI

First, create a .circleci directory in your root directory. In it, add a config.yml file, which holds the CircleCI configuration for the project. In this setup, we will use the CircleCI Python orb. Use this configuration to execute your tests.

version: 2.1
orbs:
  python: circleci/python@2.1.1

workflows:
  sample:
    jobs:
      - build-and-test

jobs:
  build-and-test:
    description: "Setup Flask and run tests"
    executor: python/default
    steps:
      - checkout
      - python/install-packages:
          pkg-manager: pip
      - run:
          name: Run tests
          command: pytest -v

Using third-party orbs

CircleCI orbs are reusable packages of YAML configuration that condense multiple lines of configuration into a single line. To allow the use of third-party orbs like python@2.1.1 you may need to:

  • Enable organization settings if you are the administrator, or
  • Request permission from your organization’s CircleCI admin.

After setting up the configuration, push it to GitHub. CircleCI will start building the project.

Voila! Go to the CircleCI dashboard and expand the build details. Verify that the tests ran successfully and were integrated into CircleCI.

Build details showing pipeline setup success

Now that you have your CI pipeline set up, you can move on to validating data using Marshmallow.

Object validation using Marshmallow

Marshmallow provides a simple way to validate object data before sending it to the database. A schema's validate() method checks input against the rules declared on its fields, and we will use it in the method that creates a bookmark. In this step, we will add validations to make sure that we allow only strings, and no other type, for the title of the bookmark.

class BookMarkSchema(ma.Schema):
    # Pass validation options as field arguments; values placed in metadata are not enforced
    title = fields.String(
        required=True,
        allow_none=False,
        validate=must_not_be_blank,
    )
    ...

Once the rules are declared on the schema, we can use the validate() method to verify the data in the method that creates a new bookmark:

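@app.route("/bookmark/", methods=["POST"])
def create_bookmark():
    title = request.json["title"]
    description = request.json["description"]
    url = request.json["url"]

    # Validate the data from request before serialization
    errors = bookMarkSchema.validate({"title": title, "description": description, "url": url})
    if errors:
        # validate() returns a dictionary of error messages; it is empty when the data is valid
        return jsonify(errors), 422

    # ... create and save the bookmark as shown earlier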

In the code snippet above, we use the validate() method to check that the submitted data matches the validations described in our schema. If it does not, we return the error messages to the user.

To verify that this is working, make a POST request with Postman using an integer value in the title. Your API should return an error.

Error showing incorrect value in title

You will know your validations are working properly when an invalid title sent with the request results in an error.

Adding tests for more endpoints

This tutorial does not cover all the endpoints in the cloned repository. If you want to continue on your own, you can add tests for endpoints like fetching all bookmarks or fetching a single bookmark. Use this code:

# Test if all bookmarks are returned
def test_get_all_bookmarks_route():
    res = app.test_client().get("/bookmarks/")
    assert res.headers["Content-Type"] == "application/json"
    assert res.status_code == 200

# Test if a single bookmark is returned
def test_get_one_bookmark_route():
    res = app.test_client().get("/bookmark/1/")
    assert res.headers["Content-Type"] == "application/json"
    assert res.status_code == 200

# Test json data format is returned
def test_get_json_data_format_returns():
    res = app.test_client().get("/bookmarks/")
    assert res.status_code == 200
    assert res.headers["Content-Type"] == "application/json"

These tests verify that we can retrieve our created bookmarks, whether it is all of them or just one. The tests also verify that the data received is a JSON object, consistent with the serialization process of Marshmallow.

Before we can call this a party, we will need to save and commit our tests and push them to GitHub. A successful pipeline run signifies that everything went well.

Conclusion

In this article we have explored the power of using Marshmallow to deserialize and serialize data and also carry out validation. Through the article we have gone through the processes of creating models, creating schemas, and connecting them. We also learned how to use validations to accept only specific types of input.

I hope this tutorial was helpful, and that you understand more about how serialization and deserialization work using Marshmallow. Get the rest of your team involved by adding tests for more endpoints, and applying what you have learned to your own projects.


Waweru Mwaura is a software engineer and a life-long learner who specializes in quality engineering. He is an author at Packt and enjoys reading about engineering, finance, and technology. You can read more about him on his web profile.
