Python Serialization Vulnerabilities – Pickle

In programming, serialization refers to gathering data from objects, converting it to a string of bytes, and then writing those bytes to disk. The data can later be deserialized to recreate the original objects. Many programming languages, such as PHP, Java, Ruby, and Python, offer built-in methods for serialization.

In Python, the serialization process is called “pickling,” and it is provided by the pickle module.
The pickle.dumps() function serializes data and the pickle.loads() function deserializes it (also known as pickling and unpickling). For example, if we have an array, we can pickle it as shown below:

import pickle
variable = pickle.dumps([1, 2, 3])
print(variable)

The output would be a byte string, which is the serialized object. Later, we can deserialize the object using pickle.loads() as shown below:

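import pickle
# "variable" holds the bytes produced by pickle.dumps() earlier
deserialized_variable = pickle.loads(variable)
print(deserialized_variable)  # [1, 2, 3]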

This is useful when we want to save some variables from a program on the drive as a binary file, which can be later used in other programs. For instance, we can create an array and save it as a binary file, as shown below:

import pickle
variable = pickle.dumps([1, 2, 3])
with open("myarray.pkl", "wb") as f:
    f.write(variable)

In the above example, we have saved the serialized object to the file “myarray.pkl”. We can later read this file and deserialize it using the following code:

import pickle
with open("myarray.pkl", "rb") as f:
    serialized_variable = f.read()
deserialized_variable = pickle.loads(serialized_variable)

As you can see, we can now operate on the deserialized object like an array again. This feature can be helpful when a developer wants to save all the data and states of variables at a certain point, for example, when quitting the IDE.
Serialization in Web Applications

We have talked about serialization in software applications, but what about its use in web applications? HTTP is a stateless protocol, meaning that the handling of one request does not depend on any previous request. However, there are times when it's necessary to maintain state, and that's where cookies come in. Cookies bring a sense of statefulness to the HTTP protocol.

Serialization becomes a valuable tool to retain a user's information and some data for their next interaction with the server. We can serialize the data and store it in a cookie, which takes up space on the user's device rather than the server's. Then, we can deserialize the data and use it on the site for the subsequent request.

In Python web applications, pickle is often used for this purpose. However, the cookie's content is controlled by the client, so unpickling it on the server is unsafe. JSON serialization is a much safer alternative! Unlike pickle, JSON is a data-only format that doesn't allow executable code to be embedded within the data. This eliminates the risk of code injection through deserialization that malicious actors could otherwise exploit.
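As a rough illustration of the JSON approach, the sketch below stores a small session dictionary in a cookie and reads it back on a later request. The helper names encode_session and decode_session, the cookie name "session", and the sample data are all hypothetical; only the json, base64, and http.cookies standard-library modules are assumed.

```python
import base64
import json
from http.cookies import SimpleCookie


def encode_session(data: dict) -> str:
    """Serialize a dict to JSON and make it safe to place in a cookie value."""
    raw = json.dumps(data, separators=(",", ":")).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_session(value: str) -> dict:
    """Decode a cookie value back into a dict; only plain data can come out."""
    raw = base64.urlsafe_b64decode(value.encode("ascii"))
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("session cookie must decode to a JSON object")
    return data


# Response side: build the cookie that would be sent to the browser.
outgoing = SimpleCookie()
outgoing["session"] = encode_session({"user": "alice", "theme": "dark"})
header_value = outgoing["session"].OutputString()

# Next request: parse the cookie the browser sent back and decode it.
incoming = SimpleCookie()
incoming.load(header_value)
session = decode_session(incoming["session"].value)
print(session)
```

A client can still edit such a cookie, so its values should be validated like any other user input, but decoding it can only produce plain data and never runs code.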

It's essential to remember that it is possible to construct malicious pickle data that will execute arbitrary code. Therefore, caution must be exercised when using pickle in web applications.

Pickling serializes Python's built-in data types out of the box, but what if we want to pickle our own custom classes? Python can handle well-known classes easily, but objects such as server connections or complex networking scripts hold state that pickle does not know how to save and restore, which can lead to discrepancies and errors. To support such cases, a custom class can define a method called “__reduce__”. This method returns a callable and a tuple of arguments to call it with. Pickle stores a reference to that callable, together with its arguments, in the serialized data, so when Python unpickles this string of bytes, it calls the callable to reconstruct the object properly.
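A benign use of this mechanism might look like the sketch below. ServerConnection is a hypothetical stand-in for an object that holds unpicklable state (here a threading.Lock); the host and port values are made up for illustration.

```python
import pickle
import threading


class ServerConnection:
    """Hypothetical object holding state that pickle cannot copy."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        # A lock cannot be pickled, so default pickling of this object fails.
        self._lock = threading.Lock()

    def __reduce__(self):
        # Return (callable, args). On load, pickle calls
        # ServerConnection(host, port), which creates a fresh lock
        # instead of trying to copy the old one.
        return (ServerConnection, (self.host, self.port))


conn = ServerConnection("example.internal", 8080)
restored = pickle.loads(pickle.dumps(conn))
print(restored.host, restored.port)
```

The point to notice is that unpickling calls whatever callable the serialized data names, with whatever arguments it supplies.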

As an example, consider a class called EvilPickle that demonstrates custom pickling and unpickling code. Its “__reduce__” method returns the os.system function together with a tuple holding the single argument “cat /etc/passwd”, and the pickled object is written to a binary file called backup.data.
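A minimal sketch of such a class is shown here. It is a reconstruction from the description above rather than the author's original listing, and the save_payload helper is a hypothetical name. The block only builds the payload; it deliberately never unpickles it.

```python
import os
import pickle


class EvilPickle:
    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call
        # os.system('cat /etc/passwd')". The call happens at load time.
        return (os.system, ("cat /etc/passwd",))


# Serializing is harmless: no command runs while the payload is built.
payload = pickle.dumps(EvilPickle())


def save_payload(path="backup.data"):
    """Write the serialized payload to a binary file (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(payload)


# Deliberately not done here: pickle.loads(payload) would run the command.
print(len(payload), "bytes")
```

Calling save_payload() writes backup.data; whoever later passes that file's contents to pickle.loads() runs the command on their own machine.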

However, this mechanism can be misused to run malicious code on the system that deserializes the data. If the EvilPickle code is saved in an “evilpickle.py” file and executed, it produces backup.data. Any program that then loads that file with the pickle.loads function runs “cat /etc/passwd” on its own system. This can lead to security vulnerabilities, and attackers can gain unauthorized access to sensitive data.

Therefore, it is essential to be cautious while using pickling and unpickling in Python. The dangerous case is pickle data that is user-controlled and unpickled at the server, so it is vital to ensure that a server never unpickles data a client could have tampered with. PyTorch's format for saving ML models has relied on pickle, which makes loading an untrusted model file vulnerable to arbitrary code execution. The safetensors format was created to overcome this issue.

YAML (through the PyYAML library) and pickle are two serialization options in Python. YAML is often considered the safer of the two, but both can execute arbitrary code when untrusted input is loaded carelessly. Pickle allows the execution of arbitrary code by design, which is a potential security threat. To demonstrate this, consider the following piece of code:

import pickle
# Hand-written protocol 0 payload: look up os.system, then call it with 'cat /etc/passwd'
pickle.loads(b'cos\nsystem\n(S\'cat /etc/passwd\'\ntR.')

The above code would execute the ‘cat /etc/passwd' command, which is undesirable. PyYAML's load() function can also execute arbitrary code when it is used with an unsafe loader, which was the default in older PyYAML releases, as shown below:

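import yaml
document = "!!python/object/apply:os.system ['cat /etc/passwd']"
# PyYAML 6 requires an explicit Loader; UnsafeLoader permits python/object/apply tags
yaml.load(document, Loader=yaml.UnsafeLoader)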

The above code would also execute the same command, but we can avoid this by using “safe_load()” instead of “load()”.
Mitigation

Pickle is a well-known module in Python, but developers must be mindful of the warning on its documentation page: the module is not secure, and only trusted data should be unpickled. To avoid potential security risks, some safer alternatives to pickle are available. Here are a few examples:

  • JSON:
    To serialize, you can import the json module and use the json.dumps() function. To deserialize, use json.loads().
  • msgpack:
    To serialize, you can import the msgpack module and use the msgpack.packb() function. To deserialize, use msgpack.unpackb().

Other safe options include Google's protobuf and CBOR.
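For the JSON option, a round trip might look like the following sketch. The sample data is made up, and the second half simply shows what happens when the pickle payload from earlier in this article is handed to json.loads() instead.

```python
import json

data = {"user": "alice", "items": [1, 2, 3]}

# Serialize to a JSON string and back again.
encoded = json.dumps(data)
decoded = json.loads(encoded)
print(decoded == data)

# Feeding a pickle-style payload to the JSON parser does not execute
# anything: it is rejected as invalid JSON.
rejected = False
try:
    json.loads("cos\nsystem\n(S'cat /etc/passwd'\ntR.")
except json.JSONDecodeError:
    rejected = True
print("payload rejected:", rejected)
```

json.loads() can only return dictionaries, lists, strings, numbers, booleans, and None, so there is no hook through which the input could name a function to call.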

Serialization vulnerabilities are a severe threat, and developers must take them seriously. These vulnerabilities can be easily exploited, and it is equally easy to overlook them during development. Attackers can even execute arbitrary code on machines by exploiting them. As we have seen, insecure deserialization, or deserialization that relies on insecure functions, can put our infrastructure at risk of being compromised. Developers should read the documentation page carefully and not ignore any warnings. Using a data-only format like JSON for serialization and deserialization is advisable, because it cannot contain executable code. Thank you for taking the time to read this.

Deitasoft © 2024. All Rights Reserved.