
Image by Author | Canva
What if there were a simple way to make your Python code faster? __slots__ in Python is easy to implement and can improve the performance of your code while reducing memory usage.
In this article, we will walk through how it works using a real-world data science project, which Allegro uses as a challenge in its data science recruitment process. However, before we get into this project, let’s build a solid understanding of what __slots__ does.
What is __slots__ in Python?
In Python, instances of ordinary classes keep their attributes in a per-instance dictionary (__dict__). This allows you to add, change, or delete attributes at any time, but it also comes at a cost: extra memory and slightly slower attribute access.
The __slots__ declaration tells Python that these are the only attributes an instance will ever need. It is a limitation, but it saves memory and can save time. Let’s see an example.
class WithoutSlots:
    def __init__(self, name, age):
        self.name = name
        self.age = age

class WithSlots:
    __slots__ = ['name', 'age']

    def __init__(self, name, age):
        self.name = name
        self.age = age
In the second class, __slots__ tells Python not to create a dictionary for each object. Instead, it reserves a fixed slot in each instance for the name and age values, which makes attribute access faster and reduces memory usage.
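To check this on your own machine, the short sketch below recreates the two classes and inspects one instance of each. It uses only the standard library, and the sample values ("Ada", 36) are made up. Exact byte counts depend on the Python version, so none are quoted here.

```python
import sys


class WithoutSlots:
    def __init__(self, name, age):
        self.name = name
        self.age = age


class WithSlots:
    __slots__ = ['name', 'age']

    def __init__(self, name, age):
        self.name = name
        self.age = age


regular = WithoutSlots("Ada", 36)
slotted = WithSlots("Ada", 36)

# A regular instance carries a per-instance __dict__; a slotted one does not.
has_dict_regular = hasattr(regular, "__dict__")
has_dict_slotted = hasattr(slotted, "__dict__")

# sys.getsizeof is shallow, so the attribute dictionary of the regular
# instance has to be added explicitly to get a fair comparison.
regular_size = sys.getsizeof(regular) + sys.getsizeof(regular.__dict__)
slotted_size = sys.getsizeof(slotted)

print("has __dict__:", has_dict_regular, has_dict_slotted)
print("bytes:", regular_size, slotted_size)
```

The comparison only covers the instance and its dictionary, not the name and age objects themselves, which are the same in both cases.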
Why Use __slots__?
Now, before starting the data project, let’s name the reasons why you should use __slots__.
- Memory: Objects take up less space when Python skips creating a dictionary.
- Speed: Accessing values is quicker because Python knows where each value is stored.
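- Bugs: Assigning to an attribute that is not listed in __slots__ raises an AttributeError, so a mistyped attribute name fails immediately instead of silently creating a new attribute.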
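The last point is easy to demonstrate. In the sketch below, the misspelled attribute nmae is a deliberate typo: the regular class accepts it silently, while the slotted class rejects it. The sample values are made up.

```python
class WithoutSlots:
    def __init__(self, name, age):
        self.name = name
        self.age = age


class WithSlots:
    __slots__ = ['name', 'age']

    def __init__(self, name, age):
        self.name = name
        self.age = age


regular = WithoutSlots("Ada", 36)
regular.nmae = "Grace"  # typo: silently creates a brand-new attribute
typo_stored = "nmae" in regular.__dict__

slotted = WithSlots("Ada", 36)
try:
    slotted.nmae = "Grace"  # same typo on the slotted class
    typo_rejected = False
except AttributeError as exc:
    typo_rejected = True
    print("Rejected:", exc)

print("regular.name is still:", regular.name)
```

In the regular class the original name is left unchanged and nobody is told, which is exactly the kind of silent bug that __slots__ turns into an immediate error.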
Using Allegro’s Data Science Challenge as an Example
In this data project, Allegro asked data science candidates to predict laptop prices by building machine learning models.


Link to this data project: https://platform.stratascratch.com/data-projects/laptop-price-prediction
There are three different datasets:
- train_dataset.json
- val_dataset.json
- test_dataset.json
Good. Let’s continue with the data exploration process.
Data Exploration
Now let’s load one of them to see the dataset’s structure.
import json
import pandas as pd

# Load the training set and drop rows with missing values
with open('train_dataset.json', 'r') as f:
    train_data = json.load(f)

df = pd.DataFrame(train_data).dropna().reset_index(drop=True)
df.head()
Here is the output.


Good, let’s see the columns.
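A minimal sketch of one way to list the columns is the columns attribute of the DataFrame. The small DataFrame below is a hypothetical stand-in with invented column names, used only so the snippet runs on its own; with the real data you would call this on the df loaded earlier.

```python
import pandas as pd

# Hypothetical stand-in for the laptop dataset; column names are invented.
df = pd.DataFrame({
    "brand": ["A", "B"],
    "ram_gb": [8, 16],
    "price": [2500.0, 4100.0],
})

# List every column name in the DataFrame
column_names = df.columns.tolist()
print(column_names)
```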
Here is the output.


Now, let’s check the numerical columns.
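One way to do this, sketched under the assumption that the numerical columns are simply those with a numeric dtype, is select_dtypes. The DataFrame below is again a hypothetical stand-in with invented column names.

```python
import pandas as pd

# Hypothetical stand-in for the laptop dataset; column names are invented.
df = pd.DataFrame({
    "brand": ["A", "B"],
    "ram_gb": [8, 16],
    "price": [2500.0, 4100.0],
})

# Keep only the columns whose dtype is numeric (integers and floats)
numeric_df = df.select_dtypes(include="number")
numeric_columns = numeric_df.columns.tolist()
print(numeric_columns)
```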
Here is the output.


Data Exploration with __slots__ vs Regular Classes
Let’s create a class called SlottedDataExploration, which uses the __slots__ attribute. It allows only one attribute, called df. Let’s see the code.
class SlottedDataExploration:
    __slots__ = ['df']

    def __init__(self, df):
        self.df = df

    def info(self):
        return self.df.info()

    def head(self, n=5):
        return self.df.head(n)

    def tail(self, n=5):
        return self.df.tail(n)

    def describe(self):
        return self.df.describe(include="all")
Now let’s see the same implementation as a regular class, without __slots__.
class DataExploration:
    def __init__(self, df):
        self.df = df

    def info(self):
        return self.df.info()

    def head(self, n=5):
        return self.df.head(n)

    def tail(self, n=5):
        return self.df.tail(n)

    def describe(self):
        return self.df.describe(include="all")
You can read more about how class methods work in this Python Class Methods guide.
Performance Comparison: Time Benchmark
Now let’s measure the performance by measuring the time and memory.
import time
from pympler import asizeof  # memory measurement

# Regular class: time.perf_counter() is a monotonic, high-resolution
# clock, which makes it better suited to timing than time.time()
start_normal = time.perf_counter()
de = DataExploration(df)
_ = de.head()
_ = de.tail()
_ = de.describe()
_ = de.info()
end_normal = time.perf_counter()
normal_duration = end_normal - start_normal
normal_memory = asizeof.asizeof(de)

# Slotted class: same sequence of calls
start_slotted = time.perf_counter()
sde = SlottedDataExploration(df)
_ = sde.head()
_ = sde.tail()
_ = sde.describe()
_ = sde.info()
end_slotted = time.perf_counter()
slotted_duration = end_slotted - start_slotted
slotted_memory = asizeof.asizeof(sde)

print(f"⏱️ Normal class duration: {normal_duration:.4f} seconds")
print(f"⏱️ Slotted class duration: {slotted_duration:.4f} seconds")
print(f"📦 Normal class memory usage: {normal_memory:.2f} bytes")
print(f"📦 Slotted class memory usage: {slotted_memory:.2f} bytes")
Now let’s see the result.


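In this run, the slotted class was 46.45% faster, while the memory usage was the same. Keep in mind that this is a single wall-clock measurement dominated by the pandas calls, so the exact figure will vary from run to run.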
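To isolate the part that __slots__ actually affects, attribute access, a repeated micro-benchmark is more reliable than a single timed run. The sketch below uses the standard timeit module. The classes Regular and Slotted and the payload object are hypothetical stand-ins for the exploration classes and the DataFrame, and no result is quoted because timings depend on the machine and Python version.

```python
import timeit


class Regular:
    def __init__(self, df):
        self.df = df


class Slotted:
    __slots__ = ['df']

    def __init__(self, df):
        self.df = df


payload = object()  # stand-in for the DataFrame
regular = Regular(payload)
slotted = Slotted(payload)


def read_regular():
    return regular.df


def read_slotted():
    return slotted.df


# Run several rounds and keep the fastest one, which is the
# measurement least disturbed by other activity on the machine.
regular_best = min(timeit.repeat(read_regular, number=200_000, repeat=5))
slotted_best = min(timeit.repeat(read_slotted, number=200_000, repeat=5))

print(f"Regular attribute reads: {regular_best:.4f} seconds")
print(f"Slotted attribute reads: {slotted_best:.4f} seconds")
```

Taking the minimum over several repeats is a common convention for micro-benchmarks; any difference between the two numbers is typically small compared with the cost of a pandas call.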
Machine Learning in Action
In this section, let’s move on to machine learning. Before doing so, let’s split the data into train and test sets.
Train and Test Split
Now we have three different datasets, train, val, and test, so let’s first find their indices.
train_indeces = train_df.dropna().index
val_indeces = val_df.dropna().index
test_indeces = test_df.dropna().index
Now it’s time to assign those indices to select those datasets easily in the next step.
train_df = new_df.loc[train_indeces]
val_df = new_df.loc[val_indeces]
test_df = new_df.loc[test_indeces]
Great, now let’s format these data frames, because scikit-learn expects the target as a flat (n,) array instead of an (n, 1) column. To do that, we need to use .ravel() after to_numpy().
# Feature matrices
X_train = train_df[selected_features].to_numpy()
X_val = val_df[selected_features].to_numpy()
X_test = test_df[selected_features].to_numpy()

# Targets, flattened from (n, 1) to (n,)
y_train = df.loc[train_indeces][label_col].to_numpy().ravel()
y_val = df.loc[val_indeces][label_col].to_numpy().ravel()
y_test = df.loc[test_indeces][label_col].to_numpy().ravel()
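To see what .ravel() changes, here is a small sketch with an illustrative column of prices. The values are made up and are not taken from the dataset.

```python
import numpy as np

# A single-column 2D array, like the one a one-column selection produces
column = np.array([[1200.0], [2500.0], [3100.0]])

# ravel() flattens it into a 1D array with the same values
flat = column.ravel()

print(column.shape, flat.shape)
```

The values are unchanged; only the shape differs, going from three rows of one column to a flat array of three elements.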
Applying Machine Learning Models
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.ensemble import VotingRegressor
from sklearn import linear_model
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, MaxAbsScaler
import matplotlib.pyplot as plt
from sklearn import tree
import seaborn as sns

def rmse(y_true, y_pred):
    # Take the square root of the MSE directly; the squared=False
    # argument is deprecated since scikit-learn 1.4.
    return mean_squared_error(y_true, y_pred) ** 0.5

def regression(regressor_name, regressor):
    # Scale the features, fit on the training set, predict on the test set
    pipe = make_pipeline(MaxAbsScaler(), regressor)
    pipe.fit(X_train, y_train)
    predicted = pipe.predict(X_test)
    rmse_val = rmse(y_test, predicted)
    print(regressor_name, ':', rmse_val)
    pred_df[regressor_name+'_Pred'] = predicted
    # Plot predicted vs. actual values with a fitted regression line
    plt.figure(regressor_name)
    plt.title(regressor_name)
    plt.xlabel('predicted')
    plt.ylabel('actual')
    sns.regplot(y=y_test,x=predicted)
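For reference, the RMSE reported by the rmse helper is the square root of the mean of the squared prediction errors. Here is a minimal standard-library sketch of that arithmetic; the function name rmse_plain and the numbers are made up for illustration.

```python
import math


def rmse_plain(y_true, y_pred):
    """Root mean squared error for two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

# Errors are 1, 0 and -2, so the squared errors are 1, 0 and 4.
# Their mean is 5/3, and the RMSE is the square root of that.
score = rmse_plain(y_true, y_pred)
print(score)
```

Because the errors are squared before averaging, one large miss weighs more than several small ones, which is worth remembering when comparing the models.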
Next, we will define a dictionary of regressors and run each model.
regressors = {
    'Linear': LinearRegression(),
    'MLP': MLPRegressor(random_state=42, max_iter=500, learning_rate="constant", learning_rate_init=0.6),
    'DecisionTree': DecisionTreeRegressor(max_depth=15, random_state=42),
    'RandomForest': RandomForestRegressor(random_state=42),
    'GradientBoosting': GradientBoostingRegressor(random_state=42, criterion='squared_error',
                                                  loss="squared_error", learning_rate=0.6, warm_start=True),
    'ExtraTrees': ExtraTreesRegressor(n_estimators=100, random_state=42),
}

# Collect the actual values and each model's predictions side by side
pred_df = pd.DataFrame(columns=["Actual"])
pred_df["Actual"] = y_test

for name, regressor in regressors.items():
    regression(name, regressor)
Here are the results.


Now, let’s implement this with both slotted and regular classes.
Machine Learning with __slots__ vs Regular Classes
Now let’s check the code with slots.
class SlottedMachineLearning:
    __slots__ = ['X_train', 'y_train', 'X_test', 'y_test', 'pred_df']

    def __init__(self, X_train, y_train, X_test, y_test):
        self.X_train = X_train
        self.y_train = y_train
        self.X_test = X_test
        self.y_test = y_test
        self.pred_df = pd.DataFrame({'Actual': y_test})

    def rmse(self, y_true, y_pred):
        # Square root of the MSE; the squared=False argument is
        # deprecated since scikit-learn 1.4.
        return mean_squared_error(y_true, y_pred) ** 0.5

    def regression(self, name, model):
        pipe = make_pipeline(MaxAbsScaler(), model)
        pipe.fit(self.X_train, self.y_train)
        predicted = pipe.predict(self.X_test)
        self.pred_df[name + '_Pred'] = predicted
        score = self.rmse(self.y_test, predicted)
        print(f"{name} RMSE:", score)
        plt.figure(figsize=(6, 4))
        sns.regplot(x=predicted, y=self.y_test, scatter_kws={"s": 10})
        plt.xlabel('Predicted')
        plt.ylabel('Actual')
        plt.title(f'{name} Predictions')
        plt.grid(True)
        plt.show()

    def run_all(self):
        models = {
            'Linear': LinearRegression(),
            'MLP': MLPRegressor(random_state=42, max_iter=500, learning_rate="constant", learning_rate_init=0.6),
            'DecisionTree': DecisionTreeRegressor(max_depth=15, random_state=42),
            'RandomForest': RandomForestRegressor(random_state=42),
            'GradientBoosting': GradientBoostingRegressor(random_state=42, learning_rate=0.6, warm_start=True),
            'ExtraTrees': ExtraTreesRegressor(n_estimators=100, random_state=42)
        }
        for name, model in models.items():
            self.regression(name, model)
Here is the regular class application.
class MachineLearning:
    def __init__(self, X_train, y_train, X_test, y_test):
        self.X_train = X_train
        self.y_train = y_train
        self.X_test = X_test
        self.y_test = y_test
        self.pred_df = pd.DataFrame({'Actual': y_test})

    def rmse(self, y_true, y_pred):
        # Square root of the MSE; the squared=False argument is
        # deprecated since scikit-learn 1.4.
        return mean_squared_error(y_true, y_pred) ** 0.5

    def regression(self, name, model):
        pipe = make_pipeline(MaxAbsScaler(), model)
        pipe.fit(self.X_train, self.y_train)
        predicted = pipe.predict(self.X_test)
        self.pred_df[name + '_Pred'] = predicted
        score = self.rmse(self.y_test, predicted)
        print(f"{name} RMSE:", score)
        plt.figure(figsize=(6, 4))
        sns.regplot(x=predicted, y=self.y_test, scatter_kws={"s": 10})
        plt.xlabel('Predicted')
        plt.ylabel('Actual')
        plt.title(f'{name} Predictions')
        plt.grid(True)
        plt.show()

    def run_all(self):
        models = {
            'Linear': LinearRegression(),
            'MLP': MLPRegressor(random_state=42, max_iter=500, learning_rate="constant", learning_rate_init=0.6),
            'DecisionTree': DecisionTreeRegressor(max_depth=15, random_state=42),
            'RandomForest': RandomForestRegressor(random_state=42),
            'GradientBoosting': GradientBoostingRegressor(random_state=42, learning_rate=0.6, warm_start=True),
            'ExtraTrees': ExtraTreesRegressor(n_estimators=100, random_state=42)
        }
        for name, model in models.items():
            self.regression(name, model)
Performance Comparison: Time Benchmark
Now let’s benchmark both classes the same way we did in the previous section.
import time

# Regular class
start_normal = time.perf_counter()
ml = MachineLearning(X_train, y_train, X_test, y_test)
ml.run_all()
end_normal = time.perf_counter()
normal_duration = end_normal - start_normal
# nbytes counts only the NumPy arrays the instance references,
# not the instance itself or its __dict__
normal_memory = (
    ml.X_train.nbytes +
    ml.X_test.nbytes +
    ml.y_train.nbytes +
    ml.y_test.nbytes
)

# Slotted class
start_slotted = time.perf_counter()
sml = SlottedMachineLearning(X_train, y_train, X_test, y_test)
sml.run_all()
end_slotted = time.perf_counter()
slotted_duration = end_slotted - start_slotted
slotted_memory = (
    sml.X_train.nbytes +
    sml.X_test.nbytes +
    sml.y_train.nbytes +
    sml.y_test.nbytes
)

print(f"⏱️ Normal ML class duration: {normal_duration:.4f} seconds")
print(f"⏱️ Slotted ML class duration: {slotted_duration:.4f} seconds")
print(f"📦 Normal ML class memory usage: {normal_memory:.2f} bytes")
print(f"📦 Slotted ML class memory usage: {slotted_memory:.2f} bytes")

# Relative speed difference, as a percentage of the regular class duration
time_diff = normal_duration - slotted_duration
percent_faster = (time_diff / normal_duration) * 100
if percent_faster > 0:
    print(f"✅ Slotted ML class is {percent_faster:.2f}% faster than the regular ML class.")
else:
    print("ℹ️ No speed improvement with slots in this run.")

# Relative memory difference, as a percentage of the regular class memory
memory_diff = normal_memory - slotted_memory
percent_smaller = (memory_diff / normal_memory) * 100
if percent_smaller > 0:
    print(f"✅ Slotted ML class uses {percent_smaller:.2f}% less memory than the regular ML class.")
else:
    print("ℹ️ No memory savings with slots in this run.")
Here is the output.


Conclusion
By preventing the creation of a dynamic __dict__ for each instance, Python’s __slots__ reduces memory usage and speeds up attribute access. You saw how it works in practice through both data exploration and machine learning tasks using Allegro’s real recruitment project.
In small datasets, the improvements might be minor. But as data scales, the benefits become more noticeable, especially in memory-bound or performance-critical applications.
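Because __slots__ saves memory per instance, the effect is easiest to see when many objects are created. The sketch below uses the standard tracemalloc module to compare the memory allocated for 50,000 instances of each kind. The class names, the helper allocated_bytes, and the instance count are illustrative choices, and the exact numbers depend on the Python version.

```python
import tracemalloc


class Regular:
    def __init__(self, name, age):
        self.name = name
        self.age = age


class Slotted:
    __slots__ = ['name', 'age']

    def __init__(self, name, age):
        self.name = name
        self.age = age


def allocated_bytes(cls, count):
    """Return the bytes allocated while building `count` instances of cls."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    instances = [cls("laptop", i) for i in range(count)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    # `instances` stays alive until here, so the objects are counted
    return after - before


regular_bytes = allocated_bytes(Regular, 50_000)
slotted_bytes = allocated_bytes(Slotted, 50_000)

print(f"Regular instances: {regular_bytes} bytes")
print(f"Slotted instances: {slotted_bytes} bytes")
```

The saving comes from the instances themselves, so classes that hold one large object, such as a single DataFrame, gain very little, while code that creates many small objects gains the most.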
Nate Rosidi is a data scientist and in product strategy. He’s also an adjunct professor teaching analytics, and is the founder of StrataScratch, a platform helping data scientists prepare for their interviews with real interview questions from top companies. Nate writes on the latest trends in the career market, gives interview advice, shares data science projects, and covers everything SQL.