Python Dataclasses vs Pydantic: When to Use Which
Python’s dataclasses module and the Pydantic library solve similar problems — defining structured data with type hints — but they make very different trade-offs. Dataclasses are lightweight and part of the standard library; Pydantic adds runtime validation, coercion, and serialization at the cost of a dependency. Knowing when each is appropriate saves you from over-engineering simple code or under-protecting API boundaries.
Python Dataclasses
dataclasses (added in Python 3.7) auto-generates __init__, __repr__, and __eq__ from class annotations. It’s great for data containers where you control all the inputs.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class User:
    id: int
    name: str
    email: Optional[str] = None
    tags: list[str] = field(default_factory=list)  # fresh list per instance; list[str] needs Python 3.9+

user = User(id=1, name="Alice")
print(user)
# User(id=1, name='Alice', email=None, tags=[])
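The field(default_factory=list) line matters. Here is a minimal sketch, using an illustrative Team class rather than anything from a real codebase, of what happens without it: dataclasses reject a bare mutable default when the class is created. The sketch also shows that the generated __eq__ compares instances field by field.

```python
from dataclasses import dataclass, field

@dataclass
class Team:
    name: str
    members: list[str] = field(default_factory=list)  # a new list for every instance

a = Team("core")
b = Team("core")
a.members.append("alice")
print(b.members)                      # []: b has its own list
print(a == Team("core", ["alice"]))   # True: generated __eq__ compares fields in order

# A bare mutable default is rejected at class-definition time.
error = None
try:
    @dataclass
    class BadTeam:
        name: str
        members: list = []
except ValueError as exc:
    error = exc
    print(exc)  # the message points you to default_factory
```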
Key features:
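@dataclass(frozen=True)  # immutable; also hashable, since eq=True is the default
@dataclass(order=True)   # enables <, >, <=, >= (fields compared in order, like tuples)
@dataclass(slots=True)   # adds __slots__: lower memory, faster attribute access (Python 3.10+)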
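A short sketch of frozen and order working together, using a hypothetical Version class. slots=True is left out so the example also runs on Python 3.9.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True, order=True)
class Version:
    major: int
    minor: int
    patch: int = 0

v = Version(1, 2)
print(Version(1, 2) < Version(1, 10))  # True: compared like (1, 2, 0) < (1, 10, 0)
print(sorted([Version(2, 0), Version(1, 10), Version(1, 2)]))
# [Version(major=1, minor=2, patch=0), Version(major=1, minor=10, patch=0), Version(major=2, minor=0, patch=0)]
print(len({Version(1, 2), Version(1, 2, 0)}))  # 1: equal and hashable, so deduplicated

blocked = False
try:
    v.major = 3  # frozen instances reject attribute assignment
except FrozenInstanceError:
    blocked = True
```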
What dataclasses don’t do: they don’t validate types at runtime. Passing id="not-an-int" will work without complaint.
user = User(id="oops", name="Bob")  # no error raised
print(user.id)  # oops (a str, despite the int annotation)
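If you need a few checks without adding Pydantic, the stdlib hook is __post_init__, which runs right after the generated __init__. The following is a hand-rolled sketch (CheckedUser is hypothetical). It only handles plain classes such as int and str, and because bool subclasses int, True would still pass an int check.

```python
from dataclasses import dataclass, fields
from typing import get_origin

@dataclass
class CheckedUser:
    id: int
    name: str

    def __post_init__(self) -> None:
        # Runs after the generated __init__ has assigned every field.
        for f in fields(self):
            expected = f.type
            # Skip generics like list[str] (get_origin is not None) and string
            # annotations (from __future__ import annotations); check plain classes only.
            if get_origin(expected) is None and isinstance(expected, type):
                value = getattr(self, f.name)
                if not isinstance(value, expected):
                    raise TypeError(
                        f"{f.name} must be {expected.__name__}, got {type(value).__name__}"
                    )

ok = CheckedUser(id=1, name="Alice")  # passes

message = None
try:
    CheckedUser(id="oops", name="Bob")
except TypeError as exc:
    message = str(exc)
    print(message)  # id must be int, got str
```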
Pydantic Models
Pydantic validates data at construction time and coerces values when it can do so safely. It’s the foundation of FastAPI and is a natural fit for anything that handles external data.
from typing import Optional

from pydantic import BaseModel, EmailStr  # EmailStr needs the [email] extra

class User(BaseModel):
    id: int
    name: str
    email: Optional[EmailStr] = None
    tags: list[str] = []  # fine in Pydantic: the default is copied for each instance

user = User(id="42", name="Alice")  # "42" is coerced to int
print(user.id)  # 42

User(id="oops", name="Bob")
# raises ValidationError: id: Input should be a valid integer, unable to parse string as an integer
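Pydantic v2 (released in 2023) is significantly faster than v1 because its validation core, pydantic-core, is written in Rust. EmailStr also needs the optional email-validator package, which the [email] extra installs; quote the requirement so shells such as zsh don’t try to expand the brackets:
$ pip install "pydantic[email]"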
Validation and Coercion
Pydantic distinguishes between strict mode (no coercion) and lax mode (the default, which coerces "42" → 42):
from pydantic import BaseModel

class StrictUser(BaseModel):
    model_config = {"strict": True}
    id: int

StrictUser(id="42")
# raises ValidationError: a str is not accepted for an int field in strict mode
Custom validators:
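from pydantic import BaseModel, field_validator

class Product(BaseModel):
    name: str
    price: float

    @field_validator("price")
    @classmethod
    def price_must_be_positive(cls, v: float) -> float:
        # Runs after price has been validated/coerced to float.
        if v <= 0:
            raise ValueError("price must be positive")
        return v

Product(name="Pen", price=1.5)  # OK
Product(name="Pen", price=-1)   # raises ValidationError: Value error, price must be positive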
Serialization
Pydantic models serialize to dict and JSON out of the box:
user = User(id=1, name="Alice", tags=["admin"])
print(user.model_dump())
# {'id': 1, 'name': 'Alice', 'email': None, 'tags': ['admin']}
print(user.model_dump_json())
# {"id":1,"name":"Alice","email":null,"tags":["admin"]}

# Parse from a dict or a JSON string (both are validated)
user2 = User.model_validate({"id": 2, "name": "Bob"})
user3 = User.model_validate_json('{"id": 3, "name": "Carol"}')
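With dataclasses you get none of this for free. dataclasses.asdict() returns a plain dict (recursing into nested dataclasses, lists, and dicts), but it doesn’t produce JSON or convert values such as datetime, and loading data back is just Cls(**data) with no validation. Beyond that, you’d reach for a library like marshmallow.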
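A stdlib-only sketch of the dataclass side of serialization, using a hypothetical Event class. json.dumps(..., default=str) is one blunt fallback among several; it stringifies anything the json module can’t encode.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import date

@dataclass
class Event:
    name: str
    on: date
    tags: list[str] = field(default_factory=list)

event = Event("launch", date(2024, 1, 15), ["public"])
data = asdict(event)
print(data)  # {'name': 'launch', 'on': datetime.date(2024, 1, 15), 'tags': ['public']}

# json.dumps(data) on its own raises TypeError: date is not JSON serializable.
payload = json.dumps(data, default=str)  # str(date) gives "2024-01-15"
print(payload)  # {"name": "launch", "on": "2024-01-15", "tags": ["public"]}

# The round trip is manual and unvalidated: `on` comes back as a str, not a date.
restored = Event(**json.loads(payload))
print(type(restored.on).__name__)  # str
```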
Nested Models and Composition
Both work with nesting, but Pydantic validates the whole tree:
from pydantic import BaseModel

class Address(BaseModel):
    street: str
    city: str

class Customer(BaseModel):
    name: str
    address: Address

customer = Customer(
    name="Alice",
    address={"street": "123 Main St", "city": "Mumbai"},  # dict is validated into an Address
)
print(customer.address.city)  # Mumbai
With plain dataclasses, nothing converts the nested dict: it is stored as-is, with no error, so you have to construct Address explicitly.
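A minimal stdlib sketch of the dataclass version. The from_dict classmethod is a hypothetical helper, not part of the dataclasses API; it just builds the nested Address by hand.

```python
from dataclasses import dataclass

@dataclass
class Address:
    street: str
    city: str

@dataclass
class Customer:
    name: str
    address: Address

    @classmethod
    def from_dict(cls, data: dict) -> "Customer":
        # Hypothetical helper: construct the nested dataclass explicitly.
        return cls(name=data["name"], address=Address(**data["address"]))

raw = {"name": "Alice", "address": {"street": "123 Main St", "city": "Mumbai"}}

loose = Customer(**raw)                 # accepted, but address stays a plain dict
print(type(loose.address).__name__)     # dict

customer = Customer.from_dict(raw)
print(customer.address.city)            # Mumbai
```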
Performance
For pure in-memory data containers with no external input, dataclasses are faster to construct because they do no validation. For request/response parsing, Pydantic v2’s Rust core keeps the overhead small; at typical API volumes it is usually dwarfed by network and I/O costs, but measure it if construction sits on a hot path.
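One way to measure it for your own models is a rough timeit sketch like the one below. The Point classes are illustrative, the Pydantic half is skipped when Pydantic isn’t installed, and the numbers depend heavily on machine and library versions, so no output is shown here.

```python
import timeit
from dataclasses import dataclass

N = 50_000  # constructions per timing run

@dataclass
class PointDC:
    x: int
    y: int

def best_of_three(factory) -> float:
    # Minimum of three runs reduces noise from other processes.
    return min(timeit.repeat(factory, number=N, repeat=3))

results = {"dataclass": best_of_three(lambda: PointDC(x=1, y=2))}

try:
    from pydantic import BaseModel  # optional: skipped if Pydantic isn't installed
except ImportError:
    BaseModel = None

if BaseModel is not None:
    class PointPD(BaseModel):
        x: int
        y: int

    results["pydantic"] = best_of_three(lambda: PointPD(x=1, y=2))

for name, seconds in results.items():
    print(f"{name:<10} {seconds:.4f}s for {N:,} constructions")
```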
A rough guide:
| Scenario | Suggested choice |
|---|---|
| Internal data transfer, known inputs | dataclass |
| Config objects from env or YAML | pydantic (or dataclass if the source is trusted) |
| API request/response bodies | pydantic |
| FastAPI request bodies and responses | pydantic (FastAPI validates with it; stdlib dataclasses are also accepted) |
| CLI argument parsing | dataclass + argparse |
| Database ORM row mapping | pydantic or ORM-specific models |
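For the "dataclass + argparse" row, here is one possible split of responsibilities, sketched with a hypothetical CLIConfig. argparse does the string-to-int conversion and the dataclass just holds the typed result.

```python
import argparse
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class CLIConfig:
    input_path: str
    workers: int = 4
    verbose: bool = False

def parse_config(argv: Optional[Sequence[str]] = None) -> CLIConfig:
    parser = argparse.ArgumentParser(description="Hypothetical tool")
    parser.add_argument("input_path")
    # argparse converts the string to int; the default is read from the dataclass.
    parser.add_argument("--workers", type=int, default=CLIConfig.workers)
    parser.add_argument("--verbose", action="store_true")
    namespace = parser.parse_args(argv)
    # Works because every argparse dest name matches a CLIConfig field name.
    return CLIConfig(**vars(namespace))

config = parse_config(["data.csv", "--workers", "8"])
print(config)  # CLIConfig(input_path='data.csv', workers=8, verbose=False)
```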
Using Both Together
You can use pydantic.dataclasses.dataclass to get Pydantic validation with dataclass syntax:
from pydantic.dataclasses import dataclass

@dataclass
class Item:
    id: int
    name: str
    price: float

Item(id="5", name="Widget", price="9.99")
# id coerced to 5, price coerced to 9.99
This is useful when you want the ergonomics of @dataclass (frozen, slots, etc.) but still need runtime validation.
Conclusion
The decision is straightforward once you know the boundary: if data comes from outside your process — an HTTP request, a config file, a message queue — use Pydantic to validate and coerce it at the entry point. For data that stays inside your application and whose shape you fully control, a plain dataclass is lighter and just as clear. In FastAPI projects, Pydantic is a given; everywhere else, let the source of the data make the decision for you.