r/Python 3d ago

Tutorial The Inner Workings of Python Dataclasses Explained

Ever wondered how those magical dataclass decorators work? Wonder no more! In my latest article, I explain the core concepts behind them and then create a simple version from scratch! Check it out!

https://jacobpadilla.com/articles/python-dataclass-internals

(reposting since I had to fix a small error in the article)

160 Upvotes

17 comments

24

u/PurepointDog 3d ago

That felt so much hackier than I was expecting

23

u/nekokattt 3d ago

A fair bit of Python's standard library is like this. Look into collections.namedtuple for example.

If it isn't simple, it probably uses eval/exec/a lot of underlying stuff/C modules
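To make that concrete, here is a minimal sketch of the general pattern: build a method's source as a string, then exec it. This is not the actual standard-library code. `tiny_namedtuple` is a made-up helper, and it assumes the field names are trusted, valid identifiers and that there is at least one of them. The real collections.namedtuple does a lot more (name validation, extra methods, and so on).

```python
from operator import itemgetter


def tiny_namedtuple(typename, field_names):
    """Hypothetical helper: a stripped-down, namedtuple-like class factory."""
    field_names = tuple(field_names)
    arg_list = ", ".join(field_names)

    # Write the source of __new__ as text, specialised for these fields.
    source = (
        f"def __new__(cls, {arg_list}):\n"
        f"    return tuple.__new__(cls, ({arg_list},))\n"
    )

    # exec compiles and runs that text; the new function lands in `namespace`.
    namespace = {}
    exec(source, {"tuple": tuple}, namespace)

    class_dict = {
        "__slots__": (),
        "_fields": field_names,
        "__new__": namespace["__new__"],
    }
    # Expose each position as a read-only attribute, e.g. p.x reads p[0].
    for index, name in enumerate(field_names):
        class_dict[name] = property(itemgetter(index))

    return type(typename, (tuple,), class_dict)


Point = tiny_namedtuple("Point", ["x", "y"])
p = Point(1, 2)
print(p.x, p.y, p == (1, 2))
```

The generated `__new__` has a real signature with named parameters, which is part of why this approach shows up in library code despite looking hacky.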

15

u/DuckDatum 3d ago

Programming is like science. They teach you good guardrails, good rules of thumb, good yet often imprecise generalizations. Once you’re out there in the real world, dig into the weeds of things. You’re a better programmer when you know when to, and when not to, do things that are considered bad practice—like using eval.

4

u/DigThatData 2d ago

"bad practice" is a bit harsh, maybe "code smell"?

4

u/zapman449 2d ago

I trust very few people (mostly not including myself) to use eval reasonably. If I see it in a pull request, the whole thing gets extra scrutiny.

1

u/Skasch 2d ago

I typically consider my use of eval reasonable if I want to do something that doesn't seem possible without it, try a dozen alternatives, search for a few hours for different design patterns, sleep on it for a few nights, ask a few colleagues for their opinion, write an apologetic comment above the line explaining why there's no way around it, and then wrap it all in a nice module so most other engineers won't have to think about it.

To be fair, I've never had to go that far.

1

u/kuwisdelu 1d ago

This is what you’re forced to do when your language doesn’t have lisp-like macros.

13

u/DaelonSuzuka 3d ago

See also, the classic dataclasses talk by Raymond Hettinger:

https://www.youtube.com/watch?v=T-TwcmT6Rcw

8

u/yrubooingmeimryte 2d ago

This is a great example of how NOT to do a tech talk. It takes him nearly 20 minutes to actually start talking about anything, and even when he finally gets to the point, he still constantly gets sidetracked talking about unrelated shit that just distracts from the topic.

3

u/victoriasecretagent 2d ago

I typically enjoy his talks very much. Him and David Beazley.

5

u/JanEric1 3d ago

Is there any specific reason it's done like that? I feel like one should be able to do this without exec, but I haven't put the implementations side by side to compare.

14

u/FI_Stickie_Boi 3d ago

I believe the main reason is speed. attrs, the library dataclasses are based on, also does this, so that all the work is done during class creation and there's minimal overhead at "runtime" (i.e. when you're instantiating classes, calling methods, etc.). If you try to do this without eval/exec, via decorators and the like, you'll incur pretty significant runtime overhead, because every time you call a method, Python has to dig through multiple closures, which slows things down a lot.
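A rough sketch of the two approaches, to show where the work happens. The names `make_init_with_exec`, `make_init_with_closure`, and `simple_dataclass` are made up for illustration, and this is neither attrs' nor dataclasses' real code. It assumes fields are plain annotated names with no defaults.

```python
def make_init_with_exec(fields):
    """Generate a specialised __init__ once, when the class is created."""
    args = ", ".join(fields)
    body = "\n".join(f"    self.{name} = {name}" for name in fields) or "    pass"
    source = f"def __init__(self, {args}):\n{body}\n"
    namespace = {}
    exec(source, {}, namespace)
    # The result is an ordinary function with one assignment per field.
    return namespace["__init__"]


def make_init_with_closure(fields):
    """Generic alternative: no exec, but it loops over `fields` on every call."""
    def __init__(self, *args):
        if len(args) != len(fields):
            raise TypeError(f"expected {len(fields)} arguments, got {len(args)}")
        for name, value in zip(fields, args):
            setattr(self, name, value)
    return __init__


def simple_dataclass(cls):
    """Hypothetical decorator: attach a generated __init__ from the annotations."""
    fields = list(getattr(cls, "__annotations__", {}))
    cls.__init__ = make_init_with_exec(fields)
    return cls


@simple_dataclass
class Point:
    x: int
    y: int


class Pair:
    pass


Pair.__init__ = make_init_with_closure(["x", "y"])

p = Point(1, 2)
q = Pair(3, 4)
print(p.x, p.y, q.x, q.y)
```

Both produce working classes. The difference is that the exec version pays its cost once in the decorator, while the closure version repeats the loop and the `setattr` calls on every instantiation. How much that matters in practice is something you'd want to measure with timeit rather than take from this sketch.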

3

u/Awkward-Fisherman380 3d ago

That's amazing. Very insightful. Keep it up ✌️🏼

1

u/marcus-luck 3d ago

Great article! Thanks for writing and sharing!

1

u/magnomagna 2d ago

"However, if there are arguments in the decorator, the dataclass function will be called"

Just a small nitpick... better be more specific:

"However, if there are only keyword-only arguments in the decorator, the dataclass function will be called"
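For anyone following along, a minimal sketch of the pattern being discussed: a decorator that works both bare and with keyword-only options. `simple_dataclass` and the `_options` attribute are hypothetical, and `frozen` here is only recorded, not enforced. It needs Python 3.8+ for the positional-only `/` marker.

```python
def simple_dataclass(cls=None, /, *, frozen=False):
    """Hypothetical decorator usable as @simple_dataclass or @simple_dataclass(frozen=True)."""

    def wrap(cls):
        # Stand-in for the real work: just record the chosen options.
        cls._options = {"frozen": frozen}
        return cls

    if cls is None:
        # Called with parentheses: only keyword arguments were given, so
        # return the actual decorator, which then receives the class.
        return wrap

    # Used bare: Python passed the class in directly as `cls`.
    return wrap(cls)


@simple_dataclass
class A:
    pass


@simple_dataclass(frozen=True)
class B:
    pass


print(A._options, B._options)
```

Because the options are keyword-only, the single positional slot can only ever be the class itself, which is what lets the function tell the two call styles apart.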

0

u/sohang-3112 Pythonista 2d ago

Good post!

1

u/kuwisdelu 1d ago

Oh look it’s Greenspun’s tenth rule in action.