#149 Python's small object allocator and other memory features

Python Bytes - A podcast by Michael Kennedy and Brian Okken - Luni

Brian #1: Dropbox: Our journey to type checking 4 million lines of Python

Continuing saga, but this is a cool write up.
Benefits
- “Experience tells us that understanding code becomes the key to maintaining developer productivity. Without type annotations, basic reasoning such as figuring out the valid arguments to a function, or the possible return value types, becomes a hard problem. Here are typical questions that are often tricky to answer without type annotations:
  - Can this function return None?
  - What is this items argument supposed to be?
  - What is the type of the id attribute: is it int, str, or perhaps some custom type?
  - Does this argument need to be a list, or can I give a tuple or a set?”
- Type checker will find many subtle bugs.
- Refactoring is easier.
- Running type checking is faster than running large suites of unit tests, so feedback can be faster.
- Typing helps IDEs with better completion, static error checking, and more.
Long story, but really cool learnings of how and why to tackle adding type hints to a large project with many developers.
Conclusion. mypy is great now, because DropBox needed it to be.

Michael #2: Setting Up a Flask Application in Visual Studio Code

Video, but also as a post
Follow on to the same in PyCharm:
- video and post
Steps outside VS Code
- Clone repo
- Create a virtual env (via venv)
- Install requirements (via requirements.txt)
- Setup flask app ENV variable
- flask deploy ← custom command for DB
VS Code
- Open the folder where the repo and venv live
- Open any Python file to trigger the Python subsystem
- Ensure the correct VENV is selected (bottom left)
- Open the debugger tab, add config, pick Flask, choose your app.py file
- Debug menu, start without debugging (or with)
Adding tests via VS Code
- Open command pallet (CMD SHIFT P), Python: Discover Tests, select framework, select directory of tests, file pattern, new tests bottle on the right bar

Brian #3: Multiprocessing vs. Threading in Python: What Every Data Scientist Needs to Know

How data scientists can go about choosing between the multiprocessing and threading and which factors should be kept in mind while doing so.
Does not consider async, but still some great info.
Overview of both concepts in general and some of the pitfalls of parallel computing.
The specifics in Python, with the GIL
Use threads for waiting on IO or waiting on users.
Use multiprocessing for CPU intensive work.
The surprising bit for me was the benchmarks
- Using something speeds up the code. That’s obvious.
- The difference between the two isn’t as great as I would have expected.
A discussion of merits and benefits of both.
And from the perspective of data science.
A few more examples, with code, included.

Michael #4: ORM - async ORM

And https://github.com/encode/databases
The orm package is an async ORM for Python, with support for Postgres, MySQL, and SQLite.
SQLAlchemy core for query building.
databases for cross-database async support.
typesystem for data validation.
Because ORM is built on SQLAlchemy core, you can use Alembic to provide database migrations.
Need to be pretty async savy

Brian #5: Getting Started with APIs

dataquest.io post
Conceptual introduction of web APIs
Discussion of GET status codes, including a nice list with descriptions.
- examples:
  - 301: The server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.
  - 400: The server thinks you made a bad request. This can happen when you don’t send along the right data, among other things.
endpoints
endpoints that take query parameters
JSON data
Examples in Python for using:
- requests to query endpoints.
- json to load and dump JSON data.

Michael #6: Memory management in Python

This article describes memory management in Python 3.6.
Everything in Python is an object. Some objects can hold other objects, such as lists, tuples, dicts, classes, etc.
such an approach requires a lot of small memory allocations
To speed-up memory operations and reduce fragmentation Python uses a special manager on top of the general-purpose allocator, called PyMalloc.
Layered managers
- RAM
- OS VMM
- C-malloc
- PyMem
- Python Object allocator
- Object memory
Three levels of organization
- To reduce overhead for small objects (less than 512 bytes) Python sub-allocates big blocks of memory.
- Larger objects are routed to standard C allocator.
- three levels of abstraction — arena, pool, and block.
- Block is a chunk of memory of a certain size. Each block can keep only one Python object of a fixed size. The size of the block can vary from 8 to 512 bytes and must be a multiple of eight
- A collection of blocks of the same size is called a pool. Normally, the size of the pool is equal to the size of a memory page, i.e., 4Kb.
- The arena is a chunk of 256kB memory allocated on the heap, which provides memory for 64 pools.
Python's small object manager rarely returns memory back to the Operating System.
An arena gets fully released If and only if all the pools in it are empty.

Extras

Brian:

Tuesday, Oct 6, Python PDX West,
Thursday, Sept 26, I’ll be speaking at PDX Python, downtown.
Both events, mostly, I’ll be working on new programming jokes unless I come up with something better. :)

Michael:

Jokes: A few I liked from the dad joke list.