#149 Python's small object allocator and other memory features

Python Bytes - A podcast by Michael Kennedy and Brian Okken - Mondays

Sponsored by Datadog: pythonbytes.fm/datadog

Brian #1: Dropbox: Our journey to type checking 4 million lines of Python

  • A continuing saga, but this is a cool write-up.
  • Benefits
    • “Experience tells us that understanding code becomes the key to maintaining developer productivity. Without type annotations, basic reasoning such as figuring out the valid arguments to a function, or the possible return value types, becomes a hard problem. Here are typical questions that are often tricky to answer without type annotations:
      • Can this function return None?
      • What is this items argument supposed to be?
      • What is the type of the id attribute: is it int, str, or perhaps some custom type?
      • Does this argument need to be a list, or can I give a tuple or a set?”
    • Type checker will find many subtle bugs.
    • Refactoring is easier.
    • Running type checking is faster than running large suites of unit tests, so feedback can be faster.
    • Typing helps IDEs with better completion, static error checking, and more.
  • A long story, with really cool lessons on how and why to tackle adding type hints to a large project with many developers.
  • Conclusion: mypy is great now because Dropbox needed it to be.
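
The questions quoted above can be made concrete with a small annotated function. This is only a sketch: the `User` class and `find_user` function are hypothetical names invented for illustration, not code from the Dropbox post. The annotations show that the function can return None, that `users` may be a list, a tuple, or a set, and that `id` is an int.

```python
from typing import Iterable, Optional


class User:
    """Hypothetical record type used only for this illustration."""

    def __init__(self, id: int, name: str) -> None:
        self.id = id      # the annotation answers "is id an int or a str?"
        self.name = name


def find_user(users: Iterable[User], user_id: int) -> Optional[User]:
    """Return the matching user, or None when nobody matches.

    Iterable[User] tells callers that a list, tuple, or set all work.
    Optional[User] tells callers that None is a possible return value.
    """
    for user in users:
        if user.id == user_id:
            return user
    return None
```

A type checker such as mypy can then flag a caller that reads `.name` on the result without first checking for None.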

Michael #2: Setting Up a Flask Application in Visual Studio Code

  • Video, but also as a post
  • Follow on to the same in PyCharm:
  • Steps outside VS Code
    • Clone repo
    • Create a virtual env (via venv)
    • Install requirements (via requirements.txt)
    • Set the Flask app environment variable (FLASK_APP)
    • flask deploy ← custom command for DB
  • VS Code
    • Open the folder where the repo and venv live
    • Open any Python file to trigger the Python subsystem
    • Ensure the correct virtual environment is selected (bottom left)
    • Open the debugger tab, add config, pick Flask, choose your app.py file
    • Debug menu, start without debugging (or with)
  • Adding tests via VS Code
    • Open the command palette (Cmd+Shift+P), run Python: Discover Tests, then select the framework, the directory of tests, and the file pattern; a new tests icon (a bottle) appears on the right bar

Brian #3: Multiprocessing vs. Threading in Python: What Every Data Scientist Needs to Know

  • How data scientists can choose between multiprocessing and threading, and which factors to keep in mind while doing so.
  • Does not consider async, but still some great info.
  • Overview of both concepts in general and some of the pitfalls of parallel computing.
  • The specifics in Python, with the GIL
  • Use threads for waiting on IO or waiting on users.
  • Use multiprocessing for CPU intensive work.
  • The surprising bit for me was the benchmarks
    • Using either approach speeds up the code. That’s obvious.
    • The difference between the two isn’t as great as I would have expected.
  • A discussion of merits and benefits of both.
  • And from the perspective of data science.
  • A few more examples, with code, included.
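
The rule of thumb above (threads for waiting, processes for CPU work) can be sketched with the standard library's concurrent.futures module. This is a minimal illustration, not code from the article: `io_bound`, `cpu_bound`, `run_threaded`, and `run_in_processes` are hypothetical names, and `time.sleep` stands in for a real network or disk wait.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List


def io_bound(seconds: float) -> float:
    """Simulate waiting on IO; the GIL is released while sleeping."""
    time.sleep(seconds)
    return seconds


def cpu_bound(n: int) -> int:
    """Pure-Python arithmetic; it holds the GIL while it runs."""
    return sum(i * i for i in range(n))


def run_threaded(func: Callable, args: Iterable, workers: int = 4) -> List:
    """Run func over args in a thread pool (suits IO-bound work)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, args))


def run_in_processes(func: Callable, args: Iterable, workers: int = 4) -> List:
    """Run func over args in a process pool (suits CPU-bound work)."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, args))
```

A script would typically call `run_threaded(io_bound, [0.5] * 4)` and `run_in_processes(cpu_bound, [2_000_000] * 4)` under an `if __name__ == "__main__":` guard, since process pools may re-import the main module on some platforms. Timing both variants on your own workload is the only reliable way to see how large the difference really is.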

Michael #4: ORM - async ORM

  • And https://github.com/encode/databases
  • The orm package is an async ORM for Python, with support for Postgres, MySQL, and SQLite.
  • SQLAlchemy core for query building.
  • databases for cross-database async support.
  • typesystem for data validation.
  • Because ORM is built on SQLAlchemy core, you can use Alembic to provide database migrations.
  • You need to be pretty async savvy

Brian #5: Getting Started with APIs

  • dataquest.io post
  • Conceptual introduction of web APIs
  • Discussion of the status codes returned for GET requests, including a nice list with descriptions.
    • examples:
      • 301: The server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.
      • 400: The server thinks you made a bad request. This can happen when you don’t send along the right data, among other things.
  • endpoints
  • endpoints that take query parameters
  • JSON data
  • Examples in Python for using:
    • requests to query endpoints.
    • json to load and dump JSON data.
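
The pieces listed above (status codes, query parameters, JSON) can be tried without touching the network. The post itself uses the third-party requests package; the sketch below swaps in standard-library modules only, and the URL, the parameters, and the response body are made-up placeholders, not a real API.

```python
import json
from http import HTTPStatus
from urllib.parse import urlencode


def build_url(base: str, params: dict) -> str:
    """Append query parameters to an endpoint URL."""
    return f"{base}?{urlencode(params)}"


def describe_status(code: int) -> str:
    """Return the code with its standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


# Hypothetical endpoint and parameters, for illustration only.
url = build_url("https://api.example.com/items", {"page": 2, "tag": "python"})

# A made-up response body, parsed and re-serialized with the json module.
body = '{"items": [{"id": 1, "name": "widget"}], "count": 1}'
data = json.loads(body)                              # JSON text -> Python dict
text = json.dumps(data, indent=2, sort_keys=True)    # Python dict -> JSON text
```

With requests installed, the equivalent call would be `requests.get(base, params=params)`, after which `response.status_code` and `response.json()` give the same kinds of values.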

Michael #6: Memory management in Python

  • This article describes memory management in Python 3.6.
  • Everything in Python is an object. Some objects can hold other objects, such as lists, tuples, dicts, classes, etc.
  • Such an approach requires a lot of small memory allocations.
  • To speed up memory operations and reduce fragmentation, Python uses a special manager on top of the general-purpose allocator, called PyMalloc.
  • Layered managers
    • RAM
    • OS VMM
    • C-malloc
    • PyMem
    • Python Object allocator
    • Object memory
  • Three levels of organization
    • To reduce overhead for small objects (up to 512 bytes), Python sub-allocates big blocks of memory.
    • Larger objects are routed to the standard C allocator.
    • Three levels of abstraction: arena, pool, and block.
    • A block is a chunk of memory of a certain size. Each block can hold only one Python object of a fixed size. The block size can vary from 8 to 512 bytes and must be a multiple of eight.
    • A collection of blocks of the same size is called a pool. Normally, the size of the pool is equal to the size of a memory page, i.e., 4 KB.
    • The arena is a 256 KB chunk of memory allocated on the heap, which provides memory for 64 pools.
  • Python's small object manager rarely returns memory back to the Operating System.
  • An arena gets fully released if and only if all the pools in it are empty.
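
The arithmetic behind blocks, pools, and arenas can be sketched in a few lines. This is a simplified model of the Python 3.6 figures quoted above, not CPython's actual code: the constant names and `block_size_for` are chosen here for illustration, a request of exactly 512 bytes is treated as small (matching the 8 to 512 byte block sizes), and other CPython versions may use different constants.

```python
from typing import Optional

# Figures taken from the notes above (Python 3.6); names are illustrative.
ALIGNMENT = 8                    # block sizes are multiples of eight
SMALL_REQUEST_THRESHOLD = 512    # largest request served from a block
POOL_SIZE = 4 * 1024             # one memory page, 4 KB
ARENA_SIZE = 256 * 1024          # 256 KB


def block_size_for(nbytes: int) -> Optional[int]:
    """Return the block size serving a request, or None for large requests.

    Small requests are rounded up to the next multiple of eight; anything
    above the threshold would go to the general-purpose C allocator.
    """
    if nbytes < 1:
        raise ValueError("this model only covers requests of at least 1 byte")
    if nbytes > SMALL_REQUEST_THRESHOLD:
        return None
    return (nbytes + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT


# 256 KB / 4 KB gives the 64 pools per arena mentioned above.
POOLS_PER_ARENA = ARENA_SIZE // POOL_SIZE
```

For example, `block_size_for(20)` rounds up to 24, so a 20-byte request and a 24-byte request would share pools of the same block size in this model.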

Extras

Brian:

  • Tuesday, Oct 6, Python PDX West.
  • Thursday, Sept 26, I’ll be speaking at PDX Python, downtown.
  • At both events, I’ll mostly be working on new programming jokes, unless I come up with something better. :)

Michael:

Jokes: A few I liked from the dad joke list.

  • What do you call a 3.14 foot long snake? A π-thon
  • What if it’s 3.14 inches, instead of feet? A μ-π-thon
  • Why doesn't Hollywood make more Big Data movies? NoSQL.
  • Why didn't the div get invited to the dinner party? Because it had no class.