Faster than Pandas
How to Speed Up Pandas with Modin
ref: Towards Data Science
The pandas library provides easy-to-use data structures, such as the DataFrame, along with tools for data analysis. One limitation of pandas is that it wasn't designed for analyzing very large datasets, on the order of 100 GB or 1 TB.
Fortunately, there is the Modin library. It can handle the datasets that pandas can't.
Modin is a drop-in replacement for pandas. While pandas is single-threaded, Modin lets you instantly speed up your workflows by scaling pandas so it uses all of your cores. Modin works especially well on larger datasets, where pandas becomes painfully slow or runs out of memory.
By simply replacing the import statement, Modin offers users effortless speed and scale for their pandas workflows.
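As a minimal sketch of that one-line change: `import modin.pandas as pd` is the import Modin documents, while the fallback to plain pandas and the tiny in-memory CSV are assumptions added here so the snippet runs even where Modin (and an engine such as Ray or Dask) is not installed.

```python
import io

# The only line that changes in an existing pandas script is the import.
# The try/except fallback is an illustrative assumption, not part of Modin.
try:
    import modin.pandas as pd  # requires modin plus an engine such as Ray or Dask
    BACKEND = "modin"
except ImportError:
    import pandas as pd
    BACKEND = "pandas"

# Made-up sample data; a real workload would read a large file from disk.
csv_text = "city,sales\nOslo,10\nBergen,20\nOslo,5\n"

# Everything below is ordinary pandas code and is unchanged by the swap.
df = pd.read_csv(io.StringIO(csv_text))
totals = df.groupby("city")["sales"].sum()

print(BACKEND, totals.to_dict())
```

On data this small there is no speedup to observe; the point is only that the rest of the script keeps the pandas API.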
The charts in the original article show the speedup you get by replacing pandas with Modin.
Faster Than Pandas with Polars
ref: Python in Office
Libraries in comparison: polars, modin, datatable
Results:
polars outperformed the other libraries in most of our tests. Some of the highlights include:
- ~17x faster than pandas when reading CSV files
- ~10x faster than pandas when merging two DataFrames
- ~2-3x faster than pandas for our other tests
The results suggest that replacing pandas with polars will likely make our Python programs at least 2-3 times faster, with larger gains for CSV reads and merges.
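The read and merge comparisons above can be tried in miniature with a sketch like the following. It is not the benchmark from the referenced article: the synthetic data, the `best_time` helper, and the row count are all assumptions made for illustration, and the polars part is skipped if polars is not installed. Timings will vary by machine and data size, so no expected numbers are given.

```python
import io
import time

import pandas as pd

try:
    import polars as pl
except ImportError:  # polars is optional for this sketch
    pl = None


def best_time(fn, repeats=3):
    """Return the fastest wall-clock time in seconds over several runs of fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Synthetic CSV data (hypothetical); encoded once so both libraries read the same bytes.
n = 20_000
left_bytes = ("id,x\n" + "\n".join(f"{i},{i * 2}" for i in range(n)) + "\n").encode()
right_bytes = ("id,y\n" + "\n".join(f"{i},{i % 7}" for i in range(n)) + "\n").encode()

results = {}

# pandas: read a CSV, then merge two DataFrames on a shared key.
results["pandas read_csv"] = best_time(lambda: pd.read_csv(io.BytesIO(left_bytes)))
pd_left = pd.read_csv(io.BytesIO(left_bytes))
pd_right = pd.read_csv(io.BytesIO(right_bytes))
results["pandas merge"] = best_time(lambda: pd_left.merge(pd_right, on="id"))

# polars: the equivalent calls are read_csv and join.
if pl is not None:
    results["polars read_csv"] = best_time(lambda: pl.read_csv(io.BytesIO(left_bytes)))
    pl_left = pl.read_csv(io.BytesIO(left_bytes))
    pl_right = pl.read_csv(io.BytesIO(right_bytes))
    results["polars join"] = best_time(
        lambda: pl_left.join(pl_right, on="id", how="inner")
    )

for name, seconds in results.items():
    print(f"{name}: {seconds:.4f} s")
```

Taking the minimum over a few repeats reduces noise from other processes; for a serious comparison, larger files read from disk would be more representative than in-memory buffers.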