Quit Emailing Yourself

# pandas

10 links tagged with pandas


Links

The Journey to Zero-Copy: How chDB Became the Fastest SQL Engine on Pandas DataFrame

chDB packages ClickHouse as an in-process Python library that runs SQL queries directly on pandas DataFrames, with no serialization overhead. The latest version is reported to be 87 times faster than its predecessor, a gain attributed to zero-copy data handling and optimized processing.

Saved by markshervey · Last saved January 12, 2026 · 6 min read

Tags: clickhouse, pandas, sql, data-science, performance

https://thenewstack.io/python-pandas-ditches-numpy-for-speedier-pyarrow/

The article reports that Python's pandas library is shifting from NumPy toward the faster PyArrow for data processing. In pandas 2.x, PyArrow is available as an optional backend while NumPy remains the default. The change aims to improve performance and memory efficiency when handling large datasets.

Saved by tldr-importer · Last saved October 29, 2025 · 1 min read

Tags: python, pandas, pyarrow, numpy, data-processing

GitHub - sinaptik-ai/pandas-ai: Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.

PandasAI is a Python library that allows users to interact with data using natural language queries, catering to both technical and non-technical users. It supports various functionalities such as generating charts, working with multiple dataframes, and running in a secure Docker environment. The library can be installed via pip or poetry and is compatible with Python versions 3.8 to 3.11.

Saved by tldr-importer · Last saved October 29, 2025 · 2 min read

Tags: pandas, python, natural-language-processing, data-analysis, docker

GitHub - olooney/jellyjoin: jellyjoin Python package for soft joins with embedding vectors

Jellyjoin is a tool designed for performing "soft joins" on dataframes or lists by measuring semantic similarity rather than exact matches. It utilizes OpenAI embedding models for high-quality matches but falls back on traditional string similarity metrics when necessary. Users can customize similarity strategies and visualize associations through simple Pandas DataFrame outputs.

Saved by tldr-importer · Last saved October 29, 2025 · 4 min read

Tags: jellyjoin, semantic-similarity, dataframes, openai, pandas
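
To make the "soft join" idea concrete, here is a minimal sketch of the string-similarity fallback using only the standard library's `difflib`. It is not jellyjoin's API: the `similarity` and `soft_join` helpers, the 0.6 threshold, and the sample names are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def soft_join(left, right, threshold=0.6):
    """Pair each left item with its most similar right item, if similar enough."""
    pairs = []
    for l_item in left:
        # Pick the closest candidate on the right side.
        best = max(right, key=lambda r_item: similarity(l_item, r_item))
        score = similarity(l_item, best)
        # Keep the pair only when it clears the (hypothetical) threshold.
        if score >= threshold:
            pairs.append((l_item, best, round(score, 2)))
    return pairs


left = ["Acme Corp", "Globex Inc", "Initech"]
right = ["ACME Corporation", "Globex Incorporated", "Umbrella"]
matches = soft_join(left, right)
for row in matches:
    print(row)
```

An embedding-based strategy would swap `similarity` for a cosine similarity over vectors; the join loop itself would stay the same.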

GitHub - unionai-oss/pandera: A light-weight, flexible, and expressive statistical data testing library

Pandera is an open-source project by Union.ai that offers a flexible API for validating dataframe-like objects, enhancing data processing pipelines with statistically typed dataframes. It supports various libraries such as pandas and polars, and provides both object-based and class-based validation methods. Users are advised to import from the `pandera.pandas` module to avoid future deprecation issues.

Saved by tldr-importer · Last saved October 29, 2025 · 1 min read

Tags: data-validation, pandas, open-source, dataframes, schema

GitHub - narwhals-dev/narwhals: Lightweight and extensible compatibility layer between dataframe libraries!

Narwhals is a lightweight and extensible compatibility layer that enables seamless integration between various dataframe libraries such as cuDF, Modin, pandas, Polars, and PyArrow. It allows users to write dataframe-agnostic code with zero dependencies, facilitating the use of expressions and maintaining compatibility with complex types and indices. Users can easily wrap and unwrap dataframes while leveraging full static typing and efficient performance across different libraries.

Saved by tldr-importer · Last saved October 29, 2025 · 3 min read

Tags: dataframe, compatibility, narwhals, pandas, polars

[no-title]

The article presents a collection of 20 one-liners in Python using the Pandas library that can streamline data manipulation tasks. These concise snippets are designed to enhance efficiency and simplify complex operations, making them valuable for data analysts and programmers.

Saved by tldr-importer · Last saved October 29, 2025 · 1 min read

Tags: pandas, python, data-manipulation, programming, one-liners

How to Spot (and Fix) 5 Common Performance Bottlenecks in pandas Workflows | NVIDIA Technical Blog

The article discusses five common performance bottlenecks in pandas workflows, providing solutions for each issue, including using faster parsing engines, optimizing joins, and leveraging GPU acceleration with cudf.pandas for significant speed improvements. It also highlights how users can access GPU resources for free on Google Colab, allowing for enhanced data processing capabilities without code modifications.

Saved by tldr-importer · Last saved October 29, 2025 · 5 min read

Tags: pandas, performance, gpu, data-processing, acceleration

Convert CSV to Excel with DuckDB, Polars, etc. - Confessions of a Data Guy

The article discusses methods for converting CSV and TXT files to Excel format using various tools like Pandas, DuckDB, and Polars. It emphasizes the need for efficient and concise code solutions for this common task, highlighting the simplicity of some one-liner approaches. The author expresses a preference for minimal coding effort while achieving the desired outcome.

Saved by tldr-importer · Last saved October 29, 2025 · 1 min read

Tags: csv, excel, duckdb, polars, pandas

3 pandas Workflows That Slowed to a Crawl on Large Datasets—Until We Turned on GPUs | NVIDIA Technical Blog

Many pandas workflows slow down significantly on large datasets, which frustrates data analysts. With NVIDIA's GPU-accelerated cuDF library, common tasks such as analyzing stock prices, processing text-heavy job postings, and building interactive dashboards can run up to 20 times faster. In addition, Unified Virtual Memory allows processing datasets larger than the GPU's memory, which simplifies the workflow for users.

Saved by tldr-importer · Last saved October 29, 2025 · 3 min read

Tags: pandas, gpu, cudf, data-analysis, performance