==== common pandas data types ====

^ data type ^ description ^ supports missing values ^
| float | The NumPy float type | Yes |
| int | The NumPy integer type | No |
| 'Int64' | pandas nullable integer type | Yes |
| object | The NumPy type for storing strings (and mixed types) | |
| 'category' | pandas categorical type | Yes |
| bool | The NumPy Boolean type | No. \\ None becomes False, np.nan becomes True. |
| 'boolean' | pandas nullable Boolean type | Yes |
| datetime64[ns] | The NumPy date type | Yes (NaT) |

Ref:- (Pandas 1.x Cookbook, by Matt Harrison and Theodore Petrou, second edition, published in 2020) -> Chapter 1 -> page-7

==== What packages does pandas depend on? ====

  * Dependencies - https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html#dependencies
  * Recommended dependencies - https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html#recommended-dependencies
  * Optional dependencies - https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html#optional-dependencies

==== links I came across ====

  * https://github.com/pandas-dev/pandas/releases - pandas release history
  * http://pandas.pydata.org/pandas-docs/stable/getting_started/install.html - pandas installation page. Contains instructions to install pandas in various ways.
  * http://pandas.pydata.org/pandas-docs/stable/ - pandas official documentation
  * https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html - good reference to learn about iloc, slicing ranges
  * DuckDB can run SQL queries directly on Parquet files and automatically take advantage of the advanced features of the Parquet format.
    * https://duckdb.org/2021/06/25/querying-parquet.html - gives more details
    * tags | pandas, streaming

==== pandas.errors.EmptyDataError ====

Sample code to generate EmptyDataError exception

  >>> import pandas as pd
  >>> from io import StringIO
  >>> empty = StringIO()
  >>> pd.read_csv(empty)
  Traceback (most recent call last):
  ...
  pandas.errors.EmptyDataError: No columns to parse from file

Ref: https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.errors.EmptyDataError.html - Got the initial version of the code from here.

==== pandas.io.common.EmptyDataError is deprecated ====

Instead of ''pandas.io.common.EmptyDataError'' use ''pandas.errors.EmptyDataError''

data point | as of pandas 1.1.2, ''pandas.io.common.EmptyDataError'' does not work.

Ref:-
  * https://pandas.pydata.org/docs/whatsnew/v0.20.0.html#pandas-errors
    * > We are adding a standard public module for all pandas exceptions & warnings pandas.errors. (GH14800). Previously these exceptions & warnings could be imported from pandas.core.common or pandas.io.common. These exceptions and warnings will be removed from the *.common locations in a future release. (GH15541)
  * https://pandas.pydata.org/docs/reference/api/pandas.errors.EmptyDataError.html - documentation from latest stable version
  * https://pandas.pydata.org/pandas-docs/version/1.4/reference/api/pandas.errors.EmptyDataError.html - documentation from version 1.4
  * https://pandas.pydata.org/pandas-docs/version/0.20/generated/pandas.errors.EmptyDataError.html - documentation from version 0.20

==== assert_frame_equal ====

Instead of ''from pandas.util.testing import assert_frame_equal'' use ''from pandas.testing import assert_frame_equal''

data point: Using ''from pandas.util.testing import assert_frame_equal'' in pandas 1.1.2, I get ''FutureWarning: pandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.''
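
A minimal sketch that exercises both public import locations noted on this page, ''pandas.errors'' and ''pandas.testing'', in one place. The helper name ''read_csv_or_empty'' and the fallback to an empty DataFrame are assumptions made for illustration only; they are not part of pandas.

```python
from io import StringIO

import pandas as pd
from pandas.errors import EmptyDataError
from pandas.testing import assert_frame_equal


def read_csv_or_empty(buffer):
    """Hypothetical helper: fall back to an empty DataFrame when there is nothing to parse."""
    try:
        return pd.read_csv(buffer)
    except EmptyDataError:
        return pd.DataFrame()


# An empty buffer raises EmptyDataError inside the helper, which is caught there.
empty_result = read_csv_or_empty(StringIO())
assert_frame_equal(empty_result, pd.DataFrame())

# A non-empty buffer is parsed as usual.
parsed = read_csv_or_empty(StringIO("a,b\n1,2\n"))
expected = pd.DataFrame({"a": [1], "b": [2]})
assert_frame_equal(parsed, expected)

# assert_frame_equal raises AssertionError when the frames differ.
try:
    assert_frame_equal(parsed, pd.DataFrame({"a": [1], "b": [3]}))
except AssertionError:
    frames_differ = True
else:
    frames_differ = False
```

Whether an empty DataFrame is the right fallback depends on the caller; re-raising or logging the exception may be a better fit in other situations.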