parallel_apply_over_df

sitelle.parallel.parallel_apply_over_df(df, func, axis=1, broadcast=False, raw=False, reduce=None, args=(), **kwargs)

Iterates over a pandas DataFrame and applies a function to each row (or column, depending on axis), distributing the work across multiple cores. The signature is similar to that of pandas.DataFrame.apply.

Parameters:

df (DataFrame) – The input dataframe
func (callable) – The function to apply to each row or column (should accept a Series as input, or an ndarray if raw=True)
axis ({0 or 'index', 1 or 'columns'}, default 1) –

0 or ‘index’: apply function to each column

1 or ‘columns’: apply function to each row
broadcast (boolean, default False) – For aggregation functions, return object of same size with values propagated
raw (boolean, default False) – If False, convert each row or column into a Series. If raw=True the passed function will receive ndarray objects instead. If you are just applying a NumPy reduction function this will achieve much better performance
reduce (boolean or None, default None) – Try to apply reduction procedures. If the DataFrame is empty, apply will use reduce to determine whether the result should be a Series or a DataFrame. If reduce is None (the default), apply’s return value will be guessed by calling func on an empty Series (note: while guessing, exceptions raised by func will be ignored). If reduce is True a Series will always be returned, and if False a DataFrame will always be returned
modules (tuple of strings) – The import statements required for func to work correctly. Example: (‘import numpy as np’,)
depfuncs (tuple of strings) – The functions used by func but defined outside of its body
args – Additional arguments to be passed to func
kwargs – Additional keyword arguments to be passed to func
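
Example

A minimal sketch of the kind of func this function expects, assuming it behaves like pandas.DataFrame.apply with the same arguments. The column names and the helper flux_ratio are hypothetical, chosen only for illustration. The runnable part uses the serial pandas call as a reference; the parallel call is left commented out, and passing modules as a keyword is an assumption, since it does not appear in the signature above.

```python
import pandas as pd

# Hypothetical input: two numeric columns, one row per measurement.
df = pd.DataFrame({"a": [1.0, 4.0, 9.0], "b": [2.0, 2.0, 3.0]})


def flux_ratio(row):
    # With axis=1 and raw=False, `row` is a Series holding one DataFrame row.
    return row["a"] / row["b"]


# Serial reference result with plain pandas.
serial = df.apply(flux_ratio, axis=1)

# With raw=True the function receives a NumPy ndarray instead of a Series,
# which is usually faster for simple reductions.
row_sums = df.apply(lambda arr: arr.sum(), axis=1, raw=True)

# Assumed parallel equivalent (not executed here):
# from sitelle.parallel import parallel_apply_over_df
# parallel = parallel_apply_over_df(df, flux_ratio, axis=1)
#
# If func relied on NumPy under the alias np, the import would presumably be
# declared for the workers (assumption: modules is accepted as a keyword):
# parallel_apply_over_df(df, func, axis=1, modules=("import numpy as np",))
```

Comparing the parallel result against the serial df.apply output on a small DataFrame is a simple way to check that func, modules and depfuncs are declared correctly before running on a large catalogue.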
