parallel_apply_over_df

sitelle.parallel.parallel_apply_over_df(df, func, axis=1, broadcast=False, raw=False, reduce=None, args=(), **kwargs)

Applies a function to each line of a pandas DataFrame, distributing the work over multiple cores. The signature is similar to that of pandas.DataFrame.apply.

Parameters:
  • df (DataFrame) – The input dataframe
  • func (callable) – The function to apply to each line (should accept a Series as input)
  • axis ({0 or 'index', 1 or 'columns'}, default 1) –
    0 or ‘index’: apply function to each column
    1 or ‘columns’: apply function to each row
  • broadcast (boolean, default False) – For aggregation functions, return object of same size with values propagated
  • raw (boolean, default False) – If False, convert each row or column into a Series. If True, the passed function receives ndarray objects instead. If you are just applying a NumPy reduction function, this gives much better performance
  • reduce (boolean or None, default None) – Try to apply reduction procedures. If the DataFrame is empty, apply will use reduce to determine whether the result should be a Series or a DataFrame. If reduce is None (the default), apply’s return value will be guessed by calling func on an empty Series (note: while guessing, exceptions raised by func will be ignored). If reduce is True, a Series will always be returned; if False, a DataFrame will always be returned
  • modules (tuple of strings) – The import statements that must be executed for func to work correctly. Example: (‘import numpy as np’,)
  • depfuncs (tuple of strings) – The functions used by func but defined outside of its body
  • args – Additional arguments to be passed to func
  • kwargs – Additional keyword arguments to be passed to func
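
Example:

The split-apply-concatenate pattern that a parallel row-wise apply typically follows can be sketched with pandas and the standard library alone. This is an illustration under stated assumptions, not the sitelle implementation: chunked_apply and flux_ratio are hypothetical names, and a thread pool stands in for the worker processes so that the snippet runs anywhere. A CPU-bound func would need a process pool (and a picklable func) to actually use several cores.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd


def chunked_apply(df, func, n_chunks=4, args=(), **kwargs):
    """Hypothetical helper (not part of sitelle): apply func to every row
    of df by splitting the rows into blocks handled by separate workers."""
    # Split the row positions into contiguous, roughly equal blocks.
    positions = np.array_split(np.arange(len(df)), n_chunks)
    blocks = [df.iloc[idx] for idx in positions if len(idx)]
    if not blocks:
        # Empty input: fall back to the plain pandas behaviour.
        return df.apply(func, axis=1, args=args, **kwargs)

    def work(block):
        # Each worker runs an ordinary row-wise DataFrame.apply on its block.
        return block.apply(func, axis=1, args=args, **kwargs)

    # Assumption: threads are used only to keep the sketch portable.
    with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
        parts = list(pool.map(work, blocks))

    # pool.map preserves block order, so the original row order is kept.
    return pd.concat(parts)


def flux_ratio(row, scale):
    # row is a pandas Series holding one line of the DataFrame.
    return scale * row["a"] / row["b"]


df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0, 5.0],
                   "b": [2.0, 4.0, 6.0, 8.0, 10.0]})

serial = df.apply(flux_ratio, axis=1, args=(10.0,))
parallel = chunked_apply(df, flux_ratio, n_chunks=2, args=(10.0,))
```

Here serial and parallel hold the same values in the same row order. As in pandas.DataFrame.apply, extra positional arguments reach func through args and extra keyword arguments through kwargs.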