VisualAnalysis

class pandas_visual_analysis.visual_analysis.VisualAnalysis(data, layout='default', categorical_columns=None, row_height=400, sample=None, select_color='#323EEC', deselect_color='#8A8C93', alpha=0.75, seed=None)

Bases: object

Generate plots that support linked-brushing from a pandas DataFrame and display them in Jupyter notebooks.

Parameters
  • data (Union[DataFrame, DataSource]) – A pandas.DataFrame object or a DataSource.

  • layout (Union[str, List[List[str]]]) – Layout specification name or explicit definition of widget names in rows. Defaults to ‘default’.

  • categorical_columns (Optional[List[str]]) – If given, specifies which columns are to be interpreted as categorical. These columns have to include all columns of the DataFrame that have type object, str, bool or category, so the parameter can only add columns that do not already have one of these types. Defaults to None.

  • row_height (Union[int, List[int]]) – Height in pixels of each row. If given an integer, every row has that height; if given a list of integers, each value in the list specifies the height of the corresponding row. Defaults to 400.

  • sample (Union[float, int, None]) – Int or float value specifying whether the DataFrame should be sub-sampled. When an int is given, the DataFrame is limited to that number of rows. When a float is given, the DataFrame is limited to that fraction of its rows. Defaults to None.

  • select_color (Union[str, Tuple[int, int, int]]) – RGB tuple or hex color specifying the color used to display selected data points. Values in the tuple have to be between 0 and 255 inclusive; a hex string has to convert to such RGB values. Defaults to ‘#323EEC’.

  • deselect_color (Union[str, Tuple[int, int, int]]) – RGB tuple or hex color specifying the color used to display deselected data points. Values in the tuple have to be between 0 and 255 inclusive; a hex string has to convert to such RGB values. Defaults to ‘#8A8C93’.

  • alpha (float) – Opacity of data points where applicable, ranging from 0.0 to 1.0 inclusive. Defaults to 0.75.

  • seed (Optional[int]) – Random seed used for sampling the data. Values can be any integer between 0 and 2**32 - 1 inclusive or None. Defaults to None.

static widgets()

Returns
  All the widget names that are available as input to a layout.

Advanced Usage of VisualAnalysis

To get started, see the basic usage guide.

Row Height

The row_height parameter enables control over the height of the widgets in rows.

When the value is an integer, all rows will have that height in pixels:

from pandas_visual_analysis import VisualAnalysis
# df is assumed to be an existing pandas DataFrame
VisualAnalysis(df, row_height=300)

When the parameter is a list of integers, each row will have the height of the value in the list at that position:

from pandas_visual_analysis import VisualAnalysis
VisualAnalysis(df, layout=[["Scatter", "Scatter"], ["ParallelCoordinates"]],
               row_height=[200, 300])

Here, the first row with the two scatter plots will have a height of 200 pixels, while the second row with the parallel coordinates plot will have a height of 300 pixels.
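
The mapping from row_height to individual rows can be sketched in plain Python. The helper resolve_row_heights below is hypothetical and is not part of pandas_visual_analysis; it also assumes that a list must supply exactly one height per layout row, which the text above does not state explicitly.

```python
from typing import List, Union


def resolve_row_heights(layout: List[List[str]],
                        row_height: Union[int, List[int]]) -> List[int]:
    """Return one height in pixels per layout row (hypothetical helper)."""
    if isinstance(row_height, int):
        # A single integer applies to every row.
        return [row_height] * len(layout)
    heights = list(row_height)
    if len(heights) != len(layout):
        # Assumption: a list needs exactly one entry per row.
        raise ValueError("row_height needs one entry per layout row")
    return heights


layout = [["Scatter", "Scatter"], ["ParallelCoordinates"]]
print(resolve_row_heights(layout, 300))         # [300, 300]
print(resolve_row_heights(layout, [200, 300]))  # [200, 300]
```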

Sample

The sample parameter accepts either an integer or a float value between 0.0 and 1.0.

When an integer value is passed, the DataFrame is sampled to contain that many rows:

>>> from pandas_visual_analysis import VisualAnalysis
>>> v = VisualAnalysis(df, sample=100)
>>> len(v.data_source)
100

The resulting analysis will only show 100 data points sampled from df. Because the rows are drawn from df, the integer cannot be larger than the number of rows in the initial DataFrame.

When a float is passed, the DataFrame is sampled to contain the fraction of rows given by the value:

from pandas_visual_analysis import VisualAnalysis
VisualAnalysis(df, sample=0.5)

Assuming the passed DataFrame originally contained 300 rows, the analysis will only show 150 of them.
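
The two interpretations of sample can be summarised as a small calculation. The function expected_rows below is a hypothetical helper written for illustration, not the library's implementation; in particular, rounding the product for fractions that do not divide evenly is an assumption.

```python
from typing import Optional, Union


def expected_rows(n_rows: int, sample: Optional[Union[int, float]]) -> int:
    """Number of rows shown after sub-sampling (hypothetical helper)."""
    if sample is None:
        # No sub-sampling: every row is kept.
        return n_rows
    if isinstance(sample, int):
        # An integer is an absolute row count and cannot exceed the data.
        if sample < 0 or sample > n_rows:
            raise ValueError("integer sample must be between 0 and n_rows")
        return sample
    # A float is a fraction of the rows.
    if not 0.0 <= sample <= 1.0:
        raise ValueError("float sample must be between 0.0 and 1.0")
    # Assumption: the fractional row count is rounded to an integer.
    return round(n_rows * sample)


print(expected_rows(300, 100))  # 100
print(expected_rows(300, 0.5))  # 150
```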

It is also possible to pass a seed used for sampling:

from pandas_visual_analysis import VisualAnalysis
VisualAnalysis(df, sample=0.5, seed=17)

The value of the seed has to be an integer between 0 and 2**32-1 inclusive.
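
The effect of a fixed seed is that repeated runs select the same rows. The sketch below illustrates this with the standard library's random module rather than the library's own sampling code; sample_indices and MAX_SEED are hypothetical names introduced only for this example.

```python
import random
from typing import List, Optional

MAX_SEED = 2**32 - 1  # upper bound for the seed, as stated above


def sample_indices(n_rows: int, k: int, seed: Optional[int] = None) -> List[int]:
    """Pick k distinct row positions out of n_rows (hypothetical helper)."""
    if seed is not None and not 0 <= seed <= MAX_SEED:
        raise ValueError("seed must be between 0 and 2**32 - 1 inclusive")
    # A seeded generator yields the same selection on every run;
    # seed=None falls back to a non-reproducible selection.
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_rows), k))


first = sample_indices(300, 150, seed=17)
second = sample_indices(300, 150, seed=17)
print(first == second)  # True
```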

Colors

Instead of using the default colors, it is possible to pass custom colors to the VisualAnalysis object. The select_color and deselect_color parameters specify the colors used to represent selected and deselected data points. They can either be hex strings (like ‘#323EEC’) or tuples representing RGB values (like (50, 62, 236)).

The parameter alpha specifies the opacity of the data points when applicable and is specified by a float value between 0.0 and 1.0. A value of 0.0 means transparent and 1.0 means opaque.

from pandas_visual_analysis import VisualAnalysis
VisualAnalysis(df, select_color='#323EEC', deselect_color='#8A8C93', alpha=0.75)
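
The relationship between the two accepted color formats can be sketched as follows. The function to_rgb is a hypothetical helper, not a function of pandas_visual_analysis, and it assumes six-digit hex strings with an optional leading ‘#’.

```python
from typing import Tuple, Union

Color = Union[str, Tuple[int, int, int]]


def to_rgb(color: Color) -> Tuple[int, int, int]:
    """Convert a hex string or RGB tuple to a validated RGB tuple (hypothetical helper)."""
    if isinstance(color, str):
        digits = color.lstrip("#")
        if len(digits) != 6:
            # Assumption: only six-digit hex strings are accepted.
            raise ValueError("expected a hex string like '#323EEC'")
        # Each pair of hex digits encodes one channel.
        rgb = tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))
    else:
        rgb = tuple(color)
    if len(rgb) != 3 or any(not 0 <= c <= 255 for c in rgb):
        raise ValueError("RGB values must be three integers from 0 to 255")
    return rgb


print(to_rgb("#323EEC"))       # (50, 62, 236)
print(to_rgb((138, 140, 147)))  # (138, 140, 147)
```

Both defaults round-trip through this conversion: ‘#323EEC’ corresponds to (50, 62, 236) and ‘#8A8C93’ to (138, 140, 147).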