VisualAnalysis
class pandas_visual_analysis.visual_analysis.VisualAnalysis(data, layout='default', categorical_columns=None, row_height=400, sample=None, select_color='#323EEC', deselect_color='#8A8C93', alpha=0.75, seed=None)
Bases: object
Generate plots that support linked-brushing from a pandas DataFrame and display them in Jupyter notebooks.
Parameters
- data (Union[DataFrame, DataSource]) – A pandas.DataFrame object or a DataSource.
- layout (Union[str, List[List[str]]]) – Layout specification name or explicit definition of widget names in rows. Defaults to 'default'.
- categorical_columns (Optional[List[str]]) – If given, specifies which columns are to be interpreted as categorical. These columns have to include all columns of the DataFrame that have type object, str, bool or category, so the parameter can only add columns that do not already have one of those types. Defaults to None.
- row_height (Union[int, List[int]]) – Height in pixels of each row. If given an integer, every row has that height; if given a list of integers, each value specifies the height of the corresponding row. Defaults to 400.
- sample (Union[float, int, None]) – Int or float value specifying whether the DataFrame should be sub-sampled. When an int is given, the DataFrame is limited to that number of rows. When a float is given, the DataFrame is limited to that fraction of its rows. Defaults to None.
- select_color (Union[str, Tuple[int, int, int]]) – RGB tuple or hex color string specifying the color used to display selected data points. Tuple values have to be between 0 and 255 inclusive; a hex string has to convert to such RGB values. Defaults to '#323EEC'.
- deselect_color (Union[str, Tuple[int, int, int]]) – RGB tuple or hex color string specifying the color used to display deselected data points. Tuple values have to be between 0 and 255 inclusive; a hex string has to convert to such RGB values. Defaults to '#8A8C93'.
- alpha (float) – Opacity of data points, where applicable, ranging from 0.0 to 1.0 inclusive. Defaults to 0.75.
- seed (Optional[int]) – Random seed used for sampling the data. Can be None or any integer between 0 and 2**32 - 1 inclusive. Defaults to None.
Advanced Usage of VisualAnalysis
For getting started see the basic usage guide.
Row Height
The row_height parameter enables control over the height of the widgets in rows.
When the value is an integer, all rows will have that height in pixels:
from pandas_visual_analysis import VisualAnalysis
VisualAnalysis(df, row_height=300)
When the parameter is a list of integers, each row will have the height of the value in the list at that position:
from pandas_visual_analysis import VisualAnalysis
VisualAnalysis(df, layout=[["Scatter", "Scatter"], ["ParallelCoordinates"]],
               row_height=[200, 300])
Here, the first row with the two scatter plots will have a height of 200 pixels, while the second row with the parallel coordinates plot will have a height of 300 pixels.
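The mapping from row_height to per-row heights can be sketched in plain Python. This is only an illustration of the behavior described above, not the library's implementation: resolve_row_heights is a hypothetical helper, and the requirement that a list contain exactly one height per layout row is an assumption.

```python
from typing import List, Union


def resolve_row_heights(layout: List[List[str]],
                        row_height: Union[int, List[int]]) -> List[int]:
    """Return one pixel height per layout row (hypothetical helper)."""
    n_rows = len(layout)
    if isinstance(row_height, int):
        # A single integer applies to every row.
        return [row_height] * n_rows
    # Assumption: a list must supply exactly one height per row.
    if len(row_height) != n_rows:
        raise ValueError(f"expected {n_rows} heights, got {len(row_height)}")
    return list(row_height)


layout = [["Scatter", "Scatter"], ["ParallelCoordinates"]]
print(resolve_row_heights(layout, 300))         # [300, 300]
print(resolve_row_heights(layout, [200, 300]))  # [200, 300]
```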
Sample
The sample parameter accepts either an integer or a float between 0.0 and 1.0.
When an integer value is passed, the DataFrame is sampled to contain that many rows:
>>> from pandas_visual_analysis import VisualAnalysis
>>> v = VisualAnalysis(df, sample=100)
>>> len(v.data_source)
100
The resulting analysis will only show 100 data points sampled from df. Because the rows are drawn from df itself, the integer cannot be larger than the number of rows in the original DataFrame.
When a float is passed, the DataFrame is sampled to contain the fraction of rows given by the value.
from pandas_visual_analysis import VisualAnalysis
VisualAnalysis(df, sample=0.5)
Assuming the passed DataFrame originally contained 300 rows, the analysis will only show 150 of them.
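How the two forms of sample translate into a row count can be sketched as follows. The helper sampled_row_count is hypothetical and is not part of pandas_visual_analysis; in particular, rounding a fractional count to the nearest integer is an assumption, since the text above only covers the case where the fraction divides evenly.

```python
from typing import Union


def sampled_row_count(n_rows: int, sample: Union[int, float, None]) -> int:
    """Number of rows left after sampling (hypothetical helper)."""
    if sample is None:
        # No sampling requested: keep every row.
        return n_rows
    if isinstance(sample, float):
        if not 0.0 <= sample <= 1.0:
            raise ValueError("a float sample must be between 0.0 and 1.0")
        # Assumption: a fractional row count is rounded to an integer.
        return round(n_rows * sample)
    if not 0 <= sample <= n_rows:
        raise ValueError("an integer sample cannot exceed the number of rows")
    return sample


print(sampled_row_count(300, 0.5))   # 150
print(sampled_row_count(300, 100))   # 100
print(sampled_row_count(300, None))  # 300
```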
It is also possible to pass a seed used for sampling:
from pandas_visual_analysis import VisualAnalysis
VisualAnalysis(df, sample=0.5, seed=17)
The value of the seed has to be an integer between 0 and 2**32-1 inclusive.
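The effect of a fixed seed can be illustrated without the library. In this sketch, Python's standard random module stands in for whatever random number generator the library uses internally, and sample_indices is a hypothetical helper; the point is only that the same seed yields the same selection of rows.

```python
import random

MAX_SEED = 2**32 - 1  # upper bound stated for the seed parameter


def sample_indices(n_rows, k, seed=None):
    """Pick k distinct row indices reproducibly (hypothetical helper)."""
    if seed is not None and not 0 <= seed <= MAX_SEED:
        raise ValueError("seed must be between 0 and 2**32-1 inclusive")
    rng = random.Random(seed)  # stand-in generator, an assumption
    return sorted(rng.sample(range(n_rows), k))


first = sample_indices(300, 150, seed=17)
second = sample_indices(300, 150, seed=17)
print(first == second)  # True: same seed, same rows
print(len(first))       # 150
```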
Colors
Instead of using the default colors, it is possible to pass custom colors to the VisualAnalysis object.
The select_color and deselect_color parameters specify the colors used to represent selected and deselected data points. Each can be either a hex string (like '#323EEC') or a tuple of RGB values (like (50, 62, 236)).
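The two color formats are interchangeable, which a short conversion sketch makes concrete. The function to_rgb below is a hypothetical helper written for illustration, not a function exported by pandas_visual_analysis; it only applies the rule that each RGB component must lie between 0 and 255 inclusive.

```python
from typing import Tuple, Union

Color = Union[str, Tuple[int, int, int]]


def to_rgb(color: Color) -> Tuple[int, int, int]:
    """Normalize a hex string or RGB tuple to an RGB tuple (hypothetical)."""
    if isinstance(color, str):
        hex_digits = color.lstrip("#")
        if len(hex_digits) != 6:
            raise ValueError("hex color must have six hex digits")
        # Each pair of hex digits encodes one 0-255 component.
        rgb = tuple(int(hex_digits[i:i + 2], 16) for i in (0, 2, 4))
    else:
        rgb = tuple(color)
    if len(rgb) != 3 or any(not 0 <= c <= 255 for c in rgb):
        raise ValueError("RGB values must be three integers in 0..255")
    return rgb


print(to_rgb("#323EEC"))        # (50, 62, 236)
print(to_rgb("#8A8C93"))        # (138, 140, 147)
print(to_rgb((50, 62, 236)))    # (50, 62, 236)
```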
The alpha parameter specifies the opacity of the data points, where applicable, as a float between 0.0 and 1.0. A value of 0.0 means fully transparent and 1.0 means fully opaque.
from pandas_visual_analysis import VisualAnalysis
VisualAnalysis(df, select_color='#323EEC', deselect_color='#8A8C93', alpha=0.75)
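To see how a color and an opacity fit together, the sketch below combines an RGB tuple and an alpha value into a CSS-style rgba() string. Both the rgba_string helper and the choice of rgba() notation are assumptions made for illustration; the documentation above does not say how the library represents colors internally.

```python
def rgba_string(rgb, alpha):
    """Combine an RGB tuple and an opacity into 'rgba(r, g, b, a)' (hypothetical)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0 inclusive")
    r, g, b = rgb
    return f"rgba({r}, {g}, {b}, {alpha})"


# Default select_color with the default alpha.
print(rgba_string((50, 62, 236), 0.75))  # rgba(50, 62, 236, 0.75)
```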