DataSource

class pandas_visual_analysis.data_source.DataSource(df, categorical_columns=None, sample=None, seed=None, *args, **kwargs)

Bases: object

The DataSource object provides the data itself to the plots and also manages the brushing between the plots. If the plots observe the brushed_indices property of this class, they can react to any change in the data. It is also possible to set the brushed_indices property to trigger the change in any instances that observe this property. In addition to the brushed indices, this class also provides the brushed data directly, which is cached to speed up subsequent access to the data.

Parameters
  • df (DataFrame) – A pandas.DataFrame object.

  • categorical_columns (Optional[List[str]]) – If given, specifies which columns are interpreted as categorical. The list must include every column of the DataFrame whose type is object, str, bool or category. In other words, it can only add columns that do not already have one of these types.

  • seed (Optional[int]) – Random seed used for sampling the data. Values can be any integer between 0 and 2**32 - 1 inclusive or None.

  • args – args for HasTraits superclass

  • kwargs – kwargs for HasTraits superclass
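
The rule for categorical_columns can be read as a superset check. The following is only a sketch of that rule, not the library's code: check_categorical_columns is a hypothetical helper, the column types are passed as a plain dict of strings instead of a DataFrame, and raising ValueError as well as the behaviour for None are assumptions.

```python
# Hypothetical helper, not part of pandas_visual_analysis.
# Type names that the parameter description treats as inherently categorical.
CATEGORICAL_DTYPES = {"object", "str", "bool", "category"}


def check_categorical_columns(dtypes, categorical_columns=None):
    """Return the effective categorical columns or raise ValueError.

    dtypes maps column name -> type name (a simplified, assumed stand-in
    for DataFrame.dtypes).
    """
    required = [name for name, dtype in dtypes.items() if dtype in CATEGORICAL_DTYPES]
    if categorical_columns is None:
        # Assumption: without an explicit list, only the required columns count.
        return required
    unknown = [name for name in categorical_columns if name not in dtypes]
    if unknown:
        raise ValueError(f"columns not in the data: {unknown}")
    missing = [name for name in required if name not in categorical_columns]
    if missing:
        raise ValueError(f"categorical_columns must also include: {missing}")
    return list(categorical_columns)


dtypes = {"name": "object", "cylinders": "int64", "mpg": "float64"}
# Allowed: "name" is kept and the integer column "cylinders" is added.
effective = check_categorical_columns(dtypes, ["name", "cylinders"])
```

Passing only ["cylinders"] would fail in this sketch, because the object-typed column "name" is left out.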

property brushed_data

The brushed data is recomputed only when newly selected indices have invalidated it. This avoids unnecessary work when only the brushed indices are needed and not the brushed data.

Return type

DataFrame

Returns

The selected data corresponding to the indices.
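
The interplay of observed indices and lazily cached data can be illustrated without the library. The sketch below is a simplified stand-in, not the actual implementation: BrushSketch, its observe method and the computations counter are hypothetical names, and the data is a plain list instead of a DataFrame.

```python
class BrushSketch:
    """Hypothetical stand-in for the brushing behaviour described above."""

    def __init__(self, rows):
        self._rows = list(rows)              # the full data, one item per row
        self._brushed_indices = set(range(len(self._rows)))
        self._brushed_cache = None           # None means "invalidated"
        self._observers = []
        self.computations = 0                # counts cache rebuilds, for illustration

    def observe(self, callback):
        # Register a callable that receives the new set of indices.
        self._observers.append(callback)

    @property
    def brushed_indices(self):
        return self._brushed_indices

    @brushed_indices.setter
    def brushed_indices(self, indices):
        self._brushed_indices = set(indices)
        self._brushed_cache = None           # invalidate, but do not recompute yet
        for callback in self._observers:
            callback(self._brushed_indices)

    @property
    def brushed_data(self):
        if self._brushed_cache is None:      # rebuild only after an invalidation
            self.computations += 1
            self._brushed_cache = [self._rows[i] for i in sorted(self._brushed_indices)]
        return self._brushed_cache


source = BrushSketch(["a", "b", "c", "d"])
seen = []
source.observe(seen.append)
source.brushed_indices = {1, 3}   # notifies observers, leaves the cache empty
first = source.brushed_data       # computed on first access
second = source.brushed_data      # served from the cache
```

An observer that only needs the indices never triggers the selection of the data, which is the efficiency gain the property description refers to.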

property brushed_indices
Return type

Set[int]

Returns

The currently selected indices.

property data
Return type

DataFrame

Returns

The DataFrame for this pandas_visual_analysis.data_source.DataSource object.

property indices
Return type

Set[int]

Returns

All indices of the data frame, that is the set of integers from 0 to len - 1.

property len
Return type

int

Returns

The length of the DataFrame.

static read(path, *args, **kwargs)

Reads the data specified by the path into a DataSource. Infers file type by extension. Supported extensions are: .csv, .tsv and .json.

Parameters
  • path (str) – Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file.

  • args – Arguments passed to inferred methods.

  • kwargs – Keyword arguments passed to inferred methods.

Returns

The DataSource containing the data from the specified file.
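
Inferring the file type by extension amounts to a small dispatch table. The sketch below shows one way this could work. It is not taken from the library: infer_reader and READERS are hypothetical names, and both the case-insensitive matching and the ValueError for unknown extensions are assumptions.

```python
import os

# Hypothetical mapping from extension to the name of the reader that handles it.
READERS = {".csv": "read_csv", ".tsv": "read_tsv", ".json": "read_json"}


def infer_reader(path):
    """Return the reader name for a path, based only on its extension."""
    extension = os.path.splitext(path)[1].lower()
    if extension not in READERS:
        raise ValueError(f"unsupported file extension: {extension!r}")
    return READERS[extension]


reader_for_json = infer_reader("./mpg.json")
reader_for_tsv = infer_reader("./report.TSV")
```

Because the remaining arguments are passed on to the inferred method, they have to match that method's parameters, for example orient for a .json file or header for a .csv or .tsv file.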

static read_csv(path, header=0)

Read a comma-separated values (csv) file into a DataSource.

Parameters
  • path (str) – Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file.

  • header (Optional[int]) – Row (0-indexed) to use for the column labels of the parsed DataFrame. Use None if there is no header.

Returns

The DataSource containing the data from the specified file.

static read_json(path, orient)

Read a json file into a DataSource.

Parameters
  • path (str) – Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file.

  • orient (str) – Indication of expected JSON string format produced by DataFrame.to_json() with a corresponding orient value.

Returns

The DataSource containing the data from the specified file.

static read_tsv(path, header=0)

Read a tab-separated values (tsv) file into a DataSource.

Parameters
  • path (str) – Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file.

  • header (Optional[int]) – Row (0-indexed) to use for the column labels of the parsed DataFrame. Use None if there is no header.

Returns

The DataSource containing the data from the specified file.
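
The meaning of header is easiest to see on a small input. The sketch below imitates the rule with the standard library only, assuming that read_csv and read_tsv hand header to pandas unchanged. parse_delimited is a hypothetical helper and returns plain lists instead of a DataFrame.

```python
import csv
import io


def parse_delimited(text, delimiter="\t", header=0):
    """Return (columns, rows) for delimited text, imitating the header rule."""
    lines = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if header is None:
        # No header row: label columns by position and keep every line as data.
        width = len(lines[0]) if lines else 0
        return list(range(width)), lines
    # Lines before the header row are skipped; data starts right after it.
    return lines[header], lines[header + 1:]


sample = "exported 2020\nname\tmpg\nbuick\t15\nhonda\t33\n"
columns, rows = parse_delimited(sample, header=1)
```

With header=1 the first line of sample is skipped, the second line supplies the column labels and the remaining two lines are data.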

reset_selection()

Resets the selection to its original state, in which all indices are selected.

Returns

None

Advanced Usage of DataSource

For getting started see the basic usage guide.

Read Data

It is possible to read data directly from files or URLs into a DataSource using default settings.

from pandas_visual_analysis import DataSource
ds = DataSource.read_csv("./mpg.csv")

To infer the file type from the extension use the read() method. Supported file types are: .csv, .tsv and .json.

from pandas_visual_analysis import DataSource
ds = DataSource.read("./mpg.json", orient="columns")

For more advanced options, read the data with pandas and pass the resulting DataFrame to the DataSource constructor.

Using DataSource as a context manager

Instead of assigning the DataSource object to a variable, it is also possible to use it as a context manager.

from pandas_visual_analysis import DataSource, VisualAnalysis
with DataSource.read("./report.tsv", header=1) as ds:
    VisualAnalysis(ds)