DataSource

class pandas_visual_analysis.data_source.DataSource(df, categorical_columns=None, sample=None, seed=None, *args, **kwargs)

Bases: object

The DataSource object provides the data itself to the plots and also manages the brushing between the plots. If the plots observe the brushed_indices property of this class, they can react to any change in the data. It is also possible to set the brushed_indices property to trigger the change in any instances that observe this property. In addition to the brushed indices, this class also provides the brushed data directly, which is cached to speed up subsequent access to the data.

Parameters

df (DataFrame) – A pandas.DataFrame object.
categorical_columns (Optional[List[str]]) – If given, specifies which columns are interpreted as categorical. The list must include every column of the DataFrame whose type is object, str, bool or category, so in effect it can only add columns that do not already have one of those types.
seed (Optional[int]) – Random seed used for sampling the data. Values can be any integer between 0 and 2**32 - 1 inclusive or None.
args – args for HasTraits superclass
kwargs – kwargs for HasTraits superclass
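
The rule for categorical_columns can be illustrated with a small check. This is only a sketch and not the library's own validation: the helper missing_categorical_columns and its input format (a plain mapping of column name to dtype name, such as df.dtypes.astype(str).to_dict() would produce) are assumptions made for the example.

```python
# Dtype names that the description above treats as inherently categorical.
CATEGORICAL_DTYPES = {"object", "str", "bool", "category"}


def missing_categorical_columns(dtypes, categorical_columns):
    """Return the columns that must be listed as categorical but are not.

    `dtypes` maps column name -> dtype name (hypothetical input format).
    """
    required = {name for name, dtype in dtypes.items() if dtype in CATEGORICAL_DTYPES}
    return sorted(required - set(categorical_columns))


dtypes = {"name": "object", "origin": "category", "cylinders": "int64", "mpg": "float64"}

# Acceptable: both typed-categorical columns are listed, and "cylinders" is added.
ok = missing_categorical_columns(dtypes, ["name", "origin", "cylinders"])

# Not acceptable: "name" has type object but was left out of the list.
bad = missing_categorical_columns(dtypes, ["origin", "cylinders"])
```

An empty result means the list satisfies the rule; any names returned are columns that would have to be added to categorical_columns.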

property brushed_data

The brushed data is recomputed only if it was invalidated by a new selection of indices. This avoids unnecessary work when only the brushed indices are needed and not the brushed data.

Return type: DataFrame
Returns: The selected data corresponding to the indices.
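
The caching behaviour described here follows a common lazy-invalidation pattern. The toy class below models that pattern in plain Python. It is a sketch under assumptions, not the DataSource implementation: the class name BrushCache, its attributes and the recomputations counter are all hypothetical.

```python
class BrushCache:
    """Toy model of lazily computed, cached brushed data (not the library's code)."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._brushed_indices = set(range(len(self._rows)))
        self._cached = None      # last computed selection
        self._valid = False      # False means the cache must be rebuilt
        self.recomputations = 0  # counter, only here to make the effect visible

    @property
    def brushed_indices(self):
        return self._brushed_indices

    @brushed_indices.setter
    def brushed_indices(self, indices):
        self._brushed_indices = set(indices)
        self._valid = False  # a new selection invalidates the cached data

    @property
    def brushed_data(self):
        # Rebuild the selection only when it was invalidated.
        if not self._valid:
            self._cached = [self._rows[i] for i in sorted(self._brushed_indices)]
            self._valid = True
            self.recomputations += 1
        return self._cached


cache = BrushCache(["a", "b", "c", "d"])
cache.brushed_indices = {1, 3}   # cheap: nothing is computed yet
first = cache.brushed_data       # the selection is computed here
second = cache.brushed_data      # served from the cache, no recomputation
```

Setting the indices only flips a flag, so observers that need just the indices never pay for building the selected data.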

property brushed_indices

Return type: Set[int]
Returns: The currently selected indices.

property data

Return type: DataFrame
Returns: The DataFrame for this pandas_visual_analysis.data_source.DataSource object.

property indices

Return type: Set[int]
Returns: All indices of the data frame, that is, the integers from 0 to len - 1.

property len

Return type: int
Returns: The length of the DataFrame.

static read(path, *args, **kwargs)

Reads the data specified by the path into a DataSource. Infers file type by extension. Supported extensions are: .csv, .tsv and .json.

Parameters

path (str) – Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file.
args – Arguments passed to inferred methods.
kwargs – Keyword arguments passed to inferred methods.

Returns

The DataSource containing the data from the specified file.
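
Inferring the reader from the extension can be sketched as a simple lookup. This is an illustration only, not the library's code: the helper infer_reader_name and the READERS table are hypothetical, and lower-casing the extension and ignoring a URL query string are choices made for this sketch.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extension -> reader method name, following the supported extensions listed above.
READERS = {".csv": "read_csv", ".tsv": "read_tsv", ".json": "read_json"}


def infer_reader_name(path):
    """Pick a reader name from the file extension (hypothetical helper)."""
    # urlparse also accepts plain file paths; using .path drops any query string.
    suffix = PurePosixPath(urlparse(path).path).suffix.lower()
    try:
        return READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None


local_reader = infer_reader_name("./mpg.csv")
url_reader = infer_reader_name("https://example.com/data/mpg.json?raw=1")
```

Any extra positional or keyword arguments given to read() would then be forwarded to whichever reader was selected, as the parameter list above describes.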

static read_csv(path, header=0)

Read a comma-separated values (csv) file into DataSource.

Parameters

path (str) – Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file.
header (Optional[int]) – Row (0-indexed) to use for the column labels of the parsed DataFrame. Use None if there is no header.

Returns

The DataSource containing the data from the specified file.

static read_json(path, orient)

Read a JSON file into a DataSource.

Parameters

path (str) – Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file.
orient (str) – Indication of expected JSON string format produced by DataFrame.to_json() with a corresponding orient value.

Returns

The DataSource containing the data from the specified file.

static read_tsv(path, header=0)

Read a tab-separated values (tsv) file into DataSource.

Parameters

path (str) – Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file.
header (Optional[int]) – Row (0-indexed) to use for the column labels of the parsed DataFrame. Use None if there is no header.

Returns

The DataSource containing the data from the specified file.

reset_selection()

Reset the selection to its original state, in which all indices are selected.

Returns: None

Advanced Usage of DataSource

To get started, see the basic usage guide.

Read Data

It is possible to read data directly from files or URLs through the DataSource using default settings.

from pandas_visual_analysis import DataSource
ds = DataSource.read_csv("./mpg.csv")

To infer the file type from the extension use the read() method. Supported file types are: .csv, .tsv and .json.

from pandas_visual_analysis import DataSource
ds = DataSource.read("./mpg.json", orient="columns")

For more advanced options, use the functionality provided by Pandas and pass the DataFrame to DataSource normally.

Using DataSource as a context manager

Instead of assigning the DataSource object to a variable, it is also possible to use it as a context manager.

from pandas_visual_analysis import DataSource, VisualAnalysis
with DataSource.read("./report.tsv", header=1) as ds:
    VisualAnalysis(ds)
