DataSource
- class pandas_visual_analysis.data_source.DataSource(df, categorical_columns=None, sample=None, seed=None, *args, **kwargs)
Bases: object
The DataSource object provides the data to the plots and manages the brushing between them. Plots that observe the brushed_indices property of this class can react to any change in the data. Setting the brushed_indices property triggers that change in every instance that observes it. In addition to the brushed indices, this class provides the brushed data directly, which is cached to speed up subsequent access.
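The brushing behaviour described above follows the observer pattern. The following is a minimal, library-independent sketch of that idea; `BrushModel`, its `observe` method and the "everything selected initially" default are hypothetical names and assumptions made for illustration, not the DataSource implementation, which builds on the HasTraits superclass.

```python
from typing import Callable, List, Set


class BrushModel:
    """Hypothetical stand-in for a brushable data source (not the library class)."""

    def __init__(self, n_rows: int) -> None:
        self._observers: List[Callable[[Set[int]], None]] = []
        # Assumption for this sketch: every row starts out selected.
        self._brushed_indices: Set[int] = set(range(n_rows))

    def observe(self, callback: Callable[[Set[int]], None]) -> None:
        # Register a callback that is invoked on every selection change.
        self._observers.append(callback)

    @property
    def brushed_indices(self) -> Set[int]:
        return self._brushed_indices

    @brushed_indices.setter
    def brushed_indices(self, indices) -> None:
        self._brushed_indices = set(indices)
        # Notify all registered observers about the new selection.
        for callback in self._observers:
            callback(self._brushed_indices)


model = BrushModel(n_rows=5)
seen_by_plot_a: List[Set[int]] = []
seen_by_plot_b: List[Set[int]] = []
model.observe(seen_by_plot_a.append)
model.observe(seen_by_plot_b.append)

# One plot brushes rows 1 and 3; both observers are notified.
model.brushed_indices = [1, 3]
```

Any component can assign to `brushed_indices`, so a selection made in one plot reaches all the others without the plots knowing about each other.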
- Parameters
df (DataFrame) – A pandas.DataFrame object.
categorical_columns (Optional[List[str]]) – If given, specifies which columns are to be interpreted as categorical. The list must include every column of the DataFrame that has type object, str, bool or category, so in effect it can only add columns that do not have one of those types.
seed (Optional[int]) – Random seed used for sampling the data. Can be any integer between 0 and 2**32 - 1 inclusive, or None.
args – args for HasTraits superclass
kwargs – kwargs for HasTraits superclass
- property brushed_data
The brushed data is recomputed only if it was invalidated by newly selected indices. This is more efficient when only the brushed indices are needed and not the brushed data.
- Return type
DataFrame
- Returns
The selected data corresponding to the indices.
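The invalidate-then-recompute behaviour described for brushed_data can be sketched with a small cache. This is an illustration under assumptions, not the library's code: `CachedSelection` is a hypothetical class, rows are plain dictionaries instead of a DataFrame, and `recompute_count` exists only to make the caching observable.

```python
from typing import Dict, List, Optional, Set

Row = Dict[str, object]


class CachedSelection:
    """Hypothetical illustration of an invalidate-on-write cache."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self._indices: Set[int] = set(range(len(rows)))
        self._cache: Optional[List[Row]] = None
        self.recompute_count = 0  # only here to make the caching visible

    @property
    def brushed_indices(self) -> Set[int]:
        return self._indices

    @brushed_indices.setter
    def brushed_indices(self, indices) -> None:
        self._indices = set(indices)
        self._cache = None  # invalidate; nothing is recomputed yet

    @property
    def brushed_data(self) -> List[Row]:
        # Recompute only when the cache was invalidated by a new selection.
        if self._cache is None:
            self.recompute_count += 1
            self._cache = [self._rows[i] for i in sorted(self._indices)]
        return self._cache


rows = [{"mpg": 18.0}, {"mpg": 15.0}, {"mpg": 36.0}]
sel = CachedSelection(rows)
sel.brushed_indices = {0, 2}
sel.brushed_indices = {2}   # still no recomputation
first = sel.brushed_data    # computed here
second = sel.brushed_data   # served from the cache
```

Changing the selection twice costs nothing until the data is actually requested, which is the efficiency gain the description refers to.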
- property brushed_indices
- Return type
Set[int]
- Returns
The currently selected indices.
- property data
- Return type
DataFrame
- Returns
The DataFrame for this pandas_visual_analysis.data_source.DataSource object.
- property indices
- Return type
Set[int]
- Returns
All indices of the data frame, that is, the integers from 0 to len - 1.
- property len
- Return type
int
- Returns
The length of the DataFrame.
- static read(path, *args, **kwargs)
Reads the data specified by the path into a DataSource. The file type is inferred from the extension. Supported extensions are: .csv, .tsv and .json.
- Parameters
path (str) – Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file.
args – Arguments passed to the inferred read method.
kwargs – Keyword arguments passed to the inferred read method.
- Returns
The DataSource containing the data from the specified file.
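Inferring the reader from the extension can be pictured as a simple lookup. The sketch below uses only the standard library; `READERS` and `infer_reader` are hypothetical names, and the handling of URL query strings and upper-case extensions is an assumption of this sketch, not documented behaviour of read().

```python
import os
from urllib.parse import urlparse

# Hypothetical mapping from extension to the matching DataSource reader name.
READERS = {".csv": "read_csv", ".tsv": "read_tsv", ".json": "read_json"}


def infer_reader(path: str) -> str:
    """Return the reader name for `path`, based on its file extension."""
    # Drop query strings and fragments so URLs like ".../mpg.json?raw=1" work.
    url_path = urlparse(path).path or path
    # Assumption: extensions are matched case-insensitively.
    extension = os.path.splitext(url_path)[1].lower()
    if extension not in READERS:
        raise ValueError(f"Unsupported file extension: {extension!r}")
    return READERS[extension]


csv_reader = infer_reader("./mpg.csv")
json_reader = infer_reader("https://example.com/data/mpg.json?raw=1")
```

A file with any other extension raises `ValueError` in this sketch; for such files, load the data with pandas and pass the DataFrame to DataSource.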
- static read_csv(path, header=0)
Read a comma-separated values (csv) file into a DataSource.
- Parameters
path (str) – Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file.
header (Optional[int]) – Row (0-indexed) to use for the column labels of the parsed DataFrame. Use None if there is no header.
- Returns
The DataSource containing the data from the specified file.
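The effect of the header parameter can be illustrated without pandas. The sketch below mimics the row selection with the standard csv module, assuming the usual pandas convention that rows before the header row are discarded and that header=None yields integer column labels; `split_header` is a hypothetical helper, not part of the library.

```python
import csv
import io
from typing import List, Optional, Tuple


def split_header(text: str, header: Optional[int] = 0) -> Tuple[list, List[List[str]]]:
    """Split CSV text into (column labels, data rows) for a given header row."""
    rows = list(csv.reader(io.StringIO(text)))
    if header is None:
        # No header row: label the columns 0..n-1 and keep every row as data.
        labels = list(range(len(rows[0]))) if rows else []
        return labels, rows
    # Row `header` supplies the labels; rows before it are discarded.
    return rows[header], rows[header + 1:]


with_preamble = "exported 2020\nmpg,cyl\n18,8\n15,8\n"
labels, data = split_header(with_preamble, header=1)

no_header = "18,8\n15,8\n"
int_labels, all_rows = split_header(no_header, header=None)
```

With header=1 the preamble line is skipped and `mpg,cyl` becomes the label row; with header=None both lines of the second file are treated as data.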
- static read_json(path, orient)
Read a json file into a DataSource.
- Parameters
path (str) – Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file.
orient (str) – Indication of expected JSON string format produced by DataFrame.to_json() with a corresponding orient value.
- Returns
The DataSource containing the data from the specified file.
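The orient value names the layout of the JSON file. As a rough illustration, the sketch below shows the same two-row table in the "records" and "columns" layouts that DataFrame.to_json() produces and converts one into the other with the standard json module. The sample values and the `records_to_columns` helper are made up for this example.

```python
import json

# The same two-row table in two layouts that DataFrame.to_json() can produce.
records_text = '[{"mpg": 18.0, "name": "a"}, {"mpg": 15.0, "name": "b"}]'  # orient="records"
columns_text = '{"mpg": {"0": 18.0, "1": 15.0}, "name": {"0": "a", "1": "b"}}'  # orient="columns"


def records_to_columns(records):
    """Convert the orient="records" layout to the orient="columns" layout."""
    columns = {}
    for index, row in enumerate(records):
        for column, value in row.items():
            # Each column maps the row index (as a string key) to the value.
            columns.setdefault(column, {})[str(index)] = value
    return columns


converted = records_to_columns(json.loads(records_text))
```

Passing an orient that does not match the file's layout generally leads to a parsing error or a wrongly shaped table, so the value should mirror how the file was written.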
- static read_tsv(path, header=0)
Read a tab-separated values (tsv) file into a DataSource.
- Parameters
path (str) – Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file.
header (Optional[int]) – Row (0-indexed) to use for the column labels of the parsed DataFrame. Use None if there is no header.
- Returns
The DataSource containing the data from the specified file.
Advanced Usage of DataSource
For getting started see the basic usage guide.
Read Data
It is possible to read data directly from files or URLs with DataSource, using default settings.
from pandas_visual_analysis import DataSource
ds = DataSource.read_csv("./mpg.csv")
To infer the file type from the extension use the read() method. Supported file types are: .csv, .tsv and .json.
from pandas_visual_analysis import DataSource
ds = DataSource.read("./mpg.json", orient="columns")
For more advanced options, use the functionality provided by Pandas and pass the DataFrame to DataSource normally.
Using DataSource as a context manager
Instead of assigning the DataSource object to a variable, it is also possible to use it as a context manager.
from pandas_visual_analysis import DataSource, VisualAnalysis
with DataSource.read("./report.tsv", header=1) as ds:
    VisualAnalysis(ds)