25 Jun 21:02

JGSweets

901c875

v0.5.2

Profiler

A library-level seed value can now be set by the user via dp.set_seed, making sampling during profiling deterministic #271
NumericalStats now include skewness, kurtosis, count of zeros, and count of negatives #266, #267, #272, #273
Users can turn off bias correction for variance, skewness, and kurtosis #269
Sum is now returned in NumericalStats profiles #264
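The statistics listed above can be sketched in plain Python using the standard textbook estimators: adjusted Fisher-Pearson skewness and bias-corrected excess kurtosis. This is an illustration under those assumptions, not DataProfiler's implementation. The names sample_values, numeric_summary, and bias_correction are hypothetical. A seeded local random.Random stands in for what a library-level seed does for sampling. NaN is used as the default for statistics that cannot be computed.

```python
import math
import random
from typing import Dict, List, Optional


def sample_values(values: List[float], k: int, seed: Optional[int] = None) -> List[float]:
    """Draw up to k values without replacement; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    return rng.sample(values, min(k, len(values)))


def numeric_summary(values: List[float], bias_correction: bool = True) -> Dict[str, float]:
    """Hypothetical summary: sum, zero/negative counts, variance, skewness, excess kurtosis."""
    n = len(values)
    total = math.fsum(values)
    summary = {
        "sum": total,
        "num_zeros": sum(1 for v in values if v == 0),
        "num_negatives": sum(1 for v in values if v < 0),
        "variance": math.nan,  # NaN until there is enough data
        "skewness": math.nan,
        "kurtosis": math.nan,
    }
    if n < 2:
        return summary
    mean = total / n
    # Central moments using the population (biased) normalisation 1/n.
    m2 = math.fsum((v - mean) ** 2 for v in values) / n
    m3 = math.fsum((v - mean) ** 3 for v in values) / n
    m4 = math.fsum((v - mean) ** 4 for v in values) / n
    summary["variance"] = m2 * n / (n - 1) if bias_correction else m2
    if m2 == 0:
        return summary  # constant data: skewness and kurtosis stay NaN
    g1 = m3 / m2 ** 1.5       # biased sample skewness
    g2 = m4 / m2 ** 2 - 3.0   # biased excess kurtosis
    if bias_correction:
        if n > 2:
            summary["skewness"] = g1 * math.sqrt(n * (n - 1)) / (n - 2)
        if n > 3:
            summary["kurtosis"] = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    else:
        summary["skewness"] = g1
        summary["kurtosis"] = g2
    return summary
```

For [1, 2, 3, 4, 5] this gives a variance of 2.5 and an excess kurtosis of -1.2 with bias correction. Without bias correction it gives 2.0 and -1.3.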

Runtime Changes

Warnings will be issued when invalid data is received by the NumericalStats profilers #280

Bug fixes

Default values for variance, skewness, and kurtosis are now np.nan #275
Options no longer propagate to all levels when setting a single-level property unless a wildcard is specified, e.g. *.is_enabled #270
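The fixed propagation behavior can be illustrated with a toy model. This sketch assumes options are nested dicts addressed by dotted paths. set_option is a hypothetical helper, not DataProfiler's options API. An exact path changes only that one level. A "*." prefix applies the attribute at every level that defines it.

```python
from typing import Any, Dict


def set_option(options: Dict[str, Any], path: str, value: Any) -> None:
    """Set an option by dotted path; a '*.' prefix targets every level (hypothetical helper)."""
    if path.startswith("*."):
        _set_everywhere(options, path[2:], value)
        return
    *parents, attr = path.split(".")
    node = options
    for key in parents:
        node = node[key]  # raises KeyError for an unknown level
    if attr not in node:
        raise KeyError(f"unknown option: {path}")
    node[attr] = value  # only this level changes; nested levels keep their values


def _set_everywhere(node: Dict[str, Any], attr: str, value: Any) -> None:
    """Recursively set attr on this dict and on every nested dict that defines it."""
    if attr in node:
        node[attr] = value
    for child in node.values():
        if isinstance(child, dict):
            _set_everywhere(child, attr, value)
```

With this model, set_option(opts, "int.is_enabled", False) disables only the "int" level. set_option(opts, "*.is_enabled", False) disables every level.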

Other Changes

Documentation on contributions added #268
GitHub Pages updated #284, #285, #287, #288

08 Jun 15:23

JGSweets

0.5.1

9e3c82b

v0.5.1

Bug fixes

Fix merging UnstructuredProfiler #255
Fix bug in saving profiles without a labeler #257

Other Changes

Documentation: Add UnstructuredProfiler examples #252

02 Jun 16:22

lettergram

0.5.0

946c396

v0.5.0

Runtime Changes

Major release: unstructured profiles can now be generated

Profiler

UnstructuredProfiler enabled; profiles can now be generated on the TextData class
The factory class automatically selects UnstructuredProfiler or StructuredProfiler

24 May 15:57

lettergram

0.4.6

821813d

v0.4.6

Bug fixes

Fix histogram index out of range #217
Pin TensorFlow to < 2.5.0, since TensorFlow 2.5.0 has an issue #220
Remove deprecated AVRO file formats #220
Fix padding issue related to numpy #225
Remove pad in output of labeler #226

Other changes

Histogram utils now use the built-in NumPy functions #213

30 Apr 18:15

lettergram

v0.4.5

57040ec

0.4.5

Runtime Changes

Minor release: fixes bugs related to null counts.

26 Apr 19:43

lettergram

0.4.4

0184d69

v0.4.4

Runtime Changes

Minor release: fixes bugs and adds saving and loading of profiles

Profiler

Enables saving and loading a profile

Bug fixes

Handle data being None when checking length
Corrected row_has_null and row_is_null on update/addition
Ensured row statistics are calculated correctly when subsampled
Minor bug fixes

22 Apr 19:15

lettergram

0.4.3

2238d32

v0.4.3

Runtime Changes

Migrating from v0.4.2 to v0.4.3 should reduce profiling time by 30-90%.
The exact gain largely depends on system resources and data size.

Notes

Remove requirement for tensorflow-addons
Library now works with TensorFlow nightly (Python 3.9)
Added example on generating a new data labeler

Profiler

Multiprocessing data preprocessing
Improved histogram accuracy
Reduced histogram generation runtime
Option to set the bin count for histogram
Expanded precision and switched to precision estimation (as opposed to exact calculations)
Limit pool size based on cpu and memory limitations
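One way to bound a worker pool by both CPU and memory limits is to take the smaller of the two. This is a minimal sketch under that assumption; the release notes do not describe the library's actual heuristic. choose_pool_size and its parameters are hypothetical. Measuring available memory (for example with a system-metrics package) is left to the caller.

```python
import os
from typing import Optional


def choose_pool_size(available_memory_bytes: int,
                     memory_per_worker_bytes: int,
                     cpu_count: Optional[int] = None) -> int:
    """Return a worker count bounded by both CPU count and available memory (hypothetical)."""
    if memory_per_worker_bytes <= 0:
        raise ValueError("memory_per_worker_bytes must be positive")
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    fits_in_memory = available_memory_bytes // memory_per_worker_bytes
    # Never fewer than one worker, and never more than either limit allows.
    return max(1, min(cpus, fits_in_memory))
```

With 2 GiB free, 1 GiB per worker, and 16 CPUs, this picks 2 workers, because memory is the binding limit.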

Data

Improved JSON detection method
- Option (enabled by default) pulls metadata and data separately (data.meta and data.data)
- data.meta holds the part of the JSON that contains no records
- data.data holds the part of the JSON that contains records
- Added option to select keys which represent records
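The meta/data split described above can be sketched with the standard json module. This version assumes that "records" means a non-empty list of JSON objects, and that an explicit list of record keys overrides that detection. split_json and data_keys are hypothetical names, not DataProfiler's JSONData API.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def split_json(text: str,
               data_keys: Optional[List[str]] = None) -> Tuple[Dict[str, Any], List[Any]]:
    """Split a JSON document into (meta, records); a hypothetical sketch."""
    doc = json.loads(text)
    if isinstance(doc, list):
        return {}, doc  # a bare top-level list is treated as records only
    meta: Dict[str, Any] = {}
    records: List[Any] = []
    for key, value in doc.items():
        if data_keys is not None:
            selected = key in data_keys  # caller chose which keys hold records
        else:
            # Auto-detect: a non-empty list whose items are all objects.
            selected = (isinstance(value, list) and len(value) > 0
                        and all(isinstance(v, dict) for v in value))
        if selected and isinstance(value, list):
            records.extend(value)
        else:
            meta[key] = value  # everything else is metadata
    return meta, records
```

For {"source": "api", "page": 1, "results": [{"a": 1}]}, the records are [{"a": 1}] and the metadata keeps "source" and "page".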

Report

The precision report now contains additional details:

"precision": {
    "min": int,
    "max": int,
    "mean": float,
    "var": float,
    "std": float,
    "sample_size": int,
    "margin_of_error": float,
    "confidence_level": float
},
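A report with these fields can be sketched using only the standard library. This sketch makes three assumptions. First, precision means the count of significant digits in each value's string form, under a simplified rule. Second, var is the sample variance. Third, margin_of_error = z * std / sqrt(sample_size), where z is the two-sided normal critical value. significant_digits and precision_report are hypothetical names, and the default confidence level of 0.95 is an arbitrary choice, not the library's.

```python
import math
import statistics
from typing import Dict, List, Union


def significant_digits(token: str) -> int:
    """Count significant digits in a numeric string (simplified, hypothetical rule)."""
    s = token.strip().lstrip("+-").lower()
    s = s.split("e")[0]                      # ignore any exponent, e.g. "1.5e3" -> "1.5"
    has_point = "." in s
    digits = s.replace(".", "").lstrip("0")  # leading zeros are never significant
    if not has_point:
        digits = digits.rstrip("0")          # treat "1200" as 2 significant digits
    return max(len(digits), 1)


def precision_report(tokens: List[str],
                     confidence_level: float = 0.95) -> Dict[str, Union[int, float]]:
    """Summarise per-value precision, with a normal-approximation margin of error."""
    if not tokens:
        raise ValueError("need at least one value")
    precisions = [significant_digits(t) for t in tokens]
    n = len(precisions)
    var = float(statistics.variance(precisions)) if n > 1 else 0.0
    std = math.sqrt(var)
    # Two-sided z critical value, e.g. about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf((1 + confidence_level) / 2)
    return {
        "min": min(precisions),
        "max": max(precisions),
        "mean": statistics.fmean(precisions),
        "var": var,
        "std": std,
        "sample_size": n,
        "margin_of_error": z * std / math.sqrt(n),
        "confidence_level": confidence_level,
    }
```

The function works on strings rather than floats because a float such as 0.00120 loses its trailing zero once parsed.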

Bug fixes

Fixed error in merging options
Fixed issue related to merging DateTimeColumns
Fixed multiprocessing on OSX
Fixed row calculations if min_true_samples is greater than zero

06 Apr 18:51

lettergram

0.4.2

f766ce7

v0.4.2

Runtime Changes

Notes

This update reduces runtime by 50% on average.

Profiler

Add support for HistogramOptions
Add multiprocessing support
Reduced runtime for shuffling indices
Vectorized precision function
Improved unique set & vocab merging
By default, the histogram only runs 'auto' bin edge detection

Data

Add length attribute to the data class: data.length() or len(data)

Report

Added optional omit_keys to the report options function to remove keys from the final report
Added row_has_null_count (global), one or more nulls in the row
Added row_is_null_count (global), the entire row is null
Rename total_samples (global) -> row_count
Rename label BACKGROUND -> UNKNOWN (column)
Removed covariance (global)
Removed data_classification (global)
Removed data_label_probability (column)
Removed median (column)
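The two new global row counts can be sketched as follows. The sketch assumes columns are equal-length Python sequences and that only None counts as null; the library recognizes more null representations. It also uses one shared list of sampled row indices, because row-level counts only make sense when every column is sampled at the same rows. row_null_counts is a hypothetical helper.

```python
from typing import Dict, List, Optional, Sequence


def row_null_counts(columns: Dict[str, Sequence[Optional[object]]],
                    sampled_rows: List[int]) -> Dict[str, int]:
    """Count rows with at least one null and rows that are entirely null (hypothetical)."""
    has_null = 0
    is_null = 0
    for i in sampled_rows:  # the same indices are used for every column
        nulls = [col[i] is None for col in columns.values()]
        if any(nulls):
            has_null += 1   # one or more nulls in the row
        if nulls and all(nulls):
            is_null += 1    # the entire row is null
    return {
        "row_count": len(sampled_rows),
        "row_has_null_count": has_null,
        "row_is_null_count": is_null,
    }
```

For columns {"a": [1, None, None, 4], "b": ["x", "y", None, None]}, three rows contain a null, and only row 2 is entirely null.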

Bug fixes

Accurate null count and total_samples on profile updates
Each column now receives the same sampled indices, enabling row_is_null_count

25 Mar 16:34

lettergram

0.4.1

d1be6d8

v0.4.1

BUGFIX: Enables running DataProfiler without the TensorFlow library installed

25 Mar 03:04

lettergram

0.4.0

f76ed25

v0.4.0

New Features

Reduce profiling memory usage by ~50%
Reduce profiling runtime by >75%
Improve delimiter and header detection in delimited (CSV) data
Add progress notifications for profiling

Fixes

Adds warnings for sampling
Selects proper options on profile mergers
Fix repeated tensorflow warnings
Limits input for large CSV files by bytes or lines (whichever is smaller)
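Limiting input by bytes or lines, whichever limit is hit first, can be sketched as a simple reader. This sketch assumes UTF-8 text and stops before adding a line that would exceed either limit. read_sample is a hypothetical helper, not the library's CSV reader. It accepts any iterable of lines, so an open text file works as well as a list.

```python
from typing import Iterable, List


def read_sample(lines: Iterable[str], max_lines: int, max_bytes: int) -> List[str]:
    """Collect leading lines until the line limit or byte limit would be exceeded."""
    sample: List[str] = []
    used = 0
    for line in lines:
        size = len(line.encode("utf-8"))
        if len(sample) >= max_lines or used + size > max_bytes:
            break  # whichever limit is reached first ends the read
        sample.append(line)
        used += size
    return sample
```

With four 4-byte lines, max_lines=10 and max_bytes=9 return only the first two lines, because the byte limit is the smaller of the two.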

Releases: capitalone/DataProfiler
