NannyML
v0.8.1

Contents:

  • Quickstart
    • What is NannyML?
    • Installing NannyML
    • Contents of the Quickstart
    • Just the code
    • Walkthrough
      • Estimating Performance without Targets
      • Detecting Data Drift
    • Insights
    • What next
  • Tutorials
    • Data requirements
      • Data Periods
        • Reference Period
        • Analysis Period
      • Columns
        • Timestamp
        • Target
        • Features
      • Model Output columns
        • Predicted class probabilities
        • Prediction class labels
      • NannyML Functionality Requirements
      • What next
    • Estimating Performance
      • Why Perform Performance Estimation
      • Estimating Performance for Binary Classification
        • Just The Code
        • Walkthrough
        • Insights
        • What’s next
      • Estimating Performance for Multiclass Classification
        • Just The Code
        • Walkthrough
        • Insights
        • What’s next
      • Estimating Performance for Regression
        • Just The Code
        • Walkthrough
        • Insights
        • What’s next
    • Monitoring Realized Performance
      • Why Monitoring Realized Performance
      • Monitoring Realized Performance for Binary Classification
        • Just The Code
        • Walkthrough
        • Insights
        • What Next
      • Monitoring Realized Performance for Multiclass Classification
        • Just The Code
        • Walkthrough
        • Insights
        • What Next
      • Monitoring Realized Performance for Regression
        • Just The Code
        • Walkthrough
        • Insights
        • What Next
    • Comparing Estimated and Realized Performance
    • Detecting Data Drift
      • Univariate Drift Detection
        • Just The Code
        • Walkthrough
        • Insights
        • What Next
      • Multivariate Drift Detection
        • Why Perform Multivariate Drift Detection
        • Just The Code
        • Walkthrough
        • Insights
        • What Next
    • Ranking
      • Just The Code
      • Walkthrough
        • Alert Count Ranking
        • Correlation Ranking
      • Insights
      • What’s Next
    • Working with results
      • What are NannyML Results?
      • Just the code
      • Walkthrough
    • Adjusting Plots
    • Chunking
      • Why do we need chunks?
      • Walkthrough on creating chunks
        • Time-based chunking
        • Size-based chunking
        • Number-based chunking
        • Automatic chunking
      • Chunks on plots with results
  • How It Works
    • Estimation of Performance of the Monitored Model
      • Confidence-based Performance Estimation (CBPE)
        • The Intuition
        • Implementation details
          • Binary classification
          • Multiclass Classification
        • Assumptions and Limitations
        • Appendix: Probability calibration
      • Direct Loss Estimation (DLE)
        • The Intuition
        • Implementation details
        • Assumptions and limitations
      • Other Approaches to Estimate Performance of Regression Models
        • Bayesian approaches
        • Conformalized Quantile Regression
        • Conclusions from Bayesian and Conformalized Quantile Regression approaches
    • Presenting Univariate Drift Detection Methods
      • Methods for Continuous Features
        • Kolmogorov-Smirnov Test
        • Jensen-Shannon Distance
        • Wasserstein Distance
        • Hellinger Distance
      • Methods for Categorical Variables
        • Chi-squared Test
        • Jensen-Shannon Distance
        • Hellinger Distance
        • L-Infinity Distance
    • Choosing Univariate Drift Detection Methods
      • Comparison of Methods for Continuous Variables
        • Shifting the Mean of the Analysis Data Set
        • Shifting the Standard Deviation of the Analysis Data Set
        • Tradeoffs of The Kolmogorov-Smirnov Statistic
        • Tradeoffs of Jensen-Shannon Distance and Hellinger Distance
          • Experiment 1
          • Experiment 2
        • Tradeoffs of Wasserstein Distance
          • Experiment 1
          • Experiment 2
      • Comparison of Methods for Categorical Variables
        • Sensitivity to Sample Size of Different Drift Measures
        • Behavior When a Category Slowly Disappears
        • Behavior When Observations from a New Category Occur
        • Effect of Sample Size on Different Drift Measures
        • Effect of the Number of Categories on Different Drift Measures
        • Comparison of Drift Methods on Data Sets with Many Categories
      • Results Summary (TLDR)
        • Methods for Continuous Variables
        • Methods For Categorical Variables
    • Ranking
      • Alert Count Ranking
      • Correlation Ranking
    • Data Reconstruction with PCA
      • Limitations of Univariate Drift Detection
        • “Butterfly” Dataset
      • Data Reconstruction with PCA
      • Understanding Reconstruction Error with PCA
        • Reconstruction Error with PCA on the butterfly dataset
    • Chunking Considerations
      • Not Enough Chunks
      • Not Enough Observations in Chunk
      • Impact of Chunk Size on Reliability of Results
    • Calculating Sampling Error
      • Defining Sampling Error from Standard Error of the Mean
      • Sampling Error Estimation and Interpretation for NannyML features
        • Performance Estimation
        • Performance Monitoring
        • Multivariate Drift Detection with PCA
        • Univariate Drift Detection
      • Assumptions and Limitations
  • Examples
    • Binary Classification: California Housing Dataset
      • Load and prepare data
      • Performance Estimation
      • Comparison with the actual performance
      • Drift detection
  • Example Datasets
    • Synthetic Binary Classification Dataset
      • Problem Description
      • Dataset Description
    • Synthetic Multiclass Classification Dataset
      • Problem Description
      • Dataset Description
    • California Housing Dataset
      • Modifying California Housing Dataset
      • Enriching the data
      • Training a Machine Learning Model
      • Meeting NannyML Data Requirements
    • Synthetic Regression Dataset
      • Problem Description
      • Dataset Description
  • Glossary
  • Command Line Interface (CLI)
    • Running the CLI
      • Installation
      • Configuration
    • Configuration file
      • Locations
      • Format
        • Input section
        • Output section
          • Writing to filesystem
          • Writing to a pickle file
          • Writing to a relational database
        • Column mapping section
        • Chunker section
        • Scheduling section
        • Standalone parameters section
      • Templating paths
      • Examples
    • Command overview
      • run
        • Syntax
        • Options
        • Example
  • Usage logging in NannyML
    • TLDR
    • What do we mean by usage statistics?
      • What about personal data
      • What about my dataset?
    • Why are we doing this?
      • Improving NannyML and prioritizing new features
      • Surviving as a company
    • How usage logging works
    • To opt in or not to opt in, that’s the question
    • How to disable usage logging
      • Setting the environment variable
      • Providing a .env file
      • Turning off user analytics in code
  • API reference
    • nannyml package
      • Subpackages
        • nannyml.cli package
          • Submodules
          • Module contents
        • nannyml.datasets package
          • Subpackages
          • Submodules
          • Module contents
        • nannyml.drift package
          • Subpackages
          • Submodules
          • Module contents
        • nannyml.io package
          • Subpackages
          • Submodules
          • Module contents
        • nannyml.performance_calculation package
          • Subpackages
          • Submodules
          • Module contents
        • nannyml.performance_estimation package
          • Subpackages
          • Module contents
        • nannyml.plots package
          • Subpackages
          • Submodules
          • Module contents
        • nannyml.sampling_error package
          • Submodules
          • Module contents
      • Submodules
        • nannyml.analytics module
          • SegmentUsageTracker
          • UsageEvent
          • UsageTracker
          • track()
        • nannyml.base module
          • AbstractCalculator
          • AbstractCalculatorResult
          • AbstractEstimator
          • AbstractEstimatorResult
        • nannyml.calibration module
          • Calibrator
          • CalibratorFactory
          • IsotonicCalibrator
          • NoopCalibrator
          • needs_calibration()
        • nannyml.chunk module
          • Chunk
          • Chunker
          • ChunkerFactory
          • CountBasedChunker
          • DefaultChunker
          • PeriodBasedChunker
          • SizeBasedChunker
        • nannyml.config module
          • ChunkerConfig
          • ColumnMapping
          • Config
          • CronSchedulingConfig
          • DatabaseWriterConfig
          • InputConfig
          • InputDataConfig
          • IntervalSchedulingConfig
          • PickleWriterConfig
          • RawFileWriterConfig
          • SchedulingConfig
          • TargetDataConfig
          • WriterConfig
          • get_config_path()
        • nannyml.exceptions module
          • CalculatorException
          • CalculatorNotFittedException
          • ChunkerException
          • EstimatorException
          • IOException
          • InvalidArgumentsException
          • InvalidReferenceDataException
          • MissingMetadataException
          • NotFittedException
          • ReaderException
          • WriterException
        • nannyml.runner module
          • run()
        • nannyml.usage_logging module
          • SegmentUsageTracker
          • UsageEvent
          • UsageLogger
          • disable_usage_logging()
          • enable_usage_logging()
          • get_logger()
          • log_usage()
      • Module contents
  • Contributing
    • Spread the word
    • Be a part of the team
    • Contribute to the codebase
      • Get started coding
      • Pull Request Guidelines
      • Tips
Welcome to NannyML’s documentation!

Indices and tables

  • Index

  • Module Index

  • Search Page

© Copyright 2022, NannyML. Revision 9c452df9.

Built with Sphinx using a theme provided by Read the Docs.