NannyML
v0.8.2

How It Works

  • Estimation of Performance of the Monitored Model
    • Confidence-based Performance Estimation (CBPE)
    • Direct Loss Estimation (DLE)
    • Other Approaches to Estimate Performance of Regression Models
  • Presenting Univariate Drift Detection Methods
    • Methods for Continuous Features
    • Methods for Categorical Variables
  • Choosing Univariate Drift Detection Methods
    • Comparison of Methods for Continuous Variables
    • Comparison of Methods for Categorical Variables
    • Results Summary (TLDR)
  • Ranking
    • Alert Count Ranking
    • Correlation Ranking
  • Data Reconstruction with PCA
    • Limitations of Univariate Drift Detection
    • Data Reconstruction with PCA
    • Understanding Reconstruction Error with PCA
  • Chunking Considerations
    • Not Enough Chunks
    • Not Enough Observations in Chunk
    • Impact of Chunk Size on Reliability of Results
  • Calculating Sampling Error
    • Defining Sampling Error from Standard Error of the Mean
    • Sampling Error Estimation and Interpretation for NannyML features
    • Assumptions and Limitations

© Copyright 2022, NannyML. Revision 8e8c6063.
