Original document: http://hea-www.harvard.edu/AstroStat/Stat310_1112/ab_20120221.pdf

MIC Discussion

Discussion of the Maximal Information Coefficient
Alexander W Blocker http://www.awblocker.com/

Feb 21 2012


Outline

1. Defining MIC
2. Subtleties & technical issues
3. Simon & Tibshirani's response
4. Broader concerns & lessons

1. Defining MIC

Motivation

- Have a high-dimensional dataset
  - 100s-1000s of variables; often fewer observations than variables
- Goal: find novel bivariate relationships
- General definition of relationships (not just nonlinear, even nonfunctional)
- "Equitable" with respect to different types of relationships
- Alternative to manual search (according to the authors)


Generality & equitability

Stated goals of the method (heuristic):

- Generality: ability to detect a broad range of relationships
  - Includes nonfunctional relationships
  - Also want "noncoexistence" and mixtures of functions
- Equitability: similar scoring of "equally noisy relationships of different types"
  - Harder to pin down; asymptotic?
  - How do nonfunctional relationships fit?
  - Symmetry complications; e.g., predictive distribution from a sinusoid


Technical definition

- Start from a scatterplot
- Consider a grid $G$ on the scatterplot
- Define the mutual information $I_G$ of the empirical distribution on the grid
  - KL divergence of the factored distribution from the actual joint
  - Always $\geq 0$
  - Information-theoretic measure of dependence; compression interpretation

From Figure 1 of Reshef et al. 2011
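
The grid mutual information described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy and a grid given by explicit bin edges; the function name `grid_mutual_information` and its arguments are hypothetical and not taken from Reshef et al.

```python
import numpy as np

def grid_mutual_information(x, y, x_edges, y_edges):
    """Mutual information (in nats) of the empirical distribution of
    (x, y) after binning the points onto the grid given by the edges."""
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    p_xy = counts / counts.sum()            # empirical joint on the grid
    p_x = p_xy.sum(axis=1, keepdims=True)   # column vector of x-marginals
    p_y = p_xy.sum(axis=0, keepdims=True)   # row vector of y-marginals
    nonzero = p_xy > 0                      # empty cells contribute 0
    # KL divergence of the product of marginals from the joint.
    ratio = p_xy[nonzero] / (p_x @ p_y)[nonzero]
    return float(np.sum(p_xy[nonzero] * np.log(ratio)))

edges = [0.0, 0.5, 1.0]
# Perfectly dependent on a 2x2 grid: all mass on the diagonal cells.
mi_dependent = grid_mutual_information(
    [0.1, 0.2, 0.8, 0.9], [0.1, 0.2, 0.8, 0.9], edges, edges)
# Uniform over all four cells: joint equals product of marginals.
mi_independent = grid_mutual_information(
    [0.1, 0.1, 0.9, 0.9], [0.1, 0.9, 0.1, 0.9], edges, edges)
```

In the first case the value is log 2 (one bit, expressed in nats); in the second it is 0, matching the "always nonnegative, zero under a factored joint" reading of the slide.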


Technical definition, continued

- Now, fix grid size $(x, y)$
- Maximize $I_G$ over grid layouts of that size
- Normalize the maximized value to $M_{x,y} = \frac{I_G}{\log \min\{x, y\}}$
- Maximize again over $(x, y)$ s.t. $x, y < B(n)$, giving $M$
- $M$ is the MIC for the pair of variables

From Figure 1 of Reshef et al. 2011
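
A rough sketch of the two-stage maximization follows. It is deliberately cruder than the published method: instead of optimizing grid layouts, it only tries equal-frequency grids, so it gives a lower bound on the maximum. It also assumes the budget is placed on the total number of cells, $xy < B(n)$, with $B(n) = n^{0.6}$. All function names here (`equal_frequency_bins`, `normalized_mi`, `crude_mic`) are hypothetical.

```python
import numpy as np

def equal_frequency_bins(values, n_bins):
    """Label each value with one of n_bins bins of (nearly) equal count.
    Ties are broken arbitrarily by the sort order."""
    ranks = np.argsort(np.argsort(values))
    return (ranks * n_bins) // len(values)

def normalized_mi(x_bins, y_bins, nx, ny):
    """Grid mutual information divided by log min{nx, ny}."""
    counts = np.zeros((nx, ny))
    np.add.at(counts, (x_bins, y_bins), 1)
    p_xy = counts / counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    mi = np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz]))
    return float(mi / np.log(min(nx, ny)))

def crude_mic(x, y, alpha=0.6):
    """Maximize normalized MI over equal-frequency grids with
    nx * ny < n**alpha. Assumption: grid layouts are NOT optimized,
    unlike the search described in the slides."""
    n = len(x)
    budget = n ** alpha
    best = 0.0
    for nx in range(2, int(budget) + 1):
        for ny in range(2, int(budget) + 1):
            if nx * ny >= budget:
                break
            xb = equal_frequency_bins(x, nx)
            yb = equal_frequency_bins(y, ny)
            best = max(best, normalized_mi(xb, yb, nx, ny))
    return best

x_line = np.linspace(0.0, 1.0, 200)
score_functional = crude_mic(x_line, x_line)    # noiseless y = x

rng = np.random.default_rng(1)
score_independent = crude_mic(rng.normal(size=500), rng.normal(size=500))
```

For the noiseless line, the 2-by-2 grid already puts all mass on the diagonal, so the score reaches 1. For the independent sample the score is small but not exactly 0, since maximizing over grids picks up sampling noise.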


Computation, briefly

- Hard to do this maximization exactly
- Approximate search methods needed
- Dynamic-programming-based solution
- Quite fast


Properties

MIC, as defined:

- Symmetric (from MI symmetry)
- 0 iff variables independent (with $B(n)$ conditions)
- 1 for functionally related variables
- Lower bound linked to $R^2$ for noisy functional relationships


Initial statistical reaction

- That sounds great
- But it can't be a panacea
  - Must have lower power than, e.g., the F-test for linear relationships
  - Nonfunctional relationships imply a multimodal predictive distribution; harder than nonparametric regression
  - Huge multiple comparisons problem
- And we have theorems


2. Subtleties & technical issues

There's always a tuning parameter

- Nonparametric techniques nearly always have smoothness parameters
  - Kernel width, number of knots, penalty weight, etc.
  - Require careful attention to ensure validity and efficiency
- Here, it's grid size $B(n)$
  - Large $B(n)$: overfitting; find structure in everything
  - Small $B(n)$: oversmoothing; miss noisy/subtle structure


Pathological cases & overfitting

- Showed that $B(n) = \Omega(n^{1+\epsilon})$, $\epsilon > 0$, implies $M \to 1$ almost surely
- So, $B(n)$ too large does overfit
- If $B(n) = O(n^{1-\epsilon})$, $\epsilon > 0$, MIC converges to the correct value
- In particular, this implies MIC $\to 0$ for independent RVs
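
The overfitting case can be checked numerically in its most extreme form. The sketch below assumes equal-frequency grids and NumPy; `normalized_grid_mi` is a hypothetical helper, not the published algorithm. With an n-by-n grid every point sits alone in its own row and column, so the normalized mutual information is 1 even for independent data.

```python
import numpy as np

def normalized_grid_mi(x, y, nx, ny):
    """Normalized mutual information on an equal-frequency nx-by-ny grid."""
    n = len(x)
    xb = (np.argsort(np.argsort(x)) * nx) // n   # rank-based bin labels
    yb = (np.argsort(np.argsort(y)) * ny) // n
    counts = np.zeros((nx, ny))
    np.add.at(counts, (xb, yb), 1)
    p_xy = counts / n
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    mi = np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz]))
    return float(mi / np.log(min(nx, ny)))

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = rng.normal(size=n)   # independent of x by construction

coarse = normalized_grid_mi(x, y, 3, 3)   # few cells: close to 0
fine = normalized_grid_mi(x, y, n, n)     # one point per row/column: 1
```

The coarse grid reports almost no dependence, while the finest grid reports perfect dependence for the same independent sample, which is the behaviour the growth condition on $B(n)$ is meant to rule out.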


Choice of $B(n)$: published method

- Selected $B(n)$ via simulation in paper
- Showed $B(n) = n^{1-\epsilon}$ had proper limits under independence
- Settled on $B(n) = n^{0.6}$
- Rationale not apparent; no power or predictive checks


What about the coefficient?

- Usually need both rate and coefficient for smoothness parameters
- Standard to get both in nonparametric statistics
  - Rates analytically, coefficient estimated/approximated
- Neither completely handled here
- Could compromise power
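
For comparison, here is a familiar instance of "rate and coefficient" from kernel density estimation. It is an illustration added here, not taken from the slides: Silverman's rule-of-thumb bandwidth for a Gaussian kernel has rate $n^{-1/5}$ and coefficient $1.06\,\hat{\sigma}$. The analogous form for the grid size would be $B(n) = c\,n^{\alpha}$, where the symbols $c$ and $\alpha$ are notation assumed for this comparison.

```latex
h = 1.06\,\hat{\sigma}\,n^{-1/5}
\qquad\text{vs.}\qquad
B(n) = c\,n^{\alpha}, \quad \alpha = 0.6,\ c = 1 \text{ (implicit)}
```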


3. Simon & Tibshirani's response

Simulations

- Simon and Tibshirani addressed power concerns directly
- Simulated from range of relationships with Gaussian noise
- Varied noise scale over factor of 3
- Evaluated frequentist power at FPR of 0.05
- Compared to Pearson and Brownian distance correlation
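
One plausible way to set up such a power study is sketched below, using Pearson correlation as the statistic. The sample size, number of simulations, uniform design for x, and the way the null is generated (pairing y with a fresh, independent x) are all assumptions made for illustration, not settings confirmed from Simon and Tibshirani. The names `abs_pearson` and `estimate_power` are hypothetical.

```python
import numpy as np

def abs_pearson(x, y):
    """Absolute Pearson correlation, used as a two-sided test statistic."""
    return abs(np.corrcoef(x, y)[0, 1])

def estimate_power(f, noise_sd, stat=abs_pearson, n=100, n_sims=500,
                   fpr=0.05, seed=0):
    """Monte Carlo power of `stat` for y = f(x) + Gaussian noise.

    The rejection threshold is the (1 - fpr) quantile of the statistic
    computed on (x_indep, y), where x_indep is drawn independently of y."""
    rng = np.random.default_rng(seed)
    null_stats = np.empty(n_sims)
    alt_stats = np.empty(n_sims)
    for i in range(n_sims):
        x = rng.uniform(0, 1, n)
        y = f(x) + rng.normal(0, noise_sd, n)
        alt_stats[i] = stat(x, y)
        x_indep = rng.uniform(0, 1, n)   # fresh x breaks the dependence
        null_stats[i] = stat(x_indep, y)
    threshold = np.quantile(null_stats, 1 - fpr)
    return float(np.mean(alt_stats > threshold))

power_linear_low = estimate_power(lambda x: x, noise_sd=0.1)
power_linear_high = estimate_power(lambda x: x, noise_sd=3.0)
power_quadratic = estimate_power(lambda x: 4 * (x - 0.5) ** 2, noise_sd=0.1)
```

Any other dependence measure can be passed as `stat`, which is how a comparison across methods would be run. Pearson does well on the low-noise line and poorly on the symmetric quadratic, where the linear correlation is zero by construction.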


Brownian distance correlation

- Published by Székely and Rizzo in AoAS (2009)
- Uses distances between points and Brownian process approx
- Tuning parameter is power on distance
- Easy to compute (energy R package)
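
For readers without R at hand, the sample distance correlation for two one-dimensional samples can be sketched with NumPy as below. This is a direct O(n^2) illustration of the double-centered distance matrix formula, not the energy package's implementation; the function name and the parameter name `index` (the power applied to distances) are choices made for this sketch.

```python
import numpy as np

def distance_correlation(x, y, index=1.0):
    """Sample distance correlation of two 1-D samples.

    `index` is the power applied to pairwise distances (0 < index < 2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.abs(x[:, None] - x[None, :]) ** index
    b = np.abs(y[:, None] - y[None, :]) ** index
    # Double-center each distance matrix: subtract row and column means,
    # add back the grand mean.
    A = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    B = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (A * B).mean()     # squared sample distance covariance
    dvar_x = (A * A).mean()    # squared sample distance variances
    dvar_y = (B * B).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0             # a constant sample has no dependence to measure
    return float(np.sqrt(max(dcov2, 0.0) / denom))

rng = np.random.default_rng(0)
x = rng.normal(size=500)
dcor_linear = distance_correlation(x, 2 * x + 1)
dcor_quadratic = distance_correlation(x, x ** 2)
dcor_independent = distance_correlation(x, rng.normal(size=500))
```

The linear case gives 1, and the quadratic case stays well above the independent case even though its Pearson correlation is near zero.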


Power comparisons

- Alright for short-period sine wave and circular
- Underpowered for linear and cubic, as expected
- Surprisingly poor for $X^{1/4}$ and step functions
- Alright, but not dominant, for long-period sine and quadratic


Discussion

- As expected, there's no free lunch here
- Model-free method means less power for MIC
- Looking for extremely general forms of structure; inevitable tradeoffs
- Distance correlation is surprisingly good


4. Broader concerns & lessons

Note

Concerns here are not particular to the Reshef et al. paper. However, it does raise some interesting questions on this overall direction of research.


Pitfalls & potential of broader approach

- Searching a vast amount of raw data for complex relationships can be problematic
  - Often find mainly artifacts of the measurement process
- Conversely, using preprocessed data can show effects of processing rather than science
- Discovery is a good goal, but is this too general?
  - Semi-supervised approaches
  - Hierarchical methods


Beyond bivariate

- What types of complexity matter most?
- Increasing number of variables vs. increasing complexity
  - Ideally both, but curse of dimensionality stings
- Often observe greater gains from covariates than from complex low-dimensional structure
- Depends upon setting, of course


Independent detection vs. pooling information

- Need to consider tradeoffs depending on richness of data per variable
- Little lost working independently with many data per variable
- With few observations per variable, pooling becomes more important
- Appears relevant even for some examples in paper (Spellman et al. data)


Example -- Spellman data
Could benefit from hierarchical modeling

From Figure 5 of Reshef et al. 2011


Next steps with discovery-oriented analyses

- Exploration and discovery, then what?
- After exploration phase, want stronger scientific results
  - Predictive models, mechanistic hypotheses, etc.
- Dangers of inference with detected variables
- Distinction between EDA and data reduction
- Keeping sight of core modeling challenges


Location and publication

- Where should statistics research appear?
- Nature/Science vs. statistics journals
- MIC & power law papers (Science)
- Contrast with FDR development (Jeff Leek's comments)