
Best Practices in Machine Learning

I was lucky to read two wonderful articles (one of them shared by @Boke89707488, Dr. Bo Ke) that describe best practices for using machine learning.

The first one is “Establishment of Best Practices for Evidence for Prediction” by Russell A. Poldrack, PhD; Grace Huckins, MSc; Gael Varoquaux, PhD.

Best Practices for Predictive Modeling:

"In-sample model fit indices should not be reported as evidence."

Always validate your models on test data the model has never seen, and report performance on that held-out data.
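
For example, a minimal scikit-learn sketch (with synthetic data standing in for a real dataset) showing why the in-sample score should not be reported:

```python
# Minimal sketch: report performance on held-out data, never on the
# training set itself. Synthetic data stands in for a real dataset.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("In-sample fit (optimistic):", clf.score(X_tr, y_tr))
print("Held-out accuracy (report this):", clf.score(X_te, y_te))
```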

"The cross-validation procedure should encompass all operations applied to the data. In particular, predictive analyses should not be performed on data after variable selection if the variable selection was informed to any degree by the data themselves (ie, post hoc cross-validation). Otherwise, estimated predictive accuracy will be inflated owing to circularity."

More understanding of this point will be explained at the end of this post.
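
In the meantime, here is a minimal scikit-learn sketch (synthetic data) of what "encompassing all operations" means in practice: feature selection must be re-fit inside each cross-validation fold, which a Pipeline handles automatically.

```python
# Minimal sketch: feature selection must happen inside the CV loop,
# not before it, or the estimate is inflated by circularity.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=500, n_features=100, random_state=0)

# Correct: SelectKBest is re-fit on each training fold only.
pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),
    ("clf", LogisticRegression(max_iter=1000)),
])
print("Unbiased CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())

# Wrong (circular): selecting features on the full data first leaks
# test-fold information into the training procedure.
X_leaky = SelectKBest(f_classif, k=10).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5)
print("Inflated CV accuracy:", leaky.mean())
```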

"Prediction analyses should not be performed with samples smaller than several hundred observations"

More understanding of this point will be explained at the end of this post.

"Multiple measures of prediction accuracy should be examined and reported. For regression analyses, measures of variance, such as R2, should be accompanied by measures of unsigned error, such as mean squared error or mean absolute error. For classification analyses, accuracy should be reported separately for each class, and a measure of accuracy that is insensitive to relative class frequencies, such as area under the receiver operating characteristic curve, should be reported."

Definitely.
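
A minimal sketch of what this looks like in scikit-learn, with toy numbers standing in for real predictions:

```python
# Minimal sketch: report several accuracy measures, not just one.
import numpy as np
from sklearn.metrics import (confusion_matrix, mean_absolute_error,
                             mean_squared_error, r2_score, roc_auc_score)

# Regression: pair R2 with unsigned-error measures.
y_true = np.array([3.0, 1.5, 4.2, 2.8])
y_pred = np.array([2.9, 1.8, 4.0, 3.1])
print("R2 :", r2_score(y_true, y_pred))
print("MSE:", mean_squared_error(y_true, y_pred))
print("MAE:", mean_absolute_error(y_true, y_pred))

# Classification: per-class accuracy plus a class-frequency-insensitive
# score such as ROC AUC.
y_cls = np.array([0, 0, 0, 1, 1, 1])
p_cls = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])  # predicted P(class 1)
cm = confusion_matrix(y_cls, (p_cls >= 0.5).astype(int))
print("Per-class accuracy:", cm.diagonal() / cm.sum(axis=1))
print("ROC AUC:", roc_auc_score(y_cls, p_cls))
```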

"The coefficient of determination should be computed by using the sums-of-squares formulation rather than by squaring the correlation coefficient"

Sure.
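
A tiny worked example of why the two formulations differ: with a constant offset, predictions can be perfectly correlated with the truth while the sums-of-squares R2 is strongly negative.

```python
# Minimal sketch: sums-of-squares R2 vs squared correlation.
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = y_true + 2.0  # perfectly correlated, but systematically offset

ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_ss = 1.0 - ss_res / ss_tot                      # sums-of-squares form
r2_corr = np.corrcoef(y_true, y_pred)[0, 1] ** 2   # squared correlation

print(r2_ss)    # -1.0: penalizes the constant offset
print(r2_corr)  #  1.0: blind to the offset
```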

"k-fold cross-validation, with k in the range of 5 to 10 should be used rather than leave-one-out cross-validation because the testing set in leave-one-out cross-validation is not representative of the whole data and is often anti-correlated with the training set"

More considerations should be mentioned here; see below for a practical way of doing cross-validation.
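
To see the contrast concretely, a minimal sketch (scikit-learn, synthetic data): with leave-one-out, every test set is a single observation, so each fold score is exactly 0 or 1 and the held-out "sample" can never reflect the class balance of the data.

```python
# Minimal sketch: 5-fold CV vs leave-one-out cross-validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=100, n_features=20, random_state=0)
clf = LogisticRegression(max_iter=1000)

kfold = cross_val_score(clf, X, y, cv=KFold(5, shuffle=True, random_state=0))
loo = cross_val_score(clf, X, y, cv=LeaveOneOut())

print("5-fold scores:", kfold)                # five fractional accuracies
print("LOOCV scores (first 10):", loo[:10])   # each is exactly 0.0 or 1.0
```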

The second article is "Machine learning algorithm validation with a limited sample size" by Andrius Vabalas, Emma Gowen, Ellen Poliakoff, and Alexander J. Casson.

I do love the figure in this article, which shows exactly how to do cross-validation.

From work by Andrius Vabalas, Emma Gowen, Ellen Poliakoff, and Alexander J. Casson.

More importantly, we should remember this:

"Our simulations show that K-fold Cross-Validation (CV) produces strongly biased performance estimates with small sample sizes, and the bias is still evident with sample size of 1000. Nested CV and train/test split approaches produce robust and unbiased performance estimates regardless of sample size"
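
A minimal sketch of nested CV in scikit-learn (synthetic data, hypothetical hyperparameter grid): the inner loop tunes hyperparameters, while the outer loop estimates generalization on data the tuning never saw.

```python
# Minimal sketch: nested cross-validation. GridSearchCV is itself an
# estimator, so passing it to cross_val_score nests the two loops.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=30, random_state=0)

inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)  # model selection
outer_scores = cross_val_score(inner, X, y, cv=5)       # unbiased estimate
print("Nested CV accuracy:", outer_scores.mean())
```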

References:

Poldrack RA, Huckins G, Varoquaux G. Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry. 2020;77(5):534–540. doi:10.1001/jamapsychiatry.2019.3671

Vabalas A, Gowen E, Poliakoff E, Casson AJ (2019) Machine learning algorithm validation with a limited sample size. PLoS ONE 14(11): e0224365. https://doi.org/10.1371/journal.pone.0224365

“Face detection in untrained deep neural networks” by Baek, S., Song, M., Jang, J. et al. 

Original article: https://doi.org/10.1038/s41467-021-27606-9

I do love this work for many reasons! 

Summary

It demonstrates that face-selectivity can emerge in untrained deep neural networks whose weights are randomly initialized.

The authors found that units selective to faces emerge robustly in randomly initialized networks and that these units reproduce many characteristics observed in monkeys. (They only claimed that their results align with other studies of face-selectivity in monkeys.)
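
As a rough illustration of the experimental logic (not the authors' actual pipeline), here is a minimal PyTorch sketch that probes a randomly initialized AlexNet for face-preferring units. The random noise tensors are stand-ins for real face and object stimulus sets, and the (F − O)/(F + O) index is a generic selectivity measure, not necessarily the paper's exact metric.

```python
# Minimal sketch: look for "face-selective" channels in an untrained
# (randomly initialized) CNN by comparing mean responses to two
# stimulus classes. Noise images stand in for real stimuli.
import torch
from torchvision.models import alexnet

torch.manual_seed(0)
net = alexnet(weights=None).features.eval()  # random weights, no training

faces = torch.rand(32, 3, 224, 224)    # stand-in for face images
objects = torch.rand(32, 3, 224, 224)  # stand-in for object images

with torch.no_grad():
    f = net(faces).mean(dim=(0, 2, 3))    # mean response per channel
    o = net(objects).mean(dim=(0, 2, 3))

selectivity = (f - o) / (f + o + 1e-8)    # > 0 means face-preferring
print("Most face-selective units:", selectivity.topk(5).indices.tolist())
```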

Scientific importance

I think that this work provides some insight into the following questions:

Can this neuronal selectivity arise innately, or does it require training through visual experience?

Where do innate cognitive functions in both biological and artificial neural networks come from?

"These findings may provide insight into the origin of innate cognitive functions in both biological and artificial neural networks."

Is face-selectivity a special type of neuronal property, or is selectivity a common property shared by faces and other objects?

I partially agree with this work that selectivity is a property common to faces and other objects. However, I also believe that faces are special in terms of the key role they play in social interaction.

My question

However, I still do not know how, technically (or physically, or biologically), face-selectivity could develop in untrained deep neural networks (or in a primate brain).

Do you think that the key factor in the development of this "phenomenon" is the feed-forward connections rather than the statistical complexity embedded in each hierarchical circuit? Or maybe both?

Reference:

Baek, S., Song, M., Jang, J. et al. Face detection in untrained deep neural networks. Nat Commun 12, 7328 (2021). https://doi.org/10.1038/s41467-021-27606-9

"Adversarial Robustness Is a Big Deal"

It was interesting and surprising to see a tweet from Patrick Mineault reviewing the adversarial-attack issue in current artificial neural networks. Here, I just want to put it in my notes so that I can come back to it later.