Why? To start, let’s break down and define the two basic components of a valid regression model:
Response = (Constant + Predictors) + Error
Another way we can say this is:
Response = Deterministic + Stochastic
The take-home message for me is that the residual represents the unpredictable error. By checking the residual plot, you can check whether your predictors are missing some of the predictive information.
Residual plots can reveal unwanted residual patterns that indicate biased results more effectively than numbers.
The residuals should be centered on zero throughout the range of fitted values and normally distributed.
Now let’s look at a problematic residual plot. Keep in mind that the residuals should not contain any predictive information.
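As a hedged sketch of what "problematic" looks like in practice (the data here are simulated, not from any real study): if the true relationship is quadratic but we fit only a straight line, the missing predictor shows up as a U-shaped pattern in the residuals instead of a random band around zero.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
# True process is quadratic; the x**2 term is the "missing predictor"
y = 2.0 + 1.5 * x + 0.3 * x**2 + rng.normal(0, 2, size=x.shape)

# Fit a straight line only
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
residuals = y - fitted

plt.scatter(fitted, residuals, s=10)
plt.axhline(0, color="red")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("U-shaped residuals: predictive information left in the error")
plt.savefig("residuals.png")
```

The residuals average to zero overall (least squares guarantees that), yet they are systematically positive at the extremes and negative in the middle, which is exactly the kind of leftover predictive information the plot is meant to expose.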
I was lucky to read two wonderful articles (one of them was shared by @Boke89707488 Dr. Bo Ke) that lay out best practices for using machine learning.
“In-sample model fit indices should not be reported as evidence.”
Always validate your models on held-out test data the model has never seen, and report those results.
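A minimal sketch of this point, assuming scikit-learn is available (the simulated dataset is illustrative only): fit on a training split, then report the score from the held-out split, not the in-sample one.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Simulated regression data; in practice this is your own dataset
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
print("in-sample R^2:", r2_score(y_train, model.predict(X_train)))
print("test R^2:     ", r2_score(y_test, model.predict(X_test)))  # report this one
```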
“The cross-validation procedure should encompass all operations applied to the data. In particular, predictive analyses should not be performed on data after variable selection if the variable selection was informed to any degree by the data themselves (ie, post hoc cross-validation). Otherwise, estimated predictive accuracy will be inflated owing to circularity.”
More understanding of this point will be explained at the end of this post.
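The circularity is easy to demonstrate with a sketch (assuming scikit-learn; the data here are pure noise, chosen to make the inflation obvious): selecting variables on the full dataset before cross-validating reports high accuracy even when there is no signal at all, while putting the selection inside the cross-validation pipeline gives an honest, near-chance estimate.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))   # pure noise features
y = rng.integers(0, 2, size=100)   # labels carry no signal

# WRONG: variable selection sees all the data before cross-validation
X_sel = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_sel, y, cv=5).mean()

# RIGHT: selection is refit inside each training fold only
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
honest = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky CV accuracy:  {leaky:.2f}")   # well above chance despite no signal
print(f"honest CV accuracy: {honest:.2f}")  # near 0.5
```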
“Prediction analyses should not be performed with samples smaller than several hundred observations”
More understanding of this point will be explained at the end of this post.
“Multiple measures of prediction accuracy should be examined and reported. For regression analyses, measures of variance, such as R2, should be accompanied by measures of unsigned error, such as mean squared error or mean absolute error. For classification analyses, accuracy should be reported separately for each class, and a measure of accuracy that is insensitive to relative class frequencies, such as area under the receiver operating characteristic curve, should be reported.”
Definitely.
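A quick sketch of reporting multiple measures, assuming scikit-learn (the small arrays are made-up examples, not real predictions):

```python
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             r2_score, recall_score, roc_auc_score)

# Regression: variance explained plus unsigned error
y_true = np.array([3.0, 5.0, 7.5, 10.0, 12.0])
y_pred = np.array([2.5, 5.5, 7.0, 11.0, 12.5])
print("R^2:", r2_score(y_true, y_pred))
print("MSE:", mean_squared_error(y_true, y_pred))   # 0.4
print("MAE:", mean_absolute_error(y_true, y_pred))  # 0.6

# Classification: per-class accuracy plus a frequency-insensitive measure
labels = np.array([0, 0, 0, 0, 1, 1])
scores = np.array([0.1, 0.3, 0.2, 0.6, 0.7, 0.4])
preds = (scores >= 0.5).astype(int)
print("recall per class:", recall_score(labels, preds, average=None))
print("ROC AUC:", roc_auc_score(labels, scores))    # 0.875
```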
“The coefficient of determination should be computed by using the sums-of-squares formulation rather than by squaring the correlation coefficient”
Sure.
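A small numeric sketch of why this matters (the toy numbers are mine, chosen to make the gap visible): a prediction that is perfectly correlated with the truth but systematically biased gets a squared correlation of 1.0, while the sums-of-squares definition correctly penalizes it.

```python
import numpy as np

y_true = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y_pred = y_true + 3.0  # perfectly correlated but systematically biased

# Sums-of-squares definition: penalizes the bias
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_ss = 1 - ss_res / ss_tot

# Squared correlation coefficient: blind to the bias
r2_corr = np.corrcoef(y_true, y_pred)[0, 1] ** 2

print("sums-of-squares R^2:", r2_ss)     # -0.125, worse than predicting the mean
print("squared correlation:", r2_corr)   # 1.0, misleadingly perfect
```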
“k-fold cross-validation, with k in the range of 5 to 10 should be used rather than leave-one-out cross-validation because the testing set in leave-one-out cross-validation is not representative of the whole data and is often anti-correlated with the training set”
More considerations apply here; please see below for a practical way of doing cross-validation.
I do love the figure in this article, which shows exactly how to do cross-validation.
From work by Andrius Vabalas, Emma Gowen, Ellen Poliakoff, Alexander J. Casson
More importantly, we should remember this:
“Our simulations show that K-fold Cross-Validation (CV) produces strongly biased performance estimates with small sample sizes, and the bias is still evident with sample size of 1000. Nested CV and train/test split approaches produce robust and unbiased performance estimates regardless of sample size.”
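A minimal sketch of nested CV with scikit-learn (the dataset, model, and hyperparameter grid are illustrative choices of mine, not from the paper): the inner loop tunes a hyperparameter, and the outer loop estimates performance on folds the tuning never saw.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Inner loop: hyperparameter search, refit within each outer training fold
inner = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={"svc__C": [0.1, 1, 10]},
    cv=KFold(5, shuffle=True, random_state=0),
)

# Outer loop: performance estimate on data the tuning never touched
outer_scores = cross_val_score(inner, X, y, cv=KFold(5, shuffle=True, random_state=1))
print("nested CV accuracy:", outer_scores.mean())
```

Note that scaling lives inside the pipeline too, so even the standardization statistics are recomputed per training fold, consistent with the "encompass all operations" rule quoted earlier.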
Reference:
Poldrack RA, Huckins G, Varoquaux G. Establishment of Best Practices for Evidence for Prediction: A Review. JAMA Psychiatry. 2020;77(5):534–540. doi:10.1001/jamapsychiatry.2019.3671
Vabalas A, Gowen E, Poliakoff E, Casson AJ (2019) Machine learning algorithm validation with a limited sample size. PLoS ONE 14(11): e0224365. https://doi.org/10.1371/journal.pone.0224365
This work demonstrates that face-selectivity can emerge in untrained deep neural networks whose weights are randomly initialized.
The authors found that units selective to faces emerge robustly in randomly initialized networks and that these units reproduce many characteristics observed in monkeys. (They only claimed that their results aligned with other studies of face-selectivity in monkeys.)
Scientific importance
I think this work offers some suggestions for the following questions:
Can this neuronal selectivity arise innately, or does it require training from visual experience?
Where do innate cognitive functions in both biological and artificial neural networks come from?
“These findings may provide insight into the origin of innate cognitive functions in both biological and artificial neural networks.”
Is face-selectivity a special type of neuronal property, or is selectivity a property common to faces and other objects?
I partially agree with this work that selectivity is a property common to faces and other objects. However, I also believe that the face is special in terms of playing a key role in social interaction.
My question
However, I still do not know how, technically (or physically, or biologically), face-selectivity could develop in untrained deep neural networks (or in a primate brain).
Do you think that the key factor in the development of this phenomenon is the feed-forward connections rather than the statistical complexity embedded in each hierarchical circuit? Or maybe both?