
Non-local Neural Networks

Abstract

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method [4] in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our non-local models can compete or outperform current competition winners on both Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code will be made available.

Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He

Local & non-local

Local computing

Small receptive field

Such as 3×3 or 5×5 kernels

Focuses on convolution over a local region

Learns local features well but captures little global information

Non-local computing

A fully connected layer is an example.

Other ways to increase the receptive field:

Stack more convolutional layers

Use larger filters

Motivation & Main Idea

A non-local operation computes the response at a position as a weighted sum of the features at all positions in the input feature maps.

Related Work

Non-local means [4] is a classical filtering algorithm that computes a weighted mean of all pixels in an image.

CVPR 2005; A Non-local Algorithm for Image Denoising; Antoni Buades, Bartomeu Coll, Jean-Michel Morel

The paper defines the generic non-local operation as:

$$ y_i = \frac{1}{\mathcal{C}(x)} \sum_{\forall j} f(x_i, x_j) \, g(x_j) \qquad (1) $$

where $i$ is the index of an output position (in space, time, or spacetime) whose response is to be computed, $j$ enumerates all possible positions, $f$ computes a scalar affinity between $x_i$ and $x_j$, $g$ computes a representation of the input signal at position $j$, and the response is normalized by the factor $\mathcal{C}(x)$.

The non-local behavior in Eq. (1) is due to the fact that all positions (∀j) are considered in the operation.
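As an illustration, here is a minimal NumPy sketch of Eq. (1) using the paper's embedded-Gaussian affinity (a softmax over dot products). The weight matrices W_theta, W_phi, W_g stand in for the block's 1×1 convolutions; the names and shapes are my assumptions, not from the reference code:

```python
import numpy as np

def nonlocal_response(x, W_theta, W_phi, W_g):
    """Eq. (1) with the embedded-Gaussian affinity.

    x: (N, C) features at N positions (e.g., a flattened T*H*W grid).
    W_theta, W_phi, W_g: (C, C') projections standing in for 1x1 convs.
    """
    theta, phi, g = x @ W_theta, x @ W_phi, x @ W_g   # queries, keys, values
    f = theta @ phi.T                                 # affinities f(x_i, x_j) for all i, j
    f = np.exp(f - f.max(axis=1, keepdims=True))      # numerically stable exp
    w = f / f.sum(axis=1, keepdims=True)              # softmax normalization C(x)
    return w @ g                                      # weighted sum over all positions j

# In the full block, a final 1x1 projection W_z and a residual connection
# (z = x + y @ W_z) wrap this response, so it can be inserted anywhere.
```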

More details: Non-local Neural Networks__Club_v1

Code: https://github.com/titu1994/keras-non-local-nets

Reference

[1] Non-Local Neural Networks

Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7794-7803

[2] Videos as Space-Time Region Graphs

Xiaolong Wang, Abhinav Gupta; submitted to ECCV 2018 (the follow-up work)

[3] zhuanlan.zhihu.com/p/33345791


What is the difference between Bagging and Boosting?

Recently, I have been thinking about joining the Kaggle competition community. First, I am interested in data science; although I have a computer science background and have been doing machine learning work, I have not yet mastered the practical skills and knowledge that real projects demand. Second, I need to extend my research sense toward reality. I mean money. The prize money in each competition is attractive to me. But the most important thing is to meet and learn from many excellent new friends. Anyway, I believe this is the right decision, one that will help a lot with research, jobs, and my own companies.

 

Bagging and Boosting are both ensemble methods in Machine Learning, but what’s the key behind them?

Bagging and Boosting are similar in that they are both ensemble techniques, where a set of weak learners are combined to create a strong learner that obtains better performance than a single one. So, let’s start at the beginning:

What is an ensemble method?

"Ensemble" is a Machine Learning concept in which the idea is to train multiple models using the same learning algorithm. The ensembles take part in a bigger group of methods, called multi-classifiers, where a set of hundreds or thousands of learners with a common objective are fused together to solve the problem.

The second group of multi-classifiers contains the hybrid methods. They use a set of learners too, but they can be trained using different learning techniques. Stacking is the most well-known. If you want to learn more about Stacking, you can read my previous post, "Dream team combining classifiers".

The main causes of error in learning are due to noise, bias and variance. Ensemble helps to minimize these factors. These methods are designed to improve the stability and the accuracy of Machine Learning algorithms. Combinations of multiple classifiers decrease variance, especially in the case of unstable classifiers, and may produce a more reliable classification than a single classifier.

To use Bagging or Boosting you must select a base learner algorithm. For example, if we choose a classification tree, Bagging and Boosting would consist of a pool of trees as big as we want.

[Figure: a single learner versus Bagging and Boosting, going from 1 learner to N learners]

How do Bagging and Boosting get N learners?

Bagging and Boosting get N learners by generating additional data in the training stage. N new training data sets are produced by random sampling with replacement from the original set. By sampling with replacement some observations may be repeated in each new training data set.

In the case of Bagging, any element has the same probability to appear in a new data set. However, for Boosting the observations are weighted, and therefore some of them will take part in the new sets more often. These multiple sets are used to train the same learner algorithm, and therefore different classifiers are produced.

[Figure: a single training set versus multiple sets built by random sampling with replacement (Bagging) or by over-weighting data (Boosting)]
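For concreteness, here is a minimal Python sketch of the sampling-with-replacement step (the toy data and names are mine, for illustration only):

```python
import numpy as np

# Toy original training set (assumed for illustration).
rng = np.random.default_rng(0)
X = np.arange(10).reshape(10, 1)
y = np.arange(10) % 2

N = 3                                            # number of learners
for _ in range(N):
    idx = rng.integers(0, len(X), size=len(X))   # sample indices with replacement
    X_boot, y_boot = X[idx], y[idx]              # some rows repeat, others are left out
    # ... fit the same base learner on (X_boot, y_boot)
```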

Why are the data elements weighted?

At this point, we begin to deal with the main difference between the two methods. While the training stage is parallel for Bagging (i.e., each model is built independently), Boosting builds the new learner in a sequential way.

[Figure: parallel training (Bagging) versus sequential training (Boosting)]

In Boosting algorithms, each classifier is trained on data that takes into account the previous classifiers' success. After each training step the weights are redistributed: misclassified data has its weight increased to emphasize the most difficult cases, so that subsequent learners focus on them during their training. A minimal sketch of this reweighting follows below.
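Here is a from-scratch sketch of one such reweighting round, AdaBoost-style; the function name and toy data are assumptions for illustration, with labels in {-1, +1}:

```python
import numpy as np

def update_weights(w, y, pred):
    """One AdaBoost-style round: y and pred are in {-1, +1}."""
    err = np.sum(w * (pred != y)) / np.sum(w)   # weighted error of this learner
    alpha = 0.5 * np.log((1 - err) / err)       # learner weight (higher = better)
    w = w * np.exp(-alpha * y * pred)           # increase weight of misclassified cases
    return w / w.sum(), alpha                   # renormalize to a distribution

w = np.full(5, 1 / 5)                           # start from uniform sample weights
y    = np.array([+1, -1, +1, +1, -1])
pred = np.array([+1, +1, +1, -1, -1])           # learner misclassifies samples 1 and 3
w, alpha = update_weights(w, y, pred)
print(w)  # the two misclassified samples now carry the largest weights
```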

How does the classification stage work?

To predict the class of new data we only need to apply the N learners to the new observations. In Bagging, the result is obtained by averaging the responses of the N learners (or by majority vote). Boosting, however, assigns a second set of weights, this time to the N classifiers, in order to take a weighted average of their estimates.

[Figure: a single estimate versus a simple average (Bagging) and a weighted average (Boosting)]

In the Boosting training stage, the algorithm allocates a weight to each resulting model. A learner with a good classification result on the training data is assigned a higher weight than a poor one, so when evaluating a new learner, Boosting needs to keep track of the learners' errors too. Let's see the differences in the procedures:

[Figure: training stage comparison; Bagging trains and keeps each learner, while Boosting trains, evaluates, and updates both the sample weights and the learner weights]
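These two combination rules are tiny in code; a toy illustration with made-up numbers (the values are my assumptions, not from the post):

```python
import numpy as np

# Hypothetical predictions from N = 3 learners for one observation,
# with class labels in {-1, +1}; alphas are the learner weights that
# Boosting assigned during training.
preds  = np.array([+1, -1, +1])
alphas = np.array([0.9, 0.3, 0.6])

bagging_decision  = np.sign(preds.mean())     # equal weights: majority vote
boosting_decision = np.sign(alphas @ preds)   # weighted average of estimates
```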

Some Boosting techniques include an extra condition to keep or discard a single learner. For example, in AdaBoost, the most renowned, an error below 50% is required to keep the model; otherwise, the iteration is repeated until a learner better than a random guess is obtained.

The previous image shows the general process of a Boosting method, but several alternatives exist with different ways to determine the weights to use in the next training step and in the classification stage. See AdaBoost, LPBoost, XGBoost, GradientBoost, and BrownBoost if you would like to go into detail.
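Both families are also available off the shelf; a hedged scikit-learn usage sketch (my choice of dataset and settings, not from the original post):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Bagging trains its trees independently; AdaBoost trains them
# sequentially, reweighting the samples after each round.
bagging  = BaggingClassifier(n_estimators=50, random_state=0)
boosting = AdaBoostClassifier(n_estimators=50, random_state=0)

print(cross_val_score(bagging,  X, y, cv=5).mean())
print(cross_val_score(boosting, X, y, cv=5).mean())
```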

Which is the best, Bagging or Boosting?

There’s not an outright winner; it depends on the data, the simulation, and the circumstances.
Bagging and Boosting decrease the variance of your single estimate as they combine several estimates from different models. So the result may be a model with higher stability.

If the problem is that the single model gets a very low performance, Bagging will rarely get a better bias. However, Boosting could generate a combined model with lower errors as it optimizes the advantages and reduces pitfalls of the single model.

By contrast, if the difficulty of the single model is over-fitting, then Bagging is the best option. Boosting for its part doesn’t help to avoid over-fitting; in fact, this technique is faced with this problem itself. For this reason, Bagging is effective more often than Boosting.

To sum up:

Similarities | Differences
Both are ensemble methods to get N learners from 1 learner… | …but, while they are built independently for Bagging, Boosting tries to add new models that do well where previous models fail.
Both generate several training data sets by random sampling… | …but only Boosting determines weights for the data to tip the scales in favor of the most difficult cases.
Both make the final decision by averaging the N learners (or taking the majority of them)… | …but it is an equally weighted average for Bagging and a weighted average for Boosting, giving more weight to those with better performance on the training data.
Both are good at reducing variance and provide higher stability… | …but only Boosting tries to reduce bias. On the other hand, Bagging may solve the over-fitting problem, while Boosting can increase it.

Reference: https://quantdare.com/what-is-the-difference-between-bagging-and-boosting/

AutoML: What & Comparison & Concerns

What is AutoML?

Automated Machine Learning provides methods and processes to make Machine Learning available for non-Machine Learning experts, to improve efficiency of Machine Learning and to accelerate research on Machine Learning.

Machine learning (ML) has achieved considerable successes in recent years and an ever-growing number of disciplines rely on it. However, this success crucially relies on human machine learning experts to perform the following tasks:

  • Preprocess and clean the data.
  • Select and construct appropriate features.
  • Select an appropriate model family.
  • Optimize model hyperparameters.
  • Postprocess machine learning models.
  • Critically analyze the results obtained.

As the complexity of these tasks is often beyond non-ML-experts, the rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge. We call the resulting research area that targets progressive automation of machine learning AutoML.

Reference: http://www.ml4aad.org/automl/

AutoML comparison

 

Automatic Machine Learning (autoML) is the process of building Machine Learning models algorithmically, with no human intervention. Several autoML packages are available for building predictive models.

Datasets

In this post we compare three autoML packages (auto-sklearn, h2o, and mljar). The comparison is performed on a binary classification task across 28 datasets from openml. The datasets are described below.

Datasets used in comparison. All datasets are accessible from openml.org by id provided in the table.

Methodology

  1. Each dataset was divided into train and test sets (70% of samples for training and 30% for testing). All packages were tested on the same data splits.
  2. The autoML model was trained on the train set, with a 1-hour limit on training time.
  3. The final autoML model was used to compute predictions on the test set (on samples not used for training).
  4. The logloss was used to assess performance (the lower the logloss, the better the model). Logloss was selected because it is more informative than the accuracy metric.
  5. The process was repeated 10 times (with different seeds for the splits), and the final results are averages over the 10 repetitions. A minimal sketch of this protocol appears after the list.
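For reference, the protocol can be mimicked with scikit-learn; this is a toy sketch with synthetic data and a stand-in model, not the post's actual setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

scores = []
for seed in range(10):                            # 10 repetitions, different seeds
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed)   # 70/30 split
    model = RandomForestClassifier(random_state=seed)  # stand-in for an autoML model
    model.fit(X_tr, y_tr)
    scores.append(log_loss(y_te, model.predict_proba(X_te)))

print(np.mean(scores))                            # lower is better
```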

Results

The results are presented in table and chart below. The best approach for each dataset is bolded.

The average logloss for each method on the test subset, computed over 10 repetitions.
AutoML packages comparison (the lower the logloss, the better the algorithm).

Discussion

The poor performance of the auto-sklearn algorithm can be explained by the 1-hour limit on training time. Auto-sklearn uses Bayesian optimization for hyperparameter tuning, which is sequential in nature and requires many iterations to find a good solution. The 1-hour training limit was selected from a business perspective: in my opinion, a user of an autoML package prefers to wait 1 hour rather than 72 hours for a result. The h2o results are better than auto-sklearn's on almost all datasets.

The best results were obtained by the mljar package: it was the best algorithm on 26 of the 28 datasets. On average it was 47.15% better than auto-sklearn and 13.31% better than the h2o autoML solution.

A useful feature of mljar is its user interface: all models from the optimization are available through a web browser (mljar saves all models obtained during the optimization).

The view with all models trained during the optimization.
The details of a selected model: information about the hyperparameters used and the learning curve for the train and test folds.

The code used for the comparison is available on GitHub.

The mljar package can be used from Python or R, or through the web browser.


Reference: https://medium.com/@MLJARofficial/automl-comparison-4b01229fae5e

http://docs.h2o.ai/h2o/latest-stable/h2o-docs/welcome.html

AutoML Concerns

The rise of generally intelligent AI

Much of the AI in the world today was made to accomplish a single, narrow use case, like the translation of a sentence from one language to another, but Dean said he wants Google to create more AI models that can accomplish multiple tasks and achieve a kind of “common sense reasoning about the world.”

“I think in the future you’re going to see us move more towards models that can do many, many things and then build on that experience of doing those many, so that when we want to train a model to do something else, it can build on that set of skills and expertise that it already has,” he said.

For example, if a robot is asked to pick something up, it will understand things like how a hand works, how gravity works, and other understandings about the world.

“I think that’s going to be an important trend that you’ll see in the next few years,” he said.

AutoML’s bias and opacity challenges

Depending on whom you ask, AutoML, Google's AI that can create other AI models, is either exciting or terrifying.

Machines that train machines surely frighten AI naysayers. But AutoML, said Google Cloud chief scientist Fei-Fei Li, lowers barriers to creating custom AI models for everyone from high-end developers to a ramen shop owner in Tokyo.

Dean finds it exciting because it’s helping Google “automatically solve problems,” but the use of AutoML also presents unique issues.

“Because we’re using more learned systems than traditional sort of hand-coded software, I think that raises a lot of challenges for us that we’re tackling,” he said. “So one is if you learn from data and that data has biased decisions in it already, then the machine learning models who learn can themselves perpetuate those biases. And so there’s a lot of work that we’re doing, and others in the machine learning community, to figure out how we can train machine learning models that don’t have forms of bias.”

Another challenge: how to properly design safety-critical systems with AutoML to create AI for industries like health care. Decades of computer science best practices have been established for hand-coding such systems, and the same must be done for machines making machines.

It’s one thing to get something wrong when you’re classifying the species of a dog, Dean said; it’s another thing entirely to make mistakes in safety-critical systems.

“I think that’s a really interesting and important direction for us to apply, particularly as we start to get machine learning in more safety-critical kinds of systems, things that are making decisions about your health care or an autonomous car,” he said.

Safety-critical AI needs more transparency

Together with news that Google Assistant will soon make phone calls for you and the release of Android P in beta, on Tuesday CEO Sundar Pichai talked about how Google is applying AI to health care to predict the readmission of patients based on information drawn from electronic health records.

An article by Google researchers, published Tuesday in npj Digital Medicine, explains examples of why its AI made certain decisions about a patient, so that doctors can see the reasoning behind a recommendation in the medical records. In the future, Dean hopes, a developer or doctor who wants to know why an AI made a specific decision will be able to simply ask the AI model and get a response.

Today, the implementation of AI in Google products goes through an internal review process, Dean said. Google is currently developing a set of guidelines for how to assess whether or not an AI model contains bias.

“What you want is essentially, just like security review or privacy review for new features in products, you want an ML fairness review that’s part of integrating machine learning into our products,” he said.

Humans should also be part of the decision-making process, Dean said, when it comes to AI implemented by developers through tools like ML Kit or TensorFlow, which has been downloaded more than 13 million times.

Drawing the line at AI weaponry

In response to a question, Dean said he does not believe Google should be in the business of making autonomous weaponry.

In March, news broke that Google was working with the Department of Defense to improve its analysis of footage gathered by drones.

“I think there are a number of interesting ethical questions about machine learning and AI as we as a society start to develop more powerful techniques,” he said. “I personally have signed a letter, an open letter about six or nine months ago — don’t know exactly when — saying that I was opposed to using machine learning for autonomous weapons. I think obviously there’s a continuum of what decisions we want to make as a company, so should we offer Gmail to military services that want to use it? That seems fine to me. I think most people have qualms about using autonomous weapons systems.”

Thousands of Google employees, according to the New York Times, have signed a letter stating that Google should stay out of the creation of "warfare technology", which could cause irreparable damage to Google's brand and to trust between the company and the public. Dean did not specify whether he signed the letter referenced in the New York Times reporting.

AI drives new projects and products

Alongside the patient-readmission AI and a Gboard designed to understand Morse code, Pichai also highlighted a previously released study of AI that detected diabetic retinopathy and predicted problems as accurately as highly trained ophthalmologists did.

AI models with that level of intelligence are beginning to do more than imitate human activity. They're helping Google discover new products and services.

“By training these models on large amounts of data, we can actually make systems that can do things that we didn’t know we could do, and that’s a really fundamental advance,” Dean said. “We’re now creating entirely new kinds of tests and products proven by AI, rather than using AI to do things we think we want to be able to do but just need the training system.”

Reference: https://venturebeat.com/2018/05/09/googles-ai-chief-on-automl-autonomous-weapons-and-the-future/