
DeepMind: Neural Processes

What about

Many works have been proposed to overcome the limitations of deep learning. This work aims to:

  • improve the flexibility of a deep learning model at test time by combining it with the advantages of Gaussian processes (GPs);
  • conversely, let deep learning automatically learn a kernel function from observed data, which a GP can then use directly.

Details

(refer to https://wemedia.ifeng.com/68012220/wemedia.shtml)

Function approximation is at the core of many problems in machine learning. DeepMind's latest research combines the advantages of neural networks and stochastic processes, proposing a neural process model that achieves good performance and high computational efficiency across multiple tasks.

Paper: https://arxiv.org/pdf/1807.01622.pdf

Function approximation is at the heart of many problems in machine learning. A very popular approach to this problem over the past decade has been deep neural networks. State-of-the-art neural networks are black-box function approximators that learn to parameterize a single function from a large number of training data points. As a result, most of the workload falls on the training phase, while evaluation and testing are reduced to fast forward propagation. Although high test-time performance is valuable for many practical applications, the network's outputs cannot be updated after training, which may be undesirable. Meta-learning, for example, is an increasingly popular research area that addresses exactly this limitation.

As an alternative to neural networks, one can also reason over stochastic processes to perform function regression. The most common example of this approach is the Gaussian process (GP), a model with properties complementary to neural networks: a GP does not require an expensive training phase and can infer the underlying ground-truth function from just a few observations, which makes it very flexible at test time.

In addition, a GP represents infinitely many different functions at unobserved locations, so it can capture the uncertainty of its predictions given some observations. However, GPs are computationally expensive: a vanilla GP scales cubically in the number of data points, and the current best approximations still scale quadratically. Moreover, the available kernels are usually limited in their functional form, and an extra optimization procedure is needed to identify the most suitable kernel and its hyperparameters for any given task.
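To make the cubic cost concrete, here is a minimal numpy sketch of vanilla GP regression on toy data (the RBF kernel, length scale, and data are illustrative assumptions, not from the paper); the solve against the n x n kernel matrix is the cubic-cost step:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two sets of 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-6):
    """Vanilla GP regression: the solve on the n x n matrix K costs O(n^3)."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_star = rbf_kernel(x_test, x_train)
    alpha = np.linalg.solve(K, y_train)  # the cubic-cost linear solve
    return K_star @ alpha

x = np.linspace(-1, 1, 10)
y = np.sin(3 * x)
mu = gp_posterior_mean(x, y, np.array([0.0]))
```

With only ten points this is instant, but doubling the data makes the solve roughly eight times more expensive, which is exactly the scaling problem the paper targets.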

Combining neural networks with stochastic-process inference to make up for the shortcomings of both approaches has therefore been attracting more and more attention as a potential solution. In this work, the team of DeepMind research scientist Marta Garnelo et al. propose a neural-network-based method for learning approximations to stochastic processes, which they call Neural Processes (NPs). NPs share some basic properties with GPs: they learn to model distributions over functions, can estimate the uncertainty of their predictions from context observations, and shift some of the work from training to test time, giving the model greater flexibility.

More importantly, NPs generate predictions in a very computationally efficient way. Given n context points and m target points, inference with a trained NP corresponds to a forward pass through a deep neural network, which scales as O(n + m) rather than the O((n + m)^3) of a classic GP. In addition, the model overcomes many kernel-design restrictions by learning an implicit kernel directly from the data.

The main contributions of this research are:

  • Neural Processes, a model that combines the advantages of neural networks and stochastic processes.
  • We compare neural processes (NPs) with meta-learning, deep latent variable models, and Gaussian processes. Because NPs are relevant to so many areas, they enable comparisons across many related topics.
  • We demonstrate the benefits and capabilities of NPs by applying them to a range of tasks, including 1-D regression, real-world image completion, Bayesian optimization, and contextual bandits.

Neural process model

Figure 1: Neural process model

 

(a) The graph model of the neural process: x and y correspond to data where y = f(x); C and T denote the numbers of context points and target points, respectively; and z is the global latent variable. A gray background indicates that the variable is observed.

(b) Schematic diagram of the neural process implementation. Circled variables correspond to the variables of the graph model in (a), boxed variables are the intermediate representations of the NP, and bold letters denote the following computation modules: h – encoder, a – aggregator, and g – decoder. In our implementation, h and g are neural networks and a is a mean function. Solid lines indicate the generative process; dashed lines indicate inference.

In our NP implementation, we impose two additional requirements: invariance to the order of the context points, and computational efficiency.

The final model can be summarized as the following three core components (see Figure 1b):

  • An encoder h from input space to representation space, which takes pairs of (x, y) context values and produces a representation r_i for each pair. We parameterize h as a neural network.
  • An aggregator a, which summarizes the outputs of the encoder into a single representation.
  • A conditional decoder g, which takes samples of the global latent variable z together with the new target locations as input and outputs predictions for the corresponding y values.
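The three components above can be sketched as follows. This is a minimal, untrained numpy illustration with made-up weights and dimensions; in particular, a real NP parameterizes a Gaussian over the latent z from the aggregated representation, whereas this sketch simply sets z = r to stay deterministic:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(params, x):
    """Tiny two-layer MLP used for both encoder h and decoder g."""
    W1, b1, W2, b2 = params
    return np.tanh(x @ W1 + b1) @ W2 + b2

def init(d_in, d_hidden, d_out):
    return (rng.normal(size=(d_in, d_hidden)), np.zeros(d_hidden),
            rng.normal(size=(d_hidden, d_out)), np.zeros(d_out))

d_r = 8                          # dimensionality of each representation r_i
h_params = init(2, 16, d_r)      # encoder h: (x_i, y_i) pair -> r_i
g_params = init(1 + d_r, 16, 1)  # decoder g: (x_target, z) -> y prediction

def np_forward(x_context, y_context, x_target):
    # Encoder h: one representation per (x, y) context pair.
    pairs = np.stack([x_context, y_context], axis=1)
    r_i = mlp(h_params, pairs)                    # shape (n, d_r)
    # Aggregator a: order-invariant mean over context representations.
    r = r_i.mean(axis=0)                          # shape (d_r,)
    # A full NP would sample z from a Gaussian parameterized by r;
    # here we use z = r directly for a deterministic sketch.
    z = r
    # Decoder g: predict y at each target location given z.
    inputs = np.concatenate([x_target[:, None],
                             np.tile(z, (len(x_target), 1))], axis=1)
    return mlp(g_params, inputs)                  # shape (m, 1)

pred = np_forward(np.array([-1.0, 0.0, 1.0]),
                  np.array([0.5, 0.0, -0.5]),
                  np.array([0.25, 0.75]))
```

Because the aggregator is a mean, the output is invariant to the order of the context points, and a single forward pass over n context and m target points costs O(n + m).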

 

Figure 2: Graph models of related models (a–c) and the neural process (d). Gray shading indicates that the variable is observed. C denotes the context variables and T the target variables, i.e., the variables to be predicted given C.

 

Results


Figure 4. Pixelated regression on MNIST and CelebA

The diagram on the left shows that image completion can be framed as a 2-D regression task, where f(pixel coordinates) = pixel brightness. The diagram on the right shows image-completion results on MNIST and CelebA. The top images correspond to the context points provided to the model. For clarity, unobserved points are marked in blue for MNIST and in white for CelebA. Each row corresponds to a different sample given the same context points. As the number of context points increases, the predicted pixels get closer to the underlying ones, and the variance across samples gradually decreases.

Figure 5. Thompson sampling of 1-D objective functions using neural processes

These figures show five iterations of the optimization process. Each predicted function (blue) is drawn by sampling a latent variable conditioned on an increasing number of context points (black). The underlying ground-truth function is shown as a black dotted line. The red triangle marks the next evaluation point, which corresponds to the minimum of the sampled NP curve. The red circle in the next iteration marks this evaluation point together with its underlying ground-truth value, which becomes a new context point for the NP.
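The loop in this caption can be sketched as follows. Since a trained NP is not available here, a toy Bayesian linear model over random Fourier features stands in for it as the source of sampled functions; the objective, feature count, and noise levels are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def target(x):
    """Hypothetical 1-D objective to minimize (the ground truth)."""
    return np.sin(3 * x) + 0.5 * x

# Random Fourier features give a cheap Bayesian linear model to sample
# plausible functions from, standing in for the trained NP.
omega = rng.normal(size=16)
phase = rng.uniform(0, 2 * np.pi, 16)
def features(x):
    return np.cos(np.outer(x, omega) + phase)

grid = np.linspace(-2, 2, 101)
ctx_x, ctx_y = [0.0], [target(0.0)]           # initial context point

for _ in range(5):                            # five optimization iterations
    Phi = features(np.array(ctx_x))
    A = Phi.T @ Phi + 0.1 * np.eye(16)        # posterior precision
    mean = np.linalg.solve(A, Phi.T @ np.array(ctx_y))
    w = rng.multivariate_normal(mean, 0.1 * np.linalg.inv(A))
    sampled_f = features(grid) @ w            # one plausible function (blue curve)
    x_next = grid[np.argmin(sampled_f)]       # minimum of the sampled curve
    ctx_x.append(x_next)                      # evaluate it ...
    ctx_y.append(target(x_next))              # ... and add it as a new context point

best = min(ctx_y)
```

Each iteration draws one function, evaluates the objective at that function's minimum, and feeds the result back as context, exactly the Thompson-sampling pattern the figure depicts.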

Table 1. Bayesian optimization using Thompson sampling

Average number of optimization steps needed to reach the global minimum of a 1-D function generated by a Gaussian process. Values are normalized by the number of steps taken by random search. A Gaussian process with the appropriate kernel constitutes an upper bound on performance.

Table 2. Results on the wheel bandit problem for increasing values of delta

The results show the mean and standard error of cumulative and simple regret over 100 trials, normalized by the performance of a uniform agent.

Discussion

We introduced Neural Processes, a family of models that combine the advantages of stochastic processes and neural networks. NPs learn to represent distributions over functions and make flexible predictions at test time conditioned on some context input. Instead of requiring a hand-written kernel, NPs learn an implicit measure directly from the data.

We apply NPs to a range of regression tasks to demonstrate their flexibility. The purpose of this paper is to introduce NPs and compare them with ongoing research. The tasks we present are therefore varied but relatively low-dimensional. Scaling NPs up to higher dimensions is likely to show the benefits of their lower computational complexity and data-driven representations.

Limitations of this work

  • How can we ensure that NPs match the performance of pure deep learning models?
  • In which real-world applications is it better to use NPs rather than plain deep learning?

We expect future work to go deeper into these questions.

[Refer] Medical Imaging Meets NIPS: A summary

This year I attended and presented a poster at the Medical Imaging Meets NIPS workshop. The workshop focused on bringing together professionals from both the medical imaging and machine learning communities. Altogether there were eleven talks and two poster sessions. Here I'm going to recap some of the highlights of the workshop. Presentations and posters generally discussed segmentation, classification, and/or image reconstruction.

Segmentation

Before coming to this workshop I must admit that I did not fully understand the value of image segmentation. My thought process was always something along the lines of: why would you just want to outline something in an image and not also classify it? This workshop changed my view on the value of segmentation.

Radiation Therapy

Raj Jena, a radiologist at Cambridge University and a Microsoft researcher, gave a presentation on “Pixel Perfectionism — Machine learning and Adaptive Radiation Therapy.” In the talk he described how machine learning could help provide better treatments and optimize workflows. In order for patients to receive proper radiation therapy, it is important to pinpoint the exact boundary of the tumor. By locating the exact boundary between the tumor and healthy tissue, treatments can deliver more radiation as there is less risk of damaging healthy tissue. However, currently, segmentation is done manually by radiologists. This often causes discrepancies between different radiologists, which can noticeably affect treatment results. Consistency is also important in gauging the effectiveness of drugs used in combination with radiation, because if the radiation is not the same across patients it is nearly impossible to tell whether improvements are caused by the drug or by better radiation.

Machine learning offers the opportunity to provide consistency and more accurate segmentation. Secondly, machine learning models can often run in seconds whereas radiologists often take several hours to manually segment images. This time can be better spent plotting the course of treatment or seeing additional patients. Jena also described how machine learning could allow him to become a “super radiation oncologist.”

Slide from Jena’s talk. The “Super Radiation Oncologist” uses machine learning to constantly adapt therapy and predict effects of treatment.
Slide from Jena’s talk. Details adaptive radiation therapy.

ML can enable oncologists both to better adapt treatments to changes in the shape and size of healthy tissues and to predict possible adverse effects of radiation therapy. For instance, Jena described how he is using simple methods such as Gaussian processes to predict potential side effects of radiation.

This was one of my favorite talks of the entire workshop and I urge you to check out Jena’s full presentation.

Building quality datasets

A common theme throughout the workshop was the quality of annotations and the difficulty building good medical imaging datasets. This is particularly true in segmentation tasks where a model can only be as good as its annotators and the annotators must be skilled radiologists themselves.

One possible way to accelerate the annotation process is through active learning. Tanveer Syeda-Mahmood of IBM briefly brought this up when discussing IBM's work on radiology. With active learning, one might start with a small labeled dataset and several expert human annotators. The ML algorithm learns the training set well enough that it can annotate easy images itself, while the experts annotate the hard edge cases. Specifically, images that the classifier scores below a threshold of certainty are sent to humans to annotate manually. One of the posters (by Girro et al) also discussed using active learning to help effectively train a semantic image segmentation network.
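The routing rule described above (confidence below a threshold goes to a human) can be sketched as a small helper; the threshold value and the two-class probabilities are illustrative assumptions:

```python
import numpy as np

def route_for_annotation(probs, threshold=0.8):
    """Split predictions into auto-labeled vs. sent-to-expert.

    probs: (n_images, n_classes) softmax outputs from the current model.
    Images whose top-class confidence falls below `threshold` go to
    human annotators; the rest are annotated automatically.
    """
    confidence = probs.max(axis=1)
    auto_idx = np.where(confidence >= threshold)[0]
    human_idx = np.where(confidence < threshold)[0]
    return auto_idx, human_idx

probs = np.array([[0.95, 0.05],   # easy: model is confident
                  [0.55, 0.45],   # hard edge case: send to expert
                  [0.10, 0.90]])
auto_idx, human_idx = route_for_annotation(probs)
```

In a full active-learning loop, the expert labels for `human_idx` would be added to the training set and the model retrained, shrinking the uncertain set over time.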

Active learning may address part of the problem; however, it does not entirely solve the quality issue. The central question is how researchers can develop an accurate dataset when even the experts disagree on boundaries. With respect to this point, Bjorne Menze presented on the construction of the BraTS dataset. The BraTS dataset is one of the largest brain imaging datasets. He fused data from several different annotators in order to create the “ground truth.” Since its creation, BraTS has held several different challenges. One challenge involved segmenting all the tumors with machine learning algorithms, and the most recent focused on predicting overall survival.

Localization, detection, and classification

Accurately classifying diseases found in medical images was a prominent topic at the workshop. Detecting objects/ROIs and accurately classifying them is a challenging task in medical imaging. This is largely due to the variety of modalities (and dimensions) of medical images (i.e., X-Ray, MRI, CT, Ultrasound, and Sonogram), the size of the images, and (as with segmentation) limited annotated (and sometimes low-quality) training data. As such, presenters showcased an interesting variety of techniques for overcoming these obstacles.

Ivana Igsum discussed deep learning techniques in cardiac imaging. In particular, she described her work on accurately detecting calcification in arteries, and how she and her team developed methods to automatically score calcium and categorize cardiovascular disease risk. To do this, her team used a multi-layer CNN approach.

Slide from Ivana’s talk (5:46)

Later in the day, Yaroslav Nikulin presented on the winning approach from the digital mammography challenge.

Posters

Natalia Antropova, Benjamin Huynh and Maryellen Giger of the University of Chicago had an interesting poster on using an LSTM to perform breast DCI-MRI classification. This involved inputting 3-D MRI images from multiple time steps after a contrast dye was applied. They extracted features from these images with a CNN and fed them to the LSTM, which output a prediction. Altogether this poster provided an interesting application of an LSTM (and CNN) to handle “4-D” medical imaging data.
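The pipeline described in the poster (per-time-step CNN features fed to an LSTM) can be sketched as follows. This is a toy numpy illustration with random weights; `cnn_features` is a hypothetical stand-in for a pretrained CNN, not the authors' actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

def cnn_features(volume):
    """Hypothetical stand-in for a pretrained CNN: 3-D volume -> feature vector."""
    return volume.reshape(-1)[:32] * 0.1

def lstm_step(x, h, c, params):
    """One step of a standard LSTM cell (input/forget/cell/output gates)."""
    Wx, Wh, b = params
    z = x @ Wx + h @ Wh + b
    i, f, g, o = np.split(z, 4)
    i, f, o = 1 / (1 + np.exp(-i)), 1 / (1 + np.exp(-f)), 1 / (1 + np.exp(-o))
    c = f * c + i * np.tanh(g)
    return o * np.tanh(c), c

d_in, d_h = 32, 16
params = (rng.normal(size=(d_in, 4 * d_h)) * 0.1,
          rng.normal(size=(d_h, 4 * d_h)) * 0.1,
          np.zeros(4 * d_h))
w_out = rng.normal(size=d_h)

# Four post-contrast time points, each a small toy 3-D MRI volume.
volumes = [rng.normal(size=(4, 4, 4)) for _ in range(4)]
h = c = np.zeros(d_h)
for v in volumes:                        # CNN features fed step by step
    h, c = lstm_step(cnn_features(v), h, c, params)
prob = 1 / (1 + np.exp(-(h @ w_out)))    # final classification probability
```

The key design point is that the CNN handles the spatial (3-D) structure of each volume while the LSTM handles the temporal axis, which is what makes the data effectively "4-D".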

My poster focused on my current work in progress on using object detectors to accurately localize and classify multiple conditions in chest X-rays. My main goal is to investigate how well object detection algorithms perform on a limited dataset, as opposed to multi-label classification CNNs trained on the entire dataset. I think object detectors have a lot of potential for localizing and classifying diseases/conditions in medical images if configured and trained properly; however, they are limited by the shortage of labeled bounding-box data, which is one of the reasons I found the following poster very interesting.

Hiba Chougrad and Hamid Zouaki had an interesting poster on transfer learning for breast imaging classification. In the abstract Convolutional Neural Networks for Breast Cancer Screening: Transfer Learning with Exponential Decay, they described testing several different transfer learning methods. For example, they compared fine-tuning a CNN pretrained on ImageNet against training from randomly initialized weights. In the end, they found the optimal technique was to fine-tune the layers with an exponentially decaying learning rate: the bottom layers (i.e., the ones closest to the softmax) get the highest learning rate, and the upper layers get the lowest. This intuitively makes a lot of sense, as the layers closest to the output tend to learn the most dataset-specific features. By using these and related techniques we can (hopefully) develop accurate models without having large datasets.
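The exponentially decaying schedule can be sketched as a small helper, using the abstract's convention that layers closest to the softmax fine-tune fastest; the base rate and decay factor here are illustrative assumptions:

```python
def layerwise_lrs(n_layers, lr_top=1e-3, decay=0.5):
    """Per-layer learning rates decaying exponentially from the output
    (softmax) end of the network toward the input end.

    Layer index 0 is the input layer; index n_layers - 1 is closest to
    the softmax and therefore fine-tunes with the highest rate.
    """
    return [lr_top * decay ** (n_layers - 1 - i) for i in range(n_layers)]

lrs = layerwise_lrs(4)   # e.g. [0.000125, 0.00025, 0.0005, 0.001]
```

In practice these rates would be attached to per-layer parameter groups in the optimizer, so the early, general-purpose features move slowly while the task-specific head adapts quickly.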

Reconstruction and generation

Heartflow

I usually am not impressed with industrial pitches touting how great their product is and how it will “revolutionize [insert industry].” However, Heartflow and their DeepLumen blood vessel segmentation algorithm definitely impressed me. The product reduced unnecessary angiograms by 83% and is FDA approved. I will not go into extended detail here, but I think Heartflow is a good example of machine learning having an impact in a real-world environment.

Two of the other presenters also touched on reconstruction as well. Igsum’s talk (previously mentioned) discussed a method for constructing a routine CT from a low dose CT. Daniel Rueckert of Imperial College described how ML-based reconstruction could enable more time stamps in imaging.

Posters

One of the posters that I found particularly interesting was MR-to-CT Synthesis using Cycle-Consistent Generative Adversarial Networks. In this work, the authors (Wolterink et al.) took the popular CycleGAN algorithm and used it to convert MRI images into CT images. This is a potentially very useful application that could spare patients from undergoing multiple imaging procedures. Additionally, CTs expose patients to radiation, so this could also reduce radiation exposure.

Image from MR-to-CT synthesis article

There were also posters by Virdi et al. on Synthetic Medical Images from Dual Generative Adversarial Networks and Mardani et al. on Deep Generative Adversarial Networks for Compressed Sensing (GANCS) Automates MRI.

Tools and platforms

Several speakers spoke about new tools aimed at making medical image analysis with machine learning more accessible to both clinicians and ML researchers. Jorge Cardoso described NiftyNet and how it enables researchers to develop medical imaging models more easily. NiftyNet is built on Tensorflow and it includes many simple to use modules for loading high dimensional inputs.

Poster from Makkie et al. on their neuroimaging platform

Also on the tools side, G. Varoquax presented on NILearn, a Python module for neuro-imaging data built on top of scikit-learn. Just as scikit-learn seeks to make ML accessible to people with basic programming skills, the goal of NILearn is to do the same for brain imaging. The only systems-related poster was from Makkie and Liu of the University of Georgia. It focused on their brain initiative platform for neuro-imaging and how it fuses several different technologies, including Spark, AWS, and Tensorflow. Finally, the DLTK toolkit had its own poster at the conference. Altogether there were some really interesting toolkits and platforms that should help make medical image analysis with machine learning more accessible to everyone.

Other talks

Wiro Nessen had an interesting presentation on the confluence of biomedical imaging and genetic data. In the talk he described how both large genetic and imaging datasets could be combined to try to detect bio-markers in the images. The synthesis of the two areas could also help detect diseases early and figure out more targeted treatments.

Announcements

  • At the end of her talk, Ivana Igsum announced that the first Medical Imaging with Deep Learning (MIDL) event is taking place in Amsterdam in July.
  • I’m continuously adding new papers, conferences, and tools to my machine learning healthcare curated list. However, it’s a big job so make a PR and contribute today!
  • I’m starting a new Slack channel dedicated to machine learning in healthcare. So if you are interested feel free to join.
  • My summary of Machine Learning for Healthcare (ML4H) at NIPS will be out in the next few weeks. This will cover everything from hospital operations (like LOS forecasting and hand hygiene), to mining electronic medical records, to drug discovery, to analyzing genomic data. So stay tuned.

How to Successfully Incorporate Undergraduate Researchers Into a Complex Research Program at a Large Institution from Rebecca

Rebecca B. Weldon & Valerie F. Reyna

Human Neuroscience Institute, Department of Human Development, Cornell University, Ithaca, NY 14850.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4521737/

Working as a team is always better than working alone. Recently, I mentioned this thought to some excellent Ph.D. students; my main goal was to work with them and produce more work with a larger impact for our academic community. Unfortunately, as we all know, Ph.D. researchers are often too busy with their own tasks to put much effort into additional work. In addition, concerns about research funding may also restrict cooperation. In this article, the authors lay out the importance of cooperative research and how to successfully incorporate undergraduate researchers into a complex research program at a large institution. I believe this article will help any Ph.D. student or researcher who wants to get more high-impact work done effectively.

Initial screening of potential undergraduate research assistants: Making sure it is a good fit

The first point is to make sure a potential undergraduate research assistant is a good fit for your team. The very first step in recruiting is to find students who are genuinely interested in being part of scientific research. We should then send an initial screening survey to any interested students, which usually includes basic questions about the student as well as questions about career ambitions and extracurricular activities. We need to know why the student thinks he or she would be a good fit for one (or more) of our research teams. To get a better sense of whether the student is a good match for the lab, the next step is to have our graduate students talk with the undergraduate. Lastly, we recommend the student to the director of our lab for a final interview.