
AmbientGAN: Generative models from lossy measurements


How can we train our generative models with partial, noisy observations?

Why do we care?

In many settings, it is expensive or even impossible to obtain fully-observed samples, but economical to obtain partial, noisy samples.

Proposed in this paper:

  • AmbientGAN: train the discriminator not on the raw data domain but on the measurement domain
  • A way to train the generative model from noisy, corrupted, or missing data, without any clean images
  • A proof that it is theoretically possible to recover the true data distribution even though the measurement process is not invertible
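
The data flow in the first bullet can be sketched in a few lines. This is only a minimal illustration, not the paper's implementation: `toy_generator` is a hypothetical untrained stand-in for G(z), and `block_pixels` is one example of a lossy measurement (each pixel is zeroed with probability `p`). The point is that the discriminator only ever sees measurements, both on the real side and on the generated side.

```python
import random

def block_pixels(image, p, rng):
    """Lossy measurement: zero each pixel independently with probability p."""
    return [[0.0 if rng.random() < p else px for px in row] for row in image]

def toy_generator(rng, size=4):
    """Hypothetical stand-in for G(z): returns a random 'clean' image."""
    return [[rng.random() for _ in range(size)] for _ in range(size)]

def discriminator_batch(measured_real, rng, p):
    """Build the two batches the discriminator compares.

    Real side: measurements taken from the dataset (no clean images needed).
    Fake side: generator outputs pushed through the same measurement process.
    """
    fake = [block_pixels(toy_generator(rng), p, rng) for _ in measured_real]
    return measured_real, fake

rng = random.Random(0)
p = 0.5
# Simulated dataset: we only keep the measurements, never the clean images.
dataset = [block_pixels(toy_generator(rng), p, rng) for _ in range(8)]
real, fake = discriminator_batch(dataset, rng, p)
print(len(real), len(fake))
```

In a real setup the generator and discriminator would be trained networks and the measurement would need to be differentiable so gradients can flow back to the generator; none of that is modelled here.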






Generative Adversarial Networks


Limitation of GAN

  • Requires good (fully observed) training samples

Related work:

  • Compressed sensing attempts to address this problem by exploiting models of the data structure, such as sparsity
  • Bora et al. ICML 2017 “Compressed Sensing using Generative Models”
  • Compressed sensing is a very promising study and can give amazing results (need to go deeper)

Chicken and Egg

  • They proposed that the problem can be solved with a small number of measurements by using generative models
  • But what if it is not even possible to gather good data in the first place?
  • How can we collect enough data to train a generative model to start with?


















  • It is possible to train the generator without fully observed data
  • In theory, training the generator can recover the true distribution when the measurement function is differentiable and the measurement distribution uniquely determines the data distribution (a single measurement need not be invertible)
  • Empirically, a good data distribution can be recovered even when the measurement process is not precisely known
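
A toy scalar example may help with the theoretical claim. It is not the paper's proof, just an illustration under assumed values: a binary "pixel" is 1 with probability `q_true`, and the measurement drops it to 0 with probability `p`. A single measurement of 0 cannot tell us whether the pixel was 0 or was dropped, yet the distribution is still recoverable because P(y=1) = q(1 - p).

```python
import random

def estimate_clean_rate(measurements, p):
    """Invert the measurement distribution: P(y=1) = q * (1 - p), so q = P(y=1) / (1 - p)."""
    observed = sum(measurements) / len(measurements)
    return observed / (1.0 - p)

rng = random.Random(0)
q_true, p = 0.3, 0.5  # hypothetical values chosen for the illustration
clean = [1 if rng.random() < q_true else 0 for _ in range(100_000)]
# Each pixel survives with probability 1 - p, otherwise it is observed as 0.
measured = [x if rng.random() >= p else 0 for x in clean]
q_hat = estimate_clean_rate(measured, p)
print(round(q_hat, 3))  # should land close to q_true
```

The estimate uses only `measured` and the known drop rate, never `clean`, which mirrors the setting of training from measurements alone.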

Possible Applications?

  • OCR
  • Gayoung’s webtoon data
  • Adding a reconstruction loss and a cycle-consistency loss
  • Learnable f(.) by FC
  • etc.

Questions to consider:

  • CycleGAN vs. AmbientGAN?




Benchmarking Neural Network Robustness To Common Corruptions and Perturbations



This paper observes that a major flaw in common image-classification networks is their lack of robustness to common corruptions and perturbations. The authors develop and publish two variants of the ImageNet validation dataset, one for corruptions and one for perturbations. They then propose metrics for evaluating several common networks on their new datasets and find that robustness has not improved much from AlexNet to ResNet. They do, however, find several ways to improve performance including using larger networks, using ResNeXt, and using adversarial logit pairing.
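
For readers who want a feel for the corruption metric, here is a small sketch of a mean corruption error computed the way I understand the paper to define it: for each corruption type, sum the error rates over severity levels and normalize by a baseline network's summed errors (AlexNet in the paper, if I read it correctly), then average over corruption types. The function names and all numbers below are made up for illustration.

```python
def corruption_error(model_errors, baseline_errors):
    """CE for one corruption type: errors summed over severities, normalized by the baseline."""
    return sum(model_errors) / sum(baseline_errors)

def mean_corruption_error(model, baseline):
    """Average CE over corruption types.

    Both arguments map a corruption name to a list of per-severity error rates.
    """
    ces = [corruption_error(model[c], baseline[c]) for c in model]
    return sum(ces) / len(ces)

# Hypothetical error rates for two corruption types at three severities.
baseline = {"gaussian_noise": [0.6, 0.8, 0.9], "motion_blur": [0.5, 0.7, 0.9]}
model = {"gaussian_noise": [0.3, 0.5, 0.7], "motion_blur": [0.3, 0.4, 0.6]}
print(mean_corruption_error(model, baseline))
```

A value below 1 means the model is more robust than the baseline on average; a value of exactly 1 means no improvement.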

Quality: The datasets and metrics are very thoroughly treated, and are the key contribution of the paper.

Some questions: What happens if you combine ResNeXt with ALP or histogram equalization? Or any other combinations? Is ALP equally beneficial across all networks? Are there other useful adversarial defenses?

Clarity: The novel validation sets and reasoning for them are well-explained, as are the evaluation metrics. Some explanation of adversarial logit pairing would be welcome, and some intuition (or speculation) as to why it is so effective at improving robustness.

Originality: Although adversarial robustness is a relatively popular subject, I am not aware of any other work presenting datasets of corrupted/perturbed images.

Significance: The paper highlights a significant weakness in many image-classification networks, provides a benchmark, and identifies ways to improve robustness. It would be improved by more thorough testing, but that is less important than the dataset, metrics and basic benchmarking provided.

Question: Why do the authors not recommend training on the new datasets?

Siamese neural networks

A Siamese neural network is a class of neural network architectures that contain two or more identical subnetworks. “Identical” here means they have the same configuration with the same parameters and weights. Parameter updating is mirrored across both subnetworks.

Siamese NNs are popular for tasks that involve finding similarity or a relationship between two comparable things. Some examples are paraphrase scoring, where the inputs are two sentences and the output is a score of how similar they are; or signature verification, where the task is to figure out whether two signatures are from the same person. Generally, in such tasks, two identical subnetworks are used to process the two inputs, and another module takes their outputs and produces the final output. The picture below is from Bromley et al. (1993)[1]. They proposed a Siamese architecture for the signature verification task.

Siamese architectures are good in these tasks because

  1. Sharing weights across subnetworks means fewer parameters to train for, which in turn means less data required and less tendency to overfit.
  2. Each subnetwork essentially produces a representation of its input. (“Signature Feature Vector” in the picture.) If your inputs are of the same kind, like matching two sentences or matching two pictures, it makes sense to use a similar model to process similar inputs. This way you have representation vectors with the same semantics, making them easier to compare.
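
The weight sharing described above can be shown with a very small sketch. The encoder here is a single linear layer with tanh, which is only a stand-in for a real subnetwork, and the weight values are hypothetical. What matters is that both inputs pass through the same `weights` object, and a separate step compares the two representations.

```python
import math

def embed(x, weights):
    """One subnetwork: a linear layer followed by tanh (stand-in for a real encoder)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

def siamese_distance(x1, x2, weights):
    """Both inputs use the same weights; the comparison module is a Euclidean distance."""
    e1, e2 = embed(x1, weights), embed(x2, weights)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))

shared_weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]  # hypothetical values
d_same = siamese_distance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], shared_weights)
d_diff = siamese_distance([1.0, 2.0, 3.0], [3.0, -1.0, 0.0], shared_weights)
print(d_same, d_diff)
```

Because there is only one set of weights, any update made for one branch is automatically applied to the other, which is what "mirrored" parameter updating amounts to in practice.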

In Question Answering, some recent studies have used Siamese architectures to score relevance between a question and an answer candidate[2]. So one input is a question sentence, the other input is an answer, and the output is how relevant the answer is to the question. Questions and answers don’t look exactly the same, but if the goal is to extract the similarity or a connection between them, a Siamese architecture can work well, too.





In my own experience, Siamese Networks may offer 3 distinct advantages over Traditional CLASSIFICATION!

These advantages are somewhat true for any kind of data, and not just for Images (where these are currently most popularly used).


Let’s say we want to learn to predict what animal is in a given image.

  • Case 1 : if it is just 2 animal classes to predict from (Cat vs Dogs) and given millions of images of each class, one could train a deep CNN Classifier. Easy!
  • Case 2 : but what if we have tens of thousands of animal classes and for most of these, we only have a few dozen image examples? Trying to learn each animal as a Class using deep CNN seems less feasible now. Such a classifier can perform poorly for a rarely seen training class, e.g. let’s say there were only 4 training images of ‘eels’

Siamese Network is a Model Architecture used alongside a Distance-based Loss.

  • It learns what makes a pair of inputs the same (e.g. dog-dog, eel-eel).
  • In Comparison, Classification learns what makes an input a dog/ cat/ eel etc.
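
The answer does not say which distance-based loss is meant. One common choice is a contrastive loss, sketched below as an assumption rather than as the author's method: matching pairs are penalized for being far apart, and non-matching pairs are penalized only if they are closer than a margin. The `margin` default is arbitrary.

```python
def contrastive_loss(distance, same, margin=1.0):
    """Pull matching pairs together; push non-matching pairs at least `margin` apart."""
    if same:
        return distance ** 2
    return max(0.0, margin - distance) ** 2

# A dog-dog pair that is already close costs little,
# while a dog-eel pair that is too close is penalized.
print(contrastive_loss(0.1, same=True))
print(contrastive_loss(0.1, same=False))
print(contrastive_loss(2.0, same=False))  # far enough apart: no penalty
```

Note that the loss never refers to a class label directly, only to whether the two inputs belong together, which is the contrast with classification drawn above.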

Advantages of such learning can be:

  1. MORE ROBUST TO CLASS IMBALANCE. If the model has learnt well what makes any 2 animals the same, one example of a class like ‘eel’ in training may be sufficient to predict / recognize an eel in future. This is amazing! See One-Shot learning
  2. NICE TO ENSEMBLE WITH BEST CLASSIFIER. Given that its learning mechanism is somewhat different from Classification, simple averaging of it with a Classifier can do much better than averaging 2 correlated Supervised models (e.g. GBM & RF classifier). I have experienced it personally.
  3. BETTER EMBEDDINGS. Siamese networks focus on learning embeddings (in the deeper layers) that place the same classes / concepts close together. Hence, they can learn semantic similarity.
    • This is different from a Classification Loss (e.g. logistic loss), which only explicitly rewards making the classes linearly separable.
    • This makes its embeddings more useful in a generic sense e.g. one can calculate distance on it. For example, one could use its last-layer embeddings to build a ‘search-by-image’ app
    • The images below show the MNIST Embeddings that I got by training:
      • Classifier with 3 Hidden layers (size 200–100–2 ) & Softmax loss
      • Siamese Architecture with same network & Distance Loss.
      • I plot as embeddings the output of their 3rd Hidden layer on Test Images. Clearly Siamese Embeddings are not only linearly separable but also fit for distance-calculation.

Downside can be:

  • Training involves Pairwise Learning => quadratic pairs to learn from (in order to see all information available) => slower than Classification (pointwise learning)
  • Prediction can add a few HyperParameters and can be slightly slower. It does not readily output Class probabilities, but distances from each Class.
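
Both downsides can be made concrete with a short sketch. It assumes we already have one reference embedding per class (the values below are hypothetical) and predicts by picking the nearest one, which is also how a single ‘eel’ example can be enough at prediction time. The second function counts the unordered pairs among n training examples, which is where the quadratic cost comes from.

```python
import math

def nearest_class(query, references):
    """Return (label, distances): the closest reference embedding decides the class."""
    distances = {label: math.dist(query, emb) for label, emb in references.items()}
    return min(distances, key=distances.get), distances

def num_pairs(n):
    """Number of unordered training pairs among n examples: n * (n - 1) / 2."""
    return n * (n - 1) // 2

# Hypothetical 2-D embeddings, one reference example per class.
references = {"dog": [0.9, 0.1], "cat": [0.1, 0.9], "eel": [-0.8, -0.7]}
label, distances = nearest_class([-0.7, -0.6], references)
print(label, distances)
print(num_pairs(1000))
```

Turning the distances into class probabilities would need an extra step (and extra hyperparameters), which is the point made in the last bullet.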


