
Siamese neural networks

A Siamese neural network is a class of neural network architectures that contain two or more identical subnetworks. "Identical" here means the subnetworks have the same configuration with the same parameters and weights; parameter updates are mirrored across all of them.

Siamese NNs are popular for tasks that involve finding the similarity or a relationship between two comparable things. Some examples are paraphrase scoring, where the inputs are two sentences and the output is a score of how similar they are, or signature verification, where the task is to figure out whether two signatures are from the same person. Generally, in such tasks, two identical subnetworks are used to process the two inputs, and another module takes their outputs and produces the final result. The picture below is from Bromley et al. (1993) [1], who proposed a Siamese architecture for the signature verification task.
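To make the weight sharing concrete, here is a minimal PyTorch sketch of such an architecture. The layer sizes and the Euclidean-distance comparison module are illustrative assumptions, not taken from the paper; the point is that defining the subnetwork once and applying it to both inputs shares all parameters by construction, so their updates are mirrored automatically.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseNet(nn.Module):
    def __init__(self, in_dim=128, emb_dim=32):
        super().__init__()
        # The shared subnetwork ("twin"): both inputs pass through this
        # same module, so its parameters are shared by construction.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 64),
            nn.ReLU(),
            nn.Linear(64, emb_dim),
        )

    def forward(self, x1, x2):
        # Each input is mapped to a representation vector (the role of the
        # "Signature Feature Vector" in the figure).
        e1 = self.encoder(x1)
        e2 = self.encoder(x2)
        # A simple comparison module: Euclidean distance between the two
        # representations; a smaller distance means "more similar".
        return F.pairwise_distance(e1, e2)

# Usage: two batches of comparable inputs, one shared set of weights.
net = SiameseNet()
x1, x2 = torch.randn(4, 128), torch.randn(4, 128)
print(net(x1, x2))  # tensor of 4 distances
```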

Siamese architectures work well for these tasks because:

  1. Sharing weights across subnetworks means fewer parameters to train, which in turn means less data is required and less tendency to overfit.
  2. Each subnetwork essentially produces a representation of its input. (“Signature Feature Vector” in the picture.) If your inputs are of the same kind, like matching two sentences or matching two pictures, it makes sense to use the same model to process them. This way you get representation vectors with the same semantics, making them easier to compare.

In Question Answering, some recent studies have used Siamese architectures to score relevance between a question and an answer candidate [2]. So one input is a question sentence, the other input is an answer, and the output is how relevant the answer is to the question. Questions and answers don’t look exactly the same, but if the goal is to extract the similarity or a connection between them, a Siamese architecture can work well here, too.
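As a hedged sketch of the scoring step (the encoder here is a stand-in for whatever shared subnetwork is used, and cosine similarity is one common choice of comparison, not necessarily what [2] uses):

```python
import torch.nn.functional as F

def relevance(encoder, question_ids, answer_ids):
    # encoder: any module mapping token-id tensors to fixed-size vectors.
    # The SAME encoder embeds both inputs (Siamese weight sharing).
    q = encoder(question_ids)
    a = encoder(answer_ids)
    # Cosine similarity in [-1, 1]; higher = more relevant.
    return F.cosine_similarity(q, a, dim=-1)
```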

Footnotes

[1] https://papers.nips.cc/paper/769…

[2] http://arxiv.org/pdf/1512.05193v…

 

In my own experience, Siamese networks may offer three distinct advantages over traditional classification!

These advantages hold to some extent for any kind of data, not just for images (where Siamese networks are currently most popular).

  1. CAN BE MORE ROBUST TO EXTREME CLASS IMBALANCE.
  2. CAN BE GOOD TO ENSEMBLE WITH A CLASSIFIER.
  3. CAN YIELD BETTER EMBEDDINGS.

Let’s say we want to learn to predict what animal is in a given image.

  • Case 1 : if there are just 2 animal classes to predict from (cats vs. dogs) and millions of images of each class, one could train a deep CNN classifier. Easy!
  • Case 2 : but what if we have tens of thousands of animal classes, and for most of them only a few dozen example images? Trying to learn each animal as a class with a deep CNN seems much less feasible now. Such a classifier can perform poorly on rarely seen training classes, e.g. if there were only 4 training images of ‘eels’.

A Siamese network is a model architecture used alongside a distance-based loss (one common choice is sketched after the list below).

  • It learns what makes a pair of inputs the same (e.g. dog–dog, eel–eel).
  • In comparison, classification learns what makes an input a dog / cat / eel etc.
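Here is a minimal sketch of one widely used distance-based objective, the contrastive loss of Hadsell et al. (2006) — an assumption on my part, since the answer does not name a specific loss. Pairs of the same class are pulled together; pairs of different classes are pushed apart up to a margin.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(e1, e2, same_class, margin=1.0):
    # e1, e2: embedding batches produced by the shared subnetwork.
    # same_class: 1.0 where the pair shares a label, 0.0 otherwise.
    d = F.pairwise_distance(e1, e2)
    pos = same_class * d.pow(2)                         # pull same-class pairs together
    neg = (1 - same_class) * F.relu(margin - d).pow(2)  # push different-class pairs apart
    return (pos + neg).mean()
```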

Advantages of such learning can be:

  1. MORE ROBUST TO CLASS IMBALANCE. If the model has learned well what makes any two animals the same, a single training example of a class like ‘eel’ may be sufficient to recognize an eel in the future. This is amazing! See One-Shot Learning, and the nearest-neighbour sketch after this list.
  2. NICE TO ENSEMBLE WITH A CLASSIFIER. Since its learning mechanism is somewhat different from classification, simply averaging it with a classifier can do much better than averaging two correlated supervised models (e.g. GBM & RF classifiers). I have experienced this personally.
  3. BETTER EMBEDDINGS. Siamese networks focus on learning embeddings (in the deeper layers) that place the same classes / concepts close together, and hence can learn semantic similarity.
    • This is different from a classification loss (e.g. logistic loss), which is rewarded only for making the classes linearly separable.
    • This makes the embeddings more useful in a generic sense, e.g. one can compute distances on them. For example, one could use the last-layer embeddings to build a ‘search-by-image’ app.
    • The images below show the MNIST embeddings I got by training:
      • a classifier with 3 hidden layers (sizes 200–100–2) and a softmax loss;
      • a Siamese architecture with the same network and a distance loss.
      • I plot as embeddings the output of the 3rd hidden layer on test images. Clearly, the Siamese embeddings are not only linearly separable but also fit for distance calculation.
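To illustrate point 1, here is a hedged sketch of one-shot prediction with a trained Siamese encoder: a new image is assigned the label of the nearest stored example in embedding space. The encoder and the tensor shapes are assumptions for illustration.

```python
import torch

def predict_nearest(encoder, x, support_x, support_y):
    # support_x: a few labelled examples (as few as one per class);
    # support_y: their class labels as an integer tensor.
    with torch.no_grad():
        q = encoder(x)                     # (batch, emb_dim) query embeddings
        s = encoder(support_x)             # (n_support, emb_dim) stored examples
        d = torch.cdist(q, s)              # (batch, n_support) pairwise distances
        return support_y[d.argmin(dim=1)]  # label of the closest support example
```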

Downsides can be:

  • Training involves pairwise learning, which means quadratically many pairs to learn from (in order to see all the available information), and is therefore slower than (pointwise) classification.
  • Prediction can add a few hyperparameters and can be slightly slower. The model does not readily output class probabilities, only distances from each class.
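As a sketch of how the quadratic blow-up is handled in practice (balanced random sampling is an assumption here — one common workaround, not the only one): instead of enumerating all O(n²) pairs, sample a fixed number of positive and negative pairs per epoch.

```python
import random
from collections import defaultdict

def sample_pairs(labels, n_pairs):
    # Group dataset indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    classes = list(by_class)
    pairs = []
    for _ in range(n_pairs // 2):
        # Positive pair: two indices drawn from the same class.
        c = random.choice(classes)
        pairs.append((random.choice(by_class[c]), random.choice(by_class[c]), 1))
        # Negative pair: indices drawn from two different classes.
        c1, c2 = random.sample(classes, 2)
        pairs.append((random.choice(by_class[c1]), random.choice(by_class[c2]), 0))
    return pairs  # (index_a, index_b, same_class) triples
```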

Reference: https://www.quora.com/What-are-Siamese-neural-networks-what-applications-are-they-good-for-and-why

Paper: http://www.cs.utoronto.ca/~gkoch/files/msc-thesis.pdf
