Preface
Finding a good representation for words is harder than finding one for images: words follow a very broad distribution and can be combined in an enormous number of ways, whereas images are much more concentrated.
A central problem in NLP is how to measure semantic similarity; a minimal sketch of the common cosine-similarity measure follows below.
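Once words are mapped to vectors, semantic similarity is commonly measured as the cosine of the angle between their vectors. The sketch below uses made-up words and 4-dimensional toy embeddings purely for illustration; real embeddings are learned from data.

```python
import numpy as np

# Hypothetical embeddings; the values are invented for illustration only.
vec = {
    "king":  np.array([0.8, 0.3, 0.1, 0.9]),
    "queen": np.array([0.7, 0.4, 0.2, 0.8]),
    "apple": np.array([0.1, 0.9, 0.8, 0.0]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means orthogonal.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vec["king"], vec["queen"]))  # high: semantically close
print(cosine(vec["king"], vec["apple"]))  # lower: semantically distant
```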
Representation of words
The Wikipedia article on Word embedding contains the following passage:
In 2000 Bengio et al. provided in a series of papers the "Neural probabilistic language models" to reduce the high dimensionality of words representations in contexts by "learning a distributed representation for words". (Bengio et al., 2003)
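As a concrete illustration of "learning a distributed representation for words", below is a minimal sketch in the spirit of Bengio et al. (2003)'s neural probabilistic language model, assuming PyTorch; the vocabulary size, context length, and layer sizes are toy values, not the paper's settings.

```python
import torch
import torch.nn as nn

class NPLM(nn.Module):
    """Sketch of a neural probabilistic language model: the n-1 context
    words are mapped through a shared embedding table (the distributed
    representation), concatenated, and fed through a hidden layer to
    predict the next word. All sizes are toy values for illustration."""
    def __init__(self, vocab_size=10000, context=3, dim=50, hidden=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)  # word feature matrix C
        self.fc1 = nn.Linear(context * dim, hidden)
        self.fc2 = nn.Linear(hidden, vocab_size)

    def forward(self, context_ids):             # (batch, context) word ids
        e = self.embed(context_ids)             # (batch, context, dim)
        h = torch.tanh(self.fc1(e.flatten(1)))  # (batch, hidden)
        return self.fc2(h)                      # logits over the vocabulary

model = NPLM()
logits = model(torch.randint(0, 10000, (4, 3)))  # 4 contexts of 3 words each
print(logits.shape)                              # torch.Size([4, 10000])
```

After training with a cross-entropy loss on next-word prediction, the rows of the embedding table serve as the low-dimensional word representations the passage describes.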