Monday, November 12, 2007

Clustering paper presentation, 2007-11-12, Nathan

Title: Evolutionary Spectral Clustering by Incorporating Temporal Smoothness
Authors: Yun Chi, Xiaodan Song, Dengyong Zhou, Koji Hino and Belle Tseng

Input: Similarity matrices between nodes at different timesteps.
Output: Data clusters at each timestep

Summary: In many applications, the characteristics of the objects to be clustered change over time. On one hand, the current clusters should depend mainly on the current data features; on the other hand, they should not deviate too much from recent history. This paper extends spectral clustering so that the clusters produced in successive timesteps exhibit a certain level of temporal smoothness. The authors modify the cost function of spectral clustering to consider both a snapshot cost and a temporal cost, and propose two frameworks for defining the temporal cost. Under the preserving-cluster-quality (PCQ) framework, a convex combination of the similarity matrices at different timesteps is used to define the data similarities. Under the preserving-cluster-membership (PCM) framework, the difference between the clustering results at different timesteps is added to the cost function. Under both frameworks, the optimal solutions remain the top-k eigenvectors of some symmetric matrices.
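
As a rough illustration, here is a minimal numpy sketch of the PCQ idea: form the convex combination of the current and previous similarity matrices, then run standard normalized spectral clustering on the result. The function name and the parameters alpha and k are our own choices; this is a sketch of the idea, not the paper's exact cost formulation.

    import numpy as np

    def pcq_spectral_embedding(W_t, W_prev, alpha=0.8, k=3):
        """PCQ sketch (assumed parameterization): smooth the current
        similarity matrix with the previous one, then take the top-k
        eigenvectors of the symmetrically normalized result."""
        W = alpha * W_t + (1.0 - alpha) * W_prev  # convex combination of similarities
        d = W.sum(axis=1)                         # node degrees
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # D^{-1/2}
        S = D_inv_sqrt @ W @ D_inv_sqrt           # normalized similarity matrix
        _, vecs = np.linalg.eigh(S)               # eigenvalues in ascending order
        return vecs[:, -k:]                       # top-k eigenvectors of a symmetric matrix

Clustering at timestep t then amounts to running k-means on the rows of the returned embedding.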

Monday, October 15, 2007

Seminar 2007-10-8 Evan

Title: Combining Content and Link for Classification using Matrix Factorization
Authors: Shenghuo Zhu, Kai Yu, Yun Chi, Yihong Gong
Venue: SIGIR 2007

Input: A set of documents, and a set of links connecting these documents
Output: The class label of each document

Method:
1. Form the set of documents into a term-document matrix C, and form the set of links into an adjacency matrix A.
2. Factorize C and A at the same time, sharing the factorized latent basis Z: C ≈ ZV, A ≈ ZUZ^T (see the sketch after this list).
3. Since the objects connected by the links are from the same set, factorizing A into ZUZ^T guarantees that the latent feature values of the source and destination of each link are consistent.
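
A minimal gradient-descent sketch of this joint factorization, assuming C is a document-by-term matrix and A a document-by-document adjacency matrix; the step size, lam, and iteration count are illustrative choices, not necessarily the paper's optimization procedure:

    import numpy as np

    def joint_factorize(C, A, k=10, lam=1.0, lr=1e-3, iters=500, seed=0):
        """Sketch: minimize ||C - Z V||^2 + lam * ||A - Z U Z^T||^2
        by gradient descent, sharing the latent basis Z.
        C: n_docs x n_terms (content); A: n_docs x n_docs (links).
        Hyperparameters here are assumptions, not the paper's."""
        rng = np.random.default_rng(seed)
        n, m = C.shape
        Z = rng.normal(scale=0.1, size=(n, k))
        V = rng.normal(scale=0.1, size=(k, m))
        U = rng.normal(scale=0.1, size=(k, k))
        for _ in range(iters):
            Rc = Z @ V - C            # content residual
            Ra = Z @ U @ Z.T - A      # link residual
            # gradients (the constant factor 2 is folded into the learning rate)
            gZ = Rc @ V.T + lam * (Ra @ Z @ U.T + Ra.T @ Z @ U)
            gV = Z.T @ Rc
            gU = lam * (Z.T @ Ra @ Z)
            Z -= lr * gZ
            V -= lr * gV
            U -= lr * gU
        return Z, V, U

The rows of Z then serve as shared latent features of the documents for a downstream classifier.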

Monday, October 8, 2007

"Learning to Rank" paper presented by Bin

Title: Learning to Rank: From Pairwise Approach to Listwise Approach
Authors: Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, Hang Li
Venue: ICML 2007

Problem Setting:
Input: queries and Web pages' relevance scores.
Output: rank of Web pages.

High Level Idea:
The key point of this paper is that it constructs a loss function based on listwise relations rather than pairwise relations. A probability distribution over rankings is defined from the ranking scores, and the goal is to minimize its distance to the distribution induced by the ground truth; minimizing this distance yields the linear ranking function.
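
For concreteness, here is a minimal numpy sketch of such a listwise loss using the top-one probability model: each score list induces a softmax distribution over which document is ranked first, and the loss is the cross entropy between the ground-truth and model distributions. The function names are ours.

    import numpy as np

    def listnet_top_one_loss(model_scores, gt_scores):
        """Cross entropy between the top-one probability distributions
        induced by the model scores and the ground-truth scores."""
        def top_one(s):
            e = np.exp(s - s.max())  # numerically stabilized softmax
            return e / e.sum()
        p_model = top_one(model_scores)  # P(doc ranked first | model)
        p_truth = top_one(gt_scores)     # P(doc ranked first | ground truth)
        return -np.sum(p_truth * np.log(p_model))

Minimizing this loss over queries, with model_scores = X @ w, recovers the linear ranking function.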

Problems:
1. Does not consider position priority in the rank (errors near the top of the list are not weighted more heavily);
2. Only a linear function is used as the ranking function.

Seminar 2007-10-8 Vincent

Paper: [ICML07] Multiclass Core Vector Machine (CVM)

Input: labeled data {(x_i, y_i)}_{i=1,...,n}
Output: core set
Aim: extend the two-class CVM to T classes
Method: map each label y_i to a T×1 vector and plug it into the ρ-SVM formulation; then solve with CVM as before (a sketch of one such mapping follows below).
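
One plausible label mapping is the one-of-T indicator vector sketched below; whether this particular mapping is the one the paper uses, or the best one, is exactly question (1) below.

    import numpy as np

    def label_to_vector(y, T):
        """Map class label y (0-based) to a T x 1 indicator vector.
        This one-hot choice is an assumption for illustration."""
        v = np.zeros((T, 1))
        v[y, 0] = 1.0
        return v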

Questions: (1) is the mapping function unique, or best?
(2) can this be seen as a general rule to extend two-class to multi-class classification, no matter what kind of classifier is used?
(3) CVM is different from active learning, which works on unlabeled data. The placement problem can be put under CVM.

Prof. Yang

What's new on ranking:

- transfer learning: query distribution, doc distribution

- semi-supervised ranking: given query q_i and its ranking vector y_i, some labeled and some unlabeled

- ranking as embedding: (query, doc set) --> a lower-dimensional space, a line. Given another (q, d) --> another embedding; are these two spaces the same?

- Multi-view: ranking by text, video, etc.; how to apply multi-view learning to balance these different rankings

- collaborative ranking: ranking for science queries {q_i}, ranking for entertainment queries {q_j}, and we know user preferences on these two topics. Any possibility to derive the latent space?

- the relation between ranking and classification: discriminative model for ranking, discriminative and generative model for classification

Nathan:
- partial order = directed graph, random walk

Caobin:
- given a query, see how documents are ranked. How about given a document, seeing how queries are ranked? An iterative method?

Evan:
- labeled clickthrough data, unlabeled queries.

Sinno:
- embedding,

Seminar 2007-10-8 Derek

Topic: ranking
Input: queries, corpus
Output: ranking
Evaluation: Accuracy, ROC, MAP (mean average precision), NDCG (normalized discounted cumulative gain)
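
For reference, a small sketch of the NDCG metric listed above, using the standard 2^rel - 1 gain and log2(rank + 1) discount:

    import numpy as np

    def dcg_at_k(rels, k):
        """DCG@k: rels are graded relevance labels in ranked order."""
        rels = np.asarray(rels, dtype=float)[:k]
        discounts = np.log2(np.arange(2, rels.size + 2))  # log2(rank + 1)
        return np.sum((2.0 ** rels - 1.0) / discounts)

    def ndcg_at_k(rels, k):
        """NDCG@k: DCG normalized by the DCG of the ideal ordering."""
        ideal = dcg_at_k(sorted(rels, reverse=True), k)
        return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0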

(1) Structural SVM - MAP is not linearly decomposable; the same holds for NDCG.
Question: in training, (x, y) is mapped to a joint feature space that depends on y; but in testing, how do we map x alone to the feature space?
(2) AdaRank - AdaBoost on instances s = (q, d, y)
(3) Feature Selection for Ranking -

Seminar 2007-10-8 Nathan

Title: Active Exploration for Learning Rankings from Clickthrough Data
Authors: Filip Radlinski and Thorsten Joachims
Venue: SIGKDD 2007