Cross-language Information Retrieval by Reduced k-means
Keywords:
cross-language information retrieval, dimensionality reduction, latent semantic indexing, canonical correlation analysis, Reduced k-meansAbstract
Cross-language information retrieval aims at retrieving relevant documents in one language for a query set in another language. Here we propose a new approach to the problem of cross-language information retrieval based on factorization of a term-document matrix by an iterative method of Reduced k-means clustering. Method of Reduced k-means intended at simultaneous reduction of objects (documents) and variables (index terms). Proposed method is compared to standard machine learning techniques of cross-language information retrieval by usage of latent semantic indexing and canonical correlation analysis. Motivation for usage of Reduced k-means method for a task of cross-language information retrieval comes from an observation that documents in a semantic space obtained by method of latent semantic indexing are clustered by their language and not by their topics in the first place. As Reduced k-means aims at preserving clustering structure of data, the idea is that the proposed method could address the mentioned problem.
Downloads
Downloads
Published
How to Cite
Issue
Section
License
Copyright (c) 2023 International Journal of Computer Information Systems and Industrial Management Applications
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.