Jaccard similarity

TMJ is the product of the Jaccard and Triangle similarities. According to the property proved in (pages 75–76), TMJ is also a kernel function. There are various types of recommendation algorithms, such as kNN, NMF and LMF.
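The product can be written out as a short formula. This is a sketch only: the symbols are assumptions introduced here, with $U_a$ the set of users who rated item $a$, and $\mathbf{r}_a$, $\mathbf{r}_b$ the rating vectors of items $a$ and $b$ restricted to their co-rating users.

```latex
\mathrm{Triangle}(a, b) = 1 - \frac{\lVert \mathbf{r}_a - \mathbf{r}_b \rVert}{\lVert \mathbf{r}_a \rVert + \lVert \mathbf{r}_b \rVert},
\qquad
\mathrm{Jaccard}(a, b) = \frac{\lvert U_a \cap U_b \rvert}{\lvert U_a \cup U_b \rvert},
\qquad
\mathrm{TMJ}(a, b) = \mathrm{Triangle}(a, b) \cdot \mathrm{Jaccard}(a, b)
```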

Jaccard similarity code

Various types of similarity measures have been adopted or designed to calculate the similarity between users or items. State-of-the-art ones include Cosine, Pearson Correlation Coefficient (PCC), Jaccard, Proximity Impact Popularity (PIP), New Heuristic Similarity Model (NHSM) and so on. Naturally, new similarity measures providing better prediction ability are always desired.

This paper proposes the Triangle multiplying Jaccard (TMJ) similarity. Only item-based CF is considered, since it performs better than the user-based one. As illustrated in Fig 1, the rating vectors of two items form a triangle in the space. The Triangle similarity is one minus the length of the third edge divided by the sum of the lengths of the two edges corresponding to the vectors. Since it only considers the co-rating users, it is not good enough when used alone. Fortunately, the Jaccard similarity complements it in that non-co-rating users are also considered. Therefore, TMJ can take advantage of both the Triangle and Jaccard similarities.

We compare TMJ with eight existing measures on four popular datasets under the leave-one-out scenario. These datasets are MovieLens 100K, MovieLens 1M, FilmTrust and EachMovie. The leave-one-out scenario is chosen because the result is not influenced by the division of the training/testing sets. Results show that the recommender system using TMJ outperforms all the counterparts in terms of the mean absolute error (MAE) and the root mean square error (RMSE). Specifically, the MAE values obtained on the four datasets are 0.707, 0.671, 0.614 and 0.179, respectively.

In subsequent sections, we first review the basic concept of memory-based recommender systems and eight popular similarity measures. Second, we present the Triangle and TMJ similarities with a running example. Subsequently, we analyze the experimental results. Finally, we make our concluding remarks and indicate further work.

From the viewpoint of multiple kernel learning, similarity measures such as Jaccard and Triangle meet the requirements of a kernel function. All code files and datasets are available from the GitHub database ( ).
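The verbal definitions of the Triangle, Jaccard and TMJ similarities can be sketched in Python as follows. This is a minimal illustration, not the authors' released code. It assumes each item is a dict mapping user IDs to ratings, and the function names, the toy data and the choice to return 0.0 when two items share no users are all assumptions made here.

```python
import math


def jaccard(ratings_a, ratings_b):
    """Jaccard similarity of the sets of users who rated each item."""
    users_a, users_b = set(ratings_a), set(ratings_b)
    union = users_a | users_b
    if not union:
        return 0.0  # assumption: two unrated items have similarity 0
    return len(users_a & users_b) / len(union)


def triangle(ratings_a, ratings_b):
    """Triangle similarity over co-rating users: 1 - |AB| / (|OA| + |OB|)."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0  # assumption: no co-rating users means similarity 0
    # Length of the third edge (difference of the two rating vectors).
    third = math.sqrt(sum((ratings_a[u] - ratings_b[u]) ** 2 for u in common))
    # Lengths of the two edges corresponding to the rating vectors.
    edge_a = math.sqrt(sum(ratings_a[u] ** 2 for u in common))
    edge_b = math.sqrt(sum(ratings_b[u] ** 2 for u in common))
    if edge_a + edge_b == 0:
        return 0.0
    return 1.0 - third / (edge_a + edge_b)


def tmj(ratings_a, ratings_b):
    """TMJ: the Triangle similarity multiplied by the Jaccard similarity."""
    return triangle(ratings_a, ratings_b) * jaccard(ratings_a, ratings_b)


# Hypothetical toy items: user ID -> rating.
item_x = {"u1": 5, "u2": 3, "u3": 4}
item_y = {"u2": 3, "u3": 4, "u4": 1}

print(jaccard(item_x, item_y))   # 2 shared users out of 4 in total
print(triangle(item_x, item_y))  # co-ratings are identical, so the third edge is 0
print(tmj(item_x, item_y))
```

In this toy case the two items agree exactly on their co-rating users, so Triangle alone reports a perfect match. The Jaccard factor lowers the score because only half of the users who rated either item rated both.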

Jaccard similarity how to

Citation: Sun S-B, Zhang Z-H, Dong X-L, Zhang H-R, Li T-J, Zhang L, et al. (2017) Integrating Triangle and Jaccard similarities for recommendation.

Published: August 17, 2017

Copyright: © 2017 Sun et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All code files and datasets are available from the GitHub database ( ).

Funding: This work was supported by the National Natural Science Foundation of China (Grant 6137904114; decision to publish and preparation of the manuscript), the Natural Science Foundation of the Department of Education of Sichuan Province (Grant 16ZA0060; study design), the Key Laboratory of Oceanographic Big Data Mining & Application of Zhejiang Province (Grant No. OBDMA201601; data collection and analysis) and the Innovation and Entrepreneurship Foundation of Southwest Petroleum University (Grant SWPUSC16-003; data collection and analysis).

Competing interests: The authors have declared that no competing interests exist.

The distance measure is essential in machine learning tasks such as clustering, classification, image processing, and collaborative filtering. Collaborative filtering (CF) through k-nearest neighbors (kNN) is a popular memory-based recommendation schema. The key issue of the CF scheme is how to calculate the similarity between users or items.
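To show where the similarity enters an item-based kNN recommender, here is a small sketch of prediction and leave-one-out evaluation with MAE and RMSE. It rests on several assumptions that do not come from the source: the toy `ratings` data, the function names, the similarity-weighted average used for prediction, and plain Jaccard as a stand-in similarity. Any function taking two user-to-rating dicts could be passed in its place.

```python
import math

# Hypothetical toy data: ratings[item][user] = rating.
ratings = {
    "i1": {"u1": 5, "u2": 4, "u3": 1},
    "i2": {"u1": 4, "u2": 5, "u3": 2},
    "i3": {"u1": 1, "u2": 2, "u3": 5},
}


def jaccard(ratings_a, ratings_b):
    """Jaccard similarity of the sets of users who rated each item."""
    users_a, users_b = set(ratings_a), set(ratings_b)
    union = users_a | users_b
    return len(users_a & users_b) / len(union) if union else 0.0


def predict(ratings, user, item, k, similarity):
    """Predict user's rating of item from the k most similar items the user rated."""
    # Hide the target rating so it cannot influence the similarity.
    target = {u: r for u, r in ratings[item].items() if u != user}
    neighbors = [
        (similarity(target, other_ratings), other_ratings[user])
        for other, other_ratings in ratings.items()
        if other != item and user in other_ratings
    ]
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbors[:k]
    weight = sum(sim for sim, _ in top)
    if weight <= 0:
        return None  # no usable neighbor, so no prediction is made
    # Assumed prediction rule: similarity-weighted average of neighbor ratings.
    return sum(sim * rating for sim, rating in top) / weight


def leave_one_out(ratings, k, similarity):
    """Hide each known rating in turn, predict it, and return (MAE, RMSE)."""
    errors = []
    for item, item_ratings in ratings.items():
        for user, actual in item_ratings.items():
            predicted = predict(ratings, user, item, k, similarity)
            if predicted is not None:
                errors.append(predicted - actual)
    if not errors:
        raise ValueError("no predictions could be made")
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


mae, rmse = leave_one_out(ratings, k=2, similarity=jaccard)
print(f"MAE={mae:.3f} RMSE={rmse:.3f}")
```

Because every rating is held out exactly once, the result does not depend on how the data would otherwise be split into training and testing sets.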