Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches (Conference)
Maurizio Ferrari Dacrema; Paolo Cremonesi; Dietmar Jannach
Proceedings of the 2019 ACM Conference on Recommender Systems (RecSys 2019), Copenhagen, 2019, (Pre SFI).
@conference{Dacrema2019,
title = {Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches},
author = {Maurizio Ferrari Dacrema and Paolo Cremonesi and Dietmar Jannach},
url = {https://arxiv.org/pdf/1907.06902.pdf},
year = {2019},
date = {2019-08-16},
booktitle = {Proceedings of the 2019 ACM Conference on Recommender Systems (RecSys 2019)},
address = {Copenhagen},
abstract = {Deep learning techniques have become the method of choice for
researchers working on algorithmic aspects of recommender systems. With the strongly increased interest in machine learning in
general, it has, as a result, become difficult to keep track of what
represents the state-of-the-art at the moment, e.g., for top-n recommendation tasks. At the same time, several recent publications
point out problems in today’s research practice in applied machine
learning, e.g., in terms of the reproducibility of the results or the
choice of the baselines when proposing new models.
In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically,
we considered 18 algorithms that were presented at top-level research conferences in the last years. Only 7 of them could be reproduced with reasonable effort. For these methods, it however
turned out that 6 of them can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or
graph-based techniques. The remaining one clearly outperformed
the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a
number of potential problems in today’s machine learning scholarship and calls for improved scientific practices in this area.},
note = {Pre SFI},
keywords = {Collaborative filtering, General and reference, Information Systems, Recommender systems, WP2: User Modeling Personalization and Engagement},
pubstate = {published},
tppubtype = {conference}
}
Evaluation of Session-based Recommendation Algorithms (Journal Article)
Malte Ludewig; Dietmar Jannach
In: User Modeling and User-Adapted Interaction, vol. 28, no. 4-5, pp. 331-390, 2018, (Pre SFI).
@article{Ludewig2018,
title = {Evaluation of Session-based Recommendation Algorithms},
author = {Malte Ludewig and Dietmar Jannach},
url = {https://arxiv.org/pdf/1803.09587.pdf},
doi = {10.1007/s11257-018-9209-6},
year = {2018},
date = {2018-03-26},
journal = {User Modeling and User-Adapted Interaction},
volume = {28},
number = {4--5},
pages = {331--390},
abstract = {Recommender systems help users find relevant items of interest, for example on e-commerce or media streaming sites. Most academic research is concerned with approaches that personalize the recommendations according to long-term user profiles. In many real-world applications, however, such long-term profiles often do not exist and recommendations therefore have to be made solely based on the observed behavior of a user during an ongoing session. Given the high practical relevance of the problem, an increased interest in this problem can be observed in recent years, leading to a number of proposals for session-based recommendation algorithms that typically aim to predict the user's immediate next actions. In this work, we present the results of an in-depth performance comparison of a number of such algorithms, using a variety of datasets and evaluation measures. Our comparison includes the most recent approaches based on recurrent neural networks like GRU4REC, factorized Markov model approaches such as FISM or FOSSIL, as well as simpler methods based, e.g., on nearest neighbor schemes. Our experiments reveal that algorithms of this latter class, despite their sometimes almost trivial nature, often perform equally well or significantly better than today's more complex approaches based on deep neural networks. Our results therefore suggest that there is substantial room for improvement regarding the development of more sophisticated session-based recommendation algorithms.},
note = {Pre SFI},
keywords = {Evaluation, General and reference, Information Systems, Recommender systems, WP2: User Modeling Personalization and Engagement},
pubstate = {published},
tppubtype = {article}
}