DataRec Library for Reproducibility in Recommender Systems

In this episode of Data Skeptic's Recommender Systems series, host Kyle Polich explores DataRec, a new Python library designed to bring reproducibility and standardization to recommender systems research. Guest Alberto Carlo Mario Mancino, a postdoc researcher from Politecnico di Bari, Italy, discusses the challenges of dataset management in recommendation research—from version control issues to preprocessing inconsistencies—and how DataRec provides automated downloads, checksum verification, and standardized filtering strategies for popular datasets like MovieLens, Last.fm, and Amazon reviews.

The conversation covers Alberto's research journey through knowledge graphs, graph-based recommenders, privacy considerations, and recommendation novelty. He explains why small modifications in datasets can significantly impact research outcomes, the importance of offline evaluation, and DataRec's vision as a lightweight library that integrates with existing frameworks rather than replacing them. Whether you're benchmarking new algorithms or exploring recommendation techniques, this episode offers practical insights into one of the most critical yet overlooked aspects of reproducible ML research.
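To make the checksum idea concrete, here is a minimal sketch of verifying a downloaded dataset file against a published SHA-256 digest. It uses only the Python standard library and is not DataRec's API; the function names, the file name, and the file contents are hypothetical, chosen for illustration.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_hex):
    """Return True when the file's digest matches the expected one."""
    return sha256_of_file(path) == expected_hex.lower()


# Hypothetical stand-in for a downloaded ratings file (user, item, rating).
ratings = b"1,10,5\n1,11,3\n2,10,4\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ratings.csv")
    with open(path, "wb") as handle:
        handle.write(ratings)

    # In practice the expected digest is published alongside the dataset.
    expected = hashlib.sha256(ratings).hexdigest()
    intact = verify_checksum(path, expected)

    # Changing a single rating changes the digest, so the check fails.
    with open(path, "wb") as handle:
        handle.write(ratings.replace(b"1,10,5", b"1,10,4"))
    tampered = verify_checksum(path, expected)

print("intact file passes:", intact)
print("modified file passes:", tampered)
```

Even a one-character change to the file is caught, which is the kind of silent dataset drift the episode warns about.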

Episodes (589)

Data Science at eHarmony

I'm joined this week by Jon Morra, director of data science at eHarmony, to discuss a variety of ways in which machine learning and data science are being applied to help connect people for successful long-term relationships. Interesting open source projects mentioned in the interview include Face-parts, a web service for detecting faces and extracting a robust set of fiducial markers (features) from the image, and Aloha, a Scala-based machine learning library. You can learn more about these and other interesting projects at the eHarmony GitHub page. In the wrap-up, Jon mentioned the LA Machine Learning meetup, which he runs. This is a great resource for LA residents, separate from and complementary to the datascience.la groups, so consider signing up for all of the above. I hope to see you there in the future.

27 May 2016, 42 min

[MINI] Stationarity and Differencing

Mystery shoppers and fruit cultivation help us discuss stationarity, a property of time series whose statistical behavior (such as mean and variance) does not change over time. Differencing is one approach that can often convert a non-stationary process into a stationary one. A stationary process comes with many known statistical properties that enable a significant amount of inference and prediction.
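As a small illustration of differencing, the sketch below builds a hypothetical series with a linear trend, so its mean drifts over time, and shows that the first differences have a roughly constant mean. The data, the random seed, and the helper names are assumptions for illustration, not material from the episode.

```python
import random


def difference(series, lag=1):
    """Return the lag-differenced series: y[t] = x[t] - x[t - lag]."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Hypothetical data: an upward linear trend plus small Gaussian noise.
rng = random.Random(0)
trend = [2.0 * t + rng.gauss(0, 1) for t in range(200)]
diffed = difference(trend)

# The raw series has very different means in its two halves (non-stationary).
print("raw halves:   ", round(mean(trend[:100]), 1), round(mean(trend[100:]), 1))

# After differencing, both halves hover around the slope of the trend.
print("diffed halves:", round(mean(diffed[:99]), 2), round(mean(diffed[99:]), 2))
```

Differencing removes the trend here because consecutive values differ by the slope plus noise. A series with a more complex trend or seasonality may need a different lag or repeated differencing.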

20 May 2016, 13 min

Feather

I'm joined by Wes McKinney (@wesmckinn) and Hadley Wickham (@hadleywickham) on this episode to discuss their joint project Feather. Feather is a file format for storing data frames along with some metadata, to help with interoperability between languages. At the time of recording, libraries are available for R and Python, making it easy for data scientists working in these languages to quickly and effectively share datasets and collaborate.

13 May 2016, 23 min

[MINI] Bargaining

Bargaining is the process of two (or more) parties attempting to agree on the price for a transaction. Game-theoretic approaches attempt to find two strategies from which neither party is motivated to deviate. These strategies are said to be in equilibrium with one another. The equilibria available in bargaining depend on the transaction mechanism and the information available to the parties. Discounting (how long parties are willing to wait) has a significant effect on this process. This episode discusses some of the choices Kyle and Linh Da made in deciding what offer to make on a house.
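One standard way to see the effect of discounting is Rubinstein's alternating-offers model, a textbook model that is not necessarily the one used in the episode. In that model two parties split a surplus of 1, each discounting a one-round delay by a factor between 0 and 1, and the first proposer's equilibrium share is (1 - d2) / (1 - d1 * d2). The function and variable names below are illustrative.

```python
def proposer_share(d1, d2):
    """Equilibrium share of the first proposer in Rubinstein bargaining.

    d1 and d2 are the per-round discount factors of the proposer and the
    responder, each strictly between 0 and 1 (closer to 1 means more patient).
    """
    if not (0 < d1 < 1 and 0 < d2 < 1):
        raise ValueError("discount factors must lie strictly between 0 and 1")
    return (1 - d2) / (1 - d1 * d2)


# Hypothetical discount factors: only the responder's patience varies.
for responder_discount in (0.5, 0.9, 0.99):
    share = proposer_share(0.9, responder_discount)
    print(f"responder discount {responder_discount}: proposer keeps {share:.3f}")

# Equilibrium check: the responder is indifferent between accepting now
# and waiting one round to become the proposer.
d1, d2 = 0.9, 0.8
accept_now = 1 - proposer_share(d1, d2)
wait_and_propose = d2 * proposer_share(d2, d1)
print("indifferent:", abs(accept_now - wait_and_propose) < 1e-12)
```

The more patient the responder, the less the proposer can keep, which matches the episode's point that willingness to wait shapes the outcome.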

6 May 2016, 15 min

deepjazz

Deepjazz is a project from Ji-Sung Kim, a computer science student at Princeton University. It is built using Theano, Keras, music21, and Evan Chow's project jazzml. Deepjazz is a computational music project that creates original jazz compositions using recurrent neural networks trained on Pat Metheny's "And Then I Knew". You can hear some of deepjazz's original compositions on SoundCloud.

29 Apr 2016, 29 min

[MINI] Auto-correlative functions and correlograms

When working with time series data, there are a number of important diagnostics one should consider to help understand more about the data. The auto-correlative function, plotted as a correlogram, helps explain how a given observation relates to recent preceding observations. A purely random process (like lottery numbers) would show values near zero, while temperature (our topic in this episode) correlates highly with that of recent days. See the show notes with details about Chapel Hill, NC weather data by visiting: https://dataskeptic.com/blog/episodes/2016/acf-correlograms
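The contrast between the two kinds of series can be sketched with a plain-Python sample autocorrelation. The temperature series below is synthetic (a yearly cycle plus noise), not the Chapel Hill data from the show notes, and the function name, seed, and parameters are assumptions for illustration.

```python
import math
import random


def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given non-negative lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return covariance / variance


rng = random.Random(42)

# Hypothetical daily temperatures: a yearly cycle plus day-to-day noise.
temperature = [
    15 + 10 * math.sin(2 * math.pi * day / 365) + rng.gauss(0, 2)
    for day in range(730)
]
# Lottery-like draws: each value is independent of the ones before it.
lottery = [rng.randint(1, 49) for _ in range(730)]

# The values a correlogram would plot, for lags 1 through 5.
print("lag  temperature  lottery")
for lag in range(1, 6):
    print(lag,
          round(autocorrelation(temperature, lag), 2),
          round(autocorrelation(lottery, lag), 2))
```

A correlogram is simply these values drawn as bars against the lag, so the temperature bars stay tall while the lottery bars stay close to zero.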

22 Apr 2016, 14 min

Early Identification of Violent Criminal Gang Members

This week I spoke with Elham Shaabani and Paulo Shakarian (@PauloShakASU) about their recent paper Early Identification of Violent Criminal Gang Members (also available on arXiv). In this paper, they use social network analysis techniques and machine learning to provide early detection of known criminal offenders who are in a high-risk group for committing violent crimes in the future. Their techniques outperform existing techniques used by the police. Elham and Paulo are part of the Cyber-Socio Intelligent Systems (CySIS) Lab.

15 Apr 2016, 27 min

[MINI] Fractional Factorial Design

A dinner party at Data Skeptic HQ helps teach the uses of fractional factorial design for studying 2-way interactions.
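As a rough sketch of the idea, a half-fraction design runs only half of the full factorial while still separating the effects of interest. The example assumes five hypothetical two-level factors, A through E (the episode does not specify these), and sets E = A*B*C*D. That gives 16 runs instead of 32 and keeps every main-effect and 2-way interaction column mutually orthogonal; the price is that these effects are confounded with 3-way and 4-way interactions, which are assumed negligible.

```python
from itertools import combinations, product

FACTORS = "ABCDE"


def half_fraction_design():
    """Build a 2^(5-1) design: full factorial in A-D, with E = A*B*C*D."""
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):
        runs.append((a, b, c, d, a * b * c * d))
    return runs


def effect_columns(runs):
    """Return contrast columns for all main effects and 2-way interactions."""
    columns = {}
    for i, name in enumerate(FACTORS):
        columns[name] = [run[i] for run in runs]
    for i, j in combinations(range(len(FACTORS)), 2):
        columns[FACTORS[i] + FACTORS[j]] = [run[i] * run[j] for run in runs]
    return columns


runs = half_fraction_design()
columns = effect_columns(runs)

# Two effects can be estimated separately when their columns are orthogonal.
orthogonal = all(
    sum(x * y for x, y in zip(columns[p], columns[q])) == 0
    for p, q in combinations(columns, 2)
)
print(len(runs), "runs,", len(columns), "effects, orthogonal:", orthogonal)
```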

8 Apr 2016, 11 min
