Vitor Cerqueira
Dec 15, 2022

Hi Flávio! Thanks, I'm glad you enjoy this space.

The point of purging is to remove the correlation between the final part of the training data and the start of the validation data, which are adjacent in time. In principle, this gives a more reliable estimate of how the model will perform in the long term.

But, as far as I know, purging is a CV-only thing. After CV, I would re-train the model using the whole data set. Does this answer your question?
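To make the purging idea concrete, here is a minimal sketch of an expanding-window split that drops the last few training observations before each validation block. The function name `purged_splits` and the `gap` parameter are hypothetical, chosen only for illustration; the fold layout is one simple assumption among several possible ones.

```python
def purged_splits(n_samples, n_splits=3, gap=2):
    """Yield (train_idx, val_idx) pairs for an expanding-window CV.

    The last `gap` observations before each validation block are
    purged (left out of training) to reduce the correlation between
    the end of training and the start of validation.
    """
    fold_size = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        val_start = k * fold_size
        # The last fold absorbs any leftover observations.
        val_end = n_samples if k == n_splits else val_start + fold_size
        train_end = max(val_start - gap, 0)  # purge `gap` points
        yield list(range(train_end)), list(range(val_start, val_end))


splits = list(purged_splits(n_samples=20, n_splits=3, gap=2))

# After CV, the final model is fit on every observation, with no gap.
final_train_idx = list(range(20))
```

The gap only affects the performance estimate. The final model still sees all the data, including the observations that were purged during CV.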

Regarding the second point: yes, you should try to use the information from other products to forecast the sales of a given product. However, a form of leakage can occur if you have the same entity (product) in both training and validation.

I think this can be situational. Here's a more comprehensive read on this problem: https://www.kaggle.com/code/jorijnsmit/found-the-holy-grail-grouptimeseriessplit.
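One way to avoid having the same product on both sides is to split by entity, so that each product falls entirely in training or entirely in validation. The sketch below is a minimal illustration under that assumption. It is not the implementation from the linked notebook, and `group_holdout` and the toy `products` list are hypothetical names.

```python
def group_holdout(groups, val_groups):
    """Split row indices so that no group appears in both sets."""
    val_groups = set(val_groups)
    train_idx = [i for i, g in enumerate(groups) if g not in val_groups]
    val_idx = [i for i, g in enumerate(groups) if g in val_groups]
    return train_idx, val_idx


# One label per row, identifying the product each observation belongs to.
products = ["A", "A", "B", "B", "C", "C"]
train_idx, val_idx = group_holdout(products, val_groups=["C"])
```

Whether this is the right choice depends on the use case: if the deployed model will forecast products it has already seen, holding out whole products may be stricter than necessary.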

Hope this helps!

Written by Vitor Cerqueira

Ph.D and Researcher on AI and Time Series.
