Predicting Out Of Memory Kill events with Machine Learning (Ep. 203)

Data Science at Home - A podcast by Francesco Gadaleta

Sometimes applications crash. Sometimes they crash because memory is exhausted. Such issues arise from bugs in the code, or from heavy memory usage that was not anticipated during design and implementation. Can we use machine learning to predict and eventually detect out-of-memory kills from the operating system? Apparently, the Netflix app many of us use on a daily basis leverages ML and time series analysis to prevent OOM kills. Enjoy the show!

Our Sponsors

Explore the Complex World of Regulations. Compliance can be overwhelming. Multiple frameworks. Overlapping requirements. Let Arctic Wolf be your guide. Check it out at https://arcticwolf.com/datascience

Amethix works to create and maximize the impact of the world’s leading corporations and startups, so they can create a better future for everyone they serve. We provide solutions in AI/ML, Fintech, Healthcare/RWE, and Predictive maintenance.

Transcript

1
00:00:04,150 --> 00:00:09,034
And here we are again with season four of the Data Science at Home podcast.

2
00:00:09,142 --> 00:00:19,170
This time we have something for you: if you want to help us shape the data science leaders of the future, we have created the Data Science at Home Ambassador program.

3
00:00:19,340 --> 00:00:28,378
Ambassadors are volunteers who are passionate about data science and want to give back to our growing community of data science professionals and enthusiasts.

4
00:00:28,534 --> 00:00:37,558
You will be instrumental in helping us achieve our goal of raising awareness about the critical role of data science in cutting-edge technologies.

5
00:00:37,714 --> 00:00:45,740
If you want to learn more about this program, visit the Ambassadors page on our website.

6
00:00:46,430 --> 00:00:49,234
Welcome back to another episode of the Data Science at Home podcast.

7
00:00:49,282 --> 00:00:55,426
I'm Francesco, podcasting from the regular office of Amethix Technologies, based in Belgium.

8
00:00:55,618 --> 00:01:02,914
In this episode, I want to speak about a machine learning problem that has been formulated at Netflix.

9
00:01:03,022 --> 00:01:22,038
And for the record, Netflix is not sponsoring this episode, though I still believe that this is a very well-known problem, a very common one across sectors: how to predict an out-of-memory kill in an application, and how to formulate this as a machine learning problem.

10
00:01:22,184 --> 00:01:39,142
So this is something that, as I said, is very interesting, not just because of Netflix, but because it allows me to explain a few points that, as I said, are kind of invariant across sectors.

11
00:01:39,226 --> 00:01:56,218
Regardless of whether your application is a video streaming application, or any other communication type of application, or a fintech application, or energy, or whatever, this out-of-memory kill still occurs.

12
00:01:56,314 --> 00:02:05,622
And what is an out-of-memory kill? Well, it's essentially the extreme event in which the machine doesn't have any more memory left.

13
00:02:05,756 --> 00:02:16,678
And so usually the operating system can eventually start swapping, which means using the SSD or the hard drive as a source of memory.

14
00:02:16,834 --> 00:02:19,100
But that, of course, will slow things down a lot.

15
00:02:19,430 --> 00:02:45,210
And eventually, when there is a bug or a memory leak, or if there are other applications running on the same machine, there is some kind of limiting factor that essentially kills the application. Most of the time it is the operating system that kills the application, in order to prevent it from monopolizing the entire machine, the hardware of the machine.

16
00:02:45,710 --> 00:02:48,500
And so this is a very important problem.

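As a rough illustration of what "formulating this as a machine learning problem" could look like, here is a minimal Python sketch. It is not the method used at Netflix, which this part of the transcript does not describe. It assumes we have periodic memory readings for an application and, if a kill occurred, the time of the kill. The names `MemorySample` and `label_windows`, the `window` and `horizon` parameters, and the two features are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class MemorySample:
    """One hypothetical reading of an application's memory usage."""
    t: float        # seconds since the application started
    used_mb: float  # memory in use at time t, in megabytes


def label_windows(
    samples: List[MemorySample],
    kill_time: Optional[float],
    window: int = 3,
    horizon: float = 10.0,
) -> List[Tuple[Dict[str, float], int]]:
    """Turn a memory time series into (features, label) training rows.

    Each row summarises `window` consecutive samples. The label is 1 when
    an out-of-memory kill happens within `horizon` seconds after the last
    sample of the window, and 0 otherwise. `kill_time=None` means the
    application was never killed.
    """
    rows = []
    for end in range(window, len(samples) + 1):
        chunk = samples[end - window:end]
        first, last = chunk[0], chunk[-1]
        # Readings taken after the kill are not usable for prediction.
        if kill_time is not None and last.t > kill_time:
            break
        span = last.t - first.t
        # Growth rate over the window; a steady leak shows up as a positive slope.
        slope = (last.used_mb - first.used_mb) / span if span > 0 else 0.0
        features = {"last_used_mb": last.used_mb, "slope_mb_per_s": slope}
        label = int(kill_time is not None and kill_time - last.t <= horizon)
        rows.append((features, label))
    return rows


if __name__ == "__main__":
    # Synthetic example: memory grows by 100 MB every 5 seconds, kill at t=22.
    samples = [MemorySample(t=5.0 * i, used_mb=100.0 * (i + 1)) for i in range(5)]
    for features, label in label_windows(samples, kill_time=22.0):
        print(features, label)
```

Rows produced this way could then be fed to any binary classifier. The point of the sketch is only the framing: past memory behaviour as input, "kill within the next few seconds" as the target.
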
17
00:02:49,070 --> 00:03:03,306
Also, it is important to have an episode about this because there are some strategies that I've used at Netflix that are pretty much in line with what I
