If you haven’t already, please add your name to this signup sheet: https://docs.google.com/spreadsheets/d/1eRO6tZUgUpay_Y-0Tx5rRkQdbYA2yP839kCI4C1uVX8/edit#gid=0
Please post questions for Yves in the chat here!
That was fascinating, thank you! Will the slides be shared?
Question for Yves: I am curious whether these new approaches require a new computing strategy for Netflix, i.e., do they necessitate GPUs or other accelerators?
for the slides: https://drive.google.com/drive/folders/13kPiJnVx8C1kuRQ-iXYlE_1HqWrR1hSK
Awesome talk, thanks! What exactly does “continuous time” mean — are we talking about fine-grained discretization? If so, what’s the granularity - minutes? Seconds?
Can you say more about the different approaches to modelling time?
We have seen a lot of buzz around platforms that use collaborative-filtering recommendations, like TikTok, and claims that they can promote underlying bias based on race, gender, etc. Does the algorithm that Netflix uses avoid that kind of bias?
Thanks for the talk, Yves! Do you think recommendations should, in general, have a causal effect? Why exactly?
How do recommendation systems work in multi-user settings? For example, say I share my Netflix (same profile) with my brother, who has very different tastes than me. Can the recommender system understand this and make separate recommendations? Or is this a harder problem?
Thanks for the talk! Let’s say I have trouble running my bandit model online (as happens too often in industry practice): how much of a downside is it to run the bandit model offline instead? The performance drop seems pretty big.
Does Netflix see a correlation between online metrics and the IPS estimates?
Regarding access to the materials presented: we are recording the Zoom meeting and hope to post the video after the tutorial. Recorded talks for the papers presented in the Q&A are already available on the tutorial website (https://personal-tutorial.com/personal-at-isca-2020/).
(sorry I didn't get through all the questions -- feel free to ping me at firstname.lastname@example.org after the workshop :) )
Thanks for the talk, Yves!
Thanks, Yves. It was an informative talk.
Thanks for the talk! What would be the takeaways (if any) from your research for someone using classical cloud GPU servers [AWS / Azure / GCP / …]?
How many threads submitting CUDA kernels did you end up with?
Do you have any insight on the growth rate of the embedding table size?
Super cool! :)
Can you split an embedding table across multiple GPUs?
Great talk! My question: did you come up with some general technique to automatically "fuse" or "concatenate" kernels? Merging two kernels can cause interference in the general case.
E.g., if you have 8x 32 GB of GPU memory, would you keep a 60 GB embedding table in GPU memory or put it in CPU memory?
Thanks, Bilge! Can you please elaborate on the last optimization, about where to perform the concatenation? What are the pros and cons of each option?
Follow-up on this question: I don’t quite get how the last optimization works. How would concatenation optimize GPU memory usage? Concatenation does not change the data size, right?
Would using AMD EPYC-like CPUs (each with 128 threads) on a dual-socket system be more helpful for recommendation systems than a GPU?
What percentage of Facebook's models fit on a single GPU?
Also, for systems that don't have NVLink and are limited by GPU-to-GPU bandwidth in distributed training, did you consider aggregating multiple batch updates? We did such an optimization when training BERT-large on a non-DGX machine (4x T4 GPUs per machine), and it helped a lot in reducing communication overhead without hurting model accuracy/convergence.
@Gennady, there is also a trick of introducing lag into the weight updates and using the gradients from step n to update the weights at step n+1. It can help perform communication and computation simultaneously.
@Dmitry, is that sort of pipelining commonly used? It seems like using stale weights could (potentially) pose some accuracy challenges.
@Dmitry: yes, it can be a good technique to improve performance/training throughput, but it might have an effect on the speed of convergence. That said, I would expect it to work in many cases without a problem.
What bandwidth do you need for the SSD to sustain the GPU compute?
@David I've seen that actually used in Megatron-LM training, because the goal there was to optimize large-scale training to the limits.
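For anyone curious, the one-step-lag trick described above (applying the gradient from step n at step n+1 so its communication can overlap with the next step's compute) can be sketched in a few lines. Everything here — the toy quadratic loss, the learning rate, and the function name — is illustrative, not taken from the talks:

```python
import numpy as np

def sgd_delayed(w, grad_fn, lr=0.1, steps=50):
    """SGD with one-step gradient staleness: the gradient computed at
    step n is applied at step n+1. In a distributed setting, the
    all-reduce of gradient n can then overlap with computing gradient n+1."""
    pending = None  # gradient "in flight" (being communicated)
    for _ in range(steps):
        g = grad_fn(w)            # compute local gradient at current weights
        if pending is not None:
            w = w - lr * pending  # apply the (now fully reduced) stale gradient
        pending = g
    if pending is not None:
        w = w - lr * pending      # flush the last in-flight gradient
    return w

# Toy quadratic loss f(w) = ||w||^2 / 2, so grad f(w) = w.
w0 = np.array([4.0, -2.0])
w = sgd_delayed(w0, lambda w: w, lr=0.1, steps=50)
```

With a small enough learning rate the stale-gradient recurrence still converges on this toy problem; as noted above, the staleness can slow convergence in general.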
@Weijie, thank you! How does the accuracy compare with systems that use less dense input features, say, BERT?
How do you compare two embeddings of different sizes?
Paul & Kirsten
@Tony Do you have any insight as to how the latent space behaves when the dimensions are different?
@Tony We see that training with this mixed-embeddings approach gives us gains in training time; how do you think this method will affect inference? Will there be similar performance benefits?
Have you compared this method to hashing only unpopular items using a similar frequency cap and parameter budget?
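To make the scheme in this question concrete, here is a minimal sketch of hashing only unpopular items under a frequency cap: items at or above the cap get dedicated embedding rows, and the long tail shares a small hashed table. The class name, sizes, cap value, and frequencies are all hypothetical, not from the presented work:

```python
import numpy as np

class FreqCappedEmbedding:
    """Items with frequency >= cap get their own rows; rare items share
    a small table of hashed rows (collisions are accepted for the tail)."""
    def __init__(self, item_freqs, cap, dim, hash_rows, seed=0):
        rng = np.random.default_rng(seed)
        # Dedicated rows for popular items, in sorted-id order.
        self.dedicated = {i: r for r, i in enumerate(
            sorted(i for i, f in item_freqs.items() if f >= cap))}
        self.head = rng.normal(size=(len(self.dedicated), dim))
        self.tail = rng.normal(size=(hash_rows, dim))
        self.hash_rows = hash_rows

    def lookup(self, item_id):
        if item_id in self.dedicated:
            return self.head[self.dedicated[item_id]]
        return self.tail[hash(item_id) % self.hash_rows]

# Toy frequencies: items 0 and 1 are popular, 2 and 3 are in the tail.
freqs = {0: 1000, 1: 900, 2: 3, 3: 1}
emb = FreqCappedEmbedding(freqs, cap=100, dim=4, hash_rows=2)
vec = emb.lookup(2)  # tail item: shares one of the 2 hashed rows
```

The question's point about a "similar parameter budget" then amounts to choosing `hash_rows` and `cap` so the total row count matches the mixed-embeddings method being compared.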
How do you pick m?
I have a question I’d love to ask over mic
Gabriel then Meghan
I am wondering: does it affect the quality? Does it affect convergence?
Hi everyone, I wanted to follow up on some of the questions I missed earlier.

Unfortunately, I cannot answer the questions regarding the growth rate of the embedding sizes or the ratio of embeddings that fit on GPUs at Facebook. In the models I showed in my talk, we do distribute different tables across GPUs, but we don't split them.

Regarding the questions about the last optimization I talked about:

@Gennady Pekhimenko We did employ kernel fusion for some operators, but we did not use an automated way to do this. There is some great work on kernel fusion using PyTorch JIT too.

@Jiashen Cao Fusing the concats on the GPU helped with reducing the data-transfer overhead; it does not optimize memory usage.

Thanks, everyone! Feel free to email me at email@example.com if you have further questions.
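Since the follow-up above mentions distributing whole embedding tables across GPUs without splitting them, here is a minimal sketch of what such table-wise placement can look like. The greedy heuristic, the table names, and the sizes are my own illustration, not Facebook's actual policy:

```python
def assign_tables(table_sizes, n_devices):
    """Greedy table-wise placement: each embedding table goes, whole,
    onto the currently least-loaded device (no row-wise splitting)."""
    loads = [0] * n_devices
    placement = {}
    # Place the biggest tables first to keep device loads balanced.
    for name, size in sorted(table_sizes.items(), key=lambda kv: -kv[1]):
        d = loads.index(min(loads))  # least-loaded device so far
        placement[name] = d
        loads[d] += size
    return placement, loads

# Hypothetical table sizes in GB across two GPUs.
tables = {"user_id": 40, "item_id": 25, "genre": 1, "country": 1}
placement, loads = assign_tables(tables, n_devices=2)
```

A lookup for a given feature then only needs to talk to the single device that owns that table, at the cost of load imbalance when one table dominates (which is exactly when row-wise splitting becomes attractive).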
Udit and I would like to thank all the speakers, student session chairs, presenters, and all attendees!
Please stay in touch! We will follow up with the recorded talks and materials
Stay in touch!
Here -- https://docs.google.com/spreadsheets/d/1eRO6tZUgUpay_Y-0Tx5rRkQdbYA2yP839kCI4C1uVX8/edit#gid=0
This was excellent, thanks so much!
Thanks to all the organisers, speakers, and moderators!!
We will share all future events with you!
Thank you everyone!
Thank you for this great content, made available freely online!
Thank you, everyone!
Thanks for the nice session... very useful to me.