Should it be m^k_v in step 2?
Do we use the message from the (k-1)-th or the k-th step in Combine?
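For context, a common convention for the message-passing update these two questions refer to (the notation here is assumed, not taken verbatim from the talk): the step-k message is built from the neighbors' step-(k-1) states, and Combine then mixes the node's own previous state with that message.

```latex
m_v^{(k)} = \mathrm{Aggregate}\big(\{\, h_u^{(k-1)} : u \in \mathcal{N}(v) \,\}\big), \qquad
h_v^{(k)} = \mathrm{Combine}\big(h_v^{(k-1)},\, m_v^{(k)}\big)
```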
Do these Aggregate/Combine operations take the node degree into account? High-degree nodes would capture information from the entire graph more quickly than low-degree nodes.
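One place degree can enter is the choice of aggregator. A minimal NumPy sketch (a hypothetical illustration, not the speaker's code) contrasting sum aggregation, where high-degree nodes accumulate more, with degree-normalized mean aggregation:

```python
import numpy as np

def aggregate(h, adj, mode="sum"):
    """Aggregate neighbor states h (n x d) over adjacency matrix adj (n x n).

    With "sum", high-degree nodes accumulate larger messages each round;
    with "mean", dividing by degree keeps the message scale degree-invariant.
    """
    msgs = adj @ h                            # sum of neighbors' states per node
    if mode == "mean":
        deg = adj.sum(axis=1, keepdims=True)
        msgs = msgs / np.maximum(deg, 1.0)    # guard isolated (degree-0) nodes
    return msgs

# Toy path graph 0 - 1 - 2 with one-hot node states.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
h = np.eye(3)
print(aggregate(h, adj, "sum"))   # node 1 receives two full messages
print(aggregate(h, adj, "mean"))  # node 1's messages are averaged
```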
Should we think of each feature as an iteration?
Vicky Jiaqi Zhang
What’s the latent dimension?
Did we choose to have weights shared between iterations?
How much does the complexity of the network increase by creating a fully-connected NN?
Does the hyperbolic tangent do any better job than ReLU in terms of extrapolation?
Gautam Reddy Nallamala
How is the network trained end-to-end with a min function?
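For background: a min, like max-pooling, is piecewise differentiable, so gradient descent can train through it, with the gradient routed to whichever input attains the minimum. A tiny PyTorch sketch (illustrative, not from the talk):

```python
import torch

# A min is piecewise differentiable: the gradient flows entirely to the
# input that attains the minimum, so end-to-end training still works.
msgs = torch.tensor([3.0, 1.0, 2.0], requires_grad=True)
out = torch.min(msgs)   # min-aggregation over incoming messages
out.backward()
print(msgs.grad)        # tensor([0., 1., 0.]) -- only the argmin gets gradient
```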
How are we encoding the prior information you talked about earlier about the problem we are trying to learn, so that we can do better outside the training distribution? Did we encode some prior probability distribution, and if so, how are we doing that here?
Can you learn the linear functions directly, instead of as MLPs?
Could you give more information about ReLU networks fitting a linear function away from the training data?
No, I meant the plots you showed where ReLU networks don't extrapolate.
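Background on why ReLU networks behave this way: a ReLU network is piecewise linear, so along any fixed ray it eventually becomes exactly linear once its activation pattern stops changing. A self-contained NumPy toy (hypothetical, not the experiment behind the plots):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random 2-layer ReLU network f(x) = W2 @ relu(W1 @ x + b1).
W1, b1 = rng.normal(size=(16, 2)), rng.normal(size=16)
W2 = rng.normal(size=(1, 16))

def f(x):
    return (W2 @ np.maximum(W1 @ x + b1, 0.0))[0]

# Along a fixed ray x0 + t*v, each unit's on/off pattern eventually stops
# changing as t grows, so f becomes exactly linear in t far from the data.
x0, v = np.zeros(2), np.array([1.0, 0.5])
ts = np.array([1e2, 1e3, 1e4, 1e5])
vals = np.array([f(x0 + t * v) for t in ts])
print(np.diff(vals) / np.diff(ts))  # successive slopes settle to a constant
```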
Is the theorem false if you don't assume that the network is trained by GD?
Thanks! Can you also provide further details regarding the ReLU network structure in your experiments, such as depth, width, dropout, batch norm, etc.?
Dr. Jegelka, the talk was extremely interesting! I'm quite unfamiliar with GNNs. So, for a novice, are there any resources I could use to learn more about GNNs?
You mentioned earlier how an RNN is like a GNN but where the update graph forms a line; do we assume the weights in the RNN aren't necessarily shared among iterations?
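For reference, the analogy in code: unrolling a standard RNN is message passing on a path graph where node t only hears from node t-1, with one set of weights shared across all steps. A minimal NumPy sketch (hypothetical names, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(1)
W, U = rng.normal(size=(4, 4)), rng.normal(size=(4, 3))

def rnn(xs):
    # Message passing on a path graph: node t only receives from node t-1,
    # and the SAME weights W, U are reused at every step (weight sharing).
    h = np.zeros(4)
    for x in xs:
        h = np.tanh(W @ h + U @ x)
    return h

print(rnn(rng.normal(size=(5, 3))))
```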
If the problem is not dynamic programming at its core, does the GNN still work?
Could I have a reference to the matrix algebra thing Boaz mentioned earlier?