ActionFormer: a Transformer-based model for temporal action localization
Task: find when each action happens in an untrimmed video (its start and end times) and recognize its category
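To make "when each action happens" concrete, each prediction can be written as a (start, end, label) segment. Predicted segments are usually matched to ground truth by temporal IoU (overlap divided by union of the two intervals). The minimal Python sketch below illustrates this. `Segment` and `temporal_iou` are illustrative names, not part of ActionFormer's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # action onset, e.g. in seconds
    end: float    # action offset
    label: str    # action category


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two time intervals (labels are ignored)."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0


pred = Segment(2.0, 6.0, "high_jump")
gt = Segment(3.0, 7.0, "high_jump")
print(temporal_iou(pred, gt))  # overlap 3 / union 5 -> 0.6
```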
Contribution
Method = Encoder + Decoder

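Encoder = CNN projection + multi-scale Transformer with local self-attention (no positional encoding)
Decoder = classification head (action category at every time step) + regression head (distances from that step to the action's onset and offset)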
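The encoder/decoder split above can be sketched end to end in pure Python, under heavy simplifying assumptions. This is a toy illustration, not the ActionFormer implementation. The assumptions are a single pyramid level, a single action class, one shared convolution weight per tap, identity Q/K/V projections, and hand-picked head weights. All function names are hypothetical. The attention step adds no positional encoding, so information about temporal order comes only from the convolution and the local window.

```python
import math
from typing import List, Tuple

Seq = List[List[float]]  # T time steps, each a C-dim feature vector


def conv1d(x: Seq, kernel: List[float]) -> Seq:
    """Toy temporal convolution with zero padding (one shared weight per tap)."""
    pad, T, C = len(kernel) // 2, len(x), len(x[0])
    out = []
    for t in range(T):
        acc = [0.0] * C
        for j, w in enumerate(kernel):
            s = t + j - pad
            if 0 <= s < T:
                for c in range(C):
                    acc[c] += w * x[s][c]
        out.append(acc)
    return out


def local_attention(x: Seq, window: int) -> Seq:
    """Single-head self-attention restricted to |t - s| <= window.
    Assumes Q = K = V = x; no positional encoding; residual connection added."""
    T, C = len(x), len(x[0])
    out = []
    for t in range(T):
        nbrs = range(max(0, t - window), min(T, t + window + 1))
        scores = [sum(a * b for a, b in zip(x[t], x[s])) / math.sqrt(C) for s in nbrs]
        m = max(scores)
        w = [math.exp(v - m) for v in scores]  # numerically stable softmax
        z = sum(w)
        out.append([x[t][c] + sum(wi * x[s][c] for wi, s in zip(w, nbrs)) / z
                    for c in range(C)])
    return out


def heads(x: Seq, w_cls: List[float],
          w_reg: Tuple[List[float], List[float]]) -> List[Tuple[float, float, float]]:
    """Per-time-step heads: sigmoid action score and ReLU (non-negative)
    distances to the action onset and offset."""
    out = []
    for f in x:
        score = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(f, w_cls))))
        d_start = max(0.0, sum(a * b for a, b in zip(f, w_reg[0])))
        d_end = max(0.0, sum(a * b for a, b in zip(f, w_reg[1])))
        out.append((score, d_start, d_end))
    return out


def decode(preds: List[Tuple[float, float, float]],
           threshold: float) -> List[Tuple[float, float, float]]:
    """Turn each confident time step t into a segment (t - d_start, t + d_end, score)."""
    return [(t - ds, t + de, s) for t, (s, ds, de) in enumerate(preds) if s >= threshold]


feats = [[0.1 * t, 1.0 - 0.1 * t] for t in range(8)]  # stand-in for pre-extracted clip features
enc = local_attention(conv1d(feats, [0.25, 0.5, 0.25]), window=2)
preds = heads(enc, w_cls=[1.0, -1.0], w_reg=([0.5, 0.5], [1.0, 0.0]))
print(decode(preds, threshold=0.5))
```

Each time step votes for one segment via its onset and offset distances, so the decoder needs no predefined anchors. A real system would also merge overlapping segments (for example with non-maximum suppression), which this sketch omits.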