Mall Cop Segway Video Clip

Effectiveness of Max-Pooling for Fine-Tuning CLIP on Videos

Abstract: CLIP is a powerful spatial feature extractor trained on a large dataset of image-text pairs. It exhibits strong generalization when extended to other domains and modalities. However, its ...
