
Author:

Xuan, Y. [1] | Chen, W. [2] | Yang, S. [3] | Xie, D. [4] | Lin, L. [5] (Scholars: 林洛君) | Zhuang, Y. [6]

Indexed by:

Scopus

Abstract:

Data-Free Knowledge Distillation (DFKD) has shown great potential in creating a compact student model while alleviating the dependency on real training data by synthesizing surrogate data. However, prior arts are seldom discussed under distribution shifts, which may be vulnerable in real-world applications. Recent Vision-Language Foundation Models, e.g., CLIP, have demonstrated remarkable performance in zero-shot out-of-distribution generalization, yet consuming heavy computation resources. In this paper, we discuss the extension of DFKD to Vision-Language Foundation Models without access to the billion-level image-text datasets. The objective is to customize a student model for distribution-agnostic downstream tasks with given category concepts, inheriting the out-of-distribution generalization capability from the pre-trained foundation models. In order to avoid generalization degradation, the primary challenge of this task lies in synthesizing diverse surrogate images driven by text prompts. Since not only category concepts but also style information are encoded in text prompts, we propose three novel Prompt Diversification methods to encourage image synthesis with diverse styles, namely Mix-Prompt, Random-Prompt, and Contrastive-Prompt. Experiments on out-of-distribution generalization datasets demonstrate the effectiveness of the proposed methods, with Contrastive-Prompt performing the best. © 2023 ACM.
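The abstract's Random-Prompt idea — pairing category concepts with varied style phrases so the synthesized surrogate images are not confined to a single visual domain — can be illustrated with a minimal sketch. The style vocabulary and the `random_prompts` helper below are hypothetical illustrations based only on the abstract, not the authors' implementation:

```python
import random

# Hypothetical style descriptors; the paper's actual style vocabulary
# and prompt templates are not given in this record.
STYLES = ["a photo", "a sketch", "a painting", "a cartoon", "an art drawing"]

def random_prompts(categories, num_per_class=3, seed=0):
    """Sketch of the Random-Prompt idea: attach randomly chosen style
    phrases to each category concept, so text-driven image synthesis
    covers diverse styles rather than a single distribution."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    prompts = []
    for category in categories:
        for _ in range(num_per_class):
            style = rng.choice(STYLES)
            prompts.append(f"{style} of a {category}")
    return prompts
```

In the method described by the abstract, such style-diversified prompts would drive the synthesis of surrogate training images (e.g., via a text-conditioned generator guided by the CLIP text encoder); that synthesis step, and the contrastive variant reported to perform best, are beyond what can be reconstructed from this record.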

Keywords:

data-free knowledge distillation; out-of-distribution generalization; vision-language foundation model

Community:

  • [1] [Xuan Y.] Hikvision Research Institute, Hangzhou, China
  • [2] [Chen W.] Hikvision Research Institute, Hangzhou, China
  • [3] [Chen W.] College of Computer Science and Technology, Zhejiang University, Hangzhou, China
  • [4] [Yang S.] Hikvision Research Institute, Hangzhou, China
  • [5] [Xie D.] Hikvision Research Institute, Hangzhou, China
  • [6] [Lin L.] Fuzhou University, Fuzhou, China
  • [7] [Zhuang Y.] College of Computer Science and Technology, Zhejiang University, Hangzhou, China

Reprint Author's Address:

Email:

Source:

Year: 2023

Page: 4928-4938

Language: English

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count:

ESI Highly Cited Papers on the List: 0

WanFang Cited Count:

Chinese Cited Count:

30 Days PV: 2

Affiliated Colleges:
