Publications

Publications by category in reverse chronological order. The full list can be found on Google Scholar.

2024

  1. ArXiv
    InternLM2 Technical Report
    Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, and 95 more authors
    2024
  2. ArXiv
    InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
    Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, and 18 more authors
    2024
  3. CVPR
    From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
    Rongjie Li, Songyang Zhang, Dahua Lin, Kai Chen, and Xuming He
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
  4. NAACL
    Fake Alignment: Are LLMs Really Aligned Well?
    Yixu Wang, Yan Teng, Kexin Huang, Chengqi Lyu, Songyang Zhang, and 3 more authors
    In Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024
  5. T-PAMI
    SGTR+: End-to-end Scene Graph Generation with Transformer
    Rongjie Li, Songyang Zhang, and Xuming He
    In IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
  6. NAACL
    BotChat: Evaluating LLMs’ Capabilities of Having Multi-Turn Dialogues
    Haodong Duan, Jueqi Wei, Chonghua Wang, Hongwei Liu, Yixiao Fang, and 3 more authors
    In Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024
  7. TMLR
    PixMIM: Rethinking Pixel Reconstruction in Masked Image Modeling
    Yuan Liu, Songyang Zhang, Jiacheng Chen, Kai Chen, and Dahua Lin
    In Transactions on Machine Learning Research (TMLR), 2024

2023

  1. ArXiv
    InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
    Pan Zhang, Xiaoyi Dong, Bin Wang, Yuhang Cao, Chao Xu, and 16 more authors
    2023
  2. ArXiv
    LawBench: Benchmarking Legal Knowledge of Large Language Models
    Zhiwei Fei, Xiaoyu Shen, Dawei Zhu, Fengzhe Zhou, Zhuo Han, and 4 more authors
    arXiv preprint arXiv:2309.16289, 2023
  3. ArXiv
    MMBench: Is Your Multi-modal Model an All-around Player?
    Yuan Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, and 7 more authors
    In arXiv Preprint, 2023
  4. IJCAI
    TG-VQA: Ternary Game of Video Question Answering
    Hao Li, Peng Jin, Zesen Cheng, Songyang Zhang, Kai Chen, and 3 more authors
    In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI), 2023
  5. ICCV
    Improving Pixel-based MIM by Reducing Wasted Modeling Capability
    Yuan Liu, Songyang Zhang, Jiacheng Chen, Zhaohui Yu, Kai Chen, and 1 more author
    In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023
  6. CVPR
    RIFormer: Keep Your Vision Backbone Effective But Removing Token Mixer
    Jiahao Wang, Songyang Zhang, Yong Liu, Taiqiang Wu, Yujiu Yang, and 4 more authors
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

2022

  1. ECCV
    Learning Semantic Correspondence with Sparse Annotations
    Shuaiyi Huang, Luyu Yang, Bo He, Songyang Zhang, Xuming He, and 1 more author
    In Proceedings of the European Conference on Computer Vision (ECCV), 2022
  2. ECCV
    Action Quality Assessment with Temporal Parsing Transformer
    Yang Bai, Desen Zhou, Songyang Zhang, Jian Wang, Errui Ding, and 2 more authors
    In Proceedings of the European Conference on Computer Vision (ECCV), 2022
  3. CVPR
    SGTR: End-to-end Scene Graph Generation with Transformer
    Rongjie Li, Songyang Zhang, and Xuming He
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022

2021

  1. NeurIPS
    Dynamic Grained Encoder for Vision Transformers
    Lin Song*, Songyang Zhang*, Songtao Liu, Zeming Li, Xuming He, and 3 more authors
    In Proceedings of Advances in Neural Information Processing Systems (NeurIPS), 2021
  2. ACM MM
    An EM Framework for Online Incremental Learning of Semantic Segmentation
    Shipeng Yan*, Jiale Zhou*, Jiangwei Xie, Songyang Zhang, and Xuming He
    In Proceedings of the 29th ACM International Conference on Multimedia (ACM MM), 2021
  3. IJCAI
    Learning Implicit Temporal Alignment for Few-shot Video Classification
    Songyang Zhang*, Jiale Zhou*, and Xuming He
    In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI), 2021
  4. CVPR
    Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation
    Rongjie Li, Songyang Zhang, Bo Wan, and Xuming He
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
  5. CVPR
    Distribution Alignment: A Unified Framework for Long-tail Visual Recognition
    Songyang Zhang, Zeming Li, Shipeng Yan, Xuming He, and Jian Sun
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

2020

  1. InterSpeech
    Transformer with Bidirectional Decoder for Speech Recognition
    Xi Chen, Songyang Zhang, Dandan Song, Peng Ouyang, and Shouyi Yin
    In the Conference of the International Speech Communication Association (InterSpeech), 2020
  2. ECCV
    Part-aware Prototype Network for Few-shot Semantic Segmentation
    Yongfei Liu*, Xiangyi Zhang*, Songyang Zhang, and Xuming He
    In Proceedings of the European Conference on Computer Vision (ECCV), 2020

2019

  1. ICML
    LatentGNN: Learning Efficient Non-local Relations for Visual Recognition
    Songyang Zhang, Shipeng Yan, and Xuming He
    In Proceedings of the 36th International Conference on Machine Learning (ICML), 2019
  2. AAAI
    A Dual Attention Network With Semantic Embedding for Few-shot Learning
    Shipeng Yan*, Songyang Zhang*, and Xuming He
    In Proceedings of the Association for the Advancement of Artificial Intelligence (AAAI), 2019
  3. ICCV
    Dynamic Context Correspondence Network for Semantic Alignment
    Shuaiyi Huang, Qiuyue Wang, Songyang Zhang, and Xuming He
    In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019

2017

  1. CVPR
    Predicting Salient Face in Multiple-face Videos
    Yufan Liu, Songyang Zhang, Mai Xu, and Xuming He
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2017