KEYNOTE SPEAKER I IN IPMV 2025

    IEEE Fellow

    Prof. James Tin-Yau Kwok, The Hong Kong University of Science and Technology, Hong Kong, China

 

BIO: James Kwok is a Professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. He is an IEEE Fellow. He has served or is serving as an Associate Editor of the IEEE Transactions on Neural Networks and Learning Systems, Neural Networks, Neurocomputing, Artificial Intelligence Journal, and International Journal of Data Science and Analytics, and on the Editorial Board of Machine Learning. He is also serving as a Senior Area Chair of major machine learning / AI conferences, including NeurIPS, ICML, ICLR, and IJCAI, and is on the IJCAI Board of Trustees. He received the Most Influential Scholar Award Honorable Mention for "outstanding and vibrant contributions to the field of AAAI/IJCAI between 2009 and 2019". Prof. Kwok is the IJCAI-2025 Program Chair.

Speech Title: Large Language Models: Pre-Training, Fine-Tuning and Application

Abstract: Large language models (LLMs) are now widely used, but several challenges remain in pre-training and fine-tuning. First, during unsupervised pre-training, semantically irrelevant information can hurt downstream tasks, leading to negative transfer. Second, fine-tuning often produces multiple models with different hyperparameter configurations, yet typically only one of them is used in the downstream task. To address the first issue, we introduce a new pre-training method that trains each expert only on semantically relevant data through cluster-conditional gates; downstream tasks can then be allocated to customized models pre-trained on the data most similar to the downstream data. To tackle the second issue, we consider the learned soup, which combines all fine-tuned models with learned weighting coefficients. While this can significantly enhance performance, it is also computationally expensive. We mitigate this cost by formulating the learned soup as a hyperplane optimization problem and employing block coordinate gradient descent to learn the mixing coefficients: each iteration only requires loading a few fine-tuned models and building a computational graph with one combined model. Experimental results show that this approach can run on a single GPU while significantly reducing memory usage. Finally, we present an application of LLMs to knowledge graph completion, using an in-context learning strategy to guide the LLM. Empirical results demonstrate its effectiveness, with improved performance and no additional training required.
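
To make the learned-soup step concrete, below is a minimal PyTorch sketch of block coordinate descent over the mixing coefficients. It is an illustrative reconstruction, not the speaker's implementation: the function names, the cross-entropy objective, and the unnormalized coefficient parameterization are assumptions, and the checkpoints are assumed to be in-memory dicts of parameter tensors (in practice, those outside the current block would be streamed from disk).

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

@torch.no_grad()
def frozen_partial_soup(checkpoints, alpha, block):
    """Weighted sum of the checkpoints OUTSIDE the current block (no gradient)."""
    out = {k: torch.zeros_like(v) for k, v in checkpoints[0].items()}
    for i, ckpt in enumerate(checkpoints):
        if i not in block:
            for k in out:
                out[k] += alpha[i] * ckpt[k]
    return out

def block_coordinate_step(model, checkpoints, alpha, block, batch, lr=0.05):
    """One block-coordinate update of the mixing coefficients alpha[block].

    Only the block's checkpoints enter the computational graph; the rest of
    the soup is a fixed tensor, so memory stays close to one combined model.
    """
    x, y = batch
    frozen = frozen_partial_soup(checkpoints, alpha, block)
    a = alpha[block].clone().requires_grad_(True)   # coefficients to learn
    params = {                                       # ONE combined model
        k: frozen[k] + sum(a[j] * checkpoints[i][k] for j, i in enumerate(block))
        for k in frozen
    }
    loss = F.cross_entropy(functional_call(model, params, (x,)), y)
    loss.backward()
    with torch.no_grad():
        alpha[block] -= lr * a.grad                 # gradient step on the block
    return loss.item()
```

Because only the block's coefficients require gradients, each iteration differentiates through a single combined model rather than through all fine-tuned checkpoints, which is what keeps memory usage close to single-model training.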

 

KEYNOTE SPEAKER II IN IPMV 2025

 

IEEE Fellow

Prof. Junsong Yuan, State University of New York at Buffalo, USA

 

BIO: Dr. Junsong Yuan is Professor and Director of the Visual Computing Lab in the Department of Computer Science and Engineering, State University of New York (SUNY) at Buffalo, USA. Before joining SUNY Buffalo, he was an Associate Professor at Nanyang Technological University (NTU), Singapore. He obtained his Ph.D. from Northwestern University, M.Eng. from the National University of Singapore, and B.Eng. from Huazhong University of Science and Technology. He is a recipient of the SONY Faculty Innovation Award (2024), the SUNY Chancellor's Award for Excellence in Scholarship and Creative Activities (2022), the IEEE Trans. on Multimedia Best Paper Award (2016), the Northwestern Outstanding EECS Ph.D. Thesis Award (2010), and the Nanyang Assistant Professorship (2009). He serves as Editor-in-Chief of the Journal of Visual Communication and Image Representation (JVCI) and as Associate Editor of IEEE Trans. on Pattern Analysis and Machine Intelligence (T-PAMI) and IEEE Trans. on Image Processing (T-IP). He also serves as General/Program Co-Chair of ICME and as Area Chair for CVPR, ICCV, ECCV, NeurIPS, ACM MM, etc. He is a Fellow of the IEEE (2021) and the IAPR (2018).

Speech Title: Intelligent Hand Sensing and Augmented Interaction

Abstract: Humans are the most intelligent beings on the planet not only because of our powerful brains but also because of the unique structure of our hands. Hands are crucial tools for interacting with and changing both the physical world and virtual worlds such as the metaverse. In this talk, we will discuss real-time hand sensing using optical cameras and how it can enhance our interactions with the physical world and the metaverse. For 3D hand sensing from single 2D images, we will discuss how to leverage synthetic hand data to address the high-dimensional regression problems of articulated hand pose estimation and 3D hand shape reconstruction. To improve generalization to hands of various shapes and poses, we will also discuss invariant hand representations obtained through disentanglement. The resulting systems can support intelligent interactions in virtual and real environments using bare hands, as well as through hand-object interactions.
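
As a rough illustration of the high-dimensional regression framing, here is a minimal PyTorch sketch of a network that regresses 21 3-D hand joints from a single RGB crop. The architecture and joint count are generic assumptions for illustration, not the speaker's actual model.

```python
import torch
import torch.nn as nn

class HandPoseRegressor(nn.Module):
    """Map a single RGB hand crop to num_joints 3-D joint positions."""
    def __init__(self, num_joints=21):
        super().__init__()
        self.num_joints = num_joints
        self.backbone = nn.Sequential(               # small conv feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(128, num_joints * 3)   # (x, y, z) per joint

    def forward(self, img):                          # img: (B, 3, H, W)
        return self.head(self.backbone(img)).view(-1, self.num_joints, 3)

# With synthetic data, training is plain supervised regression, e.g.:
#   loss = torch.nn.functional.mse_loss(model(synth_img), synth_joints)
```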

 

KEYNOTE SPEAKER III IN IPMV 2025

 

Prof. Chi Man Pun, University of Macau, Macau, China

 

BIO: Prof. Pun received his Ph.D. degree in Computer Science and Engineering from the Chinese University of Hong Kong in 2002, and his M.Sc. and B.Sc. degrees from the University of Macau. He served as Head of the Department of Computer and Information Science, University of Macau from 2014 to 2019, where he is currently a Professor and in charge of the Image Processing and Pattern Recognition Laboratory. He has led many externally funded research projects as PI, and has authored or co-authored more than 200 refereed papers in top-tier journals (including T-PAMI, T-IFS, T-IP, T-DSC, T-KDE, and T-MM) and conferences (including CVPR, ICCV, ECCV, AAAI, ICDE, IJCAI, MM, and VR). He has also co-invented several China/US patents, and is a recipient of the Macao Science and Technology Award 2014 and the Best Paper Award at the 6th Chinese Conference on Pattern Recognition and Computer Vision (PRCV2023). Dr. Pun served as General Chair of the 10th & 11th International Conference on Computer Graphics, Imaging and Visualization (CGIV2013, CGIV2014) and the 13th IEEE International Conference on e-Business Engineering (ICEBE2016), as General Co-Chair of the IEEE International Conference on Visual Communications and Image Processing (VCIP2020) and the International Workshop on Advanced Image Technology (IWAIT2022), and as Program/Local Chair for several other international conferences. He has also served as an SPC/PC member for many top CS conferences, such as AAAI, CVPR, ICCV, ECCV, and MM. He is currently serving on the editorial board of the Artificial Intelligence journal (AIJ). He has been listed among the World's Top 2% Scientists by Stanford University since 2020. His research interests include image processing and pattern recognition; multimedia information security, forensics, and privacy; and adversarial machine learning and AI security. He is a senior member of the IEEE.

Speech Title: Image Manipulation Localization with Deep Neural Networks

Abstract: Creating fake pictures has become easier than ever, and tampered images are all the more harmful because the Internet propagates misleading information so rapidly. Reliable digital forensic tools are therefore strongly needed. Traditional methods based on hand-crafted features are only useful when tampered images meet specific assumptions, and their low detection accuracy prevents them from being used in realistic scenes. Recently proposed learning-based methods improve accuracy, but their neural networks usually require training on large labeled databases, because the commonly used deep and narrow networks extract high-level visual features and neglect the low-level features where abundant forensic cues reside. In this talk, we will discuss some solutions to this problem. We propose two novel image splicing localization methods based on deep neural networks that concentrate on learning low-level forensic features and can consequently detect splicing forgery even when trained only on a small, automatically generated splicing dataset.
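
One common way to bias a network toward low-level forensic cues, sketched below in PyTorch, is to feed it fixed high-pass noise residuals (in the spirit of the SRM filters widely used in image forensics) rather than raw pixels. The specific filter and the shallow head are illustrative assumptions, not necessarily the methods proposed in the talk.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# 3x3 second-order high-pass kernel often used to extract noise residuals
HIGH_PASS = torch.tensor([[-1.,  2., -1.],
                          [ 2., -4.,  2.],
                          [-1.,  2., -1.]]) / 4.0

class SplicingLocalizer(nn.Module):
    """Per-pixel tampering logits from noise residuals, not scene content."""
    def __init__(self):
        super().__init__()
        # fixed residual extractor, applied depthwise to each colour channel
        self.register_buffer("hp", HIGH_PASS.repeat(3, 1, 1, 1))  # (3,1,3,3)
        self.net = nn.Sequential(                   # shallow, low-level head
            nn.Conv2d(3, 32, 5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 32, 5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 1, 1),                    # tamper logit per pixel
        )

    def forward(self, img):                         # img: (B, 3, H, W)
        residual = F.conv2d(img, self.hp, padding=1, groups=3)
        return self.net(residual)                   # (B, 1, H, W) logits
```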

 

INVITED SPEAKER I IN IPMV 2025

 

Dr. Muhammad Asif Khan, Qatar University, Qatar

 

BIO: Muhammad Asif Khan is a Research Scientist at Qatar Mobility Innovations Center (QMIC), Doha, Qatar. He was a postdoctoral research fellow at Qatar University. He received a Ph.D. degree in electrical engineering from Qatar University (2020), an M.Sc. degree in telecommunication engineering from the University of Engineering and Technology Taxila, Pakistan (2013), and a B.Sc. degree in telecommunication engineering from the University of Engineering and Technology Peshawar, Pakistan (2009). He is the recipient of the Postdoctoral Research Award (PDRA) from the Qatar National Research Fund (QNRF) in 2022. He has published over 50 peer-reviewed articles and book chapters. He is a senior member of IEEE, a member of IET, and a Chartered Engineer (CEng) with the Engineering Council (UK). Dr. Khan also serves as an Associate Editor of IEEE Transactions on Consumer Electronics (TCE), IEEE Transactions on Technology and Society (TTS), and the IEEE Future Directions Technology Policy and Ethics Newsletter.

Speech Title: Real-time Crowd Counting at the Edge

Abstract: Recent research on crowd counting shows the efficacy of deep learning methods such as CNNs, thanks to their strong capability for automatic feature extraction. To achieve higher accuracy in dense scenes, deeper models with large numbers of parameters have been developed. Although accurate, these deeper models create performance bottlenecks in real-time applications due to their large memory requirements, higher training complexity, and long inference delays. Shallow models, by contrast, are lightweight, incur low inference delay, and require little memory, but are often disregarded because of their limited accuracy in dense crowd scenes. This talk will discuss how to design efficient and more robust lightweight models for crowd-counting applications at the edge, providing useful insights for prospective researchers and DL practitioners. It offers a retrospective overview of mainstream research and new directions for designing, building, training, and deploying crowd-counting models on edge devices in real-world scenarios, covering techniques such as annotation strategies, density estimation, knowledge distillation, curriculum learning, dataset pruning, importance scoring, and sample ranking.
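
As an illustration of the knowledge-distillation ingredient, here is a minimal PyTorch sketch in which a lightweight student density-map regressor is supervised jointly by ground-truth maps and a heavier teacher's predictions. The loss form and weighting are assumptions for illustration, not the speaker's exact recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_map, teacher_map, gt_map, alpha=0.5):
    """Blend ground-truth supervision with teacher mimicry.

    All maps have shape (B, 1, H, W); the crowd count is the sum over a map.
    """
    hard = F.mse_loss(student_map, gt_map)       # match annotated density
    soft = F.mse_loss(student_map, teacher_map)  # match teacher's density
    return alpha * hard + (1 - alpha) * soft

# At inference the estimated crowd count is simply the integral of the map:
#   count = student(img).sum(dim=(1, 2, 3))
```

The appeal for edge deployment is that only the shallow student runs on the device; the deep teacher is needed solely during training.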