Abstract
Human vision is highly adaptive, efficiently sampling intricate environments by sequentially fixating on task-relevant regions. In contrast, prevailing machine vision models passively process entire scenes at once, resulting in excessive resource demands scaling with spatial–temporal input resolution and model size, yielding critical limitations impeding both future advancements and real-world application. Here we introduce AdaptiveNN, a general framework aiming to enable the transition from ‘passive’ to ‘active and adaptive’ vision models. AdaptiveNN formulates visual perception as a coarse-to-fine sequential decision-making process, progressively identifying and attending to regions pertinent to the task, incrementally combining information across fixations and actively concluding observation when sufficient. We establish a theory integrating representation learning with self-rewarding reinforcement learning, enabling end-to-end training of the non-differentiable AdaptiveNN without additional supervision on fixation locations. We assess AdaptiveNN on 17 benchmarks spanning 9 tasks, including large-scale visual recognition, fine-grained discrimination, visual search, processing images from real driving and medical scenarios, language-driven embodied artificial intelligence and side-by-side comparisons with humans. AdaptiveNN achieves up to 28 times inference cost reduction without sacrificing accuracy, flexibly adapts to varying task demands and resource budgets without retraining, and provides enhanced interpretability via its fixation patterns, demonstrating a promising avenue towards efficient, flexible and interpretable computer vision. Furthermore, AdaptiveNN exhibits closely human-like perceptual behaviours in many cases, revealing its potential as a valuable tool for investigating visual cognition.
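The perception loop sketched in the abstract (a cheap global glance, followed by a sequence of high-resolution fixations that terminate once the prediction is confident) can be pictured with a small code illustration. The PyTorch sketch below is a hypothetical rendering under stated assumptions: a GRU cell as the evidence accumulator, a Gaussian policy over normalized fixation coordinates and a confidence-threshold stopping rule. The `FixationSketch` class and all module names and sizes are illustrative assumptions, not the authors' released implementation (available at https://github.com/LeapLabTHU/AdaptiveNN).

```python
# Hypothetical coarse-to-fine fixation loop; an illustration only, not the official AdaptiveNN code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FixationSketch(nn.Module):
    def __init__(self, num_classes=1000, feat_dim=256, patch=96, max_fixations=5, tau=0.9):
        super().__init__()
        self.patch, self.max_fixations, self.tau = patch, max_fixations, tau
        # Shared lightweight feature extractor for the coarse glance and each fixated patch.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim))
        self.rnn = nn.GRUCell(feat_dim, feat_dim)   # accumulates evidence across fixations
        self.classifier = nn.Linear(feat_dim, num_classes)
        self.policy = nn.Linear(feat_dim, 2)        # mean of the next fixation centre in [0, 1]^2

    def crop(self, img, centers):
        # Cut a patch x patch high-resolution window around each fixation centre
        # (assumes the input is at least patch x patch pixels).
        B, _, H, W = img.shape
        patches = []
        for b in range(B):
            cx = int(centers[b, 0].item() * (W - self.patch))
            cy = int(centers[b, 1].item() * (H - self.patch))
            patches.append(img[b:b + 1, :, cy:cy + self.patch, cx:cx + self.patch])
        return torch.cat(patches, dim=0)

    def forward(self, img):
        # Coarse glance: a cheap, downsampled view of the whole scene.
        glance = F.interpolate(img, size=(self.patch, self.patch),
                               mode='bilinear', align_corners=False)
        h = self.rnn(self.encoder(glance))
        logits_seq, logprobs = [self.classifier(h)], []
        for _ in range(self.max_fixations):
            conf = F.softmax(logits_seq[-1], dim=-1).max(dim=-1).values
            if bool((conf > self.tau).all()):   # stop observing once sufficiently confident
                break
            loc_mean = torch.sigmoid(self.policy(h))
            dist = torch.distributions.Normal(loc_mean, 0.1)
            loc = dist.sample().clamp(0.0, 1.0)  # stochastic fixation for exploration
            logprobs.append(dist.log_prob(loc).sum(dim=-1))
            h = self.rnn(self.encoder(self.crop(img, loc)), h)  # fuse the new fixation
            logits_seq.append(self.classifier(h))
        return logits_seq, logprobs


if __name__ == "__main__":
    logits_seq, logprobs = FixationSketch(num_classes=10)(torch.randn(2, 3, 224, 224))
    print(len(logits_seq), "predictions made before stopping")
```

Under these assumptions, training would combine a cross-entropy loss on each entry of `logits_seq` with a REINFORCE-style term on the collected `logprobs`, weighted by a self-generated reward such as the gain in prediction confidence after each fixation. This mirrors, at sketch level, the self-rewarding reinforcement learning described above, which requires no ground-truth fixation annotations.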
Data availability
Most data used in this study are publicly available, including from ImageNet25 at https://www.image-net.org/, CUB-200-2011100 at https://www.vision.caltech.edu/datasets/cub_200_2011/, NABirds101 at https://dl.allaboutbirds.org/nabirds, Oxford-IIIT Pet102 at https://www.robots.ox.ac.uk/~vgg/data/pets/, Stanford Dogs103 at https://paperswithcode.com/dataset/stanford-dogs, Stanford Cars104 at https://paperswithcode.com/dataset/stanford-cars, FGVC-Aircraft105 at https://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/, STSD74 at https://www.cvl.isy.liu.se/research/datasets/traffic-signs-dataset/, MNIST48 at https://paperswithcode.com/dataset/mnist, RSNA pneumonia detection76 at https://www.rsna.org/rsnai/ai-image-challenge/rsna-pneumonia-detection-challenge-2018, CALVIN78 at https://github.com/mees/calvin, SALICON79 at http://salicon.net and MIT1003106 at https://saliency.tuebingen.ai/. A minimum dataset for our visual Turing tests is provided in Supplementary Figs. 12 and 13.
Code availability
Implementation code is available via GitHub at https://github.com/LeapLabTHU/AdaptiveNN (ref. 107).
References
Biederman, I. Perceiving real-world scenes. Science 177, 77–80 (1972).
Sperling, G. & Melchner, M. J. The attention operating characteristic: examples from visual search. Science 202, 315–318 (1978).
Sagi, D. & Julesz, B. ‘Where’ and ‘what’ in vision. Science 228, 1217–1219 (1985).
Moran, J. & Desimone, R. Selective attention gates visual processing in the extrastriate cortex. Science 229, 782–784 (1985).
Ölveczky, B. P., Baccus, S. A. & Meister, M. Segregation of object and background motion in the retina. Nature 423, 401–408 (2003).
Moore, T. & Armstrong, K. M. Selective gating of visual signals by microstimulation of frontal cortex. Nature 421, 370–373 (2003).
Najemnik, J. & Geisler, W. S. Optimal eye movement strategies in visual search. Nature 434, 387–391 (2005).
Carrasco, M. Visual attention: the past 25 years. Vis. Res. 51, 1484–1525 (2011).
Wolfe, J. M. & Horowitz, T. S. Five factors that guide attention in visual search. Nat. Hum. Behav. 1, 0058 (2017).
Alayrac, J.-B. et al. Flamingo: a visual language model for few-shot learning. In Proc. 36th International Conference on Neural Information Processing Systems 23716–23736 (ACM, 2022).
OpenAI GPT-4 Technical Report (OpenAI, 2023).
Gemini Team Google Gemini: A Family of Highly Capable Multimodal Models Technical Report (Google, 2023).
Lu, M. Y. et al. A multimodal generative AI copilot for human pathology. Nature 634, 466–473 (2024).
Kaufmann, E. et al. Champion-level drone racing using deep reinforcement learning. Nature 620, 982–987 (2023).
Zitkovich, B. et al. RT-2: vision-language-action models transfer web knowledge to robotic control. In Proc. 7th Conference on Robot Learning (eds Jie, T. & Marc, T.) 2165–2183 (PMLR, 2023).
O’Neill, A. et al. Open X-Embodiment: robotic learning datasets and RT-X models: Open X-Embodiment collaboration. In 2024 IEEE International Conference on Robotics and Automation 6892–6903 (IEEE, 2024).
Gehrig, D. & Scaramuzza, D. Low-latency automotive vision with event cameras. Nature 629, 1034–1040 (2024).
Chen, A. I., Balter, M. L., Maguire, T. J. & Yarmush, M. L. Deep learning robotic guidance for autonomous vascular access. Nat. Mach. Intell. 2, 104–115 (2020).
Xu, H. et al. A whole-slide foundation model for digital pathology from real-world data. Nature 630, 181–188 (2024).
Wang, X. et al. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature 634, 970–978 (2024).
Schäfer, R. et al. Overcoming data scarcity in biomedical imaging with a foundational multi-task model. Nat. Comput. Sci. 4, 495–509 (2024).
Lake, B. M., Salakhutdinov, R. & Tenenbaum, J. B. Human-level concept learning through probabilistic program induction. Science 350, 1332–1338 (2015).
Orhan, A. E. & Lake, B. M. Learning high-level visual representations from a child’s perspective without strong inductive biases. Nat. Mach. Intell. 6, 271–283 (2024).
Vong, W. K., Wang, W., Orhan, A. E. & Lake, B. M. Grounded language acquisition through the eyes and ears of a single child. Science 383, 504–511 (2024).
Russakovsky, O. et al. Imagenet large scale visual recognition challenge. Int. J. Computer Vis. 115, 211–252 (2015).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (eds Lourdes, A. et al.) 770–778 (IEEE, 2016).
Huang, G., Liu, Z., Van Der Maaten, L. & Weinberger, K. Q. Densely connected convolutional networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (eds Jim, R. et al.) 4700–4708 (IEEE, 2017).
Dosovitskiy, A. et al. An image is worth 16 × 16 words: transformers for image recognition at scale. In International Conference on Learning Representations (eds Katja, H. et al.) (ICLR, 2021).
Dehghani, M. et al. Scaling vision transformers to 22 billion parameters. In Proc. 40th International Conference on Machine Learning 7480–7512 (PMLR, 2023).
Zou, Z., Chen, K., Shi, Z., Guo, Y. & Ye, J. Object detection in 20 years: a survey. Proc. IEEE 111, 257–276 (2023).
Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning (eds Marina, M. & Tong, Z.) 8748–8763 (PMLR, 2021).
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J. & Maier-Hein, K. H. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 18, 203–211 (2021).
Tiu, E. et al. Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. Nat. Biomed. Eng. 6, 1399–1406 (2022).
Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).
Du, Z. et al. ShiDianNao: shifting vision processing closer to the sensor. In Proc. 42nd Annual International Symposium on Computer Architecture (ed. David, A.) 92–104 (ACM, 2015).
Bai, J., Lian, S., Liu, Z., Wang, K. & Liu, D. Smart guiding glasses for visually impaired people in indoor environment. IEEE Trans. Consum. Electron. 63, 258–266 (2017).
Howard, A. G. et al. MobileNets: efficient convolutional neural networks for mobile vision applications. Preprint at https://arxiv.org/abs/1704.04861 (2017).
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. MobileNetV2: inverted residuals and linear bottlenecks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (eds David, F. et al.) 4510–4520 (IEEE, 2018).
Huang, G., Liu, S., Van der Maaten, L. & Weinberger, K. Q. CondenseNet: an efficient DenseNet using learned group convolutions. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (eds David, F. et al.) 2752–2761 (IEEE, 2018).
Chen, J. & Ran, X. Deep learning with edge computing: a review. Proc. IEEE 107, 1655–1674 (2019).
Wang, X. et al. Convergence of edge computing and deep learning: a comprehensive survey. IEEE Commun. Surv. Tutor. 22, 869–904 (2020).
Murshed, M. S. et al. Machine learning at the network edge: a survey. ACM Comput. Surv. 54, 1–37 (2021).
Bourzac, K. Fixing AI’s energy crisis. Nature https://doi.org/10.1038/d41586-024-03408-z (2024).
Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
LeCun, Y. et al. Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems 396–404 (NeurIPS, 1989).
Arbib, M. A. The Handbook of Brain Theory and Neural Networks (MIT, 1995).
LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
LeCun, Y. The MNIST Database of Handwritten Digits (MNIST, 1998); http://yann.lecun.com/exdb/mnist/
Kaplan, J. et al. Scaling laws for neural language models. Preprint at https://arxiv.org/abs/2001.08361 (2020).
Chen, Z. et al. InternVL: scaling up vision foundation models and aligning for generic visual-linguistic tasks. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (eds Zeynep, A. et al.) 24185–24198 (IEEE, 2024).
Oquab, M. et al. DINOv2: learning robust visual features without supervision. Trans. Mach. Learn. Res. (2024).
Ward, D. J. & MacKay, D. J. Fast hands-free writing by gaze direction. Nature 418, 838–838 (2002).
Ma, W. J., Navalpakkam, V., Beck, J. M., van den Berg, R. & Pouget, A. Behavior and neural basis of near-optimal visual search. Nat. Neurosci. 14, 783–790 (2011).
Henderson, J. M. & Hayes, T. R. Meaning-based guidance of attention in scenes as revealed by meaning maps. Nat. Hum. Behav. 1, 743–747 (2017).
Wolfe, J. M. & Horowitz, T. S. What attributes guide the deployment of visual attention and how do they do it? Nat. Rev. Neurosci. 5, 495–501 (2004).
Hanning, N. M., Fernández, A. & Carrasco, M. Dissociable roles of human frontal eye fields and early visual cortex in presaccadic attention. Nat. Commun. 14, 5381 (2023).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Mnih, V., Heess, N., Graves, A. & Kavukcuoglu, K. Recurrent models of visual attention. In Advances in Neural Information Processing Systems 2204–2212 (NeurIPS, 2014).
Ba, J., Mnih, V. & Kavukcuoglu, K. Multiple object recognition with visual attention. In International Conference on Learning Representations (eds Brian, K. et al.) (ICLR, 2015).
Yang, L. et al. Resolution adaptive networks for efficient inference. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (eds Ce, L. et al.) 2369–2378 (IEEE, 2020).
Zelinsky, G. J., Chen, Y., Ahn, S. & Adeli, H. Changing perspectives on goal-directed attention control: the past, present, and future of modeling fixations during visual search. Psychol. Learn. Motiv. 73, 231–286 (2020).
Wang, Y., Huang, R., Song, S., Huang, Z. & Huang, G. Not all images are worth 16 × 16 words: dynamic transformers for efficient image recognition. In Proc. 35th International Conference on Neural Information Processing Systems 11960–11973 (NeurIPS, 2021).
Rao, Y. et al. DynamicViT: efficient vision transformers with dynamic token sparsification. In 35th Conference on Neural Information Processing Systems 13937–13949 (NeurIPS, 2021).
Huang, G. et al. Glance and focus networks for dynamic visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 45, 4605–4621 (2022).
Bolya, D. et al. Token merging: your ViT but faster. In International Conference on Learning Representations (eds Been, K. et al.) (ICLR, 2023).
Gottlieb, J. & Oudeyer, P.-Y. Towards a neuroscience of active sampling and curiosity. Nat. Rev. Neurosci. 19, 758–770 (2018).
Navon, D. Forest before trees: the precedence of global features in visual perception. Cogn. Psychol. 9, 353–383 (1977).
Chen, L. Topological structure in visual perception. Science 218, 699–700 (1982).
Hochstein, S. & Ahissar, M. View from the top: hierarchies and reverse hierarchies in the visual system. Neuron 36, 791–804 (2002).
Ganel, T. & Goodale, M. A. Visual control of action but not perception requires analytical processing of object shape. Nature 426, 664–667 (2003).
Oliva, A. & Torralba, A. Building the gist of a scene: the role of global image features in recognition. Prog. Brain Res. 155, 23–36 (2006).
Peelen, M. V., Berlot, E. & de Lange, F. P. Predictive processing of scenes and objects. Nat. Rev. Psychol. 3, 13–26 (2024).
Touvron, H. et al. Training data-efficient image transformers & distillation through attention. In Proc. 38th International Conference on Machine Learning (eds Marina, M. & Tong, Z.) 10347–10357 (PMLR, 2021).
Larsson, F. & Felsberg, M. Using Fourier descriptors and spatial models for traffic sign recognition. In Proc. Image Analysis: 17th Scandinavian Conference, SCIA 2011 (eds Heyden, A. et al.) 238–249 (Springer, 2011).
Valliappan, N. et al. Accelerating eye movement research via accurate and affordable smartphone eye tracking. Nat. Commun. 11, 4553 (2020).
Shih, G. et al. Augmenting the national institutes of health chest radiograph dataset with expert annotations of possible pneumonia. Radiology: Artif. Intell. 1, 180041 (2019).
Li, X. et al. Vision-language foundation models as effective robot imitators. In International Conference on Learning Representations (eds Swarat, C. et al.) (ICLR, 2024).
Mees, O., Hermann, L., Rosete-Beas, E. & Burgard, W. CALVIN: a benchmark for language-conditioned policy learning for long-horizon robot manipulation tasks. IEEE Robot. Autom. Lett. 7, 7327–7334 (2022).
Jiang, M., Huang, S., Duan, J. & Zhao, Q. SALICON: Saliency in Context. In IEEE Conference on Computer Vision and Pattern Recognition (eds Kristen, G. et al.) 1072–1080 (IEEE, 2015).
Itti, L. & Koch, C. Computational modelling of visual attention. Nat. Rev. Neurosci. 2, 194–203 (2001).
Henderson, J. M. Human gaze control during real-world scene perception. Trends Cogn. Sci. 7, 498–504 (2003).
Guo, D. et al. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature 645, 633–638 (2025).
Kellman, P. J. & Spelke, E. S. Perception of partly occluded objects in infancy. Cogn. Psychol. 15, 483–524 (1983).
Spelke, E. S., Breinlinger, K., Macomber, J. & Jacobson, K. Origins of knowledge. Psychol. Rev. 99, 605–632 (1992).
Spelke, E. Initial knowledge: six suggestions. Cognition 50, 431–445 (1994).
Viola Macchi, C., Turati, C. & Simion, F. Can a nonspecific bias toward top-heavy patterns explain newborns’ face preference? Psychol. Sci. 15, 379–383 (2004).
Simion, F., Di Giorgio, E., Leo, I. & Bardi, L. The processing of social stimuli in early infancy: from faces to biological motion perception. Prog. Brain Res. 189, 173–193 (2011).
Ullman, S., Harari, D. & Dorfman, N. From simple innate biases to complex visual concepts. Proc. Natl Acad. Sci. USA 109, 18215–18220 (2012).
Stahl, A. E. & Feigenson, L. Observing the unexpected enhances infants’ learning and exploration. Science 348, 91–94 (2015).
Reynolds, G. D. & Roth, K. C. The development of attentional biases for faces in infancy: a developmental systems perspective. Front. Psychol. 9, 315789 (2018).
Bambach, S., Crandall, D., Smith, L. & Yu, C. Toddler-inspired visual object learning. In Proc. 32nd International Conference on Neural Information Processing Systems 1209–1218 (ACM, 2018).
Orhan, E., Gupta, V. & Lake, B. M. Self-supervised learning through the eyes of a child. In 34th Conference on Neural Information Processing Systems 9960–9971 (NeurIPS, 2020).
Schulman, J., Moritz, P., Levine, S., Jordan, M. & Abbeel, P. High-dimensional continuous control using generalized advantage estimation. In International Conference on Learning Representations (eds Hugo, L. et al.) (ICLR, 2016).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://arxiv.org/abs/1707.06347 (2017).
Sutton, R. S., McAllester, D., Singh, S. & Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. In Proc. 13th International Conference on Neural Information Processing Systems 1057–1063 (ACM, 1999).
Silver, D. et al. Mastering the game of go with deep neural networks and tree search. Nature 529, 484–489 (2016).
Wang, L. et al. Incorporating neuro-inspired adaptability for continual learning in artificial intelligence. Nat. Mach. Intell. 5, 1356–1368 (2023).
Miller, G. A. WordNet: a lexical database for English. Commun. ACM 38, 39–41 (1995).
Wah, C., Branson, S., Welinder, P., Perona, P. & Belongie, S. The Caltech-UCSD Birds-200-2011 Dataset (Caltech, 2011).
Van Horn, G. et al. Building a bird recognition app and large scale dataset with citizen scientists: the fine print in fine-grained dataset collection. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (eds Kristen, G. et al.) 595–604 (IEEE, 2015).
Parkhi, O. M., Vedaldi, A., Zisserman, A. & Jawahar, C. Cats and dogs. In 2012 IEEE Conference on Computer Vision and Pattern Recognition (eds Serge, B. et al.) 3498–3505 (IEEE, 2012).
Khosla, A., Jayadevaprakash, N., Yao, B. & Li, F.-F. Novel dataset for fine-grained image categorization: Stanford Dogs. In Proc. CVPR Workshop on Fine-grained Visual Categorization (FGVC) 2 (2011).
Krause, J., Stark, M., Deng, J. & Fei-Fei, L. 3D object representations for fine-grained categorization. In 2013 IEEE International Conference on Computer Vision Workshops (eds Kyros, K. et al.) 554–561 (IEEE, 2013).
Maji, S., Rahtu, E., Kannala, J., Blaschko, M. & Vedaldi, A. Fine-grained visual classification of aircraft. Preprint at https://arxiv.org/abs/1306.5151 (2013).
Judd, T., Ehinger, K., Durand, F. & Torralba, A. Learning to predict where humans look. In 2009 IEEE 12th International Conference on Computer Vision (eds Roberto, C. et al.) 2106–2113 (IEEE, 2009).
Yue, Y. LeapLab: LeapLabTHU/AdaptiveNN: official release. Zenodo https://doi.org/10.5281/zenodo.16810996 (2025).
Caesar, H. et al. nuScenes: a multimodal dataset for autonomous driving. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition 11618–11628 (IEEE, 2020).
NIH chest X-ray dataset. Kaggle www.kaggle.com/datasets/nih-chest-xrays/data (2017).
Indian diabetic retinopathy image dataset (IDRiD). Kaggle www.kaggle.com/datasets/mohamedabdalkader/indian-diabetic-retinopathy-image-dataset-idrid (2018).
COCO 2017 dataset. Kaggle www.kaggle.com/datasets/awsaf49/coco-2017-dataset (2017).
CUB2002011 dataset. Kaggle www.kaggle.com/datasets/wenewone/cub2002011 (2011).
ImageNet-1k-valid dataset. Kaggle www.kaggle.com/datasets/sautkin/imagenet1kvalid (2015).
The Oxford-IIIT pet dataset. Kaggle www.kaggle.com/datasets/tanlikesmath/the-oxfordiiit-pet-dataset (2012).
Stanford cars (folder, crop, segment) dataset. Kaggle www.kaggle.com/datasets/senemanu/stanfordcarsfcs (2013).
FGVC aircraft dataset. Kaggle www.kaggle.com/datasets/seryouxblaster764/fgvc-aircraft (2013).
Awadalla, A. et al. OpenFlamingo: an open-source framework for training large autoregressive vision-language models. Preprint at https://arxiv.org/abs/2308.01390 (2023).
Acknowledgements
G.H. is supported by the National Key R&D Program of China under grant no. 2024YFB4708200, the National Natural Science Foundation of China under grant nos. U24B20173 and 62276150, and the Scientific Research Innovation Capability Support Project for Young Faculty under grant no. ZYGXQNJSKYCXNLZCXM-I20. S.S. is supported by the National Natural Science Foundation of China under grant no. 42327901. We thank S. Zhang, M. Yao and Y. Wu for helpful discussions and comments on an earlier version of this paper.
Author information
Author notes
These authors contributed equally: Yulin Wang, Yang Yue, Yang Yue.
Authors and Affiliations
Department of Automation, Tsinghua University, Beijing, China
Yulin Wang (王语霖), Yang Yue (乐洋), Yang Yue (乐阳), Huanqian Wang, Haojun Jiang, Yizeng Han, Zanlin Ni, Yifan Pu, Minglei Shi, Rui Lu, Qisen Yang, Andrew Zhao, Zhuofan Xia, Shiji Song & Gao Huang
Authors
- Yulin Wang (王语霖)
- Yang Yue (乐洋)
- Yang Yue (乐阳)
- Huanqian Wang
- Haojun Jiang
- Yizeng Han
- Zanlin Ni
- Yifan Pu
- Minglei Shi
- Rui Lu
- Qisen Yang
- Andrew Zhao