Master Thesis: Interaction-Derived Knowledge Bases for Self-Improving UI Agents
Organization: askui GmbH
Location: Karlsruhe, Baden-Württemberg, Germany
Your Profile
- currently pursuing a Master's degree in Computer Science or a related field with strong academic performance
- excellent programming skills in Python
- proven hands-on experience working with Large Language Models (LLMs) and popular LLM frameworks and tooling (e.g., LangChain, Hugging Face, PyTorch, …)
- profound experience designing or working with agentic AI systems
- ability to work independently, think analytically, and structure complex problems
- very good English skills (written and spoken)
- based in Germany

Bonus points:
- prior experience with Retrieval-Augmented Generation (RAG) or other memory-augmented LLM architectures
- experience with experimental evaluation or applied research in AI/ML
- first experience with writing or publishing scientific papers
- German language skills are helpful, but not required

Our Offer
- close collaboration and supportive mentoring with fast feedback cycles, while giving you the freedom to explore your own ideas and drive your work independently
- work with a team that combines strong research expertise with a hands-on, product-driven mindset on bleeding-edge AI systems that are used by companies worldwide
- the opportunity to have a high and lasting impact on product, company, and people
- choose how you want to work: fully remote, in our office in Karlsruhe, or hybrid
- flexible working hours
- a place where you are valued not only for your expertise but also as a human, as part of a team where most consider each other not only colleagues but friends

Your Role

Background and Goal of the Thesis

When humans interact with unfamiliar or poorly designed user interfaces, especially in enterprise and legacy systems, they rarely succeed in accomplishing a task immediately. Instead, they explore the interface, try different paths, and make mistakes. During this process, they gradually build an understanding of where relevant functionality is located and how tasks can be completed efficiently.
Hence, the next time they need to execute the same task, humans navigate the user interface (UI) more directly and avoid previous errors. Even when performing entirely different tasks later on, humans can reuse knowledge acquired "along the way": for example, they may remember having seen a certain function buried in a menu during earlier exploration and later recall exactly where to find it when it becomes relevant.

While there are initial efforts to enable such behavior for LLM-based agents that operate UIs (UI Agents) [1-7], these agents still lack the ability to systematically accumulate and reuse knowledge gained from previous interactions, and hence to improve over time. This leads to two key problems. First, even when an agent succeeds in completing a task, it is likely to repeat the same inefficient actions and errors the next time it is asked to perform the same or a similar task. Second, in scenarios where the UI Agent fails to complete a task entirely, it will not be able to unblock itself by learning how to complete it over time. This limitation becomes particularly evident when UI Agents need to operate enterprise and legacy UIs that do not follow modern design guidelines and may differ significantly from the interfaces the agent was exposed to during training.

The goal of this thesis is to enable UI Agents to learn from their own interactions by designing a continuously updated knowledge base. The agent should store and refine knowledge about UI structures, action outcomes, navigation paths, and discovered functionality, and leverage this knowledge in future interactions.
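As an illustration of what such a store-and-retrieve loop could look like, here is a minimal sketch in Python. The `Episode` and `InteractionKB` names and the word-overlap retrieval are assumptions for illustration only; a real system would likely use embedding-based retrieval (cf. RAG) and a richer knowledge representation.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    """One recorded interaction: what was attempted, what was done, how it ended."""
    task: str           # natural-language task description
    actions: list[str]  # action trace, e.g. "click 'Settings'"
    success: bool       # whether the task was completed

@dataclass
class InteractionKB:
    """Minimal interaction-derived knowledge base: store episodes, retrieve by task similarity."""
    episodes: list[Episode] = field(default_factory=list)

    def store(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def retrieve(self, task: str, k: int = 3) -> list[Episode]:
        """Return the k stored episodes whose task description best matches `task`,
        using simple word-overlap (Jaccard) similarity as a stand-in for embeddings."""
        query = set(task.lower().split())
        def sim(ep: Episode) -> float:
            words = set(ep.task.lower().split())
            return len(query & words) / len(query | words) if query | words else 0.0
        return sorted(self.episodes, key=sim, reverse=True)[:k]

kb = InteractionKB()
kb.store(Episode("change the user's email address",
                 ["open 'Profile'", "click 'Edit'", "type new email", "click 'Save'"],
                 success=True))
kb.store(Episode("export the monthly report",
                 ["open 'Reports'", "click 'Export'"],
                 success=True))

# Retrieving before acting lets the agent replay a known-good action trace.
best = kb.retrieve("change email address for a user", k=1)[0]
print(best.actions)
```

Even this toy design separates the two concerns raised in RQ1: how episodes are represented and stored versus how they are matched to a new task at execution time.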
This knowledge includes efficient workflows for recurring tasks as well as incidental observations that may become relevant in different contexts later on.

To this end, this thesis should address the following three research questions:

RQ1: How can interaction-derived knowledge be represented, updated, and retrieved efficiently during agent execution?
RQ2: What interaction-derived knowledge is useful for the long-term improvement of UI-operating agents?
RQ3: How can additional knowledge, e.g. from user manuals, be incorporated into such knowledge bases?

This work will contribute to the development of smarter and more powerful UI agents that improve through use and are suitable for long-lived, real-world deployment.

References

[1] Xie, B., Shao, R., Chen, G., Zhou, K., Li, Y., Liu, J., ... & Nie, L. (2025). GUI-Explorer: Autonomous exploration and mining of transition-aware knowledge for GUI agent. arXiv preprint arXiv:2505.16827.
[2] Ma, S., Xiao, X., & Ye, Y. (2025). Agent+P: Guiding UI agents via symbolic planning. arXiv preprint arXiv:2510.06042.
[3] Wen, H., Li, Y., Liu, G., Zhao, S., Yu, T., Li, T. J. J., ... & Liu, Y. (2024). AutoDroid: LLM-powered task automation in Android. In Proceedings of the 30th Annual International Conference on Mobile Computing and Networking (pp. 543-557).
[4] Lee, S., Choi, J., Lee, J., Wasi, M. H., Choi, H., Ko, S. Y., ... & Shin, I. (2023). Explore, select, derive, and recall: Augmenting LLM with human-like memory for mobile task automation. arXiv preprint arXiv:2312.03003.
[5] Guan, Z., Li, J. C. L., Hou, Z., Zhang, P., Xu, D., Zhao, Y., ... & Wong, N. (2025). KG-RAG: Enhancing GUI agent decision-making via knowledge graph-driven retrieval-augmented generation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (pp. 5396-5405).
[6] Wang, Z. Z., Mao, J., Fried, D., & Neubig, G. (2024). Agent workflow memory. arXiv preprint arXiv:2409.07429.
[7] Zheng, L., Wang, R., Wang, X., & An, B. (2023).
Synapse: Trajectory-as-exemplar prompting with memory for computer control. arXiv preprint arXiv:2306.07863.

Your Role
- conduct a comprehensive literature review on existing work and state-of-the-art methods
- analyze existing methods, systems, and tools related to the problem domain
- design and explore novel approaches to the problem
- develop and implement prototypes or experimental systems to validate proposed ideas
- define and conduct experiments and evaluations to assess performance, robustness, and limitations
- continuously present your work in team presentations and written reports
- collaborate closely with researchers and engineers to align research outcomes with real-world applications

Apply Now!
Join our mission!