November 18, 2024
Details
November 21st
TMCB 1170 @ 11:00 AM
Talk Title
Toward Decision Making in the Real World: From Trustworthy to Actionable
Abstract
Reinforcement learning (RL) has achieved success in areas such as gaming, robotics, and language models, sparking curiosity about its applicability in the real world. When applying RL to real-world decision-making, challenges arise related to data and models. We will discuss the implications of these challenges on the feasibility of RL and share preliminary efforts to address them. These efforts include developing realistic simulators and bridging the gap between simulation and the real world through uncertainty quantification and the use of language models.
Biography
Hua Wei is an assistant professor at the School of Computing and Augmented Intelligence (SCAI) at Arizona State University (ASU). He received his PhD from Pennsylvania State University in 2020. He specializes in data mining, artificial intelligence, and machine learning. He received the Best Paper award at ECML-PKDD 2020, and research by him and his students has been published in top conferences and journals in the fields of machine learning, artificial intelligence, data mining, and control (NeurIPS, AAAI, CVPR, KDD, IJCAI, ITSC, ECML-PKDD, WWW). His research has been funded by NSF, DoE and DoT.
November 07, 2024
When: November 14th @ 11am
Where: TMCB 1170
Talk Title: Predicting Liver Segmentation Model Failure with Feature-Based Out-of-Distribution Detection and Generative Adversarial Networks
Advanced liver cancer is often treated with radiotherapy, which requires precise liver segmentation. While deep learning models excel at segmentation, they struggle on image attributes not seen during training. To ensure quality care for all patients, my research focuses on automated, scalable, and interpretable solutions for detecting liver segmentation model failures. In this talk, I will first present accurate and scalable solutions that utilize model features extracted at inference. I will then introduce generative modeling for the localization of novel information, an approach that integrates interpretability into the detection pipeline.
October 31, 2024
When: November 7th @ 11am
Where: TMCB 1170
Talk Title: Reconstructing parental genomes and near-perfect phasing using data from millions of people
The advent of large genotyped cohorts from genetic testing companies and biobanks has opened the door to a host of analyses, and these cohorts implicitly include data for massive numbers of relatives. Genetic relatives share identity-by-descent (IBD) segments they inherited from common ancestors, and several methods have been developed to reconstruct ancestors’ DNA from relatives. We present HAPI-RECAP, a tool that reconstructs the DNA of parents from full siblings and their relatives. Given data for one parent, phasing alone with HAPI2 reconstructs large fractions of the missing parent’s DNA, between 77.6% and 99.97% among all families, and 90.3% on average in three- and four-child families. When reconstructing both parents, HAPI-RECAP infers between 33.2% and 96.6% of the parents’ genotypes, averaging 70.6% in four-child families. Reconstructed genotypes have average error rates < 10^-3, comparable to those from direct genotyping. Besides relatives, massive genetic studies enable precise haplotype inference. We benchmarked state-of-the-art methods on > 8 million diverse, research-consented 23andMe, Inc. customers and the UK Biobank (UKB), finding that both perform exceptionally well. Beagle’s median switch error rate (after excluding single SNP switches) in white British trios from UKB is 0.026% compared to 0.00% for European ancestry 23andMe research participants; 55.6% of European ancestry 23andMe participants have zero non-single SNP switches, compared to 42.4% of white British trios in UKB. SHAPEIT and Beagle excel at ‘intra-chromosomal’ phasing but lack the ability to phase across chromosomes, motivating us to develop an inter-chromosomal phasing method called HAPTIC (HAPlotype TIling and Clustering) that assigns paternal and maternal variants discretely genome-wide. Our approach uses IBD segments to phase blocks of variants on different chromosomes.
We ran HAPTIC on 1022 UKB trio children, yielding a median phase error of 0.08% in regions covered by IBD segments (33.5% of sites) and on 23andMe trio children, finding a median phase error of 0.92% in Europeans (93.8% of sites) and 0.09% in admixed Africans (92.7% of sites). HAPTIC’s precision depends heavily on data from relatives, so will increase as datasets grow larger and more diverse. HAPTIC and HAPI-RECAP enable analyses that require the parent-of-origin of variants, such as association studies and ancestry inference of untyped parents.
October 28, 2024
When: October 31st @ 11am
Where: TMCB 1170
Talk title: Translation and Multilinguality in the Age of Large Language Models
Abstract: We currently witness a convergence in the field of natural language processing into a unifying framework built on foundational language models. This talk traces the development of these models out of the interplay of translation and language model research. Large language models have fundamentally changed how machine translation systems are currently being built. The talk will highlight where these technologies stand, what new capabilities language models enable for translation, and what constitutes best practices when building machine translation systems for deployment. The talk will also show how language models currently struggle to work well in languages beyond English and what can be done to address this.
October 24, 2024
Where: TMCB 1170
When: October 24, 2024, 11am
Come meet Nathaniel Bennett, BYU CS grad and current PhD student at University of Florida!
October 23, 2024
Where: TMCB 1170
When: October 24th @ 11am
Talk title: RANsacked: Domain-Informed Fuzzing to Secure LTE/5G Core Infrastructure
Abstract: Cellular network infrastructure serves as the backbone of modern mobile wireless communication. As such, cellular cores must be proactively secured against external threats to ensure reliable service. Compromised base station attacks against the core are a rising threat to cellular networks, while user device inputs have long been considered an attack vector; despite this, few techniques exist to comprehensively test RAN-Core interfaces against malicious input. In this talk, we'll explore the technique of fuzz testing and its more recent applications to network-connected applications. We'll then focus specifically on the domain of cellular networks and highlight some of the challenges with fuzzing cellular infrastructure that have hampered the application of current fuzzing approaches. Our research, which devises a fuzzing framework that performantly fuzzes cellular interfaces accessible from a base station or user device, overcomes several of these challenges in fuzzing LTE/5G network components. We also find our efforts in the cellular domain yield cross-domain applications. For instance, we develop and release a tool (ASNfuzzgen) that compiles arbitrary ASN.1 specifications into structure-aware fuzzing modules, thereby facilitating effective fuzzing exploration of protocols across cellular, automotive, space and industrial control systems. We evaluate our approaches against seven open-source and commercial cores and discover 119 vulnerabilities, with 93 CVEs assigned. Our results reveal common implementation mistakes across several cores that lead to vulnerabilities, and the successful coordination of patches for these vulnerabilities across several vendors demonstrates the practical impact ASNfuzzgen has on hardening cellular deployments.
October 16, 2024
When: October 17th @ 11am
Where: TMCB 1170
Talk Title: Towards Interactive, Robust, and Aligned AI Systems
We are in an age of “AI everywhere, all at once.” As AI systems become more prevalent in daily life it is increasingly important that their behavior is aligned with human intent and that these AI systems do what we actually want them to do, despite the fact that human intent is often nuanced and hard to formally specify. In this talk I will discuss recent progress towards using human input to enable interactive, robust, and aligned AI systems with a focus on three main topics: (1) how to enable AI systems to estimate human intent, (2) how to make AI systems that are calibrated and robust to uncertainty over human intent, and (3) how robots and other AI systems can efficiently query for additional human input to actively reduce uncertainty and improve their performance.
October 03, 2024
When: October 10th @ 11am
Where: TMCB 1170
Talk Title: Finding the Courage to Build a Better World
Travis E. Oliphant (BS ’95, MS ’96) is a luminary in the Python and AI communities. He started the SciPy project in 1999 as a student at the Mayo Clinic and built NumPy in 2005 while a professor in Electrical and Computer Engineering at BYU. These libraries have been adopted throughout the scientific and business world. His work enabled Python to become the number one computer language in the world and the foundation for modern AI.
September 26, 2024
When: October 3rd @ 11am
Where: TMCB 1170
Talk Title: Adventures in intermittent power and faith
"The Internet of Things has a battery problem. We simply can’t afford to recharge, replace, and dispose of trillions of batteries. Batteryless computing offers hope of a more sustainable future with devices that can be deployed maintenance-free for decades, but they are difficult to design, program, test, and deploy, due to frequent and unpredictable power failures. This talk will explore lessons learned from two decades of research on intermittently-powered systems and the transformative power of uncertainty and hope in the journeys of disciples and scholars."
September 19, 2024
When: September 26th @ 11am
Where: TMCB 1170
Talk Title: Interaction with(in) Extended Reality: Tangible, Dynamic, Simulated
Extended Reality (XR), which is an umbrella term for Augmented Reality (AR), Virtual Reality (VR), and Mixed Reality (MR), is poised to introduce new ways of interacting with people, education, entertainment, and training by employing new technological solutions. While many applications are focused on individual, indoor, and visual-auditory interactions for work and home entertainment, interactions outside living rooms, multimodal aspects, and simulations have a long way to go. In this talk, I will focus on three main aspects of interacting with and within XR: (1) tangible, (2) dynamic, and (3) simulated. First, I will present works that enable haptic feedback, ranging from facilitating a feeling of being touched to drawing in VR and crafting for children. Second, I will cover interaction with XR while moving (e.g., in cars, on bicycles and e-scooters, and while walking) and when interacting with participants in XR spaces. Lastly, I will talk about simulated environments and how to increase their realism and thus the ecological validity of controlled experiments.
September 18, 2024
When: September 19th @ 11am
Where: TMCB 1170
Talk Title: Technical interview questions, getting a job, and saving the planet!
We will do some practical training on algorithms and explore space and runtime efficiency. We'll briefly explore what to do to get a job, including some resume tips and people tips. Finally, if none of that is interesting to you, please come and learn how to write efficient code to reduce your code's impact on the environment! These may all seem unrelated, but come find out why they are one and the same.
September 09, 2024
When: September 12th @ 11am
Where: TMCB 1170
Talk Title: My Beautiful Odyssey as a Disciple-Scholar
This talk presents some of what I have learned about being a disciple-scholar during my years at BYU, first as a student and later as a professor.
August 12, 2024
July Alumni Spotlight
"Involve God at every step. Find mentors who believe in you even when you don’t believe in yourself. And it’s ok to want to quit a million times!" Alicia Wood
June 27, 2024
“It was still stressful, but it was also an amazing experience to be with the whole organization and to all be working together toward the common goal. And amazing to watch some of the best players in the world making plays all over the field.”
BYU Student Awarded 2023 World Series Ring
April 18, 2024
Tyler Stahle from BYU Communications reports, "This year’s animated short story was The Witch’s Cat, a heartwarming tale of a witch’s feline that grows jealous of the attention the witch is giving to her new boyfriend. The cat’s attempts to thwart the relationship leaves viewers laughing while anxiously waiting to see what happens next. The film was directed by BYU animation student Abby Staker and produced by Jessica Fink Blaine."
The Witch's Cat wins a Student Emmy for BYU Animation in April 2024.
April 08, 2024
When: April 11th @ 11am
Where: TMCB 1170
Talk Title: Kung Fu Panda 4: Building the World of Yin and Yang
As we enter the fourth installment in the Kung Fu Panda franchise, Po is faced with a new challenge of growth: his promotion to the spiritual leader of the Valley of Peace. We accompany Po as he visits both familiar and new locales where he meets new friends and a dangerous villain. But in order to become the spiritual leader, he must understand what it means to have the balance of Yin and Yang. This seminar will explore how the filmmakers kept balance at the forefront of their design and technology decisions: landscape and architecture reflecting opposite emotions, the use of real-time game engines to build out and discover new worlds, the innovative new FX techniques for transforming animated characters, and the various threads of the story (good vs evil, familiar vs new, heart vs action).
April 02, 2024
When: April 4th, 2024
Where: TMCB 1170
Talk Title: Language Models and Embeddings: Towards AI for Data Understanding
New tools of artificial intelligence and machine learning allow data scientists to move beyond charts and graphs when trying to harvest insight from data. In this talk, I will discuss how language models like ChatGPT can be used as a serious tool for data analysis, allowing researchers and scientists to perform a variety of analyses on unstructured text. Language models can also be combined with embeddings to provide powerful new ways of thinking about how to cluster and abstract data. Throughout the talk, I will discuss these tools in the context of my current research, involving applications in psychology and political science.
March 28, 2024
Where: TMCB 1170
When: March 28th @11am
Talk Title: Navigating Crisis: Leveraging Social Media for Disaster Management
The ubiquity of social media platforms has revolutionized the way information is disseminated during crises. From natural disasters to public health emergencies, individuals turn to platforms like Twitter (now X), Facebook, Reddit, Instagram, and others to share real-time updates, seek assistance, and provide situational awareness. This flood of user-generated content presents both opportunities and challenges for emergency management. This presentation draws upon a blend of case studies and empirical research, including insights gleaned from my own work, to explore ways in which social media data analytics, geospatial mapping, natural language processing, and machine learning techniques can be leveraged to extract actionable insights from the vast volume of online information. Furthermore, I will discuss recent research aimed at mitigating challenges related to misinformation, aggregating data from multimodal sources, and enhancing machine classifier training through sustainable context-sensitive labeling methods in the disaster setting. Through interdisciplinary collaboration and innovative socio-technological solutions, I will demonstrate how we can harness the power of digital platforms to save lives, mitigate suffering, and foster disaster resilience in an increasingly interconnected world.
March 08, 2024
Where: TMCB 1170
When: February 29th @11am
Talk Title: Multi-fidelity Learning and Active Learning for Scientific Machine Learning
Abstract: Multi-fidelity learning involves using training examples at different fidelities or resolutions. High-fidelity examples are of high quality but are often much more costly to collect than inaccurate, low-fidelity examples. How to retrieve and leverage examples at multiple fidelities is key to reducing the learning cost while maximizing efficiency. This talk will introduce our recent work in multi-fidelity learning and active learning for scientific machine learning. Physical simulation is a central task in science and engineering domains. However, traditional numerical methods are known to be computationally costly and lack generalizability across different problems. Learning data-driven surrogate models presents a promising strategy for cost reduction. I will discuss three multi-fidelity, multi-resolution active learning approaches designed to dynamically acquire simulation examples and their fidelities, enhancing surrogate learning performance while minimizing data acquisition costs. I will demonstrate the efficacy of these methods through applications in standard benchmarks of physical simulation and topology structure optimization. Furthermore, I will introduce an infinite fidelity surrogate learning framework, which extends the traditional finite fidelity space to an infinite continuous space. This framework offers exciting possibilities for advancing scientific machine learning by enabling more comprehensive representations of complex systems and phenomena.
February 22, 2024
Where: TMCB 1170
When: February 29th @11am
Talk title: Ultra-Scale Intelligent Systems: A New BYU Research Area Investigating Intelligence at Scale
After seeing so much progress in AI over the last few years, it’s hard to understand why many enterprises still struggle to bring AI models into production. This research investigates the fundamental drivers of AI system development to identify roadblocks and potential solutions. To address these issues, the relatively new field of Machine Learning Operations, or MLOps, seeks to improve AI development across the entire lifecycle. These improvements promise a shorter time to market, lower costs, and higher value. Leveraging enhanced AI development processes, we explore the possibilities of scaling AI systems to support millions of models in production. At this scale, typical model metrics are no longer sufficient, and new analysis methods are needed. We present a framework for characterizing Ultra-Scale systems and discuss potential approaches for research and analysis. These Ultra-Scale systems could enable novel applications across diverse areas such as manufacturing, demand forecasting, social networks, IoT systems, traffic management, and swarm intelligence.
February 13, 2024
Where: TMCB 1170
When: February 15th @11am
Talk title: Bayesian Skill Estimation: Methods and Applications
Abstract: Actions in many real-world domains cannot be executed exactly. An agent’s performance in these domains is influenced by two critical factors: the ability to select effective actions (decision-making skill), and how precisely they can execute those selected actions (execution skill). For an AI to make effective action recommendations to a person, knowledge of their execution skill is required. This talk addresses the problem of estimating both the execution and decision-making skill of an agent, given observations. Several execution skill estimation methods will be presented, each of which utilizes different information from the observations and makes assumptions about the agent’s decision-making ability. A final novel method forgoes these assumptions about decision-making and instead estimates the execution and decision-making skills simultaneously under a single Bayesian framework. Experimental results in several domains evaluate the accuracy of the estimators, especially focusing on how robust they are as agents and their decision-making methods are varied. These results demonstrate that reasoning about both types of skill together significantly improves the robustness and accuracy of execution skill estimation. A case study using the proposed methods to estimate the skill of Major League Baseball pitchers is presented, along with a complete AI system that utilizes this estimate to provide pitch recommendations. This case study demonstrates how these skill estimation methods can be applied to real-world data sources and provide effective action recommendations to humans.
February 05, 2024
Where: TMCB 1170
When: February 8th @ 11am
Talk title: Grand Challenges in HCI for Sports and Recreation
Abstract: Interactive computational devices, such as smartphones, motion trackers, and smartwatches, are integral to many human activities, including amateur sports and outdoor recreation. Sports and recreation are critical components of human wellness. However, the relationship between interactive computing and the wellness benefits of sports and recreation is not well understood. In this talk, I will present both research results and grand challenges related to the use of interactive computing in sports and recreation. Research results include an interview study on the use of motion data for figure skating coaching and a survey study on the use and non-use of headphones during hiking. Grand challenges involve understanding the athlete as a multi-faceted individual and designing for engagement in nature recreation.
January 26, 2024
Where: TMCB 1170
When: February 1st @ 11am
Talk title: Generative AI Programming Assistant
Abstract: Many are asking: Will Generative AI and Large Language Models (LLMs) put programmers out of business? Will it displace my research and make it irrelevant? This talk provides a fresh perspective on how Generative AI is helping with the 9 hardest things programmers have to do, and our crucial role as computing researchers in this new era. We first present upcoming services in leading IDEs that enable developers to use Generative AI to (i) explain code and bug fixes and summarize recent changes, and (ii) generate documentation, commit messages, name suggestions, etc. A top concern remains the trustworthiness of the solutions provided by Generative AI. While many solutions resemble the ones produced by expert developers, LLMs are known to produce hallucinations, i.e., solutions that seem plausible at first, but are deeply flawed. To help software developers trust LLM solutions, we developed novel research that synergistically combines the creative potential of LLMs with the safety of static and dynamic analysis from program transformation systems. Our current results show that our approach is effective: it safely automates code changes and is 14x more effective than the previous state of the art, which relies solely on static analysis. Moreover, our approach produces results that expert developers trust: we submitted patches generated by our LLM-powered tools to famous open-source projects whose developers accepted most of our contributions. This shows the usefulness of our novel approach and ushers us into a new era when LLMs become effective AI assistants for developers. We hope to inspire you with ideas on how LLMs can help you and your research go even further.
January 23, 2024
Where: TMCB 1170
When: January 25th @ 11AM
Talk title: AI in the Real World
Abstract: AI in the digital domain (i.e., text and images) is progressing at a dizzying pace, in large part due to an abundance of data and an increasing reliance on a larger compute budget. However, AI in the physical domain significantly lags behind. In fields such as engineering, healthcare, robotics, or economics, data tends to be scarce, difficult to label, or challenging to aggregate due to privacy concerns. Moreover, while digital AI systems can make mistakes without real-world consequences, an error made by a physical AI system can cause harm, death, or financial loss. In this talk, I discuss strategies for improving the performance and robustness of physical AI systems. I present techniques for solving difficult physical perceptual tasks, ideas for addressing the small data problem, and methods for constructing agents that can perform reliably when the cost of an error is high. Additionally, I discuss successful deployments of these systems across a variety of fields, and present ideas to make physical AI systems more robust in the future.
January 17, 2024
Where: TMCB 1170
When: January 18th @ 11AM
Title: Technology and Accounting for Vulnerable Populations
Abstract: This talk explores classes of individuals who are commonly vulnerable in the context of technology use. I will discuss unique considerations that technologists must take into account in order to avoid inadvertently introducing harm to already vulnerable populations. Beyond avoiding harm, I will also discuss opportunities to do good and lift these individuals through technology design. I will also touch on common biases in reporting and analysis, best practices for studying these populations, and explore how to design technologies to be inclusive.
December 04, 2023
Talk title: Advancements in Sequence Alignment for CRISPR, Single-Cell Disease Simulation, and Beyond
Where: TMCB 1170 @ 11am
Application of computer science algorithms to the life sciences can substantially improve analysis and interpretation of biological data. This talk explores improvements in sequence alignment algorithms applied to the analysis of CRISPR genome editing—a cutting-edge DNA-editing technology that was recently approved in the UK as a permanent treatment for sickle cell disease. By modifying sequence alignment algorithms to reflect the biological mechanisms of CRISPR systems, we improved accuracy in identifying DNA sequence changes arising from CRISPR genome editing. These improvements are implemented in the software CRISPResso2, a widely-used tool for analysis of genome editing. We have also developed methods to analyze DNA sequences from single cells, and have used these tools to model disease initiation by introducing specific mutations into healthy cells. This talk will demonstrate how sequence alignment algorithms can be applied to cutting-edge biological technologies to improve analysis and interpretation, and ultimately affect human health.
November 10, 2023
Talk title: How Profilers Can Help Navigate Type Migration
Sound gradual types strengthen code with formal guarantees, but may require expensive run-time checks depending on how typed and untyped code interact. In this paper, we explore profile-guided strategies to discover gradual codebases that run as quickly as untyped code. One strategy rises to the top, but it succeeds in only 50% of all trials. Going forward, we need better profilers to measure type costs. Our experiment was made possible by the Rational Programmer method for automatically testing hypotheses about human programmers.
November 07, 2023
Brigham Young University, now a leading force in competitive programming, is thrilled to announce that our talented programming team has received a prestigious invitation to participate in the International Collegiate Programming Contest (ICPC) World Finals. This will mark the university’s third appearance at the World Finals, and the first in more than 20 years.
The ICPC World Finals are scheduled to take place in Luxor, Egypt on April 18, 2024.
November 02, 2023
Speaker: Paul Merrell
Where: TMCB 1170, 11am
Talk title: Procedural Modeling Using Graph Grammars
Details on this week's Graduate Seminar Series:
September 25, 2023
Paco is a Research Scientist Manager supporting translation teams in Meta AI (FAIR). He works in the field of machine translation with the aim of breaking language barriers. He joined Meta in 2016 and has co-led several initiatives (e.g., SeamlessM4T, NLLB, FLORES). His research has been published in top-tier NLP venues like ACL and EMNLP. He was the co-chair of the Research director at AMTA (2020-2022) and Ethics co-chair at EMNLP 2023. He has organized several research competitions focused on low-resource translation and data filtering. Paco obtained his PhD from the ITESM in Mexico, was a visiting scholar at the LTI-CMU from 2008-2009, and participated in DARPA’s GALE evaluation program. Paco was a post-doc and scientist at Qatar Computing Research Institute in Qatar from 2012 to 2016.