Dr. Mirman's Accelerometer
Welcome to "The Accelerometer," a cutting-edge podcast at the intersection of technology and artificial intelligence, hosted by Dr. Matthew Mirman. Armed with a Ph.D. in AI safety from ETH, Dr. Mirman embarked on a unique journey: joining the accelerationist dark side and founding his YC-funded company, Anarchy. In each episode, Dr. Mirman engages with brilliant minds and pioneers from the YC, ETH, and AI spheres, exploring thought-provoking discussions that transcend the boundaries of traditional AI safety discourse. "The Accelerometer" offers listeners a front-row seat to the evolving landscape of technology, where Dr. Mirman's insights and connections with intriguing individuals promise to unravel the complexities of our rapidly advancing digital age. Join us as we navigate the future of AI with The Accelerometer.
Solving AI Uncertainty with NYU Professor Andrew Wilson
Unlock the mysteries of AI's decision-making process with Professor Andrew Wilson of NYU in a discussion on Bayesian deep learning. We dive into how AI grapples with uncertainty, differentiating between the types we can predict and those we can't, and why this matters for crafting dependable AI forecasts. Professor Wilson, a leading mind in machine learning, demystifies the complexities of epistemic and aleatoric uncertainties, relates the critical role of invariances like translation or rotation for image classification, and delves into the intricacies of teaching AI to understand these concepts without prior knowledge. Plus, stay tuned for an eye-opening exploration of deep generative models, such as variational autoencoders and diffusion models, and a fresh take on the curious phenomena of neural networks' overparameterization and overfitting. We further venture into the realm of Gaussian processes, where Professor Wilson's trailblazing research is making waves from drug discovery to astrophysics. Learn how neural networks are being harnessed to crack complex equations in general relativity, offering a glimpse into a future where AI might solve problems unfathomable to the human mind.
Andrew Wilson Google Scholar
Episode On Youtube
Accelerometer Podcast
Accelerometer Youtube
Anarchy
Anarchy Discord
Anarchy LLM-VM
Anarchy Twitter
Anarchy LinkedIn
Matthew Mirman LinkedIn
Hello and welcome to the Accelerometer. Today we have Professor Andrew Wilson from NYU, whose research on Bayesian deep learning I personally find particularly exciting. I'd love to hear a little bit more about your research.
Speaker 2:It's really great to be here. Thanks for reaching out. So I'm a professor at the Courant Institute of Mathematical Sciences and Center for Data Science at NYU.
Speaker 2:Part of my research involves Bayesian deep learning, which revolves around having principled representations of uncertainty in our models. Those who are loosely familiar with AI have probably become aware of phenomena like hallucination in large language models, where these models aren't telling us things that are correct but rather saying things that seem plausible in some sense. Uncertainty modeling is very important for understanding, for example, with what probability we believe what a model is saying is actually correct, and of course that's crucial for decision making. It's not really actionable to have a prediction in isolation, and so one could argue that without uncertainties you can't reasonably make decisions with AI and machine learning. Bayesian methods represent a particular type of uncertainty called epistemic uncertainty, which is uncertainty over which solution to your problem might be correct, given limited information. There's also aleatoric uncertainty, which is associated, say, with noise in your data measurements.
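As a concrete illustration of that split, here is a minimal sketch, assuming a hypothetical deep ensemble in which each member predicts a Gaussian mean and variance, of how the total predictive variance decomposes via the law of total variance:

```python
import torch

def decompose_uncertainty(means: torch.Tensor, variances: torch.Tensor):
    """means, variances: shape (n_models, n_points), each ensemble member
    predicting a Gaussian N(mean, variance) at every test point."""
    aleatoric = variances.mean(dim=0)   # average predicted noise: irreducible
    epistemic = means.var(dim=0)        # disagreement between members: shrinks with more data
    total = aleatoric + epistemic       # law of total variance
    return aleatoric, epistemic, total

# Toy usage with a hypothetical 5-member ensemble over 3 test points.
means, variances = torch.randn(5, 3), torch.rand(5, 3)
aleatoric, epistemic, total = decompose_uncertainty(means, variances)
```

The epistemic term is the one Bayesian methods target: it reflects uncertainty over which solution is correct and shrinks with more data, while the aleatoric term reflects measurement noise and does not.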
Speaker 2:I'm also very broadly interested in the foundations of building systems which can learn and make decisions without intervention, and so this involves considering certain transformations that we might want our models to be invariant or equivariant to. If you're doing image classification, you might want your model to be translation or rotation invariant. In the sciences and in physical modeling problems, there are all sorts of other invariances that correspond to conservation laws. I'm also interested in how we learn these invariances if we don't know them a priori. And really, how should we think about model construction towards good generalization? Quite broadly, although the field has made a lot of empirical progress, there isn't much consensus on how we should approach model construction in general, and maybe not a lot of general understanding around the extent to which we can have, say, a general-purpose intelligence. So in this sense, my work is actually very broad: we look at uncertainty representation, decision making, equivariances, optimization, how you effectively train these models, as well as a number of other questions.
Speaker 1:So in the case of using Bayesian methods for deep learning, I've read a lot of work in the past, for example the original variational autoencoder papers that got a lot of coverage. Do you think the Bayesian techniques there were really what led to the success of these works?
Speaker 2:That's a great question. So, as you mentioned, variational autoencoders are one of the seminal deep generative models, you know, VAEs, GANs and now diffusion models, and VAEs were very inspired by Bayesian ways of thinking. Diffusion models, which are probably the latest and greatest deep generative models, also in some sense had some Bayesian inspiration: if you look at the first paper that introduced diffusion models, they were thinking about Bayesian marginal likelihoods and so on. So I think Bayesian ideas did provide some inspiration in thinking about model construction. At this point, I think those models don't necessarily hinge on deploying Bayesian principles in order to achieve their goals, but the people who proposed those models certainly were influenced by a Bayesian way of thinking, and even though I would say probably 80% of my projects at this point aren't explicitly Bayesian, they are influenced by Bayesian ways of thinking about various problems.
Speaker 1:So you mentioned a little bit before that you were also working on ways to, I guess, systematize the construction of neural networks, but that there wasn't consensus about that at the moment. Do you think that's even possible to obtain?
Speaker 2:I don't think it's easy. This was something I noticed, actually, when I started my PhD. People in different communities would have very different ideas about how you should approach model construction in machine learning and artificial intelligence. If you talk to econometricians, they'll always emphasize parsimony: building models with a very small number of parameters, that are very interpretable, that won't overfit, that have very strong what are called inductive biases, assumptions that allow you to do inductive reasoning. Whereas in the machine learning community, at that time and now as well, there seemed to be more of a focus on building a model as big as a house, throwing a lot of data at it, and achieving good generalization as a consequence. And so in the first year of my PhD I spent a lot of time reflecting on how to reconcile these different perspectives, and ultimately concluded that they weren't really contradicting each other; they were just different pieces of a higher-dimensional picture in some sense. Now I believe that you should always embrace flexibility, but at the same time you want fairly strong inductive biases to be able to learn efficiently and to generalize in a variety of settings, and I think large language models are a good example of this. We had a paper out about a year ago trying to understand why LLMs are so good at general-purpose problem solving. LLMs combine flexibility, the ability to represent many, many different possible solutions to a given problem, with a very powerful simplicity bias, as measured by something called Kolmogorov complexity, and so they will try to find essentially the most compressible solution to a given problem that is consistent with what we observe.
Speaker 2:In terms of consensus, there are a number of phenomena in deep learning that are often presented as mysterious. Like the success of what's called overparameterization, where you have models with more parameters than data points; something called benign overfitting, the observation that neural nets, and in fact other models, can fit, say, images with random labels perfectly, with no training loss, but also generalize on structured problems, which shows that they're capable of horribly overfitting, but they don't seem to when those problems have structure; and things like double descent, where your generalization performance improves as you increase model flexibility for a time, then gets worse, corresponding to overfitting, and then gets better again. These are all presented as mysterious.
Speaker 2:I think they're already kind of immediately understandable, for example through the lens of probability theory and building models that are expressive but also have strong biases towards simple solutions.
Speaker 2:We already are using models that are more flexible than any neural network you could ever fit in a computer and finding that we achieve very good generalization with those models, especially if we're thinking about, say, Gaussian processes on problems with a relatively small number of data points. So in a sense this is already demonstrated. But there are phenomena like these that are presented as mysteries, and even though I think there are relatively good explanations for them, it has been surprisingly hard to bring the community on board towards viewing these questions as at least largely resolved. I think there probably are a few open sub-questions and so on, but sometimes it is very difficult to get the community to even acknowledge progress, and so I think that's the challenge really in reaching consensus.
Speaker 1:In this example of GPs, where we have infinite parameters, is there a reason that we're not using those currently for our language completion systems?
Speaker 2:I think that, you know, there are some methodological challenges towards scaling these models, although there's also been a huge amount of progress. You also in many cases want to learn the kernel, and this involves doing what's sometimes called representation learning. You can actually view training a neural network in some sense as learning a similarity metric for the data, like an embedding which is a good description of whether two images are similar or different, whereas with kernel methods, including Gaussian processes, this is something you usually fix almost entirely a priori. Popular kernels correspond to notions of similarity like Euclidean distance between vectors of pixel intensities, which is a horrible notion of distance for images. If you're trying to classify a two and then you just shift it on the screen to the right, that'll be a very different vector of pixel intensities as measured by Euclidean distance, but essentially the same image that you're trying to classify. The same goes for rotations, or if you're looking at a picture of a car and then you turn off the lights, et cetera. And so models like convolutional neural nets have certain biases baked into them to try to provide invariances to these types of transformations we might expect, but they also have the ability to learn a lot of the similarity between the things that they're trying to model, whereas in a kernel method, in order to learn that, you have to do kernel learning, and that's possible, but it's not always trivial.
Speaker 2:And so I think that, in the end, Gaussian processes are not really competing with neural nets. They're really complementary, and they can be combined, sometimes to great effect. I have some work called deep kernel learning which is trying to do that, and so it allows you to have some representation of epistemic uncertainty, which is something I think, in principle, you always want to have. It does have non-parametric flexibility, so essentially it corresponds to a model with an infinite number of parameters, but it can also do representation learning, helped by a neural network, and so this is a way that you can basically have a GP wherever you would apply a neural net. The challenge then might be scalability, but actually a huge amount of progress has been made on scalability.
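As a rough sketch of what the deep kernel learning idea looks like in code, here a neural network maps inputs to features and a GP kernel is applied on top of them, with everything trained jointly through the marginal likelihood. This uses GPyTorch, the library discussed later in the conversation; the toy data, layer sizes, and training settings are placeholder assumptions rather than the setup from any particular paper:

```python
import torch
import gpytorch

class FeatureExtractor(torch.nn.Sequential):
    def __init__(self, data_dim):
        super().__init__(
            torch.nn.Linear(data_dim, 64), torch.nn.ReLU(),
            torch.nn.Linear(64, 2),  # learned 2-d embedding of the inputs
        )

class DKLRegression(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.feature_extractor = FeatureExtractor(train_x.size(-1))
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel(ard_num_dims=2))

    def forward(self, x):
        z = self.feature_extractor(x)   # neural-net representation
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z))  # GP on learned features

# Train network weights and kernel hyperparameters jointly by
# maximizing the exact marginal likelihood.
train_x, train_y = torch.randn(100, 10), torch.randn(100)
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = DKLRegression(train_x, train_y, likelihood)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

model.train()
likelihood.train()
for _ in range(100):
    optimizer.zero_grad()
    loss = -mll(model(train_x), train_y)
    loss.backward()
    optimizer.step()
```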
Speaker 1:Is that something that you would suggest that people who are interested in doing research right now take up as a project?
Speaker 2:I would suggest that everyone learn about models like Gaussian processes, even if they don't anticipate using those models themselves, because, in addition to being quite useful models, still, even in the era of deep learning, and state of the art for a lot of applications like Bayesian optimization, spatio-temporal modeling, and certain areas of neuroscience, et cetera, they provide a perspective that makes what might otherwise seem very mysterious, like overparameterization, relatively approachable and understandable. So just as a foundational tool for understanding new innovations, and even proposing your own innovations in deep learning, they can be very useful.
Speaker 2:Also, this procedure I described, deep kernel learning, has been used in ways that are kind of surprising. I found out recently that someone was using deep kernel learning in a probe being sent to one of Jupiter's moons to search for alien life, and I never would have envisaged it being used in those kinds of ways. So this is a very general-purpose model, and it's found its way into all sorts of unanticipated applications.
Speaker 1:Is there anything else that has happened downstream from some of the papers that you've published that is particularly exciting for you?
Speaker 2:Yeah, so I get excited by both applications like that and by very specific problems we're trying to solve that have some kind of scientific relevance. I've been working a fair bit on biological sequence design for drug discovery; that's something I'm very excited about. I also see my work on Gaussian processes being used a lot in astrophysics, which I find very exciting, partly because I originally had a background in physics and was working on problems in general relativity. I've actually started to revisit some of these problems, even with some of the physicists I was working with as an undergrad, who don't do machine learning themselves, but now with machine learning tools, and so I've been very excited by those types of applications. But I also get really excited by abstract innovation. You asked me what the reasons are that perhaps models like Gaussian processes aren't being more broadly used.
Speaker 2:One of the reasons historically has been scalability, and this is because you need to solve a linear system and compute other matrix operations with this n-by-n kernel matrix, which naively incurs n-cubed complexity, where n is the number of training data points you have. This has meant, historically, that it can be difficult to apply Gaussian processes to problems with more than about 10 to 50,000 data points. The memory is also n-squared, and that's pretty significant for n larger than 50,000. We recognized that this is a linear algebra problem: we're trying to solve these linear systems, and these matrices actually have a lot of structure that is just being left unexploited, structure that arises as a consequence of our modeling assumptions. Maybe we have a stationary covariance function, meaning that it's translation invariant, or maybe our data has a particular type of structure that gives rise to structure in this matrix. And if we can recognize that structure, then quite often we can solve those linear systems very, very efficiently.
Speaker 2:We can also develop algorithms that harmonize with advances in hardware design, like GPU acceleration.
Speaker 2:So rather than use algorithms like the Cholesky decomposition as a first step towards solving a linear system, which is a recursive algorithm that's hard to parallelize, we can instead pursue approaches called Krylov subspace methods, which sounds fancy but really isn't, like linear conjugate gradients and so on. These are iterative methods that involve just a lot of matrix multiplications, and that is something that's very easy to parallelize on a GPU, and so these methods actually can lead to pretty extraordinary scalability. You can have these so-called matrix-free approaches, where you don't actually need to generate the whole matrix at once, and you can distribute this very effectively.
Speaker 2:And so we built a whole library around that, GPyTorch, and this has led to Gaussian processes actually being used quite broadly. This is much more of an abstract question than how do we simulate black hole collisions, or model disease spread or something like that, but I do find it very exciting, because if we can have some compelling solutions, then it could impact any number of applications, including many that we don't necessarily anticipate, and so I do get very excited by these sorts of findings as well.
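To make the matrix-free, Krylov-subspace idea concrete, here is a small NumPy sketch, an illustration of the principle rather than GPyTorch's internals, that solves a GP-style system (K + sigma^2 I) alpha = y with conjugate gradients, touching the kernel matrix only through block-wise matrix-vector products; the toy RBF kernel and random data are assumptions for the example:

```python
import numpy as np

def rbf_matvec(X, v, lengthscale=1.0, noise=1e-2):
    """Multiply (K + noise*I) by v without ever storing K in full:
    the kernel is formed one row block at a time and discarded."""
    n = X.shape[0]
    out = noise * v
    for start in range(0, n, 1024):
        block = X[start:start + 1024]
        sq = ((block[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        out[start:start + 1024] += np.exp(-0.5 * sq / lengthscale**2) @ v
    return out

def conjugate_gradients(matvec, b, tol=1e-6, max_iter=1000):
    """Solve A x = b using only matrix-vector products with A."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy GP-style solve: (K + sigma^2 I) alpha = y on random data.
X, y = np.random.randn(2000, 3), np.random.randn(2000)
alpha = conjugate_gradients(lambda v: rbf_matvec(X, v), y)
```

With additional structure, say Kronecker or Toeplitz, even the block-wise kernel evaluation can be replaced by fast structured matrix-vector products, which is where the larger savings come from.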
Speaker 1:Can you tell me a little bit more about, I guess, the applications of machine learning to general relativity problems that you've been working on?
Speaker 2:So that's more of a recent thing. When I was an undergrad I worked in physics, and I collaborated with a group that was working on what's called numerical relativity, and that particular group and its collaborators actually had a big breakthrough around the time I was working in that lab. For decades there had been this challenge of how to simulate black hole collisions; essentially you're trying to solve the partial differential equations of general relativity using classical finite element and finite difference methods, and there are all sorts of numerical questions that arise when you try to do this. It was very hard. It was sort of the binary black hole challenge, and it existed for a long time. The advisor I was working with at the time, Matt Choptuik, actually developed a whole programming language around trying to solve these problems, and one of his postdocs, Frans Pretorius, was really the first to solve it. This was explosive news, not just in that community but quite broadly; I think it was announced by NASA, and so it was in a lot of publications for general readerships. And so that was really inspiring and exciting to be part of.
Speaker 2:And more recently, in the last couple of years, I've been thinking about how we can use neural networks to solve partial differential equations. They have very different pros and cons relative to the classical finite difference and finite element methods: they don't have a curse of dimensionality in the same sense. When you are using these finite difference methods, you're typically working with grids, and that makes them relatively intractable if you are trying to solve PDEs in more than about three dimensions. The classical methods also don't assume regularities in the solutions, and this has other sorts of limitations. So by sidestepping this curse of dimensionality and making modeling assumptions that allow you to exploit regularities in the solutions, you can sometimes solve much harder and very different types of problems.
Speaker 2:And so we've been working on this in a fundamental sense, a bit like the "can we scale GPs" question: how do we make approaches to solving PDEs with neural nets much more numerically stable? How do we make them more general purpose? At the same time we've been revisiting some of the questions I was actually thinking about as an undergrad, working with people on numerical relativity problems, and so we've been revisiting the binary black hole collision problem. We haven't posted anything about this yet, but we've had some very exciting preliminary results, and we did have a paper on neural nets for PDEs and initial value problems, and one of the examples was from general relativity.
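As a loose illustration of the mesh-free flavor of these neural PDE solvers, and not the specific method in the paper he mentions, here is a minimal physics-informed-network style sketch in PyTorch for a toy one-dimensional equation, u''(x) = -sin(x) with zero boundary conditions; the architecture, equation, and training settings are all placeholder assumptions:

```python
import torch

# u_theta(x): a small network representing the PDE solution.
net = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 64), torch.nn.Tanh(),
    torch.nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

def pde_residual(x):
    """Residual of u''(x) = -sin(x), computed with autograd
    instead of a finite-difference stencil on a grid."""
    x = x.requires_grad_(True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    return d2u + torch.sin(x)

for step in range(2000):
    optimizer.zero_grad()
    # Random collocation points: no mesh, which is what lets this style
    # of solver scale to higher dimensions than grid-based methods.
    x = torch.rand(256, 1) * 2 * torch.pi
    interior_loss = pde_residual(x).pow(2).mean()
    # Boundary conditions u(0) = u(2*pi) = 0.
    boundary = torch.tensor([[0.0], [2 * torch.pi]])
    boundary_loss = net(boundary).pow(2).mean()
    loss = interior_loss + boundary_loss
    loss.backward()
    optimizer.step()
```

The key point is that derivatives come from automatic differentiation at randomly sampled collocation points rather than from a stencil on a grid, which is why the approach does not blow up with dimension the way grid-based solvers do.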
Speaker 1:So at the beginning we were talking a little bit about invariances, and variances are something that came up quite a bit. Invariances equivariance is ensuring that. Invariances, and learning and variances, as something that came up quite a bit in my PhD work. Uh, some of the questions that I've had were how, how does it apply and how do you figure out these invariances in the language setting? Is this something that you've looked into at all?
Speaker 2:That's a really great question.
Speaker 2:I mean, it's related to how we do data augmentation. For images it's obvious: we might want to include examples of the images which are translated or rotated or have been put through some other transformation that we want invariance to, and that is sort of the brute-force way of trying to encode these things into our models. For language it's less clear, and I don't know if we're going to develop good intuitions for this, but we can still have approaches for it. I did actually have a paper on what we call multimodal data augmentation. The idea is you have models which are simultaneously acting over different modalities, like language, images, tabular data, and we want a way of doing augmentation that respects the nuanced correlations between these things. One example is if you're trying to classify potentially offensive memes, you need to have an understanding of how the captions and the images relate to each other; in isolation they might not be very offensive, and so you really need a multimodal model, and you also need to do data augmentation such that you're not breaking these important correlations. Just applying standard augmentation techniques to each modality separately might not work very well. So what we did essentially was data augmentation in a latent space of the model. You would have this multimodal backbone model, which would convert images, text, tabular data, whatever else, into some latent vector, and then you would have an augmentation network operating on those latent vectors to produce other, augmented latent vectors. In order to train that augmentation network, we had an adversarial loss, which would try to increase the informativeness of these augmentations, as well as a consistency regularizer, which would try to preserve the semantic structure of the augmentations, so that we're not just flipping the label or something like that when we do the augmentation. And this works really, really well. Something I liked about it is it's very simple, and it works much better than just doing standard transformations to each modality separately, as well as better than some fairly tailored approaches to augmentation. This was something we could do for tabular data as well as for language data and so on. So in some sense it's learning structure that we might want our model to be invariant to, but it is still fairly inscrutable; it's not like with images.
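A loose sketch of that training pattern, an augmentation network on latent vectors trained with an adversarial term plus a consistency regularizer, might look like the following; the architectures, the MSE consistency term, and the loss weighting here are stand-in assumptions for illustration, not the objective from the paper:

```python
import torch
import torch.nn.functional as F

latent_dim, n_classes = 128, 2
classifier = torch.nn.Linear(latent_dim, n_classes)  # head on top of a multimodal backbone
augmenter = torch.nn.Sequential(                      # produces augmented latent vectors
    torch.nn.Linear(latent_dim, latent_dim), torch.nn.ReLU(),
    torch.nn.Linear(latent_dim, latent_dim),
)
clf_opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
aug_opt = torch.optim.Adam(augmenter.parameters(), lr=1e-3)

def training_step(z, y, lam=1.0):
    """z: latent vectors from the multimodal backbone, y: labels."""
    # Augmenter: adversarially push augmentations toward informative
    # (high-loss) regions, while a consistency term keeps them close to
    # the original latent so the label is not effectively flipped.
    z_aug = augmenter(z)
    adv_loss = -F.cross_entropy(classifier(z_aug), y)
    consistency = F.mse_loss(z_aug, z)
    aug_opt.zero_grad()
    (adv_loss + lam * consistency).backward()
    aug_opt.step()

    # Classifier: train on both original and augmented latents.
    z_aug = augmenter(z).detach()
    clf_loss = F.cross_entropy(classifier(z), y) + \
               F.cross_entropy(classifier(z_aug), y)
    clf_opt.zero_grad()
    clf_loss.backward()
    clf_opt.step()

# Toy usage with random stand-in "backbone" features.
z, y = torch.randn(32, latent_dim), torch.randint(0, n_classes, (32,))
training_step(z, y)
```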
Speaker 2:The question of how to learn invariances is also, I think, really interesting and fundamentally related to how we can build intelligent systems. Here we want to learn a constraint, which is very different from the way learning tasks are often set up in machine learning, where the most flexible model will always be favored by our objective, because it'll allow us to fit our data better than any less flexible version of that model. In this case we need some Occam's razor property when we're selecting the representation that we want to have. You can do something called regularization, which is sort of a hard-coded way of trying to encourage simple solutions in your objective, but it won't necessarily help you discover invariances, or, say, the right invariances. If I am doing image classification with digits, a six is rotation invariant until it starts to look like a nine, and so at the end of the day I want to learn exactly the right amount of rotation invariance in that problem.
Speaker 2:And so there are objectives, actually inspired by Bayesian thinking, like the Bayesian marginal likelihood, which are consistent estimators for these constraints, meaning that as you get more and more information they actually collapse onto the right value of the constraints.
Speaker 2:You can also do this for learning, say, intrinsic dimensions: if your data appears to be high-dimensional but actually lives on a lower-dimensional manifold, how do you estimate the real dimensionality of that manifold?
Speaker 2:And so this is trying to learn a constraint. It's something that we're very good at doing: if I showed you some unknown characters and there was some amount of rotation invariance and so on in the labels, it's something you would probably pick up on very quickly, and you would use it to help you generalize. This is actually a very difficult learning problem, but I think we can make progress using Bayesian thinking. These sorts of objectives, like the Bayesian marginal likelihood, are connected to notions in information theory, like minimum description length, where maybe you want to view your model as a compression of the data, and you want the best kind of lossless compression of the data: you want to fit your training data really well, but you also want to represent your model using a very small number of bits, and so this is kind of Occam's razor in another form.
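To give a mechanical picture of what learning an invariance could look like, here is a hypothetical sketch, loosely in the spirit of the invariance-learning work he alludes to rather than his exact objective: the maximum rotation angle is itself a learnable parameter, and the classifier is averaged over rotations sampled within that range; a small penalty during training would then nudge the range to be as wide as the data allows:

```python
import torch
import torch.nn.functional as F

class LearnedRotationInvariance(torch.nn.Module):
    """Average a classifier over rotations sampled from [-theta, theta],
    where theta itself is learned from data."""
    def __init__(self, classifier, n_samples=4):
        super().__init__()
        self.classifier = classifier
        self.theta = torch.nn.Parameter(torch.tensor(0.1))  # max rotation, radians
        self.n_samples = n_samples

    def rotate(self, images, angles):
        cos, sin = torch.cos(angles), torch.sin(angles)
        zeros = torch.zeros_like(angles)
        mats = torch.stack([cos, -sin, zeros, sin, cos, zeros], dim=-1).view(-1, 2, 3)
        grid = F.affine_grid(mats, images.shape, align_corners=False)
        return F.grid_sample(images, grid, align_corners=False)

    def forward(self, images):
        logits = 0
        for _ in range(self.n_samples):
            # Sampled angles depend on theta, so gradients flow into theta.
            angles = (torch.rand(images.shape[0]) * 2 - 1) * self.theta
            logits = logits + self.classifier(self.rotate(images, angles))
        return logits / self.n_samples

# During training one would add a small penalty such as -0.01 * model.theta
# to the classification loss, encouraging as much rotation invariance as the
# data supports (a six stops being rotation invariant once it looks like a nine).
```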
Speaker 1:So going back a little bit to, I guess, the middle part, where you were talking about using an adversarial loss to assist with the augmentation of latent vectors by the augmentation network: I was a little bit curious how you measure, in that case, the informativeness of your augmentations.
Speaker 2:So the adversarial loss essentially tries to find an augmentation of your latent vector which will increase the loss while preserving the same label, and this has the effect of creating augmentations that are near the decision boundary of your model. This is where your model actually really needs information, but at the same time you don't want to cross that decision boundary, because otherwise you're just flipping the label of these augmentations, and that's not going to be helpful. That's why we had to have this other term, the consistency regularizer, which tries to preserve some of the semantic structure of the augmentation. It is actually a little bit hard to know how to get this balance exactly right. Fortunately, the performance of this particular approach wasn't too sensitive to the exact balance, and both terms were really important, but it is, you know, a bit of an art.
Speaker 1:What has been the hardest part of professorship for you?
Speaker 2:management is arguably one of the more challenging aspects of this, because there's not really one approach you can take with everyone that's going to be successful.
Speaker 2:You really need to tailor how you work with people to the person you're working with, and understanding that, and knowing how to do it efficiently, was something that needed to be learned; it wasn't just automatic. Teaching is something that I've actually really enjoyed and, I think, benefited from in a lot of unexpected ways. Of course it was a bit challenging initially, but if you get used to speaking to a room of 500 people multiple times a week without very much preparation, it's incredibly good exercise for going to conferences and maybe moderating panel discussions or speaking in an off-the-cuff way, without being incredibly self-conscious and struggling to navigate those situations, and so it's given me an ability to communicate in ways that I found very difficult before.
Speaker 2:It also, I think, helps me in my research, in just understanding what questions people have and how they think about things, which is quite often very different from how I think about things, and how they learn, which also can be very different from how I learn, and sometimes from how other students learn. I think this has been extremely useful for writing and communicating research, which is also very important for the research to have an impact. So some of the things that I found a bit challenging initially have also been quite beneficial in a lot of unexpected ways.
Speaker 1:I've been working on a lot of open source code recently. I'm curious to know what the process looks like for you when open sourcing code, and whether you've been working on anything in the open source space that you're particularly excited about.
Speaker 2:Yeah, so I briefly mentioned GPyTorch, which is an open source library we developed a few years ago for scalable Gaussian processes, and since then we've released BoTorch, which is for Bayesian optimization and essentially uses GPyTorch as a backend. And very recently we've released a library called CoLA, compositional linear algebra, and it's really the final manifestation of this idea I mentioned earlier when I was talking about scaling GPs, that modeling assumptions can manifest themselves as algebraic structure which we can exploit for scalability. So CoLA stands for compositional linear algebra, and really it's trying to enable people to prototype all sorts of different algebraic structures which might help them accelerate their own methodology, but it also enables people to use the library to automatically construct very efficient algorithms based on certain properties they might be aware of. It recursively exploits compositional structure in the matrices of these models, and in some sense this is very, very general; it's sort of hard to describe because it is actually very broadly applicable.
Speaker 2:In a way, machine learning can be viewed largely as linear algebra: neural nets are just a bunch of matrix operations passed through nonlinearities, and different types of neural nets embrace different types of algebraic structure, so the components have parameter sharing and sparsity and so on from the convolutions. This library allows you to immediately benefit from that structure, but also, if you're not necessarily that aware of what structure you have, it can automatically construct the most efficient algorithms based on your hardware constraints and so on. And so I'm really, really excited about this. I think it is going to define my research agenda for the next few years, but I'm also hopeful that it is going to be integrated into virtually every kind of major machine learning system that involves computational linear algebra.
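To give a flavor of the kind of structure exploitation being described, here is a plain NumPy sketch, explicitly not the CoLA API, showing how a Kronecker-structured system can be solved without ever forming the big matrix; Kronecker structure is the sort of thing that arises, for example, from product kernels on grids:

```python
import numpy as np

def kron_solve(A, B, y):
    """Solve kron(A, B) x = y using only the small factors.

    Uses the identity kron(A, B) vec(V) = vec(B V A^T) (column-major vec),
    so one (m*n)^3 solve becomes two small solves."""
    m, n = A.shape[0], B.shape[0]
    W = y.reshape(m, n).T               # y = vec(W) in column-major convention
    V = np.linalg.solve(B, W)           # left-multiply by B^{-1}
    V = np.linalg.solve(A, V.T).T       # right-multiply by A^{-T}
    return V.T.reshape(-1)              # back to a flat vector

# Sanity check on small symmetric positive-definite factors.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)); A = A @ A.T + 5 * np.eye(5)
B = rng.standard_normal((4, 4)); B = B @ B.T + 4 * np.eye(4)
y = rng.standard_normal(20)
x = kron_solve(A, B, y)
assert np.allclose(np.kron(A, B) @ x, y)
```

The point of a library like CoLA, as described here, is to detect and compose these kinds of structures automatically rather than requiring each identity to be derived and implemented by hand.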
Speaker 1:What sort of projects have people been using this for so far?
Speaker 2:So we basically just released it. We did use it for the neural PDE project I mentioned, and you can use it for Gaussian processes. We have a paper that's going to be appearing at the NeurIPS conference which is all about CoLA, and in that paper we have many, many different types of applications. We look at dimensionality reduction, we look at Gaussian processes, and then we look at differential equations, both from the neural network side and even the classical side.
Speaker 2:So I think this could be very useful to people who aren't even necessarily doing machine learning but just need scalable linear algebra: need to compute some eigenvalues, solve a linear system, or do some other matrix operation. These things are just ubiquitous in machine learning and beyond, and this library should help you do them much more efficiently. It should also help you reason about various modeling assumptions that could give rise to certain types of structure, which might ultimately lead to huge savings in computation and memory, and could give some insight into which assumptions actually make sense for these problems, just through the ability to do this kind of rapid prototyping. So I'm hopeful that this library, CoLA, could be useful in virtually any computational effort that involves linear algebra, which is pretty much all of machine learning.
Speaker 1:What's something that worries you or scares you about the future of AI?
Speaker 2:So the types of tasks that are potentially going to be automated are very different from what's been automated historically through technological innovation like the industrial revolution. Some of these tasks are cognitive tasks, like writing and communicating, and I think that tools like large language models can be useful in this respect, but I am a little bit worried that if people start relying on them too much, then maybe they'll start to lose the ability to have original thoughts and ideas and to think very independently. So that's a worry that I have. It's not something I've heard many people talk about, and I'm not broadly worried about existential risks of AI, but this is a type of existential risk that does concern me a bit: kind of just giving up cognitive capacity for things that really matter in terms of how we think and interact with the world.
Speaker 1:What do you think the future of the field is going to look like?
Speaker 2:So I think we're going to see artificial intelligence become an integral part of most scientific disciplines. So it's going to be less and less people from within the core ML community applying their methodology to particular scientific problems and other domains and more domain scientists actually also being machine learning scientists. I think that's a change that we'll see. I think that we're going to see this really emerge as a major discipline in its own right. So right now we have computer science departments, chemistry departments and so on, physics departments. I think machine learning is going to be a discipline like that.
Speaker 2:So we're now seeing a lot of data science departments, but I think that we are going to have departments for machine learning and AI research that are very much like departments of chemistry and physics and so on, and this means that we'll have many more students, I think, working on these problems, and also the ability to take a very interdisciplinary approach to recruiting people to work in this space. One thing that can be a bit challenging, and normally is, is that maybe you have a student with a physics background who could be really great for machine learning research, but they don't have a conventional statistics or CS pedigree or something like that. You can fight for these students and they can still sometimes be admitted, but if you had a department that was devoted to this discipline, it wouldn't even really need to be a fight.
Speaker 1:Well, thank you very much. I think, yeah, that sounds like we are accelerating and I am very excited for that future.
Speaker 2:Thank you.
Speaker 1:Thank you for coming on.