Dr. Mirman's Accelerometer
Welcome to "The Accelerometer," a cutting-edge podcast at the intersection of technology and artificial intelligence, hosted by Dr. Matthew Mirman. Armed with a Ph.D. in AI safety from ETH, Dr. Mirman embarked on a unique journey: joining the accelerationist dark side and founding his YC-funded company, Anarchy. In each episode, Dr. Mirman engages with brilliant minds and pioneers in the YC, ETH, and AI spheres, exploring thought-provoking discussions that transcend the boundaries of traditional AI safety discourse. "The Accelerometer" offers listeners a front-row seat to the evolving landscape of technology, where Dr. Mirman's insights and connections with intriguing individuals promise to unravel the complexities of our rapidly advancing digital age. Join us as we navigate the future of AI with The Accelerometer.
3D with your SLR: Forrest Briggs
Unlock the future of media with Dr. Matthew Mirman and our special guest, Forrest Briggs, as we uncover the transformative power of volumetric media in AI-driven experiences. Forrest, the visionary founder and CEO of Lifecast, is reshaping the landscape of virtual environments with tools that offer unparalleled immersion. Together, we examine how six degrees of freedom and cutting-edge advancements like neural radiance fields are making 3D content creation not only more accessible but also more revolutionary, allowing creators to craft dynamic and responsive virtual spaces.
Picture this: filmmaking without the constraints of traditional green screens. Discover how volumetric technology is revolutionizing the industry with tools that capture photorealistic environments using VR180 cameras, as showcased in productions like "The Mandalorian." Forrest and I discuss the profound implications of this technology, from saving time and money with reshoots in controlled environments to blending computer vision and generative AI for near-perfect 3D reconstructions. We ponder the exciting possibilities of creating entirely new volumetric scenes from simple text prompts, balancing reality capture with imaginative innovation.
As we look ahead, the conversation shifts to the broader implications of AI in media, including economic disruptions and the democratization of content creation. Forrest shares insights into how generative AI and large language models are set to alter the media landscape, making personalized content creation easier than ever. We explore the ethical and economic challenges of AI's rapid advancement, questioning how these technologies can align with human values while presenting opportunities for unparalleled creative expression. Join us for a journey through the thrilling world of AI and volumetric media, as we consider the limitless potential of this groundbreaking technology.
Episode on Youtube
Accelerometer Podcast
Accelerometer Youtube
Anarchy
Anarchy Discord
Anarchy LLM-VM
Anarchy Twitter
Anarchy LinkedIn
Matthew Mirman LinkedIn
We are not entirely in the business of just capturing the real world. They were very unreliable in many cases. I'm imagining a future where billions of people have wearable 3D cameras on their bodies.
Speaker 2:Hello and welcome to another episode of the Accelerometer. I'm Dr. Matthew Mirman, CEO and founder of Anarchy. Today we're going to be discussing AI's place in future media with Forrest Briggs, founder and CEO of Lifecast, an AI toolset for volumetric media. So can you tell us a little bit more about what your company, Lifecast, does?
Speaker 1:Sure, we have a whole suite of tools that are designed to make it practical for people to create volumetric media, and there's several dimensions in which our tools do things a little bit different from others.
Speaker 1:So I like to think of it as practical, immersive volumetric media, and we can unpack a few of those terms, but immersive is a big one.
Speaker 1:So immersive to me means you could experience this content in VR, and it covers at least half a sphere, maybe a whole sphere.
Speaker 1:But yeah, it's not just a little rectangle that you're looking into, and it's also not just a person captured with an outside-in array of cameras and no environment; it's the whole environment, the whole experience that we're trying to capture. So that's the immersive part of this, and then practical is a big one. There's a lot in there, but it means you can capture this stuff with cameras that actually exist, that you can buy, and that are practical to take into interesting locations to capture things that are happening spontaneously or in remote locations, for example. And then it also means you can process the data in a reasonable amount of time, edit it, and then deploy it where you want to: in VR, on the web, in game engines like Unreal and Unity for virtual production and special effects, and on holographic, glasses-free displays like the Looking Glass. So we try to make it really easy to get into the volumetric representation of things and then to get that out wherever you want to go.
Speaker 2:You just used this term I mentioned at the beginning of the podcast: volumetric media. Can you tell us a little bit about what that term means?
Speaker 1:Sure, sure. For our purposes, the important thing is that it means a type of 3D media that you can freely move around in and look at from any point of view. And that's different from what some people would think of as 3D media, like a 3D movie or most 3D VR video content today, where there's an image for your left eye and an image for your right eye, but if you move your head side to side, the scene can't respond. And it's not just useful for VR; take virtual production, for example.
Speaker 1:Let's say you're filming a shot where you have a camera moving on a dolly, and then you have some actors in front of a green screen and you want to composite in a 3D background. You want the rendering of that background, replacing the green screen, to move in the same way as the camera on the dolly. So you need to be able to render these 3D environments from the point of view of an arbitrary moving camera. That's what we mean by volumetric. A lot of other terms get used: volumetric, six degrees of freedom, spatial. People throw around a lot of terms, but all we mean is that you can move the virtual camera in 3D.
Speaker 2:I thought six degrees of freedom referred to information from the camera's perspective more than it referred to volumetric.
Speaker 1:Yeah, okay. So when we talk about six degrees of freedom, we mean that the virtual camera can translate or rotate: three degrees of freedom for translation and three for rotation.
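To make those six degrees of freedom concrete, here is a minimal illustrative sketch in Python (NumPy), not Lifecast code: three translation values plus three rotation angles define a virtual camera pose, and the pose maps world-space points into the camera's frame.

```python
import numpy as np

def rotation_from_euler(yaw, pitch, roll):
    """Compose a 3x3 rotation matrix from the three rotational degrees of freedom."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])  # yaw
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])  # pitch
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])  # roll
    return Rz @ Ry @ Rx

class CameraPose:
    """Six degrees of freedom: three for translation, three for rotation."""
    def __init__(self, translation, yaw, pitch, roll):
        self.t = np.asarray(translation, dtype=float)   # 3 translation DoF
        self.R = rotation_from_euler(yaw, pitch, roll)  # 3 rotation DoF (camera-to-world)

    def world_to_camera(self, points_world):
        """Transform an Nx3 array of world-space points into this camera's frame."""
        return (points_world - self.t) @ self.R

# A virtual camera moved 1 unit along x and rotated 30 degrees in yaw.
pose = CameraPose([1.0, 0.0, 0.0], yaw=np.deg2rad(30), pitch=0.0, roll=0.0)
print(pose.world_to_camera(np.array([[2.0, 0.0, 5.0]])))
```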
Speaker 2:So what your company is trying to do is make it easier to create this type of content.
Speaker 1:Yes, and we have a lot of tools that are already available to people now, plus some really cool stuff in R&D that we hope to unveil soon, with previews already available. Right now our tools work with VR180 cameras as input, and these are widely available, cameras like the Canon R5 with the dual fisheye lens, which can film 8K stereoscopic 180-degree footage. So it's Hollywood quality; it's a camera that you could make a Netflix documentary with, but it's capturing a stereo pair of 180-degree images, and that's normally what people use for VR video. But then we take it one step further. We run those images through our own computer vision pipeline to actually build a proper 3D model of every frame of video, which is volumetric, or enables 6DoF rendering, and then we compress it into our own format, which can be efficiently streamed over the web and decoded in our open source players for web, Unreal, and Unity.
Speaker 2:Do you need that sort of expensive camera setup to use your tools?
Speaker 1:No, you don't. So there are many VR180 cameras. It just means a stereo fisheye camera. Probably the lowest cost one is the Lenovo Mirage at about $220. And then you can go all the way up to tens of thousands of dollars at the high end. And our future products will be compatible with even more different types of cameras that lots of people have, and we find that people want to use whatever camera that they actually have, and we aim to help do that.
Speaker 2:What do you need in order to make the jump from this expensive dual fisheye VR180 camera technology to a normal iPhone camera being used for this?
Speaker 1:Sure, well, we are right now working on doing this with iPhones as input, and we have some pretty cool demos that are on YouTube you can take a look at now. But one of our kind of research projects is creating neural radiance fields with an iPhone and then being able to render that into a VR format, and so we've demonstrated a really nice result with that recently.
Speaker 2:Maybe you can tell us a little bit about what a neural radiance field is.
Speaker 1:Sure, I'd love to. This comes back to a classic approach for rendering 3D images in the field of computer graphics, which is ray tracing. The idea is, for each pixel in an image, you fire a ray and then you figure out what color the thing that ray intersects is. The difference with neural radiance fields is that now, instead of tracing a bunch of primitives like spheres or cones, the thing that you're intersecting is a representation of a field that is stored in a neural net. You implement your ray tracer in something like Torch or TensorFlow, and then you can differentiate the whole thing and try to learn the neural net that represents the scene, so that when you finish rendering, you get some target image from the real world. So you start with a collection of images, maybe captured with your phone, or maybe it's a video, and that's your training set, and you actually train a new neural net for each scene that you're doing. So that's part of it.
Speaker 1:It's a very computationally expensive approach, but this idea has recently revolutionized the accuracy with which we can reconstruct 3D models of scenes from multiple images and do novel view synthesis, which is rendering the scene from a new point of view. So that's exactly what Lifecast wants to do. We have both research and product, and we try to keep our eye on what we can actually bring to customers that's useful, not just pure abstract research. That's always an interesting balance to strike.
Speaker 2:That's really cool. So one of these neural radiance fields, how would you go about creating one?
Speaker 1:Well, I mean, you would write a bunch of code in Python or C++.
Speaker 2:Maybe you can go into a little detail about what the training pipeline looks like.
Speaker 1:Here we need to talk about a few basic concepts in 3D geometric computer vision, right?
Speaker 1:So it's really important to have a model of your camera, which tells you, for each pixel, the corresponding ray direction and origin, and that connects the pixels in 2D with the geometry of the scene in 3D. You want to know things like the lens distortion and the field of view, and you also need to know the position and the rotation of each camera, and it's really important to have precise knowledge of these facts about your cameras. Usually, but not always, this is done before you try to construct a neural radiance field, with some other piece of software like COLMAP, which is an open source tool for structure from motion. Structure from motion is the problem in computer vision of, given a set of images, figuring out the pose and camera parameters of each camera that took them. You can do this with collections of images that you find off the internet, or with a sequence of images from your phone or whatever, but that's really important. You need to get that right before you can even attempt to construct a neural radiance field.
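As a rough illustration of the camera model being described, here is a hedged sketch assuming a simple pinhole camera (real VR180 pipelines also model fisheye lens distortion): given the intrinsics and the pose that a structure-from-motion tool such as COLMAP estimates, each pixel maps to a ray origin and direction.

```python
import numpy as np

def pixel_to_ray(u, v, fx, fy, cx, cy, R_cam_to_world, t_cam_in_world):
    """Return the (origin, direction) of the ray through pixel (u, v)."""
    # Direction in camera coordinates (z forward), then rotated into world space.
    d_cam = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    d_world = R_cam_to_world @ d_cam
    d_world /= np.linalg.norm(d_world)          # unit-length ray direction
    origin = np.asarray(t_cam_in_world, float)  # all rays start at the camera center
    return origin, d_world

# Example: a 1000x1000 image with a roughly 90-degree field of view.
fx = fy = 500.0          # focal length in pixels
cx = cy = 500.0          # principal point at the image center
R = np.eye(3)            # camera axes aligned with the world
origin, direction = pixel_to_ray(250, 500, fx, fy, cx, cy, R, [0.0, 0.0, 0.0])
print(origin, direction)
```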
Speaker 1:And then there are a few interesting pieces of a NeRF implementation as well, like the radiance model itself, which in the classic implementation was just a multilayer perceptron. This is actually an idea that I think is so simple that it was maybe overlooked for a long time. You can make a neural net compute a function from XYZ to RGB: given this point in space XYZ, what's the color RGB at that point? And you can also ask what's the density at that point. If you ask those things of it, then it's a radiance field. So a function that maps positions to colors and densities is a radiance field, and you can compute that function with a simple multilayer perceptron.
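A minimal sketch of that idea in PyTorch, not Lifecast's implementation: a small multilayer perceptron that maps a 3D position to a color and a density.

```python
import torch
import torch.nn as nn

class RadianceFieldMLP(nn.Module):
    """Maps a 3D position (x, y, z) to a color (RGB) and a density (sigma)."""
    def __init__(self, in_dim=3, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 3 color channels + 1 density
        )

    def forward(self, xyz):
        out = self.net(xyz)
        rgb = torch.sigmoid(out[..., :3])   # colors in [0, 1]
        sigma = torch.relu(out[..., 3:])    # non-negative density
        return rgb, sigma

# Query the field at a batch of points.
field = RadianceFieldMLP()
rgb, sigma = field(torch.rand(1024, 3))
print(rgb.shape, sigma.shape)  # torch.Size([1024, 3]) torch.Size([1024, 1])
```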
Speaker 1:And, sort of like transformers in the field of language modeling, people figured out that if you put a positional encoder on the input to this MLP, it does a much better job at learning high-frequency signals.
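And a sketch of the sinusoidal positional encoding from the original NeRF paper that he is referring to; the number of frequencies here is illustrative. Each coordinate is expanded into sines and cosines at several frequencies, so with 10 frequencies a 3D point becomes a 3 + 3*2*10 = 63-dimensional input for the MLP above.

```python
import math
import torch

def positional_encode(xyz, num_freqs=10):
    """xyz: (..., 3) tensor of positions -> (..., 3 + 6 * num_freqs) encoded features."""
    features = [xyz]
    for i in range(num_freqs):
        freq = (2.0 ** i) * math.pi
        features.append(torch.sin(freq * xyz))
        features.append(torch.cos(freq * xyz))
    return torch.cat(features, dim=-1)

encoded = positional_encode(torch.rand(1024, 3))
print(encoded.shape)  # torch.Size([1024, 63]); the MLP would then use in_dim=63
```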
Speaker 1:So that's the core radiance field idea as it was when it first came out a few years ago, and since then there have been lots of approaches to make it much faster and train more efficiently. Then you have a few other components, like the differentiable volumetric rendering piece, which is much like a classic ray tracer, but for each pixel it's shooting a ray and then, along that ray, sampling the radiance field at several points and blending all the colors according to their density to compute the final color of that pixel, which is then compared to the ground truth pixel of your data set to get a loss. And it really is as simple as minimizing the squared error between the image you took from the real world and the image that this ray tracer is spitting out, and then letting it find the weights of the MLP such that the radiance field stores the 3D geometry of the scene.
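Putting those pieces together, here is a simplified sketch of differentiable volumetric rendering as described: sample points along each ray, query the field, alpha-composite the colors by density, and compare against the ground-truth pixels with a squared error. The field is any callable like the RadianceFieldMLP sketch above; near/far bounds and sample counts are illustrative.

```python
import torch

def render_rays(field, origins, directions, near=0.1, far=6.0, n_samples=128):
    """origins, directions: (R, 3). Returns composited colors of shape (R, 3)."""
    t = torch.linspace(near, far, n_samples)                                   # (S,)
    points = origins[:, None, :] + directions[:, None, :] * t[None, :, None]   # (R, S, 3)
    rgb, sigma = field(points)                                                 # (R, S, 3), (R, S, 1)
    delta = (far - near) / n_samples                                           # constant step size
    alpha = 1.0 - torch.exp(-sigma[..., 0] * delta)                            # per-sample opacity
    # Transmittance: how much light survives to reach each sample along the ray.
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1.0 - alpha[:, :-1]], dim=1), dim=1)
    weights = alpha * trans                                                    # (R, S)
    return (weights[..., None] * rgb).sum(dim=1)                               # (R, 3)

def nerf_loss(field, origins, directions, target_rgb):
    """Squared error between rendered colors and the real-world pixels."""
    pred = render_rays(field, origins, directions)
    return ((pred - target_rgb) ** 2).mean()

# Smoke test with a dummy field: constant gray color, uniform density.
dummy = lambda p: (torch.full(p.shape[:-1] + (3,), 0.5), torch.ones(p.shape[:-1] + (1,)))
colors = render_rays(dummy, torch.zeros(4, 3), torch.tensor([[0.0, 0.0, 1.0]] * 4))
print(colors.shape)  # torch.Size([4, 3])
```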
Speaker 2:Very cool. So, beyond creating these volumetric videos, I saw you taking a video of the setup earlier and, I presume, creating a NeRF with it.
Speaker 1:Yes, we will do that.
Speaker 2:I'm going to want to walk around my own setup in VR at some point, by the way.
Speaker 1:I think we should be able to send you a VR video that you could watch on YouTube in a VR headset of this room, which I got from a five-second video moving my phone around.
Speaker 2:We should absolutely link that in this podcast, if we can.
Speaker 1:I'll try to send it to you after the podcast.
Speaker 2:Oh, that would be so cool. So, beyond creating these NeRFs, what else do you intend for your tools to be able to do?
Speaker 1:I think it's really important to focus on the applications, right? One of the biggest applications right now is virtual production, which is really revolutionizing filmmaking today. Stuff like The Mandalorian or Dune uses this technique where, instead of having a green screen behind your actors, you now have a giant LED wall, and this is great because it creates a more realistic illumination on the actors. If you have a green screen, you have a green tint on your actors; if you have a 3D environment of dunes, you have the right tint on the actors, and you don't need to mess with it later on.
Speaker 1:So one of the expensive parts of this process is having artists create photorealistic 3D environments in something like Unreal Engine, so that they can then be rendered from the point of view of the cinema camera that's filming these shots. We have a tool that lets you capture photorealistic volumetric environments very efficiently from the real world using VR180 cameras. It's different from photogrammetry, or what most people are doing with NeRF right now, because it's video, and video makes the environments feel alive. There's something kind of different about having a background environment where the trees are swaying in the wind, versus everything frozen in time. So that's really important, and we do this both in our neural radiance field work and in our current generation of products, which are based on layered depth images, the representation we developed for volumetric video that is actually practical to use right now. Anyway, we have all these tools that let you capture photorealistic 3D environments as video and then render them in Unreal Engine for use in virtual production, and this can save people time and money when they are making 3D environments for this purpose, and it can also mitigate risk.
Speaker 1:When you go out and shoot, you bring some expensive actors to a unique location and you get a shot at a particular time of day, and then you come back and find that they said the line wrong. What are you going to do? Are you going to fly them back there? Can you ensure that you get the same weather at the same time of day again? No. Instead, you could capture just the environment with a VR180 camera and then use our software to construct a 3D environment from it, and then, if you need to reshoot, you can do it in a studio.
Speaker 2:I like this example of needing to get the weather right, because it brings up an interesting question: what happens when you want to capture video of your environment and turn that into a video NeRF?
Speaker 1:Yeah, so okay, I want to be a little clear about this. We have two generations of products right now: we have layered depth image video and we have the NeRF stuff, and we're working on NeRF video, but it's not ready to release to the public yet. There are lots of challenges in either case, but with our current products we're using stereo cameras, which means we have two points of view of the same moment in time, so we can do deep learning based computer vision to estimate a very accurate depth map of the scene. Then we do other stuff to temporally stabilize it, and we decompose it into multiple layers and use generative models to imagine what's behind things in places the original camera never saw, and we try to do that in a temporally stable manner. So it's still really complicated to do a 3D volumetric video reconstruction with two cameras. It's even harder to do it with one camera, and we're working on that as well.
Speaker 2:Even in the two-camera setup, what's been the main challenge so far?
Speaker 1:We want to create useful tools for people who are very serious about image quality in video, and they expect nearly perfect 3D reconstructions. So we're always looking to use the state-of-the-art model for stereo depth estimation or neural radiance fields and then extend it in as many ways as we need to in order to make it into a useful product. But I would say that only in the last year or two has computer vision even really been up to the task of making reconstructions that are accurate enough to satisfy people. So that's one of the big challenges: doing the computer vision and 3D geometry accurately enough to meet the needs of video professionals. As soon as you give someone the freedom to move around in a volumetric video, they try to look into parts of the scene that were not captured by the real cameras, and they expect to see something reasonable. That's where you can no longer rely purely on 3D geometric computer vision. You also need to use some generative AI to try to imagine the missing parts of the scene. Years ago we had state-of-the-art stereo fisheye deep learning-based depth estimation and we could render 6DoF scenes, but people weren't satisfied with what happened when they looked behind something. That took a lot longer, and I think it's still an open problem that we're really interested in.
Speaker 1:But it raises this interesting point that I think about sometimes: we are not entirely in the business of just capturing the real world. We are also, to some extent, always in the business of imagining missing details with AI, and anything that people ultimately see will not be 100% real. It will be somewhere on a spectrum, from entirely real with a little bit of missing detail filled in, all the way up to completely generated. We have another product, called HoloVolo.tv, where we use Stable Diffusion to generate volumetric scenes from text prompts, and it comes in the same format as our other stuff and integrates with all of our other tools. But the point is that there's this continuum between completely real and completely imaginary, and we actually can't exist all the way at one end of the spectrum.
Speaker 2:Is it your aspiration to eventually exist on both ends of the spectrum, or do you have to choose one end?
Speaker 1:It's so tempting to do generative stuff once you see what you can do with it. The original vision was more about capturing the real world, but what I've learned is that you have to have some generative AI in there to fill in the missing details, no matter what, and if you're going to do a really good job at that, you're also going to be able to make up entirely new scenes anyway.
Speaker 2:So what's the most interesting thing that people have used your tools for so far?
Speaker 1:Recently it has been used with the Looking Glass glasses-free holographic display, and it was used to show a 3D video of Pearl drums at an expo. So they're at this expo, they have a 3D display, and it's showing holographic video of a drummer. That holographic video was created from a VR camera using Lifecast software and made possible to display on these displays in a relatively streamlined manner. It was actually pretty interesting how easy it was to get set up, because we are already making all these tools for WebXR, and the Looking Glass API integrated with that. So it ended up just being a few lines of code and, bam, it was on that display.
Speaker 2:Going a little bit to your founder journey. What were you doing before this that, I guess, led to this?
Speaker 1:Yeah, yeah. So I've worked in tech for quite a while on various things involving perception.
Speaker 1:So at Facebook, I was the first engineer on their 3D VR camera team and I wrote significant pieces of their open source software for stitching 3D VR video and also parts of the computer vision pipeline for their prototype volumetric cameras, which were like these giant balls with 20 cameras looking in all different directions.
Speaker 1:And then after that I joined Lyft's self-driving car project and I led calibration, which is part of the perception system, where you have to understand the position and orientation and distortion of all the sensors, so cameras, lidar, radar and things like that. And then after that I worked at Google X, where I also led calibration for the Everyday Robot Project, and there I was focused on calibrating the cameras and LIDARs and actuators in a human-sized office robot, let's say. So. All of those different experiences had something in common, which was that I was working on the software for understanding sensors like cameras and how they form a perception of the world in 3D. And then, in many cases, how do you render that from a different point of view for visualization purposes? And the thing that always stuck with me was the applications of volumetric video were really interesting and I wanted to bring what I learned working on robotics perception systems back to that field.
Speaker 2:I feel like in all of those previous roles, the data that you had must have been significantly higher quality than what you're working with right now.
Speaker 1:In some ways, yes. One of the things about all the robots and experimental cameras that I worked on was that they were very unreliable in many cases, and it was a miracle if all 20 cameras turned on and the USB data came through untarnished. So with Lifecast, I've been more focused on practical cameras that work reliably, and on using more machine learning and less hardware.
Speaker 2:Yeah, that's cool. I've got a three-camera setup here for this podcast and one of them is always failing. Though it's a fair point: even in the case where a few of your cameras are failing, if you have 20 cameras, the software should be just fine. But even then, you'd say that with your two cameras, that increased reliability matters that much more?
Speaker 1:Definitely.
Speaker 2:Okay, so why weren't they doing stuff like this at Facebook?
Speaker 1:Well, we tried, but I think the state of the art in computer vision wasn't up to the task in 2015.
Speaker 2:So it's just changed that much since then?
Speaker 1:Stuff has come a long way since then. Deep learning was just getting started back in those days.
Speaker 2:Yeah, what are the sizes that we're typically looking at for these NeRFs?
Speaker 1:State-of-the-art NeRFs typically use a neural multi-resolution hash map encoder, which uses as many elements of your neural hash map as you can store in GPU memory, so something like 24 gigabytes worth of hash table to positionally encode a point. It's a little more complicated than just asking what's the size of an MLP.
Speaker 2:Well, what's a neural hash map? I've never heard of this before.
Speaker 1:Good question. NVIDIA's Instant NGP is a very nice paper that came out in the last couple of years, where they enhanced the basic NeRF formulation. Instead of using a positional encoder, they introduced a neural multi-resolution hash map, which is a differentiable data structure that encodes points with a latent code that can then be decoded with a small MLP, and it turns out that that learns a lot faster and gives higher fidelity 3D representations than just a vanilla MLP.
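A simplified, illustrative sketch of that multiresolution hash encoding idea; the table sizes, growth factor, and hash constants here are not the paper's exact configuration. Each resolution level keeps a fixed-size table of learnable feature vectors; a 3D point's surrounding grid corners are hashed into the table, the corner features are trilinearly blended, and the per-level results are concatenated for a small decoder MLP.

```python
import torch
import torch.nn as nn

class HashGridEncoder(nn.Module):
    """Multiresolution grid of learnable features addressed by a spatial hash."""
    def __init__(self, n_levels=8, table_size=2**16, feat_dim=2, base_res=16, growth=1.5):
        super().__init__()
        self.resolutions = [int(base_res * growth ** i) for i in range(n_levels)]
        self.tables = nn.ParameterList(
            [nn.Parameter(1e-4 * torch.randn(table_size, feat_dim)) for _ in range(n_levels)]
        )
        self.table_size = table_size
        # Offsets of the 8 corners of a grid cell, shape (8, 3).
        self.register_buffer("corners", torch.tensor(
            [[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)]))

    def _hash(self, coords):
        # Spatial hash in the spirit of Instant NGP: XOR of coordinates times large primes.
        c = coords.long()
        h = c[..., 0] ^ (c[..., 1] * 2654435761) ^ (c[..., 2] * 805459861)
        return h % self.table_size

    def forward(self, xyz):
        # xyz in [0, 1]^3 with shape (N, 3); returns (N, n_levels * feat_dim).
        outputs = []
        for res, table in zip(self.resolutions, self.tables):
            scaled = xyz * res
            base = torch.floor(scaled)                       # lower corner of the cell
            frac = scaled - base                             # position within the cell
            corner_coords = base[:, None, :] + self.corners  # (N, 8, 3)
            feats = table[self._hash(corner_coords)]         # (N, 8, feat_dim)
            # Trilinear interpolation weight for each corner.
            w = torch.where(self.corners.bool(), frac[:, None, :], 1.0 - frac[:, None, :])
            outputs.append((w.prod(dim=-1, keepdim=True) * feats).sum(dim=1))
        return torch.cat(outputs, dim=-1)

encoder = HashGridEncoder()
decoder = nn.Sequential(nn.Linear(8 * 2, 64), nn.ReLU(), nn.Linear(64, 4))  # small MLP
print(decoder(encoder(torch.rand(1024, 3))).shape)  # torch.Size([1024, 4])
```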
Speaker 2:How would you translate that, though? What I'm trying to figure out in my head right now is how NeRFs compare to state-of-the-art diffusion models in terms of size and FLOPs required to compute.
Speaker 1:Okay, sure, sure. NeRFs are very computationally intensive on their own, and there are many tricks for accelerating them. But let's say you're trying to render a one megapixel image, so a thousand by a thousand pixels. You're going to shoot one ray for each pixel and then take maybe 128 samples on each ray, and for each of those samples you're evaluating an MLP and an encoder.
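A quick back-of-the-envelope check of that estimate, using the figures mentioned above (real implementations vary in sample counts):

```python
# One ray per pixel, roughly 128 samples per ray, for a 1,000 x 1,000 image.
pixels = 1000 * 1000
samples_per_ray = 128
queries = pixels * samples_per_ray
print(f"{queries:,} MLP evaluations per rendered frame")  # 128,000,000
```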
Speaker 2:So it's 128 million MLP inferences per image; that's a huge amount of FLOPs. And the reason I'm wondering this is because the executive order potentially limiting FLOPs just came out a couple of days ago. So if you're getting to the number of FLOPs that Stable Diffusion is using, I'm guessing that in a couple of years it's going to get worse, right? I don't know if you've been keeping track of what's been going on there.
Speaker 1:Well, I think the good news is that when we're talking about neural radiance fields, or 3D Gaussian splatting, which is another technique that came out really recently, there's tons of research in this field making it more efficient, and it's already become two orders of magnitude more efficient in the last two years. I don't think there's any sign that that trend will stop. So researchers are doing a great job of figuring out how to get higher fidelity reconstructions with fewer FLOPs over time.
Speaker 2:In at least the LLM space we have this interesting thing where, as LLMs are getting more efficient, we're just putting more and more data into them; we're making them bigger and more efficient at the same time. Is the same thing happening with NeRFs?
Speaker 1:That's an interesting question. NeRFs definitely do better the more images you put in. But on the flip side, particularly for Lifecast, from the practicality standpoint we're interested in making NeRFs work better with fewer images as input. The fewer the better, even only one image; ideally you can treat it as a one-shot learning problem. At that point you start bringing in more generative concepts, and you start thinking about foundation models for radiance fields and things like that, so you can get better NeRFs with fewer images if you've already digested lots of other images.
Speaker 2:What were you thinking when you started this company?
Speaker 1:Good question. What was I thinking? Why did I quit my great job at Google X leading the calibration of robots to start a company that is, in my mind, also a moonshot? I'm imagining a future where billions of people have wearable 3D cameras on their bodies that can be used to capture memories and replay them in mixed reality, let's say.
Speaker 1:A few years ago I tried putting an existing VR camera on my head, recorded a video with it, and watched it back in VR, and it made me super nauseous. I had to lie down for a couple of hours. I thought about why that was, and I realized that the current generation of VR video technology just wasn't sufficient for this use case of capturing and replaying memories. What's happening is kind of interesting. There's a phenomenon called vestibulo-ocular conflict, which is a disagreement between your eyes and your inner ear about the perception of motion. If you capture a video with a moving camera and then you play that right back into your eyes stereoscopically while you are sitting on a couch not moving at all, your inner ear says you're not moving and your eyes say you're moving, and that conflict makes you nauseous.
Speaker 1:So I realized that if you create a volumetric representation of the video and you do several other things, which are patent pending, you can trick that part of your brain's perception to not have that reaction, and you can replay the memory without causing motion sickness. That was the first thing we were doing with Lifecast, and building all these tools for volumetric video is just a means to the end of having a powerful enough volumetric video system to handle replaying memories. We do have working prototypes of that, but I think it's still a little bit ahead of its time and not quite ready for public use. We will definitely be releasing products along those lines in the future.
Speaker 2:So for you, the problem was just so important that you had to be working on it.
Speaker 1:Yeah, I just am obsessed with volumetric video and I'm always thinking about it, so I figured I should just do that as my day job.
Speaker 2:That's awesome. How has starting a company been for you? Is this your first company?
Speaker 1:This is my first company. Getting into Y Combinator was super helpful, and I wouldn't be here without them guiding me along the way. It's been kind of a journey as a more technical founder with less sales and business experience. I had to learn things like go-to-market strategies and the importance of addressable markets, stuff I never thought I would consider. I just wanted to build the technology to capture and replay memories, but at some point a company has got to make money.
Speaker 2:Oh, my God, I feel that pain. It took me, I think, six months to understand what a go-to-market strategy was. Every VC was just like what is your go-to-market strategy? I was like we're going to build it and we're going to go to market and it's going to be in the market and we're going to sell it and it's going to be good.
Speaker 1:I've been guilty of giving some pretty bad answers to that in the past, but I think our latest products that are coming out are more on track with this. Input from iPhones instead of input from VR cameras, for example, means a lot more people can use our software.
Speaker 2:So what's been the most challenging part of building a company for you so far?
Speaker 1:I wish we had raised a little more money so that I could have just hired a vast team of computer vision researchers to do everything. Instead we've had to be really scrappy and work with a small team, and I'm really happy to have the folks that I have.
Speaker 2:I feel like that's good, honestly. You learn so much being scrappy at the beginning. I've had to figure out how to tell whether people are actually good at their jobs. I've been in these situations where an engineer comes along and I think, this is going to be the best engineer, they're going to change my company, and then they ask for half a million dollars to work with them, and once you get into it you start wondering: is that one person really going to change what we're doing that much right now? And why don't they personally want to take a bigger risk on this company at the stage that we're at?
Speaker 1:Yeah, it's something that you have to figure out how to do. And I think it does take some level of belief in what we're doing to either be an investor or an employee of Lifecast, because we are a moonshot in some sense. The vision is to completely upgrade the infrastructure for volumetric media, and we're a tiny company, but we have some of the secret sauce to get there.
Speaker 2:Okay, so let's switch topics to the future of AI and media.
Speaker 1:Sounds cool, okay.
Speaker 2:Where do you think it's going?
Speaker 1:Where do I think the future of AI and media is going? A combination of capturing the real world with exquisite fidelity, which is our part, and generative AI for creating video getting better and better, essentially becoming indistinguishable from video captured with a real camera in the real world, or just enabling really easy editing of existing video to add or subtract things or render it from a different point of view, or whatever you want. It's going to be so easy to make content that I'm not sure what that will do to the landscape.
Speaker 2:So everybody will be a content creator for VR if they want?
Speaker 1:Yeah, I think so. That probably means that things can be more personalized. I don't know if everyone wants that, but I think certainly some people just want to say, I want this, and then instantly it's generated for them.
Speaker 2:I think what's really interesting about this is that, at the moment, not everybody is a painter, even though it's literally very easy to paint, and not everybody is a photo editor, even though it's gotten fairly easy to edit photos, but basically everybody is a writer. So I feel like there's still a lot of room for improvement in all of these other fields, to make generating photos just as easy as it is to write, and we've kind of gotten to that point with Midjourney and DALL-E.
Speaker 1:You know, as I think about this more, what I actually see is that not everyone wants to be a creator, and even though the tools are now making it really easy for anyone to create something that's pretty good, not everyone wants to consume something that's pretty good, because there are a million things that are pretty good being generated out there every second. People really want to see the best thing that is available to them, and what all these AI tools are doing is raising the bar on the best possible output a creator can make. So that's what's going to happen: the quality bar for what gets people's attention will just go up as a result of all these technologies.
Speaker 2:Well, I guess that's true for YouTube, for content where you don't know the person, just general content from the internet. But Facebook and MySpace noticed that it's not true for content from your friends; there's other information that you might want to convey. So maybe you'd want to create social VR, I mean, we already have social VR, but a social VR editor.
Speaker 1:Absolutely. Another phenomenon that is pretty clear is that, over time, media just gets more immersive, right? Starting with photos, then video, then stereoscopic 3D video, and then it's going to be volumetric video. What is normal to consume now, photos and videos, will be like consuming text in the future. It'll just be normal to consume holograms.
Speaker 2:Yeah, is there anything that worries you about the future of AI?
Speaker 1:Oh man, I worry less about the future of AI in media and more about LLMs disrupting the world economy and stuff like that.
Speaker 2:How do you think LLMs are going to disrupt the world economy?
Speaker 1:I think that many jobs that are currently done by a human could be done by an LLM or an embodied robot, and that will pose a real challenge, because companies are going to do it whether we like it or not, and society will have to restructure to respond to that. The change seems somewhat inevitable; that's what I foresee.
Speaker 2:So you're just worried about job loss. Are you not worried about, say, the animators that would get displaced by, well, NeRF software?
Speaker 1:I think you can make the case either way. Sometimes the jobs aren't completely replaced or destroyed, they're just changed into a different job: the animator's job is no longer to create a 3D model, it's to capture a NeRF correctly and then massage it a little bit to get it to be presentable. It's kind of like how a novelist's job doesn't disappear; they're just expected to write faster and produce higher quality novels.
Speaker 2:I think this is also kind of a nonsense question, because people still do claymation. We have significantly more advanced and convenient techniques for creating animations, but people still like to animate by hand, do claymation, and use all of the existing methods.
Speaker 1:To your earlier question, though, about what really worries me: people used to ask me, what are you worried about with AI? When is it going to come and be super intelligent Terminators? And I was like, no, it'll never happen, it's so far in the future, brains are way more powerful than computers. Then I was actually surprised by the advances of LLMs in the last couple of years. So I realized that my assumptions about how fast AI was progressing were probably wrong, and I have to adjust that now and look to a future where AI is progressing really fast and it becomes difficult to predict what its capabilities are going to be, and also difficult to control it. There's obviously the alignment problem of how you control a super intelligent AI. I think it'll be a problem in the next three years, probably.
Speaker 2:I think it's really interesting; I basically thought the same thing. But I think we all looked at Moore's law and thought, okay, this just applies to personal computers, so we're going to need a personal computer to have more computational power than a human brain in order to get this sort of superintelligence. What I think very few people saw coming was that it wasn't just personal computers: we would be increasing the amount of money that we put into training and then utilizing giant supercomputers.
Speaker 1:That is an interesting point, yeah. But it isn't even necessary, right? Give it a couple more years of algorithmic development and then anyone can do the same thing.
Speaker 2:No, that's also true. Yeah, so on that note, thank you for coming on. It's been a lovely conversation, thank you.