
AI’s next leap demands a computing revolution



We stand at a technological crossroads remarkably similar to the early 2000s, when the internet’s explosive growth outpaced existing infrastructure capabilities. Just as dial-up connections couldn’t support the emerging digital economy, today’s classical computing systems are hitting fundamental limits that will constrain AI’s continued evolution. The solution lies in quantum computing – and the next five to six years will determine whether we successfully navigate this crucial transition.

The computational ceiling blocking AI advancement

Current AI systems face mathematical barriers that classical hardware cannot surmount, mirroring the bandwidth bottlenecks of early internet infrastructure. Training a large language model like GPT-3 consumes an estimated 1,300 megawatt-hours of electricity, while classical optimization problems demand exponentially increasing computational resources as they scale. Google’s recent demonstration starkly illustrates this divide: its Willow quantum processor completed in five minutes a benchmark calculation that would take classical supercomputers an estimated 10 septillion years – while consuming 30,000 times less energy.

The parallels to early 2000s telecommunications are striking. Then, streaming video, cloud computing, and e-commerce demanded faster data speeds that existing infrastructure couldn’t provide. Today, AI applications like real-time molecular simulation, financial risk optimization, and large-scale pattern recognition are pushing against the physical limits of classical computing architectures. Just as the internet required fiber optic cables and broadband infrastructure, AI’s next phase demands quantum computational capabilities.

Breakthrough momentum accelerating toward mainstream adoption

The quantum computing landscape has undergone transformative changes in 2024-2025 that signal mainstream viability. Google’s Willow chip achieved below-threshold error correction – a critical milestone where quantum systems become more accurate as they scale up. IBM’s roadmap targets 200 logical qubits by 2029, while Microsoft’s topological qubit breakthrough promises inherent error resistance. These aren’t incremental improvements; they represent fundamental advances that make practical quantum-AI systems feasible.

Industry investments reflect this transition from research to commercial reality. Quantum startups raised $2 billion in 2024, representing a 138 per cent increase from the previous year. Major corporations are backing this confidence with substantial commitments: IBM’s $30 billion quantum R&D investment, Microsoft’s quantum-ready initiative for 2025, and Google’s $5 million quantum applications prize. The market consensus projects quantum computing revenue will exceed $1 billion in 2025 and reach $28-72 billion by 2035.

Expert consensus on the five-year transformation window

Leading quantum computing experts across multiple organizations align on a remarkably consistent timeline. IBM’s CEO predicts quantum advantage demonstrations by 2026, while Google targets useful quantum computers by 2029. Quantinuum’s roadmap promises universal fault-tolerant quantum computing by 2030. IonQ projects commercial quantum advantages in machine learning by 2027. This convergence suggests the 2025-2030 period will be as pivotal for quantum computing as 1995-2000 was for internet adoption.

The technical indicators support these projections. Current quantum systems achieve 99.9 per cent gate fidelity – crossing the threshold for practical applications. Multiple companies have demonstrated quantum advantages in specific domains: JPMorgan and Amazon report reducing portfolio optimization problems by 80 per cent, while quantum-enhanced traffic optimization reportedly decreased Beijing congestion by 20 per cent. These proof-of-concept successes mirror the early internet’s transformative applications before widespread adoption.

Real-world quantum-AI applications emerging across industries

The most compelling evidence comes from actual deployments showing measurable improvements. Cleveland Clinic and IBM launched a dedicated healthcare quantum computer for protein interaction modeling in cancer research. Pfizer partnered with IBM for quantum molecular modeling in drug discovery. DHL optimized international shipping routes using quantum algorithms, reducing delivery times by 20 per cent.

These applications demonstrate quantum computing’s unique ability to solve problems that scale exponentially under classical approaches. Quantum systems process multiple possibilities simultaneously through superposition, enabling breakthrough capabilities in optimization, simulation, and machine learning that classical computers cannot replicate efficiently. The energy efficiency advantages are equally dramatic: for specific computational tasks, quantum systems consume three to four orders of magnitude less energy.

The security imperative driving quantum adoption

Beyond performance advantages, quantum computing addresses critical security challenges that will force rapid adoption. Current encryption methods protecting AI systems will become vulnerable to quantum attacks within this decade. The US government has mandated federal agencies transition to quantum-safe cryptography, while NIST released new post-quantum encryption standards in 2024. Organizations face a “harvest now, decrypt later” threat where adversaries collect encrypted data today for future quantum decryption.

This security imperative creates unavoidable pressure for quantum adoption. Satellite-based quantum communication networks are already operational, with China’s quantum network spanning 12,000 kilometers and similar projects launching globally. The intersection of quantum security and AI protection will drive widespread infrastructure upgrades in the coming years.

Preparing for the quantum era transformation

The evidence overwhelmingly suggests we’re approaching a technological inflection point where quantum computing transitions from experimental curiosity to essential infrastructure. Just as businesses that failed to adapt to internet connectivity fell behind in the early 2000s, organizations that ignore quantum computing risk losing competitive advantage in the AI-driven economy.

The quantum revolution isn’t coming; it’s here. The next five to six years will determine which organizations successfully navigate this transition and which become casualties of technological change. AI systems must be re-engineered to leverage quantum capabilities, requiring new algorithms, architectures, and approaches that blend quantum and classical computing.

This represents more than incremental improvement; it’s a fundamental paradigm shift that will reshape how we approach computation, security, and artificial intelligence. The question isn’t whether quantum computing will transform AI – it’s whether we’ll be ready for the transformation.

(Krishna Kumar is a technology explorer and strategist based in Austin, Texas, in the US. Rakshitha Reddy is an AI developer based in Atlanta, US.)




The analysis of learning investment effect for artificial intelligence English translation model based on deep neural network



Datasets collection

This experiment employs two widely recognized standard datasets in MMT: Multi30K and Microsoft Common Objects in Context (MS COCO)27,28. The Multi30K dataset comprises image-text pairs spanning various domains and is commonly used for image caption generation and multimodal translation tasks. It covers three language pairs: English to German (En-De), English to French (En-Fr), and English to Czech (En-Cs). The Multi30K training set contains 29,000 bilingual parallel sentence pairs, alongside 1,000 validation samples and 1,000 test samples. Each sentence is paired with an image to ensure consistency between the text description and the image content, providing high-quality multimodal data for model training. The test16 and test17 evaluation sets are used here. MS COCO contains a wide range of images and their descriptions and is extensively used across computer vision and NLP tasks. Beyond its established role as a standard benchmark for image captioning, its rich semantic annotations make it particularly suitable for assessing model performance in cross-domain and cross-lingual translation scenarios.
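For concreteness, a minimal Python loader for the Multi30K task-1 text files might look as follows. The directory layout and file names (e.g. train.en / train.de) are assumptions based on the public Multi30K release, not details given in the paper:

```python
from pathlib import Path

def load_multi30k(root, split="train", src="en", tgt="de"):
    """Read one Multi30K split as parallel (source, target) sentence pairs.
    Assumes the standard task-1 text files, e.g. train.en / train.de."""
    src_lines = Path(root, f"{split}.{src}").read_text(encoding="utf-8").splitlines()
    tgt_lines = Path(root, f"{split}.{tgt}").read_text(encoding="utf-8").splitlines()
    assert len(src_lines) == len(tgt_lines), "parallel files must align line by line"
    return list(zip(src_lines, tgt_lines))

pairs = load_multi30k("data/multi30k", split="train")  # 29,000 pairs per the paper
print(len(pairs), pairs[0])
```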

Experimental environment

This experiment utilizes the Fairseq toolkit built upon the PyTorch framework. Fairseq is an open-source toolkit widely used in NLP tasks, particularly for constructing and training MT models. It supports various model architectures, including RNNs, convolutional neural networks, and Transformers, enabling effective performance enhancement in MT tasks. Based on Fairseq, the experimental model framework can be easily constructed, and the corresponding training tasks can be configured. The toolkit provides efficient parallel computing support and optimized training workflows, enabling effective large-scale model training.
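As an illustration of the toolkit (not of the FACT model itself), fairseq publishes pretrained translation Transformers through torch.hub; the model name below is one of fairseq's public WMT'19 releases and merely stands in for the paper's own training setup:

```python
import torch

# Load a pretrained fairseq Transformer (public WMT'19 En-De release).
# Requires the fairseq hub extras (sacremoses, fastBPE) to be installed.
en2de = torch.hub.load(
    "pytorch/fairseq",
    "transformer.wmt19.en-de.single_model",
    tokenizer="moses",
    bpe="fastbpe",
)
en2de.eval()
print(en2de.translate("Machine translation models are trained on parallel data."))
```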

Parameters setting

Table 1 exhibits the parameter settings for the experiment.

Table 1 Experimental parameter settings.

Two evaluation metrics, Bilingual Evaluation Understudy (BLEU) and Meteor, are used to comprehensively evaluate the performance of the FACT model29,30,31. These two metrics are among the most commonly used and representative automated evaluation tools in the current field of MT research. They have been widely applied in authoritative translation evaluation tasks such as the Workshop on Machine Translation (WMT), and have good universality and reliability. BLEU measures translation quality by calculating the n-gram match between the translated text and the reference answer. Specifically, BLEU calculates the precision of n-grams in the translated text, and its equation is as follows:

$${P}_{n}=\frac{{c}_{n}}{{r}_{n}}$$

(18)

\({P}_{n}\) refers to the n-gram precision; \({c}_{n}\) represents the number of times the n-gram units in the translation match those in the reference answer; \({r}_{n}\) denotes the total number of n-gram units in the translation. The final BLEU score is the weighted geometric mean of the n-gram precisions, which can be written as:

$$BLEU=\text{exp}\left(\sum_{n=1}^{N}{\omega }_{n}\text{log}{P}_{n}\right)$$

(19)

\({\omega }_{n}\) is the weighting factor for each n-gram unit. To avoid giving overly high scores to shorter translations, BLEU introduces a brevity penalty (BP) to adjust the score. The calculation of BP reads:

$$BP=\left\{\begin{array}{ll}1,\quad & if\, c>r\\ \text{exp}\left(1-\frac{r}{c}\right),\quad & if\, c\le r\end{array}\right.$$

(20)

r and c denote the lengths of the reference and candidate translations, respectively. The final BLEU score combines the brevity penalty with the weighted geometric mean of the n-gram precisions, as follows:

$$BLEU=BP\cdot \text{exp}(\sum_{n=1}^{N}{\omega }_{n}\text{log}{P}_{n})$$

(21)

The advantages of BLEU lie in its simplicity and speed of computation, making it suitable for large-scale evaluations. However, it relies solely on lexical-level matching, neglecting linguistic features such as semantic similarity and syntactic variations. As a result, it demonstrates limited effectiveness when handling synonyms, word order changes, or translations that maintain semantic consistency but are expressed differently.
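Eqs. (18)-(21) can be made concrete with a short reference implementation. This is a minimal sketch with uniform weights \(\omega_n = 1/N\) and a floor on zero precisions; production evaluations would use a standard tool such as sacreBLEU:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All n-grams of a token list, as a multiset.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    log_p = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        c_n = sum(min(k, ref[g]) for g, k in cand.items())  # clipped matches, Eq. (18)
        r_n = max(sum(cand.values()), 1)                    # candidate n-gram count
        log_p.append(math.log(max(c_n / r_n, 1e-9)))        # floor avoids log(0)
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))      # brevity penalty, Eq. (20)
    return bp * math.exp(sum(log_p) / max_n)                # Eqs. (19)/(21), w_n = 1/N

reference = "a cat sits on the mat".split()
print(bleu("a cat sits on the mat".split(), reference))  # identical: 1.0
print(bleu("a cat sat on the mat".split(), reference))   # partial overlap: much lower
```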

In contrast to BLEU, Meteor adopts a word alignment-based evaluation method, which better considers semantic information and word order. Meteor establishes a one-to-one correspondence between the words in the candidate translation and the reference translation to calculate precision and recall. The expression is as follows:

$$P=\frac{{m}_{w}}{{M}_{hypothesis}}$$

(22)

$$R=\frac{{m}_{w}}{{N}_{reference}}$$

(23)

P represents the proportion of words in the candidate translation that match reference words; \({m}_{w}\) denotes the number of matched words; \({M}_{hypothesis}\) and \({N}_{reference}\) refer to the total number of words in the translation and the reference, respectively. R is the proportion of words in the reference that are matched by the translation. Meteor combines precision and recall into an \({F}_{\beta }\) score that gives higher weight to recall. The equation is as follows:

$${F}_{\beta }=\frac{(1+{\beta }^{2})\cdot P\cdot R}{{\beta }^{2}\cdot P+R}$$

(24)

\(\beta\) controls the weight between precision and recall. To better handle word order issues, Meteor also introduces a chunking mechanism that penalizes translations with word order mismatches, as given in Eq. (25):

$$Penalty=\frac{{C}_{hypothesis}}{{C}_{reference}}$$

(25)

\({C}_{hypothesis}\) and \({C}_{reference}\) represent the number of chunks in the translated text and the reference answer, respectively. The final Meteor score combines the F1 score with the word order penalty, and is calculated using Eq. (26):

$$Meteor \,Score={F}_{\beta }-Penalty$$

(26)

Compared to BLEU, Meteor places greater emphasis on translation fluency, semantic retention, and linguistic naturalness, and thus generally correlates more strongly with human judgments. Employing BLEU and Meteor together allows the FACT model’s translation performance to be evaluated comprehensively along two dimensions: formal accuracy and semantic acceptability. This gives a more faithful reflection of its practical effectiveness in MMT.
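The Meteor side, Eqs. (22)-(26), can be sketched the same way. Two assumptions are worth flagging: matching here is exact-word only (full Meteor also matches stems and synonyms), and the usual fragmentation form 0.5·(chunks/m)³ is substituted for Eq. (25)'s chunk-count ratio, whose chunking of hypothesis and reference the text leaves unspecified:

```python
def meteor(candidate, reference, beta=3.0):
    # Greedy one-to-one alignment of exact word matches.
    matches, used = [], [False] * len(reference)
    for i, w in enumerate(candidate):
        for j, ref_w in enumerate(reference):
            if not used[j] and w == ref_w:
                used[j] = True
                matches.append((i, j))
                break
    m = len(matches)
    if m == 0:
        return 0.0
    p = m / len(candidate)  # Eq. (22): precision over hypothesis words
    r = m / len(reference)  # Eq. (23): recall over reference words
    f_beta = (1 + beta**2) * p * r / (beta**2 * p + r)  # Eq. (24), recall-weighted
    # Chunks: maximal runs of matches contiguous in both sentences.
    chunks = 1 + sum(
        1 for (i1, j1), (i2, j2) in zip(matches, matches[1:])
        if not (i2 == i1 + 1 and j2 == j1 + 1)
    )
    penalty = 0.5 * (chunks / m) ** 3  # fragmentation penalty (assumed form)
    return f_beta - penalty            # Eq. (26): score = F_beta - penalty

print(meteor("a cat sat on the mat".split(), "a cat sits on the mat".split()))
```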

Performance evaluation

(1) Comparison of model performance

Five representative baseline models are selected for comparison to comprehensively evaluate the proposed FACT model on MNMT tasks: Transformer, Latent Multimodal Machine Translation (LMMT), Dynamic Context-Driven Capsule Network for Multimodal Machine Translation (DMMT), Target-modulated Multimodal Machine Translation (TMMT), and Imagined Representation for Multimodal Machine Translation (IMMT). The Transformer is a classic MT architecture and, as a text-only baseline, isolates the performance gains brought by multimodal mechanisms. LMMT uses latent variables to model multimodal interactions, emphasizing the semantic expressive power of image-text fusion in the latent space. DMMT introduces a dynamic context capsule mechanism to enhance semantic coupling between modalities during translation. TMMT guides visual information into the generation process through a target-modulation mechanism, improving target alignment between modalities. IMMT uses an “imagination” mechanism to generate intermediate image representations that assist semantic understanding and translation generation. All of these are representative methods from recent MNMT research and thus offer strong comparability.

Large multimodal language models such as Generative Pre-trained Transformer 4 omni (GPT-4o) and Large Language and Vision Assistant (LLaVA) are not included, for three reasons: (1) they are closed-source or commercialized, making fair comparison under unified datasets and parameter configurations difficult; (2) their training data and computing resources far exceed those accessible to the FACT model, rendering direct comparison infeasible; and (3) FACT prioritizes structural lightness, training efficiency, and language learning adaptability over scale advantages. Restricting the horizontal comparison to openly documented, representative multimodal translation models ensures fairness under unified datasets and parameter configurations, and allows more objective validation of FACT’s advantages in semantic consistency modeling and future context guidance.

The BLEU and Meteor evaluation results of each model on the En-De translation task are depicted in Fig. 3. To verify the statistical reliability of the observed advantage, a paired significance test is conducted between FACT and each baseline; the results are outlined in Table 2.

Fig. 3 Comparison of different models on the En-De translation task.

Table 2 Significance test.

In Fig. 3, the proposed FACT model outperforms all comparative models in both BLEU and Meteor. On the En-De task it achieves BLEU scores of 41.3, 32.8, and 29.6 on the test16, test17, and MS COCO datasets, respectively, significantly higher than the baselines, and Meteor scores of 58.1, 52.6, and 49.6, again the best results. Although each model’s performance varies across datasets, FACT consistently leads on both metrics, demonstrating its advantages in multimodal machine translation. Combined with Table 2, the p values of FACT against Transformer, LMMT, and DMMT are all below 0.005, indicating highly significant differences; the p values against TMMT and IMMT are 0.015 and 0.028, still below the conventional 0.05 significance level. FACT’s performance advantages are therefore statistically significant, confirming its effectiveness and advancement in MNMT.

These gains stem from two key innovations in FACT’s structural design and modeling strategy. First, for future context modeling, FACT uses an attention-based future-information guidance module to explicitly model the interaction among future target-side words, the current source sentence, and visual features, improving the directionality and contextual coherence of generation, an aspect not systematically addressed by existing models. Second, for multimodal consistency, FACT constructs a consistency loss that aligns the semantic-space projections of images and texts, strengthening collaborative expression between the visual and language modalities and improving the robustness and generalization of image-text semantic fusion. The two mechanisms complement each other, letting FACT lead existing models in the granularity of information modeling and the depth of semantic alignment, and hence in evaluation metrics such as BLEU and Meteor.
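For reference, the paired test reported in Table 2 can be reproduced in outline with scipy. The FACT scores below are the En-De values quoted above; the baseline column is illustrative only, standing in for one baseline's actual per-dataset results:

```python
from scipy import stats

# FACT's En-De BLEU and Meteor scores on test16 / test17 / MS COCO (from Fig. 3).
fact     = [41.3, 32.8, 29.6, 58.1, 52.6, 49.6]
# Hypothetical baseline scores; substitute the paper's per-baseline numbers.
baseline = [39.0, 31.1, 27.9, 56.2, 50.4, 47.5]

t_stat, p_value = stats.ttest_rel(fact, baseline)  # paired t-test
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```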

(2) Ablation experiment

Ablation experiments are conducted on variants of the FACT model to explore how it integrates visual features to enhance translation performance. Table 3 lists the model variants.

Table 3 Names and descriptions of model variants.

Figure 4 demonstrates the results of ablation experiments on the En-De translation task, including BLEU and Meteor scores for the FACT model, three variant models, and the Transformer model. The “Transformer” in Fig. 4 is a pure text model without any image information or consistency modeling, serving as a baseline control.

Fig. 4 Ablation experiment results on En-De translation task.

Figure 4 reveals that, for the En-De translation task, the BLEU and Meteor scores of the FACT model decrease when either the future target context information supervision function \({L}_{fd}\) or the multimodal consistency loss function \({L}_{md}\) is removed. When both \({L}_{fd}\) and \({L}_{md}\) are removed, the FACT model’s performance drops the most, yet it still outperforms the Transformer model. Specifically, the BLEU scores decline by 6.05%, 8.23%, and 9.46% on the test16, test17, and MS COCO datasets, respectively, and the Meteor scores decrease by 4.3%, 5.7%, and 7.86%. These results indicate that the future target context information and the multimodal consistency loss both contribute substantially to the FACT model’s translation performance.

Ablation experiments are also performed on the En-Fr and En-Cs translation tasks to verify the FACT model’s generalization ability. Figures 5 and 6 show the results.

Fig. 5 Ablation experiment results on En-Fr translation task.

Fig. 6 Ablation experiment results on En-Cs translation task.

The En-Fr results exhibit a similar pattern to the En-De findings. When both the future target context information supervision function \({L}_{fd}\) and the multimodal consistency loss function \({L}_{md}\) are deactivated, the FACT model achieves BLEU scores of 60.1, 53.0, and 43.8, and Meteor scores of 74.8, 70.1, and 63.7, on the test16, test17, and MS COCO datasets, respectively. These scores all remain higher than those of the Transformer model.

Figure 6 shows that the En-Cs results on the test16 dataset are consistent with those of the En-De and En-Fr tasks. When the future target context information supervision function \({L}_{fd}\) and the multimodal consistency loss function \({L}_{md}\) are removed, the FACT model achieves BLEU and Meteor scores of 31.7 and 51.8, both exceeding the Transformer’s. The En-Fr and En-Cs results further confirm that the FACT model can leverage multimodal consistency to learn future target context information, thereby enhancing MMT performance.

(3) Impact of sentence length on model performance

The generated sentence lengths and BLEU scores of the FACT and Transformer models on the En-De translation task, across the test16 and test17 datasets, are compared under varying source-sentence lengths. Figure 7 presents the results.

Fig. 7 Performance comparison of models at different source sentence lengths.

Figure 7 shows that as source-sentence length increases, the FACT model’s advantage in translation quality over the Transformer grows. In the En-De task, FACT achieves a BLEU score of 44.1 for short sentences (0–10 words), outperforming the Transformer’s 41.0; the generated translations are correspondingly short, averaging 8.4 words for FACT and 8.2 for the Transformer. As source sentences grow longer, FACT’s quality advantage becomes more pronounced, and its output lengths adapt to the input, yielding more reasonable translation lengths for long sentences. These findings indicate that the FACT model predicts future context more effectively in long-sentence translation, thereby improving translation quality.

(4) Impact of model on learning investment effect

To explore the effectiveness of the FACT model, experiments are conducted to evaluate its application in language learning. Figure 8 compares the learning process quality, learning efficiency, and learning outcomes between FACT and Transformer models.

Fig. 8 Comparison of model impact on learning investment effect.

Figure 8 suggests that the FACT model exhibits a distinct advantage over the Transformer model in language learning tasks. Specifically, it outperforms Transformer across multiple metrics, including learning efficiency, translation quality, user satisfaction, and understanding improvement. The learning efficiency of FACT is 83.2 words per hour, compared to 74.6 words per hour for the Transformer, highlighting FACT’s potential to accelerate the learning process. Additionally, FACT achieves a translation quality score of 82.7, higher than the Transformer’s 78.9, indicating its superior performance in translation quality. It also scores higher in both user satisfaction and understanding improvement. Overall, the FACT model offers higher efficiency and better learning outcomes in language learning tasks, demonstrating significant application potential.




Studying a galaxy far, far away could become easier with help from AI, says researcher



Youssef Zaazou graduated with a master of science from Memorial University of Newfoundland in 2025. (Memorial University/Richard Blenkinsopp)

A recent Memorial University of Newfoundland graduate says his research may help scientists study galaxies more efficiently, with help from artificial intelligence.

As part of his master of science, Youssef Zaazou developed an AI-based image-processing technique that generates predictions of what certain galaxies may look like in a given wavelength of light.

“Think of it as translating galaxy images across different wavelengths of light,” Zaazou told CBC News over email.

He did this by researching past methods for similar tasks, adapting current AI tools for his specific purposes, finding and curating the right dataset to train the models, along with plenty of trial and error.

“Instead of … having to look at an entire region of sky, we can get predictions for certain regions and figure out, ‘Oh this might be interesting to look at,'” said Zaazou. “So we can then prioritize how we use our telescope resources.”

An excerpt from Zaazou’s research showing green-light inputs to the model, the model’s outputs in red light, the true red-light values the model aims to replicate, and the difference between rows two and three. (Submitted by Youssef Zaazou)

Zaazou recently teamed up with his supervisors Terrence Tricco and Alex Bihlo to co-author a paper on his research in The Astrophysical Journal, which is published by The American Astronomical Society.

Tricco says this research could also help justify the allocation of time on high-demand telescopes like the Hubble Space Telescope, which assigns use through a competitive process.

A future for AI in astronomy

Both Tricco and Zaazou emphasized that the research does not use AI to replace current methods but to augment them.

Tricco says that Zaazou’s findings have the potential to help guide future telescope development, and predict what astronomers might expect to see, making for more efficient exploration.

Calling The Astrophysical Journal the “gold standard” for astronomy journals in the world, Tricco hopes the wider astronomical community will take notice of Zaazou’s findings.

“We want to have them be aware of this because as I was mentioning, AI, machine learning, and physics, astronomy, it’s still very new for physicists and for astronomers, and they’re a little bit hesitant about these tools,” said Tricco.

Terrence Tricco, an assistant professor in MUN’s Department of Computer Science, says Zaazou’s findings have the potential to help guide future telescope development. (Submitted by Terrence Tricco)

Tricco praised the growing presence of space research in general at Memorial University.

“We are here, we’re doing great research,” he said.

He added that growing AI expertise is also transferable to other disciplines.

“I think that builds into our just tech ecosystem here as well.”

‘Only the beginning’

Though Zaazou’s time as a Memorial University student is over, he hopes to see research in this area continue to grow.

“I’m hoping this is the beginning of further research to be done,” he said.

Though Zaazou described his contribution to the field as merely a “pebble,” he’s happy to have been able to do his part.

“I’m an astronomer. And it just feels great to be able to say that and to be able to have that little contribution because I just love the field and I’m fascinated by everything out there,” said Zaazou.





‘You can make really good stuff – fast’: new AI tools a gamechanger for film-makers



A US stealth bomber flies across a darkening sky towards Iran. Meanwhile, in Tehran a solitary woman feeds stray cats amid rubble from recent Israeli airstrikes.

To the uninitiated viewer, this could be a cinematic retelling of a geopolitical crisis that unfolded barely weeks ago – hastily shot on location, somewhere in the Middle East.

However, despite its polished production look, it wasn’t shot anywhere, there is no location, and the woman feeding stray cats is no actor – she doesn’t exist.

Midnight Drop, an AI film depicting US-Israeli bombings in Iran

The engrossing footage is the “rough cut” of a 12-minute short film about last month’s US attack on Iranian nuclear sites, made by the directors Samir Mallal and Bouha Kazmi. It is also made entirely by artificial intelligence.

The clip is based on a detail the film-makers read in news coverage of the US bombings – a woman who walked the empty streets of Tehran feeding stray cats. Armed with the information, they have been able to make a sequence that looks as if it could have been created by a Hollywood director.

The impressive speed and, for some, worrying ease with which films of this kind can be made has not been lost on broadcasting experts.

Last week Richard Osman, the TV producer and bestselling author, said that an era of entertainment industry history had ended and a new one had begun – all because Google has released a new AI video making tool used by Mallal and others.

A still from Midnight Drop, showing the woman who feeds stray cats in Tehran in the dead of night. Photograph: Oneday Studios

“So I saw this thing and I thought, ‘well, OK that’s the end of one part of entertainment history and the beginning of another’,” he said on The Rest is Entertainment podcast.

Osman added: “TikTok, ads, trailers – anything like that – I will say will be majority AI-assisted by 2027.”

For Mallal, an award-winning London-based documentary maker who has made adverts for Samsung and Coca-Cola, AI has provided a new format – “cinematic news”.

The Tehran film, called Midnight Drop, is a follow-up to Spiders in the Sky, a recreation of a Ukrainian drone attack on Russian bombers in June.

Within two weeks, Mallal, who directed Spiders in the Sky on his own, was able to make a film about the Ukraine attack that would have cost millions – and would have taken at least two years including development – to make pre-AI.

“Using AI, it should be possible to make things that we’ve never seen before,” he said. “We’ve never seen a cinematic news piece before turned around in two weeks. We’ve never seen a thriller based on the news made in two weeks.”

Spiders in the Sky was largely made with Veo3, an AI video generation model developed by Google, and other AI tools. The voiceover, script and music were not created by AI, although ChatGPT helped Mallal edit a lengthy interview with a drone operator that formed the film’s narrative spine.

Film-maker recreates Ukrainian drone attack on Russia using AI in Spiders in the Sky

Google’s film-making tool, Flow, is powered by Veo3. It also creates speech, sound effects and background noise. Since its release in May, the impact of the tool on YouTube – also owned by Google – and social media in general has been marked. As Marina Hyde, Osman’s podcast partner, said last week: “The proliferation is extraordinary.”

Quite a lot of it is “slop” – the term for AI-generated nonsense – although the Olympic diving dogs have a compelling quality.

Mallal and Kazmi aim to complete the film in August. It will intercut the Iranian woman’s story with the stealth bomber mission and run six times the length of Spiders in the Sky’s two minutes. It is being made with a mix of models including Veo3, OpenAI’s Sora and Midjourney.

“I’m trying to prove a point,” says Mallal. “Which is that you can make really good stuff at a high level – but fast, at the speed of culture. Hollywood, especially, moves incredibly slowly.”


Spiders in the Sky, an AI film directed by Samir Mallal, tells the story of Ukraine’s drone attacks on Russian airfields. Photograph: Oneday Studios

He adds: “The creative process is all about making bad stuff to get to the good stuff. We have the best bad ideas faster. But the process is accelerated with AI.”

Mallal and Kazmi also recently made Atlas, Interrupted, a short film about the 3I/ATLAS comet, another recent news event; the film has appeared on the BBC.

David Jones, the chief executive of Brandtech Group, an advertising startup using generative AI – the term for tools such as chatbots and video generators – to create marketing campaigns, says the advertising world is about to undergo a revolution due to models such as Veo3.

“Today, less than 1% of all brand content is created using gen AI. It will be 100% that is fully or partly created using gen AI,” he says.

Netflix also revealed last week that it used AI in one of its TV shows for the first time.

A Ukrainian drone homes in on its target in Spiders in the Sky. Photograph: Oneday Studios

However, in the background of this latest surge in AI-spurred creativity lies the issue of copyright. In the UK, the creative industries are furious about government proposals to let models be trained on copyright-protected work without seeking the owner’s permission – unless the owner opts out of the process.

Mallal says he wants to see a “broadly accessible and easy-to-use programme where artists are compensated for their work”.

Beeban Kidron, a cross-bench peer and leading campaigner against the government proposals, says AI film-making tools are “fantastic” but “at what point are they going to realise that these tools are literally built on the work of creators?” She adds: “Creators need equity in the new system or we lose something precious.”

YouTube says its terms and conditions allow Google to use creators’ work for making AI models – and denies that all of YouTube’s inventory has been used to train its models.

Mallal calls his use of AI to make films “prompt craft”, a phrase that uses the term for giving instructions to AI systems. When making the Ukraine film, he says he was amazed at how quickly a camera angle or lighting tone could be adjusted with a few taps on a keyboard.

“I’m deep into AI. I’ve learned how to prompt engineer. I’ve learned how to translate my skills as a director into prompting. But I’ve never produced anything creative from that. Then Veo3 comes out, and I said, ‘OK, finally, we’re here.’”


