‘Sooner or later…’: Paytm founder sees impact of AI on jobs as ‘inevitable’; says artificial intelligence will also create new roles

The impact of artificial intelligence (AI) on jobs is "inevitable", according to Paytm founder Vijay Shekhar Sharma, who said AI will become part of routine business processes sooner rather than later. "Sooner or later we will have to start using AI as an employee or even as a CFO," Sharma said at an AI-focused event in New Delhi. He said AI will eventually perform most tasks currently done by humans, and stressed the need to build core products instead. Sharma also outlined a broader vision for the company: to move beyond its fintech roots and become an AI-first firm, according to an ET report. Although AI will automate several human functions, Sharma said it will also create fresh roles in the workforce.

The remarks come at a time when Paytm's parent, One97 Communications, is undergoing job cuts. ET reported last month that the company had laid off an unspecified number of employees. The overall sales team headcount dropped by about 3,500 in the March 2024 quarter, taking the total to 36,521. This decline was largely linked to the Reserve Bank of India's restrictions on Paytm Payments Bank operations.

In the March quarter, Paytm posted a consolidated net loss of Rs 540 crore, compared with Rs 550 crore in the year-ago period.

In June, Paytm stood third in the UPI rankings with 1.27 billion transactions worth Rs 1.34 lakh crore, accounting for 6.9% of total volume and 5.6% of value. PhonePe remained in the lead with 8.55 billion transactions valued at Rs 11.99 lakh crore.

As part of its AI initiatives, Sharma announced a new passbook feature that would use Paytm's data to generate a rap song summarising monthly expenses. Though no launch date has been shared, Sharma said it would be made available to users soon. Paytm had earlier partnered with US-based AI startup Perplexity to introduce AI-powered search capabilities in its app.





Artificial Intelligence on the Battlefield in 2025 – The Jerusalem Post




The analysis of learning investment effect for artificial intelligence English translation model based on deep neural network

Datasets collection

This experiment employs two widely recognized standard datasets in MMT: Multi30K and Microsoft Common Objects in Context (MS COCO)27,28. The Multi30K dataset comprises image-text pairs spanning various domains and is commonly used for image caption generation and multimodal translation tasks. It covers three language pairs: English to German (En-De), English to French (En-Fr), and English to Czech (En-Cs). The Multi30K training set contains 29,000 bilingual parallel sentence pairs, with 1,000 validation samples and 1,000 test samples. Each sentence is paired with an image to ensure consistency between the text description and the image content, providing high-quality multimodal data for model training. The test16 and test17 test sets are used here. MS COCO is a dataset of diverse images and their descriptions, used extensively across computer vision and NLP tasks. Beyond its established role as a standard benchmark for image captioning, its rich semantic annotations make it particularly suitable for assessing model performance in cross-domain and cross-lingual translation scenarios.
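For concreteness, the sketch below shows one plausible way to wrap a Multi30K-style split, pairing each image with a bilingual sentence pair. File names and the field layout are illustrative assumptions, not the loader actually used in the experiments.

from dataclasses import dataclass

@dataclass
class MultimodalExample:
    image_path: str  # image paired with the sentence pair
    src: str         # English source sentence
    tgt: str         # German, French, or Czech target sentence

def load_split(src_file, tgt_file, img_index):
    # One example per line in each file; lines are aligned across files.
    with open(src_file, encoding="utf-8") as fs, \
         open(tgt_file, encoding="utf-8") as ft, \
         open(img_index, encoding="utf-8") as fi:
        return [MultimodalExample(i.strip(), s.strip(), t.strip())
                for i, s, t in zip(fi, fs, ft)]

# Expected sizes per the splits above: 29,000 train, 1,000 validation,
# and 1,000 test examples, e.g.:
# train = load_split("train.en", "train.de", "train_images.txt")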

Experimental environment

This experiment uses the Fairseq toolkit, built on the PyTorch framework. Fairseq is an open-source toolkit widely used in NLP tasks, particularly for constructing and training MT models. It supports various model architectures, including RNNs, convolutional neural networks, and Transformers, and performs strongly on MT tasks. With Fairseq, the experimental model framework can be constructed easily and the corresponding training tasks configured. The toolkit provides efficient parallel-computing support and optimized training workflows, enabling effective large-scale model training.

Parameters setting

Table 1 presents the parameter settings for the experiment.

Table 1 Experimental parameter settings.

Two evaluation metrics, Bilingual Evaluation Understudy (BLEU) and Meteor, are used to comprehensively evaluate the performance of the FACT model29,30,31. These two metrics are among the most commonly used and representative automated evaluation tools in MT research; they are widely applied in authoritative translation evaluation campaigns such as the Workshop on Machine Translation (WMT) and are both general and reliable. BLEU measures translation quality by calculating the n-gram overlap between the translated text and the reference answer. Specifically, BLEU calculates the precision of n-grams in the translated text, and its equation is as follows:

$${P}_{n}=\frac{{c}_{n}}{{r}_{n}}$$

(18)

\({P}_{n}\) refers to the n-gram precision; \({c}_{n}\) represents the number of times the n-gram units in the translation match those in the reference answer; \({r}_{n}\) denotes the total number of n-gram units in the translation. The final BLEU score of the translation is the weighted average of the precision for each n-gram unit, which can be written as:

$$BLEU=\text{exp}\left(\sum_{n=1}^{N}{\omega }_{n}\text{log}{P}_{n}\right)$$

(19)

\({\omega }_{n}\) is the weighting factor for each n-gram unit. To avoid giving overly high scores to shorter translations, BLEU introduces a brevity penalty (BP) to adjust the score. The calculation of BP reads:

$$BP=\left\{\begin{array}{ll}1,&\quad if\;c>r\\ \text{exp}\left(1-\frac{r}{c}\right),&\quad if\;c\le r\end{array}\right.$$

(20)

r and c denote the lengths of the reference and candidate translations, respectively. The final BLEU score combines the brevity penalty with the weighted average of the n-gram precisions, as follows:

$$BLEU=BP\cdot \text{exp}(\sum_{n=1}^{N}{\omega }_{n}\text{log}{P}_{n})$$

(21)

The advantages of BLEU lie in its simplicity and speed of computation, making it suitable for large-scale evaluations. However, it relies solely on lexical-level matching, neglecting linguistic features such as semantic similarity and syntactic variations. As a result, it demonstrates limited effectiveness when handling synonyms, word order changes, or translations that maintain semantic consistency but are expressed differently.
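As a concrete illustration, the following minimal Python sketch implements Eqs. (18)-(21) with uniform weights \({\omega }_{n}=1/N\) and no smoothing. It is a toy rendition for clarity, not the evaluation script used in the experiments, which would typically rely on a standard scorer such as the one bundled with Fairseq.

import math
from collections import Counter

def ngram_counts(tokens, n):
    # Multiset of n-grams in a token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ng, ref_ng = ngram_counts(cand, n), ngram_counts(ref, n)
        total = sum(cand_ng.values())               # r_n in Eq. (18)
        matched = sum((cand_ng & ref_ng).values())  # c_n, with clipped counts
        if matched == 0:
            return 0.0  # toy simplification: no smoothing for zero matches
        precisions.append(matched / total)          # P_n = c_n / r_n
    r, c = len(ref), len(cand)
    bp = 1.0 if c > r else math.exp(1 - r / c)      # brevity penalty, Eq. (20)
    # Eq. (21): BP times the weighted geometric mean of the n-gram precisions.
    return bp * math.exp(sum(math.log(p) / max_n for p in precisions))

score = bleu("the quick brown fox jumps over the lazy dog today",
             "the quick brown fox jumped over the lazy dog today")
print(f"BLEU = {score:.3f}")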

In contrast to BLEU, Meteor adopts a word alignment-based evaluation method, which better considers semantic information and word order. Meteor establishes a one-to-one correspondence between the words in the candidate translation and the reference translation to calculate precision and recall. The expression is as follows:

$$P=\frac{{m}_{w}}{{M}_{hypothesis}}$$

(22)

$$R=\frac{{m}_{w}}{{N}_{reference}}$$

(23)

P represents the proportion of words in the translation that match words in the reference; \({m}_{w}\) denotes the number of matched words; \({M}_{hypothesis}\) and \({N}_{reference}\) refer to the total number of words in the translation and the reference, respectively. R denotes the proportion of words in the reference that match words in the translation. Meteor then combines precision and recall into an F-score \({F}_{\beta }\), giving higher weight to recall. The equation is as follows:

$${F}_{\beta }=\frac{(1+{\beta }^{2})\cdot P\cdot R}{{\beta }^{2}\cdot P+R}$$

(24)

\(\beta\) controls the weight between precision and recall. To better handle word order issues, Meteor also introduces a chunking mechanism that penalizes translations with word order mismatches, as given in Eq. (25):

$$Penalty=\frac{{C}_{hypothesis}}{{C}_{reference}}$$

(25)

\({C}_{hypothesis}\) and \({C}_{reference}\) represent the number of chunks in the translated text and the reference answer, respectively. The final Meteor score combines the \({F}_{\beta }\) score with the word order penalty, and is calculated using Eq. (26):

$$Meteor \,Score={F}_{\beta }-Penalty$$

(26)

Compared to BLEU, Meteor places greater emphasis on translation fluency, semantic retention, and linguistic naturalness, and thus generally correlates more strongly with human judgments. Employing BLEU and Meteor together allows the FACT model's translation performance to be evaluated comprehensively along two dimensions, formal accuracy and semantic acceptability, giving a more faithful picture of its practical effectiveness in MMT.
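In the same spirit, the sketch below computes an exact-match Meteor-style score following Eqs. (22)-(24). Stemming and synonym matching are omitted here, and the fragmentation penalty uses the standard Meteor form 0.5 * (chunks / matches)^3 with Score = F_beta * (1 - Penalty), which is named plainly as a variant of the simplified ratio in Eqs. (25)-(26).

def meteor(candidate, reference, beta2=9.0):
    hyp, ref = candidate.split(), reference.split()
    # Greedy one-to-one exact-word alignment: hyp position -> ref position.
    # (Full Meteor instead searches for the alignment minimizing chunks.)
    used, alignment = set(), {}
    for i, w in enumerate(hyp):
        for j, r in enumerate(ref):
            if j not in used and r == w:
                alignment[i] = j
                used.add(j)
                break
    m = len(alignment)                             # m_w, matched words
    if m == 0:
        return 0.0
    p = m / len(hyp)                               # Eq. (22): precision
    rec = m / len(ref)                             # Eq. (23): recall
    f = (1 + beta2) * p * rec / (beta2 * p + rec)  # Eq. (24), recall-weighted
    # Chunks: maximal runs of matches contiguous in both sentences.
    chunks, prev = 0, None
    for i in sorted(alignment):
        if prev is None or i != prev[0] + 1 or alignment[i] != prev[1] + 1:
            chunks += 1
        prev = (i, alignment[i])
    penalty = 0.5 * (chunks / m) ** 3
    return f * (1 - penalty)

print(f"Meteor = {meteor('the cat sat on the mat', 'a cat sat on the mat'):.3f}")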

Performance evaluation

(1) Comparison of model performance

Five representative baseline models are selected for comparison to comprehensively evaluate the performance of the proposed FACT model in MNMT tasks: Transformer, Latent Multimodal Machine Translation (LMMT), Dynamic Context-Driven Capsule Network for Multimodal Machine Translation (DMMT), Target-modulated Multimodal Machine Translation (TMMT), and Imagined Representation for Multimodal Machine Translation (IMMT). Among them, Transformer is a classic MT architecture and, as a text-only baseline, verifies the performance gains brought by the multimodal mechanisms. LMMT uses latent variables to model multimodal interactions, emphasizing the semantic expressive power of image-text fusion in the latent space. DMMT introduces a dynamic context capsule mechanism to enhance semantic coupling between modalities during translation. TMMT guides visual information into the generation process through a target-modulation mechanism, improving target alignment between modalities. IMMT uses an "imagination" mechanism to generate intermediate image representations that assist semantic understanding and translation generation. All of these are representative methods in recent MNMT research.

Large multimodal language models such as Generative Pre-trained Transformer 4 omni (GPT-4o) and Large Language and Vision Assistant (LLaVA) are not included, for three reasons. (1) These models are closed-source or commercial, making fair comparison under unified datasets and parameter configurations difficult. (2) Their training data and computing resources far exceed those accessible to the FACT model, so direct comparison would not be meaningful. (3) FACT prioritizes structural lightness, training efficiency, and language-learning adaptability over scale advantages. Restricting the comparison to openly documented, representative multimodal translation models ensures fairness under unified datasets and parameter configurations, and thus more objectively validates FACT's advantages in semantic consistency modeling and future-context guidance.

The BLEU and Meteor evaluation results of each model on the En-De translation task are depicted in Fig. 3. To verify the statistical reliability of the observed advantage, a paired significance test is conducted between FACT and each baseline model; the results are outlined in Table 2.

Fig. 3 Comparison of different models on the En-De translation task.

Table 2 Significance test.
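The paper does not spell out which paired test underlies Table 2, so the snippet below is only a hedged illustration using a paired t-test over hypothetical per-sentence scores of two systems; with SciPy this is a one-liner.

from scipy import stats

# Hypothetical per-sentence BLEU scores for FACT and one baseline system.
fact_scores = [41.2, 35.7, 28.9, 44.0, 31.5, 38.2]
base_scores = [39.8, 34.1, 28.5, 41.9, 30.2, 36.7]

t_stat, p_value = stats.ttest_rel(fact_scores, base_scores)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # p < 0.05: significant difference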

In Fig. 3, the proposed FACT model outperforms all comparison models in both BLEU and Meteor. On the En-De translation task, it achieves BLEU scores of 41.3, 32.8, and 29.6 on the test16, test17, and MS COCO datasets, respectively, all markedly higher than the baselines. Its Meteor scores of 58.1, 52.6, and 49.6 likewise exceed those of the other models. Although every model's performance varies across datasets, FACT consistently leads on both metrics, demonstrating its advantage in multimodal machine translation.

Combined with Table 2, the p values of FACT against Transformer, LMMT, and DMMT are all below 0.005, indicating highly significant performance differences; the p values against TMMT and IMMT are 0.015 and 0.028, respectively, both below the conventional 0.05 significance level. The statistical results thus confirm that FACT's advantage over all comparison methods is statistically significant, supporting its effectiveness and advancement in MNMT.

These gains stem from two key innovations in structural design and modeling strategy relative to the baselines. First, in future-context modeling, FACT uses an attention-based future-information guidance module to explicitly model the interaction among future target-side words, the current source sentence, and visual features, optimizing the directionality and contextual coherence of generation; this has not been systematically addressed by existing models. Second, in multimodal consistency, FACT introduces a consistency loss that aligns the semantic-space projections of images and texts, strengthening collaborative expression between the visual and language modalities and improving the robustness and generalization of image-text semantic fusion. The two mechanisms complement each other, allowing FACT to surpass existing models in the granularity of information modeling and the depth of semantic alignment, and hence to lead clearly on BLEU and Meteor.
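The exact form of FACT's consistency loss is not given here, so the following PyTorch sketch shows one plausible realization under stated assumptions: image and text features are projected into a shared space and their cosine dissimilarity is penalized. The feature dimensions and the cosine formulation are illustrative, not the paper's definition.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConsistencyLoss(nn.Module):
    """One plausible L_md: align image and text projections in a shared space."""
    def __init__(self, img_dim=2048, txt_dim=512, shared_dim=256):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, shared_dim)  # image -> shared space
        self.txt_proj = nn.Linear(txt_dim, shared_dim)  # text  -> shared space

    def forward(self, img_feat, txt_feat):
        z_img = F.normalize(self.img_proj(img_feat), dim=-1)
        z_txt = F.normalize(self.txt_proj(txt_feat), dim=-1)
        # 1 - cosine similarity, averaged over the batch.
        return (1.0 - (z_img * z_txt).sum(dim=-1)).mean()

loss_md = ConsistencyLoss()
img, txt = torch.randn(8, 2048), torch.randn(8, 512)
# In training, L_md would be added to the translation loss with a weight;
# setting that weight to zero recovers the "without L_md" ablation variant below.
print(loss_md(img, txt).item())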

(2) Ablation experiment

Ablation experiments are conducted on variants of the FACT model to explore how the model integrates visual features to enhance translation performance. Table 3 lists the model variants.

Table 3 Names and descriptions of model variants.

Figure 4 demonstrates the results of ablation experiments on the En-De translation task, including BLEU and Meteor scores for the FACT model, three variant models, and the Transformer model. The “Transformer” in Fig. 4 is a pure text model without any image information or consistency modeling, serving as a baseline control.

Fig. 4 Ablation experiment results on the En-De translation task.

Figure 4 reveals that, for the En-De translation task, the BLEU and Meteor scores of the FACT model decrease when either the future target context information supervision function \({L}_{fd}\) or the multimodal consistency loss function \({L}_{md}\) is removed. When both \({L}_{fd}\) and \({L}_{md}\) are removed, the FACT model's performance drops the most, yet it still outperforms the Transformer model. Specifically, the BLEU scores decline by 6.05%, 8.23%, and 9.46% on the test16, test17, and MS COCO datasets, respectively, and the Meteor scores decrease by 4.3%, 5.7%, and 7.86%. These results indicate that both the future target context information and the multimodal consistency loss contribute substantially to the FACT model's translation performance.

Ablation experiments are also performed on the En-Fr and En-Cs translation tasks to verify the FACT model’s generalization ability. Figures 5 and 6 show the results.

Fig. 5 Ablation experiment results on the En-Fr translation task.

Fig. 6 Ablation experiment results on the En-Cs translation task.

The En-Fr results exhibit a pattern similar to the En-De findings. When both the future target context information supervision function \({L}_{fd}\) and the multimodal consistency loss function \({L}_{md}\) are deactivated, the FACT model achieves BLEU scores of 60.1, 53.0, and 43.8, and Meteor scores of 74.8, 70.1, and 63.7 on the test16, test17, and MS COCO datasets, respectively. These scores all remain higher than those of the Transformer model.

Figure 6 shows that the results of the En-Cs translation task on the test2016 dataset are consistent with those of the En-De and En-Fr translation tasks. When the future target context information supervision function \({L}_{fd}\) and the multimodal consistency loss function \({L}_{md}\) are removed, the FACT model achieves BLEU and Meteor scores of 31.7 and 51.8, both exceeding those of the Transformer model. The results from En-Fr and En-Cs translation tasks further confirm that the FACT model can leverage multimodal consistency to learn future target context information, thus enhancing the performance of MMT.

(3) Impact of sentence length on model performance

The FACT and Transformer models are compared on the En-De translation task, over the test16 and test17 datasets, in terms of generated sentence length and BLEU score at varying source-language sentence lengths. Figure 7 presents the results.

Fig. 7 Performance comparison of models at different source sentence lengths.

Figure 7 shows that as the source sentence grows longer, the FACT model holds a significant advantage in translation quality over the Transformer model. In the En-De translation task, the FACT model achieves a BLEU score of 44.1 on short sentences (0-10 words), outperforming the Transformer's 41.0; the generated translations are correspondingly short, with an average length of 8.4 for FACT versus 8.2 for Transformer. As source-sentence length grows, FACT's quality advantage becomes even more pronounced, and its output lengths adapt to the longer inputs, producing more reasonable translation lengths for long sentences. This indicates strong handling of long sentences: the FACT model can more effectively predict future context in long-sentence translation tasks, thereby improving translation quality.
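The bucketed comparison behind Fig. 7 can be sketched as follows; the bucket width and per-sentence scores here are illustrative placeholders, not the paper's data.

from collections import defaultdict

def length_bucket(n_words, width=10):
    # Map a word count to its bucket, e.g. (1, 10), (11, 20), ...
    lo = (n_words - 1) // width * width
    return (lo + 1, lo + width)

# (source sentence, per-sentence BLEU) pairs; the scores are placeholders.
samples = [
    ("a short source sentence", 44.1),
    ("another fairly short source sentence here", 43.2),
    ("a considerably longer source sentence that has quite a few more words in it", 39.5),
]

by_bucket = defaultdict(list)
for src, score in samples:
    by_bucket[length_bucket(len(src.split()))].append(score)

for (lo, hi), scores in sorted(by_bucket.items()):
    print(f"{lo}-{hi} words: mean BLEU {sum(scores) / len(scores):.1f}")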

(4) Impact of model on learning investment effect

To explore the effectiveness of the FACT model, experiments are conducted to evaluate its application in language learning. Figure 8 compares the learning process quality, learning efficiency, and learning outcomes between FACT and Transformer models.

Fig. 8 Comparison of model impact on learning investment effect.

Figure 8 suggests that the FACT model exhibits a distinct advantage over the Transformer model in language learning tasks. Specifically, it outperforms Transformer across multiple metrics, including learning efficiency, translation quality, user satisfaction, and understanding improvement. The learning efficiency of FACT is 83.2 words per hour, compared to 74.6 words per hour for the Transformer, highlighting FACT’s potential to accelerate the learning process. Additionally, FACT achieves a translation quality score of 82.7, higher than the Transformer’s 78.9, indicating its superior performance in translation quality. It also scores higher in both user satisfaction and understanding improvement. Overall, the FACT model offers higher efficiency and better learning outcomes in language learning tasks, demonstrating significant application potential.



Artificial Intelligence topic of chamber discussion (VIDEO)

MONTICELLO – The Sullivan County Chamber of Commerce, in partnership with the Orthodox Jewish Chamber of Commerce, the Brooklyn Chamber of Commerce, and the Greater New York Chamber of Commerce, is hosting a cross-chamber mixer and panel to explore the application of AI in business on Wednesday, July 30th.

The event, themed “Keeping Your Business Current in the Age of AI,” will run from 5 p.m. to 7:30 p.m. at The Kartrite Resort in Monticello.

Sullivan Chamber President Ashley Leavitt highlighted the goal of the event, stating, “We’re coming together as one, not only to network and mingle and mix as all business owners and entrepreneurs and whatever else. But also to transparently talk about AI because technology is changing the business game.”

Leavitt emphasized the importance of understanding AI in business, noting, "There are a lot of things that if you're not keeping up to date in technology, you're falling behind, and there's a lot of things that can be streamlined that'll save our small businesses a lot of time, energy and money with the new technology. So staying up to date and how they can automate responses or how they can automate some, you know, like QuickBooks processes and stuff of that sort."

For more information about this event, search for "keeping your business current in the age of AI" or visit catskills.com directly or through the Chamber's website.




