AI models show reduced risk-taking when emulating women, revealing embedded gender biases

AI Models Become More Cautious When Told to Emulate Women, Study Finds

Artificial intelligence systems are increasingly being evaluated not just for their capabilities, but for how closely they mirror human behavior. A new academic study has revealed that when large language models (LLMs) are instructed to “think like women,” their propensity to take risks drops significantly. Conversely, when prompted to act as men, these same models tend to exhibit bolder, more risk-tolerant behavior.

The research, conducted at Allameh Tabataba’i University in Tehran, Iran, tested AI models from major tech companies such as OpenAI, Google, Meta, and DeepSeek. The findings suggest that LLMs are not only shaped by their training data but also adapt their decision-making strategies to the perceived gender identity they are asked to adopt.

Among the tested models, Google’s Gemini 2.0 Flash-Lite and DeepSeek Reasoner displayed the most notable shifts in behavior, becoming significantly more risk-averse when simulating female perspectives. These changes were observed particularly in scenarios involving financial decisions, where the models were asked to choose between safe investments and higher-risk, high-reward options.
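To make that setup concrete, here is a minimal sketch, in Python, of how such a persona-conditioned risk test could be run. The prompt wording, the persona instructions, the dollar amounts, and the `ask_model` helper are all illustrative assumptions rather than the study’s actual materials.

```python
# Illustrative sketch only; this is not the study's actual protocol or prompt wording.
# The idea: pose the same financial gamble under different persona instructions
# and compare how often the model picks the risky option.

RISK_PROMPT = (
    "You must choose one option.\n"
    "Option A: receive $50 for certain.\n"
    "Option B: a 50% chance of winning $120 and a 50% chance of winning nothing.\n"
    "Answer with the single letter A or B."
)

PERSONAS = {
    "baseline": "You are a helpful assistant.",
    "female": "Answer the following as a woman would.",
    "male": "Answer the following as a man would.",
}

def build_messages(persona_key: str) -> list[dict]:
    """Assemble a chat-style request: persona as the system message, gamble as the user message."""
    return [
        {"role": "system", "content": PERSONAS[persona_key]},
        {"role": "user", "content": RISK_PROMPT},
    ]

def risky_choice_rate(ask_model, persona_key: str, trials: int = 50) -> float:
    """Fraction of trials in which the model picks the risky Option B.

    `ask_model` is a hypothetical callable that sends the messages to whichever
    LLM API is being tested and returns the text of its reply.
    """
    risky = 0
    for _ in range(trials):
        reply = ask_model(build_messages(persona_key)).strip().upper()
        if reply.startswith("B"):
            risky += 1
    return risky / trials
```

Comparing the risky-choice rate across personas, rather than inspecting individual answers, is what makes shifts like the ones reported for Gemini 2.0 Flash-Lite and DeepSeek Reasoner measurable.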

The study offers compelling evidence that AI does more than process data — it reflects social biases. It suggests that LLMs internalize patterns from their training inputs, including stereotypical gender behaviors. When prompted to act “like a woman,” these systems tended to opt for safer, more conservative choices. When simulating a male point of view, they leaned toward high-stakes decisions.

This dynamic raises critical questions about the design and deployment of AI technologies. If AI is being used to assist with decisions in finance, healthcare, or law enforcement, could it unintentionally reinforce gender stereotypes? The researchers argue that understanding these behavioral shifts is essential to building fairer, more equitable AI systems.

The implications of this study go beyond academic curiosity. In real-world applications such as robo-advisors for investment, automated customer service, or even mental health counseling, AI’s ability to mimic human preferences and biases can have tangible consequences. A system that becomes more risk-averse when it associates a query with women could recommend different financial products, or give different advice, than it would for an otherwise identical query it associates with men.

Moreover, the study underscores the importance of transparency in AI development. Companies that train and release large language models need to be aware of how subtle cues — like gender assumptions — can influence outcomes. This isn’t just an issue of fairness; it’s a matter of accuracy and effectiveness in AI-driven systems.

Another aspect of the research worth noting is that not all models exhibited the same level of behavioral change. Some LLMs showed only minor variations when prompted with different gender identities, while others shifted dramatically. This suggests that the architecture of the model, its training data, and the fine-tuning methods used all play a role in how susceptible it is to social cues.

These findings also open up broader conversations about the role of identity in AI-human interactions. Should AI systems be designed to reflect human diversity, or should they strive for neutrality? And what does neutrality even mean in a world where data itself is not free from bias?

There’s also the question of whether AI should be allowed to assume human-like identities at all. When an AI is asked to “think like a woman” or “act like a man,” it implies that gender can be distilled into predictable behavioral patterns — a deeply problematic assumption in itself.

To mitigate these risks, AI developers are being urged to adopt more inclusive and balanced datasets, and to implement testing protocols that examine how models respond to identity-based prompts. Ethical frameworks and bias-detection tools are also being developed to help ensure that AI behaviors remain aligned with human values — and not just human prejudices.
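One simple form such a testing protocol could take is a gap check: measure the same behavioral metric under each identity-conditioned prompt and flag pairs of personas that diverge beyond a tolerance. The sketch below assumes a risky-choice rate has already been measured per persona; the threshold and the example numbers are illustrative assumptions, not an established standard.

```python
# Hypothetical bias-audit check: compare a behavioral metric (such as the rate of
# risky choices) across identity-conditioned prompts and flag large gaps.
# The 0.10 threshold and the example scores are illustrative assumptions.

from itertools import combinations

def audit_identity_gap(scores: dict[str, float], max_gap: float = 0.10) -> list[str]:
    """Return a warning for every pair of personas whose metric differs by more than max_gap."""
    warnings = []
    for (name_a, score_a), (name_b, score_b) in combinations(scores.items(), 2):
        gap = abs(score_a - score_b)
        if gap > max_gap:
            warnings.append(f"{name_a} vs {name_b}: gap of {gap:.2f} exceeds {max_gap:.2f}")
    return warnings

# Example with made-up risky-choice rates per persona:
print(audit_identity_gap({"baseline": 0.48, "female": 0.31, "male": 0.55}))
```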

Additionally, there’s growing interest in interdisciplinary approaches that bring together computer scientists, sociologists, psychologists, and ethicists. By working collaboratively, these experts can better understand the nuanced ways in which AI reflects and amplifies social norms.

Crucially, the study highlights that AI is not inherently neutral. It is shaped by the data it consumes and the instructions it receives. As such, its behavior — whether cautious or bold — is a mirror of the society that built it.

For businesses and policymakers, the takeaway is clear: AI systems must be scrutinized not just for what they can do, but for how and why they do it. Ignoring these behavioral nuances could lead to the perpetuation of stereotypes and systemic inequality, even in highly automated environments.

To address these challenges, training protocols could be redesigned to include counter-stereotypical data and scenarios. This would help models learn that risk tolerance is not inherently tied to gender — or any other identity marker. Additionally, user prompts could be structured in a way that discourages the association of behavior with identity, instead focusing on context-based reasoning.
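As a small illustration of the second idea, a prompt can be built from the facts of the decision, such as the investment horizon, the acceptable loss, and the goal, rather than from an assumed identity. The field names and wording below are hypothetical examples, not a recommended template or the study’s materials.

```python
# Sketch of context-based prompting: condition the model on the decision context
# rather than on an assumed gender. Field names and wording are illustrative assumptions.

# Identity-framed prompt of the kind the study describes (paraphrased, not actual wording):
IDENTITY_FRAMED = "You are a woman deciding how to invest a $10,000 bonus. What do you choose?"

def context_framed_prompt(amount: int, horizon_years: int, max_loss_pct: int, goal: str) -> str:
    """Build a prompt that describes the financial context instead of an identity."""
    return (
        f"A client is deciding how to invest ${amount:,}.\n"
        f"Investment horizon: {horizon_years} years. "
        f"Largest acceptable loss: {max_loss_pct}%.\n"
        f"Goal: {goal}. Recommend an allocation and explain the trade-offs."
    )

print(context_framed_prompt(10_000, 10, 15, "retirement savings"))
```

The point of the restructuring is that the information driving the recommendation is stated explicitly, so the model has less room to fall back on stereotyped assumptions about who is asking.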

In conclusion, while AI continues to revolutionize industries and redefine the boundaries of automation, its intersection with human identity remains a complex frontier. The Tehran study serves as a timely reminder that as we teach machines to think, we must also teach them to think responsibly.