Computer helps translate gap between ‘he said, she said’


“I am from Mars,” he e-mailed his friend.

“I am from Venus,” she replied.

“You can say that again,” their computers interjected.

Women and men don't just talk past each other like planets in different orbits. They write differently too. Leave it to the sexless computers to figure this out.

Scientists in recent years have documented the fabled male-female communications gap - source of age-old conflict and confusion between the sexes - when it comes to spoken language. Now, employing a computer program that can scan text for sex-specific grammar, sentence structure, word preference, and other features, they have discovered distinct differences between the writing of men and women, as well.

The findings eventually might have far-reaching practical applications for improving commercial and workplace communication, making textbooks more effective, or even helping police identify the authors of ransom notes or otherwise solve crimes.

“We have shown convincingly that gender differences in language use do exist,” said Dr. Shlomo Argamon of the Illinois Institute of Technology, whose research team developed the computer program, aptly named Winnow, that can pinpoint the sex of an anonymous author with more than 80 percent accuracy.

The Winnow program uses text categorization technology similar to that of Internet search engines, which can retrieve specific documents containing keywords from vast oceans of data, and it searches for key differences in writing style between men and women.

U.S. and Israeli researchers using Winnow have started publishing their findings in obscure technical publications like Literary and Linguistic Computing. The latest report appears in the current edition of the journal Text.

“We think these differences are surprising and significant,” Dr. Argamon said. “Most of the research done on communication styles has involved oral communications and our work is moving into new territory.”

Dr. Deborah Tannen, a professor of linguistics at Georgetown University whose pioneering work includes the 1990 book You Just Don't Understand: Women and Men in Conversation, agreed. “Relatively little research has been done on differences in written communication styles,” Dr. Tannen said in an interview.

“If such differences exist in written language use between genders, I'm not surprised that a computer program can pick them up,” said Dr. Kenneth R. Koedinger of the Human-Computer Interaction Center at Carnegie Mellon University in Pittsburgh.

From the standpoint of linguistics (the study of language), letters and e-mail are similar to conversation. They involve a lot of “social interaction,” with the writer essentially “talking” to a specific individual.

“Formal written texts such as books and articles, on the other hand, are intended for a broad unseen audience,” Dr. Argamon explained. Exceptions do exist, such as romance and other genre books intentionally written for a specific audience. Researchers, however, expected the differences found in oral communication style and informal writing would disappear in formal writing.

That, however, was not the case.

In one study, for instance, they analyzed more than 600 documents in the British National Corpus, a 100-million-word collection of books and other documents, written since 1960, that represents a cross-section of current British English.

Winnow checked for certain linguistic patterns or “determiners” - such as word use and sentence structure - which it had developed from extensive “reading” and analysis of documents known to be written by men or women.

Women, for instance, are more likely to use the word she and words indicating relationships such as for, and, in, with, and not.

Men heavily favor words like one, two, some, more, a, that, its, and the, which specify the number or properties of objects. Women personalize their writing more than men, using more personal pronouns that identify the gender of things. They favor pronouns like I, you, she, her, their, myself, yourself and herself. Males favor impersonal pronouns, such as it, this, that, these, those and they.
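To make the idea of word-preference counting concrete, here is a minimal Python sketch that tallies how often the words named above turn up in a passage. The two word lists come straight from this article; treating them as a complete feature set, and the function name `marker_rates`, are assumptions made for illustration only. The researchers' program worked from a far larger set of patterns.

```python
import re
from collections import Counter

# Words the article associates with each group. Using only these words
# is an illustrative assumption, not the researchers' actual feature set.
FEMALE_MARKERS = {"i", "you", "she", "her", "their", "myself", "yourself",
                  "herself", "for", "and", "in", "with", "not"}
MALE_MARKERS = {"one", "two", "some", "more", "a", "that", "its", "the",
                "it", "this", "these", "those", "they"}


def marker_rates(text):
    """Return (female_rate, male_rate) as marker words per 1,000 tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0, 0.0
    counts = Counter(tokens)
    female = sum(counts[word] for word in FEMALE_MARKERS)
    male = sum(counts[word] for word in MALE_MARKERS)
    scale = 1000.0 / len(tokens)
    return female * scale, male * scale
```

Rates per 1,000 words, rather than raw counts, keep a long novel and a short memo comparable. On their own, such rates say little about any one writer; they only become useful as inputs to a program trained on many documents.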

The books and other documents in the British collection include nonfiction, science, business, art, politics, and other topics. The average document in the study contained 42,000 words, and the full set of data encompassed about 25 million words. Winnow scanned for more than 1,000 different male/female determiners, and then predicted the gender of each document's author.
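The article does not spell out how the program learns from its training documents. One classic machine-learning method that shares the name is the Winnow algorithm, which gives every feature a weight and multiplies the weights of the features present in a document up or down whenever it guesses wrong. The Python sketch below shows that textbook update rule under stated assumptions: features are simply present or absent, the six-word vocabulary and the four toy "documents" are invented for illustration, and nothing here should be read as the research team's actual implementation.

```python
# Hypothetical six-word vocabulary; the index in this list is the feature id.
VOCABULARY = ["she", "her", "with", "the", "its", "two"]

# Toy training data (invented): each item is (features present, label),
# where label 1 stands for one author group and 0 for the other.
TOY_EXAMPLES = [
    ((0, 1, 2), 1),
    ((3, 4, 5), 0),
    ((0, 2, 3), 1),
    ((1, 4, 5), 0),
]


def predict(weights, threshold, active):
    """Guess label 1 when the summed weights of present features reach the threshold."""
    return 1 if sum(weights[i] for i in active) >= threshold else 0


def train_winnow(examples, n_features, alpha=2.0, epochs=10):
    """Textbook Winnow: multiplicative weight updates, made only on mistakes."""
    weights = [1.0] * n_features
    threshold = float(n_features)
    for _ in range(epochs):
        mistakes = 0
        for active, label in examples:
            if predict(weights, threshold, active) == label:
                continue
            mistakes += 1
            # Promote present features after a missed 1, demote after a false 1.
            factor = alpha if label == 1 else 1.0 / alpha
            for i in active:
                weights[i] *= factor
        if mistakes == 0:
            break  # a full pass with no errors: stop early
    return weights, threshold
```

Calling `train_winnow(TOY_EXAMPLES, n_features=len(VOCABULARY))` returns the learned weights and the decision threshold, which `predict` can then apply to a new set of features. Because the weights change only when the program errs, features that never help a decision stay close to their starting value, which is one reason this family of methods copes well with long lists of candidate features.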

In one group of 264 novels, the program misidentified only six authors.

Dr. Argamon and his associates found that gender-related differences occur even in technical scientific documents, which are written in a formal, impersonal style. Winnow correctly identified the authors' sex in 73 percent of the scientific documents it scanned.

Among the questions not yet addressed by researchers is the impact that text loaded with phrasing from one gender might have on the other.

Do men subconsciously consider documents with “female” determiners “soft” and take them less seriously? Are those containing heavy male content deemed more authoritative by men but emotionally distant and off-putting by women? Might some writing be so laden with gender-loaded characteristics that one sex or the other would need translation to understand it?

Dr. Tannen's research was among the first to stir awareness about differences in spoken communication styles between men and women.

Women, Dr. Tannen found, use conversation to connect with other people, to forge friendships, and to draw closer emotionally. They discuss problems with close female friends, for instance, and attempt to do so with male intimates to become closer friends, not to solicit solutions.

Men use conversation to negotiate their status and keep people from pushing them around. They hear “troubles talk” from women as a request for advice and respond with well-crafted solutions. That's not what women want, so they think the man is trying to cut them off or make their problems seem trivial. And so the stereotypes emerge.

Men supposedly view women as constant talkers always ready to complain but rarely willing to do anything about it. Women bristle at the silent, insensitive male with head buried in the newspaper, or glued to the TV screen, tuning her out.

Dr. Tannen, who was not involved in the new research, said it may help in closing the communication gap between the sexes, particularly in the workplace.

E-mail, letters, and reports potentially can lead to more miscommunication than conversation, she noted, because the writer is not present to immediately answer questions or provide clarification.

Dr. Argamon said learning more about the differences in how men and women write and understand written language may also “have potentially great implications for our society in education.” One possibility is gender-specific textbooks. Books aimed at girls or boys could be tweaked to suit their respective learning styles.

Dr. Argamon and his co-workers are trying to extend the automated gender-ID technique so documents could be scanned to detect an anonymous author's age, educational level, or ethnic background.

Such techniques could help criminal investigators quickly compose a basic profile of the author of a ransom note, threat, or other communique. They also could be used to authenticate authorship.

Speculation about who wrote the Clinton-era novel Primary Colors lingered for months until a Vassar College expert manually compared the styles of various writers, identifying the author as Joe Klein.

Dr. Raul Valdes-Perez, on leave from CMU's computer science department, foresees other applications. He is a co-developer of Vivisimo, software that categorizes Web search results, and president of Vivisimo, Inc.

“Suppose that sales/service inquiries come in by e-mail,” he said. “Assume people like to be served by someone of the same sex, such as patients consulting doctors, or the opposite sex. The program could automatically assign a person of the same inferred sex to handle the inquiry.”