Students release stylometry tools | The Triangle

Students working in Drexel’s Privacy, Security and Automation Lab released alpha versions of two authorship analysis tools Dec. 29 at the 28th Chaos Communication Congress in Berlin. These tools will allow people to both identify and conceal the author of a piece of writing.

Rachel Greenstadt, assistant professor of computer science, led a team of graduate and undergraduate students in researching stylometry, the study of linguistic style and its application to questions such as authorship. The team produced the authorship recognition software JStylo and an authorship evasion tool called Anonymouth.

The project started when one of Greenstadt’s students, Michael Brennan, did a class project about what could be learned from a person’s writing style.

“There is a lot of work in that area, but I was interested in having him look at what happens when somebody tries to fool these systems,” Greenstadt said.

The PSAL team ran a study in which people tried to conceal their writing styles. After providing samples of their normal writing, participants tried to alter their style, relying only on their own intuition.

“We showed that the author recognition software works really well when people don’t try to hide their writing style, but it falls apart when they do,” Greenstadt said. “We figured people would actually be even better at hiding their writing style if they had access to the tools.”

The group first had to build JStylo to give researchers a platform for experimenting with authorship recognition methods. Ariel Stolerman, JStylo’s lead developer, took the idea behind Stylo, an earlier senior design project, and rebuilt it from scratch as a usable program.

“The idea of the tool is based on a previous tool called Stylo that started as a senior design project at PSAL and was designed more for proof of concept than a widely usable tool,” Stolerman, a computer science doctoral student, explained.

JStylo first compiles a set of anonymous documents that a person may want to attribute to an author. Next, a set of documents by candidate authors is put into the system. The tool works only if the anonymous author also wrote at least one document in this second set.

The tool then allows users to select what stylistic features they want to extract from the texts, such as sentence length, average word length and part-of-speech frequencies.
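To make the idea of stylistic features concrete, here is a minimal Python sketch that computes two of the features mentioned above, sentence length and average word length. It is an illustration only, not JStylo's code: the function name `extract_features`, the feature names and the naive sentence splitting are all assumptions, and part-of-speech frequencies are left out because they would require a tagger.

```python
import re

def extract_features(text):
    """Return two simple stylistic features for one document (hypothetical helper)."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Treat runs of letters and apostrophes as words.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return {"avg_sentence_length": 0.0, "avg_word_length": 0.0}
    return {
        # Average number of words per sentence.
        "avg_sentence_length": len(words) / len(sentences),
        # Average number of characters per word.
        "avg_word_length": sum(len(w) for w in words) / len(words),
    }

sample = "I like tea. You like coffee!"
print(extract_features(sample))
# {'avg_sentence_length': 3.0, 'avg_word_length': 3.5}
```

A real tool would likely extract many more features than these two, but the principle is the same: each document is reduced to a set of numbers that describe how it is written.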

Next, a statistical classification method is applied to the extracted features and the analysis is run.

“The tool basically ‘learns’ what features characterize each of the candidate authors, examines the features of the anonymous documents and then matches them to the author they best fit, thus revealing their author,” Stolerman said.
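The learn-and-match process Stolerman describes can be sketched with a very simple classifier. The article does not say which classification methods JStylo uses, so the nearest-centroid approach below is an assumption chosen for brevity, and the names `centroid`, `distance` and `attribute`, along with the sample numbers, are hypothetical.

```python
import math

def centroid(vectors):
    """Average a list of feature dicts that share the same keys."""
    keys = vectors[0].keys()
    return {k: sum(v[k] for v in vectors) / len(vectors) for k in keys}

def distance(a, b):
    """Euclidean distance between two feature dicts with the same keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))

def attribute(anonymous, training):
    """Return the candidate author whose average profile is closest.

    training maps an author name to a list of feature dicts,
    one per known document by that author.
    """
    # "Learn" one average profile per candidate author.
    profiles = {author: centroid(docs) for author, docs in training.items()}
    # Match the anonymous document to the closest profile.
    return min(profiles, key=lambda author: distance(anonymous, profiles[author]))

# Made-up feature values for two candidate authors.
training = {
    "alice": [{"avg_sentence_length": 12.0, "avg_word_length": 4.1},
              {"avg_sentence_length": 14.0, "avg_word_length": 4.3}],
    "bob": [{"avg_sentence_length": 25.0, "avg_word_length": 5.0},
            {"avg_sentence_length": 27.0, "avg_word_length": 5.2}],
}
unknown = {"avg_sentence_length": 24.0, "avg_word_length": 4.9}
print(attribute(unknown, training))  # bob
```

Note that a classifier like this always names one of the candidates, which is why the true author has to be among them for the result to mean anything. A practical system would also scale the features so that no single one dominates the distance.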

The authorship recognition evasion tool Anonymouth was built off of JStylo and developed by Andrew McDonald, a junior computer engineering student.

As with JStylo, Anonymouth was originally a senior design project. Instead of helping to uncover anonymous authors, Anonymouth provides the information to distort writing and fool authorship recognition software. The tool compares the samples of an author’s writing with a set of other people’s writing.

Anonymouth users upload a writing sample and the document that needs to be modified. The tool then takes the average stylistic features of the documents already in the system and compares them to those of the author’s documents.

The tool generates suggestions of stylistic changes that can be made to the document being modified based on average stylistic features of documents already in the system.
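A rough sketch of how such suggestions could be produced is shown below. It is not Anonymouth's actual algorithm: the function name `suggest_changes`, the 10 percent tolerance and the sample numbers are assumptions made for illustration. The idea is simply to flag each feature of the document that sits far from the average of the other documents.

```python
def suggest_changes(document, others, tolerance=0.10):
    """Suggest feature changes that move a document toward the group average.

    document: feature dict for the text being modified.
    others: list of feature dicts for the documents already in the system.
    tolerance: relative difference below which a feature is left alone (assumed value).
    """
    suggestions = []
    for name, value in document.items():
        # Average value of this feature across the other documents.
        target = sum(o[name] for o in others) / len(others)
        if target == 0:
            continue  # avoid dividing by zero for unused features
        if abs(value - target) / target <= tolerance:
            continue  # already close enough to the average
        direction = "lower" if value > target else "raise"
        suggestions.append(f"{direction} {name} from {value:.1f} toward {target:.1f}")
    return suggestions

# Made-up feature values.
document = {"avg_sentence_length": 30.0, "avg_word_length": 4.5}
others = [
    {"avg_sentence_length": 18.0, "avg_word_length": 4.4},
    {"avg_sentence_length": 22.0, "avg_word_length": 4.6},
]
for line in suggest_changes(document, others):
    print(line)
# lower avg_sentence_length from 30.0 toward 20.0
```

Here the author's sentences are much longer than the group average, so the sketch suggests shortening them, while the word length is already typical and is left alone.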

“We’re still working out how to make it more automated. It’ll highlight things, tell you what to do, show you the value of the feature as you type,” McDonald said.

According to Greenstadt, PSAL members are using the new tools as a research platform to explore deception in writing and authorship recognition.

In the past, Greenstadt’s team would get requests to classify documents and determine their authorship, which took a lot of work.

“We’re trying to move away from that model, and we’re saying, ‘We’re releasing our tools, and now you can play with them and see what’s there,’” Greenstadt explained.

Her goal is to raise awareness regarding the effectiveness of authorship recognition tools as well as the notion that these tools aren’t perfect and can be deceived.