Doctoral candidate uses coding style to identify hackers | The Triangle

Photo Courtesy: Drexel University Office of Communications

A recent study co-authored by Drexel University doctoral candidate Aylin Caliskan-Islam has shown that analyzing the coding style of a program can identify its author with up to 95 percent accuracy. Caliskan-Islam co-authored the paper with University of Maryland sophomore Andrew Liu. The findings are significant because the ability to identify the author of a piece of code could help authorities trace malware back to the people who wrote it.

Caliskan-Islam and Liu studied code written by more than 250 programmers, aiming to characterize each programmer’s coding “style.” The study was conducted while the two worked at the Army Research Laboratory in the summer of 2014. The pair identified distinguishing patterns in how programmers name things in their code, as well as deeper structural differences.
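To give a rough sense of what measuring coding “style” can mean, here is a minimal sketch in Python. It is not the method used in the study. The four measurements, the function names `style_features` and `closest_author`, and the nearest-match comparison are all assumptions chosen for illustration.

```python
import math
import re


def style_features(source: str) -> list[float]:
    """Turn source code into a few toy style measurements (illustrative only)."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    # Identifier-like words; language keywords such as "int" are counted too.
    names = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    n_lines = max(len(lines), 1)
    n_names = max(len(names), 1)
    return [
        sum(len(n) for n in names) / n_names,                # average word length
        sum("_" in n for n in names) / n_names,              # share of words with underscores
        sum(ln.startswith("\t") for ln in lines) / n_lines,  # share of tab-indented lines
        sum(ln.strip() == "{" for ln in lines) / n_lines,    # share of lines holding only "{"
    ]


def closest_author(unknown: str, known: dict[str, str]) -> str:
    """Return the known author whose sample is nearest in feature space."""
    target = style_features(unknown)
    return min(known, key=lambda a: math.dist(target, style_features(known[a])))


# Hypothetical samples: two invented authors with different habits.
samples = {
    "alice": "int total_sum(int first_value, int second_value)\n{\n\treturn first_value + second_value;\n}\n",
    "bob": "int add(int a, int b) {\n    return a + b;\n}\n",
}
mystery = "int max_value(int left_side, int right_side)\n{\n\treturn left_side > right_side ? left_side : right_side;\n}\n"
print(closest_author(mystery, samples))
```

In this toy example the mystery snippet is matched to the sample with long, underscore-separated names, tab indentation and braces on their own line. The features are not rescaled, so average word length dominates the distance. A real system would presumably use far more features and a trained classifier.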

The data the pair sifted through came from the 2014 Google Code Jam and was publicly available. Most of the dataset they studied consisted of code written in C++, a programming language.

Caliskan-Islam was the lead author of the study, having first run preliminary experiments on the concept in 2013. After finding favorable results and talking to her advisor, Rachel Greenstadt, she opted to accept an offer from the ARL to continue working on the project. Caliskan-Islam was the first international student to join the ARL as part of its open campus initiative.

Developing the process for identifying the authors of code took most of summer 2014, and the process is still being refined to make it more comprehensive.

“We started developing the most recent process for identifying code in June 2014 and have been steadily improving the method to cover different privacy, security and software engineering problems,” Caliskan-Islam said. “The process not only identifies authors as part of a security enhancement but is also a privacy infringement method.”

Although Liu took part through the summer program, he will most likely not return to continue the project. Currently, Caliskan-Islam and her advisor are the only researchers from Drexel on the project.

“Andrew joined us for the summer and I acted as his mentor, introducing him to machine learning and stylometry,” Caliskan-Islam said. “Both my advisor and myself are hoping to bring more students onto the project.”

The project is a collaborative effort among the ARL, Princeton University and . Caliskan-Islam also has future projects related to code stylometry in the works.

Caliskan-Islam is one of the first people to study the style of written code and create a method to identify its author. She has devoted a large part of her studies to machine learning and stylometry, and has given talks and published papers on the subject. Her work could be valuable to the intelligence community because it can help strengthen future security methods.

“The applications are broad and can include forensics, copyright investigations and software engineering,” Caliskan-Islam said. “Other security methods can be greatly improved through this process.”

Whether future programmers will find a way around the new method and stay “anonymous” remains to be seen. As history has shown, new challenges have only spurred programmers’ creativity in solving problems.