Publications
This page lists all publications produced during the course of the PhD thesis. For a full list of my publications, check here.
Journal Articles
Themistoklis Diamantopoulos and Andreas Symeonidis, "Enhancing Requirements Reusability through Semantic Modeling and Data Mining Techniques", in Enterprise Information Systems,
pp. 1-22,
December 2017.
| Bibtex
Enhancing the requirements elicitation process has always been of added value to software engineers,
since it expedites the software lifecycle and reduces errors in the conceptualization phase of
software products. The challenge posed to the research community is to construct formal models that
are capable of storing requirements from multimodal formats (text and UML diagrams) and promote easy
requirements reuse, while at the same time being traceable to allow full control of the system design,
as well as comprehensible to software engineers and end users. In this work, we present an approach
that enhances requirements reuse while capturing the static (functional requirements, use case diagrams)
and dynamic (activity diagrams) view of software projects. Our ontology-based approach allows for
reasoning over the stored requirements, while the mining methodologies employed detect incomplete
or missing software requirements, thus reducing the effort required for requirements elicitation
at an early stage of the project lifecycle.
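As an indicative sketch of the kind of reasoning such an ontology-based store enables (not the ontology or mining pipeline of the paper), the snippet below keeps two toy functional requirements in an RDF graph and flags the one that lacks an actor; the mini-vocabulary (ex:FunctionalRequirement, ex:hasActor, ex:hasAction) is invented purely for illustration.

```python
# Illustrative sketch only: a hypothetical mini-vocabulary is used to show how
# reasoning over stored requirements could flag incomplete ones, e.g. actions
# that have no associated actor. This is not the ontology used in the paper.
from rdflib import Graph, Namespace, RDF, Literal

EX = Namespace("http://example.org/requirements#")
g = Graph()
g.bind("ex", EX)

# Two toy functional requirements: one complete, one missing its actor.
g.add((EX.req1, RDF.type, EX.FunctionalRequirement))
g.add((EX.req1, EX.hasActor, EX.User))
g.add((EX.req1, EX.hasAction, Literal("create account")))

g.add((EX.req2, RDF.type, EX.FunctionalRequirement))
g.add((EX.req2, EX.hasAction, Literal("send notification")))  # no actor given

# Detect requirements lacking an actor (a simple incompleteness check).
query = """
SELECT ?req WHERE {
    ?req a ex:FunctionalRequirement .
    FILTER NOT EXISTS { ?req ex:hasActor ?actor }
}
"""
for row in g.query(query, initNs={"ex": EX}):
    print(f"Possibly incomplete requirement: {row.req}")
```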
Themistoklis Diamantopoulos, Michael Roth, Andreas Symeonidis and Ewan Klein, "Software Requirements as an Application Domain for Natural Language Processing", in Language Resources and Evaluation,
vol 51, no 2, pp. 495-524,
February 2017.
| Bibtex
| Preprint
Mapping functional requirements first to specifications and then to code is one of the most challenging tasks in software development. Since requirements are commonly written in natural language, they can be prone to ambiguity, incompleteness and inconsistency. Structured semantic representations allow requirements to be translated to formal models, which can be used to detect problems at an early stage of the development process through validation. Storing and querying such models can also facilitate software reuse. Several approaches constrain the input format of requirements to produce specifications, however they usually require considerable human effort in order to adopt domain-specific heuristics and/or controlled languages. We propose a mechanism that automates the mapping of requirements to formal representations using semantic role labeling. We describe the first publicly available dataset for this task, employ a hierarchical framework that allows requirements concepts to be annotated, and discuss how semantic role labeling can be adapted for parsing software requirements.
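As a rough illustration of mapping a textual requirement to a structured representation, the sketch below extracts an agent, an action and an object from a single sentence using spaCy's dependency parse; this is a crude heuristic stand-in, not the semantic role labeling framework presented in the paper.

```python
# Crude dependency-based extraction, not the paper's semantic role labeling.
# Assumes the en_core_web_sm model: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_roles(requirement: str) -> dict:
    """Pull a rough (agent, action, object) triple out of one requirement."""
    doc = nlp(requirement)
    roles = {"agent": None, "action": None, "object": None}
    for token in doc:
        if token.dep_ == "nsubj" and roles["agent"] is None:
            roles["agent"] = token.text
        if token.pos_ == "VERB":
            roles["action"] = token.lemma_
        if token.dep_ in ("dobj", "obj"):
            roles["object"] = " ".join(t.text for t in token.subtree)
    return roles

print(extract_roles("The user must be able to create an account."))
# e.g. {'agent': 'user', 'action': 'create', 'object': 'an account'}
```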
Christoforos Zolotas, Themistoklis Diamantopoulos, Kyriakos Chatzidimitriou and Andreas Symeonidis, "From requirements to source code: a Model-Driven Engineering approach for RESTful web services", in Automated Software Engineering,
vol 24, no 4, pp. 791-838,
September 2016.
| Bibtex
| Preprint
During the last few years, the REST architectural style has drastically changed
the way web services are developed. Due to its transparent resource-oriented model,
the RESTful paradigm has been incorporated into several development frameworks that
allow rapid development and aspire to automate parts of the development process.
However, most of the frameworks lack automation of essential web service functionality,
such as authentication or database searching, while the end product is usually not
fully compliant with REST. Furthermore, most frameworks rely heavily on domain-specific
modeling and require developers to be familiar with the employed modeling technologies.
In this paper, we present a Model-Driven Engineering (MDE) engine that supports fast
design and implementation of web services with advanced functionality. Our engine provides
a front-end interface that allows developers to design their envisioned system through
software requirements in multimodal formats. Input in the form of textual requirements
and graphical storyboards is analyzed using natural language processing techniques and
semantics, to semi-automatically construct the input model for the MDE engine. The engine
subsequently applies model-to-model transformations to produce a RESTful, ready-to-deploy
web service. The procedure is traceable, ensuring that changes in software requirements
propagate to the underlying software artefacts and models. Upon assessing our methodology
through a case study and measuring the effort reduction of using our tools, we conclude
that our system can be effective for the fast design and implementation of web services,
while it allows easy wrapping of services that have been engineered with traditional
methods to the MDE realm.
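The MDE engine itself is not reproduced here; as a toy analogue of one model-to-model transformation step, the sketch below maps a minimal (invented) resource model to a set of RESTful endpoint descriptors.

```python
# Toy analogue of a model-to-model transformation (not the paper's MDE engine):
# a minimal resource model is mapped to RESTful endpoint descriptors.
from dataclasses import dataclass

@dataclass
class Resource:
    name: str            # e.g. "book"
    properties: dict     # property name -> type
    searchable: bool = False

def to_rest_endpoints(resource: Resource) -> list[dict]:
    """Map one resource of the (hypothetical) input model to CRUD endpoints."""
    base = f"/{resource.name}s"
    endpoints = [
        {"method": "GET",    "path": base,           "action": f"list {resource.name}s"},
        {"method": "POST",   "path": base,           "action": f"create a {resource.name}"},
        {"method": "GET",    "path": base + "/{id}", "action": f"retrieve a {resource.name}"},
        {"method": "PUT",    "path": base + "/{id}", "action": f"update a {resource.name}"},
        {"method": "DELETE", "path": base + "/{id}", "action": f"delete a {resource.name}"},
    ]
    if resource.searchable:
        endpoints.append({"method": "GET", "path": base + "/search",
                          "action": f"search {resource.name}s by property"})
    return endpoints

book = Resource("book", {"title": "string", "author": "string"}, searchable=True)
for ep in to_rest_endpoints(book):
    print(ep["method"].ljust(6), ep["path"].ljust(16), "-", ep["action"])
```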
Themistoklis Diamantopoulos and Andreas Symeonidis, "Localizing Software Bugs using the Edit Distance of Call Traces", in International Journal on Advances in Software,
vol 7, no 1, pp. 277-288,
October 2014.
| Bibtex
| Preprint
Automating the localization of software bugs that do not lead to crashes is a difficult
task that has drawn the attention of several researchers. Several popular methods
follow the same approach; function call traces are collected and represented as graphs,
which are subsequently mined using subgraph mining algorithms in order to provide a
ranking of potentially buggy functions (nodes). Recent work has indicated that the
scalability of state-of-the-art methods can be improved by reducing the graph dataset
using tree edit distance algorithms. The call traces that are closer to each other, but
belong to different sets, are the ones that are most significant in localizing bugs.
In this work, we further explore the task of selecting the most significant traces, by
proposing different call trace selection techniques, based on the Stable Marriage
problem, and testing their effectiveness against current solutions. Upon evaluating our
methods on a real-world dataset, we prove that our methodology is scalable and
effective enough to be applied on dynamic bug detection scenarios.
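As a self-contained sketch of the idea (not the paper's implementation), the code below computes the edit distance between call traces, builds preference lists from the distances, and pairs failing with correct traces via Gale-Shapley stable matching; the paired traces are the ones that would then be kept for mining.

```python
# Illustrative sketch (not the paper's implementation): pair failing and
# correct call traces by edit distance using Gale-Shapley stable matching.
def edit_distance(a: list[str], b: list[str]) -> int:
    """Classic Levenshtein distance between two call traces (function lists)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        dp[i][0] = i
    for j in range(len(b) + 1):
        dp[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    return dp[len(a)][len(b)]

def stable_pairs(failing: dict, correct: dict) -> list[tuple[str, str]]:
    """Match each failing trace to a correct one (assumes equally sized sets)."""
    dist = {f: {c: edit_distance(ft, ct) for c, ct in correct.items()}
            for f, ft in failing.items()}
    prefs = {f: sorted(correct, key=lambda c: dist[f][c]) for f in failing}
    next_proposal = {f: 0 for f in failing}
    match = {}                      # correct trace -> failing trace
    free = list(failing)
    while free:
        f = free.pop()
        c = prefs[f][next_proposal[f]]
        next_proposal[f] += 1
        if c not in match:
            match[c] = f
        elif dist[f][c] < dist[match[c]][c]:
            free.append(match[c])   # previous partner becomes free again
            match[c] = f
        else:
            free.append(f)          # proposal rejected, try next preference
    return [(f, c) for c, f in match.items()]

failing = {"f1": ["main", "load", "parse", "crash"], "f2": ["main", "init", "run"]}
correct = {"c1": ["main", "load", "parse", "save"], "c2": ["main", "init", "run", "exit"]}
print(stable_pairs(failing, correct))
```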
Themistoklis Diamantopoulos and Andreas L. Symeonidis, "AGORA: A Search Engine for Source Code Reuse", in SoftwareX,
under review.
| Bibtex
Abstract will be available after publication
Book Chapters
Valasia Dimaridou, Alexandros-Charalampos Kyprianidis, Michail Papamichail, Themistoklis Diamantopoulos and Andreas L. Symeonidis, "Assessing the User-Perceived Quality of Source Code Components using Static Analysis Metrics", in Communications in Computer and Information Science (CCIS),
vol 868, pp. 3-27,
June 2018.
| Bibtex
Nowadays, developers tend to adopt a component-based software engineering approach, reusing their own
implementations and/or resorting to third-party source code. This practice is in principle
cost-effective; however, it may also lead to low-quality software products, if the components to
be reused exhibit low quality. Thus, several approaches have been developed to measure the
quality of software components. Most of them, however, rely on the aid of experts for defining
target quality scores and deriving metric thresholds, leading to results that are
context-dependent and subjective. In this work, we build a mechanism that employs static
analysis metrics extracted from GitHub projects and defines a target quality score based on
repositories' stars and forks, which indicate their adoption/acceptance by developers. Upon
removing outliers with a one-class classifier, we employ Principal Feature Analysis and examine
the semantics among metrics to provide an analysis on five axes for source code components
(classes or packages): complexity, coupling, size, degree of inheritance, and quality of
documentation. Neural networks are thus applied to estimate the final quality score given
metrics from these axes. Preliminary evaluation indicates that our approach effectively
estimates software quality at both class and package levels.
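The following sketch outlines the modeling pipeline on synthetic data: random numbers stand in for the static analysis metrics and for the star/fork-based target score, and PCA is used as a stand-in for Principal Feature Analysis.

```python
# Minimal sketch of the modelling pipeline described above, on synthetic data.
# Assumptions: random numbers stand in for static analysis metrics and for the
# star/fork-based target score, and PCA stands in for Principal Feature Analysis.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))            # 500 components x 12 static metrics
y = rng.uniform(size=500)                 # stand-in quality target from stars/forks

# 1. Remove outlier components with a one-class classifier.
inliers = OneClassSVM(nu=0.1, gamma="scale").fit_predict(X) == 1
X_in, y_in = X[inliers], y[inliers]

# 2. Reduce the metrics to a handful of axes (PCA as a stand-in for PFA).
axes = PCA(n_components=5).fit_transform(X_in)

# 3. Estimate the final quality score with a neural network.
model = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
model.fit(axes, y_in)
print("Predicted quality scores:", model.predict(axes[:3]).round(3))
```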
Conference Papers
Themistoklis Diamantopoulos and Andreas Symeonidis, "CodeCatch: Extracting Source Code Snippets from Online Sources", in IEEE/ACM 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE),
pp. 21-27,
Gothenburg, Sweden, May 2018.
| Bibtex
Nowadays, developers rely on online sources to find example snippets that address the
programming problems they are trying to solve. However, contemporary API usage mining
methods are not suitable for locating easily reusable snippets, as they provide usage
examples for specific APIs, thus requiring the developer to know which library to use
beforehand. On the other hand, the approaches that retrieve snippets from online sources
usually output a list of examples, without aiding the developer to distinguish among
different implementations and without offering any insight on the quality and the
reusability of the proposed snippets. In this work, we present CodeCatch, a system that
receives queries in natural language and extracts snippets from multiple online sources.
The snippets are assessed both for their quality and for their usefulness/preference by
the developers, while they are also clustered according to their API calls to allow the
developer to select among the different implementations. Preliminary evaluation of
CodeCatch in a set of indicative programming problems indicates that it can be a useful
tool for the developer.
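As a toy illustration of the clustering step, the snippet below extracts API calls from a few Java snippets with a crude regular expression (not CodeCatch's analysis) and groups them by the Jaccard distance of their call sets.

```python
# Toy sketch of the clustering step: group snippets by the API calls they make.
# A crude regex stands in for CodeCatch's actual snippet analysis.
import re
from itertools import combinations
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

snippets = [
    "BufferedReader r = new BufferedReader(new FileReader(path)); r.readLine();",
    "List<String> lines = Files.readAllLines(Paths.get(path));",
    "Scanner s = new Scanner(new File(path)); while (s.hasNextLine()) s.nextLine();",
    "String text = new String(Files.readAllBytes(Paths.get(path)));",
]

def api_calls(snippet: str) -> set:
    """Very rough extraction of invoked class/method names from a Java snippet."""
    return set(re.findall(r"\b[A-Z]\w*(?=\s*[.(])|(?<=\.)\w+(?=\()", snippet))

def jaccard_distance(a: set, b: set) -> float:
    return 1.0 - len(a & b) / len(a | b) if a | b else 0.0

calls = [api_calls(s) for s in snippets]
n = len(snippets)
dist = np.zeros((n, n))
for i, j in combinations(range(n), 2):
    dist[i, j] = dist[j, i] = jaccard_distance(calls[i], calls[j])

# Average-linkage hierarchical clustering over the precomputed distances.
labels = fcluster(linkage(squareform(dist), method="average"), t=0.8, criterion="distance")
for label, snippet in zip(labels, snippets):
    print(f"cluster {label}: {snippet[:50]}...")
```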
Michail Papamichail, Themistoklis Diamantopoulos, Ilias Chrysovergis, Philippos Samlidis and Andreas Symeonidis, "User-Perceived Reusability Estimation based on Analysis of Software Repositories", in 2018 IEEE International Workshop on Machine Learning Techniques for Software Quality Evaluation (MaLTeSQuE),
pp. 49-54,
Campobasso, Italy, March 2018.
| Bibtex
The popularity of open-source software repositories has led to a new reuse paradigm,
where online resources can be thoroughly analyzed to identify reusable software components.
Obviously, assessing the quality and specifically the reusability potential of source code
residing in open software repositories poses a major challenge for the research community.
Although several systems have been designed to this end, most of them do not
focus on reusability. In this paper, we define and formulate a reusability score by
employing information from GitHub stars and forks, which indicate the extent to which
software components are adopted/accepted by developers. Our methodology involves applying
and assessing different state-of-the-practice machine learning algorithms, in order to
construct models for reusability estimation at both class and package levels. Preliminary
evaluation of our methodology indicates that our approach can successfully assess
reusability, as perceived by developers.
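Purely as an indicative sketch, the snippet below derives a target reusability score in [0, 1] from stars and forks via log-scaling and equal weighting; this is an illustrative assumption, not the exact formulation evaluated in the paper.

```python
# Indicative sketch only: a log-scaled, normalized combination of stars and
# forks as a target reusability score. An illustrative assumption, not the
# exact formulation used in the paper.
import math

repos = {"repo-a": (45_000, 9_000), "repo-b": (1_200, 150), "repo-c": (35, 4)}

def reusability_score(stars: int, forks: int, max_stars: int, max_forks: int) -> float:
    s = math.log1p(stars) / math.log1p(max_stars)
    f = math.log1p(forks) / math.log1p(max_forks)
    return 0.5 * s + 0.5 * f   # equal weighting, purely for illustration

max_s = max(s for s, _ in repos.values())
max_f = max(f for _, f in repos.values())
for name, (stars, forks) in repos.items():
    print(name, round(reusability_score(stars, forks, max_s, max_f), 3))
```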
Valasia Dimaridou, Alexandros-Charalampos Kyprianidis, Michail Papamichail, Themistoklis Diamantopoulos and Andreas L. Symeonidis, "Towards Modeling the User-perceived Quality of Source Code using Static Analysis Metrics", in 12th International Conference on Software Technologies (ICSOFT),
pp. 73-84,
Madrid, Spain, July 2017.
| Bibtex
| Preprint
Nowadays, software has to be designed and developed as fast as possible, while
maintaining quality standards. In this context, developers tend to adopt a
component-based software engineering approach, reusing their own implementations and/or
resorting to third-party source code. This practice is in principle cost-effective;
however, it may lead to low-quality software products. Thus, measuring the quality of
software components is of vital importance. Several approaches that use code metrics
rely on the aid of experts for defining target quality scores and deriving metric
thresholds, leading to results that are highly context-dependent and subjective. In
this work, we build a mechanism that employs static analysis metrics extracted from
GitHub projects and defines a target quality score based on repositories' stars and
forks, which indicate their adoption/acceptance by the developers' community. Upon
removing outliers with a one-class classifier, we employ Principal Feature Analysis
and examine the semantics among metrics to provide an analysis on five axes for a
source code component: complexity, coupling, size, degree of inheritance, and quality
of documentation. Neural networks are used to estimate the final quality score given
metrics from all of these axes. Preliminary evaluation indicates that our approach
can effectively estimate software quality.
Michail Papamichail, Themistoklis Diamantopoulos and Andreas L. Symeonidis, "User-Perceived Source Code Quality Estimation based on Static Analysis Metrics", in 2016 IEEE International Conference on Software Quality, Reliability and Security (QRS),
pp. 100-107,
Vienna, Austria, August 2016.
| Bibtex
| Preprint
The popularity of open source software repositories and the highly adopted paradigm of
software reuse have led to the development of several tools that aspire to assess the
quality of source code. However, most software quality estimation tools, even the ones
using adaptable models, depend on fixed metric thresholds for defining the ground truth.
In this work we argue that the popularity of software components, as perceived by
developers, can be considered as an indicator of software quality. We present a generic
methodology that relates quality with source code metrics and estimates the quality of
software components residing in popular GitHub repositories. Our methodology employs two
models: a one-class classifier, used to rule out low-quality code, and a neural network
that computes a quality score for each software component. Preliminary evaluation
indicates that our approach can be effective for identifying high quality software
components in the context of reuse.
Themistoklis Diamantopoulos, Antonis Noutsos and Andreas L. Symeonidis, "DP-CORE: A Design Pattern Detection Tool for Code Reuse", in 6th International Symposium on Business Modeling and Software Design (BMSD),
pp. 160-169,
Rhodes, Greece, June 2016.
| Bibtex
| Preprint
| Code
In order to maintain, extend or reuse software projects, one primarily has to understand
what a system does and how well it does it. And, while in some cases information on
system functionality exists, information covering the non-functional aspects is usually
unavailable. Thus, one has to infer such knowledge by extracting design patterns
directly from the source code. Several tools have been developed to identify design
patterns; however, most of them are limited to compilable and in most cases executable
code, rely on complex representations, and do not offer the developer any control
over the detected patterns. In this paper we present DP-CORE, a design pattern detection
tool that defines a highly descriptive representation to detect known and define custom
patterns. DP-CORE is flexible, identifying exact and approximate pattern versions even
in non-compilable code. Our analysis indicates that DP-CORE provides an efficient
alternative to existing design pattern detection tools.
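DP-CORE's pattern representation is not reproduced here; as a toy example of detecting a pattern in possibly non-compilable source text, the snippet below checks a Java class for Singleton-like characteristics with plain regular expressions.

```python
# Toy example of pattern detection over (possibly non-compilable) source text,
# not DP-CORE's representation: a regex heuristic for Singleton-like classes.
import re

java_source = """
public class Registry {
    private static Registry instance;
    private Registry() { }
    public static Registry getInstance() {
        if (instance == null) { instance = new Registry(); }
        return instance;
    }
}
"""

def looks_like_singleton(source: str) -> bool:
    class_match = re.search(r"\bclass\s+(\w+)", source)
    if not class_match:
        return False
    name = class_match.group(1)
    has_private_ctor = re.search(rf"private\s+{name}\s*\(", source) is not None
    has_static_field = re.search(rf"private\s+static\s+{name}\b", source) is not None
    has_accessor = re.search(rf"public\s+static\s+{name}\s+\w+\s*\(", source) is not None
    return has_private_ctor and has_static_field and has_accessor

print("Singleton-like:", looks_like_singleton(java_source))
```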
Themistoklis Diamantopoulos, Klearchos Thomopoulos and Andreas Symeonidis, "QualBoa: reusability-aware recommendations of source code components", in IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR),
pp. 488-491,
Austin, Texas, USA, May 2016.
| Bibtex
| Preprint
| Code
| Dataset
Contemporary software development processes involve finding reusable software
components from online repositories and integrating them to the source code, both to
reduce development time and to ensure that the final software project is of high
quality. Although several systems have been designed to automate this procedure by
recommending components that cover the desired functionality, the reusability of these
components is usually not assessed by these systems. In this work, we present QualBoa,
a recommendation system for source code components that covers both the functional and
the quality aspects of software component reuse. Upon retrieving components, QualBoa
provides a ranking that involves not only functional matching to the query, but also a
reusability score based on configurable thresholds of source code metrics. The
evaluation of QualBoa indicates that it can be effective for recommending reusable
source code.
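As a sketch of the ranking idea, the snippet below combines a functional matching score with a reusability score computed from configurable metric thresholds; the thresholds and weights are invented for illustration and are not QualBoa's configuration.

```python
# Sketch of combining functional matching with a metric-based reusability
# score, in the spirit of the ranking described above. Thresholds and weights
# are invented for illustration; they are not QualBoa's configuration.
candidates = [
    # (component, functional match in [0, 1], static analysis metrics)
    ("StackImpl",  0.92, {"comments_ratio": 0.30, "cyclomatic_complexity": 4,  "methods": 8}),
    ("FastStack",  0.95, {"comments_ratio": 0.05, "cyclomatic_complexity": 14, "methods": 25}),
    ("ArrayStack", 0.88, {"comments_ratio": 0.22, "cyclomatic_complexity": 6,  "methods": 10}),
]

# Hypothetical thresholds: (metric, bound, direction of acceptability).
thresholds = [
    ("comments_ratio", 0.15, "min"),          # at least 15% comments
    ("cyclomatic_complexity", 10, "max"),     # average complexity at most 10
    ("methods", 20, "max"),                   # at most 20 methods per class
]

def reusability(metrics: dict) -> float:
    """Fraction of thresholds the component satisfies."""
    ok = 0
    for metric, bound, direction in thresholds:
        value = metrics[metric]
        ok += (value >= bound) if direction == "min" else (value <= bound)
    return ok / len(thresholds)

ranked = sorted(candidates,
                key=lambda c: 0.6 * c[1] + 0.4 * reusability(c[2]),
                reverse=True)
for name, match, metrics in ranked:
    print(name, round(0.6 * match + 0.4 * reusability(metrics), 3))
```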
Themistoklis Diamantopoulos and Andreas Symeonidis, "Towards Interpretable Defect-Prone Component Analysis using Genetic Fuzzy Systems", in IEEE/ACM 4th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE),
pp. 32-38,
Florence, Italy, May 2015.
| Bibtex
| Preprint
The problem of Software Reliability Prediction has been attracting the attention of several
researchers during the last few years. Various classification techniques proposed in the
current literature involve the use of metrics drawn from version control systems
in order to classify software components as defect-prone or defect-free. In this paper,
we create a novel genetic fuzzy rule-based system to efficiently model the
defect-proneness of each component. The system uses a Mamdani-Assilian inference engine
and models the problem as a one-class classification task. System rules are constructed
using a genetic algorithm, where each chromosome represents a rule base (Pittsburgh
approach). The parameters of our fuzzy system and the operators of the genetic algorithm
are designed with regard to producing interpretable output. Thus, the output offers not
only effective classification, but also a comprehensive set of rules that can be easily
visualized to extract useful conclusions about the metrics of the software.
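The genetic construction of the rule base is omitted here; the sketch below merely evaluates one hand-written Mamdani-style rule on two hypothetical metrics, to show how an interpretable rule maps metric values to a defect-proneness degree.

```python
# Minimal sketch: evaluating one hand-written Mamdani-style rule on two
# hypothetical metrics. The genetic construction of the rule base (Pittsburgh
# approach) and the full inference engine are omitted.
def rising(x: float, low: float, high: float) -> float:
    """Membership in a 'high' fuzzy set: 0 below low, 1 above high, linear between."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def rule_defect_prone(complexity: float, churn: float) -> float:
    """IF complexity IS high AND churn IS high THEN component IS defect-prone."""
    mu_complexity = rising(complexity, low=5.0, high=20.0)   # assumed fuzzy set bounds
    mu_churn = rising(churn, low=50.0, high=400.0)
    return min(mu_complexity, mu_churn)   # AND as minimum (rule firing strength)

for component, cc, churn in [("Parser", 18, 320), ("Logger", 4, 30), ("Cache", 12, 150)]:
    print(component, "defect-proneness degree:", round(rule_defect_prone(cc, churn), 2))
```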
Themistoklis Diamantopoulos and Andreas Symeonidis, "Employing Source Code Information to Improve Question-answering in Stack Overflow", in IEEE/ACM 12th Working Conference on Mining Software Repositories (MSR),
pp. 454-457,
Florence, Italy, May 2015.
| Bibtex
| Preprint
Nowadays, software development has been greatly influenced by question-answering
communities, such as Stack Overflow. A new problem-solving paradigm has emerged, as
developers post problems they encounter that are then answered by the community. In this
paper, we propose a methodology that allows searching for solutions in Stack Overflow,
using the main elements of a question post, including not only its title, tags, and
body, but also its source code snippets. We describe a similarity scheme for these
elements and demonstrate how structural information can be extracted from source code
snippets and compared to further improve the retrieval of questions. The results of our
evaluation indicate that our methodology is effective in recommending similar question
posts, allowing community members to search without fully forming a question.
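As a rough stand-in for such a similarity scheme, the snippet below combines tf-idf cosine similarity of title and body, Jaccard similarity of tags, and Jaccard similarity of identifiers extracted from code snippets; the weights and the identifier extraction are illustrative assumptions, not the scheme evaluated in the paper.

```python
# Rough stand-in for a combined similarity over question elements: tf-idf
# cosine for title/body, Jaccard for tags and for identifiers extracted from
# code snippets. Weights and extraction are illustrative assumptions.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def identifiers(snippet: str) -> set:
    return set(re.findall(r"[A-Za-z_]\w+", snippet))

def question_similarity(q1: dict, q2: dict) -> float:
    tfidf = TfidfVectorizer().fit_transform(
        [q1["title"] + " " + q1["body"], q2["title"] + " " + q2["body"]])
    text_sim = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
    tag_sim = jaccard(set(q1["tags"]), set(q2["tags"]))
    code_sim = jaccard(identifiers(q1["snippet"]), identifiers(q2["snippet"]))
    return 0.5 * text_sim + 0.2 * tag_sim + 0.3 * code_sim   # assumed weights

q1 = {"title": "How to read a file line by line in Java?",
      "body": "I want to read a text file line by line.",
      "tags": ["java", "file-io"],
      "snippet": "BufferedReader br = new BufferedReader(new FileReader(f));"}
q2 = {"title": "Reading a text file in Java",
      "body": "What is the best way to read the lines of a file?",
      "tags": ["java", "file"],
      "snippet": "Files.readAllLines(Paths.get(path));"}

print("similarity:", round(question_similarity(q1, q2), 3))
```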
Michael Roth, Themistoklis Diamantopoulos, Ewan Klein and Andreas L. Symeonidis, "Software Requirements: A new Domain for Semantic Parsers", in ACL 2014 Workshop on Semantic Parsing (SP14),
pp. 50-54,
Baltimore, Maryland, USA, June 2014.
| Bibtex
| Preprint
Software requirements are commonly written in natural language, making them prone to
ambiguity, incompleteness and inconsistency. By converting requirements to formal
semantic representations, emerging problems can be detected at an early stage of the
development process, thus reducing the number of ensuing errors and the development
costs. In this paper, we treat the mapping from requirements to formal representations
as a semantic parsing task. We describe a novel data set for this task that involves two
contributions: first, we establish an ontology for formally representing requirements;
and second, we introduce an iterative annotation scheme, in which formal representations
are derived through step-wise refinements.
Themistoklis Diamantopoulos and Andreas Symeonidis, "Towards Scalable Bug Localization using the Edit Distance of Call Traces", in Eighth International Conference on Software Engineering Advances (ICSEA),
pp. 45-50,
Venice, Italy, October 2013.
| Bibtex
| Preprint
Locating software bugs is a difficult task, especially if they do not lead to crashes.
Current research on automating non-crashing bug detection dictates collecting function
call traces and representing them as graphs, and reducing the graphs before applying a
subgraph mining algorithm. A ranking of potentially buggy functions is derived using
frequency statistics for each node (function) in the correct and incorrect set of
traces. Although most existing techniques are effective, they do not achieve
scalability. To address this issue, this paper suggests reducing the graph dataset in
order to isolate the graphs that are significant in localizing bugs. To this end, we
propose the use of tree edit distance algorithms to identify the traces that are closer
to each other, while belonging to different sets. The scalability of two proposed
algorithms, an exact and a faster approximate one, is evaluated using a dataset derived
from a real-world application. Finally, although the main scope of this work lies in
scalability, the results indicate that there is no compromise in effectiveness.