Part III: Source Code Mining
Chapter 5: Source Code Indexing for Component Reuse
This chapter addresses the challenge of indexing source code and proposes the Code Search Engine AGORA. AGORA is offered as an open-source tool. The documentation page of AGORA, which is available here, includes links to all repositories of the tool as well as instructions for installing and using it. In addition, an online running version can be found here.
The index is populated from the list of the 3000 most starred Java projects on GitHub, cleaned up by removing forked projects and invalid projects (e.g. those without Java code). The list of projects is available here.
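The clean-up step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script used to build the AGORA index: the record fields `stars`, `fork` and `java_files`, and the function name `select_projects`, are hypothetical and would have to be mapped to whatever metadata is actually collected for each repository.

```python
def select_projects(repos, top_n=3000):
    """Take the top_n most starred projects, then drop forks and
    projects without Java code.

    Each repo is a dict with the hypothetical keys 'name', 'stars',
    'fork' (bool) and 'java_files' (number of .java files found).
    """
    # Rank by stars first, so the clean-up is applied to the top-N list.
    ranked = sorted(repos, key=lambda r: r["stars"], reverse=True)[:top_n]
    # Remove forked projects and projects that contain no Java code.
    return [r for r in ranked if not r["fork"] and r["java_files"] > 0]


if __name__ == "__main__":
    sample = [
        {"name": "a", "stars": 10, "fork": False, "java_files": 5},
        {"name": "b", "stars": 9, "fork": True, "java_files": 3},
        {"name": "c", "stars": 8, "fork": False, "java_files": 0},
    ]
    for repo in select_projects(sample):
        print(repo["name"])
```

Because the filtering is applied after taking the top-N list, the resulting list may contain fewer than `top_n` projects.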
The dataset used for our evaluation in a component reuse scenario, which includes the list of queries that were issued to the service, is available here.
Chapter 6: Mining Source Code for Component Reuse
This chapter presents the design of the Test-Driven Reuse system Mantissa. Mantissa is a system that accepts queries for class-level software components and further assesses their functionality using tests. An online running version of the tool can be found here.
The example search scenario (case study) that is built using Mantissa is available here.
In addition, the three datasets used for evaluating Mantissa against other tools in a component reuse scenario are available here.
Chapter 7: Mining Source Code for API Snippet Reuse
This chapter describes the design of CodeCatch, a system that receives queries in natural language, and recommends useful snippets from multiple online sources. CodeCatch further evaluates the readability of the retrieved snippets as well as their preference/acceptance by the developer community.
An online running version of the tool, which also includes the queries used for evaluation, can be found here.
Furthermore, the list of projects used to create our API calls index can be found here.
Chapter 8: Mining Queries for Snippet Reuse
This chapter describes a methodology for finding solutions in question-answering systems, using the main elements of a question post, including its title, tags, body, and source code snippets. Our methodology is applied to the official data dump of Stack Overflow as of September 26, 2014, which is available here.
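As a rough illustration of the question post elements mentioned above, the following Python sketch extracts the title, tags, body text and code snippets from a single post row. It assumes the row layout of the `Posts.xml` file of the Stack Overflow data dump (attributes `PostTypeId`, `Title`, `Tags` and `Body`, with tags written as `<java><string>` and snippets wrapped in `<pre><code>`). The function name `parse_question` and the regular expressions are illustrative choices, not the preprocessing used in the chapter.

```python
import re
import xml.etree.ElementTree as ET
from html import unescape

# Snippets appear in the post body as <pre><code>...</code></pre> blocks.
CODE_RE = re.compile(r"<pre[^>]*><code>(.*?)</code></pre>", re.DOTALL)
# Tags are stored as a single string such as "<java><string>".
TAG_RE = re.compile(r"<([^<>]+)>")
# Crude removal of any remaining HTML markup from the body.
MARKUP_RE = re.compile(r"<[^>]+>")


def parse_question(row_xml):
    """Return the main elements of a question post, or None for other post types."""
    row = ET.fromstring(row_xml)
    if row.get("PostTypeId") != "1":  # "1" marks a question in the dump
        return None
    body_html = row.get("Body", "")
    snippets = [unescape(code) for code in CODE_RE.findall(body_html)]
    # Drop the snippets from the body, then strip the remaining markup.
    prose = MARKUP_RE.sub(" ", CODE_RE.sub(" ", body_html))
    return {
        "title": row.get("Title", ""),
        "tags": TAG_RE.findall(row.get("Tags", "")),
        "body": " ".join(unescape(prose).split()),
        "snippets": snippets,
    }
```

In practice the dump is a single very large file, so rows would typically be streamed (for example with `xml.etree.ElementTree.iterparse`) rather than parsed one string at a time.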