Projects - Fall 2020
Fall 2020 Projects
We have six exciting projects lined up for Fall 2020:
- Eclipse OMR
- Keyman Predictive Text
- Language Server for Eclipse Jakarta EE
- OpenJ9 Performance Instrumentation for AI Systems
- Performance Robustness Evaluation for Statistical Classifiers (PRESC)
- Review Board
Eclipse OMR
Description
The Eclipse OMR project produces reusable technology components for building language runtimes like the Java Virtual Machine. It includes components like: a garbage collector for automatic memory management, a compiler for generating native code, platform porting library to enable a language to run on many platforms, etc. The project is aimed at developers who want proven runtime component implementations without reinventing the wheel. The key component for this CANOSP project is the JitBuilder library, which provides a simple API developers can use to generate native code during a program’s execution. These technology components are originally from and continue to be used by the Eclipse OpenJ9 Java Virtual Machine, which has been powering many enterprise system across the globe for 15 years.
URL
Project Mentors
- Nazim Bhuiyan
- Mark Stoodley
- Igor Braga
Affiliated Organization/Company
- Eclipse Foundation
Number of Students
- 1 to 2
Required Skills/Experiences
- Python programming
Preferred Skills/Experiences
- Familiarity with Java and C++ programming
Keyman Predictive Text
Description
A simple web-based interface for creating dictionaries to provide predictive text and autocorrect facilities on smartphones for less-resourced languages. Its intended users are language activists that may have a spreadsheet of words in their language, but want to have predictive text on their smartphone keyboard. The functionality exists today, but is relegated to a Windows desktop app. This process should be as straightforward as possible. The core technologies are TypeScript and the web platform (e.g., LocalStorage, WebWorkers, IndexedDB). A web framework (if any) has not yet been decided. The data does need to be private (no sharing on a server), so any sensitive data is not transmitted on the network.
URLs
Project Mentor
- Eddie Antonio Santos
Affiliated Organization/Company
- National Research Council
- SIL International
Number of Students
- 2 to 4
Required Skills/Experiences
- Version control with git
- Basic understanding of HTML/CSS
Preferred Skills/Experiences
- TypeScript
- npm
- Continuous integration/continuous deployment
- Frontend frameworks such as Svelte, Vue, or React
- Frontend testing with Cypress
- Module bundlers such as Webpack, Rollup, or Parcel
Language Server for Eclipse Jakarta EE
Description
Develop a language server and the associated Eclipse client for the set of open cloud-native Java APIs in Eclipse Jakarta EE to boost developer productivity.
Eclipse Jakarta EE is an open source community-driven collaboration on defining and innovating on the next generation of cloud-native Java APIs.
The Language Server Protocol (LSP) enables language-specific assistance for IDEs and editors such as validations, auto-complete etc to be built in a common reusable way.
This project looks to develop a common Language Server using the Language Server Protocol for Jakarta EE APIs and the associated client for the Eclipse IDE.
URL
Project Mentors
- Yee-Kang (YK) Chang
- Kevin Sutter
- Ryan Zegray
- Kathryn Kodama
- Eric Lau
Affiliated Organization/Company
- Eclipse Foundation
- IBM
Number of Students
- 3 to 6
Required Skills/Experiences
- Experience with Java development
- Familiarity and experience with agile software development and typical software development tools.
Preferred Skills/Experiences
- Experience with cloud-native Java APIs and Eclipse plugin development will be nice to have but not mandatory
OpenJ9 Performance Instrumentation for AI Systems
Description
Eclipse OpenJ9 is an enterprise-grade Java Virtual Machine (JVM). The project is interested in using Artificial Intelligence and Machine Learning to improve the performance of Java applications. The first step of this journey, and the goal of this project, will be to create a JVM agent to record key performance metrics, format the data in a manner suitable for consumption by AI systems and to add some preliminary tagging to the gathered metrics to facilitate training of AI and ML systems aimed at improving the performance of Java applications running on Eclipse OpenJ9.
The core of the implementation will be in C++ using the Java Virtual Machine Tooling Interface (JVMTI) to capture rich information about the runtime behaviour of the JVM. The agent will provide a REST API to control the agent data gathering. Data will be sent to downstream consumers, AI systems, using RESET, WebSockets and JSON - possibly using the Open Telemetry format. Time permitting, we may also explore feeding some of the gathered telemetry to some machine learning algorithms to gauge their effectiveness in predicting / classifying application behviour.
URL
Project Mentor
- Andrew Craik
- Vijay Sundaresan
Affiliated Organization/Company
- IBM
Number of Students
- 3 to 5
Required Skills/Experiences
- C++
- Git
- Linux (command-line tools)
- Make/CMake
Preferred Skills/Experiences
- Java
- GDB
- C++ use of WebSockets JSON and REST
- Experience with AI/ML system data ingest
Performance Robustness Evaluation for Statistical Classifiers
Description
PRESC is an opinionated toolkit for the evaluation of machine learning classification models.
ML model evaluation is a very broad field with a vast number of existing approaches as well as available software tools. The focus of PRESC is on areas which are often overlooked and expressly extend beyond standard numerical accuracy-based measures of performance (although these will also likely be included in the performance report).
Current scope of the project:
- Analysis of misclassifications
- Robustness evaluation (generalizability to unseen data)
- Stability evaluation (variability in results to due methodology)
- During early development, we restrict focus to binary classifiers. However, the methodologies should apply equally to multiclass problems.
It is intended for use by ML engineers to assist in the development and updating of models. The deliverable is a tool which takes as input a model and a dataset and describes performance characteristics of the model in terms of various metrics, which can be used for both model selection and diagnostics. It will be usable as both a Python package and a continuous integration step.
URL
Project Mentors
- David Zeber
- Martin Lopatka
Affiliated Organization/Company
- Mozilla
Number of Students
- 1 to 4
Required Skills/Experiences
- Python 3
Preferred Skills/Experiences
- Python data science stack (numpy/pandas/matplotlib/Jupyter)
- Python machine learning tools (scikit-learn)
- Some experience with machine learning (academic or practical)
- Git/Github
Review Board
Description
Review Board is a powerful web-based code review tool that helps developers do peer review as they write code. Code review is a standard industry practice used to find bugs, improve quality, and mentor junior engineers.
Review Board is used by thousands of software companies including Yelp, LinkedIn, and VMware, as well as many open-source projects like Apache.
Students working on Review Board will have the opportunity to learn about back-end web development using Python and Django, as well as front-end development using HTML, CSS, Javascript, jQuery, and Backbone.js. Source control is managed via Git on GitHub. All patches are reviewed using Review Board, and students will be participating in the code review process.
URL
Project Mentors
- David Trowbridge
- Christian Hammond
- Mike Conley
- Anselina Chia
Affiliated Organization/Company
- Beanbag, Inc.
Number of Students
- 3 to 12
Required Skills/Experiences
- Students should have some familiarity with Python and JavaScript. It’s not required to be an expert, but it will be very difficult to make significant progress if you have to learn the language in addition to new frameworks and codebases.
Preferred Skills/Experiences
- Review Board is built using the Django framework on the backend, and jQuery/Backbone/HTML/CSS on the frontend. While Review Board sometimes uses those in advanced and unusual ways, some experience with those will give you a head start.
- We highly recommend having some familiarity with git.