Nick Ruta

Data Science · Machine Learning · Software Engineering

I use Software Engineering, Data Visualization & Machine Learning skills to explore interesting topics such as natural language processing to determine the true author of Shakespeare's works and time series analysis with astronomical data to find similarities amongst the stars.

Data Science Pipeline

This project demonstrates the technologies needed to create an end-to-end data science pipeline: consuming data from an original source, processing and storing it, and finally providing machine-learning-based results to end users.

SKILLS USED:

Java • Spring Boot • MySQL • MongoDB • Javascript Angular 8 • Python Flask • Machine Learning • Pattern Recognition • Data Visualization

Github

View Architecture Map

Time Series Analysis

I created a Python-based search engine that uses time series analysis to find similarities among the celestial objects that fill our universe. It combines morphological similarity, implemented with an algorithm that uses cross-correlation with the Fast Fourier Transform, and feature-based similarity, which applies pattern recognition to each object's metadata. I intend to continue this research with the goal of creating software that scientists everywhere will use to analyze petabytes of data provided by the newly established “Large Synoptic Survey Telescope” in 2020.
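
The cross-correlation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the search engine's actual code: the function names (fft, cross_correlation, best_lag) are hypothetical, the FFT is a plain recursive radix-2 version from the standard library only, and both series are assumed to be evenly sampled.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def cross_correlation(a, b):
    """Map each lag to sum_t a[t + lag] * b[t], computed in the frequency domain."""
    size = 1
    while size < len(a) + len(b) - 1:  # zero-pad so the result does not wrap around
        size *= 2
    fa = fft([complex(v) for v in a] + [0j] * (size - len(a)))
    fb = fft([complex(v) for v in b] + [0j] * (size - len(b)))
    product = [x * y.conjugate() for x, y in zip(fa, fb)]
    corr = [v.real / size for v in fft(product, inverse=True)]
    # Negative lags are stored at the end of the padded array.
    return {lag: corr[lag % size] for lag in range(-(len(b) - 1), len(a))}


def best_lag(a, b):
    """Lag at which the two series line up best."""
    corr = cross_correlation(a, b)
    return max(corr, key=corr.get)


template = [1, 2, 1, 0, 0, 0, 0, 0]
shifted = [0, 0, 1, 2, 1, 0, 0, 0]  # same bump, two samples later
lag = best_lag(shifted, template)
```

The peak of the correlation gives both a similarity score and the shift needed to align two light curves, which is why it suits shape-based (morphological) comparison.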

Currently, the database has 2,000 time series from the Catalina Surveys Data Release 2 (CSDR2) and VVV surveys.

SKILLS USED:

Python • Machine Learning • Pattern Recognition • Cross-correlation w/ Fast Fourier Transform • Data Visualization

Detailed Project Description

View Live Application

Time Series Exploration Through Hierarchical Clustering

ABSTRACT:

We introduce a novel interactive visualization technique for analyzing large collections of time series data based on the hierarchical composition of the visual pattern space. Comparing many long time series is challenging to do by hand; clustering therefore helps data analysts discover relevance between, and anomalies among, multiple time series. However, even after reasonable clustering, analysts still have to scrutinize correlations between clusters or similarities among the time series in a cluster. We developed SAX Navigator, an interactive visualization tool that allows users to hierarchically explore large collections of time series data, retrieve their commonalities, and describe their dissimilarities. Our visualization focuses on a unique way to navigate time series that involves a “vocabulary of patterns” developed by using a dimensionality reduction technique, Symbolic Aggregate approXimation (SAX). In the SAX space, time series data clusters better and is more efficient to query. We demonstrate the ability of SAX Navigator to analyze patterns in large time series data with three case studies on the Catalina Surveys Data Release 2, an astronomy data set containing 46,000 individual time series. We verified the usability of our system through a think-aloud study with an astronomy domain scientist. Our technique allows domain experts to view global patterns across the dataset as well as to explore clusters and individual observations locally.
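
For readers unfamiliar with SAX, the transformation can be sketched in a few lines. This is a simplified illustration, not SAX Navigator's code: the function name sax and the four-letter alphabet are assumptions, the series length is assumed to be divisible by the number of segments, and the series must not be constant.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Turn a numeric series into a short word over `alphabet`."""
    # 1. Z-normalize so series with different scales become comparable.
    mu, sigma = mean(series), pstdev(series)
    z = [(v - mu) / sigma for v in series]
    # 2. Piecewise Aggregate Approximation: average each equal-width segment.
    width = len(z) // n_segments
    paa = [mean(z[i * width:(i + 1) * width]) for i in range(n_segments)]
    # 3. Cut the standard normal curve into equiprobable regions, one per symbol.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


word = sax([0, 1, 2, 3, 4, 5, 6, 7], n_segments=4)
```

Each series becomes a short string, so pattern lookup and clustering can operate on words from a small vocabulary instead of long numeric vectors.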

SKILLS USED:

Time Series Analysis • Hierarchical Visualization • Visual Analytics • Javascript & D3

View the Paper Application DEMO Project VIDEO

Healthcare Data Privacy Visualization

How Private is Private?

We all know data gets passed around somehow. By some people. In some way. Surprisingly, though, breaches of health data are much more common than we might think, and the extent of their impact is shockingly broad. Sensitive health data is spread through a complex network involving insurance companies, educational institutions, physicians, and more. As a result of this broad network, breaches are happening at an astonishing rate, increasing over time and across the United States. The individual stories of how these breaches happen are simultaneously unbelievable and strongly representative of the truth. Nathan's story is similar to 652 others in the same category of company. The problem is rooted so deeply that it even extends to within reach of us (literally) through our smartphones, with health and medical apps relying on hundreds of online connections to transmit data.

SKILLS USED:

Data Visualization • Visual Analytics • Javascript & D3

Website View the VIDEO

Adaptive LSM-Tree

THESIS TITLE: CuttleTree: Adaptive Tuning for Optimized Log-Structured Merge Trees

For my master's thesis in software engineering, I conducted research on adaptive data systems. I designed an adaptive LSM-tree that captures workload patterns by collecting statistics during its runtime and uses different versions of tunable parameters in order to optimize the performance. Instead of having a single and fixed design as in current state-of-the-art implementations, this new adaptive LSM-tree can transition between alternative designs and accommodate varying workloads.

SKILLS USED:

C++ • Data Structures & Algorithms • Database Design • Statistics

View the Thesis

Natural Language Processing on Shakespeare

For this project, I used natural language processing to investigate the true authorship of Shakespeare's works, as part of my research in Text Mining for the Analysis of Shakespeare. I used text mining approaches to extract features from the data and then applied machine learning techniques, including clustering and classification algorithms, to those features.

SKILLS USED:

Machine Learning • Natural Language Processing • Visualization • Statistics

View the paper

Visualization for Online Image Sequence Classification of Astronomical Events

Advances in imaging technology have led to a significant increase in the availability of astronomical data. In particular, time series analysis in astronomy involves approximately 40 petabytes of data (provided by Pan-STARRS) and is expected to approach 100 petabytes (compliments of LSST) around 2021. Scientists interested in tracking the brightness of stars need an application that can process this astronomical “big data” and return results within a few seconds. This project aims to build a visualization dashboard that classifies many celestial objects from a series of nightly images as early as possible using a Long Short-Term Memory (LSTM) architecture. Specifically, our model will (1) take in a variable number of images of the object and report a probabilistic classification after each image, (2) outperform naive models (that don’t incorporate sequential information) using the same data and (3) be monitored online through a visualization dashboard that lets the analyst see when each object has been classified, as soon as possible. Our results show we are able to make predictions sooner and more accurately when compared to conventional convolutional neural networks and other baselines.

SKILLS USED:

Long short-term memory (LSTM) Recurrent Neural Network • Visualization • Statistics

View the paper Application DEMO

Portfolio

IPython notebook demonstrating time series analysis, including the standardization, interpolation and phase shifting of comparable astronomical light curves.

TECHNOLOGIES USED -

Python • Math • Time Series Analysis

Github Ipython Notebook

This is a web application that aggregates RSS feeds and displays them on the home page. A user can register for an account and add an RSS feed.

TECHNOLOGIES USED -

Java • Spring MVC • Spring Security • Spring Data • Hibernate

Github Repository

Live Demo

This is a custom validator that rejects YEAR values prior to 1940. It is implemented with the ConstraintValidator interface.

TECHNOLOGIES USED -

Java • Spring MVC

Github Repository

This is an example of using the Comparable interface to sort an object by a String field.
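
The same idea can be sketched in Python, where defining a natural ordering on a class plays the role of Java's Comparable. This is an illustrative analogue, not the repository's code; the Employee class and its fields are hypothetical.

```python
from functools import total_ordering


@total_ordering
class Employee:
    """Hypothetical record whose natural ordering is its name (a string field)."""

    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __eq__(self, other):
        return self.name == other.name

    def __lt__(self, other):
        # Plays the role of compareTo(): order two objects by the string field.
        return self.name < other.name


staff = [Employee("Zoe", 30), Employee("Adam", 41), Employee("Mia", 25)]
sorted_names = [e.name for e in sorted(staff)]
```

Because the ordering lives on the class itself, sorted() needs no extra key or comparator argument.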

TECHNOLOGIES USED -

Java

Github Repository

This is a custom phone format example.

TECHNOLOGIES USED -

Java • Spring MVC

Github Repository

This is my first iOS application. It is designed to be used on the iPhone. I used Objective-C for development and Photoshop for the design aspects. It is a simple app that reminds car buyers of important points when visiting an auto dealership. Please feel free to run it in the Xcode simulator to see it in action!

TECHNOLOGIES USED -

Xcode • Objective C • Photoshop

Github Repository

This is a practice coding session that I completed after reading a book on HTML5. The project uses the HTML5 Canvas API along with the Twitter API (grabbing the Engadget Twitter feed).


TECHNOLOGIES USED -

HTML5 • CSS • Twitter API

Github Repository

Live Demo

This HTML5 Geolocation Practice session includes the Google Maps API. I created this after reading a book on HTML5. I am originally from New York and thought it would be interesting to track the direct distance between the location of my mobile device and Times Square, New York City. This web application uses the geolocation API of HTML5.
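
The "direct distance" between two coordinates is commonly computed with the haversine formula. The sketch below is an assumption about how such a distance can be calculated, not the application's code (which relies on the browser and the Google Maps API); the Times Square coordinates are approximate and the function name is hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0            # mean Earth radius
TIMES_SQUARE = (40.7580, -73.9855)  # approximate latitude, longitude


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


# Distance from a device position (here a made-up point on the equator) to Times Square.
distance = haversine_km(0.0, 0.0, *TIMES_SQUARE)
```

The formula treats the Earth as a sphere, which is accurate enough for a display of this kind.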


TECHNOLOGIES USED -

HTML5 • CSS • Geolocation API • Google Maps API

Github Repository

Live Demo

C++ Implementations of fundamental data structures and algorithms such as "Doubly" and "Circularly" Linked Lists and the Binary Search Tree.

TECHNOLOGIES USED -

C++ • Data Structures & Algorithms

Github Repository

Python Implementations of fundamental data structures and algorithms such as "Doubly" and "Circularly" Linked Lists and the Binary Search Tree.

TECHNOLOGIES USED -

Python • Data Structures & Algorithms

Github Repository

This is an example of CSS3 Transforms & Transitions. It uses the CSS properties transition, transform, duration, opacity and width to alter the square when the user hovers the pointer over it.

TECHNOLOGIES USED -

HTML5 • CSS3

Github Repository

Live Demo

This is a one-page advertisement for an iOS application I wrote for NoAutoDealers.com.

TECHNOLOGIES USED:

HTML • CSS • Bootstrap 3

Live Demo


This is a project written from scratch using PHP and HTML/CSS.

TECHNOLOGIES USED:

HTML/CSS • PHP

Github Repository


This is an interactive map that I created for my trip to Bali, Indonesia. It makes RESTful web service calls to Flickr to retrieve the images displayed.

TECHNOLOGIES USED -

HTML5 • CSS • jQuery • Flickr REST Web Service

Live Demo

This is an example of the Strategy Design Pattern in Java. I used the Motorcycle as the client which has Drive Behavior.
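
A minimal sketch of the pattern, written in Python for brevity rather than the repository's Java. Motorcycle and the drive behavior come from the description above; the concrete behavior classes are hypothetical.

```python
class DriveBehavior:
    """Strategy interface: one interchangeable way of driving."""

    def drive(self):
        raise NotImplementedError


class CruiseDrive(DriveBehavior):  # hypothetical concrete strategy
    def drive(self):
        return "cruising"


class SportDrive(DriveBehavior):  # hypothetical concrete strategy
    def drive(self):
        return "accelerating hard"


class Motorcycle:
    """Client that delegates driving to whichever strategy it currently holds."""

    def __init__(self, drive_behavior):
        self.drive_behavior = drive_behavior

    def perform_drive(self):
        return self.drive_behavior.drive()


bike = Motorcycle(CruiseDrive())
first = bike.perform_drive()
bike.drive_behavior = SportDrive()  # swap behavior at runtime
second = bike.perform_drive()
```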

TECHNOLOGIES USED -

Java • Design Patterns

Github Repository

This is an example of the Observer Design Pattern in Java. In this example, the observable (subject), StockData, releases stock prices to its observer, ChannelingDisplay.
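
A minimal sketch of the pattern, in Python rather than the repository's Java. StockData and ChannelingDisplay are the names from the description; the method names and the ticker symbol are assumptions.

```python
class StockData:
    """Subject: keeps a list of observers and pushes each new price to them."""

    def __init__(self):
        self._observers = []

    def register(self, observer):
        self._observers.append(observer)

    def set_price(self, symbol, price):
        for observer in self._observers:
            observer.update(symbol, price)


class ChannelingDisplay:
    """Observer: records every price it is notified about."""

    def __init__(self):
        self.lines = []

    def update(self, symbol, price):
        self.lines.append(f"{symbol}: {price:.2f}")


stock_data = StockData()
display = ChannelingDisplay()
stock_data.register(display)
stock_data.set_price("ACME", 12.5)
```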

TECHNOLOGIES USED -

Java • Design Patterns

Github Repository

This is an example of the Decorator Design Pattern in Java. The Application can be a WebApp or Mobile App. The cost of each project is computed using decorators to add the cost of additional features to the base price of each app.
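
A minimal sketch of the pattern, in Python rather than the repository's Java. WebApp and the mobile app come from the description; the feature names and all prices are made up for illustration.

```python
class App:
    def cost(self):
        raise NotImplementedError


class WebApp(App):
    def cost(self):
        return 5000  # illustrative base price


class MobileApp(App):
    def cost(self):
        return 8000  # illustrative base price


class FeatureDecorator(App):
    """Wraps another App and adds the price of one extra feature."""

    extra = 0

    def __init__(self, app):
        self._app = app

    def cost(self):
        return self._app.cost() + self.extra


class Login(FeatureDecorator):  # hypothetical feature
    extra = 500


class Payments(FeatureDecorator):  # hypothetical feature
    extra = 1500


project = Payments(Login(WebApp()))
total = project.cost()
```

Features stack in any order and any number, without a subclass for every combination.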

TECHNOLOGIES USED -

Java • Design Patterns

Github Repository

This is an example of the Command Design Pattern in Java. It simulates a robot being controlled by a remote control. The remote allows "Flying" and "Firing Missiles" functionality and the Command Design Pattern is used to map the commands to the Remote Control.
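
A minimal sketch of the pattern, in Python rather than the repository's Java. The robot, remote control, flying and missile firing come from the description; class, method and button names are assumptions.

```python
class Robot:
    """Receiver: knows how to carry out the actual actions."""

    def __init__(self):
        self.log = []

    def fly(self):
        self.log.append("flying")

    def fire_missiles(self):
        self.log.append("firing missiles")


class FlyCommand:
    def __init__(self, robot):
        self._robot = robot

    def execute(self):
        self._robot.fly()


class FireMissilesCommand:
    def __init__(self, robot):
        self._robot = robot

    def execute(self):
        self._robot.fire_missiles()


class RemoteControl:
    """Invoker: maps buttons to commands without knowing what they do."""

    def __init__(self):
        self._buttons = {}

    def set_command(self, button, command):
        self._buttons[button] = command

    def press(self, button):
        self._buttons[button].execute()


robot = Robot()
remote = RemoteControl()
remote.set_command("A", FlyCommand(robot))
remote.set_command("B", FireMissilesCommand(robot))
remote.press("A")
remote.press("B")
```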

TECHNOLOGIES USED -

Java • Design Patterns

Github Repository

This is an example of the Factory Method Design Pattern in Java. The code was created for an application that lists items for sale. The Factory Method allows for different items to be created, such as Cars and Guitars.
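
A minimal sketch of the pattern, in Python rather than the repository's Java. Cars and guitars come from the description; the lister classes and method names are hypothetical.

```python
class Car:
    def describe(self):
        return "car"


class Guitar:
    def describe(self):
        return "guitar"


class ItemLister:
    """Creator: the listing logic is fixed, the item type is left to subclasses."""

    def create_item(self):  # the factory method
        raise NotImplementedError

    def list_item(self):
        item = self.create_item()
        return f"Listed a {item.describe()} for sale"


class CarLister(ItemLister):
    def create_item(self):
        return Car()


class GuitarLister(ItemLister):
    def create_item(self):
        return Guitar()


messages = [lister.list_item() for lister in (CarLister(), GuitarLister())]
```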

TECHNOLOGIES USED -

Java • Design Patterns

Github Repository

This is an example of the Abstract Factory Design Pattern in Java. In this code, more detailed items are listed for sale. The Abstract Factory Pattern is used to create the type of items and create the specific details for each item.
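
A minimal sketch of the pattern, in Python rather than the repository's Java. The idea of one factory producing both an item and its matching details follows the description; every class name and detail field is hypothetical.

```python
class CarDetails:
    def summary(self):
        return "V8 engine, 2 doors"


class GuitarDetails:
    def summary(self):
        return "6 strings, maple neck"


class CarFactory:
    """Produces a family of related objects: the item type and its details."""

    def create_item_type(self):
        return "Car"

    def create_details(self):
        return CarDetails()


class GuitarFactory:
    def create_item_type(self):
        return "Guitar"

    def create_details(self):
        return GuitarDetails()


def build_listing(factory):
    # Client code depends only on the factory interface, never on concrete classes.
    return f"{factory.create_item_type()}: {factory.create_details().summary()}"


listings = [build_listing(CarFactory()), build_listing(GuitarFactory())]
```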

TECHNOLOGIES USED -

Java • Design Patterns

Github Repository

This is an example of the Singleton Design Pattern in Java. This code is for an app which lists an item for sale and manages the availability of that item. It is optimized for multiple thread access using synchronized and the Double Checked Locking technique.
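
Double-checked locking can be sketched as follows, in Python rather than the repository's Java (where the technique relies on synchronized). The class name ItemAvailability and its field are hypothetical.

```python
import threading


class ItemAvailability:
    """Hypothetical singleton that tracks whether the listed item is available."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self):
        self.available = True

    @classmethod
    def get_instance(cls):
        if cls._instance is None:          # first check: skip the lock once created
            with cls._lock:
                if cls._instance is None:  # second check: only one thread creates it
                    cls._instance = cls()
        return cls._instance


# Request the instance from several threads at once.
instances = []
threads = [
    threading.Thread(target=lambda: instances.append(ItemAvailability.get_instance()))
    for _ in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

The lock is taken only while the instance does not exist yet, so later calls pay no synchronization cost.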

TECHNOLOGIES USED -

Java • Design Patterns

Github Repository

This is an example of the Facade Design Pattern in Java. A PostListingForSaleFacade is created to simplify the process of listing a Corvette for sale. listItemForSale() wraps up the method calls by delegating the responsibility to the corresponding components in the subsystem.
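
A minimal sketch of the pattern, in Python rather than the repository's Java. PostListingForSaleFacade and listItemForSale() come from the description (the method is snake_case here); the three subsystem classes and the price are hypothetical.

```python
class PhotoUploader:  # hypothetical subsystem component
    def upload(self, item):
        return f"photos of {item}"


class PriceEstimator:  # hypothetical subsystem component
    def estimate(self, item):
        return 55000  # illustrative price


class ListingPublisher:  # hypothetical subsystem component
    def publish(self, item, photos, price):
        return f"{item} listed at ${price} with {photos}"


class PostListingForSaleFacade:
    """Single entry point that hides the calls into the subsystem."""

    def __init__(self):
        self._photos = PhotoUploader()
        self._pricing = PriceEstimator()
        self._publisher = ListingPublisher()

    def list_item_for_sale(self, item):
        photos = self._photos.upload(item)
        price = self._pricing.estimate(item)
        return self._publisher.publish(item, photos, price)


result = PostListingForSaleFacade().list_item_for_sale("Corvette")
```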

TECHNOLOGIES USED -

Java • Design Patterns

Github Repository

This is an example of the Adapter Design Pattern in Java. A used car for sale object is converted to be used in the place of a NewCarForSale. The adapter is used to replace the MSRP price with a price derived from a web service call to get the average sale price of the used car.
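
A minimal sketch of the pattern, in Python rather than the repository's Java. NewCarForSale comes from the description; the used-car class, the adapter name, and the stubbed price lookup (standing in for the web service call) are assumptions.

```python
class NewCarForSale:
    """Target interface: exposes a model and a price (the MSRP)."""

    def __init__(self, model, msrp):
        self.model = model
        self._msrp = msrp

    def get_price(self):
        return self._msrp


class UsedCarForSale:
    """Adaptee: has no MSRP, only a model and a year."""

    def __init__(self, model, year):
        self.model = model
        self.year = year


def fetch_average_sale_price(model, year):
    """Stand-in for the web service call; returns a fixed illustrative value."""
    return 18500


class UsedCarAdapter:
    """Lets a UsedCarForSale be used wherever a NewCarForSale is expected."""

    def __init__(self, used_car, price_lookup=fetch_average_sale_price):
        self._car = used_car
        self._price_lookup = price_lookup
        self.model = used_car.model

    def get_price(self):
        return self._price_lookup(self._car.model, self._car.year)


inventory = [NewCarForSale("Model A", 32000), UsedCarAdapter(UsedCarForSale("Model B", 2012))]
prices = [car.get_price() for car in inventory]
```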

TECHNOLOGIES USED -

Java • Design Patterns

Github Repository

This is an example of the Template Method Design Pattern in Java. Used and New cars are listed for sale. This method defines the skeleton of the Algorithm to list the car. The subclasses redefine certain steps of the algorithm. A Template Method Hook is added to ask the seller if they want to post the car to social media sites.
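
A minimal sketch of the pattern, in Python rather than the repository's Java. New and used cars and the social media hook come from the description; class names, step names and returned strings are hypothetical.

```python
class CarListing:
    def list_car(self):
        """Template method: the order of steps is fixed here."""
        steps = [self.describe(), self.set_price()]
        if self.wants_social_post():  # hook with a default answer
            steps.append("posted to social media")
        return steps

    def describe(self):
        raise NotImplementedError

    def set_price(self):
        raise NotImplementedError

    def wants_social_post(self):
        return False


class NewCarListing(CarListing):
    def describe(self):
        return "new car, zero miles"

    def set_price(self):
        return "priced at MSRP"


class UsedCarListing(CarListing):
    def describe(self):
        return "used car, one owner"

    def set_price(self):
        return "priced at market average"

    def wants_social_post(self):  # the seller opts in
        return True


new_steps = NewCarListing().list_car()
used_steps = UsedCarListing().list_car()
```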

TECHNOLOGIES USED -

Java • Design Patterns

Github Repository

This is an example of the Composite Design Pattern in Java. A top level Composite called Listings is created to contain Composites named Car and Motorcycle Listings. Each Listings Composite has several Leaves called ListingItems.
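
A minimal sketch of the pattern, in Python rather than the repository's Java. Listings and ListingItem come from the description; the method names and sample items are assumptions.

```python
class ListingItem:
    """Leaf: a single item for sale."""

    def __init__(self, name):
        self.name = name

    def titles(self):
        return [self.name]


class Listings:
    """Composite: holds leaves or other composites and treats them alike."""

    def __init__(self, name):
        self.name = name
        self._children = []

    def add(self, child):
        self._children.append(child)
        return self  # allow chained add() calls

    def titles(self):
        collected = []
        for child in self._children:
            collected.extend(child.titles())
        return collected


cars = Listings("Car Listings").add(ListingItem("Coupe")).add(ListingItem("Sedan"))
motorcycles = Listings("Motorcycle Listings").add(ListingItem("Cruiser"))
all_listings = Listings("Listings").add(cars).add(motorcycles)
```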

TECHNOLOGIES USED -

Java • Design Patterns

Github Repository

This is an example of the State Design Pattern in Java. It simulates the process of selling a real estate property. The states are For Sale, Pending in Escrow and Sold. The actions are offer made and offer accepted.
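
A minimal sketch of the pattern, in Python rather than the repository's Java. The three states and two actions come from the description, but which action triggers which transition is an assumption made here for illustration.

```python
class ForSale:
    name = "For Sale"

    def offer_made(self, prop):
        prop.state = PendingInEscrow()  # assumed transition

    def offer_accepted(self, prop):
        pass  # no offer to accept yet


class PendingInEscrow:
    name = "Pending in Escrow"

    def offer_made(self, prop):
        pass  # already under offer

    def offer_accepted(self, prop):
        prop.state = Sold()  # assumed transition


class Sold:
    name = "Sold"

    def offer_made(self, prop):
        pass

    def offer_accepted(self, prop):
        pass


class Property:
    """Context: forwards every action to its current state object."""

    def __init__(self):
        self.state = ForSale()

    def offer_made(self):
        self.state.offer_made(self)

    def offer_accepted(self):
        self.state.offer_accepted(self)


house = Property()
history = [house.state.name]
house.offer_made()
history.append(house.state.name)
house.offer_accepted()
history.append(house.state.name)
```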

TECHNOLOGIES USED -

Java • Design Patterns

Github Repository

This is an example of the Proxy Design Pattern in Java. Three commonly used types of proxies include Remote Proxy, Virtual Proxy and Security Proxy. In this example, I used the java.lang.reflect package to create a dynamic proxy for security.
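
A security proxy can be sketched in Python, where __getattr__ gives roughly the effect that a java.lang.reflect dynamic proxy gives in Java: every call is intercepted before it reaches the real object. The Listing class, the owner rule and the set_ naming convention are assumptions, not the repository's code.

```python
class Listing:
    """Hypothetical real subject: a listing whose price can be read and changed."""

    def __init__(self, owner, price):
        self.owner = owner
        self.price = price

    def get_price(self):
        return self.price

    def set_price(self, price):
        self.price = price


class SecurityProxy:
    """Forwards attribute access to the target, blocking setters for non-owners."""

    def __init__(self, target, user):
        self._target = target
        self._user = user

    def __getattr__(self, name):
        # Called only for names not found on the proxy itself.
        if name.startswith("set_") and self._user != self._target.owner:
            raise PermissionError(f"{self._user} may not call {name}")
        return getattr(self._target, name)


listing = Listing(owner="nick", price=100)
owner_view = SecurityProxy(listing, user="nick")
guest_view = SecurityProxy(listing, user="guest")
owner_view.set_price(120)
guest_price = guest_view.get_price()
```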

TECHNOLOGIES USED -

Java • Design Patterns

Github Repository

Skills

Programming Languages & Tools
Workflow
  • Mobile-First, Responsive Design
  • Cross Browser Testing & Debugging
  • Agile Development & Scrum


Software Engineering

  • Java/Spring: 90%
  • C++: 65%
  • Python/Flask: 90%
  • HTML/CSS: 95%
  • JavaScript: 90%
  • Web Services: 90%

Data Science

  • Machine Learning Algorithms: 85%
  • Math/Statistics: 80%
  • Visualization: 85%
  • Data Munging: 80%

Awards & Certifications

  • Oracle Certified Associate, Java SE 7 Programmer
  • Oracle Certified Professional, Java SE 7 Programmer
  • Oracle Certified Expert, EE 6 Web Services Developer
  • Harvard University Extension School - Data Science Certificate
  • Harvard University - Dean’s List Academic Achievement Award