nicobykhovsky.com


SiloRepo (www.silorepo.com)

Generate Your Engineering Portfolio with LLMs

I co-founded Silo to take GitHub repos, PDFs, and slides and turn them into structured, multi-level engineering portfolios. Under the hood, it uses LLM pipelines + vector stores (OpenAI embeddings + Pinecone) to handle massive uploads, recursively chunking repos into navigable summaries. On top of that, I added a social network for engineers, making portfolios something you could share and browse like LinkedIn but focused on what you actually built.

Implementation

The core of the system was a multi-step LLM pipeline. We designed agents that operated in stages: chunking artifacts, embedding them into Pinecone, retrieving relevant slices, and then generating summaries at different abstraction levels (file → repo → portfolio). Each stage had a narrow role, and outputs from one became structured inputs to the next. This avoided token overflows, cut costs, and made the process modular. We also built eval hooks at each stage to check coverage and consistency, so summaries aligned across levels.
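A minimal Python sketch of that staged flow (the index name, model choices, and chunk size below are illustrative, not the production configuration):

import os
from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index("silo-repos")  # hypothetical index name

def chunk(text: str, size: int = 2000) -> list[str]:
    """Stage 1: split an artifact (file, PDF page, slide) into fixed-size chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed_and_store(doc_id: str, text: str) -> None:
    """Stage 2: embed each chunk and upsert it into the vector store."""
    chunks = chunk(text)
    embeddings = client.embeddings.create(model="text-embedding-3-small", input=chunks)
    index.upsert(vectors=[
        {"id": f"{doc_id}-{i}", "values": e.embedding, "metadata": {"text": c}}
        for i, (e, c) in enumerate(zip(embeddings.data, chunks))
    ])

def summarize(query: str, level: str) -> str:
    """Stages 3-4: retrieve relevant chunks, then summarize at one abstraction level."""
    q = client.embeddings.create(model="text-embedding-3-small", input=[query]).data[0].embedding
    hits = index.query(vector=q, top_k=8, include_metadata=True)
    context = "\n\n".join(m["metadata"]["text"] for m in hits["matches"])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Summarize the following at the {level} level:\n{context}"}],
    )
    return resp.choices[0].message.content

Each stage's output is a plain structure (chunk list, vector IDs, summary text), which is what makes it straightforward to slot eval hooks in between stages.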

Technical Details

On top of that, we experimented with stylistic specialization. Some agents produced terse implementation details, others generated higher-level narratives, and others focused on “why this matters.” Portfolios combined these views into layered outputs. The backend was (largely) model-agnostic: we integrated OpenAI, Llama, and Groq so we could tune for latency, cost, or fidelity. This gave us the flexibility to run large-scale batch processing while keeping the system stable and outputs coherent.
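A sketch of what "model-agnostic" looked like in practice (the model names and endpoints below are placeholders; Groq and a locally served Llama both expose an OpenAI-compatible chat API, so switching providers reduces to a client + model choice):

import os
from openai import OpenAI

# Map a fidelity/latency/cost profile to an OpenAI-compatible client and model.
BACKENDS = {
    "fidelity": (OpenAI(), "gpt-4o"),
    "latency":  (OpenAI(base_url="https://api.groq.com/openai/v1",
                        api_key=os.environ["GROQ_API_KEY"]), "llama-3.1-8b-instant"),
    "cost":     (OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY"),
                 "meta-llama/Llama-3.1-8B-Instruct"),  # e.g. served locally by vLLM
}

def generate(prompt: str, profile: str = "cost") -> str:
    client, model = BACKENDS[profile]
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content

Because every backend exposes the same interface, profiles can be swapped per request without touching the rest of the pipeline.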

Where is it now?

On the business side, we scaled SiloRepo to thousands of active users at Columbia SEAS, where it became the default way students shared their projects. The system handled hundreds of uploads per week, many of them simultaneous, and processed repositories with hundreds of files without breaking. Mixpanel analytics gave us detailed usage metrics (feature adoption, portfolio views, and share rates), which we used to iterate quickly.

Done-ish Papers

Speculative Decoding Paper (pdf)
Robotic Manipulation Paper (pdf)

High Performance Research (Spec Decoding Paper)

SpecAdapt: Adaptive Speculative Decoding for LLM Inference

SpecAdapt was a research project on accelerating LLM inference with speculative decoding. Traditional autoregressive decoding is slow because each token requires a full forward pass. Speculative decoding improves throughput by letting a lightweight “draft” model propose multiple tokens while a stronger “verify” model checks them. I implemented and benchmarked several approaches (Medusa, EAGLE, fuzzy SD, draft-and-verify) inside a PyTorch + vLLM harness, profiling GPU performance with PyTorch Profiler and Nsight to measure tokens/sec, SM utilization, memory transfers, and kernel overhead.
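A toy PyTorch sketch of the greedy draft-and-verify loop (draft_model and target_model are assumed to be callables returning logits; the full algorithm uses rejection sampling over both distributions rather than exact greedy agreement):

import torch

@torch.no_grad()
def speculative_step(draft_model, target_model, ids: torch.Tensor, k: int = 4) -> torch.Tensor:
    # 1) Draft: the small model proposes k tokens autoregressively.
    drafted = ids
    for _ in range(k):
        logits = draft_model(drafted)                          # (1, seq_len, vocab)
        nxt = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        drafted = torch.cat([drafted, nxt], dim=-1)

    # 2) Verify: one forward pass of the large model scores all k drafted positions at once.
    tgt_logits = target_model(drafted)
    tgt_choice = tgt_logits[:, ids.shape[1] - 1:-1, :].argmax(dim=-1)  # its pick for each drafted slot
    proposed = drafted[:, ids.shape[1]:]

    # 3) Accept the longest agreeing prefix, then take one "free" token from the verifier.
    agree = (proposed == tgt_choice)[0]
    n_accept = int(agree.cumprod(dim=0).sum())
    accepted = proposed[:, :n_accept]
    bonus = tgt_choice[:, n_accept:n_accept + 1]
    return torch.cat([ids, accepted, bonus], dim=-1)

The speedup comes from step 2: k drafted tokens cost k cheap forward passes plus one expensive pass, instead of k expensive passes.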

Prompt-Aware and Resource-Aware Routing

The second component was prompt-aware routing. We found that acceptance rates and efficiency vary widely by prompt type: code, natural language, and long contexts behave differently under speculative decoding. To handle this, I built a clustering-based routing system that embedded and classified prompts, then dynamically assigned decoding strategies (aggressive vs. conservative draft lengths). This increased throughput by ~1.6× compared to static speculative decoding, and we extended it with resource-aware routing that adapted draft length based on GPU load.
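A sketch of the router's shape (the cluster count, draft lengths, and the load_prompt_log() helper are illustrative): prompts are embedded and clustered offline, and each cluster maps to an aggressive or conservative speculation length at serving time.

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Offline: cluster a sample of historical prompts.
train_prompts = load_prompt_log()                 # hypothetical helper returning list[str]
km = KMeans(n_clusters=4).fit(encoder.encode(train_prompts))

# Per-cluster policy derived from measured acceptance rates (values illustrative).
DRAFT_LEN = {0: 8, 1: 6, 2: 3, 3: 2}              # e.g. code-heavy clusters tolerate longer drafts

def route(prompt: str) -> int:
    """Online: map an incoming prompt to a speculation (draft) length."""
    cluster = int(km.predict(encoder.encode([prompt]))[0])
    return DRAFT_LEN[cluster]

Resource-aware routing layers on top of this by shrinking the returned draft length when GPU load is high.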

Evaluation and System Insights

Beyond performance, I built a full evaluation harness to analyze coverage, latency distributions, and acceptance probabilities across workloads. We also experimented with KV cache compression and pruning to reduce wasted compute during rejected drafts. The key insight was that speculative decoding isn’t one-size-fits-all: adaptive strategies that account for prompt structure and hardware state can deliver real speedups, and SpecAdapt demonstrated that in practice.
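A minimal sketch of the harness loop (generate_speculative is a stand-in for whichever SD method is under test, assumed to return per-request token counts):

import time, statistics

def benchmark(generate_speculative, prompts):
    rows = []
    for p in prompts:
        t0 = time.perf_counter()
        out = generate_speculative(p)   # expected keys: new_tokens, drafted_tokens, accepted_draft_tokens
        dt = time.perf_counter() - t0
        rows.append({
            "latency_s": dt,
            "tok_per_s": out["new_tokens"] / dt,
            "accept_rate": out["accepted_draft_tokens"] / max(out["drafted_tokens"], 1),
        })
    return {
        "p50_latency_s": statistics.median(r["latency_s"] for r in rows),
        "mean_tok_per_s": statistics.fmean(r["tok_per_s"] for r in rows),
        "mean_accept_rate": statistics.fmean(r["accept_rate"] for r in rows),
    }

Running the same loop across prompt categories is what surfaced the per-workload differences that motivated the routing work.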

Systems and Generalization

SpecAdapt wasn’t just about one method; it treated speculative decoding as a system-level optimization problem. By integrating multiple SD methods into a common harness, adding eval hooks, and making routing decisions adaptive, we demonstrated a framework that generalizes across models, workloads, and hardware constraints. The modular design means new decoding strategies or verification models can be swapped in without re-engineering the pipeline, making it a practical path forward for production inference systems.
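A sketch of the interface that made swapping possible (a simplification, not the exact framework code): each SD method implements one small protocol, so the router and eval harness never need to know which strategy they are driving.

from typing import Protocol

class DecodingStrategy(Protocol):
    name: str
    def generate(self, prompt: str, max_new_tokens: int) -> dict: ...

REGISTRY: dict[str, DecodingStrategy] = {}

def register(strategy: DecodingStrategy) -> None:
    REGISTRY[strategy.name] = strategy

# e.g. register(MedusaStrategy()); register(EagleStrategy())
# The prompt-aware router then simply selects REGISTRY[name] per request.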

SILO V1 (desktop MCP)

Description of design for MVP #2

SILO is an ML-embedded file-system GUI that augments the current desktop by organizing and classifying files. With a vision to become a secure, permission-based API platform, SILO plans to become "Plaid for X", providing autonomous agents selective connectivity to all local files on the desktop. Initially, Silo will provide fintech companies with organized financial data.

Demo of MVP #1

The current build uses two large components: an OCR and text-recognition system for preprocessing files, followed by a suite of SVM models that perform document classification. These work in tandem to organize and categorize financial documents. In the next build, a daemon will onboard files by intercepting them at download time, allowing for elegant processing by our classification algorithms. Front-end design is just a placeholder.
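A condensed Python sketch of those two components (the labels, paths, and training set are illustrative): Tesseract OCR for text extraction, then a TF-IDF + linear SVM classifier over the extracted text.

import pytesseract
from PIL import Image
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def ocr(path: str) -> str:
    """Stage 1: extract raw text from a scanned page or screenshot."""
    return pytesseract.image_to_string(Image.open(path))

# Stage 2: train on a small labeled set (train_texts/train_labels are assumed,
# e.g. categories like "invoice", "bank_statement", "tax_form").
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_texts, train_labels)

def classify(path: str) -> str:
    return clf.predict([ocr(path)])[0]

In the planned daemon flow, something like classify() would run on each intercepted download before the file is organized.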

Code

https://github.com/Bykho/FileTransferral

Undergrad Projects

CS Project: Trust Based Resume

A System To Verify Professional Resumes

Concept

The purpose of this project was to address a fundamental issue in the job application process: an incentive system that encourages inaccurate resumes from applicants. Here, we leveraged a blockchain database to establish a trusted resume authentication system that could solve this problem. The central idea was to enable ex-employers to validate the claims made by an applicant, ensuring that the information provided to the interviewer accurately reflects their past roles and responsibilities.

Description

This was my first attempt at building something at scale. I essentially built a new smart contract on Ethereum that executed predefined logic, notified ex-employers, confirmed claims, and recorded validated transactions. I developed intuitive interfaces for both job applicants and ex-supervisors, enabling applicants to create their resumes while allowing ex-employers to receive notifications and verify the accuracy of these claims. I spoke to a few universities as early customers.
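A heavily simplified Python (web3.py) sketch of the interaction flow (the contract address, ABI, and function names submitClaim/confirmClaim are hypothetical, not the deployed contract): an applicant records a claim on-chain, and an ex-employer later confirms it.

from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))   # e.g. a local Truffle/Ganache node
resume = w3.eth.contract(address=CONTRACT_ADDRESS, abi=CONTRACT_ABI)  # assumed deployment artifacts

def submit_claim(applicant: str, claim: str) -> None:
    tx = resume.functions.submitClaim(claim).transact({"from": applicant})
    w3.eth.wait_for_transaction_receipt(tx)

def confirm_claim(employer: str, claim_id: int) -> None:
    tx = resume.functions.confirmClaim(claim_id).transact({"from": employer})
    w3.eth.wait_for_transaction_receipt(tx)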

Technical Approach

Tech Stack: Ethereum, Solidity, Truffle, Remix, React.js, Authentication Protocols, NoSQL Database, Data Analytics (Elasticsearch, Kibana).


Partial Diff Eq. Project: Black Scholes Stochastic Modeling

Modeling Partial Differential Equations

Concept

This project revolved around the application of mathematical tools, including stochastic calculus and differential equations, to model the complex dynamics of asset prices in uncertain financial markets. Primarily, I focused on proving, both mathematically and empirically, the Black-Scholes equation. I aimed to create a robust model capable of estimating the fair market value of derivatives according to Delta, Gamma, Theta, Vega, and Rho.
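For reference, the standard statement of the equation at the center of the project, the Black-Scholes PDE for a derivative price V(S, t), together with the closed-form European call price it implies:

% Black-Scholes PDE under geometric Brownian motion, plus the closed-form call price.
\frac{\partial V}{\partial t}
  + \frac{1}{2}\sigma^{2} S^{2} \frac{\partial^{2} V}{\partial S^{2}}
  + r S \frac{\partial V}{\partial S} - r V = 0

C(S,t) = S\,N(d_{1}) - K e^{-r(T-t)} N(d_{2}),
\qquad
d_{1,2} = \frac{\ln(S/K) + \left(r \pm \tfrac{1}{2}\sigma^{2}\right)(T-t)}{\sigma\sqrt{T-t}}

The Greeks listed above are the sensitivities of this price: Delta = ∂C/∂S, Gamma = ∂²C/∂S², Theta = ∂C/∂t, Vega = ∂C/∂σ, and Rho = ∂C/∂r.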

Description

I encoded processes described by stochastic calculus, relying largely on Ito's lemma. I built a model for derivative pricing, using stochastic differential equations and the Feynman-Kac theorem to solve the pricing problem. Risk-neutral pricing theory underpinned my approach, with Brownian motion modeling market fluctuations. To validate the model, I ran Monte Carlo simulations.
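A minimal Python sketch of that validation step (parameters are illustrative): simulate terminal prices under risk-neutral geometric Brownian motion and compare the Monte Carlo call price to the closed-form Black-Scholes value.

import numpy as np
from scipy.stats import norm

S0, K, r, sigma, T = 100.0, 105.0, 0.05, 0.2, 1.0   # spot, strike, rate, vol, maturity

# Closed-form Black-Scholes price of a European call.
d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
d2 = d1 - sigma * np.sqrt(T)
bs_price = S0 * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

# Monte Carlo under the risk-neutral measure: S_T = S0 * exp((r - sigma^2/2) T + sigma sqrt(T) Z).
rng = np.random.default_rng(0)
Z = rng.standard_normal(1_000_000)
ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * np.sqrt(T) * Z)
mc_price = np.exp(-r * T) * np.maximum(ST - K, 0.0).mean()

print(f"Black-Scholes: {bs_price:.4f}   Monte Carlo: {mc_price:.4f}")

With enough simulated paths, the two prices agree to a few decimal places, which is the check the Monte Carlo runs provided.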

Result

Results are linked below; the write-up includes some pretty LaTeX math.


https://docs.google.com/document/d/1BA_hFB6HVv8zj-2blZjKoq1NlcyyFcmsQmqpx6G42Gw/edit


CS Project: Study Space Finder

CS Project to help students "Find Top NYC Places To Study". Basically a specialized form of Yelp.

Description

A team and I built a user-friendly, interactive study-space locator, drawing inspiration from Yelp but with a UI tailored to students. We provided users with a convenient tool to help them find the ideal study environment based on various preferences, such as noise level, amenities like Wi-Fi and printer access, and location convenience. Admittedly, this is a very non-sticky product: once a student finds a good study space, they tend to spend the rest of college there.

Technical Approach

Our stack included React.js for the front end, Express.js for the back end, and an API layer for communication with MongoDB. User authentication was implemented for personalized results. The Google Maps API enabled a display of study spaces on an interactive map and location-based searching.

Past Work Experience

Copyright © 2025 nicobykhovsky.com - All Rights Reserved.
