Case Study 01

MySightGuide: Vision AI

A hybrid processing pipeline that enables visually impaired users to perceive their environment through on-device computer vision and Gemini-powered reasoning.

System Logic Flow

MySightGuide Sequence Diagram
Inference Engine

MediaPipe handles real-time object detection, with TFLite models optimized for sub-100 ms inference latency.

Semantic Reasoning

The Gemini API transforms raw detection data into human-centric, spatial descriptions for context-aware navigation.

ObjectDetectorHelper.kt Native Android // MediaPipe
val options = ObjectDetector.ObjectDetectorOptions.builder()
    .setBaseOptions(BaseOptions.builder().setModelAssetPath("detector.tflite").build()) // model path is illustrative
    .setScoreThreshold(0.5f)
    .setRunningMode(RunningMode.LIVE_STREAM)
    .setResultListener { result, _ -> onResults(result) } // a result listener is required in LIVE_STREAM mode
    .build()

objectDetector = ObjectDetector.createFromOptions(context, options)
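The Semantic Reasoning step above can be sketched as a prompt builder that converts detections into spatial language before the Gemini call. The `Detection` type, position thresholds, and phrasing below are illustrative assumptions, not the app's actual model:

```kotlin
// Sketch: turning raw detections into a spatial prompt for Gemini.
// Detection, the 1/3-2/3 thresholds, and the wording are assumptions.
data class Detection(val label: String, val centerX: Float) // centerX normalized 0..1

fun spatialPhrase(d: Detection): String {
    val position = when {
        d.centerX < 0.33f -> "on your left"
        d.centerX > 0.66f -> "on your right"
        else -> "ahead of you"
    }
    return "${d.label} $position"
}

fun buildPrompt(detections: List<Detection>): String =
    "Describe this scene for a visually impaired pedestrian: " +
        detections.joinToString("; ") { spatialPhrase(it) }
```

Pre-localizing objects on-device keeps the Gemini request small and makes the description deterministic with respect to the detector's output.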
Case Study 02

SentriGrade: OCR Logic

Extraction Pipeline

SentriGrade OCR Pipeline
OCRGradingProcessor.kt Deterministic Logic
fun gradeDocument(visionText: Text, answerKey: String): GradeResult {
    // visionText comes from ML Kit text recognition; calculateSimilarity is defined elsewhere
    val similarity = calculateSimilarity(visionText.text, answerKey)
    return if (similarity >= 0.98) GradeResult.PASS else GradeResult.FAIL
}
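One plausible implementation of the `calculateSimilarity` helper referenced above is normalized Levenshtein similarity (1.0 = identical). The real pipeline may normalize case and whitespace differently; this is a sketch:

```kotlin
// Normalized Levenshtein similarity: 1.0 for identical strings, 0.0 for
// fully different ones. Normalization choices here are assumptions.
fun calculateSimilarity(a: String, b: String): Double {
    if (a.isEmpty() && b.isEmpty()) return 1.0
    val dp = Array(a.length + 1) { IntArray(b.length + 1) }
    for (i in 0..a.length) dp[i][0] = i
    for (j in 0..b.length) dp[0][j] = j
    for (i in 1..a.length) for (j in 1..b.length) {
        val cost = if (a[i - 1] == b[j - 1]) 0 else 1
        dp[i][j] = minOf(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    }
    val dist = dp[a.length][b.length]
    return 1.0 - dist.toDouble() / maxOf(a.length, b.length)
}
```

A 0.98 threshold on this metric tolerates roughly one OCR misread per fifty characters, which fits the deterministic pass/fail contract of `gradeDocument`.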
Roadmap 2026

LilianVault: Offline RAG

On-Device Knowledge Base

LilianVault RAG Architecture
ChromaDB Mobile · Gemini Nano · SQLDelight · Vector
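The planned retrieval step of the offline RAG flow can be sketched as cosine-similarity search over locally stored embeddings, with the top-k chunks fed to Gemini Nano as context. The `Chunk` type is hypothetical, and the embedding source and storage layer (e.g. SQLDelight) are assumed:

```kotlin
import kotlin.math.sqrt

// Sketch: brute-force top-k retrieval over on-device embeddings.
// Chunk is a stand-in for whatever the vector store actually persists.
data class Chunk(val text: String, val embedding: FloatArray)

fun cosine(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var na = 0f; var nb = 0f
    for (i in a.indices) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
    return dot / (sqrt(na) * sqrt(nb))
}

fun retrieve(query: FloatArray, store: List<Chunk>, k: Int = 3): List<Chunk> =
    store.sortedByDescending { cosine(query, it.embedding) }.take(k)
```

A linear scan like this is adequate for a personal knowledge base of a few thousand chunks; a mobile vector index would replace it at larger scale.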
Real-Time Architecture

Gemini Live Multimodal

A high-concurrency Native Android architecture that streams real-time video frames and audio buffers to the Gemini 2.0 Flash model over WebSockets for ultra-low-latency spatial reasoning.

Multimodal Streaming Pipeline

01. CAPTURE

CameraX & AudioRecord pipe raw bytes into a MediaCodec encoder.

02. STREAM

A persistent bidirectional WebSocket connection manages full-duplex communication.

03. SYNTHESIS

On-device AudioTrack plays back the model's streamed speech in real time with minimal buffering latency.

MultimodalLiveClient.kt Kotlin // WebSocket Streaming
// Initializing the Gemini Multimodal Live session
// (illustrative API surface; exact builder and session calls vary by SDK version)
val config = liveGenerationConfig {
    model = "models/gemini-2.0-flash-exp"
    responseModalities = listOf("AUDIO")
}

val liveSession = generativeModel.startLiveSession(config)

// Pushing raw camera frames to the stream
scope.launch {
    cameraStream.collect { frame ->
        liveSession.sendVideoFrame(frame)
    }
}
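Before the CAPTURE stage hands audio to the stream, raw PCM is typically split into fixed-size frames. A minimal sketch, assuming 16 kHz mono 16-bit PCM and 20 ms frames (640 bytes); the frame size and short-final-frame policy are assumptions, not the app's actual values:

```kotlin
// Sketch: splitting a raw PCM buffer into fixed-size frames for streaming.
// 16 kHz * 0.02 s * 2 bytes/sample = 640 bytes per 20 ms frame (assumed).
fun chunkPcm(buffer: ByteArray, frameBytes: Int = 640): List<ByteArray> {
    val frames = mutableListOf<ByteArray>()
    var offset = 0
    while (offset < buffer.size) {
        val end = minOf(offset + frameBytes, buffer.size)
        frames += buffer.copyOfRange(offset, end) // last frame may be short
        offset = end
    }
    return frames
}
```

Small fixed frames keep end-to-end latency bounded: each frame can be sent as soon as it fills rather than waiting for a large encoder buffer.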