The page uses standard browser APIs to request microphone input, route it through a Web Audio graph, and read real-time frequency/time-domain data. This is normal Web Audio territory, not speculative AI.
Principles / citations / limits
How Sound Buddy Works
Sound Buddy is not a song recognizer. It is a local browser audio analyzer that turns broad acoustic cues into learning prompts for electronic music: low-end weight, top-end texture, pulse stability, approximate tempo, broad genre likeness, and subgenre candidates.
Short answer
Music information retrieval has long used timbral, rhythmic, spectral, temporal, and tonal descriptors for classification, similarity, tempo, and annotation tasks. Sound Buddy uses a small, transparent subset of that family of ideas.
The project should not claim exact subgenre identification. It should say "leans", "likeness", "candidate", and "learning cue", because genre classification is not purely objective and the page does not use a trained reference dataset.
The pipeline
The browser asks for microphone access through `getUserMedia`. If the user denies permission, Sound Buddy cannot listen live and falls back to practice mode.
Basis: W3C Media Capture defines APIs for requesting access to microphones and cameras; MDN documents explicit user permission requirements.
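A minimal sketch of this step, assuming a hypothetical `startListening` helper and result shape that are not taken from the Sound Buddy source. The capture object is passed in as a parameter so that `navigator.mediaDevices` can be supplied in the browser and a stub can be supplied elsewhere.

```typescript
// Hypothetical helper: names and result shape are illustrative, not Sound Buddy's code.
interface MicSource<S> {
  getUserMedia(constraints: { audio: boolean }): Promise<S>;
}

type ListenResult<S> =
  | { mode: "live"; stream: S }
  | { mode: "practice"; reason: string };

async function startListening<S>(source: MicSource<S>): Promise<ListenResult<S>> {
  try {
    // In a browser this triggers the permission prompt; the promise
    // resolves with a stream only after the user grants access.
    const stream = await source.getUserMedia({ audio: true });
    return { mode: "live", stream };
  } catch (err) {
    // Denial (NotAllowedError), a missing device (NotFoundError), and similar
    // failures are treated as a normal state that leads to practice mode.
    const reason = err instanceof Error ? err.name : "unknown";
    return { mode: "practice", reason };
  }
}
```

In a page this would be called as `startListening(navigator.mediaDevices)`; the caller switches on `mode` rather than retrying the prompt.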
The microphone stream is connected to an `AnalyserNode`. The analyzer exposes time-domain and frequency-domain data without needing to save or upload audio.
Basis: Web Audio defines an audio routing graph; `AnalyserNode` is designed for real-time frequency and time-domain analysis.
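The reading step can be sketched as follows, assuming hypothetical names (`AnalyserLike`, `readFrame`). In a browser the analyser would come from `createAnalyser()` on an `AudioContext`, fed by `createMediaStreamSource(stream)`; a small structural interface is used here so the sketch does not depend on a live microphone.

```typescript
// Minimal structural view of the AnalyserNode members this sketch touches.
interface AnalyserLike {
  readonly fftSize: number;
  readonly frequencyBinCount: number; // always fftSize / 2
  getByteFrequencyData(out: Uint8Array): void;
  getByteTimeDomainData(out: Uint8Array): void;
}

interface Frame {
  freq: Uint8Array; // one byte per frequency bin, 0..255
  time: Uint8Array; // one byte per waveform sample, 128 = silence
}

// Browser wiring (shown as comments, not executed here):
//   const ctx = new AudioContext();
//   const analyser = ctx.createAnalyser();
//   ctx.createMediaStreamSource(stream).connect(analyser);
function readFrame(analyser: AnalyserLike): Frame {
  const freq = new Uint8Array(analyser.frequencyBinCount);
  const time = new Uint8Array(analyser.fftSize);
  // Both calls copy the current snapshot into the caller's arrays;
  // the data lives only in page memory unless the page chooses to send it.
  analyser.getByteFrequencyData(freq);
  analyser.getByteTimeDomainData(time);
  return { freq, time };
}
```

Because the analyser in this wiring is not connected to the context's destination, the microphone signal is analyzed without being played back.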
The code reduces raw bins into broad bands: sub, kick/bass, low-mid body, mids, top, and air. It also estimates balance, brightness, pulse stability, and rough BPM.
Basis: MIR systems commonly describe audio with spectral, temporal, rhythmic, and high-level descriptors.
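One way to reduce bins into the six bands is sketched below. The band edges in Hz and the names `BANDS` and `bandLevels` are assumptions made for illustration; the page's actual thresholds are product heuristics and are not reproduced here.

```typescript
// Band edges in Hz are illustrative assumptions, not Sound Buddy's actual values.
const BANDS = [
  { name: "sub", lo: 20, hi: 60 },
  { name: "kickBass", lo: 60, hi: 150 },
  { name: "lowMid", lo: 150, hi: 500 },
  { name: "mids", lo: 500, hi: 2000 },
  { name: "top", lo: 2000, hi: 8000 },
  { name: "air", lo: 8000, hi: 20000 },
] as const;

type BandName = (typeof BANDS)[number]["name"];

// Averages byte magnitudes (0..255) inside each band and normalizes to 0..1.
function bandLevels(freq: Uint8Array, sampleRate: number): Record<BandName, number> {
  // Bins are spaced evenly from 0 Hz up to half the sample rate.
  const hzPerBin = sampleRate / 2 / freq.length;
  const out = {} as Record<BandName, number>;
  for (const { name, lo, hi } of BANDS) {
    // Include bin i when lo <= i * hzPerBin < hi.
    const start = Math.max(0, Math.ceil(lo / hzPerBin));
    const end = Math.min(freq.length, Math.ceil(hi / hzPerBin));
    let sum = 0;
    for (let i = start; i < end; i++) sum += freq[i];
    out[name] = end > start ? sum / (end - start) / 255 : 0;
  }
  return out;
}
```

Balance and brightness can then be expressed as ratios of these values, for example low bands against the sum of all bands.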
The app maps features to prompts: "low end is leading", "top texture opened", "pulse is steady", "genre mix reads breaks plus electro plus techno".
Basis: the translation from features to advice is editorial and educational; it is deliberately framed as guidance, not identification.
What is theoretically strong
The browser can supply frequency-domain values from live audio. MDN documents that `getByteFrequencyData()` copies current frequency data into an unsigned byte array; the data spans from 0 to half the sample rate. That supports meters, spectral balance, and texture prompts.
Audio analysis libraries commonly treat each frame of a magnitude spectrum as a distribution over frequency bins. Spectral centroid is one standard example: it estimates where spectral energy is centered. Sound Buddy uses simpler band ratios and brightness, but the concept is aligned.
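For reference, the spectral centroid is the magnitude-weighted mean of the bin frequencies. The sketch below is a generic version of that formula, not Sound Buddy's brightness measure; the function name is hypothetical. Note that `getByteFrequencyData()` returns decibel-scaled bytes rather than linear magnitudes, so a centroid computed directly on those bytes is only a rough proxy.

```typescript
// Spectral centroid in Hz: sum(f_k * m_k) / sum(m_k) over all bins k.
function spectralCentroidHz(mags: ArrayLike<number>, sampleRate: number): number {
  const hzPerBin = sampleRate / 2 / mags.length; // bins span 0 Hz to Nyquist
  let weighted = 0;
  let total = 0;
  for (let k = 0; k < mags.length; k++) {
    weighted += k * hzPerBin * mags[k];
    total += mags[k];
  }
  // A silent frame has no defined centroid; report 0 instead of NaN.
  return total > 0 ? weighted / total : 0;
}
```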
Beat tracking research commonly works from onset strength, tempo estimation, and beat peak selection. Sound Buddy does a lighter, browser-friendly pulse estimate. It is enough for "rough BPM / stable pulse" comments, not enough for definitive beat tracking.
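A lightweight pulse estimate of this kind can be sketched with autocorrelation over a low-band energy envelope. This is one common heuristic chosen for illustration; the source does not state which method Sound Buddy uses, and the function name, tempo range, and frame rate are assumptions.

```typescript
// Rough tempo from an energy envelope sampled at `fps` frames per second.
// Autocorrelation heuristic; not Sound Buddy's actual algorithm.
function roughBpm(
  envelope: number[],
  fps: number,
  minBpm = 70,
  maxBpm = 180,
): number | null {
  const mean = envelope.reduce((a, b) => a + b, 0) / envelope.length;
  const x = envelope.map((v) => v - mean); // remove the constant offset
  // Convert the tempo range into a range of beat periods measured in frames.
  const minLag = Math.max(1, Math.floor((60 * fps) / maxBpm));
  const maxLag = Math.min(x.length - 1, Math.ceil((60 * fps) / minBpm));
  let bestLag = 0;
  let bestScore = 0;
  for (let lag = minLag; lag <= maxLag; lag++) {
    let score = 0;
    for (let i = 0; i + lag < x.length; i++) score += x[i] * x[i + lag];
    score /= x.length - lag; // average, so longer lags are not penalized
    if (score > bestScore) {
      bestScore = score;
      bestLag = lag;
    }
  }
  // No positive correlation means no repeating pulse was found.
  return bestLag > 0 ? (60 * fps) / bestLag : null;
}
```

Estimators like this are prone to half-tempo and double-tempo errors, which is one reason the output suits "rough BPM" wording rather than a definitive reading.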
Tzanetakis and Cook's classic genre-classification work proposed timbral texture, rhythmic content, and pitch content feature sets. Electronic styles often differ strongly in rhythm, low-end weight, and timbre, so those cues are useful for education.
What is not strong enough to overclaim
The project does not fingerprint audio, compare against a catalog, or recognize artists. It cannot tell you the track title.
"Hard techno 71%" or "UKG / garage 56%" means similarity to a cue profile inside this app. It is not a database-backed ground truth label.
Phone speakers, laptop mics, club PA bleed, distance from the source, echo, and browser audio processing can shift the spectrum. The advice is best used as a listening aid.
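To make concrete what a percentage such as "Hard techno 71%" can mean, the sketch below scores a set of cues against in-app profiles. Everything in it is hypothetical: the cue set, the profile numbers, and the distance measure are invented for illustration and are not Sound Buddy's.

```typescript
// Hypothetical cue vector: [lowEnd, topEnd, pulseStability], each in 0..1.
type Cues = [number, number, number];

// Invented profiles for illustration only.
const PROFILES: Record<string, Cues> = {
  "hard techno": [0.9, 0.6, 0.9],
  "ambient": [0.3, 0.4, 0.1],
};

// Likeness = 1 - mean absolute difference, reported as a whole percent.
function likeness(cues: Cues, profile: Cues): number {
  let diff = 0;
  for (let i = 0; i < cues.length; i++) diff += Math.abs(cues[i] - profile[i]);
  return Math.round((1 - diff / cues.length) * 100);
}

// Ranks every profile by likeness, highest first.
function rankCandidates(cues: Cues): { label: string; percent: number }[] {
  return Object.keys(PROFILES)
    .map((label) => ({ label, percent: likeness(cues, PROFILES[label]) }))
    .sort((a, b) => b.percent - a.percent);
}
```

The score only measures distance to a hand-written profile, so a change in microphone or room shifts it just as readily as a change in the music does.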
Evidence map
Measures whether the microphone signal is loud enough to analyze. Below a threshold, the page waits.
Supported by standard time-domain audio analysis; useful for avoiding false notes on silence.
Compares sub and bass bands against total spectrum. This helps teach kick pressure, bass tail, and physical weight.
Reasonable for techno education; vulnerable to playback system and microphone response.
Compares high and air bands. This helps identify hats, metallic noise, hiss, resonance, and perceived intensity.
Grounded in spectral feature thinking; not a direct instrument detector.
Looks for repeated low-end movement and estimates a rough pulse. It can say "fast pulse" or "steady loop".
Beat tracking is a serious MIR task; this browser heuristic is intentionally lighter than academic beat trackers.
Ranks broad lanes such as techno, electro, house, breaks, acid, trance, garage/UKG, jungle/DNB, bass/dubstep, downtempo/ambient, hard dance/hardcore, and industrial by feature similarity over a recent audio-stream window, not one isolated frame.
The current window is about 12 seconds: long enough to cover a short groove phrase at common club tempos, short enough to react when the arrangement changes.
Maps the recent stream cues to detailed labels such as hard techno, dub techno, electro funk, UKG, jungle/DNB, dubstep, footwork, psytrance, hardstyle, hardcore/gabber, ambient, and hard trance.
The weakest layer scientifically. It is editorial taxonomy plus signal heuristics, so the UI must keep confidence language cautious.
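The level gate and the recent-stream window described in this map can be sketched together. As a worked example with an assumed tempo of 128 BPM, a 12-second window holds 12 × 128 / 60 = 25.6 beats, or about 6.4 bars of 4/4. The class name, the silence threshold, and the cue layout below are assumptions; only the 12-second length comes from the text.

```typescript
const WINDOW_SECONDS = 12; // window length stated in the text
const SILENCE_THRESHOLD = 0.02; // assumed value; the page's threshold is not stated

// RMS of byte time-domain samples (0..255, silence at 128), normalized to 0..1.
function rmsLevel(time: Uint8Array): number {
  if (time.length === 0) return 0;
  let sumSq = 0;
  for (let i = 0; i < time.length; i++) {
    const s = (time[i] - 128) / 128;
    sumSq += s * s;
  }
  return Math.sqrt(sumSq / time.length);
}

// Keeps cue vectors from roughly the last 12 seconds and averages them.
class RecentWindow {
  private frames: { t: number; cues: number[] }[] = [];

  // `t` is the frame time in seconds; `level` is the frame's RMS level.
  push(t: number, level: number, cues: number[]): void {
    // Drop frames that have aged out of the window, even during silence.
    while (this.frames.length > 0 && t - this.frames[0].t > WINDOW_SECONDS) {
      this.frames.shift();
    }
    // Frames below the threshold are skipped, so silence adds no evidence.
    if (level >= SILENCE_THRESHOLD) this.frames.push({ t, cues });
  }

  // Mean of each cue over the window, or null when the window is empty.
  average(): number[] | null {
    if (this.frames.length === 0) return null;
    const n = this.frames[0].cues.length;
    const sum: number[] = [];
    for (let i = 0; i < n; i++) sum.push(0);
    for (const f of this.frames) {
      for (let i = 0; i < n; i++) sum[i] += f.cues[i];
    }
    return sum.map((v) => v / this.frames.length);
  }
}
```

Averaging over the window is what lets a ranking reflect a groove phrase rather than a single frame, while the aging rule lets it follow arrangement changes.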
Privacy position
The current implementation reads browser analyzer arrays and produces comments in the page. It does not need server audio processing.
Microphone access depends on browser permission. A denial is handled as a normal state, not an error to bypass.
The safe public claim is: no audio is intentionally hosted, stored, uploaded, or fingerprinted by this page.
Sources
These references support the technical basis and the project's caution around genre naming. They do not certify the app's exact thresholds; those thresholds remain product heuristics.
w3.org/TR/webaudio-1.1: Defines the high-level browser audio graph model used for processing and analyzing audio in web applications.
MDN AnalyserNode: Documents the browser node that provides real-time frequency and time-domain analysis information.
MDN frequency data: Documents how frequency data is copied into a byte array and how frequency bins map up to half the sample rate.
w3.org/TR/mediacapture-streams: Defines browser APIs for requesting access to local media devices such as microphones.
MDN getUserMedia: Documents the user-permission requirement for opening microphone or camera input.
w3.org/TR/permissions: Defines common infrastructure for permission states around powerful web-platform features.
Musical genre classification of audio signals: Classic IEEE paper on musical genre classification using timbral texture, rhythmic content, and pitch content features.
Automatic musical genre classification: Early genre-classification work noting perceptual criteria related to texture, instrumentation, and rhythmic structure.
Human and automatic genre classification: Compares human and automatic genre classification and supports the project's caution that genre labels are inherently subjective.
librosa spectral centroid: Documents a standard spectral descriptor that treats a magnitude spectrum as a frequency-bin distribution.
librosa beat track: Documents a beat-tracking pipeline based on onset strength, tempo estimation, and beat peak selection.
essentia.upf.edu: Reference MIR library with spectral, temporal, tonal, rhythm, and high-level descriptors for audio analysis.
Transactions of ISMIR: Shows MIR as an interdisciplinary research field covering rhythm, beat, tempo, timbre, style, genre, and classification.
Music Tempo Estimation: Are We Done Yet?: Peer-reviewed discussion of tempo-estimation evaluation, applications, metrics, and datasets.
Operational conclusion
Sound Buddy is viable as an always-on educational listening companion. It should remain transparent, local, and cautious: strong on feature-based feedback, modest on genre certainty, explicit about privacy, and clear that style detection is a learning cue rather than proof.