my proposed techniques for safe AI
Custom reconnaissance techniques to enhance AI model safety and reliability
in progress
I am building models whose sole purpose is to draw circuit heat maps and flag issues in the model under monitoring.
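A minimal sketch of the kind of output I have in mind, assuming a matrix of per-layer, per-unit activation magnitudes has already been collected from the model under monitoring; the baseline statistics and the z-score threshold below are placeholder assumptions, not the finished pipeline.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_layers, n_units = 12, 64

# Hypothetical mean activation magnitudes collected from the monitored model.
observed = rng.normal(loc=1.0, scale=0.2, size=(n_layers, n_units))
observed[7, 20] += 3.0  # inject one anomaly so something gets flagged

# Reference statistics gathered on trusted inputs (assumed to exist already).
baseline_mean = np.ones((n_layers, n_units))
baseline_std = 0.2 * np.ones((n_layers, n_units))

# Flag units whose activation drifts more than 4 standard deviations.
z_scores = (observed - baseline_mean) / baseline_std
flags = np.argwhere(np.abs(z_scores) > 4.0)

fig, ax = plt.subplots(figsize=(8, 4))
im = ax.imshow(observed, aspect="auto", cmap="magma")
ax.scatter(flags[:, 1], flags[:, 0], marker="x", color="cyan", label="flagged unit")
ax.set_xlabel("unit")
ax.set_ylabel("layer")
fig.colorbar(im, label="mean activation magnitude")
ax.legend()
plt.tight_layout()
plt.show()

print("flagged (layer, unit) pairs:", flags.tolist())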
in progress
I am building an array of visualization tools for analyzing activation paths.
I feel that proper human visualization of activation data is necessary for human-in-the-loop intervention.
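As a rough illustration of the direction, here is a sketch of capturing per-layer activations with PyTorch forward hooks so a human can inspect them; the toy MLP and the bar-chart rendering are assumptions, but the same hook pattern transfers to transformer blocks.

import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Toy stand-in for the model being inspected.
model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 4),
)

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Register a forward hook on every Linear layer to record its output.
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(make_hook(name))

x = torch.randn(1, 16)
model(x)

# One row per captured layer: a coarse "activation path" for this one input.
fig, axes = plt.subplots(len(activations), 1, figsize=(8, 6))
for ax, (name, act) in zip(axes, activations.items()):
    ax.bar(range(act.shape[-1]), act.squeeze(0).abs().numpy())
    ax.set_ylabel(f"layer {name}")
plt.tight_layout()
plt.show()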
some existing techniques
Pairwise Shapley Values
An improvement over traditional feature attribution with Shapley values: this method explains predictions by comparing pairs of similar data instances, yielding more intuitive, human-relatable explanations while reducing computational overhead.
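A minimal sketch of the pairwise idea, under the assumption that we explain the prediction gap between an instance x and a similar reference x_ref by crediting each feature for the effect of swapping it in from the reference; the linear toy model and the two instances are illustrative only.

from itertools import combinations
from math import factorial
import numpy as np

def model(z):
    # Placeholder predictor; any black-box scalar model works here.
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]

def pairwise_shapley(f, x, x_ref):
    # Attribute f(x) - f(x_ref) to features by averaging, over all subsets,
    # the effect of swapping feature i from x_ref to x.
    d = len(x)
    phi = np.zeros(d)

    def blend(subset):
        z = np.array(x_ref, dtype=float)
        idx = list(subset)
        z[idx] = np.array(x, dtype=float)[idx]
        return f(z)

    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(d - k - 1) / factorial(d)
                phi[i] += weight * (blend(subset + (i,)) - blend(subset))
    return phi

x = np.array([1.0, 2.0, 3.0])
x_ref = np.array([0.0, 0.0, 0.0])   # a similar reference instance
phi = pairwise_shapley(model, x, x_ref)
print("per-feature attributions:", phi)
print("sum check:", phi.sum(), "vs", model(x) - model(x_ref))  # should match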
ViTmiX
A hybrid explainability method targeting vision-transformer (ViT) models, combining multiple visualization techniques to produce clearer explanations of why a model made a certain decision (e.g. in object recognition or segmentation tasks).
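I have not reproduced how ViTmiX itself weights its components; the sketch below only illustrates the general idea of blending two ViT explanation maps (say, attention rollout and a gradient-based saliency map) into one view, with placeholder patch-level maps standing in for real model outputs.

import numpy as np

def normalize(m):
    m = m - m.min()
    return m / (m.max() + 1e-8)

def mix_maps(attention_map, gradient_map, alpha=0.5):
    # Weighted geometric mean keeps the regions both methods agree on.
    a = normalize(attention_map)
    g = normalize(gradient_map)
    return normalize((a ** alpha) * (g ** (1.0 - alpha)))

# Placeholder 14x14 patch-level maps standing in for real ViT outputs.
rng = np.random.default_rng(0)
attention_rollout = rng.random((14, 14))
gradient_saliency = rng.random((14, 14))

mixed = mix_maps(attention_rollout, gradient_saliency, alpha=0.6)
print(mixed.shape, float(mixed.min()), float(mixed.max()))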
XAI‑Guided Context‑Aware Data Augmentation
This method uses XAI insights (which features the model considers important) to guide augmentation so that transformations preserve the information the model relies on; this helps improve performance and generalization, especially in low-resource domains.
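A minimal sketch of the idea, assuming a per-pixel saliency map is already available from some XAI method: random erasing is applied only where the explanation says the model is not looking, so the augmented image keeps the evidence the model relies on. The threshold and erase probability are made-up values.

import numpy as np

rng = np.random.default_rng(0)

image = rng.random((32, 32))        # toy grayscale image
saliency = rng.random((32, 32))     # per-pixel importance from any XAI method

def guided_erase(img, importance, threshold=0.3, erase_prob=0.5):
    # Erase pixels only where the explanation says they do not matter.
    out = img.copy()
    low_importance = importance < threshold
    erase_mask = low_importance & (rng.random(img.shape) < erase_prob)
    out[erase_mask] = 0.0
    return out

augmented = guided_erase(image, saliency)
print("pixels erased:", int((augmented != image).sum()))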
Causal-inference and neuro-symbolic explainability approaches
These methods go beyond correlation-based explanations: they aim to reveal causal relationships and embed symbolic (human-understandable) reasoning into neural models to make their decisions more transparent and interpretable.
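The sketch below only covers the causal-intervention half of this family (not the neuro-symbolic half): instead of reading off a correlation, it forces one input feature to fixed values across a background dataset and measures how the average prediction moves. The toy predictor and background data are assumptions.

import numpy as np

rng = np.random.default_rng(0)

def model(X):
    # Placeholder predictor that truly depends on features 0 and 2 only.
    return 3.0 * X[:, 0] - 0.5 * X[:, 2]

background = rng.normal(size=(500, 4))   # assumed background dataset

def interventional_effect(f, X, feature, low, high):
    # do(feature = low) vs do(feature = high), averaged over the background.
    X_low, X_high = X.copy(), X.copy()
    X_low[:, feature] = low
    X_high[:, feature] = high
    return f(X_high).mean() - f(X_low).mean()

for j in range(background.shape[1]):
    effect = interventional_effect(model, background, j, low=-1.0, high=1.0)
    print(f"feature {j}: average interventional effect = {effect:+.3f}")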
Mechanistic interpretability / Circuit tracing & sparse decomposition
This is my favourite one: a set of techniques that attempt to peer inside deep networks (especially large language models and other transformers) and decompose them into simpler, interpretable sub-components (e.g. “circuits,” “features,” “concepts”) rather than treating them as opaque black boxes.
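A minimal sketch of activation patching, one of the basic circuit-tracing probes: cache a hidden activation from a "clean" run, splice it into a "corrupted" run, and check how much of the original output comes back. The toy two-layer network and the input pair stand in for a real transformer and a real prompt pair.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy two-layer network standing in for a real transformer.
model = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(),
    nn.Linear(16, 1),
)

clean = torch.randn(1, 8)
corrupted = clean + torch.randn(1, 8)   # perturbed version of the same input

# 1. Cache the hidden activation from the clean run.
cache = {}
def save_hook(module, inputs, output):
    cache["hidden"] = output.detach()
handle = model[1].register_forward_hook(save_hook)
clean_out = model(clean)
handle.remove()

# 2. Re-run the corrupted input, splicing in the cached clean activation.
def patch_hook(module, inputs, output):
    return cache["hidden"]          # returning a tensor replaces the output
handle = model[1].register_forward_hook(patch_hook)
patched_out = model(corrupted)
handle.remove()

corrupted_out = model(corrupted)
print(f"clean {clean_out.item():+.3f}  corrupted {corrupted_out.item():+.3f}  "
      f"patched {patched_out.item():+.3f}")
# If patching a site restores the clean output, that site carries the
# information the corruption destroyed, i.e. it lies on the circuit.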
Interactive and user-centered explanations / context-aware XAI
This approach proposes systems where users can ask “what-if” questions (counterfactuals) or receive context-tailored explanations (e.g. for medical or IoT applications) rather than static feature-importance outputs.
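A minimal sketch of the counterfactual side of this, assuming a scalar scoring model with a fixed decision threshold: greedily search for the smallest single-feature change that flips the decision. Real systems would also enforce plausibility constraints; the logistic toy model and step size here are made up.

import numpy as np

def model(x):
    # Placeholder scoring function with a 0.5 decision threshold.
    return 1.0 / (1.0 + np.exp(-(1.5 * x[0] - 2.0 * x[1] + 0.5)))

def counterfactual(f, x, threshold=0.5, step=0.05, max_steps=200):
    # Greedy search: move one feature at a time until the decision flips,
    # and keep the cheapest flip found.
    original_decision = f(x) >= threshold
    best = None
    for i in range(len(x)):
        for direction in (+1.0, -1.0):
            z = x.astype(float).copy()
            for _ in range(max_steps):
                z[i] += direction * step
                if (f(z) >= threshold) != original_decision:
                    cost = abs(z[i] - x[i])
                    if best is None or cost < best[0]:
                        best = (cost, i, z.copy())
                    break
    return best

x = np.array([0.2, 0.8])
print("original score:", round(float(model(x)), 3))
cost, feature, z = counterfactual(model, x)
print(f"decision flips by moving feature {feature} by {cost:.2f} -> {z}")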
