my proposed techniques for safe AI
Custom reconnaissance techniques to enhance AI model safety and reliability
in progress
I am building models whose sole purpose is to draw circuit heat maps and flag issues in the model under monitoring.
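A minimal sketch of the kind of output I have in mind, assuming a matrix of per-layer, per-unit activation magnitudes has already been collected from the model under monitoring; the baseline statistics and the z-score threshold below are placeholder assumptions, not the finished pipeline.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_layers, n_units = 12, 64

# Hypothetical mean activation magnitudes collected from the monitored model.
observed = rng.normal(loc=1.0, scale=0.2, size=(n_layers, n_units))
observed[7, 20] += 3.0  # inject one anomaly so something gets flagged

# Reference statistics gathered on trusted inputs (assumed to exist already).
baseline_mean = np.ones((n_layers, n_units))
baseline_std = 0.2 * np.ones((n_layers, n_units))

# Flag units whose activation drifts more than 4 standard deviations.
z_scores = (observed - baseline_mean) / baseline_std
flags = np.argwhere(np.abs(z_scores) > 4.0)

fig, ax = plt.subplots(figsize=(8, 4))
im = ax.imshow(observed, aspect="auto", cmap="magma")
ax.scatter(flags[:, 1], flags[:, 0], marker="x", color="cyan", label="flagged unit")
ax.set_xlabel("unit")
ax.set_ylabel("layer")
fig.colorbar(im, label="mean activation magnitude")
ax.legend()
plt.tight_layout()
plt.show()

print("flagged (layer, unit) pairs:", flags.tolist())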
in progress
I am building an array of visualization tools for analyzing activation paths.
I feel that proper human visualization of activation data is necessary for human-in-the-loop intervention.
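As a rough illustration of the direction, here is a sketch of capturing per-layer activations with PyTorch forward hooks so a human can inspect them; the toy MLP and the bar-chart rendering are assumptions, but the same hook pattern transfers to transformer blocks.

import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# Toy stand-in for the model being inspected.
model = nn.Sequential(
    nn.Linear(16, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 4),
)

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Register a forward hook on every Linear layer to record its output.
for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(make_hook(name))

x = torch.randn(1, 16)
model(x)

# One row per captured layer: a coarse "activation path" for this one input.
fig, axes = plt.subplots(len(activations), 1, figsize=(8, 6))
for ax, (name, act) in zip(axes, activations.items()):
    ax.bar(range(act.shape[-1]), act.squeeze(0).abs().numpy())
    ax.set_ylabel(f"layer {name}")
plt.tight_layout()
plt.show()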
some existing techniques
Pairwise Shapley Values
An improvement over traditional feature attribution with Shapley values: this method explains predictions by comparing pairs of similar data instances, yielding more intuitive, human-relatable explanations while reducing computational overhead.
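A minimal sketch of the pairwise idea, under the assumption that we explain the prediction gap between an instance x and a similar reference x_ref by crediting each feature for the effect of swapping it in from the reference; the linear toy model and the two instances are illustrative only.

from itertools import combinations
from math import factorial
import numpy as np

def model(z):
    # Placeholder predictor; any black-box scalar model works here.
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]

def pairwise_shapley(f, x, x_ref):
    # Attribute f(x) - f(x_ref) to features by averaging, over all subsets,
    # the effect of swapping feature i from x_ref to x.
    d = len(x)
    phi = np.zeros(d)

    def blend(subset):
        z = np.array(x_ref, dtype=float)
        idx = list(subset)
        z[idx] = np.array(x, dtype=float)[idx]
        return f(z)

    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(d - k - 1) / factorial(d)
                phi[i] += weight * (blend(subset + (i,)) - blend(subset))
    return phi

x = np.array([1.0, 2.0, 3.0])
x_ref = np.array([0.0, 0.0, 0.0])   # a similar reference instance
phi = pairwise_shapley(model, x, x_ref)
print("per-feature attributions:", phi)
print("sum check:", phi.sum(), "vs", model(x) - model(x_ref))  # should match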
ViTmiX
A hybrid explainability method targeting vision-transformer (ViT) models, combining multiple visualization techniques to produce clearer explanations of why a model made a certain decision (e.g. in object recognition or segmentation tasks).
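I have not reproduced how ViTmiX itself weights its components; the sketch below only illustrates the general idea of blending two ViT explanation maps (say, attention rollout and a gradient-based saliency map) into one view, with placeholder patch-level maps standing in for real model outputs.

import numpy as np

def normalize(m):
    m = m - m.min()
    return m / (m.max() + 1e-8)

def mix_maps(attention_map, gradient_map, alpha=0.5):
    # Weighted geometric mean keeps the regions both methods agree on.
    a = normalize(attention_map)
    g = normalize(gradient_map)
    return normalize((a ** alpha) * (g ** (1.0 - alpha)))

# Placeholder 14x14 patch-level maps standing in for real ViT outputs.
rng = np.random.default_rng(0)
attention_rollout = rng.random((14, 14))
gradient_saliency = rng.random((14, 14))

mixed = mix_maps(attention_rollout, gradient_saliency, alpha=0.6)
print(mixed.shape, float(mixed.min()), float(mixed.max()))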
XAI‑Guided Context‑Aware Data Augmentation
This method uses XAI insights (which features the model considers important) to guide augmentation so that transformations preserve the information the model relies on; this helps improve performance and generalization, especially in low-resource domains.
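A minimal sketch of the idea, assuming a per-pixel saliency map is already available from some XAI method: random erasing is applied only where the explanation says the model is not looking, so the augmented image keeps the evidence the model relies on. The threshold and erase probability are made-up values.

import numpy as np

rng = np.random.default_rng(0)

image = rng.random((32, 32))        # toy grayscale image
saliency = rng.random((32, 32))     # per-pixel importance from any XAI method

def guided_erase(img, importance, threshold=0.3, erase_prob=0.5):
    # Erase pixels only where the explanation says they do not matter.
    out = img.copy()
    low_importance = importance < threshold
    erase_mask = low_importance & (rng.random(img.shape) < erase_prob)
    out[erase_mask] = 0.0
    return out

augmented = guided_erase(image, saliency)
print("pixels erased:", int((augmented != image).sum()))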
Causal-inference and neuro-symbolic explainability approaches
These methods go beyond correlation-based explanations: they aim to reveal causal relationships and embed symbolic (human-understandable) reasoning into neural models to make their decisions more transparent and interpretable.
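The sketch below only covers the causal-intervention half of this family (not the neuro-symbolic half): instead of reading off a correlation, it forces one input feature to fixed values across a background dataset and measures how the average prediction moves. The toy predictor and background data are assumptions.

import numpy as np

rng = np.random.default_rng(0)

def model(X):
    # Placeholder predictor that truly depends on features 0 and 2 only.
    return 3.0 * X[:, 0] - 0.5 * X[:, 2]

background = rng.normal(size=(500, 4))   # assumed background dataset

def interventional_effect(f, X, feature, low, high):
    # do(feature = low) vs do(feature = high), averaged over the background.
    X_low, X_high = X.copy(), X.copy()
    X_low[:, feature] = low
    X_high[:, feature] = high
    return f(X_high).mean() - f(X_low).mean()

for j in range(background.shape[1]):
    effect = interventional_effect(model, background, j, low=-1.0, high=1.0)
    print(f"feature {j}: average interventional effect = {effect:+.3f}")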
Mechanistic interpretability / Circuit tracing & sparse decomposition
This is my favourite one: a set of techniques that attempt to peer inside deep networks (especially large language models and other transformers) and decompose them into simpler, interpretable sub-components (e.g. “circuits,” “features,” “concepts”) rather than treating them as opaque black boxes.
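A minimal sketch of activation patching, one of the basic circuit-tracing probes: cache a hidden activation from a "clean" run, splice it into a "corrupted" run, and check how much of the original output comes back. The toy two-layer network and the input pair stand in for a real transformer and a real prompt pair.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy two-layer network standing in for a real transformer.
model = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(),
    nn.Linear(16, 1),
)

clean = torch.randn(1, 8)
corrupted = clean + torch.randn(1, 8)   # perturbed version of the same input

# 1. Cache the hidden activation from the clean run.
cache = {}
def save_hook(module, inputs, output):
    cache["hidden"] = output.detach()
handle = model[1].register_forward_hook(save_hook)
clean_out = model(clean)
handle.remove()

# 2. Re-run the corrupted input, splicing in the cached clean activation.
def patch_hook(module, inputs, output):
    return cache["hidden"]          # returning a tensor replaces the output
handle = model[1].register_forward_hook(patch_hook)
patched_out = model(corrupted)
handle.remove()

corrupted_out = model(corrupted)
print(f"clean {clean_out.item():+.3f}  corrupted {corrupted_out.item():+.3f}  "
      f"patched {patched_out.item():+.3f}")
# If patching a site restores the clean output, that site carries the
# information the corruption destroyed, i.e. it lies on the circuit.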
Interactive and user-centered explanations / context-aware XAI
This approach proposes systems where users can ask “what-if” questions (counterfactuals) or receive context-tailored explanations (e.g. for medical or IoT applications) rather than static feature-importance outputs.
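A minimal sketch of the counterfactual side of this, assuming a scalar scoring model with a fixed decision threshold: greedily search for the smallest single-feature change that flips the decision. Real systems would also enforce plausibility constraints; the logistic toy model and step size here are made up.

import numpy as np

def model(x):
    # Placeholder scoring function with a 0.5 decision threshold.
    return 1.0 / (1.0 + np.exp(-(1.5 * x[0] - 2.0 * x[1] + 0.5)))

def counterfactual(f, x, threshold=0.5, step=0.05, max_steps=200):
    # Greedy search: move one feature at a time until the decision flips,
    # and keep the cheapest flip found.
    original_decision = f(x) >= threshold
    best = None
    for i in range(len(x)):
        for direction in (+1.0, -1.0):
            z = x.astype(float).copy()
            for _ in range(max_steps):
                z[i] += direction * step
                if (f(z) >= threshold) != original_decision:
                    cost = abs(z[i] - x[i])
                    if best is None or cost < best[0]:
                        best = (cost, i, z.copy())
                    break
    return best

x = np.array([0.2, 0.8])
print("original score:", round(float(model(x)), 3))
cost, feature, z = counterfactual(model, x)
print(f"decision flips by moving feature {feature} by {cost:.2f} -> {z}")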
