
What happened to security? Privacy?

The following is a guest post from John deVadoss, who serves on the Board of Directors of the Global Blockchain Business Council in Geneva and is co-founder of the InterWork Alliance in Washington, DC.

Last week, I had the opportunity to present and discuss the implications of AI for security with some members of Congress and staff in Washington, DC.

Today’s generative AI is reminiscent of the Internet of the late 1980s: basic research, latent potential, and academic applications, but it is not yet ready for the public. This time around, unfettered vendor ambition, fueled by minor-league venture capital and the Twitter echo chamber, is fast-tracking AI’s brave new world.

The so-called “public” foundation models are tainted and unsuitable for consumer and commercial use. Privacy abstractions, where they exist at all, leak like a sieve. Security constructs are a work in progress, as the attack surface and the threat vectors are still being understood. The less said about the fantasy of guardrails, the better.

So how did we get here? What happened to security? Privacy?

A “compromised” foundation model

The so-called “open” models are not open at all. Various vendors tout their openness by providing open access to model weights, documentation, or tests. Still, none of the major vendors provide anything close to the training data sets, or their manifests and lineage, that would make their models replicable and reproducible.

This opacity around the training data sets means that if you wish to use one or more of these models, you have no way, as a consumer or as an organization, to determine or verify the extent of data contamination with respect to IP, copyright, and other potentially illegal content.
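
For illustration only, here is a minimal sketch of the kind of training-data manifest that would make such verification possible; the fields, file layout, and helper function are hypothetical, not any vendor’s actual format.

```python
# Illustrative sketch of a training-data manifest that records content hashes
# and provenance for each document; the fields and layout are hypothetical.
import hashlib
import json
from pathlib import Path

def build_manifest(corpus_dir: str, source_url: str, license_id: str) -> list[dict]:
    """Hash every text file in a corpus directory and record its provenance."""
    manifest = []
    for path in sorted(Path(corpus_dir).rglob("*.txt")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        manifest.append({
            "file": str(path),
            "sha256": digest,          # allows later verification that the content is unchanged
            "source_url": source_url,  # where the data was collected from
            "license": license_id,     # claimed licensing of the content
        })
    return manifest

if __name__ == "__main__":
    # Hypothetical corpus directory; in practice each record would carry its own source and license.
    print(json.dumps(build_manifest("corpus/", "https://example.com/crawl", "CC-BY-4.0"), indent=2))
```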

Crucially, without a manifest of the training data set, there is no way to identify malicious content or to verify that none exists. Nefarious actors, including state-sponsored actors, plant Trojan-horse content across the web that the models ingest during training, leading to unpredictable and potentially malicious side effects at inference time.

Once the model is corrupted, there is no way to untrain it and the only option is to destroy it.

“Porous” security

Generative AI models are the ultimate security honeypot because “all” of the data has been ingested into one container. New classes of attack vectors emerge in the AI era; the industry has yet to come to grips with the implications, both for securing these models against cyber threats and for how these models can be used as tools by cyber threat actors.

Malicious prompt injection techniques can be used to pollute the index; data poisoning can be used to corrupt the weights; embedding attacks, including inversion techniques, can be used to pull rich data out of the embeddings; membership inference can be used to determine whether certain data was in the training set; and that is just the tip of the iceberg.
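
As a concrete illustration of the last point, the following is a minimal sketch, in Python, of a confidence-threshold membership inference test against a deliberately overfit toy classifier; the model, data set, and threshold are hypothetical stand-ins, not any vendor’s actual system.

```python
# Illustrative sketch of a confidence-threshold membership inference test.
# A deliberately overfit toy classifier stands in for a real model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Build a toy model that overfits its training set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

def confidence(model, X, y):
    """Probability the model assigns to the true label of each record."""
    proba = model.predict_proba(X)
    return proba[np.arange(len(y)), y]

# Members (training records) tend to receive higher confidence than non-members,
# so a simple threshold on that confidence already separates the two groups.
threshold = 0.9
member_scores = confidence(model, X_train, y_train)
nonmember_scores = confidence(model, X_test, y_test)
print("flagged as members (train):", np.mean(member_scores > threshold))
print("flagged as members (test): ", np.mean(nonmember_scores > threshold))
```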

Threat actors can gain access to confidential data via model inversion and programmatic queries, or they can corrupt or otherwise influence a model’s latent behavior. And, as noted earlier, uncontrolled ingestion of data at large opens the door to state-sponsored cyber activity embedded via Trojan horses and the like.

“Leaky” privacy

AI models are useful because of the data sets they are trained on; the indiscriminate ingestion of data at scale creates unprecedented privacy risks for individuals and for the public at large. In the AI era, privacy has become a societal concern; regulations that primarily address individual data rights are inadequate.

Beyond static data, dynamic conversational prompts must also be treated as IP, to be protected and safeguarded. If you are a consumer co-creating an artifact with a model, you do not want the prompts that direct this creative activity to be used to train the model or to be shared with other consumers of the model.

If you are an employee using a model to deliver business outcomes, your employer expects your prompts to remain confidential. Furthermore, prompts and responses need a secure audit trail in case liability issues surface for either party, primarily because of the stochastic nature of these models and the variability of their responses over time.
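
As one illustration of what such an audit trail could look like, here is a minimal sketch that hash-chains each prompt/response exchange into an append-only log; the file name, record fields, and helper function are hypothetical.

```python
# Illustrative sketch of a tamper-evident audit trail for prompts and responses.
# Hash-chained JSON lines make after-the-fact edits detectable; names are hypothetical.
import hashlib
import json
import time

AUDIT_LOG = "prompt_audit.jsonl"

def append_audit_record(user_id: str, prompt: str, response: str, prev_hash: str) -> str:
    """Append one prompt/response exchange to the audit log and return its hash."""
    record = {
        "timestamp": time.time(),
        "user_id": user_id,
        "prompt": prompt,
        "response": response,
        "prev_hash": prev_hash,  # chains this record to the one before it
    }
    record_hash = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
    record["hash"] = record_hash
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record_hash

# Example usage: log two exchanges, each chained to the hash of the previous record.
h = append_audit_record("employee-42", "Summarize Q3 sales.", "Q3 sales rose 8%...", prev_hash="genesis")
h = append_audit_record("employee-42", "Draft the board memo.", "Dear board members...", prev_hash=h)
```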

What happens next?

We are dealing with a different kind of technology than anything we have seen before in the history of computing: technology that exhibits emergent, latent behavior at large scale. Yesterday’s approaches to security, privacy, and confidentiality no longer work.

Industry leaders are throwing caution to the wind, and regulators and policymakers have no alternative but to intervene.