How to protect your privacy while using AI

In Spike Jonze's film 'Her,' Theodore Twombly develops an intimate relationship with Samantha, an artificially intelligent operating system. Samantha learns, evolves, and adapts based on her interactions with Theodore and, as it turns out, hundreds of other users. While our current AI, like ChatGPT, might not be professing love, the underlying principle of learning from our every typed word presents a strikingly similar, if less romanticised, dynamic. We pour our thoughts, queries, creative sparks, and sometimes sensitive data into these digital confidantes. But what happens to that information, and how do we ensure our digital privacy and intellectual content aren't just lost whispers in the server?

Joaquin Phoenix spins romantically with a phone in 'Her'

The allure of generative AI is undeniable. These tools are helpful assistants, brainstorming partners, and tireless (if soulless) creators. Yet this burgeoning human-AI collaboration rests on a foundation of data, often our data, prompting a more sophisticated conversation about privacy, content ownership, and the subtle trade-offs we make every time we ask an AI to simplify a complex topic or summarise a long, rambling message from a friend.

xAI Data Centre in Memphis

The power of Large Language Models (LLMs) stems from their training on unimaginably vast datasets and their ongoing learning from interactions. When you engage with an AI, your prompts, queries, and the very essence of your creative or intellectual explorations can become part of this intricate learning process. This isn't inherently malicious; it's how these systems refine their understanding, improve their responses, and become more useful. However, the nuanced implications for individual privacy and content integrity are significant and warrant careful consideration.

Beyond simple data grabs: The complexities of AI learning

It's easy to frame the data issue as a straightforward transaction where users provide data and AI companies consume it. The reality is more intricate. AI models don't 'remember' conversations in a human sense, but patterns, styles, and information from user inputs contribute to the statistical tapestry that allows them to generate coherent and contextually relevant outputs. The concern isn't just about a specific piece of data being regurgitated, but how cumulative inputs shape the AI's knowledge base and capabilities.

This raises several points:

The spectrum of sensitivity: Not all information shared with an AI carries the same privacy weight. Brainstorming novel plots with an AI has different implications than summarising a confidential internal memo or discussing sensitive personal health details. Users must develop a discerning approach, evaluating the potential risks based on the nature of the data they are inputting. A casual query is not the same as proprietary code or a deeply personal secret.

The 'Black Box' and unforeseen connections: While AI developers strive to prevent direct regurgitation of training data, the internal workings of complex LLMs can be opaque, often referred to as a 'black box.' It can be challenging to trace precisely how specific inputs influence future outputs or to guarantee that anonymised data snippets might not, in aggregate or combined with other information, inadvertently reveal sensitive patterns or connections.

Content co-creation and murky ownership: When you use an AI to help generate text, code, or images, who owns the output? Terms of service vary significantly. Some platforms grant users broad rights to the generated content, while others might retain certain rights, particularly concerning the use of that content to further train their models. The line between user input and AI-generated material can blur, creating a complex scenario for intellectual property, especially for creative professionals. Is the AI a tool, like a word processor, or a collaborator with its own derived standing? The legal and ethical frameworks are still catching up.

The implicit value exchange: Many AI tools are offered for free or at low cost. The implicit understanding is often that user interactions contribute to the model's improvement, forming a non-monetary exchange. The nuance lies in ensuring this exchange is transparent and that users have genuine agency in deciding how, or if, their data participates in this value loop, especially when highly sensitive or commercially valuable information is involved.

Strategies for a more considered AI engagement

Navigating this landscape requires more than just basic security hygiene; it demands a mindful and informed approach, which we will discuss one by one.

Embrace informed discretion: This remains paramount. Before inputting any information, pause and consider its sensitivity. If the potential exposure of that data would be detrimental, seek alternative methods or use AI with extreme caution, perhaps by heavily anonymising the input. Not every task is suitable for every AI.

Deep dive into terms and evolving policies: The task is daunting, but scrutinising terms of service and privacy policies is necessary, particularly the sections on data usage, content ownership, and opt-out provisions. Crucially, understand that these policies can and do change. What is an 'opt-out of training' feature today could be modified tomorrow. Ongoing awareness is key.

Leverage platform controls (and understand their limits): Many AI providers, like OpenAI, are introducing more granular controls, such as options to prevent chat history from being used for training or to request data deletion. Utilise these features, but also understand their scope. Opting out of training for future models might not erase data already processed or data used for trust and safety monitoring.

The anonymisation imperative, when possible: For tasks involving sensitive information where AI assistance is still desired, robust anonymisation or pseudonymisation is critical. This involves more than just changing names; it means removing any data points that, even indirectly, could lead back to an individual or confidential source. This can be challenging and requires careful thought.
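To make this concrete, a lightweight redaction pass can strip the most obvious identifiers from a prompt before it ever leaves your machine. The sketch below is a minimal, illustrative Python example; the regex patterns are assumptions for demonstration, and real-world PII detection is considerably harder, often warranting dedicated tooling.

```python
import re

# Hypothetical patterns for two common identifier types. Real data
# will need more (names, addresses, IDs), and regexes alone will
# miss indirect identifiers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace each match with a placeholder tag before the text is
    sent to any external AI service."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Summarise this memo from jane.doe@example.com (+44 20 7946 0991)."
print(redact(prompt))
```

Note that this only scrubs surface-level identifiers; the surrounding context of a memo can still reveal who wrote it, which is exactly why the paragraph above calls anonymisation challenging.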

Differentiate between consumer and enterprise solutions: For business contexts, enterprise-grade AI solutions often come with more stringent data privacy commitments, dedicated instances, and contractual safeguards that are typically absent in free, consumer-facing versions. These can be a worthwhile investment if dealing with proprietary or regulated data.

Consider the source and architecture: Not all AIs are created equal. Some newer models or platforms might be designed with privacy as a core tenet, perhaps running locally on your device (on-device AI), which drastically reduces data transmission to external servers. Open-source models, while requiring technical expertise, can offer greater transparency into their operations.

Advocate for and support privacy-preserving technologies: The development of techniques like federated learning (where models are trained on decentralised data without the raw data leaving the user's device) and differential privacy (adding statistical 'noise' to data to protect individual records) is crucial. Supporting and choosing services that invest in these methods can drive the industry towards more inherently private AI.
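To make differential privacy less abstract, here is a minimal sketch of the classic Laplace mechanism applied to a counting query, using only the Python standard library. The function name and parameters are illustrative rather than taken from any particular framework; the key idea is that the noise scale is the query's sensitivity (1 for a count, since one person changes it by at most 1) divided by the privacy budget epsilon.

```python
import math
import random

def dp_count(true_count: int, epsilon: float) -> float:
    """Release a count under epsilon-differential privacy by adding
    Laplace noise with scale = sensitivity / epsilon."""
    scale = 1.0 / epsilon
    # Sample Laplace(0, scale) via the inverse CDF of a uniform draw.
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Smaller epsilon means more noise: stronger privacy, less accuracy.
print(dp_count(1000, epsilon=0.1))
```

No single noisy release can be traced back to any individual record, which is what lets services learn aggregate patterns without learning about you specifically.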

The path forward

The journey with AI isn’t about choosing sides between innovation and privacy, but creating a symbiotic relationship where both thrive. This requires shared responsibility; users must become more digitally literate and aware of the data they share, developers must embrace privacy-by-design and transparent data practices, and policymakers must create flexible yet robust frameworks to protect rights without stifling progress. As we interact with tools like ChatGPT, recognising the nuances of data ownership and consent is crucial to shaping a future where technology serves us, not the other way around. And if you’ve made it this far without asking an AI to summarise it, congratulations, you’ve already started engaging more thoughtfully.
