Generative artificial intelligence (GenAI) is an unequivocal game changer for CEOs and CIOs delivering data-driven solutions. This was made abundantly clear by the introduction of systems such as ChatGPT,1 DALL-E 2,2 and Gemini (formerly known as Bard)3 in 2022 and 2023.4 AI has come a long way since its inception and now encompasses a broad spectrum of capabilities, ranging from natural language processing (NLP) and computer vision to decision making and problem solving for both mundane and complex tasks. It has become a powerful tool for improving user experience, developing business processes, and delivering personalized solutions at scale. Examples include an e-commerce engine that recommends products based on each customer's past behavior and purchases, or a healthcare AI engine that supports early disease detection and recommends treatment plans based on each patient's unique needs and characteristics. Many organizations are still trying to determine their AI strategy, and the pipeline security and ethics of large language models (LLMs) must be factored into it. As organizations increasingly integrate AI into their operations, it is crucial to understand that AI is not a one-stop solution.
Effective risk management strategies must be implemented and evolved alongside AI-based solutions. Successful AI deployment requires five critical stages, sketched in the code example after this list:5
- Data collection is the process of gathering raw data from multiple sources (this is done by integrating the data sources with the target system).
- Data cleaning/preparation is the process of cleaning the data before it enters the AI pipeline (this is done by removing duplicates and excluding unsupported formats, empty cells, and invalid entries that can lead to technical issues).
- Model development is the process of building models by training them on large datasets, analyzing the patterns in the data, and making predictions without additional human intervention. An iterative model-driven development (MDD) approach is followed here.
- Model serving is the process of deploying machine learning (ML) models into the AI pipeline and integrating them into a business application. These model functions, exposed as application programming interfaces (APIs), are mostly deployed at scale and can be used to perform tasks or make predictions based on real-time or batch data.
- Model monitoring is the process of assessing the performance and efficacy of the models against live data and tracking metrics related to model quality (e.g., latency, memory, uptime, precision, and accuracy) along with data quality, model bias, prediction drift, and fairness.
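To make these stages concrete, the following minimal sketch walks a toy dataset through them using Python and scikit-learn. The file name customer_events.csv and the label column are hypothetical placeholders; a production pipeline would add orchestration, CI/CD, and far richer monitoring.

```python
# A minimal sketch of the five stages, assuming pandas and scikit-learn.
# The CSV file and "label" column are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Data collection: gather raw records from a source system.
raw = pd.read_csv("customer_events.csv")  # hypothetical export

# 2. Data cleaning/preparation: drop duplicates and empty/invalid entries.
clean = raw.drop_duplicates().dropna()

X = clean.drop(columns=["label"])
y = clean["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# 3. Model development: train a model on the prepared dataset.
model = RandomForestClassifier().fit(X_train, y_train)

# 4. Model serving: in production this function would sit behind an API.
def predict(features: pd.DataFrame):
    return model.predict(features)

# 5. Model monitoring: track quality metrics against held-out or live data.
print("accuracy:", accuracy_score(y_test, predict(X_test)))
```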
While enterprises can use GenAI solutions to expedite AI model development, GenAI also poses a high risk6 to critical proprietary and business data. Data integrity and confidentiality are crucial, and the associated risk must be considered before approving new AI initiatives. AI solutions can introduce serious malware risk and impact if the right practices are not followed or the proper safeguards are not implemented. Several types of attacks can compromise the integrity and reliability of data models:
- A data pipeline attack exploits the fact that the entire data pipeline, from data collection to model training, presents a large attack surface. Attackers can use it to obtain access, modify data, or introduce malicious inputs and cause privacy violations.
- A data poisoning attack involves inserting harmful or misleading data into training datasets to intentionally influence or manipulate model operation (see the first code sketch following this list). It can also be carried out by modifying the existing dataset or deleting a portion of it.
- A model control attack involves malware taking control of the model's decision-making process, resulting in erroneous outputs. This primarily occurs when externally accessible AI models are intentionally manipulated (e.g., taking control of an automated vehicle), which can endanger human life.
- A model evasion attack is a real-time data manipulation attack that may take the form of changing user inputs or device readings to modify the AI's responses or actions.
- A model inversion attack is a reverse engineering attack that can be used to steal proprietary AI training data or personal information by exploiting the model's output. For example, an inversion attack on a model that predicts cancer can be used to infer a patient's medical history.
- A supply chain attack involves attackers compromising third-party software components (e.g., open-source libraries or assets) included in model training, deployment, or the pipeline to insert malicious code and control the model's behavior. In one example, 1,600 Hugging Face API tokens were leaked,7 and hackers accessed the accounts of various organizations using the Hugging Face API in their model development supply chains.8
- A denial of service (DoS) attack overloads AI systems with numerous requests or inputs, resulting in performance degradation or denial of service due to downtime or resource exhaustion. Though it does not inherently result in the theft or loss of critical information, recovery can cost the victim time and money. Flooding services and crashing services are two popular methods; a simple rate-limiting guardrail is sketched in the second example following this list.
- A prompt attack uses manipulative inputs that exploit security weaknesses in the LLMs behind AI-driven solutions such as chatbots and virtual assistants to trick them into revealing confidential information. In one real-world prompt attack, a security flaw found in Bing was used to trick models into divulging confidential information.9
- Bias risk means that AI systems may produce unfair results or reinforce social prejudices, posing ethical, reputational, and legal issues. Although AI solutions have the potential to revolutionize many industries and improve lives in countless ways, these biases may severely impact people of color or users who are not well represented in the training dataset. For example, a face detection solution may not recognize nonwhite faces if those users were not included in the training set.
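To illustrate the data poisoning attack described above, this first sketch flips a fraction of training labels in a synthetic scikit-learn dataset and compares the resulting model against a clean baseline. Real attacks are typically subtler, targeting specific model behaviors rather than overall accuracy.

```python
# A minimal data poisoning sketch: flipping 30% of training labels
# measurably degrades the model. Data is synthetic via scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Clean baseline model.
clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Poison the training set by flipping a random 30% of the labels.
rng = np.random.default_rng(0)
poisoned = y_train.copy()
idx = rng.choice(len(poisoned), size=int(0.3 * len(poisoned)), replace=False)
poisoned[idx] = 1 - poisoned[idx]
poisoned_model = LogisticRegression(max_iter=1000).fit(X_train, poisoned)

print("clean accuracy:   ", accuracy_score(y_test, clean_model.predict(X_test)))
print("poisoned accuracy:", accuracy_score(y_test, poisoned_model.predict(X_test)))
```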
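A common guardrail against the flooding variant of a DoS attack is to rate-limit requests before they reach the model server. This second sketch is a minimal token-bucket limiter in plain Python; the capacity and refill rate shown are illustrative, not recommendations.

```python
# A minimal token-bucket rate limiter for a model-serving endpoint.
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True   # request may proceed to the model
        return False      # shed load instead of exhausting resources

bucket = TokenBucket(capacity=10, refill_per_sec=5)
allowed = sum(bucket.allow() for _ in range(100))
print(f"{allowed} of 100 burst requests admitted")
```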
Recommendations
There are six key recommendations to enhance the security of data models, the ML operations (MLOps) pipeline, and AI applications. These best practices provide security guardrails and monitoring of assets while complying with regulations across respective geographies:
- Practice zero trust with AI10—Access to models and data must be denied unless the user or application can prove its identity. Once identified, the user should be granted access to only the required data for a limited time, resulting in least-privilege access, rigorous authentication, and continuous monitoring. The trust-but-verify approach to AI results in models being continuously questioned and evaluated. Secrets management (vaults), identity and access management (IAM), and multifactor authentication (MFA) play a central role here.
- Design an AI bill of materials (AIBOM)—Similar to a software bill of materials (SBOM) but prepared exclusively for AI models, an AIBOM results in enhanced transparency, reproducibility, accountability, and ethical AI considerations. The AIBOM11 details the components of an AI system: its training data sources, pipelines, model development, training procedures, and operational performance. This enables governance and dependency assessment (a minimal example appears after this list).
- Implement a comprehensive data supply chain—Access to clean, comprehensive, and enriched structured and unstructured data is the critical building block for AI models. Enterprise AI pipelines and MLOps solutions supporting orchestration, continuous integration/continuous deployment (CI/CD), ecosystem integration, monitoring, and observability are needed to automate and simplify ML workflows and deployments.
- Maintain compliance with local regulations—Organizations must adhere to the AI data regulations and compliance requirements enforced in their respective regions.12 Recent regulations, such as the US's "Human-Centered Design Approach" outlined in H.R. 6580 (Algorithmic Accountability Act),13 H.R. 3230 (DEEP FAKES Accountability Act),14 and the European Union's Artificial Intelligence Act,15 are pioneering measures in the realm of digital governance.
- Engage in continuous improvement—With the continuous evolution of AI processes and models, securing the AI ecosystem is a journey. A significant effort must be made to provide frequent cybersecurity training not only to data scientists and engineers but also to the developers and operations teams building and supporting AI applications. The US Cybersecurity and Infrastructure Security Agency (CISA) recommends cybersecurity training and exercises for both federal and nonfederal employees, and such training should be mandatory on an annual basis.
- Utilize a balanced scorecard-based approach for CISOs—Chief information security officers (CISOs) are now being invited to boardroom discussions to share their cybersecurity vision and align it with business priorities. A metrics-driven balanced scorecard solution provides a comprehensive approach to protecting enterprise assets from malicious threats.16 A balanced scorecard-based cybersecurity strategy map can reduce business risk, increase productivity, enhance customer trust, and help enterprises grow without the fear of a data breach.
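As a minimal illustration of the AIBOM recommendation, the sketch below assembles the components listed above into a JSON document. There is no single mandated AIBOM schema; every field name and value here is a hypothetical placeholder.

```python
# A minimal AIBOM sketch rendered as JSON. All names/values are
# illustrative placeholders, not a standardized schema.
import json

aibom = {
    "model": {"name": "churn-classifier", "version": "1.4.2"},
    "training_data_sources": [
        {"name": "customer_events", "origin": "internal-warehouse",
         "license": "proprietary"},
    ],
    "pipeline": {"orchestrator": "example-ci", "build_id": "hypothetical"},
    "training_procedure": {"algorithm": "gradient-boosted trees",
                           "hyperparameters": {"n_estimators": 200}},
    "dependencies": [{"package": "scikit-learn", "version": "1.4.0"}],
    "operational_performance": {"accuracy": 0.91, "p95_latency_ms": 42},
}
print(json.dumps(aibom, indent=2))
```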
AI models play a critical role in delivering competitive advantage to organizations; therefore, AI process integrity and confidentiality must be maintained by securing the most important assets and formulating a multipronged approach to achieve AI security.
Conclusion
It is critical to safeguard data and assets by compartmentalizing AI operations and adopting a metrics-driven approach. A balance between harnessing AI's power and addressing its data security and ethical implications is crucial for a sustainable business solution. "Data is the new oil," and AI model development is the process of refining that data and gaining valuable insight from it. Organizations should not lose this critical asset while defining their AI pipeline, because cyberattackers continuously innovate new methods to access it. It is time to look at AI security through a new lens and implement proactive methods to stay ahead of bad actors.
Endnotes
1 OpenAI, “Introducing ChatGPT,” 30 November 2022, http://openai.com/index/chatgpt/
2 OpenAI, DALL-E 2
3 Google, Gemini
4 Reber, D.; “Six Steps Toward AI Security,” NVIDIA Blog, 25 September 2023
5 Takyar, A.; “AI Model Security: Concerns, Best Practices, Techniques and Industry Scenarios,” LeewayHertz
6 Hewlett Packard, “What is AI Security?,” 8 July 2024
7 Arghire, I.; “Major Organizations Using ‘Hugging Face’ AI Tools Put at Risk by Leaked API Tokens,” SecurityWeek, 5 December 2023
8 Ibid., http://www.securityweek.com/major-organizations-using-hugging-face-ai-tools-put-at-risk-by-leaked-api-tokens/
9 Burgess, M.; “The Security Hole at the Heart of ChatGPT and Bing,” WIRED, 25 May 2023
10 Laplante, P.; Voas, J.; “Zero Trust Artificial Intelligence?,” IEEE Computer, vol. 55, 2022, p. 10-12
11 Lim, L.; “The Essential Guide to AI Bills of Materials (AIBOMs),” Snyk
12 TechTarget, “AI regulation: What Businesses Need to Know in 2024,” 2024
13 Congress.gov, H.R.6580 - Algorithmic Accountability Act of 2022
14 Congress.gov, H.R.3230 - DEEP FAKES Accountability Act
15 European Parliament, “EU AI Act: First Regulation On Artificial Intelligence,” 6 August 2023
16 Mamgai, A.; “How CISOs Can Take Advantage of the Balanced Scorecard Method,” ISACA®, 1 February 2024
Arun Mamgai
Has more than 18 years of experience in cloud-native cybersecurity, application modernization, open-source secure supply chains, AI/machine learning (ML), and digital transformation (including balanced scorecards, data management, and digital marketing) and has worked with Fortune 1000 customers across industries. He has published many articles highlighting the use of generative AI for cybersecurity and securely developing modern cloud applications. He has been invited to speak at leading schools on topics such as digital transformation and application-level attacks in connected vehicles and has been a judge for one of the most prestigious awards in the technology sector. He has also mentored multiple start-ups and actively engages with a nonprofit institution that enables middle school girls to become future technology leaders.