LLM Productivity

While Fleshing Out LLM Risk Measurement

A Layered Risk Discovery Process with Conditional Questioning

Sssshhhh.

I’m late finishing up an article (called the CIA Pyramid, it’ll be good!) and I feel bad because Bryan is, as usual, so patient.

But I can’t get to it.  This week, I’m giving a talk on AI, and so I decided to brush up on Copilot.

And, the best way for me to learn something is to just use it.

Making the “kill-two-birds-with-one-stone” even more valuable: the talk on AI doesn’t have enough time available to do the vendor management part of IT justice. I really wanted to go into the questions community bankers should ask their vendors, but to cover the promise of the talk description, I simply can’t get into that level of detail.

Making matters worse is the lack of guidance on AI, not only from the FFIEC but from the technology community in general.  So while we’re all relieved to find documents like the NIST AI Risk Management Framework or the OWASP AI Governance Checklist, they are still in their early iterations, especially as they address vendor management.

So, I decided . . . why not use Copilot to help me generate a list of questions to ask vendors related to AI, use that as a first iteration of a boilerplate our Clients can use in their own vendor management programs, and write an article about it all at the same time?

And, time it.  And compare this time to what it would normally take for me to write an article or create a boilerplate.

My point:  how much time will it take to write an article – no, actually, to write the article and create the deliverable FOR the article (an LLM vendor management checklist) – if I write that article about LLM risk using an LLM?

(Kind of like a “meta-article”?)

First step:  the old way.  I checked our last time analysis . . . the one I used to file my CPE last year, so it was circa 12/31/2024 (yes, I was working last minute).  Between April Fool’s Day and the end of the year, I posted an article a week, so the timing worked out well.  In that time, I averaged an hour and fifty-five minutes per blog post.  But that includes time past my first iteration, interacting with the team (see warning below), so let’s say 80% of that is actually writing the first iteration of the article: 1.92 hours × 0.8 ≈ 1.53, or about an hour and a half per first iteration.  And that was just to write articles.  I actually put 15.9 hours into the AI Policy Boilerplate, but we did three iterations, so let’s just call it 5 hours per iteration.  And that doesn’t include Adam’s time.
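For anyone who wants to check the math, here is that back-of-the-envelope arithmetic as a quick Python sketch, using only the numbers from the time analysis above:

    # Baseline "old way" estimates from the time analysis above.
    avg_post_hours = 1 + 55 / 60            # 1 hour 55 minutes per blog post
    first_iteration = avg_post_hours * 0.8  # ~80% is writing iteration #1
    print(round(first_iteration, 2))        # ~1.53 hours: about an hour and a half

    boilerplate_hours = 15.9                # total time on the AI Policy Boilerplate
    per_iteration = boilerplate_hours / 3   # spread across three iterations
    print(round(per_iteration, 1))          # ~5.3 hours: call it 5 per iteration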

My goal:  How long will it take me to develop the deliverable?  How much time will it take me to write the article?


WARNINGS AND STIPULATIONS:

This ignores the time it takes the rest of the infotex team to go through the “SSE Process” we use to get our articles ready.  Suffice it to say, proofing goes beyond spelling and grammar on a blog.  Our process includes confirming facts, bowdlerizing, working up graphics when necessary, etc.  I’m only talking about the creative time it takes for me to get to “iteration #1.”

Meanwhile, I did use ChatGPT to assist in various aspects of writing during that time.  In fact, in tongue-in-cheek fashion, I had ChatGPT summarize a “long article” in one of those articles.  (I did factor this out of my 1.5-hour estimate.)

Finally, keep in mind I’m still a novice at Copilot (let’s face it, I’m a novice at ChatGPT too!).  Our company hasn’t yet jumped through all the hoops our policy requires, and since we’re currently benefiting from ChatGPT, we decided to be a “late majority adopter” of Copilot; our architecture plan targets this for the first quarter of 2025.

And, I always struggle with the formatting part of an LLM conversation.  My prompting does great on content, but when I try to work on the formatting, I end up arguing with LLMs too much.  Thus, maybe to cheat on the timing: because I could not get Copilot to include links (without messing everything else up), I did not take the time to check the OWASP / NIST references.  Nor do I know if the list is all-inclusive.  (Bryan can have Adam handle this in the SSE phase of the blog post.)

And if Adam can’t get to that – well, as usual, I would recommend you “make this your own” and only ask questions pertinent to your unique situation.

Editor’s note:  Adam did not find time to confirm the OWASP / NIST references.


Still, if only to demonstrate how you can use ChatGPT or Copilot or other LLMs to mitigate AI risk, here’s a pretty decent starting point for questions you can ask as part of your AI vendor management approach.

I started with what I have come to call my “MetaPrompt.”  I’ve posted another very short post with my current “generic” MetaPrompt, but know that I have one for several different “categories” of inquiry.

When using ChatGPT, my MetaPrompt sets the stage so that I get responses in a manner I like.  I issued this prompt, and after some tweaks (I added FDIC IT Examination Specialist to the expert levels) and lots of back and forth, the end result was a three-step process for vetting vendors – meant to end after step one if the lack of risk warrants it, or about a third of the way through step two if the risk warrants it.
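My actual MetaPrompt is in that other post, but to give you the flavor, a generic one reads something like this (illustrative only – not my real prompt):

    “You are a panel of experts: a community bank Information Security Officer, an IT auditor, and an FDIC IT Examination Specialist.  I will ask a series of questions about [topic].  Answer from each perspective where they differ, cite the frameworks you rely on (e.g., NIST, OWASP, FFIEC), and ask me clarifying questions before answering if my request is ambiguous.”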

The human parts of this:  I decided to simplify the questions into a “discovery process” that would allow us to quit as soon as the lack of risk warrants it, or go through the entire process if the vendor is knee-deep in AI development.  (There’s a small code sketch of this gating logic after the process below.)  And I couldn’t get it to link to the OWASP or NIST frameworks it quoted.

But here’s the result:


A Vendor Management Discovery Process for LLM Risk Measurement:

Step 1: Initial Screening Questionnaire

  • Purpose: Quickly assess AI/ML/LLM/Other technology usage and identify areas where further risk assessment may be needed.
  • Questions:
    1. AI/ML/LLM/Other Technology Utilization: Does your company currently use or plan to use any AI/ML/LLM/Other technologies in your products or services? (Yes/No)
      If Yes, please explain.
      Note:  If No, does this align with your strategic plan?
    2. Notification: How will we be notified about the above-named AI plans, or any new plans or deployments, in advance of our own risk-taking?
    3. Opt-Out: Will we be able to opt out?
    4. AI Application Areas: If AI is currently used, in which specific areas or processes is it applied? (Multiple Choice and Open-ended)
      (Check all that apply)
      • ___ LLMs like ChatGPT are used by members of our staff.
      • ___ We use proprietary AI apps developed by third parties.
      • ___ We are developing AI systems in-house.
      • ___ Other (please describe):
    5. Third-Party AI Integration: Are third-party AI/ML/LLM/Other technology models or tools integrated into your environment? (Yes/No)
      If Yes, please explain.
    6. Training Risk: Is our data being used, or are there plans to use our data, to train any artificial intelligence models, tools, systems, or future applications of what would be considered artificial intelligence?  If Yes, please explain.

Step 2: Detailed Risk Assessment Questionnaire

To be used if warranted based on the Initial Screening Questionnaire, as determined by the bank’s Information Security Officer, or if a vendor plans to develop their own AI, train their own model, or use our data for any purpose related to artificial intelligence.

Editor’s Note:  While we did not confirm the links, the questions without a reference are where the infotex team “humanized” the document with questions that the LLM did not provide.

  • Purpose: Dive deeper into AI/ML/LLM/Other technology risk management practices using OWASP and NIST guidelines to identify specific risks and mitigation strategies.
  • Questions:
    1. AI Governance Policies: Do you have any documented AI governance policies? (Yes/No)
      If No, please explain.
      Reference: OWASP AI Security and Privacy Risk Framework and NIST AI RMF.
    2. AI Audits and Reviews: Has your AI/ML/LLM/Other technology usage been independently audited or reviewed? (Yes/No)
      If No, please explain.
      Reference: NIST SP 1270.
    3. Compliance Framework: What frameworks are you being audited against (e.g., OWASP, NIST)?
    4. AI Application Areas: If AI is currently used, in which specific areas or processes is it applied? (Multiple Choice and Open-ended)
      (Check all that apply)
      • ___ LLMs like ChatGPT are used by members of our staff.
      • ___ We use proprietary AI apps developed by third parties.
      • ___ We are developing AI systems in-house.
      • ___ Other (please describe):
    5. Third-Party AI Integration: Are third-party AI/ML/LLM/Other technology models or tools integrated into your environment? (Yes/No)
      If Yes, please explain.
    6. Training Risk: Is our data being used, or are there plans to use our data, to train any artificial intelligence models, tools, systems, or future applications of what would be considered artificial intelligence?  If Yes, please explain.
    7. Data Types Used by AI: What specific data types are used by your AI models? (Multiple choice: Personal Data, Financial Data, Operational Data, etc.)
      Reference: NIST AI RMF.
    8. AI Risk Management: Do you have a formal process for identifying and mitigating AI risks? (Yes/No)
      If No, please explain.
      Reference: OWASP AI Security and Privacy Risk Framework.
    9. Data Quality and Integrity: How do you ensure the quality and integrity of the data used in your AI models?
      Reference: NIST AI RMF.
    10. AI Bias Management: Describe your approach to managing AI/ML biases and ensuring fairness in decision-making processes.
      Reference: OWASP AI Security and Privacy Risk Framework.
    11. Model Monitoring and Accuracy: What controls are in place to monitor the performance and accuracy of your AI models?
      Reference: NIST SP 1270.
    12. Explainability and Transparency: How do you address the explainability and transparency of AI decisions made within your environment?
      Reference: NIST AI RMF.
    13. AI Model Security: What specific measures do you take to secure AI models and related data from cyber threats?
      Reference: OWASP Top Ten for ML Security.
    14. Isolation: Will your AIs have access to external data sources?  If Yes, please list those sources (e.g., the world wide web, xyz database, inputs from web forms, data from bank transactions, etc.).
    15. Testing:  Beyond the audit issues defined above, how do you plan to test your AI products for usability and customer satisfaction?
    16. Incident Response for AI Failures: Describe your incident response plan related to AI/ML/LLM/Other technology system failures or breaches.
      Reference: NIST Cybersecurity Framework.
    17. Documentation and Reporting: How do you document and report AI risk management activities for internal or external review?
      Reference: NIST AI RMF.
    18. Third-Party AI Vendors: Are there any external partners involved in the development or deployment of your AI systems? How are they vetted?
      Reference: Interagency Guidance on Third-Party Risk Management (TPRM).

Step 3: Follow-up Discussion

  • Purpose: Review questionnaire responses with the vendor, discuss identified risks, and agree on any necessary mitigation strategies.
  • Process:
    • Focus on areas flagged during the initial screening and detailed risk assessment.
    • Request supporting documentation where necessary.
    • Agree on next steps, including any additional assessments or control implementations.
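To make that conditional logic concrete – the “quit early when the lack of risk warrants it” gating between steps – here is a minimal sketch in Python.  The field names and escalation criteria are hypothetical placeholders, not part of any tool; your Information Security Officer’s actual criteria will differ:

    # A minimal sketch of the layered, conditional discovery flow described
    # above.  All field names here are hypothetical placeholders.

    def assess_vendor(screening: dict) -> str:
        """Decide how far into the discovery process a vendor must go."""
        # Step 1 exit: no AI use and no AI plans means we can stop here.
        if not screening["uses_or_plans_ai"]:
            return "Stop after Step 1; document alignment with the strategic plan."

        # Escalation triggers named above: in-house development, model
        # training, or any use of our data for artificial intelligence.
        high_risk = (
            screening["develops_ai_in_house"]
            or screening["trains_own_models"]
            or screening["uses_our_data_for_ai"]
        )
        if high_risk:
            return "Complete all of Step 2, then the Step 3 follow-up discussion."

        # Otherwise the ISO decides; limited risk may mean only part of
        # Step 2 (governance and audit questions) before Step 3.
        return "ISO judgment: partial Step 2, then Step 3 if anything is flagged."

    # Example: a vendor embedding a third-party LLM that never touches our data.
    print(assess_vendor({
        "uses_or_plans_ai": True,
        "develops_ai_in_house": False,
        "trains_own_models": False,
        "uses_our_data_for_ai": False,
    }))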

Finally, I asked for some pros and cons of this process, which resulted in the following:

Pros and Cons of the Conditional Layered Risk Discovery Process

Pros:

  • Efficiency: By quickly identifying whether AI-related risks need further exploration, the process saves time and resources.
  • Focused Assessment: Allows for a deep dive only where necessary, ensuring that effort is spent on the most relevant areas.
  • Compliance and Documentation: Ensures alignment with OWASP and NIST frameworks, creating a solid basis for future audits and compliance checks.

Cons:

  • Dependence on Initial Screening: If the initial screening is not thorough, there is a risk that some AI-related risks may be overlooked.
  • Complexity in Implementation: Requires well-defined criteria for deciding when to move from Step 1 to Step 2, which may need frequent adjustment based on vendor responses.

Besides the above list being a takeaway, at least as a starting point for your own vendor management program, the important takeaway of this article is, in my opinion, that I just wrote an article with a satisfactory starting point using AI in 29 minutes.  Sure, the list is not perfect or all-inclusive, but it is a start on something I’ve been meaning to do for months now.

I saved an hour.  If I write twenty articles a year, that’s twenty hours saved – half a work week.

All because I didn’t want to rush “The CIA Pyramid.”

Original article by Dan Hadaway, CRISC, CISA, CISM, Founder and Information Architect, infotex


Dan’s New Leaf – a fun blog to inspire thought in IT Governance.

To see more content like this in your inbox, sign up for our newsletter here!
