Information security and infrastructure overview
I. Introduction
Welcome to SerenityGPT. The core service is knowledge retrieval: a human interface for your company's data that enhances accessibility and understanding. In a world inundated with information, the mission is clear: to offer interfaces that are intuitive to use while preserving the confidentiality and privacy of your corporate data.
Service overview
SerenityGPT provides knowledge retrieval services designed to grant your enterprise easy access to its crucial data. This is accomplished through a unified interface that lets you interact with data from various sources, ensuring both accessibility and security.
Privacy: a fundamental commitment
Understanding that many prospective clients operate in sectors where privacy is paramount, SerenityGPT places a premium on ensuring the confidentiality and integrity of data. Privacy is not merely about compliance; it's a fundamental commitment that underpins relationships with clients who operate in privacy-sensitive or regulated environments.
Aligned with GDPR principles
SerenityGPT aligns its practices with the principles of the General Data Protection Regulation (GDPR). The approach to privacy mirrors GDPR's core principles, offering transparency, accountability, and respect for individual data rights.
For the informed CIO
This document is intended for Chief Information Officers (CIOs) of potential client companies who are considering SerenityGPT's services. It provides a transparent overview of data management, protection, and utilization policies, offering the insight needed for informed decision-making.
For any inquiries or clarifications regarding privacy practices, SerenityGPT is readily available to engage in discussions to provide the necessary information.
II. Definitions
In the context of this privacy policy, the following terms have the meanings outlined below.
1. Knowledge retrieval
The process through which SerenityGPT’s system accesses, extracts, and presents information from the client's data repositories, allowing users to gain insights and make informed decisions efficiently.
2. Human interface
A user-friendly platform provided by SerenityGPT, designed to facilitate interaction between individuals and the company’s data in an intuitive and accessible manner.
3. On-premises deployment
Refers to the installation and operation of SerenityGPT’s services within the client's own infrastructure, ensuring that the client has complete control over the physical, technical, and administrative safeguards of the data processed.
4. Embedding process
A computational procedure where data is transformed and represented in a format that enhances the efficiency and accuracy of SerenityGPT’s knowledge retrieval system, enabling advanced search and analysis functionalities.
5. Permissions metadata
Data attributes used to control access to documents and information, ensuring that users can only access and retrieve data for which they have been granted explicit authorization. Such permissions and usage are defined during scoping exercises with clients.
6. Microsoft Azure API
An application programming interface provided by Microsoft Azure, optionally used by SerenityGPT to invoke GPT-4 to provide a concise answer based on the information contained in search results.
7. GDPR
General Data Protection Regulation, a legal framework that sets guidelines for the collection and processing of personal information from individuals who live in the European Union (EU).
8. Data subject
Any individual whose personal data is being collected, held, or processed.
9. Data controller
The entity that determines the purposes, conditions, and means of the processing of personal data.
10. Data processor
An entity that processes personal data on behalf of the data controller.
III. Data collection
1. Types of data collected
SerenityGPT collects various types of data as part of its knowledge retrieval services. The types of data collected are defined and agreed upon with the client prior to deployment and may include, but are not limited to:
- Knowledge base articles
- Support tickets
- Documentation
- Shared documents
- Email history
- Internal chat systems
The scope of the data collected is entirely determined and consented to by the client, ensuring clear boundaries and expectations are set from the inception of the service.
2. How data is collected
Data ingestion API: SerenityGPT offers an API designed for data ingestion and updating. This API is part of the on-premises deployed service for ingesting data from the client’s systems into the SerenityGPT service. The API supports modification and deletion of existing data, ensuring both the content and permissions remain up to date.
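As an illustration of the semantics described above, the following sketch models an upsert/delete life-cycle in memory. The class and method names are hypothetical, not the actual API surface, which is defined per deployment.

```python
class IngestionStore:
    """In-memory stand-in for the on-premises ingestion endpoint."""

    def __init__(self):
        self._docs = {}

    def upsert(self, doc_id, content, permissions):
        # Create or overwrite the record; permissions are replaced along
        # with the content so access rights stay in sync with the source.
        self._docs[doc_id] = {"content": content, "permissions": list(permissions)}

    def delete(self, doc_id):
        # Remove the record; later lookups return None.
        self._docs.pop(doc_id, None)

    def get(self, doc_id):
        return self._docs.get(doc_id)
```

Because content and permissions are written together, a permission change in the client's source system propagates with the next upsert rather than requiring a separate synchronization step.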
3. Permissions metadata
User identifiers: Permissions metadata includes identifiers like email addresses that delineate access and user rights within the system.
Group information: Where applicable, group information is incorporated, offering an additional layer for streamlined access and permission management.
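The user-identifier and group attributes described above can be combined into a single access decision. The sketch below assumes, illustratively, that each document carries lists of allowed user emails and allowed groups; field names are hypothetical.

```python
def can_access(user_email, user_groups, doc_meta):
    """True if the user, or any group they belong to, is authorized."""
    if user_email in doc_meta.get("allowed_users", []):
        return True
    return any(g in doc_meta.get("allowed_groups", []) for g in user_groups)
```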
4. Consent and ownership
As SerenityGPT’s system is deployed on-premises, the data is unequivocally owned by the client. Consequently, obtaining consent from end-users falls under the client's responsibility, aligning with their existing practices for data storage and processing on their servers.
5. System logs and health monitoring
On-premises logs: SerenityGPT’s service generates logs, but these are strictly stored on-premises and are not transmitted outside of the client’s network.
Health monitoring (optional): Clients have the option to enable a minimal health-check monitoring system. This system is non-invasive, providing basic binary feedback on the service's status: either "working" or "has issues." If issues are detected, SerenityGPT engineers collaborate with the client to examine the logs on the server, facilitating troubleshooting and resolution.
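The binary signal above can be thought of as a reduction over internal component checks: nothing more detailed ever leaves the client's network. A minimal sketch, with hypothetical component names:

```python
def health_status(component_checks):
    """Collapse per-component booleans into the binary status reported."""
    return "working" if all(component_checks.values()) else "has issues"
```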
IV. Data use
1. Purpose of data use
SerenityGPT utilizes the collected data primarily to facilitate and optimize the knowledge retrieval process for clients. The imported data undergoes various processes to enhance the efficiency and relevance of the search results presented to end-users.
1.1 Document processing
Upon import, documents are segmented into chunks, with each chunk undergoing an embedding process. Embeddings, along with document chunks and associated metadata (including permissions), are stored within the on-premises database.
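The import pipeline above can be sketched as follows. The character-based splitter and the toy embedding function are simplified stand-ins for the real components; the point is that each chunk is stored alongside its embedding and its permissions metadata.

```python
def chunk_text(text, size=200):
    # Split a document into fixed-size character chunks (a stand-in for
    # the real segmentation strategy).
    return [text[i:i + size] for i in range(0, len(text), size)]

def ingest_document(doc_id, text, permissions, embed, store):
    for n, piece in enumerate(chunk_text(text)):
        store.append({
            "doc_id": doc_id,
            "chunk_index": n,
            "text": piece,
            "embedding": embed(piece),   # computed on-premises
            "permissions": permissions,  # carried with every chunk
        })
```

Storing permissions with every chunk means the access check at query time never needs to consult an external system.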
1.2 Search and retrieval
When a user initiates a search query, the system conducts a multi-stage process, searching through the database and fetching documents deemed most relevant based on the calculated embeddings. This process also includes re-ranking of documents to improve the accuracy and relevance of the retrieved information.
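The two stages described above, similarity search followed by re-ranking, can be sketched as below. Both scorers are simplified stand-ins: cosine similarity for the first stage, and an arbitrary caller-supplied function for the second.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_embedding, records, rerank_score, top_k=3):
    # Stage 1: fetch candidates by embedding similarity.
    candidates = sorted(records,
                        key=lambda r: cosine(query_embedding, r["embedding"]),
                        reverse=True)[:top_k * 2]
    # Stage 2: re-rank the short list with a more expensive scorer.
    return sorted(candidates, key=rerank_score, reverse=True)[:top_k]
```

Fetching a wider candidate set in stage 1 and narrowing it in stage 2 is what lets the (typically costlier) re-ranker improve relevance without scoring the whole corpus.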
1.3 LLM extension (optional)
For clients who have enabled the Large Language Model (LLM) extension, selected portions of the retrieved documents may be sent to the Microsoft Azure API. This step allows for the generation of coherent and concise answers, providing users with not only relevant documents but also human-readable responses.
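The following sketch shows how such a request might be assembled: only the selected chunks go into the prompt, bounded by a character budget, so no more context than necessary leaves the premises. The message shape follows the common chat-completion format; the actual endpoint, model, and prompt wording are deployment details and are not shown.

```python
def build_answer_request(question, chunks, max_chars=4000):
    # Include retrieved chunks in order until the character budget is hit.
    context, used = [], 0
    for piece in chunks:
        if used + len(piece) > max_chars:
            break
        context.append(piece)
        used += len(piece)
    return [
        {"role": "system",
         "content": "Answer concisely using only the provided context."},
        {"role": "user",
         "content": "\n\n".join(context) + "\n\nQuestion: " + question},
    ]
```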
2. Data processing
Embedding process: Embeddings enable the calculation of document similarity, enhancing the precision of search results. This process is conducted locally using an on-premises model, ensuring that data never leaves the client's infrastructure during this phase.
3. Data improvement and customization
User feedback: The system allows for the collection of user feedback on search results and answers, which is used to demote mismatched documents and improve generated answers.
Click data (optional): Clients may opt to have the system collect data on user clicks within the search results. This data, while never leaving the on-premises server, assists in re-ranking documents and refining the search experience for users, leading to a more efficient and user-friendly service over time.
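One simple way click data can feed back into ranking is sketched below: click counts accumulate on the server, and documents with more clicks are promoted. Ties preserve the original ranking because Python's sort is stable. This is an illustrative stand-in for the real re-ranking signal, not the production algorithm.

```python
from collections import Counter

class ClickBooster:
    def __init__(self):
        self._clicks = Counter()

    def record_click(self, doc_id):
        # Click events stay on the on-premises server.
        self._clicks[doc_id] += 1

    def rerank(self, ranked_doc_ids):
        # Promote frequently clicked documents; stable sort keeps the
        # original order among documents with equal click counts.
        return sorted(ranked_doc_ids,
                      key=lambda d: self._clicks[d],
                      reverse=True)
```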
4. Limitations on data use
All uses of data within the SerenityGPT system align with the permissions and consent parameters set by the client and their end-users. Additionally, the data processed by SerenityGPT is solely used for the purposes outlined in this policy and agreed upon by the client, adhering strictly to privacy and confidentiality standards.
V. Data sharing and transfer
1. Internal data sharing
The primary mechanism for data retrieval within the client's infrastructure is through SerenityGPT’s user interface. Accessible via a web browser within the client’s local network, this interface is the sole portal through which data can be actively retrieved and interacted with, ensuring a secure and contained environment for data access.
2. External data sharing
Optional Microsoft Azure API sharing: With prior agreement and consent from the client, selected and relevant chunks of documents may be shared externally with Microsoft Azure API. This sharing is limited to regions pre-agreed upon with the client, providing an added layer of control and assurance over data localization and compliance. According to Microsoft Azure’s policies, the shared data is exclusively used for generating immediate responses to queries and is not utilized for any other purposes. Data sent to Azure is stored temporarily for a maximum of 30 days, solely to address service abuse claims, and is never used for model training or exported for any other application.
3. Secure data transfer
TLS encryption: SerenityGPT is committed to safeguarding data during transfer, irrespective of the direction of flow. Whether during the importation of data into the system, delivery of data to user browsers within the client’s network, or optional transmission to Microsoft Azure API, all data transfers are securely encrypted using TLS (Transport Layer Security) protocols.
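As a minimal sketch of this in-transit posture, a client-side TLS context can be configured to verify certificates and refuse anything older than TLS 1.2. The exact minimum version and cipher policy are set per deployment; this example only illustrates the principle.

```python
import ssl

def strict_tls_context():
    # create_default_context() enables certificate verification and
    # hostname checking by default.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```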
4. Data sharing preferences
Pre-agreed policies: All data sharing preferences and policies, including those pertaining to Microsoft Azure API usage and monitoring, are established and agreed upon with the client prior to system installation. These preferences serve as a guiding framework for data sharing and are strictly adhered to during the operation of the SerenityGPT service.
VI. Data security
1. On-premises data protection
SerenityGPT implements stringent security measures to safeguard collected data stored on the client's premises. The database, located on the local server, is configured to be accessible solely from within this server, creating a security perimeter that mitigates unauthorized access risks.
2. Encryption practices
At rest: Although the data is protected within the client’s secure infrastructure, additional encryption practices for data at rest can be implemented based on client preferences and requirements to enhance data security.
In transit: TLS encryption is consistently employed during data transfer to ensure the data's confidentiality and integrity are never compromised during transmission.
3. Access control and management
Engineer access: Access by engineers to the server is controlled and monitored. A jump-host mechanism is used for secure access, whereby all actions performed are logged and auditable. This practice ensures accountability and traceability of actions performed on the server.
4. Proactive security practices
Software vulnerability monitoring: SerenityGPT employs Dependabot for real-time monitoring of software dependencies. This tool automatically identifies and helps address vulnerabilities, helping keep the system patched and protected against known software risks.
5. Incident response procedure
In the unlikely event of a data breach or security incident:
5.1 Immediate identification and containment
Once an incident is identified, immediate action is taken to contain and mitigate the incident to prevent further damage.
- 5.1.1 Short-term containment: Immediately isolate affected systems to prevent further damage (e.g., disconnect network access).
- 5.1.2 Long-term containment: Implement temporary fixes to allow systems to operate while the root cause is being investigated and eradicated.
- 5.1.3 Documentation: Document all containment actions taken.
5.2 Investigation and assessment
A thorough investigation is launched to understand the incident's scope and impact, gathering all necessary information for a comprehensive assessment.
5.3 Notification
Affected clients are notified promptly, provided with details of the incident, and advised on steps to take to protect their interests. If necessary, SerenityGPT will also notify the applicable regulatory bodies, such as the Information Commissioner's Office via the Data Protection Officer.
5.4 Remediation and recovery
Measures are implemented to address the vulnerability exploited during the incident, with recovery actions initiated to restore normal service operations.
5.5 Post-incident review
After resolution, a post-incident review is conducted to understand the incident's root cause and learn from the event, with findings used to strengthen future security protocols.
VII. Data retention and deletion
1. Data retention policy
SerenityGPT operates on a dynamic data retention policy, mirroring the data life-cycle practices of clients. Data residing within the system reflects the client's active data repositories. Once data is removed or deleted from the client's systems, corresponding data within SerenityGPT’s service is also scheduled for deletion, maintaining a synchronized data environment.
2. Active deletion mechanism
To facilitate responsive data deletion, SerenityGPT provides clients access to a dedicated API. This API is designed for the active deletion of specific data records, such as support tickets and documents identified by their unique document IDs. Clients can utilize this mechanism to initiate immediate deletion requests for data that has been removed from their primary systems, ensuring timely and accurate data removal within the SerenityGPT environment.
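A deletion call keyed on document IDs might be shaped as below. The endpoint path and payload fields are hypothetical; the sketch only illustrates that deletion is ID-based and that deduplicating the payload makes repeated requests for the same ID harmless.

```python
def build_delete_request(base_url, doc_ids):
    # Deduplicate and sort IDs so the same logical request always
    # produces the same payload (idempotent from the client's side).
    return {
        "method": "DELETE",
        "url": base_url.rstrip("/") + "/documents",
        "json": {"ids": sorted(set(doc_ids))},
    }
```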
3. Data deletion procedure
Data deletion requests made through the dedicated API or as a result of data removal within client systems are processed immediately by SerenityGPT. The deletion procedure irreversibly removes the specified data records from databases, rendering the data unrecoverable and permanently eliminating access.
4. Conditions for extended retention
SerenityGPT does not impose or practice extended data retention beyond the client's data management policies. The system does not independently retain data for any additional period unless explicitly required and instructed by the client. There are no known conditions under which data retention might be involuntarily extended, as practices are designed to respect and mirror client data management preferences and instructions without deviation.