India enacted the Digital Personal Data Protection Act, 2023 (DPDPA) on August 11, 2023. The comprehensive data protection law is the culmination of a landmark Supreme Court decision recognizing a constitutional right to privacy in India and of discussions on multiple drafts spanning more than half a decade.1
The law comes at a time when, globally, there has been exponential growth in artificial intelligence applications and use cases, including consumer-facing generative AI systems. As a comprehensive data protection law, the DPDPA will significantly impact how organizations use and process personal data, which in turn affects the development and use of AI. Specifically, AI model developers and deployers will need to carefully consider the DPDPA’s regulatory scope concerning the processing of personal data, the limited grounds for processing, the rights of individuals in respect of their personal data, and the possible exemptions available to train and develop AI systems.
While the Central Government has yet to notify subordinate legislation under the DPDPA (the DPDP Rules), which will operationalize key provisions of the law, the DPDPA itself offers an early indication of how it could be applied to AI. While the new law may create challenges for AI training and development through its consent-centric regime, it also contains exemptions for publicly available data, exemptions for research, a limited territorial scope, and a risk-based approach to the classification of obligations. This overall approach is likely to significantly shape the development of AI in India.
1. DPDPA’s consent-centric regime may pose challenges for AI training and development
The DPDPA recognizes consent and ‘certain legitimate uses’ as the two grounds for processing personal data. Section 7 of the DPDPA specifies scenarios where personal data can be processed without consent. These include situations where the data principal has voluntarily provided their personal data and has not objected to its use for a specified purpose, as well as cases involving natural disasters, medical emergencies, employment-related matters, and the provision of government services and benefits.
The DPDPA thus creates a consent-centric regime for personal data processing. Notably, it does not recognize alternative legal bases to consent, such as contractual necessity and legitimate interests, that are available under other leading data protection laws internationally, such as the General Data Protection Regulation (GDPR) in the EU and Brazil’s Lei Geral de Proteção de Dados (LGPD). Previous work by FPF has identified challenges, for both organizations and individuals, in relying on consent as the primary basis for processing, especially in ensuring that it is provided meaningfully. In the context of AI development, FPF’s report on generative AI governance frameworks in the APAC region highlights the challenges of relying on consent for web crawling and scraping (though this may not be an issue under the DPDPA for publicly available data – see point 2 below). Specifically, without an established legal relationship with the individuals whose data is scraped, it is practically impossible to identify and contact them to obtain their consent.
Certain sector-specific AI applications and generative AI systems that require curated personal data to develop AI models will need to be trained on personal data that is not publicly available. In such cases, data fiduciaries (i.e., “data controllers,” or entities that determine the purposes and means of processing personal data) will likely need to rely on consent as the primary ground for processing personal data. Under the DPDPA, data fiduciaries (in this case, AI developers or deployers) must ensure that every request for consent is accompanied by a notice clearly outlining the personal data being sought, the purpose of processing, and the rights available to the data principal. Furthermore, for personal data collected before the enactment of the DPDPA, data fiduciaries are required to provide such notice to the “data principal” (i.e., the data subject, or the person whose personal data is collected or otherwise processed).
2. Exemptions for publicly available data could facilitate training AI models on scraped data, but require caution
A significant provision of the DPDPA is its exclusion of publicly available personal data entirely from the scope of regulation. Under Section 3(c)(ii), the DPDPA does not apply to personal data that is made publicly available by the data principal or by any other person legally obligated to make the data publicly available.
This blanket exemption goes further than similar provisions in other data protection laws, which typically only exempt organizations from the obligation to obtain individuals’ consent for processing personal data that is publicly available. In Singapore, for instance, Section 13 of the Personal Data Protection Act (PDPA), read with the Act’s First Schedule, exempts organizations from the requirement to obtain consent to process publicly available personal data. However, unlike under the DPDPA, the PDPA’s other data protection obligations continue to apply to the processing of publicly available data.
Similarly, Article 13 of China’s Personal Information Protection Law (PIPL), which broadly specifies the grounds for processing personal data, allows the processing of personal data without consent if the data has been disclosed by the individual concerned or has otherwise been lawfully disclosed. Such processing must remain within a reasonable scope and must balance the rights and interests of the individual against the larger public interest.
In Canada, the relevant exemption under the Personal Information Protection and Electronic Documents Act (PIPEDA) only applies to the processing of publicly available information in the circumstances specified in the Regulations Specifying Publicly Available Information, SOR/2001-7 (13 December 2000). The Canadian data protection regulator provides guidance on what may be considered publicly available.
Of note, the EU’s GDPR does not include any exemptions or tailored rules for publicly available personal data: the whole regulation, including its provisions on lawful grounds for processing, applies equally to all personal data. For instance, with regard to giving notice to data subjects, the GDPR has a dedicated article (Article 14) requiring notice to be given when personal data was not collected directly from data subjects. This obligation, however, admits an exception where “the provision of such information proves impossible or would involve a disproportionate effort, in particular for processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes”. There is ongoing debate among European regulators, with no clear answer yet, on whether publicly available personal data, particularly data obtained through scraping, can be processed lawfully without the consent of individuals under the GDPR.2
Globally, the scraping of webpages has come under increased regulatory scrutiny. In August 2023, members of the Global Privacy Assembly’s International Enforcement Cooperation Working Group issued a joint statement urging social media companies and other websites to guard against unlawful scraping of personal information from web pages. In May 2024, the European Data Protection Board’s (EDPB) ChatGPT Taskforce noted in its report that information collected and extracted automatically from webpages may contain personal data, including sensitive categories of personal data, which could “carry peculiar risks for the fundamental rights and freedoms” of individuals.
Processing of publicly available personal data would not be subject to obligations under the DPDPA to the extent that any personal data contained in the datasets was made publicly available by the data principal or by someone legally required to do so – this may include, for example, personal data from social media platforms and company directories. However, organizations will still need to incorporate appropriate safeguards to ensure that only permissible personal data is scraped and that the scraped data does not violate any other applicable laws. At the same time, questions may arise as to whether the DPDPA applies to publicly available personal data that was collected for an initial processing operation, such as training an AI model, but that is no longer publicly available after being collected.
3. Exemptions for research purposes with clear technical and ethical standards could promote AI research and development
Section 17(2)(b) of the DPDPA also exempts the processing of personal data for “research, archiving or statistical purposes” from obligations under the DPDPA. However, this exemption only applies if such processing complies with standards prescribed by the Central Government and is not done to take “any decision specific to a [d]ata [p]rincipal”. To date, the Central Government has not released any standards relating to this provision.
By contrast, data protection laws in most jurisdictions do not specifically provide an exemption for processing personal data for research purposes. Instead, they recognize research as a secondary use that does not require a distinct lawful basis from the one originally relied on, or they permit non-consensual processing for research, subject to certain conditions.
For instance, under the EU’s GDPR, secondary use of personal data for archiving, statistical, or scientific research purposes is permissible, provided that ‘appropriate safeguards’ are in place to protect the rights of the data subject. These safeguards include technical and organizational measures aimed at ensuring data minimization. Furthermore, the GDPR allows the processing of special categories of personal data when necessary for scientific or historical research purposes.
In Japan, the Act on the Protection of Personal Information (APPI) exempts organizations from consent requirements for the secondary collection and use of personal data if the data is obtained from an academic research institution and processed jointly with that institution. However, such processing must not be solely for commercial purposes and must not infringe upon the individual’s rights and interests.
In Singapore, the PDPA provides a limited additional basis for the collection, use, and disclosure of personal data for research purposes, if the organization can satisfy the following conditions: (a) the research purpose requires personally identifiable information; (b) there is a clear public benefit to the research; (c) the research results will not be used to make decisions affecting individuals; and (d) the published results do not identify individuals.
It is unclear at this stage whether the research exemption under the DPDPA will extend only to academic institutions or also to private entities that engage in research. While such an exemption could help create quality datasets for model development, it is crucial that the prescribed technical and ethical standards be clearly defined to prevent privacy harms.
4. Limited nature of DPDPA’s territorial scope may allow offshore providers of AI systems to engage in unregulated processing of personal data of data principals in India
Like many other global data protection frameworks, the DPDPA has extraterritorial applicability. Section 3(b) provides that the DPDPA applies to entities that process personal data outside India if such processing is connected to any activity related to the offering of “goods or services” to data principals in India.
This provision is narrower in scope than similar provisions under other global data protection laws. For example, the GDPR, unlike the DPDPA, also applies extraterritorially to processing that involves “the monitoring of behaviour” of data subjects within the European Union. In fact, data protection authorities in Europe have fined foreign entities for unlawfully processing the personal data of EU residents, even when those entities have no presence in the region. Of note, under the EU’s AI Act, AI systems used in high-risk use cases3 “should be considered to pose significant risks of harm to the health, safety or fundamental rights if the AI system implies profiling” as defined by the GDPR (Recital 53), thus linking “profiling” as a component of an AI system to heightened risks to the rights of individuals. Interestingly, the Personal Data Protection Bill, 2019, which was introduced in the Indian Parliament and withdrawn in 2022, and the Joint Parliamentary Committee’s version of the data protection bill also extended extraterritorial applicability to any processing that involved the “profiling of data principals within the territory of India”.
This narrower scope permits offshore providers of AI systems that do not offer goods or services to data principals in India to profile and monitor the behavior of data principals in India without being subject to any obligations under the DPDPA. Additionally, such companies may engage in unregulated scraping of publicly available data to train their AI systems, beyond the exception explored above. As highlighted in point 2, publicly available personal data that has not been made available by the data principal or by any other person under a legal obligation still falls under the DPDPA’s scope of regulation. This could include personal data shared by others on blog pages, social media websites, or in public directories, among others. These DPDPA obligations, however, do not extend to offshore organizations as long as they do not engage in activities related to offering goods or services in India.
By contrast, all other data fiduciaries processing the same types of data must ensure that the data is processed on permissible grounds and is protected by appropriate security safeguards. Additionally, for personal data collected through consent, data fiduciaries must ensure that data principals are afforded the rights to access, correct, or erase their personal data held by the fiduciary.
5. Classification of significant data fiduciaries with objective criteria would allow a balanced and risk-based approach to data protection obligations relevant to AI systems
The DPDPA adopts a risk-based approach to imposing obligations by introducing a category of data fiduciaries known as ‘Significant Data Fiduciaries’ (SDFs). The DPDPA empowers the Central Government to designate any data fiduciary or class of data fiduciaries as an SDF based on the following factors:
- The volume and sensitivity of personal data processed;
- The risk posed to the rights of data principals;
- The potential impact on the sovereignty and integrity of India;
- Risk to electoral democracy;
- Security of the state; and
- Public order.
In addition to complying with the obligations for data fiduciaries, SDFs are required to:
- appoint a resident Data Protection Officer who will serve as the primary point of contact for grievance resolution under the mandatory grievance redressal mechanism;
- designate an independent data auditor to conduct regular audits and evaluate compliance with data protection obligations; and
- carry out periodic Data Protection Impact Assessments (DPIAs).
The DPIA obligation is particularly relevant to identifying and mitigating risks to privacy and other rights that may be impacted by processing of personal data in the context of training or deploying an AI system.
The Central Government also has the power to impose additional obligations on SDFs. Conversely, it is empowered to exempt certain data fiduciaries or classes of data fiduciaries, “including startups”, from obligations relating to notice, data retention limitation, and accuracy.
It is important to note that the DPDPA does not specify objective criteria, such as the categories of personal data that may be considered sensitive, or the volume of data or users required, for the classification of SDFs or for the easing of certain obligations for data fiduciaries. In the absence of specific quantitative thresholds, the classification of AI-driven companies could be influenced by the Central Government’s perception of the potential threats posed by specific AI applications.
Conclusion
India’s AI market is growing at 25-35% annually and is projected to reach around $17 billion by 2027. The Indian government has recognized this opportunity by allocating over $1.2 billion for the IndiaAI Mission, which aims to develop domestic capabilities to boost the growth of AI in the country. As AI continues to evolve and integrate into various sectors, the DPDPA provides a crucial framework that will influence how organizations develop and deploy AI technologies in India. The law’s exemptions for publicly available data, its over-reliance on consent, and its graded approach to obligations for data fiduciaries present both opportunities and challenges.
The provisions of the DPDPA will take effect only once the government issues a notification under Section 1(2) of the DPDPA. The forthcoming DPDP Rules are expected to clarify and operationalize key aspects of the Act, including the form and manner of providing notices, breach notification procedures, how data principals can exercise their rights under the DPDPA, and the procedures and operations of the Data Protection Board. How effectively the law balances privacy protections and the prevention of harms, on the one hand, with the benefits that AI could bring to people and society, on the other, will become clearer once these rules are in place.
Edited by: Gabriela Zanfir-Fortuna, Josh Lee Kok Thong, and Dominic Paulger
1. You can refer to FPF’s previous blogs (here and here) for a brief history and overview of the DPDPA.
2. See, for instance, Report of the work undertaken by the ChatGPT Taskforce of the EDPB, May 2024, paras. 15 to 19, and the Dutch Data Protection Authority’s Guidelines on the scraping of web data.
3. As identified in Annex III of the regulation.