Sfetcu, Nicolae (2023), Big Data Processing: Unleashing the Potential of Data, IT & C, 2:4, DOI: 10.58679/IT90870, https://www.internetmobile.ro/big-data-processing-unleashing-the-potential-of-data/
In today’s digital age, the world is generating an unprecedented amount of data. This deluge of data encompasses information from various sources such as social media, IoT devices, sensors, and more. The use of big data raises significant legal problems, especially in terms of data protection. The existing legal framework of the European Union, based in particular on Directive 95/46/EC and on the General Data Protection Regulation that replaced it, provides adequate protection in general. For Big Data, however, a comprehensive and global strategy is needed.
Keywords: big data, European Union, GDPR, data processing, big data technologies, privacy policy, EU regulations
IT & C, Volume 2, Issue 4, December 2023, pp. xxx
ISSN 2821-8469, ISSN-L 2821-8469, DOI: 10.58679/IT90870
© 2023 Nicolae Sfetcu. Responsibility for the content, interpretations, and opinions expressed lies exclusively with the authors.
Big Data Processing: Unleashing the Potential of Data
 Researcher – Romanian Academy – Romanian Committee for the History and Philosophy of Science and Technology (CRIFST), History of Science Division (DIS)
In today’s digital age, the world is generating an unprecedented amount of data. This deluge of data encompasses information from various sources such as social media, IoT devices, sensors, and more. This data, often referred to as “big data,” holds immense potential for businesses, governments, and researchers. However, harnessing this potential requires effective big data processing techniques. In this essay, we will explore the concept of big data processing, its importance, and its impact on various sectors.
The use of Big Data raises significant legal problems, especially in terms of data protection. The existing legal framework of the European Union, based in particular on Directive 95/46/EC and on the General Data Protection Regulation that replaced it, provides adequate protection in general; for Big Data, however, a comprehensive and global strategy is needed. Over time, the focus has shifted from the right to exclude others, to the right to control one's own data, and, at present, to a rethinking of the right to (digital) identity.
The collection and aggregation of data in Big Data are not fully covered by data protection regulations, because of new perspectives on confidentiality, and this opens the possibility of specific forms of discrimination.
Big data is characterized by its volume, velocity, variety, and veracity. Volume refers to the vast amount of data generated daily, which is often too large to handle with traditional data processing tools. Velocity reflects the speed at which data is generated and must be processed, sometimes in real-time. Variety indicates that data can be structured or unstructured, coming in various formats like text, images, videos, and more. Veracity points to the need for accurate and reliable data in big data analysis.
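To make the velocity dimension more concrete, here is a minimal Python sketch of processing a data stream incrementally over a sliding time window, rather than storing everything and analyzing it later. The event format (timestamp and value pairs, arriving in time order) and the class name `SlidingWindowAverage` are hypothetical, chosen only for illustration; real-time systems would typically rely on a dedicated stream-processing engine.

```python
from collections import deque


class SlidingWindowAverage:
    """Running average over events from the last `window_seconds` (hypothetical helper)."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self.events = deque()  # holds (timestamp, value) pairs still inside the window
        self.total = 0.0

    def add(self, timestamp: float, value: float) -> float:
        """Ingest one event and return the average over the current window."""
        self.events.append((timestamp, value))
        self.total += value
        # Evict events that have fallen out of the window (assumes in-order timestamps).
        while self.events and self.events[0][0] <= timestamp - self.window_seconds:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


if __name__ == "__main__":
    window = SlidingWindowAverage(window_seconds=10)
    for ts, reading in [(0, 20.0), (4, 22.0), (9, 24.0), (15, 30.0)]:
        print(ts, window.add(ts, reading))
```

Each event is handled once on arrival and old events are discarded, so memory use depends on the window size rather than on the total volume of the stream.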
In 2014, the White House report on big data led by John Podesta concluded that “big data analytics have the potential to eclipse longstanding civil rights protections in how personal information is used in housing, credit, employment, health, education, and the marketplace.” It follows that new, specific ways of protecting citizens are needed: the existing legal framework, although theoretically applicable, does not seem to provide adequate and complete protection.
Purposes of data processing
Big data processing allows companies to personalize their products and services. By analyzing customer behavior and preferences, businesses can tailor their offerings to individual needs, enhancing customer satisfaction and loyalty. In the healthcare sector, big data processing has revolutionized patient care. By analyzing patient records, medical history, and real-time data from wearable devices, healthcare providers can diagnose diseases early, predict outbreaks, and improve treatment protocols. Researchers in various fields, from genomics to climate science, rely on big data processing to analyze and interpret complex datasets. This has led to breakthroughs in understanding diseases, predicting natural disasters, and advancing our knowledge of the world. Also, big data processing plays a pivotal role in building smart cities. By collecting and analyzing data from sensors and IoT devices, urban planners can optimize traffic flow, reduce energy consumption, and enhance public safety.
Anonymized and aggregated data can be processed to identify the behavior of certain categories of consumers. For this purpose, the data controller first anonymizes the data and then transfers them to a third party that uses them.
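A minimal Python sketch of the anonymize-then-aggregate flow, under simplifying assumptions: records are dictionaries with hypothetical fields (`name`, `email`, `age`, `city`, `category`), direct identifiers are dropped, exact ages are generalized into ten-year bands, and groups smaller than a threshold `k` are suppressed before the counts are shared. This resembles k-anonymity-style suppression and is only an illustration; whether a given output is truly anonymous depends on the re-identification risk in its context.

```python
from collections import Counter

DIRECT_IDENTIFIERS = {"name", "email"}  # hypothetical field names


def generalize_age(age: int) -> str:
    """Replace an exact age with a ten-year band, e.g. 37 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def anonymize(record: dict) -> dict:
    """Drop direct identifiers and generalize the age quasi-identifier."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["age"] = generalize_age(cleaned["age"])
    return cleaned


def aggregate(records: list[dict], k: int = 2) -> dict:
    """Count consumers per (age band, city, category); suppress groups smaller than k."""
    counts = Counter(
        (r["age"], r["city"], r["category"]) for r in map(anonymize, records)
    )
    return {group: n for group, n in counts.items() if n >= k}
```

Only the output of `aggregate` would be transferred to the third party; the raw records never leave the controller.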
Principles of data processing
Data processing is governed by principles set out in Article 5 of the GDPR, including the following:
- Purpose limitation: Data must be collected for specified, explicit purposes, of which the data subject is informed, and must not be further processed in a manner incompatible with those purposes.
- Data minimization: Only personal data that are adequate, relevant, and limited to what is necessary for the stated purposes may be collected.
- Accuracy and updating: Data must be accurate and, where necessary, kept up to date; inaccurate data must be rectified or erased. In the case of Big Data, users' right to have their personal data erased is particularly important.
- Storage limitation: Data are kept only as long as necessary for processing and are subsequently destroyed. Storage may be extended where the data are archived for purposes in the public interest, scientific or historical research, or statistical purposes.
- Integrity and confidentiality: The data controller must ensure adequate security for personal data through appropriate technical and organizational measures.
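Two of these principles, data minimization and storage limitation, translate fairly directly into code. The Python sketch below assumes a hypothetical mapping from each declared purpose to the fields it needs and to a retention period; the purposes, fields, and durations are illustrative only, since the GDPR does not prescribe specific fields or retention periods.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical purpose definitions: which fields each purpose needs and how long to keep them.
PURPOSES = {
    "order_delivery": {"fields": {"name", "address"}, "retention": timedelta(days=90)},
    "newsletter": {"fields": {"email"}, "retention": timedelta(days=365)},
}


def minimize(record: dict, purpose: str) -> dict:
    """Keep only the fields needed for the declared purpose (data minimization)."""
    allowed = PURPOSES[purpose]["fields"]
    return {k: v for k, v in record.items() if k in allowed}


def purge_expired(stored: list[dict], now: datetime) -> list[dict]:
    """Drop records whose retention period has elapsed (storage limitation).

    Each stored item is a dict with 'purpose', 'collected_at' and 'data' keys.
    """
    return [
        item
        for item in stored
        if now - item["collected_at"] < PURPOSES[item["purpose"]]["retention"]
    ]


if __name__ == "__main__":
    raw = {"name": "Ana", "address": "Str. X 1", "email": "a@x.ro", "birthdate": "1990-01-01"}
    print(minimize(raw, "order_delivery"))  # only name and address are kept
    now = datetime.now(timezone.utc)
    stored = [{"purpose": "order_delivery", "collected_at": now - timedelta(days=100), "data": {}}]
    print(purge_expired(stored, now))  # the 100-day-old delivery record is dropped
```

Tying each record to a declared purpose also supports purpose limitation: code that requests a field not listed for its purpose simply never receives it.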
Big data techniques and technologies
To handle big data effectively, various processing techniques and technologies have emerged:
- Distributed computing frameworks like Apache Hadoop and Apache Spark distribute data processing tasks across multiple nodes, enabling parallel processing and scalability.
- Cloud platforms like Amazon Web Services (AWS) and Microsoft Azure provide scalable and cost-effective solutions for storing and processing big data.
- Machine learning algorithms can analyze big data to identify patterns and make predictions, while AI technologies like natural language processing (NLP) can extract insights from unstructured data.
- Data warehousing solutions like Snowflake and Google BigQuery facilitate the storage and retrieval of large datasets for analysis.
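The map/shuffle/reduce pattern popularized by Hadoop can be illustrated without any framework. The Python sketch below simulates it on a single machine: the input is split into partitions (standing in for cluster nodes), each partition is counted independently in a map step, and the partial results are merged in a reduce step. It does not use the Hadoop or Spark APIs; in PySpark, the same idea would typically be expressed with RDD operations such as `flatMap` and `reduceByKey`.

```python
from collections import Counter
from functools import reduce


def split_into_partitions(lines: list[str], n: int) -> list[list[str]]:
    """Distribute input lines round-robin across n partitions (simulated nodes)."""
    return [lines[i::n] for i in range(n)]


def map_partition(partition: list[str]) -> Counter:
    """Map step: count words locally within one partition."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: merge partial counts from two partitions."""
    return a + b


def word_count(lines: list[str], n_partitions: int = 3) -> Counter:
    """Run the map step per partition, then fold the partial results together."""
    partials = [map_partition(p) for p in split_into_partitions(lines, n_partitions)]
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    print(word_count(["big data big insights", "data privacy", "Big data processing"]))
```

Here the map calls run sequentially; on a real cluster they would run in parallel on different nodes, which is what makes the approach scale. The result does not depend on the number of partitions.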
The General Data Protection Regulation (GDPR, Regulation (EU) 2016/679) governs data protection and privacy for individuals in the European Union and the European Economic Area. It also addresses the transfer of personal data outside the EU and EEA. The GDPR aims to simplify the regulatory environment by unifying regulation within the EU.
For controllers and processors not established in the EU, the GDPR applies to the processing of personal data in two cases (Article 3(2)): (a) offering goods or services, whether or not payment is required, to persons in the EU, or (b) monitoring their behavior as far as it takes place within the EU. The regulation can therefore extend to Internet service providers that are not established in the EU and, more generally, to large data aggregators, regardless of where they are physically located.
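The territorial scope tests can be sketched as a simple decision function. This is a deliberately simplified Python model under stated assumptions: the three boolean fields of the hypothetical `ProcessingContext` class stand in for what are, in practice, nuanced legal assessments, and Article 3(3) (application via public international law) is left out. Article 3(1) covers processing in the context of an EU establishment; Article 3(2) covers non-EU controllers that offer goods or services (irrespective of whether a payment is required) to people in the EU or monitor their behavior there.

```python
from dataclasses import dataclass


@dataclass
class ProcessingContext:
    """Simplified, hypothetical description of a processing operation."""

    established_in_eu: bool  # processing in the context of an EU establishment
    offers_to_eu_persons: bool  # offers goods or services (paid or free) to people in the EU
    monitors_behavior_in_eu: bool  # monitors behavior taking place in the EU


def gdpr_applies(ctx: ProcessingContext) -> bool:
    """Rough approximation of the territorial scope tests in Article 3(1) and 3(2)."""
    if ctx.established_in_eu:
        return True  # Art. 3(1)
    # Art. 3(2): non-EU controllers are covered when targeting or monitoring people in the EU.
    return ctx.offers_to_eu_persons or ctx.monitors_behavior_in_eu


if __name__ == "__main__":
    # A non-EU ad network that tracks EU users' browsing behavior:
    print(gdpr_applies(ProcessingContext(False, False, True)))
```

The sketch shows why physical location alone does not decide the question: a provider with no EU presence is still caught by the second or third condition.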
Stages of processing of personal data
Big data processing enables organizations to make informed decisions based on data-driven insights. By analyzing large datasets, businesses can identify trends, patterns, and correlations that may have otherwise gone unnoticed. This is crucial for making strategic decisions, optimizing operations, and staying competitive in the market.
Personal data are defined in Article 4(1) of the GDPR as “any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.” Processing, defined in Article 4(2), covers any operation performed on such data, such as collection, storage, use, or disclosure.
Big Data involves several personal data processing activities, each subject to its own specific rules:
- data collection
- data storage
- data aggregation
- data analysis and use of analysis results
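The four stages can be modeled as a pipeline in which each stage is a separate function, so that the rules specific to a stage can be attached where they apply. In this Python sketch, a consent check at collection and a simple processing log (supporting the controller's ability to demonstrate compliance) are illustrative assumptions; the field names (`consent`, `region`, `amount`) and the in-memory list standing in for a database are hypothetical.

```python
from statistics import mean

processing_log: list[str] = []  # records which stages have run, in order


def collect(raw_events: list[dict]) -> list[dict]:
    """Collection: accept only events from users who gave consent (hypothetical flag)."""
    processing_log.append("collect")
    return [e for e in raw_events if e.get("consent")]


def store(events: list[dict], storage: list[dict]) -> list[dict]:
    """Storage: persist events (an in-memory list stands in for a database)."""
    processing_log.append("store")
    storage.extend(events)
    return storage


def aggregate(storage: list[dict]) -> dict[str, list[float]]:
    """Aggregation: group purchase amounts by region."""
    processing_log.append("aggregate")
    groups: dict[str, list[float]] = {}
    for e in storage:
        groups.setdefault(e["region"], []).append(e["amount"])
    return groups


def analyze(groups: dict[str, list[float]]) -> dict[str, float]:
    """Analysis: average purchase amount per region."""
    processing_log.append("analyze")
    return {region: mean(amounts) for region, amounts in groups.items()}
```

A call such as `analyze(aggregate(store(collect(events), [])))` runs the stages in order; events without consent never reach storage.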
Privacy by design and by default
Before the GDPR, the concepts of privacy by design and privacy by default were not explicitly set out in EU legislation. The GDPR addresses them in Article 25 (“Data protection by design and by default”), and Recital 78 of the GDPR states:
“In order to be able to demonstrate compliance with this Regulation, the controller should adopt internal policies and implement measures which meet in particular the principles of data protection by design and data protection by default. Such measures could consist, inter alia, of minimizing the processing of personal data, pseudonymizing personal data as soon as possible, transparency with regard to the functions and processing of personal data, enabling the data subject to monitor the data processing, enabling the controller to create and improve security features. When developing, designing, selecting and using applications, services and products that are based on the processing of personal data or process personal data to fulfil their task, producers of the products, services and applications should be encouraged to take into account the right to data protection when developing and designing such products, services and applications and, with due regard to the state of the art, to make sure that controllers and processors are able to fulfil their data protection obligations.”
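The passage quoted above lists pseudonymizing personal data as soon as possible among possible measures. One common technique (among several) is to replace a direct identifier with a keyed hash: records belonging to the same person can still be linked for analysis, while mapping a pseudonym back to a known identifier requires the secret key, which is kept separately. The Python sketch below uses the standard `hmac`, `hashlib`, and `secrets` modules; the key handling is a placeholder assumption, and the field name `email` is hypothetical. Under the GDPR, pseudonymized data are still personal data.

```python
import hashlib
import hmac
import secrets

# Placeholder: in practice the key would come from a key-management system
# and be stored separately from the pseudonymized data.
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed SHA-256 hash (HMAC), as hex."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict, id_field: str = "email") -> dict:
    """Return a copy of the record with the identifier field replaced by its pseudonym."""
    out = dict(record)
    out[id_field] = pseudonymize(out[id_field])
    return out


if __name__ == "__main__":
    print(pseudonymize_record({"email": "ana@example.com", "amount": 10}))
```

A keyed hash is used instead of a plain hash because common identifiers such as email addresses could otherwise be recovered by hashing candidate values and comparing the results.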
The use of Big Data involves at least one paradox: Big Data makes individuals maximally transparent, yet there is no adequate transparency about how Big Data itself is used. Transparency is a fundamental issue because it affects a user's ability to decide whether to allow the disclosure of their information.
Big data processing is a transformative force that empowers organizations and individuals to unlock the value hidden within vast datasets. Its impact extends across sectors, from business and healthcare to research and smart cities. With the continued evolution of big data processing techniques and technologies, we can expect even more profound insights and innovations in the future. Embracing the power of big data is not just an option; it is a necessity for staying competitive and addressing complex challenges in the 21st century.
- European Economic and Social Committee. (2017, February 22). The ethics of Big Data: Balancing economic benefits and ethical questions of Big Data in the EU policy context. https://www.eesc.europa.eu/en/our-work/publications-other-work/publications/ethics-big-data
-  Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute.
-  Provost, F., & Fawcett, T. (2013). Data science for business: What you need to know about data mining and data-analytic thinking. O’Reilly Media.
- European Parliament. (2016). Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) (Text with EEA relevance). Official Journal of the European Union, L 119. http://data.europa.eu/eli/reg/2016/679/oj/eng
-  Cuzzocrea, A., Song, I. Y., & Davis, K. C. (2011). Analytics over large-scale multidimensional data: The big data revolution! In International Conference on Extending Database Technology (pp. 382-386). Springer.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-ShareAlike license CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0/).