Apache Hadoop and Hadoop Distributed File System (HDFS) - Architecture, Use Cases, and Benefits

Wojciech Gębiś - May 28, 2023

The exponential surge of data in the 21st century has triggered a radical shift in how we store, process, and leverage information. Traditional data processing systems, often built on monolithic, centralized architectures, are increasingly challenged to handle the volume, velocity, and variety of data in this era of Big Data. This has led to the rise of a new breed of technologies designed to tackle these Big Data challenges. Among these technologies, Apache Hadoop stands as a pioneer, revolutionizing the way we understand and use data.

Apache Hadoop, an open-source software framework, is designed to store and process vast amounts of data across clusters of computers. It provides a scalable and reliable infrastructure, supporting the development and execution of distributed processing of large data sets. Rooted in the principles of fault tolerance, scalability, and data locality, it has become the backbone of many organizations’ data strategies.

This article provides a comprehensive exploration of Hadoop and its distributed file system (HDFS), delving into its architecture, key components, and the broader ecosystem. It will discuss key use cases, benefits, and even the limitations of Hadoop, providing a balanced view. Furthermore, we will examine how Hadoop fits into the modern Big Data infrastructure stack and who is currently using it in the industry. Finally, we will look at Hadoop as a fully-managed service, and what that means for businesses looking to leverage this powerful tool.

Whether you’re a data scientist, a CIO seeking to understand how Hadoop can benefit your organization, or simply a technology enthusiast eager to understand the future of Big Data infrastructure, this article will serve as a comprehensive guide to understanding the world of Hadoop.

TL;DR

Apache Hadoop is a powerful, open-source framework designed for storage and processing of large datasets in a distributed computing environment. It's built around the concept of using clusters of commodity hardware, which results in significant cost savings and high fault tolerance.

Hadoop’s core components include the Hadoop Distributed File System (HDFS) for data storage and Hadoop MapReduce for data processing. The data in HDFS is divided into blocks and distributed across nodes in a cluster, allowing for parallel processing via MapReduce.

The Hadoop ecosystem encompasses various tools that expand its capabilities, such as HBase for real-time read/write access, Hive for SQL-like querying, Pig for high-level data manipulation, and many others.

Key use cases for the Hadoop framework include big data analytics, data warehousing, data lakes, and more. Industries from social media to finance and healthcare benefit from Hadoop’s ability to handle petabytes of data.

While Hadoop offers many advantages such as scalability, cost-effectiveness, and flexibility, it also has limitations, including its complexity and the lack of real-time processing.

As part of the Big Data Infrastructure Stack, Hadoop works in conjunction with other systems for data ingestion, storage, processing, analysis, orchestration, exploration, visualization, and machine learning.

Many organizations, from tech giants like Facebook and Twitter to financial firms like J.P. Morgan, use Apache Hadoop for managing and analyzing their data.

Apache Hadoop is also available as a fully-managed service, offered by cloud providers like AWS, Google Cloud, and Microsoft Azure, simplifying the process of setting up, managing, and scaling a Hadoop cluster.

Harnessing the power of Hadoop and Big Data requires the expertise of seasoned professionals. At nexocode, we offer the expertise of our skilled data engineers to develop tailored, scalable data solutions for your specific needs. Contact nexocode to unlock the potential of your data.

The Origins of Hadoop and a Little Bit of Big Data Architecture History

The origins of Hadoop trace back to the early 2000s, when Doug Cutting and Mike Cafarella were working on the open-source project, Nutch. Nutch was a web search engine designed to crawl and search billions of pages on the internet, which brought forth new challenges in handling large volumes of data that were beyond the capabilities of the existing solutions. At the same time, Google published two groundbreaking papers on their technologies, Google File System (GFS) in 2003 and MapReduce in 2004. These technologies were solving the very problems faced by the Nutch team.

Inspired by these papers, Cutting and Cafarella decided to implement similar solutions into Nutch. In 2006, they separated this part of the code and named it Hadoop, after Cutting’s son’s toy elephant. The Apache Software Foundation adopted Hadoop, and by 2008, it had become a top-level Apache project.

Enter Apache Hadoop Project

Hadoop was introduced into a landscape where data was traditionally handled by relational databases and data warehousing solutions. These systems were efficient for structured data but struggled with unstructured and semi-structured data. They also weren’t designed to handle the “3 Vs” of big data – volume, velocity, and variety – that characterize this new generation of data.

Furthermore, traditional systems required high-end hardware and were expensive to scale as data grew. In contrast, Hadoop was designed to run on clusters of commodity hardware, which was cheaper and offered greater scalability.

Hadoop’s introduction was timely and necessary. It brought a paradigm shift from “data to computation” to “computation to data,” where instead of moving large volumes of data across the network, the computation is moved to where the data resides, which is much more efficient.

Hadoop cluster divided into functional layers: distributed storage layer, distributed processing layer and APIs

Apache Hadoop is an open-source software framework that allows for the distributed processing of large datasets across clusters of computers using simple programming models. Developed by the Apache Software Foundation, Hadoop provides a scalable and reliable infrastructure for organizations to build and manage their big data applications.

Hadoop is designed to process large volumes of data by dividing the data into smaller chunks, distributing these chunks across a cluster of computers, and processing them in parallel (distributed processing). This ability to divide and conquer makes Hadoop extremely powerful for handling big data.

Apache Hadoop Architecture and Key Hadoop Modules

Apache Hadoop is built on a master-slave architecture, comprising several key modules that work together to store and process large volumes of data. Below are the primary modules:

Hadoop Distributed File System (HDFS)

HDFS is the storage unit of Hadoop and is designed to store data across a distributed environment. It follows a master-slave architecture, consisting of the following key components (a short example of interacting with HDFS through its Java API follows the list):

  • NameNode (Master Server): The NameNode manages the file system metadata, such as the directory tree of all files in the file system and the DataNodes where each block resides. It monitors the health of DataNodes, coordinates file reads/writes, and executes operations like opening, closing, and renaming files and directories.
  • DataNode (Slave Server): DataNodes are the workhorses of HDFS. They store and retrieve data blocks when they are told to (by clients or the NameNode), and they report back to the NameNode periodically with lists of blocks that they are storing.
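
To make this concrete, here is a minimal sketch (not taken from the article) of a client talking to HDFS through Hadoop’s Java FileSystem API: the NameNode serves the metadata, while the bytes themselves are streamed to and from DataNodes. The fs.defaultFS address and the file path are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsQuickstart {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed NameNode address -- adjust to your cluster.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/data/example/hello.txt");

            // Write: the NameNode allocates blocks, the client streams bytes to DataNodes.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("Hello, HDFS!".getBytes(StandardCharsets.UTF_8));
            }

            // Read: the NameNode returns block locations, data is read from DataNodes.
            try (FSDataInputStream in = fs.open(file)) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }
}
```

The same operations are available from the command line via `hdfs dfs -put` and `hdfs dfs -cat`.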

MapReduce

This is a programming model for processing large datasets in parallel. In classic Hadoop (MRv1), the MapReduce engine also follows a master-slave architecture (in Hadoop 2 and later, YARN takes over the resource management side of these roles):

  • JobTracker (Master): The JobTracker is responsible for resource management, tracking resource consumption/availability, and job life-cycle management (scheduling, running, and tracking jobs).
  • TaskTracker (Slave): TaskTrackers run tasks as directed by the JobTracker and provide task-status information to the JobTracker periodically.

A multi-node Hadoop cluster with master-slave architecture on MapReduce layer and HDFS layer

YARN (Yet Another Resource Negotiator)

YARN is the resource management layer of Hadoop, introduced in Hadoop 2; it manages resources in the cluster and schedules tasks to be executed on different cluster nodes. Its key components are listed below, followed by a short client-side sketch:

  • ResourceManager: This is the central authority that arbitrates all the available system resources and thus manages the distributed applications running on the Hadoop system.
  • NodeManager: Running on individual nodes in the Hadoop cluster, the NodeManager is the per-machine agent responsible for containers, monitoring their resource usage, and reporting it to the ResourceManager.
  • ApplicationMaster: There is an ApplicationMaster for each application running on the YARN system. It negotiates resources from the ResourceManager and works with NodeManagers to execute and monitor the tasks.
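
As a rough, hedged illustration of these roles, the YarnClient API (org.apache.hadoop.yarn.client.api) lets a client ask the ResourceManager about the NodeManagers it knows and the applications, each with its own ApplicationMaster, currently tracked by the cluster:

```java
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnClusterInfo {
    public static void main(String[] args) throws Exception {
        // YarnConfiguration picks up yarn-site.xml from the classpath.
        Configuration conf = new YarnConfiguration();

        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();

        // Ask the ResourceManager which NodeManagers are currently running.
        List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
        for (NodeReport node : nodes) {
            System.out.println(node.getNodeId() + " capacity: " + node.getCapability());
        }

        // List the applications (each driven by its own ApplicationMaster) known to YARN.
        for (ApplicationReport app : yarnClient.getApplications()) {
            System.out.println(app.getApplicationId() + " -> " + app.getYarnApplicationState());
        }

        yarnClient.stop();
    }
}
```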

Hadoop Common

These are Java libraries and utilities required by other Hadoop modules. They provide filesystem and OS-level abstractions and contain the necessary Java files and scripts required to start Hadoop.

Each of these modules in Hadoop’s architecture plays a crucial role in dealing with massive datasets. HDFS provides the mechanism to store data across multiple nodes, MapReduce provides the data processing layer, YARN helps in managing resources, and Hadoop Common includes the Java libraries and utilities needed by different Hadoop modules.

How Do HDFS and MapReduce Jobs Work Together?

Hadoop Distributed File System (HDFS) and MapReduce are intimately connected as they are the two core components of Apache Hadoop. HDFS is used for storing data, while MapReduce is used for processing that data. Here’s how they work together:

Data Storage

Data is stored in HDFS, which breaks down large data files into smaller blocks (default size of 128 MB in Hadoop 2.x and 3.x, and 64 MB in Hadoop 1.x) and stores these blocks across different nodes in the Hadoop cluster. This block-based storage allows for large scale and distributed data storage.

Data Replication

HDFS automatically replicates each data block across multiple nodes (default replication factor is 3) to ensure data is reliably stored. This replication also allows processing to be done on any node containing the required data, increasing the flexibility and speed of processing.
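
The sketch below (assumed file path; not from the article) shows how this looks from the client side: for any file, HDFS reports its block size, its replication factor, and which DataNodes hold each block’s replicas, and the replication factor of an existing file can be changed per path.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockReport {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(conf)) {
            Path file = new Path("/data/example/large-input.csv"); // illustrative path

            FileStatus status = fs.getFileStatus(file);
            System.out.println("Block size: " + status.getBlockSize()
                    + " bytes, replication: " + status.getReplication());

            // One BlockLocation per block; getHosts() lists the DataNodes holding replicas.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println("Block at offset " + block.getOffset()
                        + ", length " + block.getLength()
                        + ", replicas on " + String.join(", ", block.getHosts()));
            }

            // Raise the replication factor for this file from the default (usually 3) to 4.
            fs.setReplication(file, (short) 4);
        }
    }
}
```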

Data Processing and MapReduce Jobs

When a MapReduce job is initiated, the Map tasks are distributed by the master (the JobTracker in Hadoop 1.x, or the YARN ResourceManager working with the job’s ApplicationMaster in Hadoop 2 and later) to the worker nodes where the required data resides. This principle of data locality optimizes processing speed by reducing the need to transfer data across the network.

Here’s an overview of how the MapReduce engine works (a classic WordCount sketch follows the list):

  1. Map Task: The input data is divided into independent chunks which are processed by the Map function. The role of the Map function is to transform the input data into intermediate key-value pairs.
    • Input Reader: The Input Reader divides the input data into appropriately sized ‘splits’ (for example, the lines of a document), and the Map function works on these splits.
    • Map Function: Each split is processed by a Map function, which takes a set of input key-value pairs (usually raw data) and processes each pair to generate a set of intermediate key-value pairs.
  2. Shuffling and Sorting: This is an intermediate step where the output from the map task is taken as input. The key-value pairs from the Map function output are sorted and then grouped by their key values.
  3. Reduce Task: The Reduce function takes these sorted and grouped intermediate key-value pairs from the map output as input and combines them to achieve a smaller set of tuples.
    • Reduce Function: The Reduce function is applied for each unique key in the grouped data. It processes the values related to a unique key and generates a smaller set of key-value pairs as output.
  4. Output Writer: The final output is stored in the HDFS (Hadoop Distributed File System).
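
To ground these four steps, here is the canonical WordCount program written against the Hadoop MapReduce Java API (the standard textbook example rather than code from the article): the Mapper emits (word, 1) pairs, the framework shuffles and sorts them by key, and the Reducer sums the counts before the output is written back to HDFS.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map task: turn each input line into (word, 1) key-value pairs.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce task: after shuffle and sort, sum the counts for each unique word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on the map side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a JAR and submitted with `hadoop jar wordcount.jar WordCount /input /output`, the job reads its input splits from HDFS, runs map tasks close to the blocks they process, and writes the final word counts to the output directory in HDFS.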

Apache Hadoop MapReduce process

Apache Hadoop Ecosystem

In addition to these, the Hadoop ecosystem includes other tools to enhance its capabilities:

Apache Hive

Hive is a data warehousing tool built atop Hadoop for data summarization, analysis, and querying. It simplifies the complexity of Hadoop, allowing users to use a SQL-like language (HiveQL) to query data stored in the Hadoop ecosystem, making big data analysis more accessible. It’s primarily used for batch processing, data mining, and analytics on large-scale datasets. Read more about Apache Hive in our dedicated article.
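
As a hedged illustration of what SQL-like querying looks like in practice, a Java client can submit HiveQL to HiveServer2 over JDBC using the org.apache.hive.jdbc driver; the host, port, credentials, table, and query below are placeholder assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // Requires the hive-jdbc driver on the classpath.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // HiveServer2 typically listens on port 10000; adjust host/credentials to your cluster.
        String url = "jdbc:hive2://hiveserver2-host:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement()) {

            // HiveQL is compiled into distributed jobs (MapReduce or Tez) over data in HDFS.
            String query = "SELECT country, COUNT(*) AS orders "
                         + "FROM sales GROUP BY country ORDER BY orders DESC LIMIT 10";

            try (ResultSet rs = stmt.executeQuery(query)) {
                while (rs.next()) {
                    System.out.println(rs.getString("country") + "\t" + rs.getLong("orders"));
                }
            }
        }
    }
}
```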

HBase

HBase is an open-source, non-relational, distributed database modeled after Google’s Big Table and is written in Java. It is developed as part of Apache Software Foundation’s Hadoop project and runs on top of HDFS, providing Bigtable-like capabilities for Hadoop. HBase provides a fault-tolerant way of storing large quantities of sparse data, and it is used for real-time read/write access to Big Data.
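
A minimal, hedged sketch of such real-time reads and writes through the HBase Java client API; the table name, column family, and row key are illustrative assumptions, and the cluster connection details are expected to come from hbase-site.xml.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseReadWrite {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath (ZooKeeper quorum, etc.).
        Configuration conf = HBaseConfiguration.create();

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("user_events"))) {

            // Write a single cell: row key -> column family "metrics", qualifier "clicks".
            Put put = new Put(Bytes.toBytes("user-42"));
            put.addColumn(Bytes.toBytes("metrics"), Bytes.toBytes("clicks"), Bytes.toBytes("17"));
            table.put(put);

            // Random read by row key -- the low-latency access pattern HDFS alone does not offer.
            Result result = table.get(new Get(Bytes.toBytes("user-42")));
            byte[] clicks = result.getValue(Bytes.toBytes("metrics"), Bytes.toBytes("clicks"));
            System.out.println("clicks = " + Bytes.toString(clicks));
        }
    }
}
```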

Apache Pig

Pig is a high-level platform for creating MapReduce programs used with Hadoop. The language for this platform is called Pig Latin. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation that makes MapReduce programming high-level, similar to what SQL provides for relational database systems.

Apache Solr

Apache Solr is an open-source search platform built on Apache Lucene. It’s reliable, scalable, and fault-tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration, and more. Solr powers the search and navigation features of many of the world’s largest internet sites.

Apache Storm

Apache Storm is a real-time data processing system. It is designed to process vast amounts of data in real-time and can handle high-reliability requirements, even in the case of failures. It can be used with any programming language and is often used in real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more.

Apache ZooKeeper

Apache ZooKeeper is a centralized service for maintaining configuration information, naming, distributed synchronization, and group services. All these kinds of services are used in some form or another by distributed applications.

Apache Sqoop

Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured data stores such as relational databases. Sqoop uses MapReduce to import and export the data, providing parallel operation and fault tolerance.

Apache Tez

Apache Tez is an extensible framework for building high-performance batch and interactive data processing applications coordinated by YARN in Apache Hadoop. It improves the MapReduce paradigm by dramatically improving its speed while maintaining MapReduce’s ability to scale to petabytes of data. Important Hadoop-related projects like Hive and Pig use Apache Tez.

Each of these tools has its own unique place in the Hadoop ecosystem, and they can be used together to create robust and scalable Big Data solutions.

Hadoop Ecosystem

Apache Spark

Apache Spark is an open-source, distributed computing system used for big data processing and analytics. Spark provides an interface for programming entire clusters with implicit data parallelism and fault tolerance. It can be deployed as a standalone system, but it’s often used in conjunction with Hadoop, leveraging HDFS for data storage and YARN for cluster resource management. Compared to Hadoop’s MapReduce, Spark boasts a much faster processing engine and is capable of performing advanced analytics operations, including machine learning and graph processing. Read more about the architecture, use cases, and benefits of Spark in our dedicated article.
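
For a sense of how Spark sits on top of the same storage, here is a hedged sketch using Spark’s Java API to aggregate CSV data stored in HDFS; the HDFS path is an assumption, and in a Hadoop deployment the job would typically be submitted to YARN with spark-submit --master yarn.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkOnHdfsExample {
    public static void main(String[] args) {
        // When submitted with spark-submit --master yarn, YARN provides the executors.
        SparkSession spark = SparkSession.builder()
                .appName("spark-on-hdfs-example")
                .getOrCreate();

        // Read CSV data stored in HDFS (path is illustrative).
        Dataset<Row> sales = spark.read()
                .option("header", "true")
                .csv("hdfs://namenode:8020/data/sales/*.csv");

        // In-memory, distributed aggregation -- no intermediate writes to HDFS between stages.
        Dataset<Row> counts = sales.groupBy("country").count();
        counts.orderBy(counts.col("count").desc()).show(10);

        spark.stop();
    }
}
```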

Key Use Cases for Hadoop

Hadoop is widely used across industries due to its ability to store, process, and analyze vast amounts of data. Here are some key use cases for Hadoop:

Data Warehousing and Analytics: Hadoop can be used as a cost-effective and scalable data warehouse for data storage. It supports storing structured data as well as large amounts of unstructured data, allowing businesses to gain insights by processing data from sources such as social media, log files, and more.

Data Archiving: Hadoop is also used for archiving old data. It simplifies data management as the old data can be moved into Hadoop’s distributed file system (HDFS), where it can be stored economically and efficiently and quickly retrieved if needed.

Sentiment Analysis: Businesses use Hadoop to analyze customer sentiment and feedback from social media, reviews, surveys, and other sources to improve their products and services, and enhance customer satisfaction.

Risk Management: In the finance industry, Hadoop is used to calculate risk models and make critical business decisions. It can rapidly process vast amounts of data to identify potential risks and returns.

Personalized Marketing: Retailers and eCommerce businesses use Hadoop to analyze customer data and market trends to offer personalized recommendations and advertisements to customers.

Healthcare and Natural Sciences: In healthcare, Hadoop is used to predict disease outbreaks, improve treatments, and lower healthcare costs by analyzing patient data, disease patterns, and research data. Data scientists from various scientific fields, such as genomics, climate studies, and physics, leverage Hadoop for processing and analyzing large datasets for research.

Internet of Things (IoT): With the growth of IoT devices generating massive amounts of data, Hadoop is used to store, process, and analyze this data to gain insights and improve decision-making.

Advantages of Using Apache Hadoop

Apache Hadoop brings several advantages in dealing with big data, and it’s an essential tool in the field of data analysis and computation. Here are some key advantages:

  1. Scalability: Hadoop is designed to scale up from a single server to thousands of machines, each providing local computation and storage. As your data grows, you can easily expand your system by adding more nodes to the cluster.
  2. Cost-Effective: Hadoop offers a cost-effective storage solution for businesses’ exploding data sets. The problem of expensive data storage is tackled by distributing the data across a cluster of commodity hardware servers.
  3. Flexibility: Unlike traditional relational databases, you don’t have to preprocess data before storing it. You can store as much data as you want and decide how to use it later. This includes unstructured data like text, images, and videos.
  4. Resilience to Failure: A key advantage of using Hadoop is its fault tolerance. When data is sent to an individual node, that data is replicated to other nodes in the cluster, which means that in the event of failure, another copy is available for use.
  5. Data Locality: Hadoop works on the principle of data locality, where computation is moved to the data instead of data to the computation. This principle speeds up data processing.
  6. Open Source: Being an open-source project, the Hadoop framework is free to use, and it benefits from the collective contributions of a global community of developers who continually work on improvements and updates.
  7. Ease of Use: Despite dealing with complex data processing tasks, Apache Hadoop comes with easy-to-use tools (like Hive, Pig, etc.) that abstract the complexity of underlying tasks and offer a simplified interface for interacting with the stored data.
  8. Parallel Processing: Hadoop is designed to process data in parallel, which means large data sets can be processed quicker as data is divided and conquered by multiple machines working together.

Limitations of Apache Hadoop

Remember that while Hadoop has many advantages, it is not always the best solution for every big data problem, and other systems might be more suitable depending on a project’s specific requirements or constraints. Here are some key limitations you may want to consider:

  1. Not suited for small data: Hadoop is designed for large scale data processing. If the data volume is small, the heavy I/O operations can outweigh the benefits of parallel processing, making traditional relational databases a more suitable choice.
  2. Poor support for small files: Hadoop is not well-suited for managing many small files. Each file in HDFS is divided into blocks, and the default block size is 64 MB (in Hadoop 1) or 128 MB (in Hadoop 2 and 3). A block only occupies as much disk space as the data it holds, but every file, directory, and block is represented as an object in the NameNode’s memory, each consuming roughly 150 bytes. For example, 10 million single-block files correspond to about 20 million namespace objects, or on the order of 3 GB of NameNode heap, so a large number of small files can quickly exhaust the NameNode’s memory.
  3. Latency: Hadoop is not designed for real-time processing. Its batch-processing model works best with a large amount of data and is unsuited for tasks requiring real-time analysis.
  4. Complexity: While tools like Hive and Pig offer higher-level abstractions to make Hadoop more accessible, there’s still a steep learning curve associated with Hadoop, especially when it comes to setup, configuration, and maintenance.
  5. Data Security: Though improvements have been made in recent years, data security in Hadoop can be a challenge. Features like encryption and user authentication have been added, but they’re not as robust as in some other database systems.
  6. Lack of multi-version concurrency control (MVCC): Hadoop does not support MVCC, which allows multiple users to access data simultaneously. This can be a limitation in scenarios where multiple users need to access and modify the same data.
  7. No support for ad-hoc queries: Hadoop is a batch-processing system and doesn’t process records individually, so it’s not designed for workloads that need to do ad-hoc queries on low latency data.
  8. No real-time updates: Hadoop doesn’t provide real-time updating of data. Data can be appended to existing files, but it can’t be updated. This limits its usability for use cases that require real-time data processing.

Remember that many of these limitations are the trade-off for the advantages that Hadoop provides, like the ability to process and store large amounts of data across a distributed system. Additionally, many complementary technologies in the Hadoop ecosystem (like HBase, Storm, etc.) aim to address some of these issues.

Tabular comparison of Hadoop, Spark and Kafka with all important features and distinctions of each framework

Apache Hadoop as Part of the Big Data Infrastructure Stack

In the context of Big Data infrastructure, Apache Hadoop forms a core part of the stack due to its ability to store and process vast amounts of data in a distributed fashion. Its role is even more significant when it’s integrated with other technologies to build comprehensive data solutions.

Big data architecture with Kafka, Spark, Hadoop, and Hive

Hadoop often co-exists and interacts with a multitude of other systems in the data landscape, each playing a part in the larger scheme of data ingestion, storage, processing, and analysis:

Data Ingestion

Tools like Apache Kafka, Flume, and Sqoop are used to ingest data into the Hadoop system. These tools can handle both streaming and batch data, complementing Hadoop’s batch processing nature.

Data Storage and Processing

While HDFS acts as the primary data storage system, Hadoop integrates with NoSQL databases such as HBase and Cassandra for real-time data access and processing needs.

Data Analysis

For data analysis, Hadoop works in conjunction with tools like Apache Hive and Pig, which offer SQL-like interfaces for querying the data stored in Hadoop. Moreover, integration with Apache Spark, another open-source data computation framework, allows for advanced analytics, machine learning, and real-time processing capabilities.

Data Exploration and Visualization

Tools such as Apache Drill allow analysts to explore data stored in Hadoop using SQL-like queries. Once the data is prepared, it can be visualized using business intelligence tools like Tableau, Looker, and others, which can connect to Hadoop.

Machine Learning

Hadoop’s ability to store and process large datasets makes it a great platform for big data analytics and machine learning. Tools like Apache Mahout and MLlib (part of Spark) provide machine learning libraries that integrate well with Hadoop.

Big data architecture based on Kafka, Hadoop, Spark and other frameworks and DBs

Who is using Apache Hadoop Project?

Apache Hadoop is employed by a wide range of organizations worldwide, from tech giants to financial firms and research institutions. For instance, Facebook, an early adopter of Hadoop, used it to manage massive user-generated data and developed Hive for querying it. Yahoo! also extensively utilizes Hadoop for spam detection and content personalization applications. Twitter leverages Hadoop to analyze user behavior. Retail giants like Amazon and Alibaba use Hadoop for personalized recommendations and supply chain optimization. Financial firms like J.P. Morgan employ Hadoop for risk management and fraud detection. Institutions like the European Bioinformatics Institute (EBI) use Hadoop for large-scale data analysis in scientific research and healthcare. Thus, Hadoop’s usage is vast and varied, underlining its capabilities in handling big data.

Apache Hadoop as a Fully-Managed Service

As the volumes of data continue to grow exponentially, the need for powerful data processing frameworks like Apache Hadoop becomes increasingly essential. However, managing a Hadoop cluster can be complex, time-consuming, and demanding in terms of expertise, which has led cloud providers to offer fully-managed Hadoop services.

A fully-managed Hadoop service simplifies the process of setting up, managing, and scaling a Hadoop cluster. These services are often provided by cloud providers and include automated deployment, configuration, cluster management, and troubleshooting. They also handle data backup, recovery, and software patching tasks. This allows developers and data scientists to focus more on data analysis and less on system administration.

Examples of such services include Amazon EMR (Elastic MapReduce), Google Cloud Dataproc, and Microsoft Azure HDInsight. These platforms offer fully-managed Hadoop services, allowing users to easily process big data without having to manage the underlying infrastructure. They also provide integration with other services in their respective ecosystems, further enhancing the capabilities of Hadoop.

Conclusion

In conclusion, Apache Hadoop has revolutionized the way we handle big data. Its robust architecture and the vast ecosystem of tools make it a powerful platform for storing and processing large data sets, helping organizations to uncover valuable insights from their data. Hadoop’s ability to scale across potentially thousands of servers, its resilience to failure, and its cost-effectiveness make it an invaluable tool in today’s data-driven world.

Whether you’re considering setting up a Hadoop cluster or opting for a fully-managed service, the journey towards a successful big data solution requires skilled professionals who understand the complexities of Hadoop and its ecosystem.

At nexocode, we have a team of experienced data engineers who specialize in developing scalable data solutions tailored to your specific needs. We have the expertise to guide you through the process of implementing and managing Hadoop, helping you unlock the potential of your data. We invite you to contact our data engineering experts to learn how we can assist you in harnessing the power of Apache Hadoop and big data. Let us help you transform your data into actionable insights that drive business growth.

About the author

Wojciech Gębiś

Project Lead & DevOps Engineer

Wojciech is a seasoned engineer with experience in development and management. He has worked on many projects and in different industries, making him very knowledgeable about what it takes to succeed in the workplace by applying Agile methodologies. Wojciech has deep knowledge about DevOps principles and Machine Learning. His practices guarantee that you can reliably build and operate a scalable AI solution.
You can find Wojciech working on open source projects or reading up on new technologies that he may want to explore more deeply.
