ScholarNest Blogs
Unique and New Ways to Create Spark Dataframe
- By ScholarNest
- On March 19, 2023
- Tag : Apache, Apache Spark, Dataframe, Spark
Apache Spark is a powerful open-source distributed computing framework that provides efficient and scalable processing of large datasets. One of the key features of Spark is its ability to handle structured data using a data abstraction called the Spark DataFrame. Spark DataFrames are similar to tables in a relational database, ...
Continue Reading
Know about Dimensional Data Modelling for Big Data
- By ScholarNest
- On February 23, 2023
- Tag : Data, Dimensional Data Modelling
Dimensional data modelling has been a popular and effective approach for designing data warehouses and enabling business intelligence and analytics for many years. However, with the rise of big data, traditional dimensional modelling techniques face new challenges and limitations. Big data is characterized by its sheer volume, velocity, and variety...
Continue Reading
How to Remove Duplicate Rows in a Spark Data Frame
- By ScholarNest
- On February 6, 2023
- Tag : Apache, Apache Spark, pyspark, Spark Dataframe
The Apache Spark DataFrame API lets you read data from various sources and create a Spark DataFrame from the source data. However, your Spark DataFrame may contain duplicate rows, and they can show up for various reasons. Your ETL tool that moves data from one place to another ...
Continue Reading