Introduction to Data Engineering

Understanding DS Teams

NOTE I: Over the last few years, data science has evolved into a multidisciplinary field in which specialist roles have become increasingly important.

With this shift, companies are increasingly looking to hire “T”-shaped individuals to join their analytics teams.

  • Cross-discipline knowledge ensures that the team speaks the same language when attempting something new or solving a problem.
  • Specialization helps to resolve difficult problems that cause bottlenecks in the pipeline as fast as possible.

Analytical process

As a data science team, tasks often fall into one of the following two categories:
  • Tasks that serve the immediate needs of clients/managers.

  • The things we would like to be doing.

Deciding how to divide up the work is known as demand-leveling, and the traditional approach is to balance resources and demand based on the availability of the team.

  • Under this approach, a new task can only be assigned to a team member once that member's current task has been completed.

  • In I-shaped teams, specialization results in a silo-ing of the tasks required to solve problems.
    • Can lead to task backlogs.
  • T-shaped people effectively increase the number of available resources.
    • Affords a more adaptable response underpinned by the prioritization of tasks.

Advantages of T-shaped Teams

NOTE I: Use experts to solve bottlenecks and non-experts to provide additional support.

By having a data science team of individuals that are T-shaped, experts can offload menial tasks to non-experts that have the required basic knowledge.

  • This frees up the expert to solve the bottleneck much more quickly.
  • Even partial knowledge of how to solve a bottleneck is more valuable than having an expert work on a non-bottleneck problem.
  • The combined effort of all individuals working on tasks together accomplishes more than a silo-ed approach to task management.

So, besides efficiency, what other advantages are there in T-shaped teams?


Communication among team members is more efficient.
  • In the process of picking up a skill, you will also start to build domain-specific language. This ensures that the team understands different perspectives.
Team members stay interested in what they are doing.
  • What’s boring for one person on the team might be an exciting challenge for another member.
  • A multidisciplinary approach enables learning and a growth mentality among the team.
    • Essential to avoid the paradox of expertise.

You’re more attractive to employers.
  • We are building careers in a very dynamic sector where new roles and responsibilities emerge all the time. Cross-competencies make for an adaptable resource, a characteristic that is very valuable to employers.

Moving to a T-shaped DS Team

Note II. Unfortunately, there isn’t a universal cheat sheet…

Assess what skills and knowledge each individual already has.
  • Rate their ability within these different areas, as well as the skills that the team member wants to improve on.
  • Ensure that work is broken down into incremental outcomes across the work streams.

  • This approach ensures that tasks usually require multiple skills.
  • Example: the creation of a new indicator. This would entail writing SQL in a programmatic language (Data Scientist), compiling the software bundle (Developer) and putting the new version into production (DevOps).
  • This approach also encourages pair-coding, which greatly increases knowledge sharing.

Data science hierarchy of needs.

    What are these expert roles?

    The journey might look a bit different for each individual, but the general career path looks like this:

    Analyst (Business Analyst)

    Note III. The typical data analyst role is consulting-centric….

    • A data analyst typically works as part of an interdisciplinary team to determine and execute the organization’s goals. The analyst mostly gathers data to identify trends that help business leaders make strategic decisions.
    • Analysts usually have a strong economic or business background that is supported by their knowledge of statistics and mathematics. As opposed to Data Scientists, analysts are more generalist and tend to be more flexible in the job market.

    Knowledge of

  • Excel
  • SQL
  • Power BI
    Responsibilities

  • Pull data from databases.
  • Analyse and forecast trends.
  • Create dashboards using Power BI or Tableau.
  • Descriptive, diagnostic, predictive or prescriptive analytics.

    Data Scientist

    Note IV. A data scientist is someone who is better at statistics than any software engineer, and better at software engineering than any statistician.

    Job Description

    Although it is important for Data Scientists to have a good understanding of business processes, the majority of their work will involve solving complex problems by developing new tools, methods or procedures. While the analyst is closely involved in answering an organization’s business questions on a day-to-day basis, the Data Scientist focuses on a macro level to develop ways to meet those business needs. It is important to develop analyses in a structured way so that they can be automated and scaled if the business requires them on a regular basis.

    This is a much more specialized role and organisations that fully utilize the data scientist skillset can be hard to find.

    Knowledge of

  • R
  • Python
  • Java
  • Scala
    Responsibilities

  • Data cleaning and wrangling (80%).
  • Building APIs and ETL pipelines.
  • Statistical analysis using ML.
  • Automating processes.

    Database Administrator

    Note V. Contingency is the name of the game…

    Job Description

    A database administrator, commonly abbreviated as DBA, maintains the integrity and functioning of a database. This position entails running regular diagnostic tests to ensure data is not corrupt and combing for bugs or glitches within the system. Safely storing and backing up data in case of system failure or memory loss, and creating plans for addressing large-scale errors, are also important responsibilities of a DBA.

    It is important that DBAs work closely with SysAdmins to ensure high availability of the servers supporting clusters.

    Knowledge of

  • Linux
  • SQL
  • Java
  • Python
    Responsibilities

  • Manage backups
  • Capacity planning
  • Disaster recovery processes and procedures
  • Security
  • Index maintenance

    Data Engineer

    Note VI. Overseeing the technical part of data…

    Job Description

    Data Engineers are usually senior individuals in the organisation with extensive knowledge of data models, databases, IT infrastructure and software engineering.

    Your data engineers are responsible for building optimized data flows that can be relied on in everyday decision-making and operations. To accomplish this, data engineers need experience in building database architecture: allocating data storage, establishing rules for data flow and, most importantly, choosing the correct technology stack to run the data pipelines.

    Knowledge of

  • Linux
  • Distributed database systems
  • Python, R
  • Java
  • C++
  • Scala
  • Airflow
    SysAdmin

    Note VII. When it’s good, it’s good, but when it’s bad, hell hath no fury like a SysAdmin scorned…

    Job Description

    System administrators (SysAdmins) are benevolent creatures with endless power who make sure your computer and network remain in good working order no matter what the silly data engineers/scientists/DevOps do.

    By far the most important role if an organisation is to succeed, as the SysAdmin is responsible for keeping the system running in a secure manner no matter what the workload is. SysAdmins are also responsible for configuration management tools so that a system can be restored procedurally if it goes down.

    Knowledge of

  • Linux CLI tools (awk, sed, jq)
  • Perl/Python
  • Ansible/ Jenkins etc
    Responsibilities

  • Be ready for 2am calls. ;-)

    DevOps

    Note VIII. DevOps transforms the delivery capability of development and software teams….

    Job Description

    Development & Operations (DevOps) is a series of practices and processes that are intended to speed up and automate the developing, testing, and releasing of software, to allow for the continuous delivery of software and software updates. DevOps is the team responsible for putting models, applications, dashboards and APIs into production and orchestrating how each of the pieces of software interacts with the others and the public.

    Knowledge of

  • Linux
  • Kubernetes
  • Docker
  • Python
  • CI/CD tools
  • Ansible/ Jenkins etc
    Data engineering and its role in the data lifecycle

    The data engineering lifecycle comprises the stages that turn raw data ingredients into a useful end product, ready for consumption by analysts, data scientists, ML engineers, and others.

    We divide the data engineering lifecycle into four stages:

  • Generation
  • We cannot engineer data we do not have, which is why data generation is the first step in the data lifecycle. Data is generated by many different sources: data collection tools, web scraping, electronic devices like PCs, smartphones and sensors, and so on.

  • Ingestion
  • Moves data from multiple sources (SQL and NoSQL databases, IoT devices, websites, streaming services, etc.) to a target system to be transformed for further analysis. Data comes in various forms and can be both structured and unstructured.

  • Data transformation
  • Adjusts disparate data to the needs of end users. It involves removing errors and duplicates from data, normalizing it, and converting it into the needed format.

  • Data serving
  • Delivers transformed data to end users: a BI platform, a dashboard, or a data science team.

    Note I. There are many types of data pipelines, such as ETL and ELT, but ETL is the most commonly used.

    ETL data pipeline

  • Extract: retrieving data. At the start of the pipeline, we’re dealing with raw data from numerous sources like databases, APIs, files, etc.
  • Transform: standardizing data. Having extracted the data, scripts transform it to meet the format requirements. Data transformation significantly improves data discoverability and usability.
  • Load: saving data to a new destination. After bringing data into a usable state, engineers can load it to the destination, typically a database management system (DBMS) or data warehouse.
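
    To make the three steps concrete, here is a minimal sketch in SQL, assuming a hypothetical staging table raw_transactions that the extract step has already populated, and a hypothetical target table transactions_clean (both names are illustrative, not from the course materials):

    -- Transform + Load: clean and standardize the staged data
    -- while copying it into the target table.
    drop table if exists transactions_clean;

    create table transactions_clean as
    select distinct                                           -- remove duplicate rows
        cast(transaction_date as date) as transaction_date,   -- normalize the type
        trim(lower(sku)) as sku,                              -- standardize the format
        amount
    from
        raw_transactions
    where
        amount is not null;                                   -- drop corrupt records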
    Data engineering pipeline

    Note II. The mechanism that automates ingestion, transformation, and serving steps of the data engineering process is known as a data pipeline.

    Commonly, ETL pipelines are used for:

  • Data migration between systems or environments (from on-premises to cloud databases);
  • Data wrangling, or converting raw data into a usable format for analytics, BI, and machine learning projects;
  • Data integration from various systems and IoT devices; and
  • Copying tables from one database to another.

    Major pitfalls in building data pipelines:

  • lacking relevant metrics
  • underestimating data load
    Note III. Besides a pipeline, a data warehouse must be built to support and facilitate data science activities. Let’s see how it works.

    Introduction to data warehousing and data lakes

    Data warehouse

    DW is a central repository storing data in queryable forms. From a technical standpoint, a data warehouse is a relational database optimized for reading, aggregating, and querying large volumes of data. Traditionally, DWs only contained structured data or data that can be arranged in tables. However, modern DWs can also support unstructured data (such as images, pdf files, and audio formats).

    Relational Databases and SQL Fundamentals

    Assignment I

    Click on the button to expand, read the instructions, and download the materials.

    PostgreSQL download (Windows) and installation

    https://drive.google.com/drive/folders/1vVU17PaTe9yUfuA4SQ_7xcpe1WjmSlgI?usp=sharing

    Navigate to the folder called setup and download PostgreSQL and DBeaver.

    Install PostgreSQL

    NB: you have to remember the password you set.

    Install DBeaver

    Set up a connection to the database

    If everything went well, check that PostgreSQL is installed correctly.

    • Check via SQL Shell (psql)

    Enter your password

    If we get the same output, we are safe: PostgreSQL is installed correctly.

    • Check if we can connect via DBeaver. If so, we are ready to go.

    Uploading data from CSV to database

    Creating database

    We have two ways to create a database:

    • Creating database via SQL shell
    CREATE DATABASE training; 

    or

    • Via any graphical application used to interact with the database (e.g., DBeaver)

    After creating the database, we have to create a table with the following SQL command:

    drop table if exists property;
    
    create table property(
        property_type VARCHAR(255),
        addresslocality VARCHAR(255),
        bedrooms INT,
        bathrooms INT,
        derived_lcy DOUBLE precision not null
    );

    Importing data into a PostgreSQL table
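
    One way to do the import is psql's \copy meta-command; below is a minimal sketch, assuming the CSV has a header row and sits in the current working directory (the file name is illustrative). DBeaver's point-and-click import wizard is an alternative.

    -- Run inside psql while connected to the training database.
    -- \copy reads the file from the client machine, so no server file access is needed.
    \copy property FROM 'property.csv' WITH (FORMAT csv, HEADER true)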

    All remaining queries we run using the DBeaver SQL editor.

    Check if the data loaded correctly.

    • Viewing all data in the property table
    select
        property_type,
        addresslocality,
        bedrooms,
        bathrooms,
        derived_lcy
    from
        public.property;

    or

    select
        *
    from
        public.property;

    SQL Commands

    Data Definition Language (DDL)

    • Create a table

      In order to upload our synthetic transactions data set, we need to create the table structure into which the data must go. These structures can become quite complex, but for now we are only going to create a fact and a dimension table (transactions and customer information, respectively). We allocate our PRIMARY KEY as the date of the transaction, the id, and the customer who performed the transaction.

    create table customers(
        customer VARCHAR(20),
        gender VARCHAR(100),
        age INT,
        primary key(customer)
    );

    Note I. Keys are an important feature which can optimize looking up a transaction, ensuring performance while maintaining data integrity. We don’t cover keys in this course, but it’s something to be aware of, especially when we start learning about joins.
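
    Purely for reference (again, keys and indexes are not covered in this course), a secondary index on the column we will later join on might look like this; the index name is illustrative:

    -- Speeds up lookups and joins on the customer column.
    CREATE INDEX idx_transactions_customer ON transactions (customer);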

    The customers table above contains the demographic information associated with each customer. Now we are going to create the transactions table, which records the transactions those customers perform.

    
    create table transactions(
        transaction_date DATE,
        transaction_id VARCHAR(20),
        customer VARCHAR(20),
        sku VARCHAR(100),
        amount DECIMAL,
        primary key(transaction_date, transaction_id, customer),
        foreign key(customer) references customers(customer)
    );

    From the two CREATE TABLE statements you can see that the two tables are linked via the customer column. We will be learning JOINS near the end of today, which will join the tables together so that we can get demographic information on transactions.

    Our final table structures

    Loading data from the Google Drive folder (Transaction.csv, customer.csv)
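
    As with the property table, one minimal sketch of the load uses psql's \copy, assuming both files have header rows and sit in the current working directory. Customers are loaded first so that the foreign key on transactions is satisfied.

    \copy customers FROM 'customer.csv' WITH (FORMAT csv, HEADER true)
    \copy transactions FROM 'Transaction.csv' WITH (FORMAT csv, HEADER true)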

    Data Query Language (DQL)

    Note I. The KING of all statements: SELECT

    Previously I decided I wanted to return all the columns (*), but what if I only want to return one or two of the columns?

    SELECT customer, amount FROM transactions LIMIT 10;

    Let’s build a bigger SELECT statement (I like 3-tab indentation):

    select
        transaction_date,
        customer,
        sku,
        amount
    from
        transactions
    limit 10;

    SELECT but with filter criteria

    What happens if we only want to return transactions of a certain type?

    Well, then we can employ the WHERE clause. We are going to collect the same columns as previously, but now we will specify the WHERE criteria on the sku column, where it is equal to airtime:

    select
        transaction_date,
        customer,
        sku,
        amount
    from
        transactions
    where
        sku = 'airtime'
    limit 10;
    • Exercise

      In your notebook, write the code to bring back 100 examples where the sku is p2p and transaction_date is 2020-08-19.
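
      One possible solution, for checking your work afterwards (assuming transaction_date is stored as a DATE, as in our table definition):

    select
        transaction_date,
        customer,
        sku,
        amount
    from
        transactions
    where
        sku = 'p2p'
        and transaction_date = '2020-08-19'
    limit 100;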

    SELECT but with filter criteria and order

    In certain circumstances, it is necessary to order your data to get the correct output. For instance, if we want to get the top 10 largest amounts:

    
    select
        transaction_date,
        customer,
        sku,
        amount
    from
        transactions
    order by
        amount desc
    limit 10;

    Aggregations (Pivoting) in SQL

    Pivoting forms part of the aggregation functionality of SQL. This helps us answer questions like:

    - What is the average amount spent per gender?
    - Total value and volume per date?
    - Total volume and value disaggregated by gender and age?

    As you can see, aggregations and the GROUP BY clause get used OFTEN, so learn it well and get comfortable with it.

    Let’s illustrate a basic example before moving on to complex queries. What is the total value per date?

    select
        transaction_date,
        ROUND(SUM(amount))
    from
        transactions
    group by
        transaction_date
    order by
        transaction_date desc
    limit 10;

    What is the total value, volume and distinct customers doing p2p per day?

    select
        transaction_date,
        ROUND(SUM(amount)) value,
        COUNT(*) volume,
        COUNT(distinct customer) distinct_cust
    from
        transactions
    where
        sku = 'p2p'
    group by
        transaction_date
    order by
        transaction_date desc
    limit 10;

    Notice how I ALIAS my aggregations as {aggregation} then {name}. This will make your life a lot easier, and in some cases it is mandatory… as in joins.

    Last but not least: JOINS

    By now you are asking yourself: if we designed our database in the beautiful star schema that we talked about earlier, how do we join all the information together again? This is where JOINS come in, and there are a multitude of them. The most important ones are LEFT JOIN and INNER JOIN.

    Let’s attempt a basic join before we combine joins with aggregations. To start off, we will JOIN the customers table onto the transactions table:

    
    select
        *
    from
        transactions as trans
    left join customers as cust
        on trans.customer = cust.customer
    limit 10;
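
    For comparison, an INNER JOIN keeps only transactions that have a matching customer record, whereas the LEFT JOIN above keeps every transaction and fills the missing customer columns with NULL:

    select
        *
    from
        transactions as trans
    inner join customers as cust
        on trans.customer = cust.customer
    limit 10;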

    There are ways to optimize your joins to be extremely fast.

    • One is keys (which is why we used primary keys in our tables).

    Notice how I use the term USING and not ON. If your columns are named the same in both tables this is a much cleaner way to join.

    select
        transaction_date,
        gender,
        age,
        COUNT(*) volume,
        ROUND(SUM(amount)) value,
        COUNT(distinct customer) distinct_customers
    from
        transactions as trans
    left join customers as cust
            using(customer)
    where
        sku = 'p2p'
    group by
        transaction_date,
        gender,
        age
    limit 10;

    Data analysis using SQL

    Assignment II

    Follow the instructions and steps, and then answer the provided questions from the database you have created.

    Question 1.

      1. How many transactions are there in the transactions table?
      2. How many customers are there in the customer database?
      3. Are there the same number of customers in the transactions and customer databases? Why is it important that these figures are the same?
      4. How many years and months does this data cover?
      5. What are all the different transaction types present in the transactions data?
      6. What is the most common transaction type?
      7. Which transaction type has the highest total amount (value) transacted? What is the average transaction size for each transaction type? Explain why this is the case in the Rwandan context.
      8. Which transaction type is done by the highest number of unique customers?
      9. What percentage of customers are female?
      10. How many customers are 25 years old? How many customers are 60 years old? What does this say about financial inclusion?
      11. What is the total value transacted by customers 18 years or younger? How many distinct customers are there 18 years or younger? Provide your answer in one table.

    Question 2.

    A

    NoSQL databases

    Four V’s of Big Data (volume, velocity, variety, veracity)

    SQL vs NoSQL

    Common Characteristics of NoSQL databases

    • Non-relational

      • They don’t use the relational data model and thus don’t use the SQL language
    • Cluster friendly

      • They tend to be designed to run on a cluster
    • Open Source

    • Schema-less

      • They don’t have a fixed schema

      • They allow you to store any data in any record
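
    For example, a document store like MongoDB will happily hold differently shaped records in the same collection; a minimal sketch (the collection name and fields are illustrative):

    // Two documents with different fields coexisting in one collection.
    db.people.insertOne({ name: "Alice", age: 30 })
    db.people.insertOne({ name: "Bob", email: "bob@example.com", tags: ["admin"] })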

    NoSQL Originators

    • Google (BigTable, LevelDB)

    • LinkedIn (Voldemort)

    • Facebook (Cassandra)

    • Twitter (Hadoop HBase, FlockDB, Cassandra)

    • Netflix (SimpleDB, Hadoop HBase, Cassandra)

    Different Types of NoSQL

    • Key Value Store

      • A key that refers to a payload (actual content / data)

      • MemcacheDB, Azure Table Storage, Redis

    • Column Store

      • Column data is saved together, as opposed to row data

      • Super useful for data analytics

      • Hadoop, Cassandra, Hypertable

    • Document / XML / Object Store

      • Key (and possibly other indexes) point at a serialized object

      • DB can operate against values in document

      • MongoDB, CouchDB, RavenDB

    • Graph Store

      • Nodes are stored independently, and the relationship between nodes (edges) are stored with data

      • Neo4j

    MongoDB

    Why document-based?

    • Handles Schema Changes Well (easy development)

    • Solves Impedance Mismatch problem

    • Usually in JSON

    • Not really schema-less

      • Implicit schema to retrieve specific values

      • E.g.: I want the price of an order!
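
    A minimal sketch of that implicit schema, using a hypothetical orders collection: nothing in the database forces documents to carry a price field, but the query silently assumes they do.

    // Project just the price of order 1; this only "works" because we
    // implicitly assume every order document has a price field.
    db.orders.findOne({ _id: 1 }, { price: 1 })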

    More documentation

    MongoDB Commands

    • Creating database
    use bookdb;

    use switches to a database, or creates it if it does not exist.

    • Insert a document without an _id field example
    db.books.insertOne(
    {
        title: 'MongoDB insertOne',
        isbn: '0-7617-6154-3'
    }
    )

    Check if the data was inserted correctly.

    db.books.find()

    Output

    {
        "_id" : ObjectId("64855cf6acadb7ef84350738"),
        "itle" : "MongoDB insertOne",
        "isbn" : "0-7617-6154-3"
    }
    • Insert a document with an _id field example
    db.books.insertOne(
    {
        _id: 1,
        title: "Mastering Big Data",
        isbn: "0-9270-4986-4"
    }
    )

    Output

    {
        "_id" : ObjectId("64855cf6acadb7ef84350738"),
        "itle" : "MongoDB insertOne",
        "isbn" : "0-7617-6154-3"
    }
    
    {
        "_id" : NumberInt(1),
        "title" : "Mastering Big Data",
        "isbn" : "0-9270-4986-4"
    }
    
    • Using MongoDB insertMany()
    db.books.insertMany(
    [
        { title: "NoSQL Distilled", isbn: "0-4696-7030-4", author: "sample name" },
        { title: "NoSQL in 7 Days", isbn: "0-4086-6859-8" },
        { title: "NoSQL Database", isbn: "0-2504-6932-4" }
    ]
    )
    • MongoDB findOne()
    use productdb
    db.products.insertMany([
        { "_id" : 1, "name" : "xPhone", "price" : 799, "releaseDate": ISODate("2011-05-14"), "spec" : { "ram" : 4, "screen" : 6.5, "cpu" : 2.66 },"color":["white","black"],"storage":[64,128,256]},
        { "_id" : 2, "name" : "xTablet", "price" : 899, "releaseDate": ISODate("2011-09-01") , "spec" : { "ram" : 16, "screen" : 9.5, "cpu" : 3.66 },"color":["white","black","purple"],"storage":[128,256,512]},
        { "_id" : 3, "name" : "SmartTablet", "price" : 899, "releaseDate": ISODate("2015-01-14"), "spec" : { "ram" : 12, "screen" : 9.7, "cpu" : 3.66 },"color":["blue"],"storage":[16,64,128]},
        { "_id" : 4, "name" : "SmartPad", "price" : 699, "releaseDate": ISODate("2020-05-14"),"spec" : { "ram" : 8, "screen" : 9.7, "cpu" : 1.66 },"color":["white","orange","gold","gray"],"storage":[128,256,1024]},
        { "_id" : 5, "name" : "SmartPhone", "price" : 599,"releaseDate": ISODate("2022-09-14"), "spec" : { "ram" : 4, "screen" : 5.7, "cpu" : 1.66 },"color":["white","orange","gold","gray"],"storage":[128,256]}
     ])
    db.products.findOne(
    {
        _id: 2
    }
    )
    • Using MongoDB findOne() method to select some fields
    db.collection.findOne(query, projection);

    db.products.findOne({})

    Example 1

    db.products.findOne(
        { _id: 5 },
        { name: 1, color: 1 }
    )
    • MongoDB find()
    db.collection.find(query, projection)

    use bookdb
    
    db.books.insertMany([
        { "_id" : 2, "title" : "Android in Action, Second Edition", "isbn" : "1935182722", "categories" : [ "Java" ] },
        { "_id" : 3, "title" : "Specification by Example", "isbn" : "1617290084", "categories" : [ "Software Engineering" ] },
        { "_id" : 4, "title" : "Flex 3 in Action", "isbn" : "1933988746", "categories" : [ "Internet" ] },
        { "_id" : 5, "title" : "Flex 4 in Action", "isbn" : "1935182420", "categories" : [ "Internet" ] },
        { "_id" : 6, "title" : "Collective Intelligence in Action", "isbn" : "1933988312", "categories" : [ "Internet" ] },
        { "_id" : 7, "title" : "Zend Framework in Action", "isbn" : "1933988320", "categories" : [ "Web Development" ] },
        { "_id" : 8, "title" : "Flex on Java", "isbn" : "1933988797", "categories" : [ "Internet" ] },
        { "_id" : 9, "title" : "Griffon in Action", "isbn" : "1935182234", "categories" : [ "Java" ] },
        { "_id" : 10, "title" : "OSGi in Depth", "isbn" : "193518217X", "categories" : [ "Java" ] },
        { "_id" : 11, "title" : "Flexible Rails", "isbn" : "1933988509", "categories" : [ "Web Development" ] },
        { "_id" : 13, "title" : "Hello! Flex 4", "isbn" : "1933988762", "categories" : [ "Internet" ] },
        { "_id" : 14, "title" : "Coffeehouse", "isbn" : "1884777384", "categories" : [ "Miscellaneous" ] },
        { "_id" : 15, "title" : "Team Foundation Server 2008 in Action", "isbn" : "1933988592", "categories" : [ "Microsoft .NET" ] },
        { "_id" : 16, "title" : "Brownfield Application Development in .NET", "isbn" : "1933988711", "categories" : [ "Microsoft" ] },
        { "_id" : 17, "title" : "MongoDB in Action", "isbn" : "1935182870", "categories" : [ "Next Generation Databases" ] },
        { "_id" : 18, "title" : "Distributed Application Development with PowerBuilder 6.0", "isbn" : "1884777686", "categories" : [ "PowerBuilder" ] },
        { "_id" : 19, "title" : "Jaguar Development with PowerBuilder 7", "isbn" : "1884777864", "categories" : [ "PowerBuilder", "Client-Server" ] },
        { "_id" : 20, "title" : "Taming Jaguar", "isbn" : "1884777686", "categories" : [ "PowerBuilder" ] },
        { "_id" : 21, "title" : "3D User Interfaces with Java 3D", "isbn" : "1884777902", "categories" : [ "Java", "Computer Graphics" ] },
        { "_id" : 22, "title" : "Hibernate in Action", "isbn" : "193239415X", "categories" : [ "Java" ] },
        { "_id" : 23, "title" : "Hibernate in Action (Chinese Edition)", "categories" : [ "Java" ] },
        { "_id" : 24, "title" : "Java Persistence with Hibernate", "isbn" : "1932394885", "categories" : [ "Java" ] },
        { "_id" : 25, "title" : "JSTL in Action", "isbn" : "1930110529", "categories" : [ "Internet" ] },
        { "_id" : 26, "title" : "iBATIS in Action", "isbn" : "1932394826", "categories" : [ "Web Development" ] },
        { "_id" : 27, "title" : "Designing Hard Software", "isbn" : "133046192", "categories" : [ "Object-Oriented Programming", "S" ] },
        { "_id" : 28, "title" : "Hibernate Search in Action", "isbn" : "1933988649", "categories" : [ "Java" ] },
        { "_id" : 29, "title" : "jQuery in Action", "isbn" : "1933988355", "categories" : [ "Web Development" ] },
        { "_id" : 30, "title" : "jQuery in Action, Second Edition", "isbn" : "1935182323", "categories" : [ "Java" ] }
    ]);

    Example 1

    db.books.find({ categories: 'Java'}, { title: 1,isbn: 1})

    Example 2

    db.books.find({ categories: 'Java'}, { title: 1,isbn: 1}).limit(3)
    • MongoDB projection

    Example 1

    db.products.find({price: 899})

    Example 2

    db.products.find({}, {
        name: 1,
        price: 1
    });

    Example 3

    db.products.find({}, {
        name: 1,
        price: 1,
        _id: 0
    });

    Example 4

    db.products.find({_id:1}, {
        releaseDate: 0,
        spec: 0,
        storage: 0
    })
    • Comparison query operations
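
    A minimal sketch of the standard MongoDB comparison operators ($gt, $gte, $lt, $lte, $ne, $in), run against the products collection created above:

    // Products priced strictly above 700.
    db.products.find({ price: { $gt: 700 } })

    // Products priced between 600 and 800, inclusive.
    db.products.find({ price: { $gte: 600, $lte: 800 } })

    // Products whose name is one of the listed values.
    db.products.find({ name: { $in: ["xPhone", "SmartPad"] } })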