📈 Emerging EdTech Innovators


| Introduction and Objective


The primary objective of this project was to explore emerging companies and trends in the EdTech industry, as featured on HolonIQ’s list. This initiative aimed to gain a comprehensive understanding of the current market landscape, categorize the startups based on their focus areas, and pinpoint potential clients, providers, partners, competitors, or emerging markets. The analysis was structured to examine these entities by geographic regions and industry verticals, ultimately providing strategic insights to understand edunext’s position within the industry and enhance its market positioning and growth.

🚨 Why the HolonIQ EdTech Ranking was Selected

The HolonIQ EdTech ranking was selected for its reputation as a comprehensive and authoritative source on the most innovative and high-potential EdTech startups globally. It provides insight into the leading companies in the sector, which makes it a suitable resource for identifying key players and emerging trends. Basing the analysis on this ranking helps edunext work from credible, up-to-date information for decision-making and strategic planning.

💫 Company Overview

Edunext is an infrastructure-as-a-service provider and software company dedicated to the Open edX platform. It empowers organizations worldwide by delivering robust, scalable, and customizable online learning solutions. Edunext’s mission is to enhance the quality of education through technology, supporting successful online learning initiatives across various sectors.


| Project Overview


The project began by compiling the 1,000 EdTech startups featured in the HolonIQ ranking, together with links to their LinkedIn profiles. A scraper was developed to extract information from LinkedIn, and additional data was gathered from the startups’ websites and saved in text files. The collected information was then passed to the ChatGPT API with specific prompts designed to build a detailed profile of each company and to answer questions aimed at classifying the startups and identifying potential business opportunities for edunext. This methodology provided a detailed understanding of each startup and supported strategic insights for edunext’s market positioning and growth.


| HolonIQ EdTech Ranking Overview


📘 Introduction to HolonIQ

HolonIQ is a global market intelligence firm specializing in the education, climate, and health sectors. Each year, it publishes the Global EdTech 1000 list, which highlights the most promising startups in the educational technology (EdTech) field at both global and regional levels. Sublists include the “Top 200 EdTech in North America,” the “Top 50 EdTech in Australia and New Zealand,” and many more for different regions worldwide.

🔍 How Does the HolonIQ Ranking Work?

Evaluation and Selection:

Evaluation Criteria:

Selection Process:

📊 Importance of the Ranking

These rankings not only highlight the most promising startups but also help connect different regions of the world and share innovations that can improve educational outcomes globally. Promising startups are those that show exceptional potential in terms of innovation, market impact, and growth trajectory, making them key players in driving the future of education. Additionally, they provide investors and other stakeholders with a clear view of emerging trends and the companies leading the change in education.

For more details on HolonIQ rankings and methodologies, you can visit their official website: HolonIQ.


| Methodology


📋 Generate the list with the 1000 companies

The project used a curated list of 1,000 EdTech startups compiled by HolonIQ. This list, provided in image format, segmented the startups by geographic region (Africa, Nordic-Baltic, South Asia, etc.). The first step was to extract the key information from each entry: the company name, LinkedIn profile URL, and company website address. The extraction was a collaborative effort that required manual work from multiple team members, supported by tools such as Gemini and GPT-3 to improve efficiency.

This is an example of the source format of the lists mentioned above.


The Global EdTech 1000 list for this case includes:

We used the 2023 HolonIQ EdTech lists for our analysis.

Once this process was done, the list looked like this:


📋 LinkedIn Scraper

Initially, there were 1,046 companies in the list. Because the list was compiled manually by several team members, some entries were duplicated; after review, only 937 were unique. Of these 937 unique companies, 117 had no LinkedIn page, so the LinkedIn scraping process was run on the remaining 820. It retrieved 707 pages, 703 of which yielded a company website; the other 113 could not be retrieved and need review.

The results are summarized in the following table:

| Category | Number of companies |
|---|---|
| Total Companies in List | 1046 |
| Total Unique Companies | 937 |
| Companies without LinkedIn in list | 117 |
| Total companies with LinkedIn | 820 |
| LinkedIn Retrieved | 707 |
| LinkedIn Not Retrieved (Needs Review) | 113 |
| LinkedIn Retrieved Webpage | 703 |
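
The counts in the table follow from simple set arithmetic (937 − 117 = 820 and 707 + 113 = 820). The sketch below shows one way such a summary could be computed. It assumes a hypothetical list of (name, LinkedIn URL) records and assumes duplicates are matched on a lower-cased, trimmed company name; the report does not state the matching rule that was actually used.

```python
def summarize(records):
    """Count total, unique, and LinkedIn-less companies.

    `records` is a list of (company_name, linkedin_url_or_None) tuples.
    The deduplication key (lower-cased, trimmed name) is an assumption.
    """
    unique = {}
    for name, url in records:
        key = name.strip().lower()
        # Keep the first occurrence, but prefer an entry that has a URL.
        if key not in unique or (unique[key] is None and url):
            unique[key] = url
    without_linkedin = sum(1 for url in unique.values() if not url)
    return {
        "total": len(records),
        "unique": len(unique),
        "without_linkedin": without_linkedin,
        "with_linkedin": len(unique) - without_linkedin,
    }


sample = [
    ("byteXL", "https://in.linkedin.com/company/bytexl"),
    ("ByteXL ", None),           # duplicate entered by another team member
    ("Adda247", "https://in.linkedin.com/company/adda247"),
    ("Example Startup", None),   # hypothetical company with no LinkedIn page
]
print(summarize(sample))
```

Applied to the real list, the same function would be expected to report the 1,046 / 937 / 117 / 820 figures above, provided the matching rule agrees with the manual review.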

The links that could not be retrieved often point to LinkedIn school pages (URLs containing /school/) rather than company pages (URLs containing /company/). Here is an example of the links from which information could not be collected:

Failed Links:

Links in company format, by contrast, worked correctly. Here are some examples:

Successful Links:

| Company Name | URL |
|---|---|
| byteXL | https://in.linkedin.com/company/bytexl |
| Toodle | https://in.linkedin.com/company/toodlerungta |
| Eupheus Learning | https://in.linkedin.com/company/eupheus-learning |
| 10 Minute School | https://bd.linkedin.com/company/10ms |
| Adda247 | https://in.linkedin.com/company/adda247 |
| Apars Classroom | https://bd.linkedin.com/company/aparsclassroom |
| EduGorilla | https://in.linkedin.com/company/edugorilla-pvt-ltd |
| Infinity Learn | https://in.linkedin.com/company/infinity-learn-by-sri-chaitanya |
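
Since the link format appears to predict whether scraping succeeds, links could be sorted by page type before the scraper runs. This is a minimal sketch, not part of the original scripts; the helper name `linkedin_page_type` and the school URL are hypothetical, and it assumes the page type is the first path segment of the URL.

```python
from urllib.parse import urlparse


def linkedin_page_type(url):
    """Return the first path segment of a LinkedIn URL, e.g. 'company' or 'school'."""
    path = urlparse(url).path.strip("/")
    return path.split("/")[0] if path else ""


urls = [
    "https://in.linkedin.com/company/bytexl",
    "https://www.linkedin.com/school/example-university/",  # hypothetical URL
]
for url in urls:
    print(linkedin_page_type(url), url)
```

Links whose type is not `company` could then be set aside for manual review instead of being sent to the scraper.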

Document Technical Information

This section explains the technical details of how we collected and processed the data. Here is an easy-to-understand breakdown:

  1. Guest Mode: The code acts like a guest browsing LinkedIn, meaning it doesn’t log in but still retrieves the necessary information.
  2. Fetching Data: It fetches web pages from LinkedIn and other company websites to get the HTML content. This content includes all the visible information about a company.
  3. Parsing Data: After getting the HTML content, the code extracts the important details like company name, address, number of employees, and description. This is done using special tools that can read and understand HTML code.
  4. Storing Data: The extracted information is then neatly organized into tables (dataframes) and saved as Excel files. This makes it easy to review and analyze the data later.
  5. Error Handling: If there are any issues while fetching the data, the code tries a few more times before giving up. This ensures that as much data as possible is collected reliably.

This approach allowed us to gather comprehensive information about each company, which was then analyzed to identify potential business opportunities for edunext.
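
The error-handling step (retry a few times, waiting longer after each failure) can be illustrated in isolation. The sketch below is a simplified, synchronous stand-in rather than the script's own function: `retry_with_backoff`, `flaky_fetch`, and the injected `sleep` parameter are hypothetical names used only for illustration, and the delay formula `backoff_factor * 2 ** attempt` is taken from the script.

```python
import time


def retry_with_backoff(fetch, url, retries=3, backoff_factor=0.5, sleep=time.sleep):
    """Call fetch(url), waiting backoff_factor * 2**attempt seconds after each failure."""
    for attempt in range(retries):
        try:
            return fetch(url)
        except ConnectionError:
            # No need to wait after the final attempt.
            if attempt < retries - 1:
                sleep(backoff_factor * (2 ** attempt))
    raise RuntimeError(f"All {retries} attempts failed for {url}")


# Hypothetical fetcher that fails twice before succeeding.
calls = {"n": 0}


def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return f"<html>{url}</html>"


delays = []  # record the waits instead of actually sleeping
result = retry_with_backoff(flaky_fetch, "https://example.com", sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the example fast to run and makes the delay schedule easy to inspect.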


import jmespath
import asyncio
import json
from typing import List, Dict
from httpx import AsyncClient, Response, RemoteProtocolError
from parsel import Selector
from loguru import logger as log
import pandas as pd
import re
from bs4 import BeautifulSoup
from urllib.parse import urlparse, parse_qs
import os
import time
import requests

# Initialize an async httpx client
client = AsyncClient(
    http2=True,
    headers={
        "Accept-Language": "en-US,en;q=0.9",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Encoding": "gzip, deflate, br",
    },
    follow_redirects=True
)

def strip_text(text):
    """Remove extra spaces while handling None values."""
    return text.strip() if text is not None else text

def get_actual_url(link):
    parsed_url = urlparse(link)
    query_params = parse_qs(parsed_url.query)
    return query_params['url'][0] if 'url' in query_params else link

def parse_company(response_text: str) -> Dict:
    """Parse company main overview page."""
    selector = Selector(response_text)
    script_data = selector.xpath("//script[@type='application/ld+json']/text()").get()
    if script_data:
        script_data = json.loads(script_data)
    else:
        script_data = {}
    script_data = jmespath.search(
        """{
        name: name,
        url: url,
        mainAddress: address,
        description: description,
        numberOfEmployees: numberOfEmployees.value,
        logo: logo
        }""",
        script_data
    ) or {}
    data = {}
    for element in selector.xpath("//div[contains(@data-test-id, 'about-us')]"):
        # strip_text() tolerates missing <dt>/<dd> text instead of raising AttributeError
        name = strip_text(element.xpath(".//dt/text()").get())
        value = strip_text(element.xpath(".//dd/text()").get())
        if name:
            data[name] = value
    addresses = []
    for element in selector.xpath("//div[contains(@id, 'address') and @id != 'address-0']"):
        address_lines = element.xpath(".//p/text()").getall()
        address = ", ".join(line.replace("\n", "").strip() for line in address_lines)
        addresses.append(address)
    affiliated_pages = []
    for element in selector.xpath("//section[@data-test-id='affiliated-pages']/div/div/ul/li"):
        # strip_text() and the `or ""` guard tolerate list items with missing nodes
        affiliated_pages.append({
            "name": strip_text(element.xpath(".//a/div/h3/text()").get()),
            "industry": strip_text(element.xpath(".//a/div/p[1]/text()").get()),
            "address": strip_text(element.xpath(".//a/div/p[2]/text()").get()),
            "linkeinUrl": (element.xpath(".//a/@href").get() or "").split("?")[0]
        })
    similar_pages = []
    for element in selector.xpath("//section[@data-test-id='similar-pages']/div/div/ul/li"):
        # Same defensive extraction as for affiliated pages
        similar_pages.append({
            "name": strip_text(element.xpath(".//a/div/h3/text()").get()),
            "industry": strip_text(element.xpath(".//a/div/p[1]/text()").get()),
            "address": strip_text(element.xpath(".//a/div/p[2]/text()").get()),
            "linkeinUrl": (element.xpath(".//a/@href").get() or "").split("?")[0]
        })

    # Additional fields from the second script
    soup = BeautifulSoup(response_text, 'html.parser')
    title_tag = soup.find('title')
    designation_tag = soup.find('h2')
    followers_tag = soup.find('meta', {"property": "og:description"})
    description_tag = soup.find('p', class_='break-words')
    website_tag = soup.find('a', attrs={'data-tracking-control-name': 'about_website'})
    website = get_actual_url(website_tag['href']) if website_tag else "Website not found"
    description_span = soup.find('h4', class_='top-card-layout__second-subline')
    description = description_span.get_text(strip=True) if description_span else "Description not found"
    
    # Crunchbase funding information
    funding_section = soup.find('section', attrs={'data-test-id': 'funding'})
    if funding_section:
        all_rounds_tag = funding_section.find('a', attrs={'data-tracking-control-name': 'funding_all-rounds'})
        if all_rounds_tag:
            all_rounds_match = re.search(r'(\d+ total rounds)', all_rounds_tag.get_text(strip=True))
            all_rounds_info = all_rounds_match.group(1) if all_rounds_match else "All rounds info not found"
        else:
            all_rounds_info = "All rounds info not found"

        last_round_tag = funding_section.find('a', attrs={'data-tracking-control-name': 'funding_last-round'})
        if last_round_tag:
            last_round_info = last_round_tag.find('time').get_text(strip=True)
            last_round_amount_tag = funding_section.find('p', class_='text-display-lg')
            last_round_amount = last_round_amount_tag.get_text(strip=True) if last_round_amount_tag else "Last round amount not found"
            last_round_link = last_round_tag['href']
            last_round_formatted_date = last_round_tag.find('time')['datetime']
        else:
            last_round_info = "Last round info not found"
            last_round_amount = "Last round amount not found"
            last_round_link = "Last round link not found"
            last_round_formatted_date = "Last round date not found"

        investors_tag = funding_section.find('a', attrs={'data-tracking-control-name': 'funding_investors'})
        investors_info = investors_tag.get_text(strip=True) if investors_tag else "Investors info not found"
    else:
        all_rounds_info = "Crunchbase funding info not found"
        last_round_info = "Last round info not found"
        last_round_amount = "Last round amount not found"
        investors_info = "Investors info not found"
        last_round_link = "Last round link not found"
        last_round_formatted_date = "Last round date not found"

    # Check if the tags are found before calling get_text()
    name = title_tag.get_text(strip=True).split("|")[0].strip() if title_tag else "Profile Name not found"
    designation = designation_tag.get_text(strip=True) if designation_tag else "Designation not found"
    followers_match = re.search(r'\b(\d[\d,.]*)\s+followers\b', followers_tag["content"]) if followers_tag else None
    followers_count = followers_match.group(1) if followers_match else "Followers count not found"
    description_profile = description_tag.get_text(strip=True) if description_tag else "Profile Description not found"

    additional_data = {
        "profileName": name,
        "designation": designation,
        "followersCount": followers_count,
        "profileDescription": description_profile,
        "website": website,
        "crunchbaseAllRoundsInfo": all_rounds_info,
        "crunchbaseLastRoundInfo": last_round_info,
        "crunchbaseLastRoundAmount": last_round_amount,
        "crunchbaseInvestorsInfo": investors_info,
        "lastRoundFormattedDate": last_round_formatted_date,
        "crunchbaseLink": last_round_link,
    }

    data = {**script_data, **data, **additional_data}
    data["addresses"] = addresses    
    data["affiliatedPages"] = affiliated_pages
    data["similarPages"] = similar_pages
    return data

def read_links_from_file(file_path: str) -> List[str]:
    """Read URLs from a text or Excel file."""
    if file_path.endswith('.txt'):
        with open(file_path, 'r') as file:
            urls = file.read().splitlines()
    elif file_path.endswith('.xlsx'):
        df = pd.read_excel(file_path)
        urls = df['Links'].tolist()
    else:
        raise ValueError("Unsupported file format. Please use a .txt or .xlsx file.")
    return urls

# Initialize dataframes globally
df_company_info = pd.DataFrame()
df_company_addresses = pd.DataFrame()
df_affiliated_pages = pd.DataFrame()
df_similar_pages = pd.DataFrame()

async def fetch_with_retry(url, retries=3, backoff_factor=0.5):
    """GET a URL with the shared async client, retrying on protocol errors."""
    for attempt in range(retries):
        try:
            response = await client.get(url)
            return response
        except RemoteProtocolError as e:
            log.error(f"Attempt {attempt + 1} for {url} failed: {e}")
            # Non-blocking exponential backoff: 0.5 s, 1 s, 2 s with the defaults
            await asyncio.sleep(backoff_factor * (2 ** attempt))
    raise Exception(f"All {retries} attempts failed for {url}")

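def fetch_with_requests(url, retries=3, backoff_factor=0.5):
    """Synchronous fallback used when the async client receives status 999."""
    headers = {
        "User-Agent": "Guest",
    }
    for attempt in range(retries):
        try:
            # The 30 s timeout is an assumed value that prevents hanging indefinitely
            response = requests.get(url, headers=headers, timeout=30)
            if response.status_code == 200:
                return response
            log.error(f"Attempt {attempt + 1} for {url} returned status {response.status_code}")
        except requests.RequestException as e:
            log.error(f"Attempt {attempt + 1} for {url} failed: {e}")
        # Back off after any failed attempt, not only after exceptions
        time.sleep(backoff_factor * (2 ** attempt))
    raise Exception(f"All {retries} attempts failed for {url}")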

async def scrape_company(urls: List[str]) -> tuple[List[Dict], List[str]]:
    """Scrape public LinkedIn company pages; return (parsed records, failed URLs)."""
    data = []
    failed_links = []
    for url in urls:
        try:
            response = await fetch_with_retry(url)
            if response.status_code == 200:
                data.append(parse_company(response.text))
                log.success(f"Successfully scraped {url}")
            elif response.status_code == 999:  # Use requests as fallback
                log.warning(f"Status code 999 for {url}, switching to requests")
                response = fetch_with_requests(url)
                if response.status_code == 200:
                    data.append(parse_company(response.text))
                    log.success(f"Successfully scraped {url} with requests fallback")
                else:
                    failed_links.append(url)
                    log.error(f"Failed to scrape {url} with status code {response.status_code}")
            else:
                failed_links.append(url)
                log.error(f"Failed to scrape {url} with status code {response.status_code}")
            # Delay between requests to avoid rate limiting (non-blocking)
            await asyncio.sleep(1)
        except Exception as e:
            failed_links.append(url)
            log.error(f"Error scraping {url}: {e}")
    return data, failed_links

async def run():
    urls = read_links_from_file('profiles.txt')  # Using the 'profiles.txt' file
    profile_data, failed_links = await scrape_company(urls)
    
    global df_company_info, df_company_addresses, df_affiliated_pages, df_similar_pages

    for company in profile_data:
        main_address = company.get('mainAddress', {})
        company_info = {
            "name": company.get("name"),
            "url": company.get("url"),
            "streetAddress": main_address.get("streetAddress") if main_address else None,
            "addressLocality": main_address.get("addressLocality") if main_address else None,
            "addressRegion": main_address.get("addressRegion") if main_address else None,
            "postalCode": main_address.get("postalCode") if main_address else None,
            "addressCountry": main_address.get("addressCountry") if main_address else None,
            "description": company.get("description"),
            "numberOfEmployees": company.get("numberOfEmployees"),
            "Industry": company.get("Industry"),
            "Company size": company.get("Company size"),
            "Headquarters": company.get("Headquarters"),
            "Type": company.get("Type"),
            "Specialties": company.get("Specialties"),
            "profileName": company.get("profileName"),
            "designation": company.get("designation"),
            "followersCount": company.get("followersCount"),
            "profileDescription": company.get("profileDescription"),
            "website": company.get("website"),
            "crunchbaseAllRoundsInfo": company.get("crunchbaseAllRoundsInfo"),
            "crunchbaseLastRoundInfo": company.get("crunchbaseLastRoundInfo"),
            "crunchbaseLastRoundAmount": company.get("crunchbaseLastRoundAmount"),
            "crunchbaseInvestorsInfo": company.get("crunchbaseInvestorsInfo"),
            "lastRoundFormattedDate": company.get("lastRoundFormattedDate"),
            "crunchbaseLink": company.get("crunchbaseLink"),
        }
        df_company_info = pd.concat([df_company_info, pd.DataFrame([company_info])])

        for address in company.get("addresses", []):
            parts = address.split(", ")
            country = parts[-1] if parts else ""
            company_address = {
                "name": company.get("name"),
                "url": company.get("url"),
                "addresses": address,
                "country offices": country
            }
            df_company_addresses = pd.concat([df_company_addresses, pd.DataFrame([company_address])])

        for affiliated in company.get("affiliatedPages", []):
            affiliated_page = {
                "name": company.get("name"),
                "url": company.get("url"),
                "affiliated_name": affiliated["name"],
                "industry": affiliated["industry"],
                "address": affiliated["address"],
                "linkeinUrl": affiliated["linkeinUrl"]
            }
            df_affiliated_pages = pd.concat([df_affiliated_pages, pd.DataFrame([affiliated_page])])

        for similar in company.get("similarPages", []):
            similar_page = {
                "name": company.get("name"),
                "url": company.get("url"),
                "similar_name": similar["name"],
                "industry": similar["industry"],
                "address": similar["address"],
                "linkeinUrl": similar["linkeinUrl"]
            }
            df_similar_pages = pd.concat([df_similar_pages, pd.DataFrame([similar_page])])

        # Save to Excel after each company
        df_company_info.to_excel("company_information.xlsx", index=False)
        df_company_addresses.to_excel("company_addresses.xlsx", index=False)
        df_affiliated_pages.to_excel("affiliated_pages.xlsx", index=False)
        df_similar_pages.to_excel("similar_pages.xlsx", index=False)

    if failed_links:
        df_failed_links = pd.DataFrame({"failed_links": failed_links})
        df_failed_links.to_excel("failed_links.xlsx", index=False)

if __name__ == "__main__":
    try:
        asyncio.run(run())
    except Exception as e:
        log.error(f"Script terminated due to an error: {e}")

        # Save what has been scraped so far
        if not df_company_info.empty:
            df_company_info.to_excel("company_information.xlsx", index=False)
        if not df_company_addresses.empty:
            df_company_addresses.to_excel("company_addresses.xlsx", index=False)
        if not df_affiliated_pages.empty:
            df_affiliated_pages.to_excel("affiliated_pages.xlsx", index=False)
        if not df_similar_pages.empty:
            df_similar_pages.to_excel("similar_pages.xlsx", index=False)
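
The `get_actual_url` helper in the script above matters for the later website scraping, because a website link on a LinkedIn page may be wrapped in a redirect. The standalone sketch below reproduces the helper's logic; the wrapped link is a hypothetical example, and its redirect path and `urlhash` value are assumptions rather than observed data.

```python
from urllib.parse import urlparse, parse_qs


def get_actual_url(link):
    """Return the target stored in a `url` query parameter, else the link itself."""
    params = parse_qs(urlparse(link).query)
    return params["url"][0] if "url" in params else link


# Hypothetical redirect link; the wrapper path and hash value are assumptions.
wrapped = "https://www.linkedin.com/redir/redirect?url=https%3A%2F%2Fwww.eupheus.in&urlhash=abcd"
print(get_actual_url(wrapped))
print(get_actual_url("https://www.eupheus.in"))
```

`parse_qs` also decodes the percent-encoded target, so both calls yield a plain website address.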


The data collection and analysis process generates four main outputs.

1. LinkedIn Info

The first is called “LinkedIn Info” and contains the following columns:

| Column | Example value |
|---|---|
| Company Name | Eupheus Learning |
| URL | https://in.linkedin.com/company/eupheus-learning |
| Street Address | A-12, Mohan Co-operative Industrial Estate |
| Locality | New Delhi |
| Region | New Delhi |
| Postal Code | 110044 |
| Country | IN |
| Description | Eupheus in Greek means - "Active seeking of knowledge" Our Vision is to offer pedagogically differentiated technology driven solutions that lead to critical thinking and achievement of higher learning outcomes by seamlessly integrating in-class and at home learning in the private school segment of the Pre-K to 12 market. Our aim is to bridge the gap between what is taught in-class using institutional textbook driven solutions and retail at-home learning providers by seamlessly integrating both. |
| Number of Employees | 275 |
| Industry | E-Learning Providers |
| Company Size | 51-200 employees |
| Headquarters | New Delhi, New Delhi |
| Company Type | Privately Held |
| Specialties | Education, K-12, Curricular, E Learning, Pre Primary, Middle School Solutions, Senior School Solutions, Digital Reference Resources, Language Learning, Primary School Solutions, Teacher Support, Age Appropriate Resource, Digital, Learning, Print, Live Books, Fiction e Books, Coding, Kinesthetic Learning, CBSE Aligned Text Book, ICSE Aligned Text Book, Digital Library, Reading Program, Atal Tinkering Lab, and TOEFL |
| Profile Name | Eupheus Learning |
| Designation | E-Learning Providers |
| Followers Count | 9,981 |
| Profile Description | Eupheus in Greek means - "Active seeking of knowledge" Our Vision is to offer pedagogically differentiated technology driven solutions that lead to critical thinking and achievement of higher learning outcomes by seamlessly integrating in-class and at home learning in the private school segment of the Pre-K to 12 market. Our aim is to bridge the gap between what is taught in-class using institutional textbook driven solutions and retail at-home learning providers by seamlessly integrating both. |
| Website | https://www.eupheus.in |
| Crunchbase All Rounds Info | 4 total rounds |
| Crunchbase Last Round Info | Oct 14, 2021 |
| Crunchbase Last Round Amount | US$ 10.0M |
| Crunchbase Investors Info | Lightrock |
| Last Round Formatted Date | 14/10/2021 |
| Crunchbase Link | Crunchbase Link |

For more detailed information, you can refer to the complete dataset here.

2. Affiliated Pages LinkedIn

The second output is called “Affiliated Pages LinkedIn” and contains information about affiliated pages (i.e., other pages related to the same company). Below is an example of the data collected:

| Column | Example value |
|---|---|
| Name | upGrad |
| URL | https://in.linkedin.com/company/ueducation |
| Affiliated Name | upGrad Placements |
| Industry | Human Resources Services |
| Address | |
| LinkedIn URL | https://in.linkedin.com/company/upgrad-placements- |

For more detailed information, you can refer to the complete dataset here.

3. Similar Pages LinkedIn

The third output is called “Similar Pages LinkedIn” and contains the similar pages recommended by the LinkedIn algorithm. These are pages that LinkedIn suggests as being similar based on various factors such as industry, company size, and other attributes. Below is an example of the data collected:

| Column | Example value |
|---|---|
| Name | byteXL |
| URL | https://in.linkedin.com/company/bytexl |
| Similar Company | CODINGCLUB |
| Industry | Education |
| Address | Vadodara, GUJARAT |
| LinkedIn URL | https://in.linkedin.com/company/codingclub36 |

For more detailed information, you can refer to the complete dataset here.

4. Country Offices Addresses LinkedIn

The fourth output is called “Country Offices Addresses LinkedIn” and contains information about the different office locations and countries where the company has a presence. Below is an example of the data collected:

| Column | Example value |
|---|---|
| Name | byteXL |
| URL | https://in.linkedin.com/company/bytexl |
| Addresses | Plano, TX, US |
| Country Offices | US |

For more detailed information, you can refer to the complete dataset here.
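
In the scraper script, the "Country Offices" value is derived by taking the last comma-separated part of each office address. A minimal standalone sketch of that rule follows; the function name `office_country` is hypothetical, and the rule assumes every address ends with a country code, which may not hold for incomplete addresses.

```python
def office_country(address):
    """Return the last comma-separated part of an address, assumed to be the country."""
    parts = [part.strip() for part in address.split(",") if part.strip()]
    return parts[-1] if parts else ""


print(office_country("Plano, TX, US"))
```

For an address such as "Vadodara, GUJARAT" the same rule would return a region rather than a country, so values in this column are worth spot-checking.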

🪟 Website Scraper

After generating the list of 1,000 EdTech startups from the HolonIQ ranking, along with their LinkedIn links, we scraped the information available on LinkedIn. We then developed a scraper for the companies’ own web pages. For greater accuracy, it used the website addresses retrieved by the LinkedIn scraper rather than those added manually during the initial list generation. The text of each page was cleaned and written to a .txt file to capture more comprehensive details about each company.

Document Technical Information

This section explains the technical details of how we collected and processed the data. Here is an easy-to-understand breakdown:

  1. Cleaning Text: The clean_text function removes excess whitespace and special characters from the text to make it more readable.
  2. Language Detection: The detect_language_from_html and detect_language functions identify the language of the text, either from the HTML tag or the text content itself.
  3. Fetching HTML Content: The get_html_with_selenium function uses Selenium to load web pages and ensure all content is captured, especially for pages that load content dynamically.
  4. Saving Web Page Content: The save_formatted_webpage_content function retrieves web page content, cleans it, detects the language, and saves it to a text file. If the initial token count (approximated as the number of whitespace-separated words in the page’s paragraphs) is below 300, it switches to Selenium for a more thorough scrape.
  5. Handling Requests and Errors: The function handles various exceptions to ensure robustness, including request errors and general exceptions.
  6. Main Processing Loop: The main function reads input data from an Excel file, processes each company’s website, and saves the results and any errors to separate Excel files.

This approach allowed us to gather comprehensive information about each company, which was then analyzed to identify potential business opportunities for edunext.
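
The decision to fall back to a browser can be shown without any network access. The sketch below mirrors the script's `clean_text` function and its 300-word threshold; the helper name `needs_browser_fallback` and the sample paragraphs are hypothetical.

```python
import re


def clean_text(text):
    """Collapse runs of whitespace (including non-breaking spaces) into single spaces."""
    return re.sub(r"\s+", " ", text.replace("\xa0", " ")).strip()


def needs_browser_fallback(paragraphs, min_words=300):
    """Return True when the static HTML yielded fewer than `min_words` words."""
    text = " ".join(clean_text(p) for p in paragraphs)
    return len(text.split()) < min_words


static_page = ["Loading\xa0app ...", "  Please enable   JavaScript. "]  # hypothetical
rich_page = ["word " * 200, "word " * 150]                              # hypothetical
print(needs_browser_fallback(static_page))
print(needs_browser_fallback(rich_page))
```

A page rendered mostly by JavaScript tends to return only a few placeholder words to a plain HTTP request, which is the case the threshold is meant to catch.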


import os
import re
import time
import pandas as pd
import requests
from bs4 import BeautifulSoup
from langdetect import detect, LangDetectException
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

from webdriver_manager.chrome import ChromeDriverManager

def clean_text(text):
    text = re.sub(r'\s+', ' ', text)
    text = text.replace('\xa0', ' ')
    return text.strip()

def detect_language_from_html(soup):
    html_tag = soup.find('html')
    if html_tag and html_tag.get('lang'):
        return html_tag['lang']
    else:
        return None

def detect_language(text):
    try:
        return detect(text)
    except LangDetectException:
        return 'unknown'

def get_html_with_selenium(url):
    options = Options()
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-features=SameSiteByDefaultCookies")
    options.add_argument("--disable-features=CookiesWithoutSameSiteMustBeSecure")
    options.add_argument("log-level=3")  # Reduce logging output
    
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    try:
        driver.get(url)

        try:
            # Wait for the body content to be loaded
            WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.TAG_NAME, 'body')))
            # Scroll to the bottom to ensure all content is loaded
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(5)  # Wait for content to load
            # Additionally, wait for at least one paragraph, taken as a sign the content has loaded
            WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//p")))
        except Exception as e:
            print(f"Error waiting for the page to load: {e}")

        html_content = driver.page_source
    finally:
        # Always close the browser, even if loading the page raises
        driver.quit()

    return html_content

def save_formatted_webpage_content(url, company_name):
    try:
        print(f"Retrieving content from {url} for {company_name}...")

        response = requests.get(url, timeout=10)
        response.raise_for_status()
        response.encoding = response.apparent_encoding
        html_content = response.text

        soup = BeautifulSoup(html_content, 'html.parser')
        paragraphs = soup.find_all('p')

        paragraph_texts = '\n\n'.join([clean_text(p.get_text()) for p in paragraphs])
        token_count = len(paragraph_texts.split())

        # If the token count is less than 300, use Selenium
        if token_count < 300:
            print(f"Insufficient token count retrieved from {url} using requests. Switching to Selenium...")
            html_content = get_html_with_selenium(url)
            soup = BeautifulSoup(html_content, 'html.parser')
            paragraphs = soup.find_all('p')
            paragraph_texts = '\n\n'.join([clean_text(p.get_text()) for p in paragraphs])
            token_count = len(paragraph_texts.split())

        # Check if HTML content is retrieved
        if not soup.body:
            print(f"No body content found for {company_name} at {url}.")
            return None, None, f"No body content found at {url}"
        
        # soup.title.string is None when <title> is empty or contains nested tags
        title = clean_text(soup.title.string) if soup.title and soup.title.string else 'No Title Found'
        formatted_content = f"Title: {title}\n\n{paragraph_texts}"

        # Detect language
        language = detect_language_from_html(soup)
        if not language:
            language = detect_language(paragraph_texts)
        
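        # Replace characters that are invalid in file names (e.g. "/" or ":")
        safe_name = re.sub(r'[\\/:*?"<>|]', '_', str(company_name))
        filepath = os.path.join('webpages', f"{safe_name}.txt")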
        with open(filepath, 'w', encoding='utf-8') as file:
            file.write(formatted_content)
        
        print(f"Content retrieved and saved for {company_name}. Token count: {token_count}, Language: {language}.")
        return token_count, language, None
    except requests.exceptions.RequestException as req_err:
        print(f"Request error for {company_name}: {req_err}")
        return None, None, f"Request error: {req_err}"
    except Exception as err:
        print(f"Error for {company_name}: {err}")
        return None, None, f"Error: {err}"

def main():
    input_filename = 'cleaned_company_information.xlsx'
    output_data_filename = 'company_token_counts.xlsx'
    output_errors_filename = 'website_errors.xlsx'
    
    # Read the input file
    df = pd.read_excel(input_filename)
    
    # Ensure the 'webpages' directory exists
    os.makedirs('webpages', exist_ok=True)
    
    # Lists to store results and errors
    results = []
    errors = []

    # Iterate over the rows in the DataFrame
    for index, row in df.iterrows():
        print(f"Processing {index + 1}/{len(df)}: {row['Company Name']} ({row['Website']})")
        company_name = row['Company Name']
        website = row['Website']
        token_count, language, error = save_formatted_webpage_content(website, company_name)
        
        if token_count is not None:
            results.append({'Company Name': company_name, 'Website': website, 'Token Count': token_count, 'Language': language})
        if error is not None:
            errors.append({'Company Name': company_name, 'Website': website, 'Error': error})
    
    # Save the results to an Excel file
    results_df = pd.DataFrame(results)
    results_df.to_excel(output_data_filename, index=False)
    
    # Save the errors to an Excel file
    errors_df = pd.DataFrame(errors)
    errors_df.to_excel(output_errors_filename, index=False)

    print(f"Process completed. Data saved to {output_data_filename} and errors saved to {output_errors_filename}.")

if __name__ == '__main__':
    main()

  

The output of the web scraping step is a folder 📁 of text files, one per company. Each file holds the page title and the paragraph text extracted from the company’s website, saved under the company’s name so that it can be matched to the LinkedIn data in the next step.
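As a rough illustration of how the saved files could be read back for the next step, the sketch below loads every text file in the folder into a dictionary keyed by company name. The helper names `load_webpage_texts` and `word_count` are hypothetical and are not part of the project scripts; only the `webpages` folder name and the one-file-per-company layout come from the scraper above.

```python
from pathlib import Path


def load_webpage_texts(folder="webpages"):
    """Return {company_name: text} for every .txt file in `folder`."""
    texts = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # The file stem is the company name used when the file was saved.
        texts[path.stem] = path.read_text(encoding="utf-8")
    return texts


def word_count(text):
    """Count whitespace-separated words, the measure the scraper reports as 'token count'."""
    return len(text.split())
```

Files with a very low word count are good candidates for manual review, since the scraper may have captured only a cookie banner or an empty shell page.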

As shown in the following example:

🔧 Integration with ChatGPT

The next step was to integrate with the OpenAI API, using the GPT-3.5 model, to combine the LinkedIn information with the text files of website content. Both data sources were fed to the model together with specific questions designed to build a comprehensive profile of each company. The questions asked appear in the prompts dictionary of the script in the technical information section.

The code leverages additional data from several Excel tables to enrich the analysis:

📄 Role Definitions
Provides detailed definitions for roles like client, partner, competitor, and provider, as well as information about Edunext.

Role | Detailed Definition
Edunext | Edunext is the company driving this analysis. It is a software and services company dedicated to the Open edX platform.
CLIENT | An entity that is likely to require hosting, maintenance, or professional services for the Open edX platform.
PARTNER | An entity that provides instructional design services or a technology aggregator that may subcontract hosting or professional services for the Open edX platform to Edunext, as it is not part of its core business. It can also be an entity that provides a tool for online learning that can be integrated into the Open edX LMS or uses standard interoperability protocols such as LTI.
COMPETITOR | An entity that has Open edX hosting, maintenance, or custom development as part of their core business offering and expertise.
PROVIDER | An entity that supplies services that may be relevant and useful to Edunext to enhance its value proposition or raise its productivity.
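One way such definitions could be handed to the model is to state them explicitly in the question. The sketch below is an illustrative assumption, not the project's actual prompt: the function name `build_role_question` and the wording of the question are hypothetical, and only the two sample definitions are taken from the table above.

```python
def build_role_question(definitions):
    """Build a classification question that states each role definition explicitly."""
    definition_lines = [f"{role}: {text}" for role, text in definitions.items()]
    allowed = ", ".join(definitions)
    return (
        "Role definitions:\n"
        + "\n".join(definition_lines)
        + "\n\nWhich single role best describes this company relative to Edunext? "
        + f"Answer with exactly one of: {allowed}."
    )


# Two of the four definitions from the table; the others would be added the same way.
sample_definitions = {
    "CLIENT": "An entity that is likely to require hosting, maintenance, or "
              "professional services for the Open edX platform.",
    "COMPETITOR": "An entity that has Open edX hosting, maintenance, or custom "
                  "development as part of their core business offering and expertise.",
}
print(build_role_question(sample_definitions))
```

Asking for exactly one label from a closed list makes the answer easier to validate than free text that mentions several roles.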

📄 Verticals

These additional sources give the GPT-3.5 model explicit definitions to work from, which helps it generate more precise and contextually relevant responses. The verticals used for classification are listed below.

Vertical Name | Vertical Definition
K-12 Education | Technologies and platforms aimed at primary and secondary education.
Higher Education | Solutions tailored for colleges, universities, and other tertiary education institutions.
Professional Development & Corporate Training | Tools and programs designed for employee training, upskilling, and professional certifications.
Language Learning | Apps, platforms, and services focused on teaching new languages.
STEM Education | Resources and tools specific to Science, Technology, Engineering, and Mathematics education.
Learning Management Systems (LMS) | Platforms that provide a comprehensive management system for learning processes, often used by institutions and corporations.
Tutoring and Mentoring | Services and platforms that connect students with tutors and mentors.
Online Courses and MOOCs (Massive Open Online Courses) | Platforms offering a variety of courses across different subjects, typically available to a large audience.
Content Creation and Publishing | Tools and platforms for creating, sharing, and publishing educational content.
Edutainment | Educational tools and resources that incorporate entertainment, such as educational games and interactive learning tools.
Special Education | Technologies designed to support learners with special needs.
Assessment and Testing | Solutions that focus on student evaluation, testing, and examination.
Early Childhood Education | Platforms and tools aimed at pre-K education.
Virtual and Augmented Reality (VR/AR) in Education | Immersive technologies used to enhance the learning experience.
Coding and Programming | Platforms and tools that teach coding and programming skills.
Collaboration and Communication Tools | Solutions that facilitate communication and collaboration among students, teachers, and parents.
School Administration and Management | Systems designed to manage school operations, such as attendance, grades, and resource planning.
Educational Hardware | Devices and physical technologies used in educational settings, like tablets, interactive whiteboards, and robotics kits.
Adaptive Learning | Technologies that use data and analytics to personalize the learning experience.
EdTech Infrastructure | Backend technologies and services that support educational platforms and tools, like cloud services, data management, and cybersecurity solutions.

📄 Capabilities

The following list of capabilities was used to analyze the potential and competencies of each company. The capabilities are grouped into main categories and sub-categories to ensure a comprehensive evaluation. The list is based on the HolonIQ framework, which can be found at Digital Capability Framework. The image provided below, sourced from HolonIQ, illustrates these capabilities.

The list below restates the image as text, organized by category, so that the capabilities could be passed to the ChatGPT API in a form the model can interpret.

Main Category | Sub-Category | Capability
DEMAND AND DISCOVERY | PRODUCT STRATEGY | MARKET INSIGHTS & TRENDS
DEMAND AND DISCOVERY | PRODUCT STRATEGY | UNDERSTAND CUSTOMER NEEDS
DEMAND AND DISCOVERY | PRODUCT STRATEGY | COMPETITORS & ALTERNATES
DEMAND AND DISCOVERY | PRODUCT STRATEGY | NEW BUSINESS MODELS
DEMAND AND DISCOVERY | PRODUCT STRATEGY | B2B RECRUITMENT & PARTNERSHIPS
DEMAND AND DISCOVERY | MARKETING PROCESSES | STUDENT RELATIONSHIP MANAGEMENT (CRM)
DEMAND AND DISCOVERY | MARKETING PROCESSES | COMMS & CAMPAIGN MANAGEMENT
DEMAND AND DISCOVERY | MARKETING PROCESSES | MARKETING AUTOMATION
DEMAND AND DISCOVERY | MARKETING PROCESSES | SOCIAL MEDIA & COMMUNITY MANAGEMENT
DEMAND AND DISCOVERY | STUDENT RECRUITMENT | RECRUITMENT EVENTS
DEMAND AND DISCOVERY | STUDENT RECRUITMENT | CHANNEL PARTNERSHIPS
DEMAND AND DISCOVERY | STUDENT RECRUITMENT | SCHOOLS & COMMUNITY OUTREACH
DEMAND AND DISCOVERY | STUDENT RECRUITMENT | SCHOLARSHIP PROGRAM
DEMAND AND DISCOVERY | ENROLLMENT MANAGEMENT | COURSE SELECTION & GUIDANCE
DEMAND AND DISCOVERY | ENROLLMENT MANAGEMENT | APPLICATION & ADMISSIONS
DEMAND AND DISCOVERY | ENROLLMENT MANAGEMENT | RECOGNIZING PRIOR LEARNING
DEMAND AND DISCOVERY | ENROLLMENT MANAGEMENT | TUITION FINANCING
LEARNING DESIGN | CURRICULUM DESIGN | DIGITAL DESIGN PRINCIPLES
LEARNING DESIGN | CURRICULUM DESIGN | PROGRAM ARCHITECTURE
LEARNING DESIGN | CURRICULUM DESIGN | LEARNING ENVIRONMENTS & PLATFORMS
LEARNING DESIGN | CURRICULUM DESIGN | LEARNING DELIVERY MODELS
LEARNING DESIGN | CURRICULUM DESIGN | ACCREDITATION
LEARNING DESIGN | CURRICULUM DESIGN | CURRICULUM QUALITY MANAGEMENT
LEARNING DESIGN | DIGITAL CONTENT & COURSEWARE | DIGITAL CONTENT CREATION
LEARNING DESIGN | DIGITAL CONTENT & COURSEWARE | IMMERSION, SIMULATION & LAB
LEARNING DESIGN | DIGITAL CONTENT & COURSEWARE | OER & CONTENT LICENSING
LEARNING DESIGN | DIGITAL CONTENT & COURSEWARE | MANAGING INTEGRATED CONTENT
LEARNING DESIGN | SUBJECT MATTER EXPERTISE | DESIGNING FOR DIGITAL LEARNING
LEARNING DESIGN | SUBJECT MATTER EXPERTISE | FACULTY EXPERTISE & SPECIALISMS
LEARNING DESIGN | SUBJECT MATTER EXPERTISE | SOURCING & MANAGING EXPERTISE
LEARNING DESIGN | SUBJECT MATTER EXPERTISE | SPECIALIST INDUSTRY PARTNERS
LEARNING DESIGN | TEACHING STRATEGIES | LEARNER NEEDS & ANALYTICS
LEARNING DESIGN | TEACHING STRATEGIES | DESIGNING ASSESSMENT
LEARNING DESIGN | TEACHING STRATEGIES | EXPERIENTIAL LEARNING APPROACHES
LEARNING DESIGN | TEACHING STRATEGIES | DESIGNING GROUP WORK
LEARNING DESIGN | TEACHING STRATEGIES | PERSONALIZED & ADAPTIVE LEARNING
LEARNER EXPERIENCE | ACADEMIC ADMINISTRATION | FACULTY PROFESSIONAL DEVELOPMENT
LEARNER EXPERIENCE | ACADEMIC ADMINISTRATION | FACULTY MANAGEMENT & SUPPORT
LEARNER EXPERIENCE | ACADEMIC ADMINISTRATION | TIMETABLING & SCHEDULE MANAGEMENT
LEARNER EXPERIENCE | ACADEMIC ADMINISTRATION | RETENTION & LEARNING SUPPORT
LEARNER EXPERIENCE | ACADEMIC ADMINISTRATION | REPORTING & REGULATORY COMPLIANCE
LEARNER EXPERIENCE | ACADEMIC ADMINISTRATION | LIBRARY SERVICES
LEARNER EXPERIENCE | LEARNING & ACADEMIC EXPERIENCE | STUDENT PORTAL & LMS
LEARNER EXPERIENCE | LEARNING & ACADEMIC EXPERIENCE | SYNCHRONOUS LEARNING EXPERIENCES
LEARNER EXPERIENCE | LEARNING & ACADEMIC EXPERIENCE | ASYNCHRONOUS LEARNING EXPERIENCES
LEARNER EXPERIENCE | LEARNING & ACADEMIC EXPERIENCE | VOICE, CHAT & INTERACTIVE LEARNING
LEARNER EXPERIENCE | LEARNING & ACADEMIC EXPERIENCE | INDEPENDENT LEARNING RESOURCES
LEARNER EXPERIENCE | LEARNING & ACADEMIC EXPERIENCE | EXCHANGE PROGRAMS
LEARNER EXPERIENCE | STUDENT LIFE | ONBOARDING & ORIENTATION
LEARNER EXPERIENCE | STUDENT LIFE | WELLBEING & MENTAL HEALTH
LEARNER EXPERIENCE | STUDENT LIFE | STUDENT COMMUNITIES, CLUBS & SOCIETIES
LEARNER EXPERIENCE | STUDENT LIFE | VOLUNTEERING & STUDENT LEADERSHIP
LEARNER EXPERIENCE | STUDENT LIFE | STUDENT VOICE & SURVEYS
LEARNER EXPERIENCE | STUDENT LIFE | GRADUATION & SUCCESS
LEARNER EXPERIENCE | ASSESSMENT & VERIFICATION | TESTS & EXAMS
LEARNER EXPERIENCE | ASSESSMENT & VERIFICATION | PORTFOLIOS & PRACTICAL
LEARNER EXPERIENCE | ASSESSMENT & VERIFICATION | ASSESSMENT FEEDBACK
LEARNER EXPERIENCE | ASSESSMENT & VERIFICATION | PEER & GROUP ASSESSMENT
LEARNER EXPERIENCE | ASSESSMENT & VERIFICATION | BADGING & CREDENTIALING
WORK AND LIFELONG LEARNING | WORK INTEGRATED LEARNING | EMPLOYABILITY SKILLS BUILDING
WORK AND LIFELONG LEARNING | WORK INTEGRATED LEARNING | WORKPLACE SIMULATION & PROJECTS
WORK AND LIFELONG LEARNING | WORK INTEGRATED LEARNING | INTERNSHIPS & PLACEMENTS
WORK AND LIFELONG LEARNING | WORK INTEGRATED LEARNING | STUDENT WORK
WORK AND LIFELONG LEARNING | WORK INTEGRATED LEARNING | ENTREPRENEURSHIP & STARTUPS
WORK AND LIFELONG LEARNING | CAREER PLANNING & PLACEMENT | COMPETENCIES & SKILLS EVALUATION
WORK AND LIFELONG LEARNING | CAREER PLANNING & PLACEMENT | CAREER PLANNING SERVICES
WORK AND LIFELONG LEARNING | CAREER PLANNING & PLACEMENT | CAREER & RECRUITMENT EVENTS
WORK AND LIFELONG LEARNING | CAREER PLANNING & PLACEMENT | JOB APPLICATION SUPPORT
WORK AND LIFELONG LEARNING | CAREER PLANNING & PLACEMENT | JOB FINDING & GRADUATE PLACEMENT
WORK AND LIFELONG LEARNING | INDUSTRY & BUSINESS ENGAGEMENT | INDUSTRY COLLABS & PARTNERSHIPS
WORK AND LIFELONG LEARNING | INDUSTRY & BUSINESS ENGAGEMENT | PROFESSIONAL & INDUSTRY ASSOCIATIONS
WORK AND LIFELONG LEARNING | INDUSTRY & BUSINESS ENGAGEMENT | CUSTOMIZED PROGRAMS (B2B)
WORK AND LIFELONG LEARNING | INDUSTRY & BUSINESS ENGAGEMENT | EDUCATION AS EMPLOYEE BENEFIT
WORK AND LIFELONG LEARNING | ALUMNI & CONTINUING EDUCATION | CONTINUING EDUCATION
WORK AND LIFELONG LEARNING | ALUMNI & CONTINUING EDUCATION | INDUSTRY MENTORING & NETWORKS
WORK AND LIFELONG LEARNING | ALUMNI & CONTINUING EDUCATION | ALUMNI ENGAGEMENT
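Because some capability names contain others (for example, RECRUITMENT EVENTS is contained in CAREER & RECRUITMENT EVENTS), matching a model answer against this list needs some care. The sketch below shows one possible approach that checks longer names first; the function name `match_capabilities` and the sample answer are hypothetical and are not taken from the project code.

```python
def match_capabilities(response, valid_capabilities, limit=5):
    """Return up to `limit` capabilities named in `response`, in order of appearance."""
    text = response.upper()
    found = []
    # Try longer names first so "CAREER & RECRUITMENT EVENTS" is not
    # mistaken for its substring "RECRUITMENT EVENTS".
    for capability in sorted(valid_capabilities, key=len, reverse=True):
        upper_name = capability.upper()
        position = text.find(upper_name)
        if position != -1:
            found.append((position, capability))
            # Blank out the match so shorter names cannot reuse the same characters.
            text = text.replace(upper_name, " " * len(upper_name))
    return [name for _, name in sorted(found)][:limit]


capabilities = {"RECRUITMENT EVENTS", "CAREER & RECRUITMENT EVENTS", "TESTS & EXAMS"}
answer = "1. Career & Recruitment Events\n2. Tests & Exams"
print(match_capabilities(answer, capabilities))
```

Returning fewer than five names when the model names fewer keeps the output limited to what the model actually selected.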


Document Technical Information

This section explains the technical details of how we collected and processed the data. Here is an easy-to-understand breakdown:

  1. OpenAI API Interaction: The query_openai_api function handles the interaction with the OpenAI API, including retry logic to manage errors.
  2. Role Categorization: The categorize_potential_role function determines the potential role of the company (client, partner, competitor, or provider) based on the API response.
  3. Vertical Categorization: The categorize_potential_vertical function identifies the appropriate vertical for the company from a predefined list.
  4. Capability Extraction and Validation: The extract_and_validate_capabilities function matches the model’s answers against the predefined capability list and returns exactly five unique capabilities, padding from the list when fewer than five are matched.
  5. Company Processing: The process_company_column function processes each company’s data, constructs prompts, queries the API, and compiles the results.
  6. Progress Saving: The save_progress function periodically saves the progress of the analysis to an Excel file.
  7. Main Processing Loop: The script processes each company in the dataset, reads webpage content, handles errors, and saves the final results to an Excel file.

This approach allowed us to gather comprehensive information about each company, which was then analyzed to identify potential business opportunities for Edunext.
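The retry logic mentioned in step 1 follows a common pattern: wait a little longer after each failed attempt, with some random jitter. The sketch below illustrates that pattern in isolation, without calling any API. The name `call_with_retries` and its parameters are hypothetical, and unlike the script's function it raises an error after the last attempt instead of returning a placeholder string, which is a design choice of this sketch.

```python
import random
import time


def call_with_retries(func, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call `func()` until it succeeds or `retries` attempts have failed."""
    last_error = None
    for attempt in range(retries):
        try:
            return func()
        except Exception as error:  # a real script would catch the API's own error type
            last_error = error
            if attempt < retries - 1:
                # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter.
                sleep(base_delay * (2 ** attempt) + random.random())
    raise RuntimeError(f"All {retries} attempts failed") from last_error
```

Raising on final failure keeps a failed request from being parsed as if it were a model answer; the caller can catch the error and record the company in an error list.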


import openai
import pandas as pd
import logging
import time
from random import random
import os

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

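# Read the OpenAI API key from the environment rather than hard-coding it in the script
openai.api_key = os.getenv('OPENAI_API_KEY', 'ADD THE KEY')

# Load the LinkedIn data
# skiprows=range(1, 660) keeps the header row and skips the first 659 companies (presumably to resume an earlier run)
linkedIn_data = pd.read_excel('cleaned_company_information.xlsx', skiprows=range(1, 660))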

# Load the capabilities data
capabilities_data = pd.read_excel('Input/capabilities.xlsx')
valid_capabilities = set(capabilities_data['Capability'].str.upper().tolist())

# Load the role definitions
role_definitions = pd.read_excel('Input/company_profile.xlsx')

# Load the vertical data
vertical_data = pd.read_excel('Input/vertical.xlsx')

# Define valid roles and verticals
valid_roles = ['CLIENT', 'PARTNER', 'COMPETITOR', 'PROVIDER']
valid_verticals = vertical_data['Vertical'].tolist()

# Prepare the output DataFrame
output_columns = [
    'Company Name', 'URL (LinkedIn profile link)', 'Website', 'OpenedX Connection',
    'Main Product', 'Competitors', 'Customers', 'B2B or B2C',
    'Number of Customers', 'Country', 'Potential Role', 'Potential Vertical',
    'Capability 1', 'Capability 2', 'Capability 3', 'Capability 4', 'Capability 5'
]

output_data = pd.DataFrame(columns=output_columns)
backup_interval = 10  # Save progress every 10 companies

# Function to query OpenAI API with retry logic
# Note: openai.ChatCompletion and openai.error belong to the pre-1.0 openai package (openai<1.0)
def query_openai_api(prompt, model="gpt-3.5-turbo", max_tokens=512, retries=5):
    for i in range(retries):
        try:
            response = openai.ChatCompletion.create(
                model=model,
                messages=[
                    {"role": "system", "content": "You are an assistant that helps analyze company data."},
                    {"role": "user", "content": prompt}
                ],
                max_tokens=max_tokens,
                temperature=0.7
            )
            return response.choices[0].message['content'].strip()
        except openai.error.OpenAIError as e:
            logging.error(f"APIError encountered: {e}. Retrying {i + 1}/{retries}...")
            time.sleep((2 ** i) + random())
    return "API request failed."

# Function to categorize potential role
def categorize_potential_role(response):
    response = response.lower()
    if "client" in response:
        return "CLIENT"
    if "partner" in response:
        return "PARTNER"
    if "competitor" in response:
        return "COMPETITOR"
    if "provider" in response:
        return "PROVIDER"
    return ""

# Function to categorize potential vertical
def categorize_potential_vertical(response):
    verticals = vertical_data['Vertical'].tolist()
    for vertical in verticals:
        if vertical.lower() in response.lower():
            return vertical
    return ""

# Function to ensure the capabilities list has exactly 5 unique entries
# Note: any missing slots are padded with capabilities that were not matched in the model's answers
def ensure_unique_capabilities(capabilities):
    unique_capabilities = list(dict.fromkeys(capabilities))  # Remove duplicates while preserving order
    while len(unique_capabilities) < 5:
        # Sort the remaining capabilities so the padding is reproducible (set order is arbitrary)
        remaining_capabilities = sorted(valid_capabilities - set(unique_capabilities))
        if remaining_capabilities:
            unique_capabilities.append(remaining_capabilities[0])  # Add a remaining valid capability
        else:
            unique_capabilities.append('')  # If no valid capabilities are left, append an empty string
    return unique_capabilities[:5]

# Function to extract and validate capabilities
def extract_and_validate_capabilities(response, company_name):
    capabilities = []
    logging.debug(f"Raw API capabilities response for {company_name}: {response}")
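    for line in response.split('\n'):
        # Check longer names first so a capability is not mistaken for a shorter one it contains
        for capability in sorted(valid_capabilities, key=len, reverse=True):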
            if capability.lower() in line.lower() and capability not in capabilities:
                capabilities.append(capability)
                break
    validated_capabilities = ensure_unique_capabilities(capabilities)
    logging.debug(f"Validated capabilities for {company_name}: {validated_capabilities}")
    return validated_capabilities

# Function to process each company and column
def process_company_column(company_info, webpage_content):
    base_prompt = f"Company LinkedIn Info:\n{company_info}\n\nCompany Webpage Content:\n{webpage_content}\n\n"
    capabilities_list = ', '.join(sorted(valid_capabilities))  # Sorted so every prompt lists the capabilities in the same order

    # Constructing more detailed prompts with additional context
    prompts = {
        'OpenedX Connection': base_prompt + "Taking into account the information provided and your knowledge about the company, does the company have any connection or use the Open edX platform?",
        'Main Product': base_prompt + "Taking into account the information provided and your knowledge about the company, what is the main product or service the company offers?",
        'Competitors': base_prompt + "Taking into account the information provided and your knowledge about the company, list three main competitors of the company including their names and websites.",
        'Customers': base_prompt + "Taking into account the information provided and your knowledge about the company, list three main customers of the company.",
        'B2B or B2C': base_prompt + "Taking into account the information provided and your knowledge about the company, is the company primarily focused on B2B (business-to-business) or B2C (business-to-consumer) operations?",
        'Number of Customers': base_prompt + "Taking into account the information provided and your knowledge about the company, estimate the number of customers the company has.",
        'Country': base_prompt + "Taking into account the information provided and your knowledge about the company, in which country does the company primarily operate?",
        'Potential Role': base_prompt + "Taking into account the information provided and your knowledge about the company, determine if the company is a potential client, partner, competitor, or provider for Edunext.",
        'Potential Vertical': base_prompt + "Taking into account the information provided and your knowledge about the company, determine the potential vertical for the company from the following list:\n" +
            '\n'.join([f"{row['Vertical']} - {row['Definition']}" for _, row in vertical_data.iterrows()]) + "\nWhat is the potential vertical of this company?",
        'Capability 1': base_prompt + f"Based on the provided information and any additional research, what is the primary capability of the company? Choose from the following: {capabilities_list}",
        'Capability 2': base_prompt + f"Considering the capabilities already mentioned, what is the second main capability of the company? Choose from the following: {capabilities_list}",
        'Capability 3': base_prompt + f"Considering the capabilities already mentioned, what is the third main capability of the company? Choose from the following: {capabilities_list}",
        'Capability 4': base_prompt + f"Considering the capabilities already mentioned, what is the fourth main capability of the company? Choose from the following: {capabilities_list}",
        'Capability 5': base_prompt + f"Considering the capabilities already mentioned, what is the fifth main capability of the company? Choose from the following: {capabilities_list}"
    }
    
    responses = {key: query_openai_api(prompt) for key, prompt in prompts.items()}

    # Log raw responses for debugging
    logging.debug(f"Raw API response for company {company_info['Company Name']}: {responses}")

    # Extract and validate capabilities
    capabilities = extract_and_validate_capabilities("\n".join([responses.get(f'Capability {i}', '') for i in range(1, 6)]), company_info['Company Name'])

    output_row = {
        'Company Name': company_info['Company Name'],
        'URL (LinkedIn profile link)': company_info['URL'],
        'Website': company_info['Website'],
        'OpenedX Connection': responses['OpenedX Connection'],
        'Main Product': responses['Main Product'],
        'Competitors': responses['Competitors'],
        'Customers': responses['Customers'],
        'B2B or B2C': "B2B" if "B2B" in responses['B2B or B2C'] else "B2C",
        'Number of Customers': responses['Number of Customers'],
        'Country': responses['Country'].strip(),
        'Potential Role': categorize_potential_role(responses['Potential Role']),
        'Potential Vertical': categorize_potential_vertical(responses['Potential Vertical']),
        'Capability 1': capabilities[0],
        'Capability 2': capabilities[1],
        'Capability 3': capabilities[2],
        'Capability 4': capabilities[3],
        'Capability 5': capabilities[4]
    }

    return output_row

# Function to save progress periodically
def save_progress(data, filename='company_analysis_backup.xlsx'):
    df = pd.DataFrame(data, columns=output_columns)
    df.to_excel(filename, index=False)
    logging.info(f"Progress saved to {filename}")

# Process companies sequentially with logging
results = []
for idx, row in linkedIn_data.iterrows():
    company_info = row.to_dict()
    company_name = company_info['Company Name']
    logging.info(f"Processing company: {company_name}")

    try:
        # Try to read the webpage content for the company, handle missing or invalid files
        webpage_content = ""
        filepath = os.path.join('webpages', f"{company_name}.txt")
        try:
            with open(filepath, 'r', encoding='utf-8') as file:
                webpage_content = file.read()
        except (FileNotFoundError, OSError) as e:
            logging.warning(f"Could not read file for {company_name}: {e}")
        
        output_row = process_company_column(company_info, webpage_content)
        results.append(output_row)
        logging.info(f"Completed processing for {company_name} ({idx + 1}/{len(linkedIn_data)})")
        
        # Save progress periodically
        if (idx + 1) % backup_interval == 0:
            save_progress(results)

    except Exception as e:
        logging.error(f"Error processing company {company_name}: {e}")

# Save final results
output_data = pd.DataFrame(results, columns=output_columns)
output_data.to_excel('company_analysis.xlsx', index=False)
logging.info("All companies processed and saved to company_analysis.xlsx")
  

Company Information Output

The fourth output is called “Company Information Output” and contains detailed information about various analyzed companies. Below is an example of the data collected:

Company Name byteXL
URL (LinkedIn profile link) https://in.linkedin.com/company/bytexl
Website https://bytexl.com
OpenedX Connection Based on the information provided, there is no direct mention of the company byteXL using or having a connection with the Open edX platform. The company’s focus seems to be on transforming engineering colleges in India through their own integrated college transformation model and proprietary platform, rather than utilizing external platforms like Open edX.
Main Product Based on the information provided from the company’s LinkedIn profile and webpage content, the main product or service offered by byteXL is an experiential learning online platform for IT programming aspirants. This platform integrates curriculum, content, and practical learning to enhance students’ skills and awareness on employability. The company partners with colleges to transform their teaching methodology and learning pedagogy to increase the employability quotient of their students. The platform includes features such as academic & skilling content, online editor, student reports, dashboards on individual college performance, and coding challenges.
Competitors Based on the information provided about byteXL, three main competitors of the company in the E-learning industry could be:

1. Company Name: UpGrad
Website: https://www.upgrad.com/

2. Company Name: Simplilearn
Website: https://www.simplilearn.com/

3. Company Name: Coursera
Website: https://www.coursera.org/
Customers Based on the information provided, three main customers of byteXL are:

1. Tejashri Student from Malineni Lakshmaiah Women’s Engineering College

2. A. Sanyasirao, Head of Electronics & Communication Engineering Department at Christu Jyothi Institute of Technology & Science

3. Kalyani, a student of Malineni Lakshmaiah Women’s Engineering College

These customers have shared their positive experiences with byteXL’s teaching methodology and its impact on their academic journey and career prospects.
B2B or B2C B2B
Number of Customers Based on the content provided, it seems like byteXL has several customers who are students and colleges across India. The webpage content mentions testimonials and success stories from students and faculty at various engineering colleges who have benefited from byteXL’s training and mentorship programs.

While the exact number of customers is not explicitly mentioned in the data provided, we can infer that the company has a significant customer base across multiple colleges and individual students. The testimonials and success stories highlight the impact byteXL has had on students’ academic journeys and career paths, indicating a wide reach and positive reputation among customers.

Therefore, based on the information available, we can estimate that byteXL likely has hundreds or even thousands of customers consisting of both colleges and individual students who have engaged with their educational programs and services.
Country The company, byteXL, primarily operates in India. This is indicated by the company’s headquarters being in Hyderabad, India, the focus on transforming engineering colleges in India, partnerships with Indian colleges such as Malineni Lakshmaiah Women’s Engineering College and Christu Jyothi Institute of Technology & Science, and the testimonials and success stories from Indian students and educational institutions. The company’s efforts and impact are centered around the Indian education system and industry, supporting the employability and skills development of Indian engineering students.
Country Only India
Potential Role CLIENT
Potential Vertical Higher Education
Capability 1 PERSONALIZED & ADAPTIVE LEARNING
Capability 2 WORKPLACE SIMULATION & PROJECTS
Capability 3 IMMERSION, SIMULATION & LAB
Capability 4 VOLUNTEERING & STUDENT LEADERSHIP
Capability 5 DIGITAL DESIGN PRINCIPLES

For more detailed information, you can refer to the complete dataset here.


| Analysis and Results

With the data resulting from this exercise, the analysis begins with geographical insights. The first part focuses on the number of startups by country, and the second part analyzes the amount of investment. The geographic location in these charts comes from LinkedIn and indicates where the company is based; the investment data also comes from LinkedIn, in conjunction with Crunchbase. Note that not all companies have this information available, and not all investments were made in the current or previous year. To verify the date of investment by company, this link can be consulted: Investment Date Verification

Number of startups

The first map provides a visual representation of the number of startups by country based on data sourced from LinkedIn. The map uses a gradient color scale to indicate the density of startups in each country, with lighter shades of blue representing fewer startups and darker shades of blue indicating a higher number of startups.


The second map illustrates the number of startups by country, categorized by their estimated size based on employee count from LinkedIn data. Startups are classified into Small, Medium, Large, and Very Large using employee quartiles. The map features interactive tooltips with country names, estimated sizes, and startup counts. Bubble sizes indicate the number of startups, while colors (Yellow for Small, Orange for Medium, Red for Large, Dark Red for Very Large, and Grey for Unknown) represent size categories. This visualization helps identify the distribution of different-sized startups globally, emphasizing regions with diverse entrepreneurial activities.

Estimated Size Definition
Small Employees ≤ 1st quartile
Medium 1st quartile < Employees ≤ 2nd quartile
Large 2nd quartile < Employees ≤ 3rd quartile
Very Large Employees > 3rd quartile
Unknown Missing employee data
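The quartile rule in the table above can be sketched in a few lines. This is an illustrative sketch rather than the code used for the maps: the function name `classify_by_quartile`, the sample companies, and the choice of the standard library's inclusive quartile method are all assumptions.

```python
import statistics


def classify_by_quartile(employee_counts):
    """Map each company to Small/Medium/Large/Very Large/Unknown by employee quartile."""
    known = [n for n in employee_counts.values() if n is not None]
    # Three cut points: 1st, 2nd (median), and 3rd quartile of the known values.
    q1, q2, q3 = statistics.quantiles(known, n=4, method="inclusive")
    labels = {}
    for company, n in employee_counts.items():
        if n is None:
            labels[company] = "Unknown"
        elif n <= q1:
            labels[company] = "Small"
        elif n <= q2:
            labels[company] = "Medium"
        elif n <= q3:
            labels[company] = "Large"
        else:
            labels[company] = "Very Large"
    return labels


# Made-up employee counts, for illustration only.
sample = {"A": 10, "B": 20, "C": 30, "D": 40, "E": None}
print(classify_by_quartile(sample))
```

When many companies share the same employee count, two quartile boundaries can coincide, which leaves the category between them empty or nearly empty.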
Size based on number of employees
Country Small Medium Large Very Large Unknown Total
United States 60 35 0 50 36 181
India 10 4 0 43 3 60
United Kingdom 13 18 0 10 15 56
Unknown 5 8 1 4 9 27
Brazil 5 7 0 4 9 25
Germany 7 7 0 4 3 21
France 5 6 0 3 7 21
Singapore 6 4 0 4 7 21
Canada 4 4 0 3 7 18
Spain 1 1 0 4 9 15
Mexico 4 4 0 5 1 14
Egypt 5 3 0 2 3 13
Nigeria 3 2 0 1 7 13
South Africa 3 3 0 1 6 13
Sweden 1 3 0 2 6 12
Finland 0 4 1 1 5 11
Italy 4 4 0 1 0 9
Indonesia 0 1 0 6 1 8
South Korea 2 3 1 1 1 8
Norway 0 3 1 0 4 8
Vietnam 3 0 0 4 1 8
United Arab Emirates 3 2 1 1 0 7
Colombia 2 3 0 1 1 7
Switzerland 2 3 0 0 1 6
Kenya 1 1 0 0 4 6
Netherlands 3 2 0 0 1 6
Saudi Arabia 1 1 0 3 1 6
Bangladesh 0 2 0 3 0 5
Ireland 0 1 0 2 2 5
Israel 0 2 0 0 3 5
Argentina 0 2 0 2 0 4
Chile 0 1 0 0 3 4
China 0 2 0 1 1 4
Hungary 0 1 0 1 2 4
Japan 1 1 0 0 2 4
Pakistan 4 0 0 0 0 4
Poland 1 2 0 1 0 4
Austria 0 1 0 1 1 3
Denmark 0 0 0 2 1 3
Iceland 0 0 0 0 3 3
Jordan 1 0 0 2 0 3
Kazakhstan 0 2 0 0 1 3
Portugal 1 1 0 0 1 3
Belgium 1 0 0 0 1 2
Cameroon 0 1 0 0 1 2
Estonia 0 1 0 0 1 2
Kuwait 1 0 0 0 1 2
Lithuania 0 0 0 0 2 2
Malaysia 2 0 0 0 0 2
Peru 0 1 0 0 1 2
Romania 0 2 0 0 0 2
Tunisia 0 0 0 0 2 2
Taiwan 0 2 0 0 0 2
Tanzania 0 1 0 0 1 2
Ukraine 0 0 0 1 1 2
Venezuela 0 1 0 0 1 2
Australia 1 0 0 0 0 1
Bahrain 0 1 0 0 0 1
Congo - Kinshasa 1 0 0 0 0 1
Costa Rica 0 0 0 0 1 1
Czechia 0 0 0 0 1 1
Dominican Republic 0 0 0 0 1 1
Ecuador 0 1 0 0 0 1
Ghana 0 0 0 0 1 1
Greece 0 1 0 0 0 1
Lebanon 0 1 0 0 0 1
Morocco 0 1 0 0 0 1
Madagascar 1 0 0 0 0 1
Panama 0 0 0 0 1 1
Philippines 0 1 0 0 0 1
Thailand 1 0 0 0 0 1
Uzbekistan 1 0 0 0 0 1
Amount of investments

This chart is built from LinkedIn data on the country where each startup's office is based and the amount of its last investment round. It shows the total amount invested in startups per country, giving a visual overview of how investment is distributed globally. Each country is colored according to its total investment amount, and interactive tooltips give further detail on the investments.


This interactive map visualizes the median investment in startups by country based on data from LinkedIn. The map uses the median to avoid the impact of outliers. China is highlighted in violet because it is an outlier, with only one company providing information on the last investment round, which was exceptionally large. The gradient color scale from red to yellow to green indicates low to high median investments in millions of dollars, allowing for a clear comparison of investment levels across countries.
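
The reason for preferring the median can be illustrated with a short Python sketch. The country names and amounts below are invented for illustration and do not come from the dataset.

```python
from statistics import mean, median

# Invented last-round amounts in millions of dollars, for illustration only.
rounds_by_country = {
    "Country A": [2.0, 3.5, 5.0, 4.0],
    "Country B": [1.0, 2.0, 900.0],  # one exceptionally large round
    "Country C": [500.0],            # a single reported round
}

summary = {
    country: {"mean": mean(amounts), "median": median(amounts)}
    for country, amounts in rounds_by_country.items()
}

for country, stats in summary.items():
    print(f"{country}: mean={stats['mean']:.2f}  median={stats['median']:.2f}")
```

For Country B the single large round pulls the mean far above the typical value, while the median stays at the typical value. For Country C the median equals the only reported value, so a country with a single large round still stands out and has to be flagged separately, as was done for China.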


This map illustrates the number of startups by country, categorized by their estimated investment size based on data from LinkedIn. Investment sizes are classified into Small, Medium, Large, and Very Large using quartiles of the last round investment amounts. The map features interactive tooltips with country names and estimated investment sizes. Bubble sizes indicate the number of startups, while colors (Light Green for Small, Green for Medium, Dark Green for Large, Forest Green for Very Large, and Grey for Unknown) represent investment size categories. This visualization helps identify the distribution of startups with varying investment sizes globally, highlighting regions with diverse levels of startup funding.

Estimated Investment Size Definition
Small Investment ≤ 1st quartile
Medium 1st quartile < Investment ≤ 2nd quartile
Large 2nd quartile < Investment ≤ 3rd quartile
Very Large Investment > 3rd quartile
Unknown Missing investment data

Size based on last round amount
Country Small Medium Large Very Large Unknown Total
United States 23 14 32 0 50 119
India 11 9 8 0 10 38
United Kingdom 9 8 13 0 6 36
Germany 5 2 3 0 4 14
France 4 3 5 0 1 13
Canada 4 1 1 0 6 12
Unknown 4 1 3 0 2 10
Spain 2 4 0 0 3 9
Brazil 4 1 1 1 0 7
Singapore 2 1 1 0 3 7
South Korea 2 0 4 0 0 6
Mexico 1 1 2 0 2 6
Egypt 0 4 1 0 0 5
Israel 1 2 2 0 0 5
Italy 2 1 2 0 0 5
Sweden 1 1 2 0 1 5
Colombia 1 2 1 0 0 4
Nigeria 1 3 0 0 0 4
Netherlands 2 2 0 0 0 4
Vietnam 0 0 4 0 0 4
Denmark 0 1 0 0 2 3
Finland 1 1 0 0 1 3
Hungary 0 3 0 0 0 3
Indonesia 1 0 1 0 1 3
Japan 0 1 1 0 1 3
Norway 1 1 1 0 0 3
Portugal 2 1 0 0 0 3
South Africa 0 3 0 0 0 3
United Arab Emirates 1 1 0 0 0 2
Bangladesh 0 1 1 0 0 2
Belgium 0 0 2 0 0 2
Chile 0 2 0 0 0 2
Estonia 0 1 1 0 0 2
Ireland 0 2 0 0 0 2
Kenya 0 2 0 0 0 2
Malaysia 1 1 0 0 0 2
Pakistan 2 0 0 0 0 2
Poland 2 0 0 0 0 2
Romania 0 2 0 0 0 2
Saudi Arabia 1 1 0 0 0 2
Tunisia 0 2 0 0 0 2
Congo - Kinshasa 0 1 0 0 0 1
Switzerland 0 0 1 0 0 1
China 0 0 0 0 1 1
Costa Rica 0 1 0 0 0 1
Czechia 0 1 0 0 0 1
Ghana 0 1 0 0 0 1
Iceland 0 1 0 0 0 1
Jordan 0 0 0 0 1 1
Kuwait 1 0 0 0 0 1
Kazakhstan 1 0 0 0 0 1
Madagascar 0 1 0 0 0 1
Peru 0 1 0 0 0 1
Thailand 1 0 0 0 0 1
Taiwan 0 0 1 0 0 1
Uzbekistan 0 1 0 0 0 1
Venezuela 0 1 0 0 0 1


Time analysis

📄 By country

Number of startups

This stacked bar chart illustrates the number of companies receiving investments each year, broken down by country. Each bar represents a year, and the different colored segments within each bar denote the number of companies from various countries that received investments in that particular year. The legend on the right-hand side identifies the countries corresponding to each color. This visualization helps track investment trends over time and highlights which countries have seen increasing or decreasing investment activity in the EdTech sector.

Amount of investments

This stacked bar chart illustrates the total amount of investment received by companies each year, broken down by country. Each bar represents a year, and the different colored segments within each bar denote the total investment received by companies from various countries in that particular year. The legend on the right-hand side identifies the countries corresponding to each color. This visualization helps track investment trends over time and highlights which countries have received the most significant financial investments in the EdTech sector.


📄 By Vertical

Number of startups

This stacked bar chart displays the number of companies by year, categorized by verticals. Each bar represents a year, and the different colored segments within each bar denote the number of companies within specific verticals for that particular year. The legend on the right-hand side identifies the verticals corresponding to each color. This visualization helps track trends over time and highlights the growth or decline of companies within each vertical in the EdTech sector.

Note: The verticals were defined and selected by the EduNext team, and the detailed definitions of these verticals are provided earlier in the report.

Amount of investment

This stacked bar chart displays the total amount of investment by year, categorized by verticals. Each bar represents a year, and the different colored segments within each bar denote the total investment received by companies within specific verticals for that particular year. The legend on the right-hand side identifies the verticals corresponding to each color. This visualization helps track investment trends over time and highlights the growth or decline of investment within each vertical in the EdTech sector.

Note: The verticals were defined and selected by the EduNext team, and the detailed definitions of these verticals are provided earlier in the report.


📄 By B2B or B2C

Number of startups

This stacked bar chart illustrates the number of companies receiving investments each year, categorized by their business model: B2B (Business to Business) and B2C (Business to Consumer). Each bar represents a year, and the different colored segments within each bar denote the number of companies in each category for that particular year. The legend on the right-hand side identifies the categories corresponding to each color. This visualization helps track investment activity trends over time, highlighting the dynamics between B2B and B2C companies in the EdTech sector.

Amount of investment

This stacked bar chart illustrates the total amount of investment by year, categorized by business model: B2B (Business to Business) and B2C (Business to Consumer). Each bar represents a year, and the different colored segments within each bar denote the total investment received by companies in each category for that particular year. The legend on the right-hand side identifies the categories corresponding to each color. This visualization helps track investment trends over time, highlighting the dynamics between B2B and B2C companies in the EdTech sector.

📄 By Industry

Number of startups

This treemap chart displays the distribution of companies by industry, based on the information available on their LinkedIn profiles. Each rectangle represents an industry, with the size of the rectangle proportional to the number of companies in that industry. The color and size of the rectangles provide a quick visual reference for the prevalence of different industries among the analyzed companies. This visualization helps to identify which industries are most common in the EdTech sector and how companies are distributed across various fields.

Note: According to its LinkedIn profile, EduNext operates in the E-learning industry.

Amount of investment

This treemap chart displays the distribution of investment by industry, based on the information available on the companies’ LinkedIn profiles. Each rectangle represents an industry, with the size of the rectangle proportional to the total investment received by companies in that industry. The color and size of the rectangles provide a quick visual reference for the allocation of investments across different industries among the analyzed companies. This visualization helps to identify which industries attract the most investment in the EdTech sector.

Note: According to its LinkedIn profile, EduNext operates in the E-learning industry.

📄 By Role

Role

This donut chart visualizes the distribution of potential roles identified for companies, based on their relevance to EduNext. The chart segments represent the percentage of companies classified as partners, clients, competitors, and providers. The legend indicates the role corresponding to each color. This visualization helps to understand the predominant roles among the analyzed companies.

Note: The EduNext team chose the roles and definitions for each role.

Potential Clients

This table lists the companies identified as potential clients for EduNext. The columns include company name, size, country, type, number of employees, industry, business model (B2B or B2C), potential vertical, and the amount of the last funding round as reported by Crunchbase. This table provides detailed information on potential clients, aiding strategic decision-making.

Note: The EduNext team chose the roles and definitions for each role.

Potential partners

This table lists the companies identified as potential partners for EduNext. The columns include company name, size, country, type, number of employees, industry, business model (B2B or B2C), potential vertical, and the amount of the last funding round as reported by Crunchbase. This table provides detailed information on potential partners, aiding strategic collaborations.

Note: The EduNext team chose the roles and definitions for each role.

Potential competitors

This table lists the companies identified as potential competitors for EduNext. The columns include company name, size, country, type, number of employees, industry, business model (B2B or B2C), potential vertical, and the amount of the last funding round as reported by Crunchbase. This table provides detailed information on potential competitors, aiding competitive analysis.

Note: The EduNext team chose the roles and definitions for each role.

Potential providers

This table lists the companies identified as potential providers for EduNext. The columns include company name, size, country, type, number of employees, industry, business model (B2B or B2C), potential vertical, and the amount of the last funding round as reported by Crunchbase. This table provides detailed information on potential providers, aiding strategic sourcing decisions.

Ranking of startups

This horizontal bar chart ranks companies by the amount of their most recent investment, as reported by Crunchbase. The length of each bar represents the size of the investment, and the company names are listed along the vertical axis. This visualization helps to quickly identify the most heavily funded companies within the EdTech sector.

Note: The EduNext team chose the roles and definitions for each role.


📄 By Capabilities

Number of companies

This table lists the capabilities identified in the analyzed companies and shows how many companies possess each one. Each row is a capability. The columns “Capability 1” through “Capability 5” give the number of companies for which that capability appears in the corresponding position, and the “Total” column sums these counts across all five positions. The table gives a detailed overview of how specific capabilities are distributed across companies, which helps in understanding the strengths and focus areas within the EdTech sector.

Note: The capabilities were taken from the HolonIQ source and are explained in more detail earlier in the report.

Number of Appearances
Capability Capability 1 Capability 2 Capability 3 Capability 4 Capability 5 Total
PERSONALIZED & ADAPTIVE LEARNING 375 178 88 46 14 701
IMMERSION, SIMULATION & LAB 5 198 225 149 67 644
VOLUNTEERING & STUDENT LEADERSHIP 1 2 205 223 147 578
DIGITAL DESIGN PRINCIPLES 8 37 17 209 213 484
ACCREDITATION 4 13 7 3 214 241
DIGITAL CONTENT CREATION 49 27 10 0 2 88
CUSTOMIZED PROGRAMS (B2B) 11 22 17 5 0 55
ASSESSMENT FEEDBACK 6 37 7 3 0 53
JOB APPLICATION SUPPORT 10 24 11 4 0 49
BADGING & CREDENTIALING 3 7 14 12 8 44
DESIGNING ASSESSMENT 17 15 7 2 0 41
PROFESSIONAL & INDUSTRY ASSOCIATIONS 3 13 8 4 8 36
FACULTY MANAGEMENT & SUPPORT 0 0 8 14 12 34
REPORTING & REGULATORY COMPLIANCE 3 17 8 4 0 32
JOB FINDING & GRADUATE PLACEMENT 11 15 3 1 1 31
STUDENT COMMUNITIES, CLUBS & SOCIETIES 8 8 13 1 0 30
TESTS & EXAMS 2 2 2 7 14 27
TUITION FINANCING 26 0 1 0 0 27
CAREER PLANNING SERVICES 9 9 4 2 2 26
WELLBEING & MENTAL HEALTH 21 5 0 0 0 26
COURSE SELECTION & GUIDANCE 7 8 3 4 0 22
APPLICATION & ADMISSIONS 17 1 0 1 0 19
STUDENT PORTAL & LMS 4 6 3 3 0 16
UNDERSTAND CUSTOMER NEEDS 3 6 6 0 0 15
WORKPLACE SIMULATION & PROJECTS 10 4 1 0 0 15
EXPERIENTIAL LEARNING APPROACHES 5 4 2 1 0 12
STUDENT RELATIONSHIP MANAGEMENT (CRM) 5 4 1 1 0 11
RETENTION & LEARNING SUPPORT 3 4 3 0 0 10
COMPETITORS & ALTERNATES 1 3 3 2 0 9
PEER & GROUP ASSESSMENT 3 5 1 0 0 9
PROGRAM ARCHITECTURE 2 1 6 0 0 9
ASYNCH. LEARNING EXPERIENCES 1 3 2 0 1 7
EDUCATION AS EMPLOYEE BENEFIT 6 1 0 0 0 7
ENTREPRENEURSHIP & STARTUPS 6 1 0 0 0 7
INDUSTRY MENTORING & NETWORKS 6 1 0 0 0 7
LEARNING DELIVERY MODELS 3 4 0 0 0 7
ONBOARDING & ORIENTATION 7 0 0 0 0 7
VOICE, CHAT & INTERACTIVE LEARNING 6 1 0 0 0 7
COMPETENCIES & SKILLS EVALUATION 6 0 0 0 0 6
EMPLOYABILITY SKILLS BUILDING 3 0 3 0 0 6
INDUSTRY COLLABS & PARTNERSHIPS 2 3 0 0 0 5
LEARNER NEEDS & ANALYTICS 0 4 1 0 0 5
TIMETABLING & SCHEDULE MANAGEMENT 3 1 1 0 0 5
B2B RECRUITMENT & PARTNERSHIPS 4 0 0 0 0 4
SCHOOLS & COMMUNITY OUTREACH 2 1 1 0 0 4
SOURCING & MANAGING EXPERTISE 2 1 1 0 0 4
STUDENT VOICE & SURVEYS 0 1 3 0 0 4
DESIGNING GROUP WORK 0 1 1 1 0 3
GRADUATION & SUCCESS 1 0 1 1 0 3
INTERNSHIPS & PLACEMENTS 1 1 1 0 0 3
OER & CONTENT LICENSING 3 0 0 0 0 3
SYNCHRONOUS LEARNING EXPERIENCES 2 1 0 0 0 3
EXCHANGE PROGRAMS 0 0 2 0 0 2
LEARNING ENVIRONMENTS & PLATFORMS 2 0 0 0 0 2
LIBRARY SERVICES 1 1 0 0 0 2
RECRUITMENT EVENTS 1 1 0 0 0 2
SCHOLARSHIP PROGRAM 1 0 1 0 0 2
DESIGNING FOR DIGITAL LEARNING 0 1 0 0 0 1
FACULTY EXPERTISE & SPECIALISMS 0 0 1 0 0 1
MANAGING INTEGRATED CONTENT 1 0 0 0 0 1
MARKETING AUTOMATION 1 0 0 0 0 1
Score

Score by Capabilities

This table weights each capability by the position in which it appears, so that the most important capabilities can be measured and ranked. The weights are assigned as follows:

  • Capability 1 is multiplied by 5
  • Capability 2 is multiplied by 4
  • Capability 3 is multiplied by 3
  • Capability 4 is multiplied by 2
  • Capability 5 is multiplied by 1

The total score for each capability is then calculated by summing these weighted values, so that capabilities listed in earlier positions carry more weight in the overall ranking. The table shows the number of appearances in each position and, in the final column, the resulting weighted score for each capability.
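
As a minimal sketch of this calculation in Python, using three rows from the table of appearances (the names `WEIGHTS` and `capability_score` are illustrative assumptions, not the project's actual code):

```python
# Weights for positions Capability 1 .. Capability 5, as listed above.
WEIGHTS = (5, 4, 3, 2, 1)


def capability_score(position_counts):
    """Weighted sum of the counts for positions 1 through 5."""
    if len(position_counts) != len(WEIGHTS):
        raise ValueError("expected one count per position (5 values)")
    return sum(w * c for w, c in zip(WEIGHTS, position_counts))


# Counts copied from the "Number of Appearances" table.
appearances = {
    "PERSONALIZED & ADAPTIVE LEARNING": (375, 178, 88, 46, 14),
    "ACCREDITATION": (4, 13, 7, 3, 214),
    "DIGITAL CONTENT CREATION": (49, 27, 10, 0, 2),
}

scores = {name: capability_score(c) for name, c in appearances.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

This shows why the two tables order some capabilities differently. ACCREDITATION has 241 appearances against 88 for DIGITAL CONTENT CREATION, but most of them are in position 5, so its score (313) is lower than that of DIGITAL CONTENT CREATION (385).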

Note: The capabilities were taken from the HolonIQ source and are explained in more detail earlier in the report.

Score
Capability Capability 1 Capability 2 Capability 3 Capability 4 Capability 5 Score
PERSONALIZED & ADAPTIVE LEARNING 375 178 88 46 14 2957
IMMERSION, SIMULATION & LAB 5 198 225 149 67 1857
VOLUNTEERING & STUDENT LEADERSHIP 1 2 205 223 147 1221
DIGITAL DESIGN PRINCIPLES 8 37 17 209 213 870
DIGITAL CONTENT CREATION 49 27 10 0 2 385
ACCREDITATION 4 13 7 3 214 313
ASSESSMENT FEEDBACK 6 37 7 3 0 205
CUSTOMIZED PROGRAMS (B2B) 11 22 17 5 0 204
JOB APPLICATION SUPPORT 10 24 11 4 0 187
DESIGNING ASSESSMENT 17 15 7 2 0 170
TUITION FINANCING 26 0 1 0 0 133
JOB FINDING & GRADUATE PLACEMENT 11 15 3 1 1 127
WELLBEING & MENTAL HEALTH 21 5 0 0 0 125
BADGING & CREDENTIALING 3 7 14 12 8 117
REPORTING & REGULATORY COMPLIANCE 3 17 8 4 0 115
STUDENT COMMUNITIES, CLUBS & SOCIETIES 8 8 13 1 0 113
PROFESSIONAL & INDUSTRY ASSOCIATIONS 3 13 8 4 8 107
CAREER PLANNING SERVICES 9 9 4 2 2 99
APPLICATION & ADMISSIONS 17 1 0 1 0 91
COURSE SELECTION & GUIDANCE 7 8 3 4 0 84
WORKPLACE SIMULATION & PROJECTS 10 4 1 0 0 69
FACULTY MANAGEMENT & SUPPORT 0 0 8 14 12 64
STUDENT PORTAL & LMS 4 6 3 3 0 59
UNDERSTAND CUSTOMER NEEDS 3 6 6 0 0 57
TESTS & EXAMS 2 2 2 7 14 52
EXPERIENTIAL LEARNING APPROACHES 5 4 2 1 0 49
STUDENT RELATIONSHIP MANAGEMENT (CRM) 5 4 1 1 0 46
RETENTION & LEARNING SUPPORT 3 4 3 0 0 40
PEER & GROUP ASSESSMENT 3 5 1 0 0 38
ONBOARDING & ORIENTATION 7 0 0 0 0 35
EDUCATION AS EMPLOYEE BENEFIT 6 1 0 0 0 34
ENTREPRENEURSHIP & STARTUPS 6 1 0 0 0 34
INDUSTRY MENTORING & NETWORKS 6 1 0 0 0 34
VOICE, CHAT & INTERACTIVE LEARNING 6 1 0 0 0 34
PROGRAM ARCHITECTURE 2 1 6 0 0 32
LEARNING DELIVERY MODELS 3 4 0 0 0 31
COMPETENCIES & SKILLS EVALUATION 6 0 0 0 0 30
COMPETITORS & ALTERNATES 1 3 3 2 0 30
ASYNCH. LEARNING EXPERIENCES 1 3 2 0 1 24
EMPLOYABILITY SKILLS BUILDING 3 0 3 0 0 24
INDUSTRY COLLABS & PARTNERSHIPS 2 3 0 0 0 22
TIMETABLING & SCHEDULE MANAGEMENT 3 1 1 0 0 22
B2B RECRUITMENT & PARTNERSHIPS 4 0 0 0 0 20
LEARNER NEEDS & ANALYTICS 0 4 1 0 0 19
SCHOOLS & COMMUNITY OUTREACH 2 1 1 0 0 17
SOURCING & MANAGING EXPERTISE 2 1 1 0 0 17
OER & CONTENT LICENSING 3 0 0 0 0 15
SYNCHRONOUS LEARNING EXPERIENCES 2 1 0 0 0 14
STUDENT VOICE & SURVEYS 0 1 3 0 0 13
INTERNSHIPS & PLACEMENTS 1 1 1 0 0 12
GRADUATION & SUCCESS 1 0 1 1 0 10
LEARNING ENVIRONMENTS & PLATFORMS 2 0 0 0 0 10
DESIGNING GROUP WORK 0 1 1 1 0 9
LIBRARY SERVICES 1 1 0 0 0 9
RECRUITMENT EVENTS 1 1 0 0 0 9
SCHOLARSHIP PROGRAM 1 0 1 0 0 8
EXCHANGE PROGRAMS 0 0 2 0 0 6
MANAGING INTEGRATED CONTENT 1 0 0 0 0 5
MARKETING AUTOMATION 1 0 0 0 0 5
DESIGNING FOR DIGITAL LEARNING 0 1 0 0 0 4
FACULTY EXPERTISE & SPECIALISMS 0 0 1 0 0 3

| Opportunities for Improvement

The analysis of EdTech companies has revealed several areas for improvement:

  • Data Refinement: Enhancing the data regarding the relationship of these companies with the Open edX platform is crucial. Clear and accurate information in this aspect will lead to more precise insights.
  • Definition of eduNEXT: The definition and concept of “eduNEXT” need to be clearly established and incorporated into the model training. This will ensure a better understanding and identification of the company’s role in the EdTech ecosystem.
  • Client vs. Partner Roles: Addressing the ambiguity between ‘client’ and ‘partner’ roles is essential. More advanced models such as GPT-4o could provide deeper contextual analysis and better differentiation.
  • Role Classification: When asking the ChatGPT API for roles, including a “none of the above” category would have been beneficial. This would prevent the forced assignment of a nonexistent relationship.
  • Investment Data Source: The investment information used in the analysis was based solely on data retrieved from Crunchbase via LinkedIn. Future analyses would be more comprehensive if they drew on additional sources of investment data, especially for companies in Asia, where LinkedIn usage is less prevalent.
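
The “none of the above” suggestion could look roughly like the following Python sketch. The prompt wording, the role labels, and the function names `build_role_prompt` and `parse_role` are hypothetical and are not the prompts used in this project; the sketch only builds the prompt text and validates a reply, without calling any API.

```python
# Hypothetical role labels; "none" is the added "none of the above" option.
ROLES = ("client", "partner", "competitor", "provider", "none")


def build_role_prompt(company_profile: str) -> str:
    """Build a classification prompt that allows an explicit 'none' answer."""
    options = ", ".join(ROLES)
    return (
        "Classify the company's most likely relationship to eduNEXT.\n"
        f"Answer with exactly one word from: {options}.\n"
        "Answer 'none' if no relationship is evident from the profile.\n\n"
        f"Company profile:\n{company_profile}"
    )


def parse_role(model_reply: str) -> str:
    """Normalize a reply; anything outside ROLES is marked 'unrecognized'."""
    answer = model_reply.strip().lower().rstrip(".")
    return answer if answer in ROLES else "unrecognized"
```

Keeping “unrecognized” separate from “none” makes it possible to tell a deliberate “no relationship” answer apart from a reply that simply did not follow the requested format.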

The project shows significant potential for further exploration and development. As it is still in progress, any feedback and recommendations are highly appreciated to refine and improve the results.



