Welcome to ElectraGrid, a national power company responsible for supplying electricity to thousands of homes and businesses spread across a huge region. You have just joined the data science team, and today you’ve been assigned your first technical challenge. ElectraGrid tracks the locations of all its customers, and it also monitors the position of each power hub. Your role is to help the engineering department understand how far customers are from the hub, whether any of them fall outside the guaranteed service area, and how the overall spread of customers looks from a distance.
The only rule: you must use NumPy and avoid writing loops wherever possible.
This is how real data science teams work—fast, efficient, and vectorised.
Begin by importing NumPy and setting the location of the company’s main power hub:
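A minimal sketch of this setup follows. It assumes the hub sits at the origin (0, 0), the position the brief uses throughout. The variable name hub is an illustrative choice.

```python
import numpy as np

# Main power hub at the origin; a float array broadcasts cleanly against customer coordinates
hub = np.array([0.0, 0.0])
```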
To simulate the thousands of customers in the region, generate one thousand random 2D coordinates:
The expression customers = np.random.randn(1000, 2) * 20
creates 1,000 customer locations in two steps. First,
np.random.randn(1000, 2) draws a 1000 × 2 array from the standard
normal distribution, so each coordinate is centred on zero with a
standard deviation of 1. Multiplying by 20 then raises that standard
deviation to 20, spreading the points across a much larger, more
realistic area. Without this scaling, almost all customers would
cluster tightly around the hub at (0, 0), with about 95% of each
coordinate falling within ±2 units, making the distances artificially
small and the analysis uninteresting. After scaling, about 95% of each
coordinate falls within ±40 units, so the customers occupy a wider
region. This produces meaningful differences in distance, clearer
patterns, and a more believable scenario for analysing coverage and
service range.
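The sketch below generates the customers and checks the scaling claim empirically. The fixed seed is an assumption added only so results are reproducible; the brief does not ask for one.

```python
import numpy as np

np.random.seed(42)  # assumption: fixed seed for reproducibility, not part of the brief

# 1,000 points from a standard normal, scaled so each coordinate has standard deviation 20
customers = np.random.randn(1000, 2) * 20

print(customers.shape)                      # (1000, 2)
print(customers.std(axis=0))                # each value should be close to 20
print(np.mean(np.abs(customers) <= 40))     # share of coordinates within ±40, roughly 0.95
```

Newer NumPy code often uses np.random.default_rng(42).standard_normal((1000, 2)) instead of the legacy np.random functions; either works for this exercise.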
You now have everything you need to begin the analysis.
Your first responsibility is to understand the geography. ElectraGrid needs to know how far each customer lies from the main hub at (0, 0). Using NumPy, compute the Euclidean distance from every customer to the hub.
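One loop-free way to do this, sketched under the same setup (hub at the origin, an assumed seed for reproducibility), relies on broadcasting and np.linalg.norm along the coordinate axis:

```python
import numpy as np

np.random.seed(42)  # assumed seed, for reproducibility only
hub = np.array([0.0, 0.0])
customers = np.random.randn(1000, 2) * 20

# Broadcasting subtracts the hub from every row at once: (1000, 2) - (2,) -> (1000, 2)
offsets = customers - hub
# Euclidean norm of each row gives one distance per customer, shape (1000,)
distances = np.linalg.norm(offsets, axis=1)

# Equivalent explicit form: square, sum across x and y, then take the square root
distances_manual = np.sqrt(np.sum(offsets ** 2, axis=1))
assert np.allclose(distances, distances_manual)
```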
Once you’ve done this, identify:
This helps the planning team understand how stretched the current grid might be.
ElectraGrid promises reliable service for anyone living within 50 units of the main hub. Your job now is to identify the customers who fall outside this guaranteed range. Use NumPy’s Boolean masking (no loops) to:
These are the customers most likely to experience issues—and who may need additional support.
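A possible sketch with Boolean masking follows. It assumes "within 50 units" includes a customer at exactly 50, so "outside" means strictly greater than 50. The names SERVICE_RADIUS and outside_mask are illustrative.

```python
import numpy as np

np.random.seed(42)  # assumed seed, for reproducibility only
hub = np.array([0.0, 0.0])
customers = np.random.randn(1000, 2) * 20
distances = np.linalg.norm(customers - hub, axis=1)

SERVICE_RADIUS = 50  # guaranteed range stated in the brief

# Boolean mask: True where a customer lies strictly beyond the guaranteed range
outside_mask = distances > SERVICE_RADIUS

outside_customers = customers[outside_mask]      # coordinates of at-risk customers
outside_indices = np.flatnonzero(outside_mask)   # their row positions in `customers`
n_outside = int(outside_mask.sum())              # each True counts as 1
pct_outside = outside_mask.mean() * 100          # share of all customers, in percent

print(n_outside, f"{pct_outside:.1f}%")
```

As a sanity check: with both coordinates drawn independently from a normal distribution with standard deviation 20, the distances follow a Rayleigh distribution, where P(distance > 50) = exp(-50² / (2 · 20²)) ≈ 4.4%, so a count of roughly 44 customers, give or take random variation, is plausible.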
Engineers are preparing routine maintenance near the main hub and
want to know which customers lie closest. Use np.argsort()
to sort customers by their distance from the hub, and then:
This gives the field team a quick snapshot of the immediate surrounding area.
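The sketch below sorts by distance with np.argsort and pulls out the nearest customers. The value k = 10 is a hypothetical choice, since the brief does not say how many customers the field team needs.

```python
import numpy as np

np.random.seed(42)  # assumed seed, for reproducibility only
hub = np.array([0.0, 0.0])
customers = np.random.randn(1000, 2) * 20
distances = np.linalg.norm(customers - hub, axis=1)

k = 10  # hypothetical: number of nearest customers to report

# Indices that would sort distances from nearest to farthest
order = np.argsort(distances)
nearest_idx = order[:k]                  # row positions of the k closest customers
nearest_coords = customers[nearest_idx]  # their (x, y) locations
nearest_dists = distances[nearest_idx]   # their distances, already in ascending order

print(nearest_idx)
print(np.round(nearest_dists, 2))
```

If only the k closest are needed and their order does not matter, np.argpartition(distances, k)[:k] avoids a full sort. Reversing the sorted indices, order[::-1][:k], gives the k farthest customers instead.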
If you have time, step back and consider the broader picture. Using the distance values you calculated earlier, work out some basic statistics:
If you want to go further, create a histogram to visualise the distribution. Finish with one sentence describing what this spread tells you about ElectraGrid’s customers.
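Here is one way to compute summary statistics and histogram counts with NumPy alone. The particular statistics (mean, median, standard deviation, min, max, 90th percentile) and the 10-unit bin width are assumptions, since the brief leaves the exact list open.

```python
import numpy as np

np.random.seed(42)  # assumed seed, for reproducibility only
hub = np.array([0.0, 0.0])
customers = np.random.randn(1000, 2) * 20
distances = np.linalg.norm(customers - hub, axis=1)

# Assumed choice of "basic statistics"
stats = {
    "mean": distances.mean(),
    "median": np.median(distances),
    "std": distances.std(),
    "min": distances.min(),
    "max": distances.max(),
    "p90": np.percentile(distances, 90),
}
print({name: round(float(value), 2) for name, value in stats.items()})

# 10-unit bins from 0 up to at least the largest distance (bin width is arbitrary)
bin_edges = np.arange(0, distances.max() + 10, 10)
counts, bin_edges = np.histogram(distances, bins=bin_edges)

# One row per bin: lower edge, upper edge, number of customers
print(np.column_stack((bin_edges[:-1], bin_edges[1:], counts)))
```

If matplotlib is available, plt.hist(distances, bins=bin_edges) draws the same bars as a chart.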
Sometimes ElectraGrid needs to understand not only distance but also direction, for example when planning expansions. Take customer 0 and customer 1, treat both as direction vectors starting from the hub, and compute the cosine similarity between them.
Write one sentence explaining what a high cosine similarity would mean here. (Hint: think about the paths from the hub to each customer.)
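A minimal sketch of the cosine similarity calculation follows, using cos θ = (a · b) / (|a| |b|). It assumes the same random setup as earlier, with an added seed. The vectorised extension at the end, comparing every customer's direction to customer 0, goes beyond the brief and is optional.

```python
import numpy as np

np.random.seed(42)  # assumed seed, for reproducibility only
hub = np.array([0.0, 0.0])
customers = np.random.randn(1000, 2) * 20

# Direction vectors from the hub to customers 0 and 1
a = customers[0] - hub
b = customers[1] - hub

# Ranges from -1 (opposite directions) through 0 (perpendicular) to 1 (same direction)
cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
# Clip guards arccos against tiny floating-point overshoot beyond [-1, 1]
angle_deg = np.degrees(np.arccos(np.clip(cos_sim, -1.0, 1.0)))
print(round(float(cos_sim), 3), round(float(angle_deg), 1))

# Optional vectorised extension: similarity of every customer's direction to customer 0
offsets = customers - hub
unit = offsets / np.linalg.norm(offsets, axis=1, keepdims=True)
sims_to_0 = unit @ unit[0]
```

The formula divides by each vector's length, so a customer located exactly at the hub would give a zero vector and an undefined similarity. With continuous random coordinates this practically never happens.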
If you want to visualise your work, create a scatter plot:
This offers a helpful “map” of how ElectraGrid’s customers are arranged.
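The following sketch uses matplotlib, which is an assumption since the brief only names NumPy. It plots the customers, marks the hub, and draws the 50-unit guaranteed service radius. The Agg backend lets it run without a display. In an interactive session, call plt.show(), or use fig.savefig(...) to write an image.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

np.random.seed(42)  # assumed seed, for reproducibility only
hub = np.array([0.0, 0.0])
customers = np.random.randn(1000, 2) * 20

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(customers[:, 0], customers[:, 1], s=5, alpha=0.5, label="Customers")
ax.scatter(hub[0], hub[1], color="red", marker="*", s=200, label="Main hub")
# Dashed circle marking the 50-unit guaranteed service radius around the hub
ax.add_patch(plt.Circle(hub, 50, fill=False, linestyle="--", label="50-unit service area"))
ax.set_aspect("equal")  # equal axis scaling so the circle is not distorted
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```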
ElectraGrid has plans to expand by adding a second hub at (20, 10). Your final challenge is to decide which hub each customer should belong to.
Calculate how far each customer is from both hubs and assign them to whichever one is closer. If you choose to plot this, colour customers by the hub they end up assigned to.
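One loop-free approach is sketched below. It stacks both hubs into a single array and broadcasts every customer against every hub at once. Storing the hubs as rows is an illustrative choice, and it extends to more hubs without changing the code.

```python
import numpy as np

np.random.seed(42)  # assumed seed, for reproducibility only
customers = np.random.randn(1000, 2) * 20

# Row 0 is the existing main hub, row 1 the proposed second hub at (20, 10)
hubs = np.array([[0.0, 0.0], [20.0, 10.0]])

# Broadcast (1000, 1, 2) - (1, 2, 2) -> (1000, 2, 2): every customer against every hub
diffs = customers[:, np.newaxis, :] - hubs[np.newaxis, :, :]
dist_to_hubs = np.linalg.norm(diffs, axis=2)  # shape (1000, 2)

# Index of the nearer hub per customer; an exact tie goes to hub 0 because argmin returns the first minimum
assignment = np.argmin(dist_to_hubs, axis=1)

counts = np.bincount(assignment, minlength=len(hubs))
print(counts)  # number of customers assigned to hub 0 and hub 1
```

To colour a plot by assignment, pass the label array to matplotlib, for example plt.scatter(customers[:, 0], customers[:, 1], c=assignment, cmap="coolwarm", s=5).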