The difficult part of this project was not choosing a cloud service. It was translating an automotive-market question into a data model and then making every transformation between source files and analysis outputs explicit.
01The market question was wider than vehicle sales
The starting point was a third-party vehicle database containing global sales volumes. That source answered an important question—how many vehicles were sold—but it did not describe the powertrain components inside each model, the suppliers behind those components, their indicative prices or the share of the market that might be addressable to competing suppliers.
The project therefore had to combine commercial, technical and organisational views of the same market. Total and addressable volumes needed to be analysed by electrification type and region. Component turnover had to be connected to product type, vehicle architecture and supplier position. None of those questions can be answered reliably if vehicle, component, supplier and price data remain separate spreadsheets.
02Model the powertrain before choosing the platform
Before the cloud architecture, the project defined a four-level hierarchy for powertrain components. Level 1 identifies the voltage type. Level 2 identifies the product type. Level 3 refines the product subtype. Level 4 records the product rank or position in the powertrain system—a distinction that can affect pricing.
That hierarchy is the conceptual backbone of the database. It gives each component a place in a shared model, so later joins do not depend only on column names or ad-hoc labels. In data-engineering terms, the project first established the domain model and only then selected the services that would implement it.
Low- and high-voltage families.
The main component family.
A more precise technical classification.
Its position in the powertrain system and pricing logic.
03Keep domain expertise close to the data
Four business-maintained elements capture knowledge that a generic vehicle-sales source does not contain. The Value List describes possible vehicle architectures and the powertrain components associated with each architecture. The Power Schedule contains formulas for calculating component power. Supplier Dictionaries map suppliers and addressable competitors to models and components. The Price List connects component prices to power and voltage.
The design keeps these inputs in Google Sheets so marketing specialists can maintain them in a familiar interface. Updated files are copied to Cloud Storage through an API, while the third-party vehicle database is uploaded separately. This arrangement gives business experts ownership of domain inputs without asking them to edit notebook code or BigQuery tables directly.
Vehicle architectures and the powertrain components expected in each one.
Formulas used to calculate component power for price and sizing logic.
Supplier mappings and addressable competitors by model and component.
Prices connected to component power and system voltage.
Three data contracts behind the workflow
04Make the transformation chain explicit
The repository exposes one master notebook and nine processing notebooks. The master installs the required packages, authenticates with Google Cloud, retrieves notebook inputs from Cloud Storage and uses Papermill to execute the remaining notebooks in sequence. That sequence makes dependencies visible instead of hiding them inside one monolithic script.
The processing chain cleans incoming files; merges the vehicle database with the Value List; adds the Power Schedule; applies separate supplier-mapping logic for electric motors and on-board chargers; transforms the merged structure; reshapes low- and high-voltage price data; and finally prepares market-share outputs. Intermediate artefacts are written back to cloud storage or queried through BigQuery as the workflow advances.
00 Master.ipynb → 01 Cleaning.ipynb → 02 PIHS VL.ipynb → 03 PIHS VL PS.ipynb → 04.1 / 04.2 supplier mapping → 05 Transformation.ipynb → 06 / 07 prices → 08 MARKETSHARE.ipynbThe filenames and execution order are preserved from the supplied repository.
05Use BigQuery as the analysis layer
After the notebook transformations, prepared CSV outputs are loaded into BigQuery. SQL queries add price information and supplier addressability to the transformed vehicle-component structure. The resulting database can then be connected to Google Sheets and Looker Studio for analysis and reporting.
This separation matters. Google Sheets remains the editable business-input layer. Cloud Storage holds controlled files. Colab Enterprise notebooks implement transformation logic. BigQuery becomes the structured analytical layer. Looker Studio provides a consumption interface. Each service has a distinct responsibility rather than becoming another place where the same logic is duplicated.
Questions made possible by the combined layer
Compare total and addressable component volumes and turnover.
Explore how component demand and supplier positions vary by region.
Analyse product and supplier distributions across vehicle architectures.
06What the public repository demonstrates—and what it does not
The public repository is strong evidence of the engineering workflow: notebook orchestration, Google Cloud authentication, storage access, pandas transformations and BigQuery queries are all inspectable. The presentation also records the data model, planned automation and business-analysis objectives.
It does not publish the commercial source datasets, supplier mappings, prices or operational dashboards. Nor does the public record independently establish production-service levels or business outcomes. This article therefore documents the architecture and implementation approach without turning a student project into an unsupported claim of live industrial deployment.
What is evidenced
- The repository contains the notebook workflow and cloud integrations.
- The presentation documents the data hierarchy and business inputs.
- The code shows sequential cleaning, merging, transformation and BigQuery operations.
What remains outside the public record
- The commercial input datasets are not published.
- Business results and dashboards cannot be independently reproduced from the repository alone.
- The public material does not establish production-service or security certification.
07Data-engineering lessons from the project
The project illustrates a recurring principle: cloud architecture is most useful when it clarifies responsibility. The business model belongs in explicit data structures. Domain experts need controlled ways to maintain inputs. Transformation stages need an observable order. Analytical outputs need a stable layer that can serve more than one interface.
It also shows why “automation” is not one feature. Copying updated files, selecting the latest source, executing notebooks, loading tables and refreshing analysis are separate transitions. Treating them separately makes failures easier to locate and the workflow easier to evolve.
A shared component hierarchy prevents cloud services from becoming a substitute for data modelling.
Business teams can maintain knowledge without directly modifying analytical tables.
One master notebook makes processing order and dependencies explicit.
Open code can demonstrate architecture without exposing proprietary data or overstating deployment status.
08Project record
The repository contains the master notebook, nine processing notebooks, the project README and the high-level presentation. Together they provide a compact public record of the project’s architecture and transformation sequence.
Nicolas Len is credited as the project author. His public profiles and repository are linked below for readers who want to inspect the notebooks or follow his subsequent work.
Project resources

Nicolas Len
Nicolas Len developed this project while studying Data Engineering at DSTI. It combines automotive-market modelling, cloud data architecture, Python notebooks, BigQuery and business-facing analytics.
Source and editorial note: Article developed from the supplied project presentation, README and repository snapshot. Product names, notebook filenames and technical identifiers are preserved. The article describes the public architecture and implementation evidence without disclosing or inferring proprietary source data.