Building a cloud database for electric-vehicle powertrain analysis

The difficult part of this project was not choosing a cloud service. It was translating an automotive-market question into a data model and then making every transformation between source files and analysis outputs explicit.

The useful result is not simply “a database in the cloud”. It is a repeatable chain of data contracts, transformations and analysis layers that keeps business expertise connected to the engineering workflow.

01The market question was wider than vehicle sales

The starting point was a third-party vehicle database containing global sales volumes. That source answered an important question—how many vehicles were sold—but it did not describe the powertrain components inside each model, the suppliers behind those components, their indicative prices or the share of the market that might be addressable to competing suppliers.

The project therefore had to combine commercial, technical and organisational views of the same market. Total and addressable volumes needed to be analysed by electrification type and region. Component turnover had to be connected to product type, vehicle architecture and supplier position. None of those questions can be answered reliably if vehicle, component, supplier and price data remain separate spreadsheets.

02Model the powertrain before choosing the platform

Before the cloud architecture, the project defined a four-level hierarchy for powertrain components. Level 1 identifies the voltage type. Level 2 identifies the product type. Level 3 refines the product subtype. Level 4 records the product rank or position in the powertrain system—a distinction that can affect pricing.

That hierarchy is the conceptual backbone of the database. It gives each component a place in a shared model, so later joins do not depend only on column names or ad-hoc labels. In data-engineering terms, the project first established the domain model and only then selected the services that would implement it.

Level 1Voltage type

Low- and high-voltage families.

Level 2Product type

The main component family.

Level 3Product subtype

A more precise technical classification.

Level 4Product rank

Its position in the powertrain system and pricing logic.

03Keep domain expertise close to the data

Four business-maintained elements capture knowledge that a generic vehicle-sales source does not contain. The Value List describes possible vehicle architectures and the powertrain components associated with each architecture. The Power Schedule contains formulas for calculating component power. Supplier Dictionaries map suppliers and addressable competitors to models and components. The Price List connects component prices to power and voltage.

The design keeps these inputs in Google Sheets so marketing specialists can maintain them in a familiar interface. Updated files are copied to Cloud Storage through an API, while the third-party vehicle database is uploaded separately. This arrangement gives business experts ownership of domain inputs without asking them to edit notebook code or BigQuery tables directly.

VLValue List

Vehicle architectures and the powertrain components expected in each one.

PSPower Schedule

Formulas used to calculate component power for price and sizing logic.

SDSupplier Dictionary

Supplier mappings and addressable competitors by model and component.

PLPrice List

Prices connected to component power and system voltage.

Three data contracts behind the workflow

Editable by domain expertsGoogle Sheets keeps business-maintained inputs accessible.

Controlled transferCloud Storage separates working inputs from analytical tables.

Reproducible transformationNotebook order and code make each processing step inspectable.

04Make the transformation chain explicit

The repository exposes one master notebook and nine processing notebooks. The master installs the required packages, authenticates with Google Cloud, retrieves notebook inputs from Cloud Storage and uses Papermill to execute the remaining notebooks in sequence. That sequence makes dependencies visible instead of hiding them inside one monolithic script.

The processing chain cleans incoming files; merges the vehicle database with the Value List; adds the Power Schedule; applies separate supplier-mapping logic for electric motors and on-board chargers; transforms the merged structure; reshapes low- and high-voltage price data; and finally prepares market-share outputs. Intermediate artefacts are written back to cloud storage or queried through BigQuery as the workflow advances.

Master orchestration

00 Master.ipynb → 01 Cleaning.ipynb → 02 PIHS VL.ipynb → 03 PIHS VL PS.ipynb → 04.1 / 04.2 supplier mapping → 05 Transformation.ipynb → 06 / 07 prices → 08 MARKETSHARE.ipynb

The filenames and execution order are preserved from the supplied repository.

01CleaningStandardise uploaded source files.

02PIHS + VLAdd component architectures to vehicle models.

03+ Power ScheduleCalculate component power.

04.1+ EMOT suppliersApply electric-motor mapping rules.

04.2+ OBC suppliersApply on-board-charger mapping rules.

05TransformationReorganise components by year and remove redundant fields.

06LV pricesReshape low-voltage price data.

07HV pricesReshape high-voltage price data.

08Market sharePrepare the final market-analysis layer.

05Use BigQuery as the analysis layer

After the notebook transformations, prepared CSV outputs are loaded into BigQuery. SQL queries add price information and supplier addressability to the transformed vehicle-component structure. The resulting database can then be connected to Google Sheets and Looker Studio for analysis and reporting.

This separation matters. Google Sheets remains the editable business-input layer. Cloud Storage holds controlled files. Colab Enterprise notebooks implement transformation logic. BigQuery becomes the structured analytical layer. Looker Studio provides a consumption interface. Each service has a distinct responsibility rather than becoming another place where the same logic is duplicated.

Google SheetsBusiness-maintained value lists, schedules, suppliers and prices.

Cloud StorageInput, intermediate and output files.

Colab EnterprisePython and pandas transformation notebooks.

BigQueryPricing, addressability and analytical tables.

Looker StudioTeam-facing exploration and reporting.

Questions made possible by the combined layer

Volumes and turnover

Compare total and addressable component volumes and turnover.

Regional distribution

Explore how component demand and supplier positions vary by region.

Supplier market share

Analyse product and supplier distributions across vehicle architectures.

06What the public repository demonstrates—and what it does not

The public repository is strong evidence of the engineering workflow: notebook orchestration, Google Cloud authentication, storage access, pandas transformations and BigQuery queries are all inspectable. The presentation also records the data model, planned automation and business-analysis objectives.

It does not publish the commercial source datasets, supplier mappings, prices or operational dashboards. Nor does the public record independently establish production-service levels or business outcomes. This article therefore documents the architecture and implementation approach without turning a student project into an unsupported claim of live industrial deployment.

What is evidenced

The repository contains the notebook workflow and cloud integrations.
The presentation documents the data hierarchy and business inputs.
The code shows sequential cleaning, merging, transformation and BigQuery operations.

What remains outside the public record

The commercial input datasets are not published.
Business results and dashboards cannot be independently reproduced from the repository alone.
The public material does not establish production-service or security certification.

07Data-engineering lessons from the project

The project illustrates a recurring principle: cloud architecture is most useful when it clarifies responsibility. The business model belongs in explicit data structures. Domain experts need controlled ways to maintain inputs. Transformation stages need an observable order. Analytical outputs need a stable layer that can serve more than one interface.

It also shows why “automation” is not one feature. Copying updated files, selecting the latest source, executing notebooks, loading tables and refreshing analysis are separate transitions. Treating them separately makes failures easier to locate and the workflow easier to evolve.

01Model the domain first

A shared component hierarchy prevents cloud services from becoming a substitute for data modelling.

02Separate editable inputs from governed outputs

Business teams can maintain knowledge without directly modifying analytical tables.

03Orchestrate visible stages

One master notebook makes processing order and dependencies explicit.

04State the evidence boundary

Open code can demonstrate architecture without exposing proprietary data or overstating deployment status.

08Project record

The repository contains the master notebook, nine processing notebooks, the project README and the high-level presentation. Together they provide a compact public record of the project’s architecture and transformation sequence.

Nicolas Len is credited as the project author. His public profiles and repository are linked below for readers who want to inspect the notebooks or follow his subsequent work.

Project resources

Nicolas Len

Nicolas Len developed this project while studying Data Engineering at DSTI. It combines automotive-market modelling, cloud data architecture, Python notebooks, BigQuery and business-facing analytics.

GitHub LinkedIn Project

Source and editorial note: Article developed from the supplied project presentation, README and repository snapshot. Product names, notebook filenames and technical identifiers are preserved. The article describes the public architecture and implementation evidence without disclosing or inferring proprietary source data.