๐—”๐—œ ๐˜‹๐˜ฆ๐˜ต๐˜ฆ๐˜ณ๐˜ฎ๐˜ช๐˜ฏ๐˜ช๐˜ด๐˜ต๐˜ช๐˜ค vs ๐˜—๐˜ณ๐˜ฐ๐˜ฃ๐˜ข๐˜ฃ๐˜ช๐˜ญ๐˜ช๐˜ด๐˜ต๐˜ช๐˜ค

๐— ๐—ผ๐˜€๐˜ ๐˜๐—ฒ๐—ฎ๐—บ๐˜€ ๐—ณ๐—ผ๐—ฐ๐˜‚๐˜€ ๐—ผ๐—ป “๐˜‚๐˜€๐—ถ๐—ป๐—ด ๐—”๐—œ”

However, my interest lies in whether ๐—”๐—œ ๐—ถ๐˜€ ๐˜๐—ฟ๐˜‚๐—น๐˜† ๐—ฑ๐—ฒ๐—น๐—ถ๐˜ƒ๐—ฒ๐—ฟ๐—ถ๐—ป๐—ด ๐—ฏ๐˜‚๐˜€๐—ถ๐—ป๐—ฒ๐˜€๐˜€ ๐—ผ๐˜‚๐˜๐—ฐ๐—ผ๐—บ๐—ฒ๐˜€.

Utilising AI tools is straightforward, but delivering real value with AI is where it becomes intriguing.


๐—Ÿ๐—ฒ๐—ฎ๐—ฟ๐—ป๐—ถ๐—ป๐—ด & ๐—ฃ๐—ฟ๐—ฎ๐—ฐ๐˜๐—ถ๐—ฐ๐—ฒ #๐Ÿญ

When engineers adopt tools like GitHub Copilot or Cursor, they primarily change how they write code. Yet, when an organisation embeds AI into its core business architecture, a deeper transformation occurs:

๐Ÿ‘‰ Traditional engineering practices begin to break down.

This is where an extended skillset becomes essentialโ€”regardless of the terminology we use.


๐—Ÿ๐—ฒ๐—ฎ๐—ฟ๐—ป๐—ถ๐—ป๐—ด & ๐—ฃ๐—ฟ๐—ฎ๐—ฐ๐˜๐—ถ๐—ฐ๐—ฒ #๐Ÿฎ

The real shift is from:
๐˜‹๐˜ฆ๐˜ต๐˜ฆ๐˜ณ๐˜ฎ๐˜ช๐˜ฏ๐˜ช๐˜ด๐˜ต๐˜ช๐˜ค ๐˜ด๐˜บ๐˜ด๐˜ต๐˜ฆ๐˜ฎ๐˜ด โ†’ ๐˜ช๐˜ง ๐˜Ÿ, ๐˜ต๐˜ฉ๐˜ฆ๐˜ฏ ๐˜ 
to
๐˜—๐˜ณ๐˜ฐ๐˜ฃ๐˜ข๐˜ฃ๐˜ช๐˜ญ๐˜ช๐˜ด๐˜ต๐˜ช๐˜ค ๐˜ด๐˜บ๐˜ด๐˜ต๐˜ฆ๐˜ฎ๐˜ด โ†’ ๐˜ช๐˜ง ๐˜Ÿ, ๐˜ต๐˜ฉ๐˜ฆ๐˜ฏ ๐˜ฑ๐˜ณ๐˜ฐ๐˜ฃ๐˜ข๐˜ฃ๐˜ญ๐˜บ ๐˜ .

In discussions with senior tech leaders, one thing is evident:
Managing that uncertainty at scale is not merely a coding problem.

It evolves into:

  • A systems design challenge
  • A testing and validation challenge
  • A cultural shift in our perception of quality.

๐—˜๐˜…๐—ฎ๐—บ๐—ฝ๐—น๐—ฒ: ๐—ง๐—ฒ๐˜€๐˜๐—ถ๐—ป๐—ด ๐—ฃ๐—ฟ๐—ผ๐—ฏ๐—ฎ๐—ฏ๐—ถ๐—น๐—ถ๐˜€๐˜๐—ถ๐—ฐ ๐—ฆ๐˜†๐˜€๐˜๐—ฒ๐—บ๐˜€

One area of exploration is how testing must evolve.
We transition from binary correctness to evaluation-based systems (โ€œevalsโ€):

  • Optimising for statistically significant improvement over time
  • Accepting variability while maintaining quality control.

In practice:

  • Using LLM-as-a-Judge patterns to assess outputs (factuality, helpfulness, tone)
  • Running multiple models in parallel to validate outputs
  • Maintaining golden datasets and conducting regression-style evals on every change.

Where I currently stand

Across most areas of the AI SDLC, a hybrid model appears most effective:

  • Deterministic where precision and control are vital
  • Probabilistic where flexibility and adaptability add value.

The challenge lies not in choosing one approach but in designing systems that effectively balance both.
This remains a working hypothesis that I am actively refining. However, the direction is becoming clearer:

๐Ÿ‘‰ For AI-native delivery (beyond just AI-assisted coding), we must rethink how we design systems, define quality, and evolve engineering practices.

Discussion

Iโ€™m curious about how others are navigating this:
๐˜ˆ๐˜ณ๐˜ฆ ๐˜บ๐˜ฐ๐˜ถ ๐˜ญ๐˜ฆ๐˜ข๐˜ฏ๐˜ช๐˜ฏ๐˜จ ๐˜ต๐˜ฐ๐˜ธ๐˜ข๐˜ณ๐˜ฅ๐˜ด ๐˜ฅ๐˜ฆ๐˜ต๐˜ฆ๐˜ณ๐˜ฎ๐˜ช๐˜ฏ๐˜ช๐˜ด๐˜ต๐˜ช๐˜ค, ๐˜ฑ๐˜ณ๐˜ฐ๐˜ฃ๐˜ข๐˜ฃ๐˜ช๐˜ญ๐˜ช๐˜ด๐˜ต๐˜ช๐˜ค, ๐˜ฐ๐˜ณ ๐˜ฅ๐˜ฆ๐˜ด๐˜ช๐˜จ๐˜ฏ๐˜ช๐˜ฏ๐˜จ ๐˜ง๐˜ฐ๐˜ณ ๐˜ข ๐˜ฉ๐˜บ๐˜ฃ๐˜ณ๐˜ช๐˜ฅ ๐˜ฎ๐˜ฐ๐˜ฅ๐˜ฆ๐˜ญ?
๐Ÿ’ฌ๐˜‹๐˜” ๐˜ฎ๐˜ฆโ€”๐˜’๐˜ฎ ๐˜ฆ๐˜ข๐˜จ๐˜ฆ๐˜ณ ๐˜ต๐˜ฐ ๐˜ค๐˜ฐ๐˜ฎ๐˜ฑ๐˜ข๐˜ณ๐˜ฆ ๐˜ฏ๐˜ฐ๐˜ต๐˜ฆ๐˜ด ๐˜ธ๐˜ช๐˜ต๐˜ฉ ๐˜ฐ๐˜ต๐˜ฉ๐˜ฆ๐˜ณ๐˜ด ๐˜ต๐˜ข๐˜ค๐˜ฌ๐˜ญ๐˜ช๐˜ฏ๐˜จ ๐˜ต๐˜ฉ๐˜ช๐˜ด ๐˜ด๐˜ฉ๐˜ช๐˜ง๐˜ต.

Leave a comment

Create a website or blog at WordPress.com

Up ↑