
5. Solving the Three Problems with ChatGPT

5.1. Introduction

Here is a first screenshot of a ChatGPT session:

 
  • In [1-3], the three problems posed to ChatGPT;
  • In [4], the ChatGPT URL;
  • In [5], the version of ChatGPT used;

ChatGPT is a product of OpenAI available at the URL [https://chatgpt.com/]. To view a history of your question-and-answer sessions like the one above, you need to create an account. Furthermore, like all other AIs tested, ChatGPT limits the number of questions you can ask and the number of files you can upload. When this limit is reached, the session ends, and you are offered the option to continue later. The limits imposed by ChatGPT are reached very quickly. To create this tutorial, I had to purchase a one-month paid subscription.

The ChatGPT interface is as follows:

 
  • At [1], to attach files to the question asked;
  • At [2], the question asked;
  • At [3], to run the AI;

5.2. Problem 1

The question for ChatGPT:

  
 

ChatGPT responds correctly.

5.3. Problem 2

This involves calculating the tax using the PDF. To be fair, we’ll use the PDF generated by Gemini, which corrects the errors in the original PDF.

 
  • In [1], we provided the PDF generated by Gemini;
  • In [2], we added the unit test through which Gemini demonstrated its superiority:
test12: (2, 2, 49500) -> (1297, 431, 324)
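
To make test12 concrete, the arithmetic behind it can be sketched as follows. This is only a trace under assumptions: the constants and the discount/reduction rules are the ones stated in the [chatGPT1] script reproduced later in this section, and the variable names are illustrative, not taken from any generated script.

```python
from math import ceil, floor

# Constants assumed from the [chatGPT1] script (2018 income)
FIRST_BRACKET_END = 9964            # end of the 0% bracket
SECOND_BRACKET_RATE = 0.14
DISCOUNT_THRESHOLD_COUPLE = 1970
REDUCTION_BASE_2_SHARES = 37970
REDUCTION_PER_EXTRA_HALF_SHARE = 3803

adults, children, gross_income = 2, 2, 49500

allowance = ceil(gross_income / 10)          # 10% allowance, rounded up
net_income = gross_income - allowance
shares = 2.0 + 0.5 * children                # only valid for a couple with at most 2 children
quotient = net_income / shares               # falls inside the 14% bracket for this case
gross_tax = (quotient - FIRST_BRACKET_END) * SECOND_BRACKET_RATE * shares

# Eligibility for the 20% reduction depends on net income and shares
extra_half_shares = int((shares - 2.0) * 2)
reduction_threshold = REDUCTION_BASE_2_SHARES + extra_half_shares * REDUCTION_PER_EXTRA_HALF_SHARE
eligible_reduction = net_income < reduction_threshold

# Discount: direct if the tax is under the threshold, otherwise via the catch-up rule
direct = gross_tax <= DISCOUNT_THRESHOLD_COUPLE
catch_up = (not direct) and eligible_reduction and gross_tax * 0.80 <= DISCOUNT_THRESHOLD_COUPLE

discount = ceil(DISCOUNT_THRESHOLD_COUPLE - 0.75 * gross_tax) if (direct or catch_up) else 0
tax_after_discount = gross_tax - discount
reduction = ceil(0.20 * tax_after_discount) if eligible_reduction else 0
final_tax = floor(tax_after_discount - reduction)

# Each value is within 1 euro of the expected (1297, 431, 324)
print((final_tax, discount, reduction))
```

In this case the discount is only reached through the catch-up branch, since the gross tax is above the couple threshold of 1,970 euros.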

We run ChatGPT. It takes about 3 minutes to generate its response. Unlike Gemini, it does provide a working link to retrieve the generated script. We load this into PyCharm:

 

The [chatGPT1] script works on the first try. There’s no contest here; on this problem, ChatGPT outperformed Gemini.

The script [chatGPT1] provided by ChatGPT is as follows:

# -*- coding: utf-8 -*-
"""
Calculation of 2019 taxes (2018 income) in Python.

Assumptions and rules (based on the provided PDF):
- All income is attributed to filer 1 (even for a couple).
- Flat-rate deduction of 10% subject to a minimum and maximum.
- 2018 progressive tax scale applied to the family quotient (QF).
- Family quotient cap: limited benefit per half-share per child.
- Discount and 20% reduction for low incomes, with the catch-up rule.
- Order of adjustments: QF cap -> discount -> 20% reduction.
- Rounding: allowance (↑), discount (↑), reduction (↑), final tax (↓).
- We report the tax due (rounded down to the nearest euro), the discount, and the reduction
  (both rounded up to the nearest euro).

The “magic” numbers are grouped into constants, and each step is isolated
into functions for clarity.
"""

from dataclasses import dataclass
from math import ceil, floor

# ---------- Regulatory constants ----------

# Professional expense deduction (floor/ceiling) for 2018 income
DEDUCTION_RATE = 0.10
MIN_DEDUCTION = 437
MAX_DEDUCTION = 12502

# 2018 tax brackets, applied to the family quotient (QF)
# Each bracket is (upper_limit, rate). The first 0% bracket ends at €9,964.
TAX_BRACKETS_2018 = [
    (9964, 0.00),
    (27519, 0.14),
    (73779, 0.30),
    (156244, 0.41),
    (float("inf"), 0.45),
]

# Family quotient cap
PF_QF_BENEFIT_PER_HALF_SHARE = 1551  # max benefit per 1/2 CHILD share

# Discount (thresholds depend on the number of adults)
DISCOUNT_THRESHOLD_SINGLE = 1196
DISCOUNT_THRESHOLD_COUPLE = 1970

# 20% reduction: thresholds based on shares
REDUCTION_BASE_1_SHARE = 18985
REDUCTION_BASE_2_SHARES = 37970
REDUCTION_PER_EXTRA_HALF_SHARE = 3803  # added per half-share above the base of 1 or 2 shares


# ---------- Data Structures ----------

@dataclass
class TaxResult:
    tax: int  # final tax due, rounded down to the nearest euro
    discount: int  # discount rounded up to the nearest euro
    reduction: int  # 20% reduction, rounded up to the nearest euro


# ---------- Utility Functions ----------

def round_up(euros: float) -> int:
    """Rounds up to the nearest euro (ceil)."""
    return int(ceil(euros))


def round_down(euros: float) -> int:
    """Rounds down to the nearest euro (floor)."""
    return int(floor(euros))


# ---------- Step 1: Net Taxable Income (NTI, named rni in the code) ----------

def calculate_allowance(gross_income: float) -> int:
    """
    Calculates the 10% deduction, bounded by its floor and ceiling, ROUNDED UP TO THE NEAREST EURO.
    """
    calculated_deduction = gross_income * DEDUCTION_RATE
    calculated_deduction = max(MIN_DEDUCTION, min(MAX_DEDUCTION, calculated_deduction))
    return round_up(calculated_deduction)


def taxable_net_income(gross_income: float) -> int:
    """NTI = gross_income - deduction (deduction rounded up)."""
    return int(round(gross_income - calculate_allowance(gross_income)))


# ---------- Step 2: Number of shares ----------

def number_of_shares(adults: int, children: int) -> float:
    """
    Rules: 1 share if 1 adult, 2 shares if 2 adults.
    Children: +0.5 for each of the first 2, +1 per child starting from the 3rd.
    (We ignore any special cases such as single parents here, in accordance with the PDF.)
    """
    shares = 1.0 if adults == 1 else 2.0
    if children <= 0:
        return shares
    if children == 1:
        return shares + 0.5
    # children >= 2
    shares += 1.0  # +0.5 +0.5 for the first two
    if children > 2:
        shares += (children - 2) * 1.0  # +1 per child starting from the 3rd
    return shares


def half_shares_children(children: int) -> int:
    """Number of HALF-shares due to children (useful for the QF cap)."""
    if children <= 0:
        return 0
    if children == 1:
        return 1  # 0.5 share
    # First two children = 2 half-shares, then +2 half-shares per child starting with the third
    return 2 + (children - 2) * 2


# ---------- Step 3: Gross tax via QF ----------

def gross_tax_per_share(qf: float) -> float:
    """Tax per share using the 2018 progressive tax scale (unrounded)."""
    tax = 0.0
    previous = 0.0
    for upper_bound, rate in TAX_BRACKETS_2018:
        bracket_amount = max(0.0, min(qf, upper_bound) - previous)
        tax += bracket_amount * rate
        if qf <= upper_bound:
            break
        previous = upper_bound
    return tax


def total_gross_tax(rni: float, shares: float) -> float:
    """Total gross tax = tax per share * number of shares (unrounded)."""
    if shares <= 0:
        return 0.0
    qf = rni / shares
    return gross_tax_per_share(qf) * shares


# ---------- Step 4.1: Family Quotient Cap ----------

def apply_family_quota_cap(tax_with_children: float, rni: float, adults: int, children: int) -> float:
    """
    Calculates the tax after applying the QF cap if the benefit from the children exceeds the cap.
    - Tax A: with shares including children
    - Tax B: with only adult shares (1 or 2)
    - Actual benefit = B - A
    - Maximum benefit = €1,551 per 1/2 CHILD share
    If actual benefit > maximum benefit, tax = B - maximum benefit; otherwise, tax = A.
    """
    adult_shares = 1.0 if adults == 1 else 2.0
    tax_b = total_gross_tax(rni, adult_shares)
    tax_a = tax_with_children
    actual_benefit = max(0.0, tax_b - tax_a)
    max_benefit = PF_QF_BENEFIT_PER_HALF_SHARE * half_shares_children(children)
    if actual_benefit > max_benefit:
        return tax_b - max_benefit
    return tax_a


# ---------- Step 4.2: Discount and 20% Reduction ----------

def discount_threshold(adults: int) -> int:
    return DISCOUNT_THRESHOLD_COUPLE if adults >= 2 else DISCOUNT_THRESHOLD_SINGLE


def reduction_threshold_20_percent(rni: int, adults: int, shares: float) -> int:
    """
    Eligibility threshold for the 20% reduction:
    - Base: €18,985 for 1 share, €37,970 for 2 shares
    - + €3,803 per additional 1/2 share beyond the corresponding base.
    """
    base = REDUCTION_BASE_2_SHARES if adults >= 2 else REDUCTION_BASE_1_SHARE
    base_shares = 2.0 if adults >= 2 else 1.0
    extra_half_shares = max(0, int(round((shares - base_shares) * 2)))
    return base + extra_half_shares * REDUCTION_PER_EXTRA_HALF_SHARE


def calculate_discount_reduction(tax_after_cap: float, rni: int, adults: int, shares: float):
    """
    Calculates (rounded_discount, rounded_reduction, tax_after_adjustments).
    Interaction rule: the discount may be applied via a catch-up provision if, after
    the THEORETICAL application of the 20% reduction, the tax would fall below the threshold.
    Order: discount -> reduction.
    """
    tax = max(0.0, tax_after_cap)
    discount = 0
    reduction = 0

    # Eligibility for the 20% reduction, based on the NTI
    reduction_threshold = reduction_threshold_20_percent(rni, adults, shares)
    eligible_reduction = rni < reduction_threshold  # "below a threshold" in the PDF

    # Discount eligibility: tax directly below the threshold, or catch-up if the
    # theoretical 20% reduction would bring the tax below the threshold
    threshold = discount_threshold(adults)
    direct_discount = tax <= threshold
    catch_up = False
    if not direct_discount and eligible_reduction:
        theoretical_tax_after_reduction = tax * 0.80  # theoretical 20% reduction
        if theoretical_tax_after_reduction <= threshold:
            catch_up = True

    eligible_discount = (direct_discount or catch_up) and (tax > 0)

    # Apply the discount first (if eligible)
    if eligible_discount:
        discount_amount = threshold - (tax * 0.75)
        discount_calc = max(0, round_up(discount_amount))
        # The discount cannot exceed the remaining tax: we cap it
        discount = min(discount_calc, round_up(tax))
        tax = max(0.0, tax - discount)

    # Then apply the reduction if eligible
    if eligible_reduction and tax > 0:
        reduction_amount = tax * 0.20
        reduction = max(0, round_up(reduction_amount))
        tax = max(0.0, tax - reduction)

    return discount, reduction, tax


# ---------- Main API ----------

def calculate_tax(adults: int, children: int, gross_income: float) -> TaxResult:
    """
    Calculates the 2019 tax (based on 2018 income) for a household.
    Returns TaxResult(tax, discount, reduction).
    """
    # 1) NTI
    rni = taxable_net_income(gross_income)

    # 2) Shares
    shares = number_of_shares(adults, children)

    # 3) Gross tax using the tax scale (with all shares)
    gross_tax_a = total_gross_tax(rni, shares)

    # 4.1) Family quotient cap
    tax_after_cap = apply_family_quota_cap(gross_tax_a, rni, adults, children)

    # 4.2) Discount followed by 20% reduction (with catch-up)
    discount, reduction, adjusted_tax = calculate_discount_reduction(tax_after_cap, rni, adults, shares)

    # 5) Final tax rounding (↓) and capping at 0
    final_tax = max(0, round_down(adjusted_tax))

    return TaxResult(tax=final_tax, discount=discount, reduction=reduction)


# ---------- Small test executable ----------

def _nearly_equal(a: int, b: int, tol: int = 1) -> bool:
    return abs(a - b) <= tol


def unit_tests():
    """
    Returns a list of tuples (inputs, expected, actual, ok) for each test.
    Tolerance: ±1 € on each value (tax, discount, reduction).
    """
    cases = [
        # (adults, children, income) -> (tax, discount, reduction)
        ((2, 2, 55555), (2815, 0, 0)),
        ((2, 2, 50000), (1385, 384, 346)),
        ((2, 3, 50000), (0, 720, 0)),
        ((1, 2, 100000), (19884, 0, 0)),
        ((1, 3, 100000), (16782, 0, 0)),
        ((2, 3, 100000), (9200, 0, 0)),
        ((2, 5, 100000), (4230, 0, 0)),
        ((1, 0, 100000), (22986, 0, 0)),
        ((2, 2, 30000), (0, 0, 0)),
        ((1, 0, 200000), (64211, 0, 0)),
        ((2, 3, 200000), (42843, 0, 0)),
        ((2, 2, 49500), (1297, 431, 324)),
    ]

    results = []
    for (adults, children, income), expected in cases:
        res = calculate_tax(adults, children, income)
        actual = (res.tax, res.discount, res.reduction)
        # Each of the three values must be within the tolerance
        ok = all(_nearly_equal(a, e) for a, e in zip(actual, expected))
        results.append(((adults, children, income), expected, actual, ok))
    return results


if __name__ == "__main__":
    for inputs, expected, actual, ok in unit_tests():
        print(f"{inputs} -> expected={expected}, actual={actual} : {'OK' if ok else 'FAIL'}")
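
The family quotient cap (step 4.1) is probably the least intuitive step of this script. The following sketch traces it for the test case (1, 2, 100000), assuming the bracket table and the cap of 1,551 euros per child half-share given in the script; `tax_for` is a hypothetical helper written for this illustration only.

```python
from math import floor

# Values assumed from the [chatGPT1] script
BRACKETS = [(9964, 0.00), (27519, 0.14), (73779, 0.30), (156244, 0.41), (float("inf"), 0.45)]
CAP_PER_HALF_SHARE = 1551

def tax_for(net_income: float, shares: float) -> float:
    """Progressive tax on the quotient, multiplied back by the number of shares."""
    quotient = net_income / shares
    tax, lower = 0.0, 0.0
    for upper, rate in BRACKETS:
        if quotient <= lower:
            break
        tax += (min(quotient, upper) - lower) * rate
        lower = upper
    return tax * shares

net_income = 100000 - 10000          # 10% allowance, inside its floor and ceiling
tax_a = tax_for(net_income, 2.0)     # one adult + two children = 2 shares
tax_b = tax_for(net_income, 1.0)     # adult share only
benefit = tax_b - tax_a              # what the children's shares save
max_benefit = CAP_PER_HALF_SHARE * 2 # two half-shares for two children

# The cap applies when the saving exceeds the maximum allowed
tax = tax_b - max_benefit if benefit > max_benefit else tax_a
print(round(tax_a), round(tax_b), round(benefit), floor(tax))
```

For this household the saving from the children's shares is well above the cap, so the tax is computed from the adult-only amount minus the capped benefit, which gives the 19884 expected by the test.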

5.4. Problem 3

Now we ask ChatGPT to look up the tax calculation rules on the internet:

 

This time, we do not provide the PDF that contained the calculation rules to follow. We only provide our instructions in the text file. Note that this text file now contains 12 unit tests after adding, to the initial 11 tests, the one used by Gemini to demonstrate that my initial PDF was incorrect.

ChatGPT responds in 8 minutes, providing a link to download the generated script. Once loaded into PyCharm, this script passes all 12 tests. So for both problems posed, ChatGPT got the answers right on the first try, thereby outperforming Gemini.

ChatGPT provides its sources in its response:

 

There’s nothing more to say—it’s a job well done.

Now, we can ask it, just as we did with Gemini, to generate a PDF for students.

 

ChatGPT’s response came after several back-and-forth exchanges because the generated PDF used a font that replaced characters with squares. But eventually, it generated the PDF. I’m sharing it because it provides different rules from Gemini’s PDF, and I wondered which one was correct. Let’s investigate.

 
 
 

The difference from Gemini’s PDF lies in how the discount is calculated. The two AIs take different approaches. Gemini had written:

 
 
 

Which approach is right?

5.5. Problem 4

We’ll ask ChatGPT to use its PDF to calculate the tax:

As in previous instances, it generates a Python script that works on the first try. We had added an additional test to the instructions:

test13: (1, 0, 18535) -> (359, 491, 90)

All 13 tests were passed successfully.
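
The script generated from ChatGPT's PDF is not reproduced here. As an illustration only, the sketch below traces test13 under the assumption that the discount and reduction rules are the same as in the [chatGPT1] script of section 5.3; the constant names are illustrative.

```python
from math import ceil, floor

# Thresholds assumed from the [chatGPT1] script (single filer, 1 share)
DISCOUNT_THRESHOLD_SINGLE = 1196
REDUCTION_THRESHOLD_1_SHARE = 18985

gross_income = 18535
allowance = ceil(gross_income / 10)        # 1853.5 rounded up
net_income = gross_income - allowance
gross_tax = (net_income - 9964) * 0.14     # one share, quotient inside the 14% bracket

discount = 0
if gross_tax <= DISCOUNT_THRESHOLD_SINGLE: # direct eligibility, no catch-up needed
    discount = ceil(DISCOUNT_THRESHOLD_SINGLE - 0.75 * gross_tax)
tax = gross_tax - discount

reduction = 0
if net_income < REDUCTION_THRESHOLD_1_SHARE:
    reduction = ceil(0.20 * tax)           # 20% of the tax left after the discount
tax -= reduction

result = (floor(tax), discount, reduction)
print(result)  # (359, 491, 90)
```

Unlike test12, this case qualifies for the discount directly, because the gross tax is already below the single-filer threshold.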

5.6. Back to Gemini

Now, we return to Gemini, to which we will present ChatGPT’s PDF. Since the rules implemented in this PDF differ from those in Gemini’s PDF, we wonder what will happen:

 

Gemini first generated a Python script that failed some tests. We presented it with the logs:

Question 2

 

Question 3

There are still errors. Let’s continue.

 

Question 4

Still errors during execution:

 

This time it’s correct.

We’re still intrigued that, even with PDFs that have quite different calculation rules, both AIs generate correct results.

We ask Gemini the following question:

 

The full question is as follows:

After everything we've done, compare the two approaches, yours and ChatGPT's, in the PDFs I'm attaching: [.. according to Gemini] for you, [.. according to ChatGPT] for ChatGPT. Tell me which one we should prefer, given that you pass all 13 generated tests using either PDF.

Gemini’s answer is unequivocal:

 
 
 
 
 
 

5.7. What does ChatGPT think?

We ask ChatGPT the same question we asked Gemini.

 

ChatGPT’s response is as follows:

 
 

So, ChatGPT suggests a unit test to decide between the two methods. We make a copy of each script:

  • The script [gemini3], generated by Gemini from its PDF [The Problem According to Gemini], is copied to [gemini4];
  • The script [chatGPT3], generated by ChatGPT from its PDF [The Problem According to ChatGPT], is copied to [chatGPT4];

Additionally, we add the unit test proposed by ChatGPT to each of the scripts [gemini4, chatGPT4] to distinguish between the two AIs.

Running [gemini4] yields the following results:


C:\Data\st-2025\dev\python\code\python-flask-2025-cours\.venv\Scripts\python.exe "C:/Program Files/JetBrains/PyCharm 2025.2.1.1/plugins/python-ce/helpers/pycharm/_jb_unittest_runner.py" --path "C:\Data\st-2025\dev\python\code\python-flask-2025-cours\outils ia\gemini\gemini4.py" 
Testing started at 5:45 PM ...
Launching unit tests with arguments: python -m unittest C:\Data\st-2025\dev\python\code\python-flask-2025-cours\outils ia\gemini\gemini4.py in C:\Data\st-2025\dev\python\code\python-flask-2025-cours

SubTest failure: Traceback (most recent call last):
  File "C:\Program Files\Python313\Lib\unittest\case.py", line 58, in testPartExecutor
    yield
  File "C:\Program Files\Python313\Lib\unittest\case.py", line 556, in subTest
    yield
  File "C:\Data\st-2025\dev\python\code\python-flask-2025-cours\outils ia\gemini\gemini4.py", line 234, in test_cas_verifies_simulateur_officiel
    self.assertAlmostEqual(tax_calculation, expected_tax, delta=1, msg="Failure on tax amount")
    ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 2669 != 2270 within 1 delta (399 difference): Failure on tax amount




Ran 1 test in 0.010s

FAILED (failures=1)

One or more subtests failed
List of failed subtests: [Test 'test12' with input (2, 0, 43333)]

Process finished with exit code 1

So Gemini fails the test added by ChatGPT: it computes a tax of 2669 euros where 2270 is expected, a difference of 399 euros.

Running [chatGPT4] yields the following results:


C:\Data\st-2025\dev\python\code\python-flask-2025-cours\.venv\Scripts\python.exe "C:\Data\st-2025\dev\python\code\python-flask-2025-cours\outils ia\chatGPT\chatGPT4.py" 
Test (2, 2, 55555) -> result (tax=2814, discount=0, reduction=0) | expected (2815, 0, 0) | OK
Test (2, 2, 50000) -> actual (tax=1384, discount=384, reduction=347) | expected (1385, 384, 346) | OK
Test (2, 3, 50000) -> actual (tax=0, discount=721, reduction=0) | expected (0, 720, 0) | OK
Test (1, 2, 100000) -> actual (tax=19884, discount=0, reduction=0) | expected (19884, 0, 0) | OK
Test (1, 3, 100000) -> result (tax=16782, discount=0, reduction=0) | expected (16782, 0, 0) | OK
Test (2, 3, 100000) -> actual (tax=9200, discount=0, reduction=0) | expected (9200, 0, 0) | OK
Test (2, 5, 100000) -> actual (tax=4230, discount=0, reduction=0) | expected (4230, 0, 0) | OK
Test (1, 0, 100000) -> result (tax=22986, discount=0, reduction=0) | expected (22986, 0, 0) | OK
Test (2, 2, 30000) -> actual (tax=0, discount=0, reduction=0) | expected (0, 0, 0) | OK
Test (1, 0, 200000) -> result (tax=64210, discount=0, reduction=0) | expected (64211, 0, 0) | OK
Test (2, 3, 200000) -> result (tax=42842, discount=0, reduction=0) | expected (42843, 0, 0) | OK
Test (2, 2, 49500) -> actual (tax=1296, discount=431, reduction=325) | expected (1297, 431, 324) | OK
Test (1, 0, 18535) -> actual (tax=359, discount=491, reduction=90) | expected (359, 491, 90) | OK
Test (2, 0, 43333) -> result (tax=2268, discount=0, reduction=401) | expected (2270, 0, 400) | FAIL
 Tolerance details ±1€: tax ok? False, discount ok? True, reduction ok? True

Overall result: AT LEAST ONE TEST FAILED ❌

Process finished with exit code 0

ChatGPT also fails the added test, but not for the same reason as Gemini. Its discount and reduction are within tolerance, but its tax (2268 euros) is 2 euros below the expected 2270, which exceeds the allowed tolerance of ±1 euro.
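
The tolerance check applied to this last case can be sketched as follows, using the values from the log above; `within_tolerance` is an illustrative name for the same ±1 comparison the scripts perform.

```python
def within_tolerance(actual: int, expected: int, tol: int = 1) -> bool:
    """True when the two amounts differ by at most tol euros."""
    return abs(actual - expected) <= tol

# Values taken from the [chatGPT4] log for the case (2, 0, 43333)
actual = {"tax": 2268, "discount": 0, "reduction": 401}
expected = {"tax": 2270, "discount": 0, "reduction": 400}

checks = {name: within_tolerance(actual[name], expected[name]) for name in actual}
print(checks)  # {'tax': False, 'discount': True, 'reduction': True}
```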

So from now on, we’ll use the PDF generated by ChatGPT with the AIs that follow. Note that both AIs passed the first tests only because my instructions lacked unit tests for edge cases. This example shows how important it is to include such tests in the tax calculation. Since they are pretty hard to come up with on your own, we’ll ask the AIs to add them themselves.

5.8. Problem 3 with unit tests generated by the AIs

The results obtained with Gemini and ChatGPT leave room for doubt. Did the AIs find a general solution that passes every conceivable test, or did they find a solution that only passes the required tests? We’ll start over without a PDF to force the AIs to go online and search for the information they need. And we’ll modify our instructions as follows:

 

The text file [instructionsSansPDF4.txt] already contains 14 required tests. To these tests, we add the following instructions:


7 - You will add as many unit tests as necessary to verify the edge cases of the tax calculation.

For the code, you will complete the following script after adding your own tests.

# =========================
# Unit tests (tolerance of ±1 €)
# =========================

TESTS = [
    # (adults, children, income) -> (tax, discount, reduction)
    ((2, 2, 55555), (2815, 0, 0)),
    ((2, 2, 50000), (1385, 384, 346)),
    ((2, 3, 50000), (0, 720, 0)),
    ((1, 2, 100000), (19884, 0, 0)),
    ((1, 3, 100000), (16782, 0, 0)),
    ((2, 3, 100000), (9200, 0, 0)),
    ((2, 5, 100000), (4230, 0, 0)),
    ((1, 0, 100000), (22986, 0, 0)),
    ((2, 2, 30000), (0, 0, 0)),
    ((1, 0, 200000), (64211, 0, 0)),
    ((2, 3, 200000), (42843, 0, 0)),
    ((2, 2, 49500), (1297, 431, 324)),
    ((1, 0, 18535), (359, 491, 90)),
    ((2, 0, 43333), (2270, 0, 400)),
]


def _ok(a, b, tol=1):
    return abs(a - b) <= tol


def run_tests(verbose: bool = True) -> bool:
    all_ok = True
    for (params, expected) in TESTS:
        a, e, r = params
        exp_tax, exp_discount, exp_reduction = expected
        res = calculate_tax_2019(a, e, r)
        ok_tax = _ok(res.tax, exp_tax)
        ok_discount = _ok(res.discount, exp_discount)
        ok_reduction = _ok(res.reduction, exp_reduction)
        test_ok = ok_tax and ok_discount and ok_reduction
        if verbose:
            print(
                f"Test {params} -> obtained (tax={res.tax}, discount={res.discount}, reduction={res.reduction}) | expected {expected} | {'OK' if test_ok else 'FAIL'}")
            if not test_ok:
                print(
                    f" Tolerance details ±1€: tax ok? {ok_tax}, discount ok? {ok_discount}, reduction ok? {ok_reduction}")
        all_ok &= test_ok
    if verbose:
        print("\nOverall result:", "ALL TESTS PASSED ✅" if all_ok else "AT LEAST ONE TEST FAILED ❌")
    return all_ok


if __name__ == "__main__":
    run_tests()

  • Lines 11–24 (the entries of the TESTS list): the 14 required tests;
  • Lines 5–55 (from the "Unit tests" banner to the final run_tests() call): this code comes from the script generated by ChatGPT. We will require Gemini to use this code to facilitate comparisons between the two generated scripts.
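
This harness fixes the interface that a generated script must expose: a function `calculate_tax_2019(adults, children, income)` whose result has `tax`, `discount` and `reduction` attributes. A sketch of that contract, assuming a dataclass result as in the [chatGPT1] script; the function body is a deliberately empty placeholder, not a tax calculation.

```python
from dataclasses import dataclass

@dataclass
class TaxResult:
    tax: int        # final tax due
    discount: int   # discount amount
    reduction: int  # 20% reduction amount

def calculate_tax_2019(adults: int, children: int, income: float) -> TaxResult:
    """Placeholder: a real script would implement the calculation rules here."""
    return TaxResult(tax=0, discount=0, reduction=0)

# The harness only relies on the call signature and on these three attributes
res = calculate_tax_2019(2, 2, 30000)
print(res.tax, res.discount, res.reduction)
```

Imposing the same entry point and result shape on both AIs is what makes their logs directly comparable line by line.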

We’ll start with ChatGPT:

 

Its first response is incorrect. I tell it so by providing the execution logs:

Its second response is correct. ChatGPT added the following 11 tests to the 14 required tests:

# Additional edge cases (step edges/rounding)
TESTS += [
    # 10% deduction: floor and ceiling
    ((1, 0, 3000), (0, 0, 0)),  # 10% = 300 < floor 437 => low net taxable income -> zero tax
    ((1, 0, 200000), (64211, 0, 0)),  # deduction ceiling already covered in initial tests

    # Discount: just below/above the thresholds
    ((1, 0, 25000), None),  # diagnostic
    ((2, 0, 35000), None),  # diagnostic

    # 20% reduction: full entitlement vs. capping
    ((1, 0, 17000), None),  # diagnostic
    ((2, 0, 34000), None),  # diagnostic
    ((1, 0, 20000), None),  # diagnostic
    ((2, 0, 40000), None),  # diagnostic

    # Change in shares (QF cap)
    ((2, 1, 80000), None),
    ((2, 2, 80000), None),
    ((2, 3, 80000), None),
]

There are now 25 unit tests. I manually checked the 11 new tests against the official DGFiP simulator, and they pass.
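
Several of the added cases have `None` in place of the expected triple. The harness shown earlier unpacks `expected` into three values, which would raise a `TypeError` on `None`, so such entries need separate handling. One possible way is sketched below, under the assumption that a `None` case should only be printed and not checked. `check_case` and `stub` are hypothetical names, and this is not ChatGPT's code.

```python
from typing import Callable, Optional, Tuple

Triple = Tuple[int, int, int]

def check_case(compute: Callable[[int, int, int], Triple],
               params: Tuple[int, int, int],
               expected: Optional[Triple],
               tol: int = 1) -> Optional[bool]:
    """Return True/False for a checked case, None for a diagnostic-only case."""
    actual = compute(*params)
    if expected is None:
        # Diagnostic case: show the values so they can be compared by hand
        print(f"Test {params} -> {actual} | diagnostic only")
        return None
    ok = all(abs(a - e) <= tol for a, e in zip(actual, expected))
    print(f"Test {params} -> {actual} | expected {expected} | {'OK' if ok else 'FAIL'}")
    return ok

def stub(adults: int, children: int, income: int) -> Triple:
    """Hypothetical stand-in for a generated tax function."""
    return (0, 0, 0)

outcomes = [
    check_case(stub, (1, 0, 3000), (0, 0, 0)),   # checked against expected values
    check_case(stub, (1, 0, 25000), None),       # printed only
]
```

The printed values of the diagnostic cases are the ones that can then be compared by hand with an external reference such as the official simulator.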

Now, we’re moving on to Gemini. This is going to be much more complicated. It will manage to generate a script that passes all 25 ChatGPT tests, but only after a long debugging process.

 

Below is the debugging log:

 

Strangely, a majority of the tests failed, even among the 14 required ones, whereas in the past Gemini had generated code that passed them all.

The following response from Gemini is still incorrect:

 

Nor is the following response:

 

Nor is the following response. So I’m changing my approach. I’m asking it to pass the 25 tests that ChatGPT passed, attaching ChatGPT’s logs:

 

Gemini fails. It did add ChatGPT’s tests. I attach the logs of its execution:

 

Still no:

 

Still no:

 

Still no:

 

Still no, but it’s better:

 

Gemini is making new errors:

 

It’s improving again:

 

This time, it’s right:

Undoubtedly, in this specific example of calculating the 2019 tax with the constraints specified in the instruction file, ChatGPT was more accurate than Gemini. But this is just one example.

We can take it further. We can ask Gemini to regenerate a PDF based on the calculation rules it used to pass the 25 tests. We want to see if it has changed its initial reasoning regarding the calculations for the discount and the 20% reduction:

This time, Gemini generated a Markdown file that I then converted to PDF [The Problem According to Gemini Version 2]. And Gemini has indeed changed its reasoning:

 
 

We can see that the specific discount calculation and the catch-up rule are no longer present. Gemini has now adopted ChatGPT’s reasoning.