π¦π¦π¦ Llama Data Scienceπ¦π¦π¦
In short, ChatGPT’s brand new OPEN-WEIGHTS model gpt-oss does data science better and faster than many people :)
Runner up, Mistral Small 3.2 was also quite competent!
gregarious@hal9000:~/code/llama-datascience$ ./llama_datascience.py --data-file traffic_volume_000256_2004_2024.csv "What is the best time to travel South through the intersection?"
Expert: gpt-oss (temp: 0.5)
--- Expert Reasoning Step 1/20 ---
> Expert wants to use tool: 'python_datascience'
> Reason for tool selection: Load and inspect data
> Code:
import pandas as pd
import numpy as np
# Load data
try:
df = pd.read_csv('/sandbox/data.csv')
except Exception as e:
print('Error loading CSV:', e)
df = None
print('Columns:', df.columns.tolist() if df is not None else None)
print('Head:')
print(df.head() if df is not None else None)
> Code execution complete.
output:
Columns: ['Count Date', 'Dir', '0h', '1h', '2h', '3h', '4h', '5h', '6h', '7h', '8h', '9h', '10h', '11h', '12h', '13h', '14h', '15h', '16h', '17h', '18h', '19h', '20h', '21h', '22h', '23h']
Head:
Count Date Dir 0h 1h 2h 3h 4h ... 17h 18h 19h 20h 21h 22h 23h
0 12/01/2024 P 91 45 32 23 21 ... 837 575 445 358 198 149 63
1 12/01/2024 S 49 30 19 28 42 ... 783 535 359 199 164 89 48
2 12/02/2024 P 54 27 20 26 49 ... 1758 794 486 326 237 191 111
3 12/02/2024 S 18 17 15 36 154 ... 924 506 265 214 163 104 54
4 12/03/2024 P 54 27 17 33 52 ... 1745 1017 584 456 331 145 105
[5 rows x 26 columns]
--- Expert Reasoning Step 2/20 ---
> Expert wants to use tool: 'python_datascience'
> Reason for tool selection: Compute average hourly counts for South direction and find best hour
> Code:
import pandas as pd
import numpy as np
# Load data
try:
df = pd.read_csv('/sandbox/data.csv')
except Exception as e:
print('Error loading CSV:', e)
df = None
# Ensure numeric columns
hour_cols = [col for col in df.columns if col.endswith('h')]
# Convert to numeric
for col in hour_cols:
df[col] = pd.to_numeric(df[col], errors='coerce')
# Filter South direction
south_df = df[df['Dir'] == 'S']
# Compute average per hour across all South rows
avg_counts = south_df[hour_cols].mean()
# Find hour with minimum average
best_hour = avg_counts.idxmin()
print('Average counts per hour for South direction:')
print(avg_counts)
print('\nBest hour (lowest average traffic):', best_hour)
# Also show min count across all South rows for that hour
min_count = south_df[best_hour].min()
print('Minimum count on best hour across all days:', min_count)
> Code execution complete.
output:
Average counts per hour for South direction:
0h 40.570433
1h 25.991288
2h 22.957110
3h 34.281598
4h 97.354108
5h 349.480231
6h 762.930438
7h 1107.911272
8h 900.093553
9h 686.656346
10h 630.000670
11h 644.783273
12h 646.281866
13h 645.476880
14h 654.760488
15h 681.876960
16h 741.068489
17h 734.114194
18h 533.185096
19h 328.717732
20h 260.428897
21h 208.710763
22h 134.479426
23h 70.539204
dtype: float64
Best hour (lowest average traffic): 2h
Minimum count on best hour across all days: 4
--- Expert Reasoning Step 3/20 ---
**Best time to travel South**
The hour with the lowest average vehicle count for the South‑bound direction is **2 h** (the 2 :00 – 3 :00 AM slot).
Across all days in the data set, the average count for that hour is about **23 vehicles**, and the smallest single‑day count recorded in that hour is **4 vehicles**.
So, if you want the least traffic when heading South through the intersection, aim for the 2 :00 – 3 :00 AM window.
Comments
Post a Comment