Skip to content

Working with DataFrames

eptr2 returns pandas DataFrames by default, making it easy to analyze and visualize Turkish electricity market data.

Basic DataFrame Operations

Fetching Data

from eptr2 import EPTR2

eptr = EPTR2(use_dotenv=True, recycle_tgt=True)

# Get Market Clearing Price
df = eptr.call("mcp", start_date="2024-07-01", end_date="2024-07-31")

Exploring the Data

# View first rows
print(df.head())

# Check columns
print(df.columns.tolist())

# Data types
print(df.dtypes)

# Summary statistics
print(df.describe())

# Shape
print(f"Rows: {len(df)}, Columns: {len(df.columns)}")

Common Column Names

Different API calls return different columns. Here are common patterns:

Price Data

Column Description
date Timestamp
price Price in TL/MWh
positiveImbalancePrice Positive imbalance price
negativeImbalancePrice Negative imbalance price

Consumption Data

Column Description
date / period Timestamp
consumption Consumption in MWh
swv Settlement value (UECM)
lep Load plan forecast

Generation Data

Column Description
date Timestamp
total Total generation
wind Wind generation
solar Solar generation
hydro Hydro generation
naturalGas Natural gas generation

Datetime Handling

Convert to Datetime

# Convert string dates to datetime
df['date'] = pd.to_datetime(df['date'])

# Set as index
df = df.set_index('date')

Extract Components

df['hour'] = df['date'].dt.hour
df['day'] = df['date'].dt.day
df['month'] = df['date'].dt.month
df['weekday'] = df['date'].dt.dayofweek
df['is_weekend'] = df['weekday'].isin([5, 6])

Timezone Handling

eptr2 data uses Turkey timezone (Europe/Istanbul, UTC+3):

import pytz

# Ensure timezone awareness
df['date'] = pd.to_datetime(df['date']).dt.tz_localize('Europe/Istanbul')

# Convert to UTC
df['date_utc'] = df['date'].dt.tz_convert('UTC')

Filtering and Selecting

By Date Range

# Filter by date
mask = (df['date'] >= '2024-07-15') & (df['date'] <= '2024-07-20')
filtered = df.loc[mask]

By Time

# Peak hours (08:00-20:00)
df['hour'] = df['date'].dt.hour
peak_hours = df[(df['hour'] >= 8) & (df['hour'] < 20)]

# Off-peak hours
off_peak = df[(df['hour'] < 8) | (df['hour'] >= 20)]

By Value

# High price periods
high_price = df[df['price'] > df['price'].quantile(0.9)]

# Specific values
specific = df[df['price'] == 1500.00]

Aggregations

Daily Aggregations

# Set date index
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

# Daily statistics
daily = df.resample('D').agg({
    'price': ['mean', 'min', 'max', 'std'],
    'consumption': 'sum'
})

Hourly Patterns

# Average price by hour
hourly_avg = df.groupby(df['date'].dt.hour)['price'].mean()

Monthly Summary

monthly = df.resample('M').agg({
    'price': 'mean',
    'consumption': 'sum'
})

Merging Data

Merge Price and Consumption

# Get both datasets
mcp = eptr.call("mcp", start_date="2024-07-01", end_date="2024-07-31")
cons = eptr.call("rt-cons", start_date="2024-07-01", end_date="2024-07-31")

# Rename columns for clarity
mcp = mcp.rename(columns={'date': 'datetime', 'price': 'mcp_price'})
cons = cons.rename(columns={'date': 'datetime'})

# Merge on datetime
merged = pd.merge(mcp, cons, on='datetime', how='inner')

Merge Multiple Price Types

mcp = eptr.call("mcp", start_date="2024-07-01", end_date="2024-07-31")
smp = eptr.call("smp", start_date="2024-07-01", end_date="2024-07-31")

mcp = mcp[['date', 'price']].rename(columns={'price': 'mcp'})
smp = smp[['date', 'price']].rename(columns={'price': 'smp'})

prices = pd.merge(mcp, smp, on='date')
prices['spread'] = prices['mcp'] - prices['smp']

Calculations

Cost Calculation

# Simple cost
df['cost'] = df['consumption'] * df['price']

# Total cost
total_cost = df['cost'].sum()
print(f"Total cost: {total_cost:,.2f} TL")

Rolling Statistics

# 24-hour rolling average
df['price_24h_avg'] = df['price'].rolling(24).mean()

# 7-day rolling average (168 hours)
df['price_7d_avg'] = df['price'].rolling(168).mean()

Percent Change

# Hour-over-hour change
df['price_change'] = df['price'].pct_change()

# Day-over-day change (24 hours)
df['price_daily_change'] = df['price'].pct_change(24)

Visualization

Basic Plotting

import matplotlib.pyplot as plt

# Price over time
df.plot(x='date', y='price', figsize=(12, 6))
plt.title('Market Clearing Price')
plt.xlabel('Date')
plt.ylabel('Price (TL/MWh)')
plt.show()

Distribution

# Price histogram
df['price'].hist(bins=50, figsize=(10, 6))
plt.title('Price Distribution')
plt.xlabel('Price (TL/MWh)')
plt.ylabel('Frequency')
plt.show()

Hourly Pattern

# Average price by hour
hourly = df.groupby(df['date'].dt.hour)['price'].mean()
hourly.plot(kind='bar', figsize=(12, 6))
plt.title('Average Price by Hour')
plt.xlabel('Hour')
plt.ylabel('Price (TL/MWh)')
plt.show()

Export Options

To CSV

df.to_csv('electricity_data.csv', index=False)

To Excel

df.to_excel('electricity_data.xlsx', index=False)

To JSON

df.to_json('electricity_data.json', orient='records', date_format='iso')

To Parquet

df.to_parquet('electricity_data.parquet', index=False)

Best Practices

  1. Convert dates early:

    df['date'] = pd.to_datetime(df['date'])
    

  2. Use appropriate index:

    df = df.set_index('date').sort_index()
    

  3. Handle missing data:

    # Check for missing
    print(df.isnull().sum())
    
    # Fill or drop
    df = df.fillna(method='ffill')  # Forward fill
    

  4. Use efficient dtypes:

    # Convert to appropriate types
    df['price'] = df['price'].astype('float32')
    df['hour'] = df['hour'].astype('int8')
    

Next Steps