Reads data from a Pandas DataFrame.
Inherits From: InputDataLoader
meridian.data.load.DataFrameDataLoader(
    df: pd.DataFrame,
    coord_to_columns: meridian.data.load.CoordToColumns,
    kpi_type: str,
    media_to_channel: (Mapping[str, str] | None) = None,
    media_spend_to_channel: (Mapping[str, str] | None) = None,
    reach_to_channel: (Mapping[str, str] | None) = None,
    frequency_to_channel: (Mapping[str, str] | None) = None,
    rf_spend_to_channel: (Mapping[str, str] | None) = None,
    organic_reach_to_channel: (Mapping[str, str] | None) = None,
    organic_frequency_to_channel: (Mapping[str, str] | None) = None
)
This class reads input data from a Pandas DataFrame. The coord_to_columns
attribute stores a mapping from target InputData coordinates and array names
to the DataFrame column names if they are different. The fields are:
- geo, time, kpi, revenue_per_kpi, population (single column)
- controls (multiple columns, optional)
- (1) media, media_spend (multiple columns)
- (2) reach, frequency, rf_spend (multiple columns)
- non_media_treatments (multiple columns, optional)
- organic_media (multiple columns, optional)
- organic_reach, organic_frequency (multiple columns, optional)
The DataFrame must include (1) or (2), but doesn't need to include both.
Also, each media channel must appear in (1) or (2), but not both.
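The rule that a channel belongs to exactly one of the two groups can be checked before the loader is constructed. The following is a minimal sketch, not part of Meridian: the helper name find_overlapping_channels is hypothetical, and it assumes that channel names are the values of the *_to_channel mappings passed to the constructor.

```python
from collections.abc import Mapping


def find_overlapping_channels(
    media_to_channel: Mapping[str, str],
    reach_to_channel: Mapping[str, str],
) -> set[str]:
    """Returns channel names that appear in both group (1) and group (2).

    Hypothetical helper for illustration; not part of the Meridian API.
    """
    media_channels = set(media_to_channel.values())
    rf_channels = set(reach_to_channel.values())
    return media_channels & rf_channels


media_to_channel = {'impressions_tv': 'tv', 'impressions_fb': 'fb'}
reach_to_channel = {'reach_yt': 'yt'}

# An empty set means every channel sits in exactly one group.
overlap = find_overlapping_channels(media_to_channel, reach_to_channel)
if overlap:
    raise ValueError(f'Channels in both groups: {sorted(overlap)}')
```

Running a check like this up front surfaces a misassigned channel by name instead of leaving it to a later loading step.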
Note the following:
- Time column values must be formatted in yyyy-mm-dd date format.
- In a national model, geo and population are optional. If the population is provided, it is reset to a default value of 1.0.
- If media data is provided, then media_to_channel and media_spend_to_channel are required. If reach and frequency data is provided, then reach_to_channel and frequency_to_channel and rf_spend_to_channel are required.
- If organic_reach and organic_frequency data is provided, then organic_reach_to_channel and organic_frequency_to_channel are required.
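The date-format requirement on the time column can be checked before loading. This is a minimal sketch using only the Python standard library. The helper name is_valid_time_value is hypothetical, and the sketch assumes that yyyy-mm-dd means zero-padded month and day. It is not how Meridian itself validates the column.

```python
from datetime import datetime


def is_valid_time_value(value: str) -> bool:
    """Returns True if `value` is a zero-padded yyyy-mm-dd date string.

    Hypothetical helper for illustration; not part of the Meridian API.
    """
    try:
        parsed = datetime.strptime(value, '%Y-%m-%d')
    except ValueError:
        return False
    # strptime also accepts non-padded input such as '2024-1-5', so compare
    # against the canonical rendering to require zero padding.
    return parsed.strftime('%Y-%m-%d') == value


time_values = ['2024-01-01', '2024-01-08', '01/15/2024']
invalid = [v for v in time_values if not is_valid_time_value(v)]
```

Here invalid collects the values that would need reformatting before the DataFrame is passed to the loader.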
Example:
from meridian.data.load import CoordToColumns, DataFrameDataLoader

# df = [...]
coord_to_columns = CoordToColumns(
    geo='dmas',
    time='dates',
    kpi='conversions',
    revenue_per_kpi='revenue_per_conversions',
    controls=['control_income'],
    population='populations',
    media=['impressions_tv', 'impressions_fb', 'impressions_search'],
    media_spend=['spend_tv', 'spend_fb', 'spend_search'],
    reach=['reach_yt'],
    frequency=['frequency_yt'],
    rf_spend=['rf_spend_yt'],
    non_media_treatments=['price', 'discount'],
    organic_media=['organic_impressions_blog'],
    organic_reach=['organic_reach_newsletter'],
    organic_frequency=['organic_frequency_newsletter'],
)
media_to_channel = {
    'impressions_tv': 'tv',
    'impressions_fb': 'fb',
    'impressions_search': 'search',
}
media_spend_to_channel = {
    'spend_tv': 'tv', 'spend_fb': 'fb', 'spend_search': 'search'
}
reach_to_channel = {'reach_yt': 'yt'}
frequency_to_channel = {'frequency_yt': 'yt'}
rf_spend_to_channel = {'rf_spend_yt': 'yt'}
organic_reach_to_channel = {'organic_reach_newsletter': 'newsletter'}
organic_frequency_to_channel = {'organic_frequency_newsletter': 'newsletter'}
data_loader = DataFrameDataLoader(
    df=df,
    coord_to_columns=coord_to_columns,
    kpi_type='non-revenue',
    media_to_channel=media_to_channel,
    media_spend_to_channel=media_spend_to_channel,
    reach_to_channel=reach_to_channel,
    frequency_to_channel=frequency_to_channel,
    rf_spend_to_channel=rf_spend_to_channel,
    organic_reach_to_channel=organic_reach_to_channel,
    organic_frequency_to_channel=organic_frequency_to_channel,
)
data = data_loader.load()
Methods
load
load() -> meridian.data.input_data.InputData
Reads data from a dataframe and returns an InputData object.
__eq__
__eq__(
    other
)
Return self==value.