
DataHub ESG Tables

📘

General Availability Release

DataHub, Arcadia’s new utility data product for aggregated meter-level usage details, is officially in General Availability! Through this new offering, Arcadia provides aggregated meter-level usage details in Snowflake to support ESG Scope 2 reports.

Introduction

What is DataHub?

DataHub’s release provides calendarized and aggregated meter level utility data at the monthly, quarterly, and annual timeframes to support ESG scope 2 reports. DataHub’s ESG tables provide the underlying meter data to enable you to calculate your emissions. There are four tables available: Calendarized Monthly Usages, Calendarized Quarterly Usages, Calendarized Annual Usages, and a Statement Availability Report.

For example, the Calendarized Monthly Usages table aggregates your meters’ daily usage values and provides monthly level totals for your meters’ usages. This table provides fields for the original usage and usage unit of measure that are available on the utility statement as well as standardized columns for usage and unit of measure for your meters’ usages. This table enables you to understand how much energy the meter uses for a specific month to calculate how much greenhouse gas this meter is responsible for. The Calendarized Quarterly Usages and Calendarized Annual Usages tables support this use case as well and offer the meter level data aggregation at the quarterly or annual levels respectively.
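The roll-up from daily values to monthly, quarterly, or annual totals can be pictured with a minimal Python sketch. It is an illustration only, not Arcadia's implementation: the function name, the input shape, and the sample values are hypothetical, and it assumes calendar quarters and usage already expressed in one standardized unit.

```python
from collections import defaultdict
from datetime import date


def roll_up(daily_usage, level):
    """Sum (date, usage) pairs into monthly, quarterly, or annual totals.

    Assumes every usage value is already in the same unit of measure.
    """
    totals = defaultdict(float)
    for day, usage in daily_usage:
        if level == "monthly":
            key = (day.year, day.month)
        elif level == "quarterly":
            # Calendar quarters: Jan-Mar is 1, Apr-Jun is 2, and so on.
            key = (day.year, (day.month - 1) // 3 + 1)
        elif level == "annual":
            key = (day.year,)
        else:
            raise ValueError(f"unknown level: {level}")
        totals[key] += usage
    return dict(totals)


# Hypothetical daily values for one meter, in kWh.
sample = [
    (date(2024, 1, 30), 10.0),
    (date(2024, 1, 31), 12.0),
    (date(2024, 2, 1), 11.0),
    (date(2024, 4, 1), 9.0),
]
```

Calling `roll_up(sample, "monthly")` groups the four sample days into January, February, and April totals, while `"quarterly"` and `"annual"` collapse the same values into coarser buckets.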

This release also provides the Statement Availability Report, which summarizes how many statements are included in your calendarized ESG report. The Statement Availability Report table provides columns for your total statements, statements included in your calendarized ESG report, statements excluded from your calendarized ESG report, and the statements included percentage. DataHub strives for 95% or more of your statements to be available in your ESG report, and audits prevent statements with data quality issues from polluting your meters’ total usage values. For the remaining statements, the utility provider’s statements report data that is inconsistent, unintelligible, or filled with gaps. Proration & Inference features address these data inconsistencies by prorating account level charges to the meter level and by inferring the meter total usage when the meter’s previous reading and current reading are available on the statement while the meter’s total usage is not printed. You can learn more about Proration & Inference features on this feature deep dive. Arcadia will continue to enhance this logic to solve additional edge cases over time.
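As a rough illustration of the two ideas in this paragraph, the Python sketch below computes a statements included percentage and infers a usage total from two readings. Both functions are hypothetical. In particular, the inference formula (reading difference times a multiplier) is an assumption made for illustration and may not match Arcadia's actual logic.

```python
def statements_included_percentage(total_statements, included_statements):
    """Share of statements that made it into the calendarized report, in percent."""
    if total_statements <= 0:
        raise ValueError("total_statements must be positive")
    if not 0 <= included_statements <= total_statements:
        raise ValueError("included_statements must be between 0 and total_statements")
    return round(100.0 * included_statements / total_statements, 2)


def meets_target(percentage, target=95.0):
    """True when the included percentage reaches the 95% goal described above."""
    return percentage >= target


def infer_total_usage(previous_reading, current_reading, multiplier=1.0):
    """ASSUMPTION: usage is the reading difference scaled by the meter multiplier."""
    return (current_reading - previous_reading) * multiplier
```

For example, 192 included statements out of 200 gives 96.0, which meets the target, while 37 out of 40 gives 92.5, which does not.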

📘

Note

Because utility providers post statements on inconsistent schedules, Arcadia recommends that you wait until at least 45 days after the last day of the month, quarter, or year to ensure that all meter data is available in your respective report.
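If you schedule report pulls programmatically, the 45-day recommendation can be turned into a date calculation. The Python sketch below is one way to do it. The function names are hypothetical and it assumes calendar months, calendar quarters, and calendar years.

```python
import calendar
from datetime import date, timedelta


def last_day_of_period(year, month=None, quarter=None):
    """Last day of a calendar month, calendar quarter, or (with neither) the year."""
    if month is not None and quarter is not None:
        raise ValueError("pass month or quarter, not both")
    if month is not None:
        if not 1 <= month <= 12:
            raise ValueError("month must be 1-12")
        end_month = month
    elif quarter is not None:
        if not 1 <= quarter <= 4:
            raise ValueError("quarter must be 1-4")
        end_month = quarter * 3
    else:
        end_month = 12
    last_day = calendar.monthrange(year, end_month)[1]
    return date(year, end_month, last_day)


def earliest_recommended_pull(period_end, wait_days=45):
    """First date that is at least `wait_days` past the end of the period."""
    return period_end + timedelta(days=wait_days)
```

For February 2024, `last_day_of_period(2024, month=2)` is February 29 and the earliest recommended pull date is April 14, 2024.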

Prerequisites

Enable Proration and Inference Features

If you are a new customer who signed on or after March 25th, 2024, proration and inference features are enabled by default. If you are a legacy customer, your initial required step is to ensure that proration and inference features are enabled for your organization.

Onboard Utility Credentials and Utility Files onto Plug

As a second required step, you will onboard your utility data onto Plug. You will submit utility credentials through Plug’s Connect experience or Create Credential API endpoint. If you do not have utility credentials and instead have the source statement PDFs, you will onboard these statements through Plug’s Bill Uploader module or Add File API endpoint.

Create Sites and Assign Meters to Sites

In order to group your meters by site ID in DataHub, Arcadia recommends that you create Sites for your relevant geographical locations and map utility meters to the site containers to further organize your utility data in Plug.


Data Access Options

DataHub provides access to your utility data through two new ingestion options: zipped CSV files delivered to your SFTP server or access to a Snowflake Direct Share to query the data in Snowflake directly.

SFTP Delivery

In the SFTP option, you must set up your own SFTP server. Upon initial SFTP setup, you receive your selected calendarized table (e.g. monthly, quarterly, or annual) with aggregated meter level utility data for all statements, along with the Statement Availability Report. You then receive both files again on the first of each month moving forward. For example, with a quarterly report you will receive two files on your SFTP server: datahub_quarterly_usages_02_21_2024-10_13_15.csv.gz and datahub_statement_availability_report_02_21_2024-10_13_15.csv.gz.

Arcadia recommends that you use the most recent file to ensure that your statement and usage data is up to date. You may set up all three calendarized tables as SFTP deliveries if your use cases require monthly, quarterly, and annual calendarized meter usage data. In SFTP delivery files, null values are represented as ,"\n", for VARCHAR-like data types and as ,\\n, for NUMBER-like data types, while ,"", represents an empty string value.
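To pick the most recent delivery for each table, you can parse the timestamp embedded in the file name. The Python sketch below assumes the suffix follows the pattern MM_DD_YYYY-HH_MM_SS, inferred from the example file names above. The time portion of that pattern and the helper names are assumptions.

```python
import re
from datetime import datetime

# Matches e.g. datahub_quarterly_usages_02_21_2024-10_13_15.csv.gz
_DELIVERY_NAME = re.compile(
    r"^(?P<table>datahub_[a-z_]+?)_"
    r"(?P<stamp>\d{2}_\d{2}_\d{4}-\d{2}_\d{2}_\d{2})\.csv\.gz$"
)


def parse_delivery(filename):
    """Return (table_name, timestamp) or None if the name does not match."""
    match = _DELIVERY_NAME.match(filename)
    if match is None:
        return None
    stamp = datetime.strptime(match.group("stamp"), "%m_%d_%Y-%H_%M_%S")
    return match.group("table"), stamp


def latest_per_table(filenames):
    """Map each table name to its most recently stamped file name."""
    latest = {}
    for name in filenames:
        parsed = parse_delivery(name)
        if parsed is None:
            continue  # Skip files that are not DataHub deliveries.
        table, stamp = parsed
        if table not in latest or stamp > latest[table][0]:
            latest[table] = (stamp, name)
    return {table: name for table, (_, name) in latest.items()}
```

Parsing the timestamp matters because the month comes first in the file name, so sorting the names as plain strings would order a December 2023 file after a January 2024 file.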

SFTP Setup

To set up your SFTP connection with Arcadia, access the Plug Dashboard, navigate to the DataHub Settings page, and click the ‘Set up SFTP delivery’ button. You need to be an admin user to access this dashboard page.

In the modal, you need to share the information below. Please work with your IT or infrastructure department to configure and provide these details. Please work with your Arcadia representative for handling any issues around initial setup.

  • SFTP server hostname/IP
    • e.g. 192.158.1.38 or sftp.example.com
  • SFTP server public key. This key must be in OpenSSH’s authorized_keys format.
    • e.g. ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDXdVM9rD7wE1vBcDpfJszq3Hs7F...
    • If you do not have an existing key pair, you can generate a new private / public key pair in this format with the following command:
    ssh-keygen -t rsa -b 2048
    
    • The .pub file that is generated from the above command contains the OpenSSH format public key that you should share in the SFTP setup modal. On macOS, you can optionally copy it to your clipboard by running the following command:
    pbcopy < ~/.ssh/<FILENAME_OF_RSA_KEY>.pub
    
    • The private key file (private_key.pem or the file you specified when running the command) is the file you should keep secret and NOT share with Arcadia. This private key file should be used to configure your SFTP server.
  • The username of the user you have set up for Arcadia to access your SFTP server.
    • e.g. arcadia
    • Note that you will need to associate Arcadia’s OpenSSH format public key (2048-bit RSA) with this user so that Arcadia can authenticate to your server. Most SFTP servers have an authorized_keys file that you should add this key to. This key can be found under the “Arcadia public key value” field in the SFTP setup modal on your Plug Dashboard.
  • Full path in the above user’s directory where you would like your Arcadia data to be stored.
    • e.g. / or /datahub/reports
    • You should create this folder and provide the created user write access to it before submitting this information to Arcadia.
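Before pasting a key into the SFTP setup modal, you may want to confirm that it has the shape of an OpenSSH authorized_keys RSA entry. The Python sketch below is a hypothetical pre-check, not part of any Arcadia tooling. It only verifies the `ssh-rsa` prefix, that the body is valid base64, and that the decoded data begins with the length-prefixed string `ssh-rsa`. It does not prove that the key is usable.

```python
import base64
import binascii
import struct


def looks_like_openssh_rsa_public_key(line):
    """Shape check for a line such as 'ssh-rsa AAAAB3Nza... comment'."""
    parts = line.strip().split()
    if len(parts) < 2 or parts[0] != "ssh-rsa":
        return False
    try:
        blob = base64.b64decode(parts[1], validate=True)
    except (binascii.Error, ValueError):
        return False
    if len(blob) < 4:
        return False
    # The decoded key starts with a 4-byte big-endian length, then the key type.
    (name_length,) = struct.unpack(">I", blob[:4])
    return blob[4:4 + name_length] == b"ssh-rsa"
```

A PEM-formatted private key (a file beginning with `-----BEGIN`) fails this check, which is a useful guard against sharing the wrong file.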

Snowflake Data Share

For additional information on Snowflake Data Shares, review Snowflake’s overview guide. For the Snowflake Data Share option, your Snowflake account must be hosted on AWS in the us-east-1 region. After receiving access to the private Snowflake Data Share, you can query in SQL or Python against the full dataset in all four tables. The Snowflake Data Share table names are datahub_monthly_usages, datahub_quarterly_usages, datahub_annual_usages, and datahub_statement_availability_report.

Snowflake Data Share Setup

To access DataHub’s private Snowflake Data Share, your Snowflake account must be hosted in the AWS us-east-1 region. Once you confirm that your Snowflake account is hosted on this cloud provider and in this region, access the Plug Dashboard, navigate to the DataHub Settings page, and complete the Snowflake Data Share form. You need to be an admin user to access this dashboard page.

In your Snowflake environment, run the command below and enter your current organization name, current account name, and account locator into the form.

select current_organization_name(), current_account_name(), current_account();

Once Arcadia sets up your Snowflake account for the Data Share, the Arcadia database is available in the ‘Ready to Get’ section. In Snowflake’s Snowsight UI, you will:

  1. Sign in to Snowsight
  2. Select Data » Private Sharing
  3. Select the Shared with You tab
  4. In the Ready to Get section, select the share that you want to create a database for
  5. Set a database name and the roles that are permitted to access the database
  6. Select Get Data

Once the Data Share is available, you can query all the data in one of the four tables with:

SELECT * FROM datahub.shares.<table_name>;
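If you build this query from Python, it helps to validate the table name against the four documented tables before interpolating it into SQL. The sketch below only constructs the query string. The function name, the `limit` option, and the default `datahub` database and `shares` schema (taken from the example above) are assumptions, and the database name may differ depending on what you chose when creating the database from the share.

```python
import re

DATAHUB_TABLES = frozenset({
    "datahub_monthly_usages",
    "datahub_quarterly_usages",
    "datahub_annual_usages",
    "datahub_statement_availability_report",
})

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def build_select_all(table_name, database="datahub", schema="shares", limit=None):
    """Return a SELECT * statement for one of the four DataHub tables."""
    if table_name not in DATAHUB_TABLES:
        raise ValueError(f"unknown DataHub table: {table_name!r}")
    for identifier in (database, schema):
        if not _IDENTIFIER.match(identifier):
            raise ValueError(f"invalid identifier: {identifier!r}")
    sql = f"SELECT * FROM {database}.{schema}.{table_name}"
    if limit is not None:
        if not isinstance(limit, int) or limit <= 0:
            raise ValueError("limit must be a positive integer")
        sql += f" LIMIT {limit}"
    return sql + ";"
```

The resulting string can be passed to whichever Snowflake client you already use. Restricting table names to a fixed set keeps arbitrary input out of the SQL text.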



DataHub - Standardize Unit of Measure Features

Standardizing units of measure on meter usages enables DataHub to better serve the ESG use case by providing more accurate usage totals in the same unit of measure. Arcadia derives energy conversion factors from a combination of Energy Star’s US conversions table and NIST conversion factors, and assumes that gallons are measured in US liquid gallons. For conversions that are volume to energy (e.g. natural gas) or mass to energy (e.g. steam), Arcadia leverages Energy Star’s conversion factors. Arcadia now supports standardizing units of measure where possible for the following service types:

  • Electric/Lighting: kWh for consumption and kW for demand
  • Natural Gas: therms
  • Water/Sewer/Irrigation: Gallons

DataHub does not support unit of measure conversions for reactive max measured demand and reactive total consumption values. This includes the following units of measure: kvar, kva, kvarh, kvah, mvarh, undefined, unit, and days.

DataHub also does not support standardizing the following units of measure: horsepower, residential cooling hecta liters, nm3/h, sm3, m3/h, kgh, and kg.
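The standardization described above can be sketched as a lookup of conversion factors plus a list of units that are left unconverted. The Python example below is illustrative only. The unit spellings and the handful of factors shown are assumptions based on general unit definitions (1 MWh = 1,000 kWh, 1 dekatherm = 10 therms, 1 US liquid gallon = 3.785411784 liters), not Arcadia's actual conversion table.

```python
# Illustrative factors only; not Arcadia's conversion table.
TO_STANDARD = {
    "kwh": ("kWh", 1.0),
    "wh": ("kWh", 0.001),
    "mwh": ("kWh", 1000.0),
    "therms": ("therms", 1.0),
    "dekatherms": ("therms", 10.0),
    "gallons": ("gallons", 1.0),
    "liters": ("gallons", 1.0 / 3.785411784),
}

# A subset of the units listed above as not standardized.
UNSUPPORTED = frozenset({
    "kvar", "kva", "kvarh", "kvah", "mvarh", "undefined", "unit", "days",
    "horsepower", "nm3/h", "sm3", "m3/h", "kgh", "kg",
})


def standardize(value, unit):
    """Return (standardized_value, standardized_unit), or None if unsupported."""
    key = unit.strip().lower()
    if key in UNSUPPORTED or key not in TO_STANDARD:
        return None
    standard_unit, factor = TO_STANDARD[key]
    return value * factor, standard_unit
```

Returning `None` for unsupported units mirrors the idea that the original usage and unit of measure stay available even when no standardized value can be produced.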


DataHub - Calendarization Features

To calendarize meter usages, Arcadia’s system compares the meter’s statements over time to identify whether the utility provider prints full or partial measurement period start and end dates. This determines the number of calendarized days in the period so that the daily usage averages are calculated accurately. ‘Full’ indicates that the start or end day is a full 24-hour day within a single statement. ‘Partial’ means that the utility provider reads the meter during business hours, and therefore the period start or end date is not a full 24-hour day. Partial start and end dates are shared between two or more statements, and Arcadia splits partial period dates evenly when calculating the daily usage values.

The system rolls up these daily usage values into monthly, quarterly, or annual aggregated meter usage data. For the first statement available for a meter, Arcadia’s system assumes that the meter’s period start date is full. For the meter’s most recent statement, Arcadia’s system determines the mode (the most common value) from the three most recent statements and assigns that value to the period end date as either full or partial. For full period start and end dates, the period’s days are the date difference between the two dates including the last day in the period. For partial start and end dates, the period’s days are the date difference between the two dates including the last day in the period minus one.
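The day-count rules above can be expressed as a short Python sketch. The function names are hypothetical, and the sketch covers only the two cases the text describes: both dates full, or both dates partial. Periods that mix a full date with a partial date are not described here and are not handled.

```python
from datetime import date


def period_days(start, end, partial):
    """Days in a measurement period under the full/partial rules above."""
    if end < start:
        raise ValueError("end date must not precede start date")
    inclusive_days = (end - start).days + 1  # Date difference including the last day.
    return inclusive_days - 1 if partial else inclusive_days


def daily_average(usage, start, end, partial):
    """Average usage per calendarized day for one statement period."""
    days = period_days(start, end, partial)
    if days <= 0:
        raise ValueError("period has no calendarized days")
    return usage / days
```

For a statement running from January 15, 2024 to February 14, 2024, the period is 31 days when both dates are full and 30 days when both are partial, so 930 kWh averages to 30 kWh or 31 kWh per day respectively.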


Data Dictionary

The data dictionary includes all four tables and their fields. The tables available are Calendarized Monthly Usages, Calendarized Quarterly Usages, Calendarized Annual Usages, and Statement Availability Report. For each table, the field details cover the field name, an example value, whether the field can be null, and the field description.

Measured Usages

Measured usage is the metered and measured amount of energy on a utility statement that is used for billing purposes. Utility statements leverage the measured usage for a meter and apply the meter’s readings, multiplier, constant factor, and conversion factor to calculate charge items. Measured usage implies actively metered events. Any meter volumes that are estimated due to utility provider logistical problems are still considered a measured usage event.

Measured usage and cited usage are mutually exclusive. Importantly, for Arcadia, measured usage is the foundation for ESG and carbon accounting (Scope 2). Any usage item designated as measured declares that observed consumption or demand events occurred in the relevant period. Any usage metrics that are solely functions of the tariff or customer history and are not dynamically measured in the current statement are not considered measured usage.

Cited Usages

Cited usage implies that the usage amount is used for billing purposes, but this usage does not represent a quantity of energy (e.g. real consumption or demand) that was actually used or measured by a meter. If a meter only contains cited usages, an audit is triggered and the meter and its cited usages are not stored in the ESG tables. Utility providers tend to cite usage elements as justification for each of the charges assessed in the monthly invoices. Any usage metrics that are functions of the tariff or your usage history and are not dynamically measured in the current statement are considered cited usage. Ratchet demands or annual usage totals are examples of cited usages. Cited usages do impact fees and billing, but these usages do not represent current or actual consumption of the meters.


Recipes