🪄Methodology
At Echo Analytics, we take pride in our data workflow, which is built on industry best practices and on our own algorithms rooted in machine learning.
Ingestion: When it comes to ingesting POI data for our location intelligence, we employ a multi-sourcing approach. There are two primary reasons for this:
Data validation: Cross-referencing data from multiple sources lets us validate it. Inconsistent or conflicting information can be identified and resolved, ensuring that the data we provide is reliable and trustworthy.
Dealing with complexity: POI data can be intricate. Businesses open, close, or relocate, and data types vary. Multi-sourcing helps us stay up to date and provide a diverse dataset.
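Cross-referencing can be pictured as a per-attribute vote across sources. The sketch below is a simplified illustration that assumes a plain majority rule. The `reconcile` function and the source names are hypothetical, and this is not necessarily how our pipeline resolves conflicts.

```python
from collections import Counter
from typing import Dict, Optional, Tuple


def reconcile(values_by_source: Dict[str, Optional[str]]) -> Tuple[Optional[str], bool]:
    """Cross-reference one attribute (e.g. a phone number) reported by several sources.

    Returns (consensus_value, has_conflict). The consensus is the value reported by
    the most sources; on a tie no value is chosen, so the record can be reviewed.
    """
    # Ignore sources that did not report the attribute at all.
    reported = [v for v in values_by_source.values() if v]
    if not reported:
        return None, False
    ranked = Counter(reported).most_common()
    has_conflict = len(ranked) > 1
    top_value, top_count = ranked[0]
    if has_conflict and ranked[1][1] == top_count:
        return None, True  # tie: sources disagree and there is no majority
    return top_value, has_conflict


# Hypothetical sources reporting the phone number of the same place.
print(reconcile({"source_a": "555-789-0123", "source_b": "555-789-0123", "source_c": "555-789-9999"}))
# → ('555-789-0123', True)
```

The conflict flag is returned even when a majority exists, so disagreements can still be logged or reviewed.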
Deduplication: Duplicates are common in geospatial data, and the problem is magnified when we integrate data from multiple sources and refresh it regularly. Duplicates are a normal part of our data landscape, but their impact should not be underestimated.
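For example, each pair of records below describes the same place:
POI name | Phone number | Address | Latitude | Longitude |
---|---|---|---|---|
Joe's Pizza | 555-123-4567 | 123 Main St, Suite 101 | 40.7128 | -74.0061 |
Pizzeria | 555-123-4567 | 123 Main St, Suite 101 | 40.7130 | -74.0067 |
Central Park | 212-555-4321 | 59th St & 5th Ave | 40.7648 | -73.9720 |
Central Park | | 59th Street & Fifth Ave | 40.7651 | -73.9723 |
Coffee | 555-789-0123 | 456 Java Ave | 34.0522 | -118.2437 |
Coffee Haven | 555-789-0123 | 456 Java Avenue | 34.0522 | -118.2436 |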
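The sketch below flags candidate duplicates among six example records: Joe's Pizza/Pizzeria, two Central Park entries, and Coffee/Coffee Haven. It assumes a simple rule. Two records within 100 m of each other are flagged if they share a phone number or have near-identical names. The 100 m and 0.8 thresholds, the use of `difflib` string similarity, and the function names are illustrative assumptions, not our production deduplication model.

```python
import math
from difflib import SequenceMatcher
from itertools import combinations

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """Similarity ratio in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_candidate_duplicate(p, q, max_dist_m=100.0, min_name_sim=0.8):
    # Records far apart are never flagged, whatever their names say.
    if haversine_m(p["lat"], p["lon"], q["lat"], q["lon"]) > max_dist_m:
        return False
    # Nearby records match if they share a non-empty phone number or have very similar names.
    same_phone = bool(p["phone"]) and p["phone"] == q["phone"]
    return same_phone or name_similarity(p["name"], q["name"]) >= min_name_sim


POIS = [
    {"name": "Joe's Pizza", "phone": "555-123-4567", "lat": 40.7128, "lon": -74.0061},
    {"name": "Pizzeria", "phone": "555-123-4567", "lat": 40.7130, "lon": -74.0067},
    {"name": "Central Park", "phone": "212-555-4321", "lat": 40.7648, "lon": -73.9720},
    {"name": "Central Park", "phone": "", "lat": 40.7651, "lon": -73.9723},
    {"name": "Coffee", "phone": "555-789-0123", "lat": 34.0522, "lon": -118.2437},
    {"name": "Coffee Haven", "phone": "555-789-0123", "lat": 34.0522, "lon": -118.2436},
]


def find_duplicate_pairs(pois):
    """Return index pairs (i, j), i < j, of records flagged as candidate duplicates."""
    return [(i, j) for (i, p), (j, q) in combinations(enumerate(pois), 2) if is_candidate_duplicate(p, q)]


print(find_duplicate_pairs(POIS))  # → [(0, 1), (2, 3), (4, 5)]
```

Comparing every pair is quadratic. At real scale, candidates would first be grouped by a spatial key (such as a grid or hexagon cell) so that only nearby records are compared.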
At Echo Analytics, one of our primary objectives is to provide POIs with a comprehensive, high-value set of attributes. We achieve this by harnessing data from diverse sources.
Thanks to our machine learning algorithms, we can also deliver precise additional insights into brands and taxonomy.
Brands: Determining whether a given POI is affiliated with a brand is a complex problem. Using the information collected about both the POI and the brands, we establish a link between them. Our system then uses this link to enrich the POI with key brand information, such as website links, stock tickers, and parent organization details.
Categories: Understanding an industry landscape through POIs can be complex, especially given the many existing classification standards. To simplify this, we've built our own taxonomy aligned with established industry standards such as NAICS. This keeps our data comprehensive while making it accessible and adaptable for a wide range of applications.
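Brand linking and taxonomy alignment both amount to attaching reference data to a POI. The sketch below assumes the simplest possible version. An exact lookup of the normalized POI name in a brand catalog stands in for the machine learning matching described above, and a fixed table maps internal categories to NAICS codes. `BRAND_CATALOG`, its placeholder values, the internal category keys, and `enrich` are hypothetical. Only the two NAICS codes shown (722511 Full-Service Restaurants, 722515 Snack and Nonalcoholic Beverage Bars) are real.

```python
import re

# Hypothetical brand catalog keyed by normalized brand name; all values are placeholders.
BRAND_CATALOG = {
    "coffee haven": {
        "brand_name": "Coffee Haven",
        "website": "https://coffeehaven.example",
        "stock_ticker": None,  # placeholder: no listed ticker
        "parent_organization": "Example Holdings",
    },
}

# Hypothetical internal categories mapped to NAICS codes.
CATEGORY_TO_NAICS = {
    "coffee_shop": "722515",  # Snack and Nonalcoholic Beverage Bars
    "full_service_restaurant": "722511",  # Full-Service Restaurants
}


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9\s]+", "", name.lower()).split())


def enrich(poi: dict) -> dict:
    """Return a copy of the POI with brand attributes and a NAICS code appended when known."""
    enriched = dict(poi)
    # Brand linking: an exact lookup stands in for a learned matching model.
    brand = BRAND_CATALOG.get(normalize_name(poi["name"]))
    if brand is not None:
        enriched.update(brand)
    # Taxonomy alignment: attach the NAICS code for the internal category, if mapped.
    naics = CATEGORY_TO_NAICS.get(poi.get("category", ""))
    if naics is not None:
        enriched["naics_code"] = naics
    return enriched


print(enrich({"name": "Coffee Haven!", "category": "coffee_shop"})["naics_code"])  # → 722515
```

Exact lookup misses variants such as "Coffee Haven #12" or misspellings. That gap is what a learned matcher is meant to close.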
How do we determine a place's uniqueness?
Echo_POI_ID: A location can have multiple names, addresses, or IDs across different datasets. Sometimes this information can be messy, change over time, or not be unique (e.g., multiple businesses at one address).
Example: Imagine a popular coffee shop at 123 Main Street whose name has changed over the years:
Originally, it was named “Cafe Main”.
Later, it was renamed “Cafe 123”.
Without a standardized identifier, it could be challenging to track these changes and connect the data accurately.
We therefore need a unique and persistent way to identify a place over time. Our POI IDs are designed with the following key characteristics:
Uniqueness: The ID is a readable sequence of characters that serves as a distinctive identifier for a POI. This uniqueness is essential to avoid any confusion when working with various data sources.
Consistency: The ID remains consistent over time and across different updates from various sources. This ensures that it remains reliable regardless of changes in data or source variations.
Independence: The ID is not influenced by “Mix and Match” rules or deduplication processes. It stands independently, unaffected by these operations, providing a stable reference point.
Because we prefer using industry standards to simplify data adoption for our customers, we decided to implement a solution inspired by Placekey.
Why not Placekey? Placekey has limited coverage worldwide, whereas we currently offer POIs in 62 countries.
Our Echo_POI_ID consists of two distinct parts:
“What”: Found to the left of the '@' symbol, this part encodes the place's name and associated address. It helps us identify the POI.
“Where”: Found to the right of the '@' symbol, this part lists the H3 hexagons, at several resolutions (e.g., 2, 5, and 10), that contain the place. This spatial information allows us to locate a POI precisely.
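The sketch below shows a what@where identifier. It assumes the "what" part is a truncated SHA-256 hash of the normalized name and address, and the "where" part joins precomputed H3 cell indexes, coarse to fine, with hyphens. The hashing scheme, hash length, separator, and function names are hypothetical; the actual Echo_POI_ID encoding is not described here. The H3 cells are passed in as strings so the sketch has no third-party dependency. With the `h3` Python package (v4), each could come from `h3.latlng_to_cell(lat, lng, res)`.

```python
import hashlib
import re
from typing import List, Tuple


def _normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def what_part(name: str, address: str, length: int = 10) -> str:
    """Hypothetical 'what' part: truncated SHA-256 of the normalized name and address."""
    key = f"{_normalize(name)}|{_normalize(address)}".encode("utf-8")
    return hashlib.sha256(key).hexdigest()[:length]


def make_poi_id(name: str, address: str, h3_cells: List[str]) -> str:
    """Build 'what@where'; h3_cells are H3 indexes ordered coarse to fine (e.g. res 2, 5, 10)."""
    if not h3_cells:
        raise ValueError("at least one H3 cell is required")
    return f"{what_part(name, address)}@{'-'.join(h3_cells)}"


def parse_poi_id(poi_id: str) -> Tuple[str, List[str]]:
    """Split an ID back into its 'what' part and its list of H3 cells."""
    what, sep, where = poi_id.partition("@")
    if not sep:
        raise ValueError("missing '@' separator")
    return what, where.split("-")
```

Because the "what" part here is derived from the name, renaming "Cafe Main" to "Cafe 123" would change it. Keeping the ID persistent across such changes needs an extra matching step, which this sketch leaves out.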
Addresses often come in myriad formats due to variations in how people and organizations write addresses. This diversity can include different abbreviations, spellings, and conventions, making it challenging to establish a consistent format. Address normalization offers several benefits, including:
Data Quality: It enhances our ability to comprehend and assess addresses, enabling us to rectify errors, standardize formats, and eliminate irregularities. This is fundamental for ensuring dependable location-based analysis.
Data Integration: When working with POI data from multiple sources, each source may have its own address format and conventions. Normalizing addresses standardizes the data, making it easier to integrate and compare information from various sources.
To give our customers better insights, we implemented the first version of our address normalization in December 2022.
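The sketch below shows token-level address normalization. It assumes a lookup-table approach loosely in the spirit of USPS Publication 28 abbreviations. `CANONICAL` is a hypothetical, heavily truncated table, and our own normalization is not necessarily implemented this way. It reconciles address variants such as "59th St & 5th Ave" and "59th Street & Fifth Ave".

```python
import re

# Hypothetical subset of canonical forms; a real table is far larger.
CANONICAL = {
    "street": "st", "str": "st",
    "avenue": "ave", "av": "ave",
    "boulevard": "blvd",
    "road": "rd",
    "suite": "ste",
    "and": "&",
    "first": "1st", "second": "2nd", "third": "3rd", "fourth": "4th", "fifth": "5th",
}


def normalize_address(raw: str) -> str:
    """Lowercase, tokenize, and map each token to its canonical form."""
    # Keep alphanumeric tokens and '&'; commas and other punctuation are dropped.
    tokens = re.findall(r"[a-z0-9]+|&", raw.lower())
    return " ".join(CANONICAL.get(tok, tok) for tok in tokens)


print(normalize_address("59th Street & Fifth Ave"))  # → 59th st & 5th ave
```

A lookup table alone cannot resolve ambiguous tokens such as "St" (Street or Saint). Handling these requires parsing the address into components first.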
What does this bring to our POIs? Improved completeness of geographic attributes:
ZIP code: 90%
Admin boundary 1: 90%
Admin boundary 2: 100%
Admin boundary 3: 90%
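The percentages above can be read as completeness rates: the share of POIs in which an attribute is populated. The sketch below computes that rate, assuming empty strings and missing values both count as unpopulated. The field names and sample records are hypothetical.

```python
from typing import Dict, Iterable, List


def completeness(records: List[dict], fields: Iterable[str]) -> Dict[str, float]:
    """Percentage (0-100) of records in which each field is populated."""
    total = len(records)
    result = {}
    for field in fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        result[field] = 100.0 * filled / total if total else 0.0
    return result


sample = [
    {"zip_code": "10001", "admin_1": "NY", "admin_2": "New York County"},
    {"zip_code": "", "admin_1": "NY", "admin_2": "New York County"},
    {"zip_code": "90012", "admin_1": None, "admin_2": "Los Angeles County"},
    {"zip_code": "90012", "admin_1": "CA", "admin_2": "Los Angeles County"},
]
print(completeness(sample, ["zip_code", "admin_1", "admin_2"]))
# → {'zip_code': 75.0, 'admin_1': 75.0, 'admin_2': 100.0}
```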