KQL and Azure Data Explorer – Reference Guide

Recently I’ve been learning a new database query language, KQL which stands for “Kusto Query Language”. It is the language used by Azure Data Explorer, a tool in Microsoft’s Azure Cloud that helps query Kusto Databases. These databases run on server clusters so they are mighty as they can query up to petabytes of information in a very short time. I like the simplicity of the syntax, I wish I could use it to query SQL databases as the language is straightforward to read.

The KQL language is used mainly to analyze information coming from logs, which is huge and is constantly being streamed into these Kusto databases, but what caught my attention was that these databases are now part of the Microsoft Fabric suite of data analytics tools under the “Real Time Analytics” experience. This means that I can analyze information stored in parquet files stored in the Lakehouse or the Datawarehouse and via a direct link, I can connect it to the Kusto database in Fabric and analyze it from there. I have seen how easy it is to not only query these Kusto databases but also its ability to produce quick visual analysis for information of any size in a very efficient way.

This is a reference guide that I did for myself so I can remember how the main operators and functions work

Table of Contents

Basic Operators

OperatorExplanationExample
ExplainWhen placed on top an SQL query, it returns the equivalent of the code in KQL Explain
SELECT
COUNT(*)
FROM
SalesTable
projectWhen placed on top of a SQL query, it returns the equivalent of the code in KQL. It was the following variant:

project-away: (The select columns to exclude)

There is a variant of this function, that is used with the function series_stats(), it displays statistics for series in a table, with a column for each statistic

print x=dynamic([23, 46, 23, 87, 4, 8, 3, 75, 2, 56, 13, 75, 32, 16, 29]) | project series_stats(x)
SalesTable
| project
extendCreates calculated columnsSalesTable
| extend custom = SalesAmount * 0.95
| project SalesAmount, custom
countIt returns the number of rows in a tableSalesTable
| count
take / limitit limits the number of rows. Either take or limit work in the same waySalesTable
| limit 10
whereit works as the WHERE operator in SQL to filter rows depending on the predicate. We have to use double equal (==)to determine the conditionalSalesTable
| where Country == ‘France’
contains / hasit works as the LIKE operator in SQL. The main difference is that CONTAINS can search for partial words, whereas HAS can only search for complete words. It is case-insensitive, and it has the following variants:

!has: (The contrary of has)
has_cs : (For case-sensitive queries)
!has_cs: (For case-sensitive queries)
has_any : has_any (“CAROLINA”, “DAKOTA”, “NEW”)
has_all: has_all (“cold”, “strong”, “afternoon”, “hail”)
SalesTable
| where Country has ‘Fra’
distinct it works as the DISTINCT operator in SQL to select the unique instances of an element contained in a columnSalesTable
| distinct Country
init works as the IN operator in SQL. Filters a record set for data with a case-sensitive string. It is case-sensitive and it has the following variants:

!in
in~ (case-insensitive)
!in~ (case-insensitive)
SalesTable
| where Country in (‘France’, ‘Canada’)
startswith / ends withFilters a record set for data with a case-insensitive string starting or ending sequence. It has the following variants:

!startswith
startswith_cs
!endswith
endswith_cs
SalesTable
| where Country startswith “Fra”
order by / sort byWorks in the same way as SQL, except that the default order is DESCENDINGSalesTable
| distinct Country
| order by Country asc
searchSearches a text pattern in specific or multiple tables and columns. It is case insensitive and it can use wildcards with “*” and regex expressions

search kind=case_sensitive “x”
search columName == “x”
search columName: “x”
search columnName “*x”
search columnName “*x*”
SalesTable
| search Country : “*Fran*”
serializeMarks that the order of the input row set. It is safe to use for window functions.SalesTable
| serialize
topReturns the top rows based on a columnPerf
| top 100 by TimeGenerated
top-hittersReturns an approximation for the most popular distinct values, or the values with the largest sum, in the input.SalesTable
| top-hitters 3 of  Country by SalesAmount
top-nestedPerforms hierarchical aggregation and value selection, similar to top-hitters. It can aggregate the rest of the aggregation groups under “others” if defined:

SalesTable
| top-nested 2 of Country with others =”Other countries” by  avg(SalesAmount)

SalesTable
| top-nested 3 of Country by avg(SalesAmount)
summarizeProduces a table that aggregates the content of the input table. It can work along with these other functions:

* arg_max() / arg_min()
SalesTable
| summarize arg_max(Country, SalesAmount)

* percentile() / percentiles()
SalesTable
| summarize percentile(SalesAmount, 50)

* take_any()
SalesTable
| summarize take_any(Country)

*bin()
SalesTable
| summarize sum(SalesAmount) by bin(SalesAmount, 1000)

*make_set()
SalesTable
| summarize make_set(Country)

*make_list()
SalesTable
| summarize make_list(Country)
SalesTable
| summarize TotalSales = sum(SalesAmount) by Country
betweenFilters a record set for data matching the values in an inclusive range.
between can operate on any numeric, datetime, or timespan expression
SalesTable
| where DateKey between (now() .. ago(1d) )
letsets a variable name equal to an expression, function, or view. It must end with an “;”let myVar = ‘Hector’;
print myVar
datatableReturns a table whose schema and values are defined in the query itself. The table is similar to a JSON table, but instead of separating the keyvalue pairs by “:”, it uses “,”let myTable = datatable (name:string , age: int)
[
    “Micho”, 4,
    “Candy”, 6
];
myTable
| where name == ‘Micho’
join Merge the rows of two tables to form a new table by matching the values of the specified columns from each table.

These are the most common types of joins:
kind = inner
kind = leftouter
kind = rightouter
kind = fullouter
SalesTable
| join kind=inner Customers on CustomerKey
| project CityName, SalesAmount
unionTakes two or more tables and returns the rows of all of them. If using withsource=ColumnName adds a table with the name of the source tables:

union withsource=SourceTable table1, table2
union table1, table2

pivotPivot returns the rotated table with specified columns (column1column2, …) plus all unique values of the pivot columns. Each cell for the pivoted columns will contain the aggregate function computation.SalesTable
| evaluate pivot(Country, sum(SalesAmount), Gender)
mv-expandExpands multi-value dynamic arrays or property bags into multiple records. Very useful for the analysis of time-seriesdatatable (a: int, b: dynamic)
[
    1, dynamic([10, 20]),
    2, dynamic([‘a’, ‘b’])
]
| mv-expand b
externaldataReturns a table whose schema is defined in the query itself, and whose data is read from an external storage artifact, such as a blob in Azure Blob Storage or a file in Azure Data Lake Storage.let Orders = externaldata (OrderDate: datetime , Fruit: string , Weight: int, Customer: string , Sell: int)
[@”urlToExternalFile”]
with (format=”csv”);
Orders
reduceGroups a set of strings together based on value similarity.StormEvents
| reduce by State

Basic Functions

FunctionExplanationExample
strcat()Concatenates stringsprint strcat(“Hello”, ” “, “Micho”)
count()Counts the number of records per summarization groupSalesTable
| summarize count() by Country
isnull() / isempty()Checks if the record is null or empty

There are other functions that are a variant of these functions:

isnotempty()
isnotnull()
SalesTable
| where isnull(Country) and isempty(Country)
now()Returns the current UTC timeprint now()
ago()Subtracts the given timespan from the current UTC time

It accepts timespans in the following format:

1d
1h
1m
1s
1ms
SalesTable
| where DateKey < ago(1d)
datetime()Converts a date into datetime formatprint datetime(1974-02-02)
datetime_part()Extracts the requested date part as an integer value.

There are two variants of this function, that pick the month and year:

getmonth()
getyear()
print datetime_part(“Year”, now())
datetime_diff()Calculates the number of the specified periods between two datetime values.print datetime_diff(“Year”,now(), datetime(1974-02-02))
datetime_add()Calculates a new datetime from a specified period multiplied by a specified amount, added to, or subtracted from a specified datetime.print datetime_add(“Year”,1,now())
format_datetime()Formats a datetime according to the provided format.print format_datetime(now(), “yyyy-MM-dd”)
startofday() / endofday()Returns the start of the day containing the date. This function has other functions that provide similar functionality:

startofmonth()
endofmonth()
startofweek()
endofweek()
startofyear()
endofyear()
print startofweek(now())
iff()Returns the value of then if if evaluates to true, or the value of else otherwise.SalesTable
| extend Continent = iff(Country in (‘Germany’, ‘France’, ‘United Kingdom’), “Europe”, “Not Europe”)
| project Continent, Country
| limit 100
case()Simlar to iff() but we can expand different conditional scenariosSalesTable
| extend Continent =
case(Country == “France”, “Europe”,
        Country == “Germany”, “Europe”,
        Country == “UK”, “Europe”,  
“Not Europe”)
| project Continent, Country
| limit 100
split()Takes a string and splits it into substrings based on a specified delimiter,  returning the substrings in an array.let myString = “a,b,c,d”;
print split(myString, “,”)
parse()it breaks data into multiple columnslet myTable = datatable (student:string )
[‘Name:Hector, Age:49, Country:Mexico’];
myTable
| parse student with “Name:” Name “Age:” Age “Country:” Country
| project Name, Age, Country
parse_json()Interprets a string as a JSON value and returns the value as dynamiclet myVar='{“a”:123, “b”:456}’;
print parse_json(myVar)
dynamic()Function to treat value as dynamicprint o=dynamic({“a”:123, “b”:”hello”, “c”:[1,2,3], “d”:{}})
| extend a=o.a, b=o.b, c=o.c, d=o.d
tostring()Converts the input to a string representation. There are other similar functions to convert strings or numbers which are self-explanatory:

todecimal()
toint()
tolower()
toupper()
toscalar()
let myVar=123;
print tostring(myVar)
round()Returns the rounded number to the specified precisionlet myValue = round(1234567,2);
print myValue
range()Generates a single-column table of values.range timeValues from bin(now(), 1h) – 23h to bin(now(), 1h) step 1h
repeat()Generates a dynamic array containing a series comprised of repeated numbers.print repeat(1,5)
pack_array()Packs all input values into a dynamic array.range x from 1 to 3 step 1
| extend y = x * 2
| extend z = y * 2
| project pack_array(x, y, z)
row_rank_dense()This is part of the window functions group, and they work similarly to the ones in SQL. Returns the current row’s dense rank in a serialized row set.SalesFact
| summarize TotalSales = sum(SalesAmount) by DateKey
| order by TotalSales desc
| extend BestSalesRank = row_rank_dense(TotalSales)
row_cumsum()This is another window function that calculates the cumulative sum of a column in a serialized row set.SalesFact
| summarize TotalSales = sum(SalesAmount) by DateKey
| order by DateKey asc
| extend RunningTotal = row_cumsum(TotalSales, getmonth(DateKey) != getmonth(prev(DateKey)))
| order by DateKey asc, RunningTotal
sequence_detect()This is another plugin. Detects sequence occurrences based on provided predicates and as with any other plugin, it is invoked with the evaluate operatorStormEvents
| evaluate sequence_detect(
    StartTime,
    3d,
    3d,
    Rain = (EventType == “Heavy Rain”),
    Flood = (EventType == “Flood”),
    State)
convert series of functionsConverts data from one type of units of measure to another

convert_angle()
convert_energy()
convert_force()
convert_length()
convert_mass()
convert_volume()
convert_speed()
convert_temperature()
print result = convert_temperature(1.2, ‘Kelvin’, ‘DegreeCelsius’)

Management Commands

The functions in this next table are functions that are used to perform management tasks on the Cluster, Databases or Tables

CommandExplanationExample
.show clusterReturns a table with details on the nodes.show cluster
.show diagnosticsReturns a table with the health status of the cluster.show diagnostics
.show operationsReturns a table with the list of commands executed in the last 15 days.show operations
.show databases detailsReturns a table with the details of the databases in the cluster.show databases details
.show tablesReturns a set that contains the specified tables or all tables in the database..show tables
.show tableReturns a table with details about the specified table

There is a useful variant to get the schema of the table in a JSON object

.show table SalesFact schema as json
.show table SalesFact
.create tableCreates a table in a similar way as in SQL.create table myTable (
    id: int,
    Name: string,
    Country: string  
)
.drop tableDrops a table in a similar way as in SQL.drop table myTable
.show commands-and-queriesReturns a table with admin commands and queries on the cluster that have reached a final state. These commands and queries are available for 30 days.

There are variants of this command that produce a similar result:

.show commands
.show queries
.show commands-and-queries
.show ingestion failuresReturns all recorded ingestion failures

There is a variant that produces the same result but for streaming:

.show streamingingestion failures
.show ingestion failures

Visualization

The charts are produced with the “render” operator, which must be the last operator in the query and can only be used with queries that produce a single tabular data stream result.

More details about the render operator can be found in the Microsoft Documentation https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/renderoperator?pivots=azuredataexplorer

The basic syntax of render is the following:

Types of charts

These charts are produced after typing the name after the render operator. Some of these have parameters.

  • anomalychart
  • areachart
  • barchart
  • card
  • columnchart
  • ladderchart
  • linechart
  • piechart
  • pivotchart
  • scatterchart
  • stackedareachart
  • table
  • timechart
  • timepivot
  • treemap

Properties

Here is a list of the properties that can be used in charts. The most important is the “kind” property which is used to further elaborate on the visualization

  • accumulate
  • kind
  • legend
  • series
  • ymin
  • ymax
  • title
  • xaxis
  • column
  • title
  • yaxis
  • columns
  • ysplit
  • title
  • anomalycolumns (only for anomalychart)

Example1

let min_t = datetime(2017-01-05);
let max_t = datetime(2017-02-03 22:00);
let dt = 2h;
demo_make_series2
| make-series num=avg(num) on TimeStamp from min_t to max_t step dt by sid 
| where sid == 'TS1'   //  select a single time series for a cleaner visualization
| extend (anomalies, score, baseline) = series_decompose_anomalies(num, 1.5, -1, 'linefit')
| render anomalychart with(anomalycolumns=anomalies, title='Web app. traffic of a month, anomalies') 

Example2

StormEvents
| take 100
| project BeginLon, BeginLat
| render scatterchart with (kind = map)

Time Series Analysis

Time Series analysis allows us to look at trends, detect anomalies, outliers make forecasts. It is a specific way of analyzing a sequence of data points collected over an interval of time. It shows how variables change over time.

This analysis is done with the operator “make-series” which has the following syntax:

Just running the make-series operator is not enough to display the timeline chart, we also need to run the “mv-expand” operator so that it can expand into individual rows the initial results that are provided by the “make-series” operator.

let startTime1 = datetime('01-01-2007');
let endTime1 = datetime('01-01-2008');
let binSize = 1d;
StormEvents
| where StartTime between (startTime1 .. endTime1 )
| where State =~ 'Florida'
| make-series StormEvents = count() default = 0 on StartTime from startTime1 to endTime1 step binSize by EventType
| mv-expand StartTime to typeof(datetime ), StormEvents to typeof(int)
| render timechart 

Time Series Functions

FunctionExplanationExample
series_fir()Applies a Finite Impulse Response (FIR) filter on a series.range timeValues from bin( now(), 1h) – 23h to bin( now(), 1h) step 1h
| summarize timeValues = make_list(timeValues)
| project values = dynamic( [0, 0, 5, 10, 15, 20, 25, 100, 25, 20, 15, 10, 5, 0, 0] ), timeValues
| extend MovingAverage3Hour = series_fir(values, dynamic([1,1,1]))
| extend MovingAverage3HourCentered = series_fir(values, dynamic([1,1,1]), true, true)
| render timechart
series_moving_avg_fl()Applies a moving average filter on a series.range timeValues from bin( now(), 1h) – 23h to bin( now(), 1h) step 1h
| summarize timeValues = make_list(timeValues)
| project values = dynamic( [0, 0, 5, 10, 15, 20, 25, 100, 25, 20, 15, 10, 5, 0, 0] ), timeValues
| extend MovingAverage3Hour = series_moving_avg_fl(values, 3, true)
| render timechart
series_fit_line()Applies linear regression on a series, returning multiple columns.demo_series3
| extend (RSquare, Slope, Variance, RVariance, Interception, LineFit) = series_fit_line(num)
| render timechart
series_fit_2lines()Applies a two-segmented linear regression on a series, returning multiple columns.demo_series2
| extend series_fit_line(y), series_fit_2lines(y)
| render linechart with (xcolumn = x)
series_decompose()Applies a decomposition transformation on a series.let startTime = toscalar(demo_make_series1 | summarize min(TimeStamp));
let endTime = toscalar(demo_make_series1 | summarize max(TimeStamp));
let binSize = 1h;
demo_make_series1
| where Country == “United States”
| make-series RequestCount = count() default = 0 on TimeStamp from startTime to endTime step binSize by Country
| extend series_decompose(RequestCount, -1, ‘LineFit’)
| render timechart
series_decompose_anomalies()The function takes an expression containing a series (dynamic numerical array) as input, and extracts anomalous points with scores.Anomaly Detection is based on series decomposition. let startTime = toscalar(demo_make_series1 | summarize min(TimeStamp));
let endTime = toscalar(demo_make_series1 | summarize max(TimeStamp));
let binSize = 1h;
demo_make_series1
| where Country in (“United States”, “Greece”, “France”)
| make-series RequestCount = count() default = 0 on TimeStamp from startTime to endTime step binSize by Country
| extend anomalies = series_decompose_anomalies(RequestCount)
| render anomalychart with (anomalycolumns = anomalies)
series_decompose_forecast()Forecast based on series decomposition.
Takes an expression containing a series (dynamic numerical array) as input, and predicts the values of the last trailing points.
let ts=range t from 1 to 24*7*4 step 1 // generate 4 weeks of hourly data
| extend Timestamp = datetime(2018-03-01 05:00) + 1h * t
| extend y = 2*rand() + iff((t/24)%7>=5, 5.0, 15.0) – (((t%24)/10)*((t%24)/10)) + t/72.0 // generate a series with weekly seasonality and ongoing trend
| extend y=iff(t==150 or t==200 or t==780, y-8.0, y) // add some dip outliers
| extend y=iff(t==300 or t==400 or t==600, y+8.0, y) // add some spike outliers
| make-series y=max(y) on Timestamp from datetime(2018-03-01 05:00) to datetime(2018-03-01 05:00)+24*7*5h step 1h; // create a time series of 5 weeks (last week is empty)
ts
| extend y_forcasted = series_decompose_forecast(y, 24*7)  // forecast a week forward
| render timechart
series_multiply()Calculates the element-wise multiplication of two numeric series inputs.range x from 1 to 5 step 1
| extend y = x * 2
| extend multiply = series_multiply(pack_array(x), pack_array(y))
series_divide()Calculates the element-wise division of two numeric series inputs.range x from 1 to 5 step 1
| extend y = x * 2
| extend divide = series_divide(pack_array(x), pack_array(y))
series_greater()Calculates the element-wise greater (>) logic operation of two numeric series inputs.print array1 = dynamic([1,3,5]), array2 = dynamic([5,3,1])
| extend greater = series_greater(array1, array2)
series_acos()Calculates the element-wise arccosine function of the numeric series input.print array1 = dynamic([-1,0,1])
| extend arccosine = series_acos(array1)

Plugins

Invokes a service-side query extension (plugin).

The evaluate operator is a tabular operator that allows you to invoke query language extensions known as plugins. Unlike other language constructs, plugins can be enabled or disabled. Plugins aren’t “bound” by the relational nature of the language. In other words, they may not have a predefined, statically determined, output schema. One of the plugins is the pivot plugin, which is covered in one of the tables above, and there are many more like some of the ones used in Machine Learning, such as the “autocluster” and “basket” plugins.

Check more about Plugins at the Microsoft Documentation page: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/evaluateoperator

Geospatial Analysis

FunctionExplanationExample
geo_point_in_circle()Calculates whether the geospatial coordinates are inside a circle on Earth.StormEvents
| where isnotempty(BeginLat) and isnotempty(BeginLon)
| project BeginLon, BeginLat, EventType
| where geo_point_in_circle(BeginLon, BeginLat,-82.78, 27.97, 10000)
| render scatterchart with (kind = map)
geo_distance_point_to_line()Calculates the shortest distance in meters between a coordinate and a line or multiline on Earth.let southCoast = dynamic({“type”:”LineString”,”coordinates”:[[-97.185,25.997],[-97.580,26.961],[-97.119,27.955],[-94.042,29.726],[-92.988,29.821],[-89.187,29.113],[-89.384,30.315],[-87.583,30.221],[-86.484,30.429],[-85.122,29.688],[-84.001,30.145],[-82.661,28.806],[-82.814,28.033],[-82.177,26.529],[-80.991,25.204]]});
StormEvents
| project BeginLon, BeginLat, EventType
| where geo_distance_point_to_line(BeginLon, BeginLat, southCoast) < 5000
| render scatterchart with (kind = map)
geo_distance_2points()Calculates the shortest distance in meters between two geospatial coordinates on Earth.print distance_in_meters = geo_distance_2points(-122.407628, 47.578557, -118.275287, 34.019056)

References

Share