Network Traffic Analysis of Google Bard
Google Bard is one of the latest entrants in the growing field of AI chatbots. It is designed to simulate human conversation, using a mix of machine learning and natural language processing to give practical, natural-sounding responses to user inquiries. Initially launched as a web application within a particular geographic region, Bard has since gained immense popularity.
This blog explains what happens in the background while we wait for the answers to our questions, and analyzes the observed network traffic.
Network Traffic Analysis
Bard is accessible as a web application, and multiple Google domains appear in the overall network capture. Let’s break the network activity into four parts and look at where each host is used and what its traffic looks like:
Login Management
The main Bard page is secured behind “accounts.google.com”. The host uses QUIC version 1. Multiple request-response exchanges are seen on this host, which is consistent with Google’s multi-factor authentication flow.
The same host is seen at the end of the session when the user logs out.
Web Content
Once the user logs into the main site, the web content, including CSS, JS, and static content, begins to load. We observed two primary hosts serving this content:
- fonts.googleapis.com – This host uses QUIC version 1, with a single QUIC stream observed. A DNS request was also seen for this host.
- gstatic.com – We observed two hosts from the gstatic family: fonts.gstatic.com and www.gstatic.com. Both use QUIC version 1 as transport, with multiple QUIC streams observed. Each was preceded by a DNS request.
Bard Chat Services
Chat is the main functionality of this service. The chat service is seen on a single host, “bard.google.com”. However, two separate kinds of request are observed on it:
batchexecute
Fig 1: Initial batchexecute request query
When the chat is opened, two batchexecute requests are observed. Each is a POST request whose URL path is /_/BardChatUi/data/batchexecute.
The path suggests that the UI name of the web app is BardChat and that a batch-style RPC request is used.
The query string carries some interesting information:
1. RPCID – An identifier for the function that will be called on the server.
2. bl – The name of the backend service handling the request. In the traffic we analyzed, this field exposes the backend web server's name and version.
3. _reqid – A number attached to each request. The IDs on successive requests are closely related and change in the following fashion:
42621 -> 1 42621 -> 2 42621 -> 3 42621 (continued). The trailing digits stay the same while the leading number increases by one with each request. Interestingly, this pattern holds even when the type of request changes (for example, to streamgenerate) on the same host.
4. rt – This value specifies the response format.
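To make the query parameters above concrete, here is a minimal Python sketch that splits such a URL into its fields. The path comes from the capture, but the URL below is a hypothetical stand-in: the parameter name rpcids, the bl value, and the rt value are assumptions chosen for illustration. The split_reqid helper also assumes that the leading counter is simply prepended to five constant trailing digits, which is our reading of the pattern shown above.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL: only the path and the rpcid 'otAQ7b' come from the capture.
# The parameter name 'rpcids' and all other values are illustrative placeholders.
SAMPLE_URL = (
    "https://bard.google.com/_/BardChatUi/data/batchexecute"
    "?rpcids=otAQ7b&bl=example-backend_20230101.00_p0&_reqid=142621&rt=c"
)


def describe_request(url: str) -> dict:
    """Return the host, path, and the query fields discussed in the text."""
    parts = urlsplit(url)
    # parse_qs returns a list per key; keep the first value of each.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "host": parts.netloc,
        "path": parts.path,
        "rpcid": query.get("rpcids"),
        "backend": query.get("bl"),
        "reqid": int(query["_reqid"]) if "_reqid" in query else None,
        "response_format": query.get("rt"),
    }


def split_reqid(reqid: int, suffix_digits: int = 5) -> tuple:
    """Split a _reqid into (leading counter, constant trailing digits).

    Assumes the counter is prepended to a fixed-width suffix.
    """
    return divmod(reqid, 10 ** suffix_digits)


if __name__ == "__main__":
    print(describe_request(SAMPLE_URL))
    print(split_reqid(142621))
```

Running the same helper over every request in a session makes it easy to check whether the trailing digits really stay constant while the counter increments.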
Let’s look at the payload of this request:
Fig 2: batchexecute request payload
The request body is a form of type application/x-www-form-urlencoded;charset=utf-8.
When decoded, it has two key-value pairs:
1. f.req – This uses an envelope to encapsulate multiple RPC requests. In this example, the innermost array holds only one RPC request. Its first element is the rpcid (‘otAQ7b’, as shown in Figure 2) that we saw in the request query parameter, the second element is the actual payload to be executed, and the last element is the order in which the payload will be processed.
2. at – This is probably an XSRF mitigation parameter. It was observed as a static value in all the bard.google.com requests.
The response repeats the rpcid and carries the result of executing the request payload. The length of the payload precedes the payload itself.
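The envelope described above can be unpacked with a few lines of Python. This is a sketch under stated assumptions, not a definitive decoder: the rpcid ‘otAQ7b’ is from the capture, while the inner payload, the placeholder value in the last position, and the at token are invented for the example. The code reads the first, second, and last element of each innermost array, as described in the list above.

```python
import json
from urllib.parse import parse_qs, urlencode

# Build a hypothetical form body. Only the rpcid 'otAQ7b' is from the capture;
# the inner payload, the final element, and the token are placeholders.
inner_payload = json.dumps([None, "example"])
envelope = [[["otAQ7b", inner_payload, None, "generic"]]]
SAMPLE_BODY = urlencode({
    "f.req": json.dumps(envelope),
    "at": "EXAMPLE_TOKEN:1700000000000",
})


def decode_body(body: str) -> dict:
    """Decode an x-www-form-urlencoded body and unpack the f.req envelope."""
    form = {key: values[0] for key, values in parse_qs(body).items()}
    calls = []
    for group in json.loads(form["f.req"]):
        for call in group:
            calls.append({
                "rpcid": call[0],                 # first element: function id
                "payload": json.loads(call[1]),   # second element: JSON string
                "order": call[-1],                # last element: processing order
            })
    return {"calls": calls, "at": form.get("at")}


if __name__ == "__main__":
    print(decode_body(SAMPLE_BODY))
```

Note that the payload is itself a JSON string nested inside the outer JSON array, so it has to be decoded a second time.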
Fig 3: batchexecute response payload.
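A length-prefixed response of this kind can be read chunk by chunk. The Python sketch below assumes a deliberately simplified framing: a decimal length on its own line, followed by exactly that many characters of JSON. The framing in a real capture may differ (for example, in how the length is counted or in any prefix before the first chunk), and the chunk contents here are invented apart from the rpcid ‘otAQ7b’.

```python
import json


def frame(chunk) -> str:
    """Serialize one chunk as '<length>\\n<json>\\n' (simplified framing)."""
    body = json.dumps(chunk)
    return f"{len(body)}\n{body}\n"


def read_chunks(text: str) -> list:
    """Parse a sequence of length-prefixed JSON chunks."""
    chunks = []
    pos = 0
    while pos < len(text):
        newline = text.index("\n", pos)
        length = int(text[pos:newline])      # the length line
        start = newline + 1
        chunks.append(json.loads(text[start:start + length]))
        pos = start + length
        # Skip the newline(s) separating this chunk from the next length line.
        while pos < len(text) and text[pos] == "\n":
            pos += 1
    return chunks


# Hypothetical response: two chunks, each echoing the rpcid with a result string.
SAMPLE_RESPONSE = (
    frame([["otAQ7b", json.dumps(["example result"])]])
    + frame([["otAQ7b", json.dumps(["second result"])]])
)

if __name__ == "__main__":
    for chunk in read_chunks(SAMPLE_RESPONSE):
        print(chunk)
```

Reading by declared length rather than splitting on newlines keeps the parser correct even when a payload contains line breaks.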
Streamgenerate
The chat traffic comes after the two initial batchexecute request-response pairs.
Fig 4: streamgenerate request query
The request query is similar to the batchexecute example above. The differences are:
1. The rpcid is not observed in the request.
2. The URL path is different and seen as /_/BardChatUi/data/assistant.lamda.BardFrontendService/StreamGenerate
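Since both request types share one host and differ mainly in the URL path, they can be told apart with a simple path comparison. The sketch below uses the two paths from the capture; the query strings in the usage example are hypothetical, and the function name classify is our own.

```python
from urllib.parse import urlsplit

CHAT_HOST = "bard.google.com"
BATCHEXECUTE_PATH = "/_/BardChatUi/data/batchexecute"
STREAMGENERATE_PATH = (
    "/_/BardChatUi/data/assistant.lamda.BardFrontendService/StreamGenerate"
)


def classify(url: str) -> str:
    """Label a URL as 'batchexecute', 'streamgenerate', or 'other'."""
    parts = urlsplit(url)
    if parts.hostname != CHAT_HOST:
        return "other"
    if parts.path == BATCHEXECUTE_PATH:
        return "batchexecute"
    if parts.path == STREAMGENERATE_PATH:
        return "streamgenerate"
    return "other"


if __name__ == "__main__":
    # Hypothetical query strings; only the host and paths are from the capture.
    print(classify("https://bard.google.com" + BATCHEXECUTE_PATH + "?_reqid=42621"))
    print(classify("https://bard.google.com" + STREAMGENERATE_PATH + "?_reqid=142621"))
```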
Let’s look at the request body:
Fig 5: streamgenerate request body
The request body is of type application/x-www-form-urlencoded;charset=utf-8, as in the previous example, and the form data follows a similar pattern. The chat question is visible in f.req, and at contains the same static XSRF mitigation token seen earlier.
The processed answer to the request is seen in the response body.
Fig 6: streamgenerate response body
The structure of the body is also similar to the earlier batchexecute response: the actual response array is preceded by the length of the response. We can observe three separate responses for each question, matching the draft answers shown in the UI. Each response is accompanied by IDs, suggesting that the responses are uniquely identifiable.
Analytics and logging
As with any other Google service, analytics and logging traffic was also observed for Bard. Let's look at the hosts we observed:
- play.google.com – This host appeared in two separate streams, using TLS 1.3 and QUIC version 1 respectively. A preceding DNS request was observed for this host.
Fig 7: Data logging
During our observation, we noticed periodic logging to this host, with log requests following each chat request. The POST requests contained an array of values, likely related to front-end performance logs, and the response also returned an array of values.
Fig 8: Response of play logs.
- www.google-analytics.com – This host was observed using QUIC version 1 as transport. We found multiple requests to it during the chat session, each answered with a 204 (No Content) response.
- myactivity.google.com – This is a centralized location for viewing user activity across Google services. The host uses TLS 1.3. A preceding DNS request was observed for this host.
- www.googletagmanager.com – This is an analytics tool used to deploy and manage marketing and analytics tags on the web application. The host uses TLS 1.3. A preceding DNS request was observed for this host.
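The four groups of hosts described in this post can be collected into a small lookup table, which is handy for labelling hostnames extracted from a capture (for example, from DNS queries or TLS/QUIC server names). The hostnames below are the ones observed above; the category labels and the categorize helper are our own naming for this sketch.

```python
from collections import Counter

# Hosts observed in the capture, grouped as in this post.
HOST_CATEGORIES = {
    "accounts.google.com": "login",
    "fonts.googleapis.com": "web content",
    "fonts.gstatic.com": "web content",
    "www.gstatic.com": "web content",
    "bard.google.com": "chat",
    "play.google.com": "analytics and logging",
    "www.google-analytics.com": "analytics and logging",
    "myactivity.google.com": "analytics and logging",
    "www.googletagmanager.com": "analytics and logging",
}


def categorize(hosts) -> Counter:
    """Count how many of the given hostnames fall into each category."""
    return Counter(
        # Normalize case and strip a trailing dot, as seen in DNS names.
        HOST_CATEGORIES.get(host.lower().rstrip("."), "unclassified")
        for host in hosts
    )


if __name__ == "__main__":
    observed = ["accounts.google.com", "bard.google.com", "play.google.com"]
    print(categorize(observed))
```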
Bard in BreakingPoint
Google Bard has gained significant popularity on the internet, resulting in a large amount of related network traffic. If you're wondering how to test and calibrate your network equipment to ensure accuracy and resiliency against this traffic, BreakingPoint Systems is the perfect solution for you.
The Keysight Application and Threat Intelligence (ATI) team has analyzed the network traffic related to Google Bard and released a set of simulations in our ATI-2023-07 bi-weekly strikepack release.
Fig 9: Google Bard in BreakingPoint Systems
Keysight's Application and Threat Intelligence subscription provides daily malware and bi-weekly updates of the latest application protocols and vulnerabilities for use with Keysight test platforms. The ATI Research Centre continuously monitors threats as they appear in the wild. Customers of BreakingPoint now have access to attack campaigns for different advanced persistent threats, allowing them to test their currently deployed security controls' ability to detect or block such attacks.