Projects 项目
Selected technical work. 部分技术工作。
Environmental protection, fielded autonomy, heterogeneous cyber data, geoscience AI, and scientific systems. 涵盖环境保护、自主系统、异构网络安全数据、地球科学 AI 与科学计算系统。
Field AI · Metron · 2007-2013 现场 AI · Metron · 2007-2013
Knifefish
I was the lead machine-learning engineer for the U.S. Navy's Knifefish unmanned undersea vehicle program while at Metron. The work combined sonar, signal processing, machine learning, vendor coordination, and production-oriented system specifications.
在 Metron 期间,我担任美国海军 Knifefish 无人水下航行器项目的机器学习负责人。这个项目结合了声纳、信号处理、机器学习、供应商协作和面向生产的系统规格。
The project remains a useful reference point for fielded AI: models had to operate against sensor limitations, environmental variation, uncertainty, and operator workflows, not benchmark performance alone.
该项目仍然是现场 AI 的重要参照:模型需要同时面对传感器限制、环境变化、不确定性和操作者流程,而不只是 benchmark 表现。
Deep heterogeneous models · 2014-2025 深度异构模型 · 2014-2025
pysparkplug
I created and led pysparkplug from 2014 through 2025 to model complex heterogeneous data without reducing it to a single flat feature table.
In one malware-analysis example, SSL certificate records combined timestamps, issuer and subject strings, IP addresses, protocol fields, nested lists, categorical values, and missing values. hTSNE embeddings exposed structure associated with Gozi MITM, TorrentLocker C&C, Dridex C&C, Gootkit C&C, and broader malware C&C activity.
The same approach supported PCAP, code, trace routes, host and user activity, text, and other irregular sources where schema, sequence, graph, and categorical structure interact.
我于 2014 至 2025 年创建并主导 pysparkplug。它用于为复杂真实世界数据构建深度异构模型,而不是将所有信息转换为单一的扁平特征表。
一个例子使用了恶意软件基础设施中的 SSL 证书记录。原始对象混合了时间戳、issuer 与 subject 字符串、IP 地址、协议字段、嵌套列表、类别值和缺失值;hTSNE 将这些混合结构嵌入成可分离的模式,对应 Gozi MITM、TorrentLocker C&C、Dridex C&C、Gootkit C&C 以及更广义的 Malware C&C 活动。
同样的建模思路也用于 PCAP、代码、trace routes、主机与用户行为、文本以及其他不规则数据源,这些场景中 schema、序列、图结构与类别结构会同时出现。
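import json
import sys

# Make the parent directory importable so the local pysp package resolves.
sys.path.append("..")
from pysp.utils.htsne import htsne

# The file unpacks into two parts: certificate records and their labels.
with open("ssl_certs.json", "rt") as fin:
    ssl_certs, ssl_labels = json.load(fin)

# Embed the heterogeneous records as they are, without flattening them first.
emb_certs = htsne(ssl_certs)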
# ssl_certs[2]
[
    [2015, 12, 23, 7, 52, 6],
    'C=XX, L=Default City, O=Default Company Ltd',
    [193, 218, 145, 50],
    'C=XX, L=Default City, O=Default Company Ltd',
    1, '', '', '', '',
    [2015, 12, 23, 7, 22, 11],
    '',
    [['XX'], None, ['Default', 'City'],
     ['Default', 'Company', 'Ltd'], None, None],
    443,
    [['XX'], None, ['Default', 'City'],
     ['Default', 'Company', 'Ltd'], None, None],
    'C=XX, L=Default City, O=Default Company Ltd',
    'TLS 1.2', 2, ''
]
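To make the mixed structure of such a record concrete, here is a minimal sketch that summarizes the type of each field in the ssl_certs[2] record shown above. It is an illustration only and not part of pysparkplug: the describe helper and its labels (such as "empty-str" and "missing") are hypothetical names introduced here, and the field meanings are deliberately left unnamed because the record does not label them.

```python
from collections import Counter

# Record copied from the ssl_certs[2] example above.
record = [
    [2015, 12, 23, 7, 52, 6],
    'C=XX, L=Default City, O=Default Company Ltd',
    [193, 218, 145, 50],
    'C=XX, L=Default City, O=Default Company Ltd',
    1, '', '', '', '',
    [2015, 12, 23, 7, 22, 11],
    '',
    [['XX'], None, ['Default', 'City'],
     ['Default', 'Company', 'Ltd'], None, None],
    443,
    [['XX'], None, ['Default', 'City'],
     ['Default', 'Company', 'Ltd'], None, None],
    'C=XX, L=Default City, O=Default Company Ltd',
    'TLS 1.2', 2, ''
]


def describe(value):
    """Return a short type label for one field (hypothetical helper)."""
    if value is None:
        return "missing"
    if isinstance(value, bool):  # checked before int, since bool subclasses int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, str):
        return "empty-str" if value == "" else "str"
    if isinstance(value, list):
        # Recurse so nested lists report the labels of their elements.
        inner = sorted({describe(item) for item in value})
        return "list[" + "|".join(inner) + "]"
    return type(value).__name__


# One label per top-level field, plus a tally of how often each label occurs.
signature = [describe(field) for field in record]
counts = Counter(signature)

for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

For this one record the tally places integers, strings, empty strings, integer lists, and nested lists with missing entries side by side, which is the kind of mixed structure described above.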
Scientific AI · 2025- 科学智能 · 2025-
GeoGPT
At Zhejiang Lab, I work on GeoGPT and scientific AI systems for geoscience research. GeoGPT is a public community platform for geoscience research with more than 50,000 users.
The system supports research workflows involving literature, maps, figures, retrieval, tool use, personalized knowledge bases, and task-oriented analysis.
A central goal is Open Science: making geoscience evidence, methods, and analysis easier to inspect, reuse, and share.
在浙江实验室,我从事 GeoGPT 与地球科学智能系统相关工作。GeoGPT 已经发展为面向地球科学家的公开社区平台,支持科研工作流中的 AI 使用,用户超过 5 万。
这个系统围绕科研流程构建:文献、地图、图像、检索、工具调用、个性化知识库与实际研究任务,而不仅是一个聊天界面。
其中一个核心目标是开放科学:让地球科学证据、方法与分析更容易被检查、复用和共享。