Scope of Work #1

Open

opened 2022-10-10 22:01:09 +00:00 by jonny · 1 comment

jonny commented

2022-10-10 22:01:09 +00:00

Owner

Sketching out the projects I'm proposing to tackle:

Preliminary Work

Prepare to work, taking stock of what's needed, clarifying approach, and coordinating work.

- Familiarize with lab systems:
- Drafting software architecture:
- Learning Rust:
- Initial Coordination:

Linked Data Tool Development

Build a set of tools to:

- Declare data formats with linked data triplets.
- Ingest existing data stored in vernacular and standardized formats.
- Translate schemas for existing formats, starting with Neurodata Without Borders, to put them in a common space with vernacular formats.
- Store data with an interchangeable set of I/O formats so that it can interface with different tools and formats.
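
To make the "declare" step concrete, here is a minimal sketch of a data format expressed as (subject, predicate, object) triplets. The `Triple` and `Schema` names and every `lab:` term are hypothetical, invented for illustration; the `rdf:`, `rdfs:`, and `xsd:` terms are borrowed from the standard RDF vocabularies. This is not the proposed library, only one way the idea could look:

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One (subject, predicate, object) statement."""
    subject: str
    predicate: str
    obj: str


class Schema:
    """A set of triples with simple wildcard pattern matching."""

    def __init__(self):
        self.triples = set()

    def declare(self, subject, predicate, obj):
        # Sets make repeated declarations idempotent.
        self.triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None is a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            t for t in self.triples
            if all(p is None or p == v for p, v in zip(pattern, t))
        )


# Hypothetical vocabulary: the "lab:" terms are assumptions for this example.
schema = Schema()
schema.declare("lab:Recording", "rdf:type", "rdfs:Class")
schema.declare("lab:sampling_rate", "rdfs:domain", "lab:Recording")
schema.declare("lab:sampling_rate", "rdfs:range", "xsd:float")
schema.declare("lab:probe", "rdfs:domain", "lab:Recording")

# Which properties are declared on a Recording?
fields = [t.subject for t in schema.match(predicate="rdfs:domain", obj="lab:Recording")]
```

Because a format is only a set of statements, a vernacular lab format and a standardized one can sit in the same graph and be related by adding more triplets rather than by rewriting either format.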

These will be built as a series of smaller, independent, modular components with clear interfaces rather than as one massive project. Roughly, these would be:

- A library to model and manipulate triplet-based schemas, including import/export of major formats (OWL, JSON-LD, etc.)
- A set of web-based widgets for interacting with triplet schemas, acting as a frontend for the library above and serving as a basis both for researcher use in data ingestion and for the web systems design of later p2p systems for sharing data, etc.
- An I/O framework for declaring I/O interfaces with triplet schemas, allowing the binary representation of the data indicated by instantiations of schemas to take arbitrary forms (content-addressed SQL databases, HDF5 files, raw binary, etc.)
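
As a rough illustration of the I/O framework idea, the sketch below keeps the logical data (a list of samples) separate from its binary representation by registering interchangeable backends. The `BACKENDS` registry, the function names, and the two formats chosen here (JSON text and raw little-endian doubles) are all assumptions made for the example; the SQL and HDF5 backends mentioned above would plug into the same kind of interface:

```python
import json
import struct

# Registry mapping a format name to a (write, read) pair of functions.
BACKENDS = {}


def register(name, write, read):
    """Make a backend available under the given format name."""
    BACKENDS[name] = (write, read)


def write_json(values):
    return json.dumps(values).encode("utf-8")


def read_json(blob):
    return json.loads(blob.decode("utf-8"))


def write_raw(values):
    # Little-endian 64-bit floats, 8 bytes per sample.
    return struct.pack(f"<{len(values)}d", *values)


def read_raw(blob):
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))


register("json", write_json, read_json)
register("raw", write_raw, read_raw)


def store(values, fmt):
    """Serialize values with the backend registered under fmt."""
    return BACKENDS[fmt][0](values)


def load(blob, fmt):
    """Deserialize a blob with the backend registered under fmt."""
    return BACKENDS[fmt][1](blob)


samples = [0.5, 1.25, -3.0]
round_trips = {fmt: load(store(samples, fmt), fmt) for fmt in BACKENDS}
```

Every backend returns the same logical values, so the schema can describe what the data is while the choice of container stays a separate, swappable decision.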

These tools may also serve as the basis of the data management system for the mouse matrix experiment (and any other experiments that would like to participate!), making all of its data clean, indexable, and archive-ready at the time of (continuous) acquisition.

p2p-ld Development

Build a Linked Data-driven peer-to-peer protocol (as described in https://jon-e.net/infrastructure/) for data sharing, to serve as the backbone for a new kind of scientific communications infrastructure.

- Develop a protocol for an identity-based peer-to-peer system that uses Merkle triplet-DAGs to model versioned, authored schema and metadata that can indicate content-addressed binary data
- Develop an implementation of the protocol
- Develop a client for the protocol that can be trivially deployed
- Develop vocabularies for defining data permissions and federations
- Build a seed cluster of projects using the p2p system both within UCLA and throughout the larger research community.
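
One way to picture a Merkle triplet-DAG is sketched below: each node holds an author, a set of triplets, and the hashes of its parent nodes, and the node's identifier is the SHA-256 hash of that content. The `TripleDag` class, the canonical JSON encoding, the `lab:` terms, and the plain-string author are assumptions for illustration only. A real identity-based protocol would presumably sign nodes with the author's key, which this sketch omits:

```python
import hashlib
import json


def node_id(author, triples, parents):
    """Content address: SHA-256 over a canonical JSON encoding of the node."""
    payload = json.dumps(
        {"author": author, "parents": sorted(parents), "triples": sorted(triples)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TripleDag:
    """In-memory store of hash-linked, authored sets of triples."""

    def __init__(self):
        self.nodes = {}

    def commit(self, author, triples, parents=()):
        missing = [p for p in parents if p not in self.nodes]
        if missing:
            raise KeyError(f"unknown parent nodes: {missing}")
        nid = node_id(author, triples, parents)
        self.nodes[nid] = {
            "author": author,
            "parents": sorted(parents),
            "triples": sorted(triples),
        }
        return nid

    def verify(self, nid):
        """Recompute hashes for a node and all of its ancestors."""
        node = self.nodes[nid]
        expected = node_id(node["author"], node["triples"], node["parents"])
        return expected == nid and all(self.verify(p) for p in node["parents"])


# Binary data is referenced by its own hash rather than embedded in the graph.
blob = b"\x00\x01\x02\x03"
blob_address = "sha256:" + hashlib.sha256(blob).hexdigest()

dag = TripleDag()
v1 = dag.commit("alice", [("lab:session1", "rdf:type", "lab:Recording")])
v2 = dag.commit("alice", [("lab:session1", "lab:data", blob_address)], parents=[v1])
```

Because `v2` includes the hash of `v1`, changing any earlier version changes every identifier downstream of it, which is what makes the history versioned and tamper-evident.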

Interoperability with Existing Projects

Coordinate work across related infrastructure projects like Open Ephys and Bonsai, and indexing projects like Open Behavior and Open Neuroscience, to build a continuous space of tool development and information sharing!

- Integrate data produced by Open Ephys, Miniscopes, and other projects as part of a generalized ingestion/export system.
- Build interfaces for direct export to the p2p system.
- Throughout all described work, do as much work in coordination with other groups as possible, building technical and social bridges across projects.
- Investigate the possibility of using Autopilot (see #4), or an Autopilot-like framework, as an integrative framework for the mouse matrix project, and continue work on writing integration systems for different projects.

Organizing an Infrastructure Movement

Of equal importance to developing the above technologies is the development of a broader infrastructural social movement. I also propose to facilitate a series of workshops and organizations in an attempt to encourage prolonged and mutual development of currently isolated projects into a larger infrastructural project that can displace extractive and rent-seeking information industries. This organizational project may serve to improve the state of academic work, but it should also be oriented towards using publicly funded research as a means of funding information technologies that benefit society broadly. Specifically, by building tools to improve our own ability to manage our data and communication systems, we should be seeking to build the infrastructure that allows people to live their lives free from the surveillance and manipulation of information giants.

I will be working to organize labs within our department and across disciplines and institutions to build a data sharing ecosystem, build a communication medium on top of it, and eventually work towards a plausible alternative to traditional scientific communication systems. This work will require communication and coordination with organizations far afield from neuroscience, including privacy activists, librarians, and hackers, which will hopefully enrich our other work while also building much-needed solidarity across disciplines.

This work is necessarily much more abstract and less certain than technological development, but I hope to approach it with equal weight.

jonny added this to the Project Details project 2022-10-10 22:01:09 +00:00

jonny added the detail label 2022-10-10 23:46:57 +00:00

jonny added the proposal label 2022-10-10 23:51:59 +00:00

jonny referenced this issue

2022-10-11 01:44:46 +00:00

Contract #2

jonny referenced this issue

2022-10-11 01:50:41 +00:00

Coordinating with other labs and organizations #7

jonny added a new dependency 2022-10-11 02:23:50 +00:00

#8 IT Bureaucracy

jonny added a new dependency 2022-10-11 02:24:08 +00:00

#7 Coordinating with other labs and organizations

jonny added a new dependency 2022-10-11 02:24:37 +00:00

#2 Contract

daharoni commented

2022-10-11 04:47:02 +00:00

Collaborator

YES!

After your visit, I have been thinking a lot about what this all might look like. I need some time to organize my thoughts but then we should chat about what all of this would concretely look like to start.

Very roughly speaking, there are a bunch of topics these ideas address:

- data storage
- data sharing
- social interactions
- active documentation
- framework for lab/research efforts to not get lost within and across labs
- publication
- and so on.

But these all basically reduce down to one, integrated solution. I would love to chat with you about how to tackle any of these individual projects vs directly tackling the one, integrated total solution.
