Laufende Arbeiten

Bachelorarbeiten

A Summary of the Most Recent Code-Based Digital Signature Schemes based on Variations of the Syndrome Decoding Problem

Beschreibung

In this thesis, the student will study and summarize the most recent code-based digital signature schemes based on the syndrome decoding problem

Betreuer:

[identification] Implementation of identification with non-cryptographic hash functions

Stichworte:
universal hash identification

Beschreibung

Identification is a communication scheme that allows rate doubly exponential in the blocklemght, with the tradeoff that identities cannot be decoded (as messages do) but can only be verified.

The double exponential growth presents various challenges in the finite regime: there are heavy computational costs introduced at the encoder and decoder and heavy trade-offs between the error and the codes sizes.

The ultimate goal is to find a fast, reliable implementation while still achieving large code sizes.

Identification codes can be achieved by first removing the errors from the channel with regular transmission channel coding, and then sending a challenge though the corrected channel. For every identity i, The channenge is generated by picking a random input m and computing the corresponding output T_i(m) using a function T_i that depends on the identity. The challenge is then the pair m,T_i(m) and the receiver wanting to verify an identity j will verify whether j=i by testing the challenge. This is done by recomputing the output with T_j and verifying whether T_j(m)= T_i(m). The errors are reduced by ensuring that the various functions collide on a small fraction of the possible inputs.

It turns out that choosing good sets of funtions {T_i} is the same as choosing error-correction codes {c_i} with large distance, where now each codeword c_i defines a function by mapping positions m (sometimes called code locators) to symbols c_im of the codeword.
We can thus construct identification codes by choosing error-correction codes where we are only interested in the performance of the error correction encoders (we are not interested in the error-correction decoder or error-correction codes).

Your task will be implementing the identification codes described in

aiming at the fastest implementation, and testing their performance in comparison to other current implementations.

For reference, our previous work on identification based on Reed-Solomon and Reed-Muller code can be found at

The coding will be in Python/Sagemath.
The working language will be in English.

Environment: we collaborate with LTI. At LNT and LTI there is currently a lot of funding for research in identification. Therefore you will find a large group of people that might be available for discussion and collaboration.

Betreuer:

Implementation of model poisoning attacks in federated learning

Beschreibung

Federated learning is a machine learning paradigm where decentralized entities (clients) collaboratively learn using their private data. A central server acts as a coordinator of the learning process. Due to the sensitivity of the private data involved,  the data cannot be transferred. A salient problem of federated learning is the presence of malicious clients, which are clients that try to destroy the learning process. Malicious clients can do this by corrupting their data and/or by modifying their local model updates. The goal of this project is to understand how model poisoning attacks and defense strategies perform under different scenarios of federated learning using experiments. 

References: 

[1]- https://www.ndss-symposium.org/wp-content/uploads/ndss2021_6C-3_24498_paper.pdf

[2]- https://arxiv.org/pdf/1903.03936.pdf

[3]- https://arxiv.org/pdf/2304.00160.pdf

 

Voraussetzungen

  • Basic knowledge of machine learning
  • Python programming skills, knowledge of PyTorch is an advantage 

Kontakt

marvin.xhemrishi@tum.de

Betreuer:

Search for goldilock quantum devices

Beschreibung

DiVincenzo in 2000, provides a comprehensive framework for identifying the necessary requirements to achieve successful quantum computation. In this project, we search for ideal conditions necessary for buidling distributed quantum devices. 

Betreuer:

Masterarbeiten

Coding Methods for Composite DNA Synthesis

Beschreibung

DNA Data Storage

Data storage on DNA molecules is a promising approach for archiving massive data.

In classical DNA storage systems, binary information is encoded into sequences consisting of the four DNA bases {A, C, G, T}. The encoded sequences are used to generate DNA molecules called strands using the biochemical process of DNA synthesis. The synthesized strands are stored together in a tube. To retrieve the binary information, the strand must be read via DNA sequencing and decoded back into the binary representation.

The synthesis and the sequencing procedures are error-prone, and with the natural degradation of DNA they introduce errors to the DNA strands. To ensure data reliability, the errors have to be corrected by algorithms and error-correcting codes (ECCs).

A 5min video with an overview of DNA storage: https://youtu.be/r8qWc9X4f6k?si=Yzm5sOW-a6VDnBu3

Composite DNA

Recently, to allow higher potential information capacity, [1,2] introduced the DNA composite synthesis method. In this method, the multiple copies created by the standard DNA synthesis method are utilized to create composite DNA symbols, defined by a mixture of DNA bases and their ratios in a specific position of the strands. By defining different mixtures and ratios, the alphabet can be extended to have more than 4 symbols. More formally, a composite DNA symbol in a specific position can be abstracted as a quartet of probabilities {p_A, p_C, p_G, p_T}, in which p_X, 0 ≤ p_X ≤ 1, is the fraction of the base X in {A, C, G, T} in the mixture and p_A+p_C+ p_G+ p_T =1. Thus, to identify composite symbols it is required to sequence multiple reads and then to estimate p_A, p_C, p_G, p_T in each position.

Problem description

ECCs for DNA data storage differ in many aspects from classical error correction codes. In this model, new error type gain relevance, like deletions and insertions which affect the synchronization of the sequences. Especially for composite DNA data storage, these error types received only little attention.

The most related work to this problem was recently studied by Zhang et al. in [6]. The authors initiated the study of error-correcting codes for DNA composite. They considered an error model for composite symbols, which assumes that errors occur in at most t symbols, and their magnitude is limited by l. They presented several code constructions as well as bounds for this model. In this thesis, we want to analyse a different way to model the composite synthesis method and studies additional error models. We already have some results for substitution and single deletion errors. This thesis should focus on extending these to combinations of error models [5] or two deletions [3,4].

This should only roughly introduce the problem. No need to review all references. If you are interested, please reach out to me, and we can discuss a suitable direction for you.

References

[1] L. Anavy, I. Vaknin, O. Atar, R. Amit, and Z. Yakhini, “Data storage in DNA with fewer synthesis cycles using composite DNA letters,” Nat Biotechnol, vol. 37, no. 10, pp. 1229–1236, Oct. 2019, doi: 10.1038/s41587-019-0240-x.

[2] Y. Choi et al., “High information capacity DNA-based data storage with augmented encoding characters using degenerate bases,” Sci Rep, vol. 9, no. 1, Art. no. 1, Apr. 2019, doi: 10.1038/s41598-019-43105-w.

[3] V. Guruswami and J. Håstad, “Explicit Two-Deletion Codes With Redundancy Matching the Existential Bound,” IEEE Transactions on Information Theory, vol. 67, no. 10, pp. 6384–6394, Oct. 2021, doi: 10.1109/TIT.2021.3069446.

[4] J. Sima, N. Raviv, and J. Bruck, “Two Deletion Correcting Codes From Indicator Vectors,” IEEE Trans. Inform. Theory, vol. 66, no. 4, pp. 2375–2391, Apr. 2020, doi: 10.1109/TIT.2019.2950290.

[5] I. Smagloy, L. Welter, A. Wachter-Zeh, and E. Yaakobi, “Single-Deletion Single-Substitution Correcting Codes,” IEEE Transactions on Information Theory, pp. 1–1, 2023, doi: 10.1109/TIT.2023.3319088.

[6] W. Zhang, Z. Chen, and Z. Wang, “Limited-Magnitude Error Correction for Probability Vectors in DNA Storage,” in ICC 2022 - IEEE International Conference on Communications, Seoul, Korea, Republic of: IEEE, May 2022, pp. 3460–3465. doi: 10.1109/ICC45855.2022.9838471.

Voraussetzungen

Good knowledge and high interest in mathematics, especially

  • Linear algebra
  • Combinatorics

I highly recommend the channel coding lecture for this thesis.

Kontakt

Frederik Walter (frederik.walter@tum.de)

Betreuer:

Frederik Walter

Construction of Shaped Polar Codes for List Decoding

Beschreibung

We investigate code construction algorithms for SC list decoding of polar codes [1] based on methods like [2].

 

Betreuer:

Entropy Estimation and Compression Scheme for Wildfire Detection

Beschreibung

We apply information-theoretic perspectives from rate-distortion theory to optimize wildfire notifications from earth observation satellites.

Betreuer:

Constantin Runge - Martin Lülf (OroraTech GmbH)

Polymer Waveguide Fiber to the Room

Beschreibung

.

Betreuer:

Thomas Wiegart - Christian Bluemm (Huawei)

Post-Quantum Secure Signature Schemes based on the Lee Metric

Beschreibung

This work shall deal with post-quantum secure schemes utilizing the Lee metric. The student's task is to get familiar with this metric and to design a signature scheme. The student shall show that known attacks in the Hamming metric are not applicable for the Lee metric.

Voraussetzungen

Channel Coding

Betreuer:

Student

Stefan Ritterhoff

Code Construction for Restricted Errors

Stichworte:
code-based cryptography, restricted errors, code construction

Beschreibung

Due to the recent advances in quantum computers, the search for cryptosystems that survive quantum attacks is of great interest. Code-based cryptography is a promising candidate, since it is build on the NP-hard problem of decoding a random code [1].

Recently, different variants of the classical syndrome decoding problem (SDP) in the Hamming metric have been proposed [2,3].
The main reason for this is that it appears hard to build an efficient digital signature scheme around the classical SDP.
One such variant is the restricted syndrome decoding which was introduced in [2].

The goal of this the construction of codes for this error model, which has not been done before.
For this, a first approach is to follow the general idea given in [4].

Open questions are:

- discussion of the appropriate choice of the error set of a McEliece-type cryptosystem
- optimality bounds for codes in the restricted setting
- construction of short codes that are efficiently decodable and/or optimal
- construction of longer codes from short codes and the evaluation of their perfomance

 

 

 

References:

[1] Weger, V., Gassner, N., & Rosenthal, J. (2022). A Survey on Code-Based Cryptography. arXiv preprint arXiv:2201.07119.

[2] Baldi, M., Battaglioni, M., Chiaraluce, F., Horlemann-Trautmann, A. L., Persichetti, E., Santini, P., & Weger, V. (2020). A new path to code-based signatures via identification schemes with restricted errors. arXiv preprint arXiv:2008.06403.

[3] Baldi, M., Bitzer, S., Pavoni, A., Santini, P., Wachter-Zeh, A., & Weger, V. (2023). Generic Decoding of Restricted Errors. arXiv preprint arXiv:2303.08882.

[4] Rohweder, D., Freudenberger, J., & Shavgulidze, S. (2018). Low-density parity-check codes over finite Gaussian integer fields. In ISIT.

Voraussetzungen

Security in Communications and Storage

Channel Coding

 

 

 

Betreuer:

Event Cameras for Industrial Applications

Beschreibung

Compared with traditional frame-based cameras, an event-based camera has the advantages of low latency, high dynamic range, (almost) no motion blur, etc., and can respond fast to a brightness change at the image plane with thresholds determined by the previous state of brightness. It can generate event data of structural features without signal processing, such as edges and corners, which saves time, energy and computing effort.

However, it still lacks standard processes for analyzing, characterizing and evaluating event-based camera information. Moreover, data acquisition of this camera demands changes in brightness on a respective pixel. This can be introduced artificially by relative movement of the camera to the object. This might miss out part of the data due to linear translation along structural elements like edges or introduce some error due to error motion or vibration.

This thesis aims to ensure safety and quality when implementing event-based cameras in the field of industrial inspection, by dedicated experiments and related discussion of results. Event based camera imageswill be characterized and evaluated. New methods for event generation and signal processing will be proposed which will make use of the special characteristics of event-based cameras and their special characteristics arising from their working principle.

Betreuer:

Paraskevi Papadopoulou - Dr. Thomas Engel (Siemens)

AI-Aided LDPC Decoders

Beschreibung

In theory, the check nodes operations of belief propagation rely on tanh() and arctanh() functions which require high computational power. Hence, in most of the practical applications, an approximation called “MinSum” is used. This method aims to exploit the structure of tanh() function where the absolute value of output sufficiently converges to 1 with increasing absolute value of input. Hence, in a series multiplication of absolute outputs, the most dominant effect comes from the minimum element and other contributions are considered negligible. However, neglection of other attenuation factors cause overcalculated outputs in “MinSum” algorithm which can accumulated by multiple iterations. This drawback can be compensated through adding attenuation or/and offset factors. These factors are mostly iteration specific and intuitively determined, which means one factor which is determined by educated guess is applied to all leaving edges. However, every edge in an unfolded Tanner graph has its own unique identity corresponding to the previous nodes and edges that the message is transmitted.

 

In addition to approach aiming to close the performance gap between main algorithm and “MinSum ” approximation, we can intend to improve the qualities of main algorithm. Even though belief propagation decoding in LDPC codes is considered as highly successful, it is still a “suboptimal” method compared to very expensive but accurate Maximum A posteriori Probability (MAP) estimation. It means there might be some room for improvement in performance. Additionally, belief propagation requires multiple iterations to converge and the required number of iterations can dramatically increase by decreasing signal-to-noise ratio (SNR). Additional correction weights imposed on iterated messages can be a candidate to improve performance in overall.

 

5G specification for channel coding is using protograph based LDPC codes. Every node duplicated from same base matrix node is keen to show similar properties, it may be possible to use same weights for these nodes by preserving the good decoding results. This detail can help us to using additional correction weights by minimum additional memory burden.

Betreuer:

Emna Ben Yacoub - Wen Xu (Huawei)

Zero-error capacity for multi-user channels with feedback

Stichworte:
zero-error capacity, multi-user

Beschreibung

In this project the student should calculate the zero-error capacity for 

a multi-user model with feedback.

Voraussetzungen

Information theory

Betreuer:

Student

Gesong Xia

Forschungspraxis (Research Internships)

Automatic Generation of an Infotainment Data Access Layer for In-Vehicle Data Collector

Beschreibung

Modern vehicles boast advanced infotainment systems, delivering a range of services encompassing entertainment, navigation, comfort, and connectivity. These applications require extensive real-time data, necessitating efficient and reliable data processing from vehicle sensors and control units. With every vehicle launch and OS release, new sensors and APIs are introduced, which serve as important data points for high-level analytics and applications. Currently, these changes must be monitored and integrated manually, which is a demanding task prone to errors. The objective of this internship is to develop tools that automatically generate a data collector access layer for the platform APIs in every major OS release. This is done by analyzing the available APIs, followed by the creation of a data dependency tree based on the JSON format. An appropriate graph reduction algorithm is then applied to generate a data structure with minimum edge dependencies while preserving analyzability of the data points. An appropriate lookup schema is then created based on the reduced data points, which calls the requested API based on the supplied configuration. Python and C++ are the languages used within the scope of this project.

Betreuer:

Fleet data evaluation based on MDF files

Beschreibung

Whenever an automobile manufacturer develops a new car model, the aim is to surpass its predecessors in terms of performance, efficiency, safety, and overall user experience. In order to do so, logged data has to be analyzed to find out where improvements can be made. Whenever a vehicle returns from a test drive, the collected data will be stored as an MDF (Measurement Data Format) file. These files are then read and analyzed with a software tool, but as of now, only a few files can be processed simultaneously due to hardware limitations. This research internship explores how data can be efficiently organized to maximize analyzability. Eventually, a program should be created which allows users to analyze all MDF files a vehicle has produced in parallel. 

Betreuer:

Anna Baumeister - (BMW Group)

Privacy-preserving synchronization methods

Beschreibung

In this internship, the student will explore the literature on privacy-preserving synchronization methods and summarizes the findings.

Betreuer:

Effect of Redundancy in Distributed Learning

Stichworte:
Distributed Learning, Gradient Coding, Straggler tolerance

Beschreibung

In this project, we investigate the interplay between redundancy and straggler tolerance in distributed learning.

The setting is that of a main node distributing computational tasks to available workers as part of a machine learning algorithm, e.g., training a neural network. Waiting for all workers to return their computations suffers from the presence of stragglers, i.e., slow or unresponsive nodes. Mitigating the effect of the stragglers can be done through the use of redundancy or by leveraging the properties of the convergence of the machine learning algorithm. 

The goal of this work is to compare when redundancy is helpful. In this case, we aim to analyze the convergence speed with and without redundancy. Then, we compare the convergence as a function of time of all the schemes.

Further reading: 

R. Bitar, M. Wootters and S. El Rouayheb, Stochastic Gradient Coding for Straggler Mitigation in Distributed Learning, IEEE Journal on Selected Areas in Information Theory (JSAIT), Vol. 1, No. 1, May 2020. arXiv:1905.05383

S. Kas Hanna, R. Bitar, P. Parag, V. Dasari and S. El Rouayheb, Adaptive Stochastic Gradient Descent for Fast and Communication-Efficient Distributed Learning, preprint, arXiv:2208.03134.

Voraussetzungen

Knowledge in the following topics:

  • Probability Theory
  • Gradient descent and stochastic gradient descent
  • Coding theory

Independence and motivation to work on a research topic

Knowledge of implementing neural networks is a plus

 

Kontakt

Dr. Rawad Bitar: rawad.bitar@tum.de

Betreuer:

Rate upper Bounds for Bandlimited Channels with a Memoryless Nonlinearity

Beschreibung

We are interested in computing upper bounds (on capacity) for bandlimited channels with a memoryless nonlineary at the transmitter/receiver.

One application for these bounds are short-reach fiber-optic communication systems with a single photodiode at the receiver. The photodiode is a memoryless nonlinearity, as it produces an output that is proportional to the squared magnitude of the input signal.

A simple upper bound for the above model is given in [Sec. III D, 2].

D. Plabst et al., "Achievable Rates for Short-Reach Fiber-Optic Channels With Direct Detection," in Journal of Lightwave Technology, vol. 40, no. 12, pp. 3602-3613, 15 June15, 2022, doi: 10.1109/JLT.2022.3149574.

 

 

Voraussetzungen

Information Theory

Linear System Theory

Betreuer:

Forschungspraxis

Stichworte:
Forschungspraxis

Beschreibung

Forschungspraxis

Betreuer:

Norbert Hanik - Dr. Tobias Fehenberger (ADVAoptical GmbH)

Error-Correction for Partially Stuck Memory Cells

Beschreibung

.

Betreuer:

Student

Venkatesh Satagopan

Ingenieurpraxis

Coding for DNA Microarrays

Beschreibung

Gene expression analysis

(From Wikipedia: https://en.wikipedia.org/wiki/Gene_expression)

Measuring gene expression is an important part of many life sciences, as the ability to quantify the level at which a particular gene is expressed within a cell, tissue or organism can provide a lot of valuable information. For example, measuring gene expression can:

  • Identify viral infection of a cell (viral protein expression).
  • Determine an individual's susceptibility to cancer (oncogene expression).
  • Find if a bacterium is resistant to penicillin (beta-lactamase expression).

Similarly, the analysis of the location of protein expression is a powerful tool, and this can be done on an organismal or cellular scale. Investigation of localization is particularly important for the study of development in multicellular organisms and as an indicator of protein function in single cells. Ideally, measurement of expression is done by detecting the final gene product (for many genes, this is the protein); however, it is often easier to detect one of the precursors, typically mRNA and to infer gene-expression levels from these measurements.

An approach for gene expression analysis is the hybridization microarray. A single array or "chip" may contain probes to determine transcript levels for every known gene in the genome of one or more organisms.

Here is a video explaining the concepts of gene expression analysis and DNA microarrays: https://www.youtube.com/watch?v=Hv5flUOsE0s

Relation to coding

The generation of the DNA microarray bears several interesting challenges from a coding perspective. The microarray needs to be generated, which means that artificial DNA strands are synthesized. Many of these DNA strands are synthesised simultaneously. However, only one of the four bases of the DNA (A, C, G, T) can be written at once. The challenge is to select the best possible sequence in which the bases are written to minimize the overall running time. This problem is called finding the Shortest Common Supersequence (SCS) and is unfortunately np-hard (https://en.wikipedia.org/wiki/Shortest_common_supersequence). Furthermore, there are more than one possible probes that can be used to signal a specific gene. Hence, this is another parameter to optimize.

Problem description

This thesis aims to analyze the problem of finding the best probes for a DNA microarray and implement an algorithm to run small-scale experiments. 

This should only roughly introduce the problem. No need to understand everything or review all references. If you are interested, please reach out to me, and we can discuss a suitable direction for you.

Related references

[1] M. Abu-Sini, A. Lenz, and E. Yaakobi, “DNA Synthesis Using Shortmers,” in 2023 IEEE International Symposium on Information Theory (ISIT), Jun. 2023, pp. 585–590. doi: 10.1109/ISIT54713.2023.10206609.

[2] A. Lenz, Y. Liu, C. Rashtchian, P. H. Siegel, A. Wachter-Zeh, and E. Yaakobi, “Coding for Efficient DNA Synthesis,” in 2020 IEEE International Symposium on Information Theory (ISIT), Jun. 2020, pp. 2885–2890. doi: 10.1109/ISIT44484.2020.9174272.

[3] D. Maier, “The Complexity of Some Problems on Subsequences and Supersequences,” J. ACM, vol. 25, no. 2, pp. 322–336, Apr. 1978, doi: 10.1145/322063.322075.

[4] K. Makarychev, M. Z. Rácz, C. Rashtchian, and S. Yekhanin, “Batch Optimization for DNA Synthesis,” IEEE Transactions on Information Theory, vol. 68, no. 11, pp. 7454–7470, Nov. 2022, doi: 10.1109/TIT.2022.3184903.

[5] N. Xie, S. Xu, and Y. Xu, “A new coding-based algorithm for finding closest pair of vectors,” Theoretical Computer Science, vol. 782, pp. 129–144, Aug. 2019, doi: 10.1016/j.tcs.2019.03.011.

Voraussetzungen

Good knowledge and high interest in mathematics, especially

  • Linear algebra
  • Combinatorics

Good programming knowledge is required for this topic. 

Kontakt

frederik.walter@tum.de

Betreuer:

Frederik Walter

Absicherung des elektrischen Antriebsstranges

Beschreibung

.

Betreuer:

Gerhard Kramer - (BMW Group)

Student

Joachim Sieburg

Erweiterung der Agabiz App um eine automatisierte Standort-Ermittlung und Fahrtzeitkontrolle mit Hilfe von iBeacon-/Bluetooth-Technologie

Beschreibung

.

Betreuer:

Student

Zhou Lu

Umsetzung einer frequenzselektiven IQ-Imbalanz Korrektur für OFDM Direct Conversion Receiver

Beschreibung

.

Betreuer:

Student

Lukas Danninger

translation of coded modulation library from Matlab/C into julia

Stichworte:
Matlab, C, C++, julia
Kurzbeschreibung:
It is the students task to translate function from MATLAB and C into julia language.

Beschreibung

the students should translate functions from Matlab and C to julia. the functions involve calculation of infomation theoretic quantities to basic function of a discrete time transmission chain.

The students task is to learn julia and matlab to a extend that the translation from one to the other language is possible. We furthermore expect the student to learn how to use git and gitlab for managing larger projects.

 

Voraussetzungen

basic programming knowledge

Betreuer:

Student

Houssem Baazoug

Ingenieurpraxis

Beschreibung

.

Betreuer:

Student

Xavier Oliva I Jürgens